Anomaly detection on comments/chat data

ghmerti · September 16, 2018, 11:32am

Hi,

I’m completely new to HTM. I’ve been searching this forum and the web for an answer to this question, but I couldn’t find anything of help. The question is:

Can I do anomaly detection on natural language data like this:

ip_1, comment_1
ip_1, comment_2
ip_2, comment_3

The comments are strings (natural language). I want to detect anomalous comments, for example advertisements or repeated comments (not necessarily exact matches).

Can HTM help me solve such a problem? If so, is there an example of implementing it (or of a similar problem)?

Thank you very much

jimmyw · September 17, 2018, 2:31am

Welcome, @ghmerti!

Before solving this from scratch, it would be worth looking at cortical.io, which can help with detecting text similarity and building a semantic fingerprint.
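To make the "repeated comments, not necessarily an exact match" part concrete, here is a minimal sketch in Python. It does not use cortical.io or HTM; it uses plain Jaccard overlap between word sets as a simple stand-in for a similarity measure. The function names and the 0.6 threshold are assumptions chosen for illustration.

```python
import re


def tokens(comment):
    """Lowercase a comment and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", comment.lower()))


def jaccard(a, b):
    """Jaccard similarity of two comments' word sets, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty comments are treated as identical
    return len(ta & tb) / len(ta | tb)


def near_duplicates(comments, threshold=0.6):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    pairs = []
    for i in range(len(comments)):
        for j in range(i + 1, len(comments)):
            if jaccard(comments[i], comments[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

Comparing every pair is quadratic in the number of comments, so this would only suit small batches; the threshold would also need tuning on real data.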

As far as plain HTM goes, its current strength is more around sequence-based prediction. So if you were trying to predict the next word following a sequence of words (or how anomalous a word is within the context of a sentence), it would be better suited. But it looks like most of what you’re solving is more of a classification problem (e.g. “does this sentence constitute advertising?”) which is the domain where deep learning currently excels.
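To illustrate what "how anomalous a word is within the context" could mean, here is a rough Python sketch. It is not HTM and not NuPIC; it is a plain bigram frequency model, used only to show the idea of scoring a word by how unexpected it is after the previous word. The class name and scoring rule are assumptions made up for this example.

```python
from collections import Counter, defaultdict


class BigramSurprise:
    """Scores a word by how rarely it has followed the previous word."""

    def __init__(self):
        # Maps each word to a count of the words seen right after it.
        self.following = defaultdict(Counter)

    def learn(self, words):
        """Update counts from one tokenized sentence."""
        for prev, nxt in zip(words, words[1:]):
            self.following[prev][nxt] += 1

    def score(self, prev, word):
        """Return 1 - P(word | prev); 1.0 when prev has never been seen."""
        counts = self.following.get(prev)
        if not counts:
            return 1.0
        return 1.0 - counts[word] / sum(counts.values())
```

A score near 0 means the word is what usually follows, and a score of 1.0 means it has never been seen in that position. HTM's temporal memory works on much longer contexts than a single previous word, so treat this only as an analogy.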

I would add that IP address might not be an ideal way of identifying users, as these days you can have a very large number of people behind a single IP address.

Hope this helps, and I’d encourage you to learn HTM regardless of whether it’s a good fit for your current scenario.
