How can I decide which variables to feed to an HTM model for anomaly detection?

AbdulRahman.Mansour · March 28, 2018, 6:12pm

I’m new to HTM and I want to use it for anomaly detection to support preventive maintenance decisions.

I think I have a basic understanding of the model, but I still can’t tell what technique I should use to choose the right variables to feed to the model.

sheiser1 · March 28, 2018, 6:29pm

How many variables are you considering? It’s generally advised not to have more than 4 or 5 in a single model, and it is sometimes best to have a separate model for each one. There is a process for this called swarming, which is a guided search for optimal parameter values, including which variables contribute to better predictions. Here’s a link to the docs on it:

http://nupic.docs.numenta.org/1.0.0/guides/swarming/running.html

AbdulRahman.Mansour · March 28, 2018, 6:48pm

Thank you @sheiser1 for your fast reply.

Actually, I don’t have the data yet, as I’m proposing HTM anomaly detection as a solution to an incident prediction problem, but I’m assuming the number of variables will be much greater than 4 or 5.

But now I have two questions:

First, what if none of the variables shows an anomaly on its own (each in its own single-variable model), but a multivariate model would detect one?

Second, if I had a single model for each variable, how would I combine the results of the models into a single anomaly score?

sheiser1 · March 28, 2018, 7:07pm

This is certainly possible. With separate models for each variable, each model would only tell you if its one variable was acting strangely. It is important to understand that when multiple variables are included in a single model, their encodings are concatenated together and fed into the system. The anomalies produced by that model would tell you that the collection of variables as a whole was acting strangely, but it wouldn’t be possible to say precisely which variables or which combinations are the culprits. Ideally, having both single-variable and multivariate models could offer the most insight: when the multivariate model is anomalous, you could check the single-variable models to see which of them are anomalous too.
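The drill-down described above could be sketched roughly as follows. This is only an illustration, not NuPIC code: the function name `find_culprits`, the variable names, and the 0.8 threshold are all hypothetical, and the scores are assumed to be anomaly scores between 0 and 1 that you have already collected from your models for the same time step.

```python
def find_culprits(multi_score, single_scores, threshold=0.8):
    """List the single-variable models that are anomalous when the
    multivariate model is anomalous.

    multi_score   -- anomaly score of the multivariate model (0..1)
    single_scores -- dict: variable name -> anomaly score of its own model
    threshold     -- hypothetical cut-off; tune it for your own data
    """
    if multi_score < threshold:
        # The combined model sees nothing unusual, so there is nothing to explain.
        return []
    # Keep only the variables whose own model is also above the threshold.
    return sorted(name for name, score in single_scores.items()
                  if score >= threshold)


culprits = find_culprits(0.95, {"temperature": 0.10, "vibration": 0.90})
print(culprits)
```

An empty result while the multivariate model is anomalous would suggest that the anomaly lies in the combination of variables rather than in any single one, which is the case raised in the first question.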

As for combining the scores, I don’t think there’s any rule of thumb for how to do this. I think your intuition, as the one who understands what you’re modeling, would be the way to go. Most simply, you could average the different anomaly scores to get one overall anomaly metric.
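A minimal sketch of the averaging idea, assuming each per-variable model has already produced an anomaly score between 0 and 1 for the current time step. The function and variable names are made up for illustration. The maximum is returned alongside the mean only as a possible alternative (it is not something suggested in this thread), because a plain average can hide a single variable that is strongly anomalous.

```python
def combine_anomaly_scores(scores):
    """Return (mean, maximum) of the per-model anomaly scores.

    scores -- dict: variable name -> anomaly score of that variable's model
    """
    values = list(scores.values())
    if not values:
        raise ValueError("need at least one anomaly score")
    mean_score = sum(values) / float(len(values))
    return mean_score, max(values)


scores = {"temperature": 0.10, "vibration": 0.90, "pressure": 0.20}
mean_score, max_score = combine_anomaly_scores(scores)
print(mean_score, max_score)
```

Here the mean is about 0.4 even though one model reports 0.9, so which summary is appropriate depends on what you are modeling.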

AbdulRahman.Mansour · March 28, 2018, 7:27pm

That was very helpful.
Really, thank you very much for your fast reply and for your precise and great answers.

sheiser1 · March 28, 2018, 7:40pm

Glad to hear it, and you’re certainly welcome. Of course, post again anytime with any questions or if you need implementation help once you get the data.

AbdulRahman.Mansour · March 28, 2018, 7:43pm

I would really appreciate that and thank you very much.

ansh979 · September 12, 2018, 10:26am

@sheiser1, how does one combine multiple metrics into a single model?

sheiser1 · September 12, 2018, 2:46pm

Check out the NuPIC quick start guides:
http://nupic.docs.numenta.org/1.0.6.dev0/quick-start/algorithms.html

It’s done here:

for count, record in enumerate(reader):

  if count >= numRecords: break

  # Convert date string into a Python datetime object.
  dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
  # Convert data value string into float.
  consumption = float(record[1])

  # To encode, we need to provide zero-filled numpy arrays for the encoders
  # to populate.
  timeOfDayBits = numpy.zeros(timeOfDayEncoder.getWidth())
  weekendBits = numpy.zeros(weekendEncoder.getWidth())
  consumptionBits = numpy.zeros(scalarEncoder.getWidth())

  # Now we call the encoders to create bit representations for each value.
  timeOfDayEncoder.encodeIntoArray(dateString, timeOfDayBits)
  weekendEncoder.encodeIntoArray(dateString, weekendBits)
  scalarEncoder.encodeIntoArray(consumption, consumptionBits)

  # Concatenate all these encodings into one large encoding for Spatial
  # Pooling.
  encoding = numpy.concatenate(
    [timeOfDayBits, weekendBits, consumptionBits]
  )

The numpy.concatenate call here combines the separate encodings of the 3 fields (timeOfDayBits, weekendBits, consumptionBits) into a single input array.
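To see what the concatenation does without setting up NuPIC encoders, here is a toy sketch with hand-written bit arrays. The tiny widths and bit patterns are invented for illustration; real encoder outputs are much wider.

```python
import numpy

# Hand-written stand-ins for encoder outputs (real ones are far wider).
timeOfDayBits = numpy.array([1, 1, 0, 0])
weekendBits = numpy.array([0, 1])
consumptionBits = numpy.array([0, 0, 1, 1, 1])

# The fields are laid end to end, in the order given.
encoding = numpy.concatenate([timeOfDayBits, weekendBits, consumptionBits])

# The combined width is the sum of the individual widths: 4 + 2 + 5 = 11.
print(encoding.size)
```

Each field keeps its own fixed range of positions in the combined array, so the order of the fields must stay the same for every record.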

ansh979 · September 12, 2018, 2:57pm

Thanks I’ll give it a try

momentum · September 14, 2018, 9:41am

I have a related question. Say one incorporates three variables, as in the Quickstart example. Does the number of bits a variable occupies in the input space have a bearing on its “weighting” in the prediction? I presume it does but just want to make sure.

E.g., timeOfDayBits has 300 bits, weekendBits 50, and consumptionBits 400, for an input space of 750 bits.

Does weekendBits have a relatively minimal impact on the next predicted value?

It seems to me that the share of input bits allotted to each encoded variable is a very important parameter to monitor.
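For the widths in this example, the share of the input space per field can be worked out with a few lines of Python. This is only a rough proxy for a field’s influence, and the helper name `input_share` is made up here. How many bits each encoder actually turns on probably matters as well, which this sketch does not account for.

```python
def input_share(widths):
    """Return each field's fraction of the total input width.

    widths -- dict: field name -> number of bits the field's encoding occupies
    """
    total = float(sum(widths.values()))
    return {name: width / total for name, width in widths.items()}


widths = {"timeOfDay": 300, "weekend": 50, "consumption": 400}
shares = input_share(widths)
for name in sorted(shares):
    print("%s: %.1f%%" % (name, 100 * shares[name]))
```

With these numbers the weekend field takes up 50 of 750 bits, a little under 7% of the input space.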

rhyolight · September 14, 2018, 1:40pm

Yes, you are right about that intuition.

momiji · September 14, 2018, 2:03pm

Interesting,

Would it therefore be beneficial to increase the size so that they all have an equal ‘ratio’ of input bits, so to speak?

I have tried including a saccade vector across a 2D MNIST image, and it sits at a measly 10 bits compared to the 7000+ feature bits :).

rhyolight · September 14, 2018, 2:21pm

Right. 10 bits out of 7000 is not going to mean much at all.

momiji · September 14, 2018, 2:40pm

Hmm,

I’ll investigate alternatives, though I’d rather not increase the size. I wonder if it is possible to use the saccade vector as a means to shift or hash the feature space in a way that represents the change in direction and magnitude whilst still retaining the same semantic meaning of the features... hmm, interesting.

rhyolight · September 14, 2018, 3:29pm

Stay tuned for papers coming out in October that talk about how movement vectors can get into the system as proximal input to a “location layer”. This approach is more complicated but more biologically plausible than just shoving movement data and sensory data into the same bit arrays.
