HTM for IT

Thank you @sheiser1
I found this
http://nupic.docs.numenta.org/1.0.3/api/algorithms/anomaly-detection.html
I have two questions and hope to get advice from you and others in this forum:

  1. Is there a quickstart for the anomaly detection code above?
  2. In the code above, the active columns and predicted columns are used to calculate the anomaly score. As I understand it, the output of the SP is active column indices, OK, but the output of the TM is predicted cells (not columns). Can anyone explain?

Thank you very much.

class nupic.algorithms.anomaly.Anomaly(slidingWindowSize=None, mode='pure', binaryAnomalyThreshold=None)

compute(activeColumns, predictedColumns, inputValue=None, timestamp=None)

Compute the anomaly score as the percent of active columns not predicted.

Parameters:
  • activeColumns – array of active column indices
  • predictedColumns – array of column indices predicted in this step (used for anomaly in step T+1)
  • inputValue – (optional) value of current input to encoders (e.g. "cat" for category encoder) (used in anomaly-likelihood)
  • timestamp – (optional) date timestamp when the sample occurred (used in anomaly-likelihood)
Returns: the computed anomaly score; float 0…1
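A minimal sketch of the raw score described above ("percent of active columns not predicted"), in plain Python with no NuPIC import. The function name `rawAnomalyScore` is hypothetical, and returning 0.0 when there are no active columns is an assumption for that edge case.

```python
def rawAnomalyScore(activeColumns, predictedColumns):
    """Fraction (0.0 to 1.0) of active columns that were NOT predicted.

    Hypothetical helper, not the NuPIC API. predictedColumns should be
    the columns predicted at the previous time step.
    """
    active = set(activeColumns)
    if not active:
        return 0.0  # assumption: no active columns, nothing to be surprised by
    unpredicted = active - set(predictedColumns)
    return len(unpredicted) / float(len(active))


# Usage: 4 active columns, 3 of which were predicted
print(rawAnomalyScore([3, 10, 42, 99], [3, 10, 42, 7]))  # 0.25
```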

Sorry, no.

When the TM predicts cells, those cells sit inside minicolumns. To see if the prediction was correct, we take the next set of active minicolumns from the SP and check whether the predicted cells were within those minicolumns.

Thank you @rhyolight. I will study more.

Yes that sounds right to me, and I’d recommend starting here:

https://nupic.docs.numenta.org/1.0.3/quick-start/algorithms.html

I’d say first get familiar with how to instantiate encoders, SP and TM objects and feed them data.

Once you're there, you can call TM.getPredictedCells(), then for each of these cells get its corresponding column with TM.columnForCell(cell).

From there you can store those columns as, e.g., 'previouslyPredictedColumns' and compare them to the columns activated by the SP at the next time step. The proportion of the SP's active columns that were NOT in previouslyPredictedColumns is the anomaly score.
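The recipe above (predicted cells to columns, then compare with the next step's SP output) can be sketched in plain Python. Here `cell // cellsPerColumn` stands in for `TM.columnForCell(cell)`, assuming cells are numbered column by column; the helper name `columnsFromCells` and all cell and column numbers are made up for illustration.

```python
CELLS_PER_COLUMN = 32  # assumption: the TM's default cellsPerColumn


def columnsFromCells(cells, cellsPerColumn=CELLS_PER_COLUMN):
    """Map TM cell indices to the set of minicolumns that contain them."""
    return {cell // cellsPerColumn for cell in cells}


# Step t-1: made-up predicted cells, as getPredictedCells() might return
predictedCells = [5, 40, 70, 71, 300]
previouslyPredictedColumns = columnsFromCells(predictedCells)  # {0, 1, 2, 9}

# Step t: made-up active columns from the SP
activeColumns = [0, 1, 2, 3]

# Anomaly score = share of active columns that were not predicted
notPredicted = [c for c in activeColumns if c not in previouslyPredictedColumns]
anomalyScore = len(notPredicted) / float(len(activeColumns))
print(anomalyScore)  # 0.25
```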

Then the next step is to feed the anomaly scores into an anomaly likelihood object to get the likelihood values.
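NuPIC ships its own anomaly likelihood class; the sketch below is NOT that implementation. It is a simplified, hypothetical stand-in (`SimpleAnomalyLikelihood`, with arbitrary window sizes) that only illustrates the general idea: model past raw anomaly scores as a Gaussian, average the most recent scores, and report how unlikely that short-term average is.

```python
import math
from collections import deque


class SimpleAnomalyLikelihood(object):
    """Hypothetical, simplified likelihood estimator (not NuPIC's class)."""

    def __init__(self, historyWindow=1000, averagingWindow=10, learningPeriod=20):
        self.history = deque(maxlen=historyWindow)   # long-term raw scores
        self.recent = deque(maxlen=averagingWindow)  # short-term raw scores
        self.learningPeriod = learningPeriod

    def update(self, rawScore):
        """Add one raw anomaly score and return a likelihood in [0, 1]."""
        self.history.append(rawScore)
        self.recent.append(rawScore)
        if len(self.history) < self.learningPeriod:
            return 0.5  # not enough history yet: report "no opinion"
        n = float(len(self.history))
        mean = sum(self.history) / n
        variance = sum((s - mean) ** 2 for s in self.history) / n
        std = max(math.sqrt(variance), 1e-6)  # guard against a flat history
        shortAverage = sum(self.recent) / float(len(self.recent))
        z = (shortAverage - mean) / std
        # Gaussian upper-tail probability P(X > shortAverage)
        tail = 0.5 * math.erfc(z / math.sqrt(2.0))
        return 1.0 - tail


# Usage: low, noisy scores followed by a burst of fully unpredicted input
estimator = SimpleAnomalyLikelihood()
for score in [0.0, 0.2] * 25 + [1.0] * 10:
    likelihood = estimator.update(score)
print(likelihood)
```

A sustained run of high raw scores pushes the likelihood toward 1, while a single noisy spike moves the short-term average much less.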

Thank you very much for your advice. That's a clear route.

Thank you @rhyolight.

I have a dumb question: in the TM we talk about a 2-D matrix (the horizontal axis is the number of columns and the vertical axis is the cells per column).
The default values are:
columnCount: 2048
cellsPerColumn: 32

But after we do:

  # Execute Temporal Memory algorithm over active mini-columns.
  tm.compute(activeColumnIndices, learn=True)
  activeCells = tm.getActiveCells()

The getActiveCells() function returns the indices of the active cells. It's a 1-D list, not a 2-D matrix. How can I convert an index returned by getActiveCells() to the exact column and cell position in the 2-D matrix?

Thank you very much.

The array is simply flattened, so you have to do the unflattening yourself. Each consecutive run of cellsPerColumn elements represents one minicolumn.

Thank you. Is there a function that maps a cell index to its exact column and cell position?

By the way, is there a simple example of the RandomDistributedScalarEncoder? I searched but could not find anything. The explanation in http://nupic.docs.numenta.org/1.0.5/api/algorithms/encoders.html is not easy to understand.

Thank you very much

I believe it is just a couple of lines of code. Or use numpy.reshape().

Check the nupic walkthrough notebook.

@life_you can easily write simple code to split the SDR array into nColumns subarrays, each with nCellsPerColumn elements.
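A minimal pure-Python sketch of that split, turning the flat index list from `getActiveCells()` into one 0/1 row per minicolumn. The helper name `splitIntoColumns` is hypothetical, and the toy sizes (4 columns, 3 cells each) are chosen only so the output is easy to read; with the defaults you would pass 2048 and 32.

```python
def splitIntoColumns(activeCells, columnCount, cellsPerColumn):
    """Return columnCount rows, each a 0/1 list of length cellsPerColumn."""
    dense = [0] * (columnCount * cellsPerColumn)
    for cell in activeCells:
        dense[cell] = 1  # flat index -> dense bit
    # Consecutive blocks of cellsPerColumn elements belong to one minicolumn
    return [dense[c * cellsPerColumn:(c + 1) * cellsPerColumn]
            for c in range(columnCount)]


# Toy example: cells are numbered 0..11
for row in splitIntoColumns([1, 5, 9, 10], columnCount=4, cellsPerColumn=3):
    print(row)
```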

There's a function in TM called columnForCell(). I don't think it's in BacktrackingTM, though you can replicate it with:

colIndex = cellIndex // cellsPerColumn

Then if you want the position of that cell in the column you could use mod/remainder:

cellPositionInCol = cellIndex % cellsPerColumn
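The two formulas, plus the inverse mapping, as a small self-contained sketch. The function names are hypothetical (`columnForCell` here only mirrors the TM method's name), and the code assumes cells are numbered column by column with the default 32 cells per column.

```python
CELLS_PER_COLUMN = 32  # assumption: must match the TM's cellsPerColumn


def columnForCell(cellIndex, cellsPerColumn=CELLS_PER_COLUMN):
    """Minicolumn that contains this cell (integer division)."""
    return cellIndex // cellsPerColumn


def positionInColumn(cellIndex, cellsPerColumn=CELLS_PER_COLUMN):
    """Row of this cell inside its minicolumn (remainder)."""
    return cellIndex % cellsPerColumn


def cellIndexFor(column, position, cellsPerColumn=CELLS_PER_COLUMN):
    """Inverse mapping: (column, position) back to the flat cell index."""
    return column * cellsPerColumn + position


cell = 1000
print(columnForCell(cell), positionInColumn(cell))  # 31 8
```

Dividing by the number of columns instead of cellsPerColumn would give the wrong column, which is why the divisor must be the per-column cell count.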

Thank you. I will try.

Thank you.

Thank you very much. I will follow the notebook to understand more.

Dear all!

Now I understand the Classifier, so I am moving on to anomaly detection.

In the document http://nupic.docs.numenta.org/1.0.3.dev0/api/algorithms/anomaly-detection.html

I see this:

compute(activeColumns, predictedColumns, inputValue=None, timestamp=None)

Compute the anomaly score as the percent of active columns not predicted.

Parameters:
  • activeColumns – array of active column indices
  • predictedColumns – array of column indices predicted in this step (used for anomaly in step T+1)
  • inputValue – (optional) value of current input to encoders (e.g. "cat" for category encoder) (used in anomaly-likelihood)
  • timestamp – (optional) date timestamp when the sample occurred (used in anomaly-likelihood)
Returns: the computed anomaly score; float 0…1

I have a question:

In Parameters above:

activeColumns: I can get it from the SP in this round (each round is one data row in the dataset file).

But predictedColumns: do I get it from the TM in this round, or do I have to get it from the last round?

Thank you very much.

From the last round. The anomaly score is the proportion of activated columns at time (t) which were not predicted at time (t-1).

You use the predicted columns from t-1 and the actual active columns from t. Yes, this causes the anomaly score to always be 1 at t = 0 (since t = -1 doesn't exist), but you can just ignore that special case.
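The timing can be sketched as a loop that scores step t against the predictions saved at step t-1 and only then stores the current predictions. This is a hypothetical helper working on plain lists of column indices (no NuPIC objects); it returns 1.0 at t = 0 as described above, and treating a step with no active columns as 0.0 is an assumption.

```python
def anomalyScores(activeColumnsPerStep, predictedColumnsPerStep):
    """Score each step t with active columns at t vs. predictions from t-1."""
    scores = []
    prevPredicted = None  # nothing was predicted before t = 0
    for active, predicted in zip(activeColumnsPerStep, predictedColumnsPerStep):
        active = set(active)
        if prevPredicted is None:
            score = 1.0  # t = 0: no earlier prediction, ignore this value
        elif not active:
            score = 0.0  # assumption: no active columns, nothing to score
        else:
            score = len(active - prevPredicted) / float(len(active))
        scores.append(score)
        prevPredicted = set(predicted)  # store AFTER scoring, used at t + 1
    return scores


active = [[1, 2, 3], [2, 3, 4], [5, 6, 7, 8]]
predicted = [[2, 3, 4], [5, 6, 7], [9]]
print(anomalyScores(active, predicted))  # [1.0, 0.0, 0.25]
```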

After following your guidance @marty1885 @sheiser1, I have successfully run the anomaly score, anomaly likelihood and log likelihood on the hotgym dataset (I use the algorithms API, not OPF). I tried changing some parameters in the scalar encoder, TM, Anomaly() and AnomalyLikelihood() (e.g. resolution, permanence increment/decrement, slidingWindowSize, learningPeriod…) and got different anomaly detection results. I would like to check which anomalies in the dataset are the real ones.
I have a few questions:

  1. From searching and reading this forum, there seems to be no reference result for the hotgym dataset. Is that right?
  2. I read about NAB. I am using Python 2.7, and I found this link https://github.com/numenta/NAB/tree/master/nab/detectors/numenta
    Is this link right for my setup (Python 2.7, NuPIC and the algorithms API)? If I follow this guide, can I test my algorithm with my own parameters?
  3. Or can I use a single data file like this one https://github.com/numenta/NAB/blob/master/data/realKnownCause/machine_temperature_system_failure.csv
    (this data file looks similar to the hotgym data file), run my algorithm and check the result? But how can I know whether my anomaly detection result is right or not? Or should I check it manually against the result file https://raw.githubusercontent.com/numenta/NAB/master/results/numenta/realKnownCause/numenta_machine_temperature_system_failure.csv

timestamp,value,anomaly_score,raw_score,label,S(t)_reward_low_FP_rate,S(t)_reward_low_FN_rate,S(t)_standard
2013-12-02 21:15:00,73.96732207,0.0301029996659,1.0,0,0.0,0.0,0.0
2013-12-02 21:20:00,74.935882,0.0301029996659,1.0,0,0.0,0.0,0.0
2013-12-02 21:25:00,76.12416182,1.0,1.0,0,0.0,0.0,0.0

But I don't know exactly which points in this file are the anomalies.
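One rough way to approach this, assuming the `label` column in the NAB results file marks rows inside a labeled anomaly window with 1 and rows outside with 0: compare the rows where `anomaly_score` crosses a threshold against the labeled rows. This is a hypothetical sanity check, not NAB's own scoring (which produces the `S(t)_...` columns); the helper name `compareWithLabels` and the 0.5 threshold are assumptions. The sample rows are the three quoted above.

```python
import csv

# The three sample rows quoted above; in practice read the full results file.
SAMPLE = """timestamp,value,anomaly_score,raw_score,label,S(t)_reward_low_FP_rate,S(t)_reward_low_FN_rate,S(t)_standard
2013-12-02 21:15:00,73.96732207,0.0301029996659,1.0,0,0.0,0.0,0.0
2013-12-02 21:20:00,74.935882,0.0301029996659,1.0,0,0.0,0.0,0.0
2013-12-02 21:25:00,76.12416182,1.0,1.0,0,0.0,0.0,0.0
"""


def compareWithLabels(lines, threshold=0.5):
    """Split flagged rows into hits (inside a labeled window) and false alarms."""
    hits, falseAlarms, labeledRows = [], [], 0
    for row in csv.DictReader(lines):
        inWindow = row["label"] == "1"
        if inWindow:
            labeledRows += 1
        if float(row["anomaly_score"]) >= threshold:
            if inWindow:
                hits.append(row["timestamp"])
            else:
                falseAlarms.append(row["timestamp"])
    return hits, falseAlarms, labeledRows


hits, falseAlarms, labeledRows = compareWithLabels(SAMPLE.splitlines())
print(hits, falseAlarms, labeledRows)
```

With these three rows nothing is labeled, so the score of 1.0 at 21:25 counts as a false alarm; over the whole file the same comparison shows which flagged points fall inside the labeled windows.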

Thank you very much

  1. People have been using the Hot Gym data a lot. See
    Hot Gym running in Python 3 via nupic.cpp & pybind
    High anomalylikelihood values for hot gym anomaly example
    etc…

  2. Yes, you can modify the parameters there. Please do be aware that the actual model creation happens in a model factory in OPF.

  3. It should work.

Thank you @marty1885.

I found this https://www.businesswire.com/news/home/20170620005440/en/New-Research-Paper-Numenta-Demonstrates-Results-Machine
This paper shows the 3 anomalies in detail (1 is planned, 2 are outages), and the plot is very clear.
https://plot.ly/~sjd171/2636.embed

These are all useful for checking my model.

Do you mean I should use OPF for real applications?

For now I am following the algorithms API: http://nupic.docs.numenta.org/1.0.3/api/algorithms/index.html

I think it's easier to understand because it follows the HTM theory. After I understand the algorithms, I will move on to OPF for real applications. Do you agree? Is this the right approach?

Thank you very much.