Multiple features and binary output

I’m still a little confused. One set is a sequence of timestamps, each one having a 0/1 state. Taken together they represent an event occurring over time and the state of the system at each point in time.

Some events are very short and some are orders of magnitude longer (you said 0.001s to 45s?). Are the very short events structurally different from the longer ones? Do they always contain the same number of samples?

I probably need to understand the problem space better. A few initial questions:

Is the system ever in both state “0” and state “1” simultaneously?
Is the goal to predict what state the system will be in at any arbitrary point in time in the future?
Is the goal to predict when the next state “1” will occur?
Where do the timestamps come from? Are these random samplings that are just not evenly spaced? Or are these driven by events? If event driven, what are the events?

No, I don’t think Numenta has put GridCells into NuPIC yet. But they are trivial to implement. Please refer to this video.

Ohh. Let me explain it in more detail. By x and y I mean the input and output of the TM (as x and y are commonly used in ML).
What a TM does is predict a possible SDR based on the current state. So let’s assume the following pseudocode.

for x, y in zip(input, desired_output):
    tm.compute(x, True)
    tm.compute(y, True)
    tm.reset()

The TM has no way to know that the values we send to it aren’t temporally coherent, so it learns how to map the input to desired_output. To keep the TM from learning useless relations between different input/output pairs, we reset it after every pair. The two SDRs do have to have the same shape, but since the TM doesn’t grow connections to cells that have never fired, that’s not a problem. Just fill the empty space with 0s.

And to generate predictions:

def genPrediction(tm, x):
    tm.compute(x, False)  # learning disabled
    pred = tm.getPredictiveCells()
    tm.reset()
    return pred

I have updated my previous post with better-formatted samples. I think this will help. To answer your questions:

@rhyolight, @Paul_Lamb

  • The event happening at each timestamp is the same regardless of where you are in the 50 timestamps.
  • You can think of it this way: imagine that a short beep is playing at each timestamp. When hearing this sequence of 50 beeps over time you must guess if the system emitting this particular sequence of beeps is in state 1 or 0.
  • The duration of the series of 50 beeps can vary. In the 2 examples above they both finish after about 17 seconds, but in some cases it can take up to 43 seconds to emit the 50 beeps.
  • Each sequence of 50 timestamps can only be associated with a single state. It’s either 0 or 1.

Is that clearer ?

Ok, I think I understand now. Essentially a full set of 50 timestamps goes into the definition of “State 0” or “State 1”. The timestamps are sorted, and although there are always 50 of them, they can range anywhere between 0.001 and 45.0. One could think of the state (0 or 1) as an object, and the timestamps as that object’s features.

In this case, I would alter the strategy a bit. If we assume there is a temporal pattern to the timestamps that can be learned, then you could encode the deltas between timestamps as your input. Once the TM layer has learned the pattern, the activity in that layer would represent a specific feature (timestamp) in a specific object (state). Using a strategy like the one Numenta described in the Columns Paper (adding an output layer), activity in the TM layer would activate the proper representation for State 0 or State 1 in the output layer.

I realize that is a tad theoretical. You’d probably have to be a little creative to set up the system in NuPIC.
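
For the delta part at least, the preprocessing is simple. Here is a minimal sketch, assuming each sample is just a list of 50 timestamps; the function name and the example values are made up for illustration and are not from the dataset in this thread.

```python
def timestamps_to_deltas(timestamps):
    """Return the gaps between consecutive timestamps (sorted first)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical timestamps, chosen only to keep the arithmetic easy to follow.
example = [0.5, 0.75, 1.75, 4.0]
print(timestamps_to_deltas(example))
```

Note that a sequence of 50 timestamps yields 49 deltas, so the model would see 49 inputs per sequence unless the first timestamp is also fed in as a delta from zero.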

Hi @laurent,

What if you had 2 NuPIC models, one trained on all the samples of state 0 and the other on all the samples of state 1? Then when a new sample sequence comes in, you could run it through both, get total anomaly scores, and classify it as whichever state’s model has the lower total.

If there are many more sequences of state 0, then that model will have more training data and should usually be harder to surprise than the other model. This could imply that if a given sequence is getting a lower total anomaly score from the state 1-model then it could be distinguishing itself as state-1.

Of course this approach, like any other, is sensitive to the encoding scheme, so I’d definitely advise doing some exploration there first, to make sure the values at the different scales have appropriate amounts of overlap.
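
The decision rule itself is small. Below is a rough sketch, assuming each trained model is wrapped in a callable that returns one anomaly score per step of a sequence; that wrapper interface and the toy scorers are hypothetical stand-ins, not NuPIC API.

```python
def classify_by_total_anomaly(sequence, scorers):
    """Return (label, totals), where label has the lowest summed anomaly score.

    `scorers` maps a label (e.g. 0 or 1) to a callable that takes the whole
    sequence and returns one anomaly score per step. In practice each callable
    would wrap a trained model with learning disabled.
    """
    totals = {label: sum(score(sequence)) for label, score in scorers.items()}
    return min(totals, key=totals.get), totals


# Toy stand-ins for the two trained models (purely illustrative): each one is
# "surprised" in proportion to the distance from a value it considers typical.
def make_toy_scorer(typical):
    return lambda seq: [min(1.0, abs(v - typical)) for v in seq]


scorers = {0: make_toy_scorer(0.1), 1: make_toy_scorer(0.5)}
label, totals = classify_by_total_anomaly([0.45, 0.5, 0.6], scorers)
print(label, totals)
```

Because the state 0 model would see far more training data, comparing raw totals may be biased toward state 0; normalizing each total by that model’s typical score on held-out data is one option worth testing.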

Thank you all for your suggestions.
@rhyolight: with the above clarification on my dataset, any suggestion on your side ?

The easiest way to try HTM on this would be to train only on state “0” data. How many sequences of this data do you have? If several thousand I would encode the times as deltas. Just send in one delta after the next until the sequence is through, then reset the TM every time.

Once the model has seen as much training data as you can give it, disable learning. Now give it some state “1” sequences and see how anomaly scores compare vs state “0”.
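
As a rough sketch of that procedure: the loop below feeds sequences one value at a time, resets after each sequence, and returns a mean anomaly score per sequence. The `step(value, learn)` / `reset()` interface is a hypothetical wrapper around whatever HTM model is used, and `ToyTransitionModel` is only a stand-in so the example runs without NuPIC.

```python
def run_sequences(model, sequences, learn):
    """Feed each sequence step by step and reset the model after each one.

    Returns the mean anomaly score of every sequence.
    """
    means = []
    for seq in sequences:
        scores = [model.step(value, learn) for value in seq]
        model.reset()
        means.append(sum(scores) / len(scores))
    return means


class ToyTransitionModel:
    """Stand-in model: anomaly is 1.0 for an unseen transition, else 0.0."""

    def __init__(self):
        self.seen = set()
        self.prev = None

    def step(self, value, learn):
        key = (self.prev, value)
        score = 0.0 if key in self.seen else 1.0
        if learn:
            self.seen.add(key)
        self.prev = value
        return score

    def reset(self):
        self.prev = None


model = ToyTransitionModel()
run_sequences(model, [[1, 2, 3], [1, 2, 3]], learn=True)    # train on "state 0"
state0_means = run_sequences(model, [[1, 2, 3]], learn=False)
state1_means = run_sequences(model, [[3, 1, 2]], learn=False)
print(state0_means, state1_means)
```

If the state 1 means sit clearly above the state 0 means on held-out data, a threshold between the two distributions could serve as the classifier.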

Thanks for the advice. As I have about 13 000 samples of state 0, I think this is what I’m going to try first.

Also as the deltas span roughly 4 orders of magnitude (0.001 to 10) I was wondering what would be the most appropriate encoder to use or if applying a log transformation before the training would be in order.

Kr

I think you will need to experiment, but a logarithmic encoding of the deltas is certainly something I would try.
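
One way to try this is to take the base-10 logarithm of each delta before it reaches the encoder. This is a minimal sketch, assuming the 0.001 to 10 range mentioned above as clipping bounds; the function name and default bounds are illustrative choices.

```python
import math


def log_delta(delta, min_delta=0.001, max_delta=10.0):
    """Clip a delta to [min_delta, max_delta] and return its base-10 logarithm."""
    clipped = min(max(delta, min_delta), max_delta)
    return math.log10(clipped)


# A few deltas spanning the range discussed in this thread.
for d in (0.001, 0.02, 0.5, 10.0):
    print(d, round(log_delta(d), 3))
```

With these bounds the transformed values fall roughly between -3 and 1, so a scalar encoder over that range would give each order of magnitude a similar share of buckets.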

Hi Matt,

I’m currently trying the strategy you suggested. What I’m observing, though, is that the processing time to learn a new sequence seems to grow as training goes on (e.g. an additional 20 sequences take 30 seconds to learn after 200 sequences, 45 seconds after 400 sequences, 55 seconds after 600, ...). Is that expected behavior? (Note: there is no memory swapping happening.)

Thanks.

How do you define “learn a new sequence”?

This is typical of the TM when there are no resets. If you want to decrease model size, there is some good advice in HTM anomaly detection model too large for streaming analysis.

A sequence is a set of 50 values the model is learning. Then a reset happens (the sequence ID changes in the CSV file fed into the OPF model, declared as an ‘S’ column), then another sequence of 50 values follows, and so on.

So what you are saying here is that the reset may not be happening?

No, it should be happening, but I don’t know of any tests that look at how the model size grows with resets vs no resets. I thought that models that don’t reset would perform worse over time, but my intuition could be wrong. @Paul_Lamb do you have any thoughts on how well models perform (reset vs no reset)?

If you do not use resets, the learned sequence grows longer the more often it is repeated. For example, if the sequence A-B-C-D is repeated a number of times, the sequence of representations becomes:

With reset: A -> B’ -> C’ -> D’
Without a reset: A -> B’ -> C’ -> D’ -> A’ -> B’’ -> C’’ -> D’’ -> A’’ -> B’’’ -> C’’’ -> D’’’ …

In the second case (no reset), each time it reaches the end of the known sequence, the minicolumns for the next input burst, and a new input is added to the end of the sequence. Because of the bursting, the next input activates a union of that input’s representations from all iterations of the repeating sequence. For example, if A burst at the end of my example above, the next timesteps would have the following activity:

A -> B’+B’’+B’’’ -> C’+C’’+C’’’ -> D’+D’’+D’’’ -> A’+A’’+A’’’ -> B’’+B’’’ -> C’’+C’’’ -> D’’+D’’’ …

The unions become sparser with each cycle through the repeating sequence. Since predictions are generated based on active cells, having all of these unions of activity that become denser and denser over time could definitely impact performance.

Ok so let’s see if I’m doing anything wrong either in the OPF model parameters or in the file format that could cause the reset not to happen:

Here is an extract of the OPF params. neuron_id is the first column of the CSV file and is not encoded; it’s only there for sequence numbering. Right below it is an extract of the sample file:

'modelParams': {
        # The type of inference that this model will perform
        'inferenceType': 'TemporalAnomaly',

        'sensorParams': {
            # Sensor diagnostic output verbosity control;
            # if > 0: sensor region will print out on screen what it's sensing
            # at each step 0: silent; >=1: some info; >=2: more info;
            # >=3: even more info (see compute() in py/regions/RecordSensor.py)
            'verbosity' : 0,

            # Include the encoders we use
            'encoders': {
                u'neuron_id': None,
                u'value': {
                    'clipInput': True,
                    'fieldname': u'value',
                    'maxval': 25.0,
                    'minval': 0.0,
                    'resolution': 0.01,
                    #'n': 50,
                    'name': u'value',
                    'type': 'ScalarEncoder',
                    'w': 21},},

            # A dictionary specifying the period for automatically-generated
            # resets from a RecordSensor;
            #
            # None = disable automatically-generated resets (also disabled if
            # all of the specified values evaluate to 0).
            # Valid keys is the desired combination of the following:
            #   days, hours, minutes, seconds, milliseconds, microseconds, weeks
            #
            # Example for 1.5 days: sensorAutoReset = dict(days=1,hours=12),
            #
            # (value generated from SENSOR_AUTO_RESET)
            'sensorAutoReset' : None,
        },

        'spEnable': True,

Sample CSV file:

neuron_id,value
int,float
S,
53,0.004258
53,0.005851
.... other lines with 53 as neuron_id.....
53,0.021194
53,0.015249
7229,0.080382
7229,0.027950
.... other lines with 7229 as neuron_id.....
7229,0.368929
7229,0.162005
etc....

Do you see anything wrong in this?

Thanks!
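
For anyone wanting to reproduce the input, a file in the layout above could be generated with something like this. It is only a sketch: the three header rows (field names, field types, flags) and the ‘S’ flag on the sequence-ID column are copied from the sample, while the function name and the dict-of-lists input are assumptions of mine.

```python
import csv
import io


def write_sequences(out, sequences):
    """Write {sequence_id: [values]} using the three-header-row layout above."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["neuron_id", "value"])  # field names
    writer.writerow(["int", "float"])        # field types
    writer.writerow(["S", ""])               # 'S' flags the sequence-ID column
    for seq_id, values in sequences.items():
        for value in values:
            writer.writerow([seq_id, "%.6f" % value])


buf = io.StringIO()
write_sequences(buf, {53: [0.004258, 0.005851], 7229: [0.080382]})
print(buf.getvalue())
```

All rows of one sequence need to be contiguous, since the reset is triggered by the sequence ID changing from one row to the next.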

The only suspicious thing I see is that you are not specifying an n value for the value encoder. I’m not sure what the default n value is, but it affects the total width of the encoding expected by the SP, which is specified by inputWidth in the sp params.

If this is the only value, I would make it 400 bits and adjust the inputWidth in the sp params to be the same (whatever you make it).

But I thought I read in the documentation that you can only specify one of n, resolution or radius, and the 2 others are calculated. Here I’m specifying resolution, hence the commented-out n. So you think specifying n instead of resolution would be better, right?

Laurent

Here are figures from my thesis (page 83, section 7.2.4) comparing the performance of reset vs non-reset. Reset performed 5x better in my specific case. However, I do not use resets by default. This is not NuPIC of course, but it is still something.

Yes you are right. I wasn’t thinking.
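
For what it’s worth, the width implied by a given resolution can be estimated. This is only a back-of-the-envelope sketch, assuming a non-periodic scalar encoder needs about one bit per resolution-sized bucket plus w extra bits; the helper name is made up and the exact value NuPIC computes may differ by a bit or two.

```python
def approx_scalar_encoder_width(minval, maxval, resolution, w):
    """Rough estimate of total encoder bits: one per bucket, plus w."""
    buckets = int(round((maxval - minval) / resolution))
    return buckets + w


# Parameters taken from the OPF extract earlier in this thread.
print(approx_scalar_encoder_width(0.0, 25.0, 0.01, 21))
```

If that estimate is right, the encoding here would be far wider than 400 bits, so whichever of n or resolution is specified, the resulting width is worth checking against what the SP expects.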