Hi All,
Scenario
So I’m training separate NuPIC TemporalAnomaly models, each on its own data: there are 7 subjects with 7 distinct data sets, used to train 7 distinct models. I save each model and then run it against every data set, so model 1 was trained on subject 1’s data and then run against subject 1’s data as well as subject 2’s through subject 7’s. For each data set (1-7) run against model 1, I record the average anomaly score (and anomaly likelihood) to get a sense of which of the 7 data sets model 1 was least surprised by.
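For concreteness, here’s a simplified sketch of my train/cross-evaluate loop. MODEL_PARAMS (imported from a hypothetical model_params module), the field name ‘Response’, and the file names are stand-ins for my actual setup, and I’ve left out error handling:

```python
import csv
import os
from nupic.frameworks.opf.modelfactory import ModelFactory  # newer NuPIC: nupic.frameworks.opf.model_factory
from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood
from model_params import MODEL_PARAMS  # hypothetical module holding my TemporalAnomaly params

def run_subject(model, csv_path):
    """Feed one subject's CSV through a model; return mean anomaly score and likelihood."""
    likelihood_helper = AnomalyLikelihood()
    scores, likelihoods = [], []
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            value = float(row["Response"])
            result = model.run({"Response": value})
            score = result.inferences["anomalyScore"]
            scores.append(score)
            likelihoods.append(likelihood_helper.anomalyProbability(value, score))
    return sum(scores) / len(scores), sum(likelihoods) / len(likelihoods)

subject_files = ["subject_%d.csv" % i for i in range(1, 8)]

for i, train_file in enumerate(subject_files, 1):
    # Train model i on subject i's data, then checkpoint it to disk.
    model = ModelFactory.create(MODEL_PARAMS)
    model.enableInference({"predictedField": "Response"})
    run_subject(model, train_file)
    checkpoint_dir = os.path.abspath("model_%d" % i)
    model.save(checkpoint_dir)

    # Reload the saved model fresh for each test set and record the averages.
    for j, test_file in enumerate(subject_files, 1):
        saved = ModelFactory.loadFromCheckpoint(checkpoint_dir)
        avg_score, avg_like = run_subject(saved, test_file)
        print("model %d vs subject %d: score=%.3f, likelihood=%.3f" % (i, j, avg_score, avg_like))
```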
My Intuition/Curiosity
Intuitively I would expect the average anomaly score to be lowest when model 1 is run against subject 1’s data, which is the same data it was trained on, though this actually isn’t the case. The model has a lower average anomaly score when run against subject 4’s data instead.
This isn’t totally shocking to me, though I wonder if there are more potential reasons for this than I realize. For one, the data is extremely noisy. I’m using the simple scalar encoder on real-valued inputs that move pretty chaotically. The N and W for the metric (called ‘Response’) are 275 and 21 respectively, and 115 and 21 for the classifier encoding.
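In case it helps, the encoders section of my model params looks roughly like this; the minval/maxval values are placeholders rather than the real range of ‘Response’, and the second, smaller encoding is the classifier-only input:

```python
# Rough shape of the encoders section of my model params.
# minval/maxval below are placeholders, not the actual range of 'Response'.
ENCODERS = {
    "Response": {
        "fieldname": "Response",
        "name": "Response",
        "type": "ScalarEncoder",
        "n": 275,
        "w": 21,
        "minval": 0.0,    # placeholder
        "maxval": 100.0,  # placeholder
        "clipInput": True,
    },
    "_classifierInput": {
        "fieldname": "Response",
        "name": "_classifierInput",
        "type": "ScalarEncoder",
        "classifierOnly": True,
        "n": 115,
        "w": 21,
        "minval": 0.0,    # placeholder
        "maxval": 100.0,  # placeholder
        "clipInput": True,
    },
}
```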
I figure all this noise causes a lot of synapses with low permanence values to form, since many different sequences occur that likely don’t repeat with much regularity, or even at all. With this big tangled mess of transitions learned, it seems reasonable not to expect the system to remember exactly what happened in time steps 1-10 when it has just finished step 2000 with a ton of noise all along the way.
I imagine it like a sequence of letters that begins, say, ‘ABC…’ but over the next 2000 steps goes something like ‘A%^*B–#@C…’. By the time the model has finished learning all those largely noisy sequences, you couldn’t expect it to predict a clean ‘ABC’ once the data is fed in again from the beginning.
The Data
I’m including a paste with the data from the first few time steps here. Column 1, marked ‘subject_1’, is the metric (what was originally fed in to train the model). Column 2, marked ‘subject_1_prediction’, is what the saved model predicted for each ‘subject_1’ value. Columns 3 and 4 are the anomaly score and anomaly likelihood.
http://pastebin.com/fcHStUJP
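The averages I quoted come straight from those columns, basically like this; the score/likelihood header names and the output file name here are placeholders, since I don’t show the exact headers in the paste:

```python
import csv

def summarize(csv_path, score_col="anomaly_score", likelihood_col="anomaly_likelihood"):
    """Mean anomaly score and likelihood for one model/data-set pairing."""
    scores, likelihoods = [], []
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            scores.append(float(row[score_col]))
            likelihoods.append(float(row[likelihood_col]))
    return sum(scores) / len(scores), sum(likelihoods) / len(likelihoods)

print(summarize("model_1_vs_subject_4.csv"))  # hypothetical output file name
```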
My Questions (finally):
– Does my theory about this make sense to you?
– Are there other aspects I’m not thinking about here?
– Might there be a more effective way to summarize how well a saved model matches a new data set than simply the average anomaly score or likelihood?
I really want to have as full an intuition as possible about how NuPIC learns, and how the resulting models are affected by the data and its noise level. I eagerly welcome anyone’s take on any of this. Thanks,
– Sam