Scenario
So I have 20 saved NuPIC models, each trained on the data generated by an individual subject playing a simple game. Each of these 20 ‘base models’ is then applied to predicting the game-playing behavior of an unknown player. As a result I have two columns of data for each base model: 1) the action predicted by the base model; 2) the action actually observed from the player.
Question
What I’m wondering is: what are some good ways to evaluate the performance of each base model, to decide which one (if any) best matches the incoming data? I’d appreciate any thoughts!
I would love to see a sample of the data each model is getting; that would help characterize the problem. What are the data types? What encoders are you using?
But I do have an answer to your question, or at least what I would do… though you may need to retrain your models. You say they are prediction models, so they are probably TemporalMultiStep models (assuming you are using the OPF)? If these models were TemporalAnomaly models, they would produce an extra output called an “anomaly score”, which indicates how anomalous the current data is compared to the patterns the model has learned over time.
The problem is, I don’t think you can change the type of an OPF model once it has been created, so you might need to retrain all these models as TemporalAnomaly models. Here’s an example project (including video tutorial) that does this exact thing and plots it over time.
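In case it helps, here’s a rough sketch of what that retraining step could look like under the OPF. MODEL_PARAMS, the field name 'action', and subject_training_rows are placeholders for your own parameters and data, not something from your project:

```python
# Rough sketch of retraining one base model as a TemporalAnomaly model (OPF).
# MODEL_PARAMS, the 'action' field name, and subject_training_rows are
# placeholders for your own setup.
from nupic.frameworks.opf.modelfactory import ModelFactory  # model_factory in newer NuPIC

MODEL_PARAMS['modelParams']['inferenceType'] = 'TemporalAnomaly'

model = ModelFactory.create(MODEL_PARAMS)
model.enableInference({'predictedField': 'action'})

for row in subject_training_rows:           # one subject's training data
    model.run({'action': row['action']})    # learning is on by default

model.save('model_subject_10')              # checkpoint directory for later reuse
```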
Once you have 20 anomaly detection models, pass the data stream into each of the 20 models. I highly suggest using the AnomalyLikelihood module to post-process the anomaly score. You can see an example usage of this here:
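Roughly, the usage pattern is something like this (the `model` and `observed_value` names are placeholders, and the 'action' field name is just an assumption about your data):

```python
# Minimal sketch of post-processing a raw anomaly score with AnomalyLikelihood.
# `model` is an already-created TemporalAnomaly model and `observed_value` is
# one value from the incoming stream; both are placeholders here.
from datetime import datetime
from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood

likelihood_helper = AnomalyLikelihood()

result = model.run({'action': observed_value})
raw_score = result.inferences['anomalyScore']
likelihood = likelihood_helper.anomalyProbability(
    observed_value, raw_score, datetime.now())
```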
And here’s a video about how the anomaly likelihood algorithm works and why it exists:
Anyway… the model with the lowest anomaly likelihood is the classification of the data stream.
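Putting those pieces together, a hedged sketch of the whole classification step might look like the following. The checkpoint directories, the field name, and the `base_model_dirs` / `unknown_stream` variables are all assumptions on my part:

```python
# Hedged sketch: stream the unknown subject's actions through each saved base
# model and classify by the lowest average anomaly likelihood.
from datetime import datetime
from nupic.frameworks.opf.modelfactory import ModelFactory
from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood

def mean_anomaly_likelihood(checkpoint_dir, observed_actions):
    model = ModelFactory.loadFromCheckpoint(checkpoint_dir)
    model.disableLearning()              # score only; don't adapt to the new subject
    helper = AnomalyLikelihood()
    likelihoods = []
    for value in observed_actions:
        result = model.run({'action': value})
        raw_score = result.inferences['anomalyScore']
        likelihoods.append(helper.anomalyProbability(value, raw_score, datetime.now()))
    return sum(likelihoods) / len(likelihoods)

# base_model_dirs: {subject_id: checkpoint directory} for the 20 saved models
# unknown_stream: list of observed action values -- both placeholders
scores = {subject: mean_anomaly_likelihood(path, unknown_stream)
          for subject, path in base_model_dirs.items()}
best_match = min(scores, key=scores.get)
```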
It had occurred to me that the anomaly score could be a great fit for this question. The data are small float values and I’m using the simplest scalar encoder (declaring min and max values). I’ll attach one example of my data as well (the model trained on subject 10 predicting the behavior of subject 11).
What those around me had recommended for rating the models (not knowing about NuPIC) was some sort of correlation test between the TemporalMultiStep prediction output column and the observed behavior column, though I knew HTM and NuPIC must be smarter than that! And of course they were, and I really appreciate and respect your being the messenger.
I’ll rerun the data through NuPIC, this time getting TemporalAnomaly output instead, since it gives the anomaly score associated with each prediction. Is this equivalent to the proportion of columns that burst at that time step? I want to be sure I can explain exactly what the values I’m basing my conclusions on actually mean, of course! Then I’ll follow the code you posted to get the AnomalyLikelihood, choosing the model with the lowest AnomalyLikelihood as the matching model. Do I have this right?
There’s one more twist to this I hadn’t mentioned: the unknown data stream may be coming from a subject that we have no model for (an impostor, in other words). Since it’s not known whether the unknown subject belongs to any of the models, we want a way of deciding that this user doesn’t match any of the learned ‘base’ models. Would you have any thoughts on how to determine that there is no match? Do you think that in these ‘impostor’ cases the AnomalyLikelihoods would be much higher than they would be in the standard match cases? Do you think there may be an AnomalyLikelihood threshold above which there is no match? You certainly don’t need to answer all these! Just trying to tease out any ideas you (and everyone) may have.
Alright, I pasted the first 100 points into a GitHub gist on my account (gotham29), in a file called SamSampleData. Do I need to share this with you specifically or do anything else? I made it public hoping you could see it that way, but I’m new to this. My apologies for being such a novice, and thanks again!!
Yes. [quote=“sheiser1, post:3, topic:1003”]
Would you have any thoughts on how to determine that there is no match? Do you think that in these ‘impostor’ cases the AnomalyLikelihoods would be much higher than they would be in the standard match cases? Do you think there may be an AnomalyLikelihood threshold above which there is no match?
[/quote]
I think you’re on the right track, but you’ll have to do some experiments to define some thresholds. For example, you could make up a rule like “if all models emit an anomaly likelihood higher than 0.9, the subject is unknown”. But I know from experience that you’re going to need to figure this out for this specific problem; I don’t have any general advice to give you. I do have an example in my nupic.critic plotting code that will only highlight anomalies on the output chart if a certain number of the models have anomaly likelihoods above a certain threshold. You might do something similar.
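Purely as an illustration of that kind of rule (the 0.9 threshold and the `scores` dict of mean anomaly likelihoods per model, as in the earlier sketch, are assumptions you would need to compute and tune yourself):

```python
# One possible "no match" rule: if every base model looks highly anomalous on
# the unknown stream, call the subject unknown.  The threshold is a guess to tune.
UNKNOWN_THRESHOLD = 0.9

if all(score > UNKNOWN_THRESHOLD for score in scores.values()):
    match = None                           # impostor / unknown subject
else:
    match = min(scores, key=scores.get)    # best-matching base model
```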
That’s an output file: the “Observed” column is the input stream (representing the actions of the unknown user), and the “Model Prediction” column is what the given base model predicted the “Observed” column would be. In this case, the actions of subject 11 compose the “Observed” column, and the predictions made by the model trained on subject 10 compose the “Model Prediction” column; in other words, what subject 10 would have done in that situation.