How to measure NuPIC performance?



Hi all,

So I’ve been running NuPIC on several datasets, rerunning them with different settings to see how performance changes. For instance, there’s a 3-field traffic dataset on riverview, with ‘flow’, ‘speed’ and ‘occupancy’ values at each time step. I want to test how much benefit there is to including all 3 fields when trying to predict one of them, like ‘flow’, so I ran it once with all 3 fields and again with just ‘flow’ itself. I’m also curious what effect the timestamp value and the datetime encoding it produces have on performance, since the SDRs now contain this information instead of just the metric itself.

Given this goal of comparing performance, what metric(s) would you recommend? I’ve done a couple of very simple things: looking at the average anomaly score, the correlation between predicted and observed metric values, and an ‘anomaly rate’, i.e. the proportion of all anomaly likelihood values that exceed 0.9. My thinking is that a model that has learned a given dataset better will be less surprised in general and thus have fewer anomalies per timestep. Any ideas are eagerly welcomed. Thanks!
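
For reference, the ‘anomaly rate’ and the predicted-vs-observed correlation described above could be computed roughly like this. This is only a sketch: the function names are made up for illustration, and it assumes the anomaly likelihoods and predictions have already been collected into plain Python lists (it does not call NuPIC itself).

```python
import math


def anomaly_rate(likelihoods, threshold=0.9):
    """Fraction of time steps whose anomaly likelihood exceeds `threshold`.

    `likelihoods` is assumed to be a list of floats in [0, 1],
    one per time step, collected from the model's output.
    """
    if not likelihoods:
        raise ValueError("likelihoods must be non-empty")
    flagged = sum(1 for p in likelihoods if p > threshold)
    return flagged / len(likelihoods)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

To compare the 3-field run against the ‘flow’-only run, both functions would be applied to each run’s output over the same time steps.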

– Sam


Maybe a simple mean error is the way to go?


I am hoping some other @committer will help answer this question.



I don’t have the reference at hand, but perhaps you could direct @sheiser1 to the CLAClassifier “lunch room” talk video where Subutai talks about the effect of including more or fewer fields in the inference input? I think that video may be relevant here.



It depends on which task you are interested in: anomaly detection or prediction. For prediction tasks, we typically use negative log-likelihood or mean absolute percentage error (MAPE). You can look at this paper for details of these metrics.
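
As a rough illustration of the two prediction metrics, here is a minimal sketch. The names, signatures and binning scheme are assumptions made for this example, not the definitions from the paper, which should be checked for the exact form used there. Two MAPE variants are shown because both the per-step mean and the aggregate ratio are in common use.

```python
import bisect
import math


def mape(ys, yhat):
    """Per-step form: mean of |y - yhat| / |y|, in percent.

    Steps where the observed value is 0 are skipped (an assumption;
    the ratio is undefined there).
    """
    terms = [abs(y - p) / abs(y) for y, p in zip(ys, yhat) if y != 0]
    if not terms:
        raise ValueError("no non-zero observations")
    return 100.0 * sum(terms) / len(terms)


def mape_aggregate(ys, yhat):
    """Aggregate form: sum of absolute errors over sum of |y|, in percent."""
    denom = sum(abs(y) for y in ys)
    if denom == 0:
        raise ValueError("observations sum to zero")
    return 100.0 * sum(abs(y - p) for y, p in zip(ys, yhat)) / denom


def nll(ys, probs, edges, eps=1e-12):
    """Mean negative log-likelihood of the observed values.

    Assumes probs[t][k] is the predicted probability that ys[t] falls
    in bin k, and `edges` is an ascending list of len(probs[t]) + 1
    bin boundaries. Out-of-range values are clamped to the end bins.
    """
    total = 0.0
    for y, p in zip(ys, probs):
        k = bisect.bisect_right(edges, y) - 1
        k = min(max(k, 0), len(p) - 1)
        total -= math.log(max(p[k], eps))  # eps guards against log(0)
    return total / len(ys)
```

Lower is better for all three. NLL needs the model’s full predicted distribution over bins rather than a single predicted value, which is why its inputs differ from the MAPE functions.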

For anomaly detection tasks, you need to know the ground-truth anomaly labels to compute an anomaly score. The exact details regarding how we compute anomaly scores are documented in this paper.


My current work looks at this exact problem (traffic flow prediction). In my experiments comparing HTM and LSTM I’ve been using MAPE, RMSE and GEH. GEH is better when dealing with low values: a prediction of 5 when the real value is 1 is a large relative error, whereas at very high flow, predicting 200 vehicles when there are actually 220 isn’t that important. GEH should handle this better than MAPE and RMSE.
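
For anyone unfamiliar with GEH, it is usually written as sqrt(2 (M - C)^2 / (M + C)), where M is the modelled (predicted) flow and C is the observed count. Below is a small sketch of GEH next to RMSE, using the two examples from the post above. The function names are illustrative only, and returning 0 when both values are 0 is a convention chosen for this sketch.

```python
import math


def geh(predicted, observed):
    """GEH statistic for one pair of traffic volumes."""
    total = predicted + observed
    if total == 0:
        return 0.0  # both flows are zero: treated as a perfect match here
    return math.sqrt(2.0 * (predicted - observed) ** 2 / total)


def mean_geh(ys, yhat):
    """Average GEH over a series of observed (ys) and predicted (yhat) flows."""
    if not ys or len(ys) != len(yhat):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(geh(p, y) for y, p in zip(ys, yhat)) / len(ys)


def rmse(ys, yhat):
    """Root mean squared error."""
    if not ys or len(ys) != len(yhat):
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, yhat)) / len(ys))


# The two cases from the post: 5 predicted vs 1 observed,
# and 200 predicted vs 220 observed.
low = geh(5, 1)        # about 2.31 (percent error: 400%)
high = geh(200, 220)   # about 1.38 (percent error: about 9%)
```

The percent errors of the two cases differ by a factor of about 44, while the GEH values are within a factor of two of each other, which is the behaviour described above.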


I wrote these shortcuts.

def mape(ys,yhat): …
def nll(ys,yhat,bins=1000): …
def rmse(ys,yhat) : …
def mae(ys,yhat) : …

Also, you may want to follow this discussion:

where Yuwei had the patience to explain NLL to me :wink:


How did your work go with regard to the LSTM/HTM comparison? I’m currently working with LSTMs for time series prediction, and would like to know more about using HTMs.