So I’ve been running NuPIC on several datasets, rerunning them with different settings to see what effect each has on performance. For instance, there’s a 3-field dataset of traffic data on riverview, with ‘flow’, ‘speed’ and ‘occupancy’ values at each time step. I want to test how much benefit there is to including all 3 fields when trying to predict one of them, like ‘flow’, so I ran it once with all 3 fields and again with just ‘flow’ itself. I’m also curious what effect the timestamp value and the datetime encoding it produces have on performance, since the SDRs then contain this information in addition to the metric itself.
Given this goal of comparing performance, what performance measure(s) would you recommend? I’ve done a couple of very simple things: looking at the average anomaly score, the correlation between predicted and observed metric values, and an ‘anomaly rate’, which is the proportion of all anomaly likelihood values that exceed 0.9. My thinking is that a model that has learned a given dataset better will be less surprised in general and thus have fewer anomalies per timestep. Any ideas are eagerly welcomed. Thanks!
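For concreteness, here is roughly how those three measures can be computed per run. This is only a minimal sketch in plain Python, not NuPIC code: the names `pearson` and `summarize_run` are made up for illustration, and it assumes each run's outputs have already been collected into equal-length lists, with every prediction lined up against the observed value it was predicting.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def summarize_run(predicted, observed, anomaly_scores,
                  anomaly_likelihoods, threshold=0.9):
    """Summarize one model run with the three measures described above.

    Assumes predicted[i] is the model's prediction for observed[i],
    i.e. the caller has already shifted predictions to the right timestep.
    """
    n_flagged = sum(1 for p in anomaly_likelihoods if p > threshold)
    return {
        "mean_anomaly_score": sum(anomaly_scores) / len(anomaly_scores),
        "pred_obs_correlation": pearson(predicted, observed),
        "anomaly_rate": n_flagged / len(anomaly_likelihoods),
    }
```

Calling `summarize_run` once for the 3-field run and once for the ‘flow’-only run gives two small dictionaries that can be compared side by side. Note that the comparison is strict, so a likelihood of exactly 0.9 does not count toward the anomaly rate.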