How to measure NuPIC performance?

sheiser1 · October 15, 2016, 5:09pm

Hi all,

I’ve been running NuPIC on several datasets, rerunning each with different settings to see how they affect performance. For instance, there’s a 3-field traffic dataset on riverview, with ‘flow’, ‘speed’ and ‘occupancy’ values at each time step. I want to test how much benefit there is to including all 3 fields when predicting one of them, such as ‘flow’, so I ran it once with all 3 fields and again with just ‘flow’. I’m also curious what effect the timestamp value and the datetime encoding it produces have on performance, since the SDRs now contain this information rather than just the metric itself.

Given this goal of comparing performance, which performance measures would you recommend? I’ve tried a few very simple things: the average anomaly score, the correlation between predicted and observed metric values, and an ‘anomaly rate’, which is the proportion of all anomaly likelihood values that exceed 0.9. My thinking is that a model that has learned a given dataset better will be less surprised in general, and so will have fewer anomalies per time step. Any ideas are eagerly welcomed. Thanks!
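For reference, the three quick measures described above could be computed roughly like this. This is only a sketch: the function name `summarize_run` and its arguments are hypothetical, and it assumes the per-time-step anomaly scores, anomaly likelihoods, predictions and observed values have already been collected into plain arrays.

```python
import numpy as np

def summarize_run(anomaly_scores, anomaly_likelihoods, predicted, observed,
                  threshold=0.9):
    """Summarize one model run with three simple statistics.

    All inputs are equal-length sequences, one value per time step.
    `threshold` is the likelihood cutoff used for the 'anomaly rate'.
    """
    anomaly_scores = np.asarray(anomaly_scores, dtype=float)
    anomaly_likelihoods = np.asarray(anomaly_likelihoods, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return {
        # Average raw anomaly score over the run
        "mean_anomaly_score": float(np.mean(anomaly_scores)),
        # Fraction of time steps whose anomaly likelihood exceeds the cutoff
        "anomaly_rate": float(np.mean(anomaly_likelihoods > threshold)),
        # Pearson correlation between predicted and observed values
        "correlation": float(np.corrcoef(predicted, observed)[0, 1]),
    }
```

Running it once per configuration (all 3 fields vs. ‘flow’ only) gives directly comparable numbers, as long as both runs cover the same time steps.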

– Sam

sheiser1 · October 19, 2016, 3:07pm

Maybe a simple mean error is the way to go?
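One caveat with a plain (signed) mean error is that over- and under-predictions cancel out, so a mean absolute error is often the safer choice. A small sketch of the difference, with hypothetical helper names:

```python
import numpy as np

def mean_error(ys, yhat):
    """Signed mean error: positive and negative errors can cancel."""
    ys = np.asarray(ys, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    return float(np.mean(yhat - ys))

def mean_absolute_error(ys, yhat):
    """Mean absolute error: every miss counts, regardless of sign."""
    ys = np.asarray(ys, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    return float(np.mean(np.abs(yhat - ys)))

# Two predictions that are each off by 5, in opposite directions
actual = [10, 10]
predicted = [5, 15]
signed = mean_error(actual, predicted)             # errors cancel to zero
absolute = mean_absolute_error(actual, predicted)  # average miss of 5
```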

rhyolight · October 19, 2016, 3:10pm

I am hoping some other @committer will help answer this question.

cogmission · October 19, 2016, 5:19pm

@rhyolight,

I don’t have the reference at hand, but perhaps you could direct @sheiser1 to the CLAClassifier “lunch room” talk video where Subutai talks about the effect of including more or fewer fields in the inference input? I think that video may be relevant here.

rhyolight · October 19, 2016, 6:19pm

ycui · October 19, 2016, 9:28pm

It depends on which task you are interested in: anomaly detection or prediction. For prediction tasks, we typically use negative log-likelihood or mean absolute percent error (MAPE). You can look at this paper for details of these metrics.
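As a rough illustration of the two prediction metrics, here is a sketch using common textbook definitions; the paper may define them slightly differently (for example, MAPE as a ratio of sums). The function names and the input layout for `nll` (one row of predicted bin probabilities per time step, plus the index of the bin the actual value fell into) are assumptions for illustration, not NuPIC API.

```python
import numpy as np

def mape(ys, yhat):
    """Mean absolute percent error, per-point definition (ys must be nonzero)."""
    ys = np.asarray(ys, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    return float(np.mean(np.abs((ys - yhat) / ys)) * 100.0)

def nll(probs, actual_bins, eps=1e-12):
    """Average negative log-likelihood of the bins that actually occurred.

    probs:       array of shape (T, B), predicted probability of each bin
    actual_bins: length-T sequence of the bin index each actual value fell into
    eps:         floor applied to probabilities to avoid log(0)
    """
    probs = np.asarray(probs, dtype=float)
    actual_bins = np.asarray(actual_bins, dtype=int)
    # Probability the model assigned to the observed bin at each step
    p_actual = probs[np.arange(len(actual_bins)), actual_bins]
    return float(-np.mean(np.log(np.clip(p_actual, eps, 1.0))))
```

Lower is better for both. MAPE only looks at the single best prediction, while negative log-likelihood rewards a model for putting probability mass on the value that actually occurred.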

For anomaly detection tasks, you need to know the ground-truth anomaly labels to compute an anomaly score. The exact details regarding how we compute anomaly scores are documented in this paper.
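If ground-truth labels are available, a very simple way to score a detector is point-wise precision, recall and F1 on thresholded anomaly likelihoods. Note that this is a simplified stand-in and not the scoring scheme from the paper; the function name and the 0.9 threshold are assumptions.

```python
import numpy as np

def detection_scores(anomaly_likelihoods, labels, threshold=0.9):
    """Point-wise precision, recall and F1 against ground-truth labels.

    anomaly_likelihoods: one likelihood per time step
    labels:              one 0/1 (or bool) ground-truth flag per time step
    """
    flagged = np.asarray(anomaly_likelihoods, dtype=float) > threshold
    labels = np.asarray(labels, dtype=bool)
    tp = int(np.sum(flagged & labels))    # flagged and truly anomalous
    fp = int(np.sum(flagged & ~labels))   # flagged but normal
    fn = int(np.sum(~flagged & labels))   # anomalous but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

Point-wise scoring is strict: a detection one step after a labeled anomaly counts as both a miss and a false alarm, which is why window-based schemes are usually preferred for streaming data.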

Jonathan_Mackenzie · October 20, 2016, 12:11am

My current work is looking at this exact problem (traffic flow prediction). In my experiments comparing HTM and LSTM I’ve been using MAPE, RMSE and GEH. GEH is better when dealing with low values: a prediction of 5 when the real value is 1 is a large relative error, whereas when the flow is very high, predicting 200 vehicles when there are actually 220 isn’t that important. GEH should handle this better than MAPE and RMSE.
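For anyone unfamiliar with it, the GEH statistic is sqrt(2(M - C)^2 / (M + C)), where M is the modeled (predicted) flow and C is the counted (observed) flow, conventionally in vehicles per hour. A minimal sketch with hypothetical function names, using the two examples above:

```python
import numpy as np

def geh(ys, yhat):
    """Per-point GEH statistic: sqrt(2 * (M - C)^2 / (M + C)).

    ys:   observed (counted) flows C
    yhat: predicted (modeled) flows M
    Assumes M + C > 0 at every point.
    """
    ys = np.asarray(ys, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    return np.sqrt(2.0 * (yhat - ys) ** 2 / (yhat + ys))

def rmse(ys, yhat):
    """Root mean squared error."""
    ys = np.asarray(ys, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    return float(np.sqrt(np.mean((ys - yhat) ** 2)))

low = geh([1], [5])[0]       # real value 1, predicted 5    -> about 2.31
high = geh([220], [200])[0]  # real value 220, predicted 200 -> about 1.38
```

The relative errors for the same two cases are 400% and about 9%, so the low-flow miss dominates MAPE, while the two GEH values stay on a comparable scale.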

mraptor · October 25, 2016, 7:22pm

I wrote these shortcuts:

def mape(ys,yhat): …
def nll(ys,yhat,bins=1000): …
def rmse(ys,yhat) : …
def mae(ys,yhat) : …

github.com

vsraptor/bbhtm/blob/master/lib/stats.py

import numpy as np

class stats():

	@staticmethod
	def dev(xs):
		"""Calculate the deviations of a sample from its mean."""
		return xs - np.mean(xs)

	@staticmethod
	def cov(xs, ys, sample=True):
		"""
		Calculate the covariance between two data sets.
			sample=True : sample covariance
			sample=False: population covariance
		"""
		dec = 1 if sample else 0  # sample covariance divides by len - 1
		# sum of products of deviations
		return np.dot(stats.dev(xs), stats.dev(ys)) / (len(xs) - dec)

This file has been truncated.

You may also want to follow this discussion:

http://lists.numenta.org/pipermail/nupic-theory_lists.numenta.org/2016-April/003693.html

where Yuwei had the patience to explain NLL to me.

mraxilus · April 18, 2017, 3:52pm

How did your work go with regard to the LSTM/HTM comparison? I’m currently working with LSTMs for time series prediction, and would like to know more about using HTMs.

brev · April 5, 2019, 11:01pm

Does anyone happen to know what yhat stands for? I think ys is for yseries (actual data), and I think yhat is the prediction (comes from the world of linear regression?), but I can’t for the life of me figure out the “hat” part. Thanks.
