Difference between Actual and Prediction is high but anomaly score is low

Hi,
I am getting a low anomaly score even though the difference between the actual and predicted values is high.

Here is my setup.
I have generated sample data as shown below:

y = 0.6*x1 + 0.3*x2 + 0.1*x3
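
Roughly, the data is produced by a loop like the one below (a simplified sketch; the value ranges and time interval are illustrative placeholders, not my exact setup):

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Random;

    public class SampleDataGenerator {
        public static void main(String[] args) {
            SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yy HH:mm:ss");
            Random rnd = new Random(42);
            long t = System.currentTimeMillis();
            System.out.println("timestamp,y,x1,x2,x3");
            for (int i = 0; i < 1000; i++) {
                double x1 = rnd.nextInt(100);   // value ranges are illustrative
                double x2 = rnd.nextInt(100);
                double x3 = rnd.nextInt(100);
                double y = 0.6 * x1 + 0.3 * x2 + 0.1 * x3;   // the formula above
                System.out.printf("%s,%.1f,%.0f,%.0f,%.0f%n",
                        fmt.format(new Date(t)), y, x1, x2, x3);
                t += 60_000L;                   // one record per minute
            }
        }
    }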

I have created a MultiEncoder for the below fields:

timestamp
y
x1
x2
x3
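
The field encodings are wired up roughly like this (a sketch only; the key names follow the htm.java field-encoding-map convention as I understand it, the n/w/min/max values are placeholders, and p is the Parameters object used in the Network below):

    import java.util.HashMap;
    import java.util.Map;

    import org.numenta.nupic.Parameters.KEY;
    import org.numenta.nupic.encoders.MultiEncoder;

    // Sketch of the field encoding map: one inner map per field, then handed
    // to the Parameters object. Key names and values here are illustrative.
    Map<String, Map<String, Object>> fieldEncodings = new HashMap<>();

    Map<String, Object> yEncoding = new HashMap<>();
    yEncoding.put("fieldName", "y");
    yEncoding.put("fieldType", "float");
    yEncoding.put("encoderType", "ScalarEncoder");
    yEncoding.put("n", 50);
    yEncoding.put("w", 21);
    yEncoding.put("minVal", 0.0);
    yEncoding.put("maxVal", 120.0);
    fieldEncodings.put("y", yEncoding);
    // ... x1, x2, x3 get similar ScalarEncoder entries, and "timestamp" a DateEncoder entry

    p.set(KEY.FIELD_ENCODING_MAP, fieldEncodings);   // setParameterByKey(...) on older htm.java versions
    MultiEncoder me = MultiEncoder.builder().name("").build();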

I created a publisher and subscriber and assigned them to the Network object as shown below:

    Network nt = Network.create(netWorkName, p)
        .add(Network.createRegion("Region 1")
            .add(Network.createLayer("Layer 1/3", p)
                .alterParameter(KEY.AUTO_CLASSIFY, Boolean.TRUE)
                .add(Anomaly.create())
                .add(new TemporalMemory())
                .add(new SpatialPooler())
                .add(me)    // the MultiEncoder
                .add(Sensor.create(ObservableSensor::create,
                        SensorParams.create(Keys::obs, "", manual)))));
    nt.observe().subscribe(getSensorSubscriber());
    nt.start();

I passed values in the below format:

timestamp,y,x1,x2,x3

Sample value:

06/29/17 23:34:36,41.7,60,13,18

Here is the graphical output of actual vs. predicted values and anomaly scores:

When we pass multiple values, is the predicted value based on all of the values (y, x1, x2, x3) or on a single value (y)?
Another problem I am facing is that I am always getting an anomaly probability of 0.5.

    Date date = new Date(Long.parseLong(this.sensorMessage.getDateTime()));
    DateTime dateTime = new DateTime(date);
    AnomalyLikelihood anomalyLikelihood = new AnomalyLikelihood(false, 0, false, 200, 200);
    double anomalyProbability = anomalyLikelihood.anomalyProbability(
            Double.parseDouble(message.getRawValues()[0]),
            response.getAnomalyScore(), dateTime);

Thanks & Regards
MH

As I understand it the predicted value is based on just one of the 4 fields, which would be specified as the ‘predicted field’ (in the OPF at least). The anomaly score, however, is based on the whole system: all four values are essentially concatenated into one sparse vector. So it could be that while one value is acting unexpectedly (a high difference between predicted and actual for that one value), the temporal dynamics of the system overall are acting more predictably.

As for the anomaly likelihood, that is calculated by taking the current anomaly score minus the average anomaly score over a certain window of time steps, then dividing by the standard deviation of the anomaly scores over that window, basically like a z-score from statistics. I find that it takes some time for the likelihood to move off 0.5, I think depending on the window size. In my cases it takes around 500 time steps before it changes, though it's not always the same, so I'm not exactly sure where that variation comes from. I hope that helps somewhat; I know others can tell you more.
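
Very roughly, the idea is something like the sketch below. This is just the z-score intuition, not the actual AnomalyLikelihood implementation (which, as I understand it, also fits a distribution to the recent scores and does extra smoothing):

    import java.util.ArrayDeque;
    import java.util.Deque;

    /** Schematic of the z-score idea behind anomaly likelihood (not the real implementation). */
    public class RollingZScore {
        private final int window;
        private final Deque<Double> scores = new ArrayDeque<>();

        public RollingZScore(int window) { this.window = window; }

        /** Returns how many standard deviations the current score is above the recent mean. */
        public double update(double anomalyScore) {
            scores.addLast(anomalyScore);
            if (scores.size() > window) scores.removeFirst();

            double mean = scores.stream().mapToDouble(d -> d).average().orElse(0.0);
            double var = scores.stream().mapToDouble(d -> (d - mean) * (d - mean)).average().orElse(0.0);
            double std = Math.sqrt(var);
            if (std == 0.0) return 0.0;          // not enough variation yet
            return (anomalyScore - mean) / std;  // the real code maps this onto a probability
        }
    }

While the window has little history or variation, the z-score sits around 0, and a z-score of 0 maps to a probability of about 0.5, which would explain why the likelihood hovers there early on.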


Thanks sheiser1, you are correct, it is showing the predicted value for the second field (x1); this was due to a wrong configuration.

  1. If I pass 5 values, say (time, y, x1, x2, x3), does the HTM model recognize that y is a function of (0.6*x1 + 0.3*x2 + 0.1*x3)?

  2. Is the AnomalyLikelihood configuration below good for y?

    AnomalyLikelihood anomalyLikelihood = new AnomalyLikelihood(false, 0, false, 200, 200);
    double anomalyProbability = anomalyLikelihood.anomalyProbability(
            Double.parseDouble(message.getRawValues()[0]),
            response.getAnomalyScore(), dateTime);

Thanks & Regards
MH

No problem @wip_user, I’ll do my best to clarify as my knowledge allows. I can tell you for #1 that when you feed those 5 fields into NuPIC it won’t see one as a function of the others; rather, it encodes all of them into one vector (a Sparse Distributed Representation). So it has one SDR encoding containing all 5 values [time, y, x1, x2, x3] at time ‘t’, and it sees that SDR as a function of all previous [time, y, x1, x2, x3] SDRs at all prior times (‘t-1’, ‘t-2’, ‘t-3’, … back to the first time step).
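
Schematically, the per-field encodings are just laid end to end into one input vector before the Spatial Pooler sees it; here’s a toy illustration (not the actual htm.java encoder code):

    // Toy illustration of multi-field encoding: each field's bits are concatenated
    // into one input vector. Real encoders decide which bits to set; here the
    // contents are dummies and the widths are arbitrary.
    int[] timeBits = new int[50];   // e.g. DateEncoder output
    int[] yBits    = new int[50];   // ScalarEncoder output for y
    int[] x1Bits   = new int[50];
    int[] x2Bits   = new int[50];
    int[] x3Bits   = new int[50];

    int[] input = new int[timeBits.length + yBits.length + x1Bits.length
            + x2Bits.length + x3Bits.length];
    int pos = 0;
    for (int[] field : new int[][] { timeBits, yBits, x1Bits, x2Bits, x3Bits }) {
        System.arraycopy(field, 0, input, pos, field.length);
        pos += field.length;
    }
    // 'input' is the single vector the Spatial Pooler / Temporal Memory operate on,
    // so the anomaly score reflects all five fields together.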

Unlike a lot of other machine learning frameworks, NuPIC isn’t set up to learn the relationship between independent variables (time, x1, x2, x3) and a dependent variable (‘y’); rather, it learns the temporal patterns of all the variables involved over time. This can still be used to predict a ‘y’ value given any of the other variables, though it does so in a fundamentally different way than traditional regression or ‘neural network’ approaches.
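
For example, in htm.java I believe you can pull the per-field prediction off the same inference stream that carries the anomaly score, roughly like this with your nt network from above (method names are as I understand the htm.java API, so double-check them against your version):

    // Sketch: reading the anomaly score and the predicted value for the "y" field
    // from the network's inference stream.
    nt.observe().subscribe(inference -> {
        double anomalyScore = inference.getAnomalyScore();
        Object predictedY = inference.getClassification("y").getMostProbableValue(1); // 1 step ahead
        System.out.println("anomaly=" + anomalyScore + ", predicted y=" + predictedY);
    });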

As for your second question I personally don’t know enough about the guts of the code to tell you for sure, though I bet some real Numenta brains may be of help. I’m talking about you @rhyolight, @scott, @Austin_Marshall among many others (if I may shout you guys out like that). They’ve all helped me a TON with stuff like this, and though they’re surely quite busy I’m sure one will lend an eye.

Thanks sheiser1 for the clarification. Suppose I want to get predictions for the next n steps, do I need to pass the x1, x2, x3 values?
Take two cases; assume that I pass the data:
1) at regular intervals
2) at irregular intervals

Second issue: I made a mistake by instantiating new AnomalyLikelihood(false, 0, false, 200, 200) at every data point instead of instantiating it once at the start. Now I am getting the right values.
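
The corrected structure looks roughly like this (SensorMessage and onRecord are just stand-ins for my own message-handling code):

    // The AnomalyLikelihood instance is created once and reused for every record,
    // so it can accumulate the history it needs. Constructor args are the same as above.
    private final AnomalyLikelihood anomalyLikelihood =
            new AnomalyLikelihood(false, 0, false, 200, 200);

    private void onRecord(SensorMessage message, Inference response) {
        DateTime dateTime = new DateTime(Long.parseLong(message.getDateTime()));
        double anomalyProbability = anomalyLikelihood.anomalyProbability(
                Double.parseDouble(message.getRawValues()[0]),
                response.getAnomalyScore(), dateTime);
        // ... use anomalyProbability ...
    }

Keeping a single instance means its internal history (and hence the likelihood estimate) is no longer reset on every record.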

First off I’m glad to hear you got the AnomalyLikelihood issue worked out! Nice work.

As for getting more steps out into the future, you need to go into the model params file and, under ‘clParams’, adjust ‘steps’ from ‘1’ to ‘1,2,…,n’. This will give you predictions for 1 step, 2 steps, … and n steps out. If you just want n steps out then just replace ‘1’ with ‘n’. In order to write these predictions out to csv, you’ll also have to go into the runIoThroughNupic() function in the run.py file and change the ‘prediction = ’ line from:

    prediction = result.inferences["multiStepBestPredictions"][1]

to:

    prediction = result.inferences["multiStepBestPredictions"][n]

@rhyolight outlines this all nicely in this post for further reference. Let me know how it goes.

Thanks sheiser1 for the valuable responses. I didn’t find a similar example in Java.
How do I configure multi-step prediction in Java? I have to do this prediction on streaming data.

You’re certainly welcome, and I’m glad to hear things are moving forward :slightly_smiling_face:
