Is my data being predicted correctly?

Okay, I ran my data and it looks pretty nice.



I modified Setus’s template for multiple fields and put this in (along with the import of the anomaly_likelihood module):

anomalyLikelihood = anomaly_likelihood.AnomalyLikelihood()
likelihood = anomalyLikelihood.anomalyProbability(results, anomaly_score, 0)

It gives me a likelihood of 0.5 for the entire data set, so I assume I did something wrong when assigning these values. I tried other values, though, and it still gave me that likelihood, so I’m stuck.
I just remembered something interesting: when I was trying out the hotgym anomaly demo with my data, I would always get an output of 0.5 in the graph below. It looks like the same thing is happening here. In the hotgym demo, though, I didn’t really modify the code much; I just ran it on my data.


I can see from your code that you’re passing the wrong arguments. I’ve used the likelihood successfully, and although the first 400-ish values were indeed 0.5 (the helper needs some history before it can produce meaningful likelihoods), all the rest were other than 0.5.
Insert these lines into my template code in process_input.py at the corresponding line numbers:

31  | from nupic.algorithms import anomaly_likelihood
44  | anomaly_likelihood_helper = anomaly_likelihood.AnomalyLikelihood()
218 | outputRow = [row[0], row[predicted_field_row], "prediction", "anomaly score", "anomaly likelihood"]
251 | anomaly_likelihood_score = anomaly_likelihood_helper.anomalyProbability(original_value, anomaly_score, time_index)
254 | outputRow = [time_index, original_value, "%0.2f" % inference, anomaly_score, anomaly_likelihood_score]
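To make the argument order concrete, here is a minimal sketch of the per-row flow (the file name, column layout, and run_model call are hypothetical placeholders for your own code): you pass the raw metric value, the model’s anomaly score, and the row’s time index, not the results object and a constant.

import csv

from nupic.algorithms import anomaly_likelihood

# Create the helper once, outside the loop; it accumulates the history
# it needs to turn raw anomaly scores into likelihoods.
anomaly_likelihood_helper = anomaly_likelihood.AnomalyLikelihood()

with open("my_data.csv") as f:                       # hypothetical input file
    reader = csv.reader(f)
    next(reader)                                     # skip the header row
    for time_index, row in enumerate(reader):
        original_value = float(row[1])               # the raw metric value
        anomaly_score = run_model(row)               # hypothetical: your CLA model call
        likelihood = anomaly_likelihood_helper.anomalyProbability(
            original_value, anomaly_score, time_index)
        print("%s,%s,%s" % (original_value, anomaly_score, likelihood))

Because the helper is stateful, it is also expected that the first few hundred likelihood values come out as 0.5 while it builds up history.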

Oh, that’s the right way to do it, thank you!

I ran a large swarm over my data; it took 9+ hours to run, haha.

@Addonis: Unsolicited comment :slight_smile: We are in a similar boat (trying to get NuPIC running with optimal settings). I am considering just using NAB as a proxy with my own dataset. After all, it’s the key benchmark, so any optimal settings would make their way into it. Link here: https://github.com/numenta/NAB

@Setus, what’s line 218 for in your “Insert these lines” code block?

@vkruglikov Nothing more than printing out the header line (the first line) of the output file, describing which value is located in which column.
[row[0], row[predicted_field_row], "prediction", "anomaly score", "anomaly likelihood"] will stand for

your_time_index_name, your_predicted_metric_name, prediction, anomaly score, anomaly likelihood

so for example

date, EKG, prediction, anomaly score, anomaly likelihood
23.03.2016, 3045.6, 3030.3, 0.8, 0.4

The point of that line was to update the header from the previous header consisting of

your_time_index_name, your_predicted_metric_name, prediction, anomaly score

to

your_time_index_name, your_predicted_metric_name, prediction, anomaly score, anomaly likelihood
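In code, writing the extended header and one data row could look like this sketch (the output file name is hypothetical; the values are the ones from the example above):

import csv

with open("output.csv", "wb") as f:   # "wb" for Python 2's csv module
    writer = csv.writer(f)
    # The old four columns plus the new "anomaly likelihood" column:
    writer.writerow(["date", "EKG", "prediction", "anomaly score", "anomaly likelihood"])
    writer.writerow(["23.03.2016", 3045.6, 3030.3, 0.8, 0.4])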

For some reason I get an error saying columnCount and inputWidth need to be above zero when I try to use a RandomDistributedScalarEncoder in your template.
I just change

{'encoders': {u'Vals': {'clipInput': True,
                        'fieldname': 'Vals',
                        'maxval': 952,
                        'minval': 337,
                        'n': 222,
                        'name': 'Vals',
                        'type': 'ScalarEncoder',
                        'w': 211}},

to

{'encoders': {u'Vals': {'clipInput': True,
                        'classifierOnly': True,
                        'fieldname': 'Vals',
                        'name': '_classifierInput',
                        'resolution': 100,
                        'seed': 42,
                        'type': 'RandomDistributedScalarEncoder'}},

but I get those errors as I mentioned above. I didn’t get errors when I did this in the one_gym demo, though.
I guess it won’t work like this because one_gym uses different code around its encoder, right? I was also wondering if you have a GitHub repo with your code.

Hi, sorry for the late answer :slight_smile:
I’ve personally never experienced such an error before, so I don’t really know what the ‘columnCount and inputWidth above zero’ thing is about. However, I know that the “_classifierInput” encoder is different from the normal, non-classifier encoder (i.e., when name is _classifierInput and classifierOnly is true). I’ve noticed that the generated encoder parameters for the classifier input are always bigger than for normal encoders, and they seem to affect the results of the CLA quite profoundly (negatively, for me at least). I couldn’t find much information about it, so I decided to ask about it and other things in my post here, although no one has dared to answer so far :stuck_out_tongue:
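To illustrate the difference: a swarm-generated model_params file usually contains both a normal encoder (which feeds the spatial pooler) and a separate classifier-only entry, roughly like this (the values here are placeholders):

'encoders': {u'Vals': {'clipInput': True,
                       'fieldname': 'Vals',
                       'maxval': 952,
                       'minval': 337,
                       'n': 222,
                       'name': 'Vals',
                       'type': 'ScalarEncoder',
                       'w': 21},
             u'_classifierInput': {'classifierOnly': True,
                                   'clipInput': True,
                                   'fieldname': 'Vals',
                                   'maxval': 952,
                                   'minval': 337,
                                   'n': 287,
                                   'name': '_classifierInput',
                                   'type': 'ScalarEncoder',
                                   'w': 21}},

One guess about your error: since a classifierOnly encoder is fed only to the classifier and not to the spatial pooler, making your only encoder classifier-only could leave the network with zero input width, which might be where the ‘columnCount and inputWidth above zero’ complaint comes from.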

If I remember correctly, one_gym simply runs a swarm and uses the values from the resulting model_params file to run the CLA, which is exactly what my template code does, unless there is a difference in the input parameters that are sent to the swarm between my template and one_gym. No, I don’t have a GitHub repo of my code, because my code really is nothing more than the template, with ever-so-small variations here and there when I’m experimenting with different data sets. The NuPIC codebase is quite a behemoth, so I haven’t dared altering any code. All I’ve done is try my best to understand HTM and NuPIC, run it, understand why I get the results that I get, and use all that understanding to get the best possible results.
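For reference, the swarm-then-run flow in one_gym boils down to something like this sketch (the swarm config, field names, and worker count are placeholders, and the ModelFactory import path varies between NuPIC versions):

import datetime

from nupic.frameworks.opf.modelfactory import ModelFactory
from nupic.swarming import permutations_runner

# Run a swarm over the input data to generate model parameters...
model_params = permutations_runner.runWithConfig(
    SWARM_CONFIG,                        # placeholder: your swarm description dict
    {"maxWorkers": 4, "overwrite": True})

# ...then build a CLA model from them and feed the data through it.
model = ModelFactory.create(model_params)
model.enableInference({"predictedField": "Vals"})    # placeholder field name
result = model.run({"date": datetime.datetime(2016, 3, 23), "Vals": 3045.6})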

Remember, if you are looking for anomalies on scalar input data, you don’t need to swarm.

lol


I think there is a difference, since the hotgym demo uses parameters hard-coded for the energy values and the weekly dates, while your template is more general, since it can take any file/input.
Anyway, thanks for the answer.


So then swarming is only needed if you want to predict something, not to detect anomalies, right?

For the most part, yes. If you are doing anomaly detection on non-scalar input data, you’re going to have to experiment because we don’t have pre-established model parameters for that stuff.
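Concretely, for scalar metrics the pre-established anomaly parameters can be pulled in with getScalarMetricWithTimeOfDayAnomalyParams, along these lines (the min/max values are placeholders, and import paths may differ slightly between NuPIC versions):

import datetime

from nupic.frameworks.opf.common_models.cluster_params import (
    getScalarMetricWithTimeOfDayAnomalyParams)
from nupic.frameworks.opf.modelfactory import ModelFactory

# Pre-established anomaly model parameters -- no swarming needed.
params = getScalarMetricWithTimeOfDayAnomalyParams(
    metricData=[0],           # only used to derive a resolution if none is given
    minVal=337, maxVal=952)   # placeholder min/max for your metric

model = ModelFactory.create(modelConfig=params["modelConfig"])
model.enableInference(params["inferenceArgs"])

# These canned parameters expect fields named "c0" (timestamp) and "c1" (value).
result = model.run({"c0": datetime.datetime(2016, 3, 23), "c1": 3045.6})
anomaly_score = result.inferences["anomalyScore"]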

Is there an encoder that encodes the difference between consecutive values in a data set? Or perhaps a percent change? It seems like encoding data as a diff or % change might preserve the pattern while making this type of data easier for HTM to predict.

@mellertson

Couldn’t this be done using a pre-processing step to output, say, Original | Delta | %Change in a CSV format, and then have a ScalarEncoder or RandomDistributedScalarEncoder process these three columns?
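Something like this pre-processing sketch could produce those three columns (file and column names are hypothetical):

import csv

with open("raw.csv") as fin, open("preprocessed.csv", "wb") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    next(reader)                                  # skip the input header
    writer.writerow(["date", "original", "delta", "pct_change"])
    prev = None
    for date, value in reader:
        value = float(value)
        delta = 0.0 if prev is None else value - prev
        pct_change = 0.0 if not prev else 100.0 * delta / prev
        writer.writerow([date, value, delta, pct_change])
        prev = value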


Seems like that would work. Good idea.

See the DeltaEncoder:
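In the model params it would just replace the encoder type, along these lines (the field name and sizes are placeholders; the DeltaEncoder sits on top of an adaptive scalar encoder and encodes the difference between the current and previous value):

'encoders': {u'Vals': {'fieldname': 'Vals',
                       'name': 'Vals',
                       'n': 222,
                       'w': 21,
                       'type': 'DeltaEncoder'}},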


That’s just what I was looking for. Thanks!

I guess I should take a more in-depth look through all of the encoders. I’m guessing there might be others I’m still unaware of that could be useful.

The forum search is pretty good, but this one is better because it searches GitHub issues and mailing list archives.

A post was split to a new topic: Model is so slow to track the new trend