About getting anomalyProbability

Hi,

Could anyone please help me with getting the anomaly likelihood?

1: I have an input data stream with just a timestamp and a field that contains categorical values as strings (I flagged the field with ‘C’).
likelihood = anomalyLikelihood.anomalyProbability(actualValue, anomalyScore, timestamp)
In this code, I pass a string-typed value as actualValue, and it raises an error like the following:

Traceback (most recent call last):
  File "D:/PY_WORKSPACE/nupic_inhouse/pretest/run_anomalyDetection.py", line 159, in <module>
    runAnomalyDetection(scalar=False)
  File "D:/PY_WORKSPACE/nupic_inhouse/pretest/run_anomalyDetection.py", line 137, in runAnomalyDetection
    likelihood = anomalyLikelihood.anomalyProbability(actualValue, anomalyScore, timestamp)
  File "C:\Python27\lib\site-packages\nupic\algorithms\anomaly_likelihood.py", line 317, in anomalyProbability
    skipRecords=numSkipRecords)
  File "C:\Python27\lib\site-packages\nupic\algorithms\anomaly_likelihood.py", line 473, in estimateAnomalyLikelihoods
    performLowerBoundCheck=False)
  File "C:\Python27\lib\site-packages\nupic\algorithms\anomaly_likelihood.py", line 689, in estimateNormal
    "mean": numpy.mean(sampleData),
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2942, in mean
    out=out, **kwargs)
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 65, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type
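
For what it’s worth, the failure reproduces with numpy alone: a string array gets a “flexible” (fixed-width string) dtype, which numpy.mean cannot reduce. A minimal sketch (the category strings are placeholders):

import numpy

values = numpy.array(["catA", "catB", "catA"])
print values.dtype        # |S4 -- a "flexible" fixed-width string dtype
print numpy.mean(values)  # TypeError: cannot perform reduce with flexible type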

But with a different data stream, containing a timestamp and a field of float-typed values, there are no errors.

Is there any way to avoid this error with a categorical field?

2: What should I pass as actualValue when I run anomaly detection on multi-field data (I want the anomaly of the combination of two values)? See the sketch after the sample data below.
For example, the data looks like this:

c0,c1,c2
datetime,float,float
T,,
2017-01-01 1:00,0,0
2017-01-01 2:00,0.114,0.114
2017-01-01 3:00,0.226,0.228
2017-01-01 4:00,0.336,0.343
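
For context, this is roughly how I would feed one of these records to the model. This is a sketch only: `model` and `anomalyLikelihood` are assumed to be set up as elsewhere in this thread, and the field names follow the CSV header above:

from datetime import datetime

record = {"c0": datetime(2017, 1, 1, 2, 0), "c1": 0.114, "c2": 0.114}
result = model.run(record)  # model: an OPF model, as created further down
anomalyScore = result.inferences["anomalyScore"]
# The open question: which value should be passed as actualValue here?
likelihood = anomalyLikelihood.anomalyProbability(record["c1"], anomalyScore,
                                                  record["c0"])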

@scott @subutai I have never done anomaly likelihood with string categories. Can NuPIC handle that?

Yes, it should work, as long as the correct encoders are used (category encoder?)

It’s good to hear from you and rhyolight.

I used the category encoder by specifying MODEL_PARAMS as follows (nupic 0.5.7):

import category_model_params

from nupic.frameworks.opf.modelfactory import ModelFactory


def createModel():
  return ModelFactory.create(category_model_params.MODEL_PARAMS)

The contents of category_model_params:

MODEL_PARAMS = {'aggregationInfo': {'days': 0,
                     'fields': [],
                     'hours': 0,
                     'microseconds': 0,
                     'milliseconds': 0,
                     'minutes': 0,
                     'months': 0,
                     'seconds': 0,
                     'weeks': 0,
                     'years': 0},
 'model': 'CLA',
 'modelParams': {'anomalyParams': {u'anomalyCacheRecords': None,
                                   u'autoDetectThreshold': None,
                                   u'autoDetectWaitRecords': None},
                 'clParams': {'alpha': 0.09946475054821349,
                              'regionName': 'SDRClassifierRegion',
                              'steps': '1',
                              'verbosity': 0},
                  # 'inferenceType': 'NontemporalMultiStep',
                 'inferenceType': 'TemporalAnomaly',
                 # 'inferenceType': 'TemporalMultiStep',
                 'sensorParams': {'encoders': {
                                               '_classifierInput': {'classifierOnly': True,
                                                                    'fieldname': 'cat',
                                                                    'n': 521,
                                                                    'name': '_classifierInput',
                                                                    'type': 'SDRCategoryEncoder',
                                                                    'forced': True,
                                                                    'w': 21},
                                               u'cat': {'fieldname': 'cat',
                                                        'name': 'cat',
                                                        'w': 21,
                                                        'n': 521,
                                                        'forced': True,
                                                        'type': 'SDRCategoryEncoder'},
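
For reference, the encoder can be exercised on its own with the same n/w as above; a minimal sketch (the category string is a placeholder):

from nupic.encoders.sdrcategory import SDRCategoryEncoder

encoder = SDRCategoryEncoder(n=521, w=21, forced=True)
sdr = encoder.encode("some_category")  # numpy array of 521 bits
print sdr.sum()  # 21 active bits (w), for any input string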

I have also tried CategoryEncoder instead of SDRCategoryEncoder.

Could you recommend any modifications?
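
To be concrete, this is roughly how I drive the model and the likelihood helper. A sketch, where the field names ("timestamp", "cat") and the records iterable are placeholders:

from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood

model = createModel()
model.enableInference({"predictedField": "cat"})
anomalyLikelihood = AnomalyLikelihood()

for timestamp, category in records:  # records: (datetime, str) pairs
  result = model.run({"timestamp": timestamp, "cat": category})
  anomalyScore = result.inferences["anomalyScore"]
  # Passing the string category as actualValue triggers the traceback above:
  likelihood = anomalyLikelihood.anomalyProbability(category, anomalyScore,
                                                    timestamp)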

Answering my own question:
I analyzed the code (anomaly_likelihood.py) and found a block that inspects the variance of the actual values, with a comment like the following:

# HACK ALERT! The CLA model currently does not handle constant metric values
# very well (time of day encoder changes sometimes lead to unstable SDR's
# even though the metric is constant). Until this is resolved, we explicitly
# detect and handle completely flat metric values by reporting them as not
# anomalous.
s = [r[1] for r in aggRecordList]  # the raw actual values from the stream
metricValues = numpy.array(s)
print metricValues  # (my debug print)
metricDistribution = estimateNormal(metricValues[skipRecords:],
                                    performLowerBoundCheck=False)
if metricDistribution["variance"] < 1.5e-5:
  distributionParams = nullDistribution(verbosity=verbosity)

I guess this part handles cases where the numeric input data shows very little variation.
However, it does not seem to handle categorical input data properly.

Assuming that my categorical input data is sufficiently non-static, I removed the block above, and now it runs fine.
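
Rather than deleting the block outright, I suppose a guard like the following would also work. This is only a hypothetical sketch, not the actual fix:

# Hypothetical guard: run the flat-metric check only for numeric dtypes;
# string arrays have a "flexible" dtype and make numpy.mean raise
# "cannot perform reduce with flexible type".
if metricValues.dtype.kind in ("i", "u", "f"):
  metricDistribution = estimateNormal(metricValues[skipRecords:],
                                      performLowerBoundCheck=False)
  if metricDistribution["variance"] < 1.5e-5:
    distributionParams = nullDistribution(verbosity=verbosity)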

Please let me know if I misunderstood.

@scott Can you read @oreore’s comment above and provide your viewpoint? Is this a bug?

What do you mean by this? Does it throw an exception or do you see bad anomaly likelihood values?

@scott what concerns me is his comment that when he removed the “hack” code in the snippet he pasted, it ran as he expected.

@rhyolight - I don’t understand. Of course the code will run just fine without the “hack”; it just might get poor performance on values that stay constant for a while when there are other fields (like a timestamp) that are changing.

But why did he remove it? What was the problem?

Hi,
What I meant was the error attached in the first post of this thread.
I believe the metricValues array contains the “categorical values”, i.e. the actual string values read in from the stream.
I suspect that calling estimateNormal on that array is what raises the error.

Oh I see now. Yes, we should only do the “hack” for numeric values.

I have a PR to address the issue here: