Help debugging the accuracy of anomaly detection on NYC_taxi data


#1

Hi,
This is my first time playing around with HTM.java.
I tried it on the nyc_taxi.csv data from NAB, but I got inaccurate results.
An excerpt of the results:

record_num :10051 raw_value :15926.0 prediction :17002.625064371106 raw_score :1.0
record_num :10052 raw_value :13785.0 prediction :16679.637545059773 raw_score :1.0
record_num :10053 raw_value :13905.0 prediction :15811.24628154184 raw_score :1.0
record_num :10054 raw_value :13575.0 prediction :15239.372397079287 raw_score :1.0
record_num :10055 raw_value :14094.0 prediction :14740.0606779555 raw_score :1.0
record_num :10056 raw_value :14488.0 prediction :14546.242474568848 raw_score :1.0
record_num :10057 raw_value :14428.0 prediction :14528.769732198192 raw_score :1.0
record_num :10058 raw_value :14402.0 prediction :14498.538812538734 raw_score :1.0
record_num :10059 raw_value :14747.0 prediction :14469.577168777112 raw_score :1.0
record_num :10060 raw_value :13915.0 prediction :14552.804018143976 raw_score :1.0
record_num :10061 raw_value :11432.0 prediction :14361.462812700782 raw_score :1.0
record_num :10062 raw_value :9659.0 prediction :13482.623968890546 raw_score :1.0
record_num :10063 raw_value :7681.0 prediction :13482.623968890546 raw_score :1.0
record_num :10064 raw_value :6257.0 prediction :13482.623968890546 raw_score :1.0
record_num :10065 raw_value :5520.0 prediction :13482.623968890546 raw_score :1.0
record_num :10066 raw_value :5159.0 prediction :13482.623968890546 raw_score :1.0
record_num :10067 raw_value :5283.0 prediction :13482.623968890546 raw_score :1.0
record_num :10068 raw_value :5821.0 prediction :13482.623968890546 raw_score :1.0
record_num :10069 raw_value :5586.0 prediction :13482.623968890546 raw_score :1.0
record_num :10070 raw_value :4729.0 prediction :13482.623968890546 raw_score :1.0
record_num :10071 raw_value :4402.0 prediction :13482.623968890546 raw_score :1.0
record_num :10072 raw_value :3877.0 prediction :13482.623968890546 raw_score :1.0
record_num :10073 raw_value :3384.0 prediction :13482.623968890546 raw_score :1.0
record_num :10074 raw_value :3203.0 prediction :13482.623968890546 raw_score :1.0
record_num :10075 raw_value :2611.0 prediction :13482.623968890546 raw_score :1.0
record_num :10076 raw_value :1783.0 prediction :13482.623968890546 raw_score :1.0
record_num :10077 raw_value :866.0 prediction :13482.623968890546 raw_score :1.0
record_num :10078 raw_value :297.0 prediction :13482.623968890546 raw_score :1.0
record_num :10079 raw_value :189.0 prediction :13482.623968890546 raw_score :1.0
record_num :10080 raw_value :109.0 prediction :13482.623968890546 raw_score :1.0
record_num :10081 raw_value :80.0 prediction :13482.623968890546 raw_score :1.0
record_num :10082 raw_value :40.0 prediction :13482.623968890546 raw_score :1.0
record_num :10083 raw_value :39.0 prediction :13482.623968890546 raw_score :1.0
record_num :10084 raw_value :26.0 prediction :13482.623968890546 raw_score :1.0
record_num :10085 raw_value :32.0 prediction :13482.623968890546 raw_score :1.0
record_num :10086 raw_value :8.0 prediction :13482.623968890546 raw_score :1.0
record_num :10087 raw_value :11.0 prediction :13482.623968890546 raw_score :1.0
record_num :10088 raw_value :9.0 prediction :13482.623968890546 raw_score :1.0
record_num :10089 raw_value :20.0 prediction :13482.623968890546 raw_score :1.0
record_num :10090 raw_value :21.0 prediction :13482.623968890546 raw_score :1.0
record_num :10091 raw_value :37.0 prediction :13482.623968890546 raw_score :1.0
record_num :10092 raw_value :69.0 prediction :13482.623968890546 raw_score :1.0
record_num :10093 raw_value :107.0 prediction :13482.623968890546 raw_score :1.0
record_num :10094 raw_value :216.0 prediction :13482.623968890546 raw_score :1.0
record_num :10095 raw_value :332.0 prediction :13482.623968890546 raw_score :1.0
record_num :10096 raw_value :570.0 prediction :13482.623968890546 raw_score :1.0
record_num :10097 raw_value :1049.0 prediction :13482.623968890546 raw_score :1.0
record_num :10098 raw_value :1589.0 prediction :13482.623968890546 raw_score :1.0
record_num :10099 raw_value :2285.0 prediction :13482.623968890546 raw_score :1.0
record_num :10100 raw_value :2945.0 prediction :13482.623968890546 raw_score :1.0
record_num :10101 raw_value :3544.0 prediction :13482.623968890546 raw_score :1.0
record_num :10102 raw_value :3876.0 prediction :13482.623968890546 raw_score :1.0
record_num :10103 raw_value :4535.0 prediction :13482.623968890546 raw_score :1.0
record_num :10104 raw_value :4923.0 prediction :13482.623968890546 raw_score :1.0

As you can see, the raw anomaly score (raw_score) is always 1.0, and from record 10062 onward, while the value falls and then rises again, the prediction stays stuck at 13482.62. In fact, the raw anomaly score is 1.0 for all 10000 records.

The parameters:

{
	Spatial: {
		learn:true
		inputDimensions:[64]
		potentialRadius:64
		potentialPct:0.85
		globalInhibition:true
		inhibitionRadius:0
		localAreaDensity:-1.0
		numActiveColumnsPerInhArea:10.0
		stimulusThreshold:0.0
		synPermInactiveDec:0.008
		synPermActiveInc:0.05
		synPermConnected:0.1
		synPermBelowStimulusInc:0.01
		synPermTrimThreshold:0.05
		minPctOverlapDutyCycles:0.001
		minPctActiveDutyCycles:0.001
		dutyCyclePeriod:1000
		maxBoost:10.0
		wrapAround:true
	}
	Temporal: {
		columnDimensions:[2048]
		cellsPerColumn:32
		activationThreshold:13
		learningRadius:2048
		minThreshold:10
		maxNewSynapseCount:20
		maxSynapsesPerSegment:255
		maxSegmentsPerCell:255
		initialPermanence:0.21
		connectedPermanence:0.5
		permanenceIncrement:0.1
		permanenceDecrement:0.1
		predictedSegmentDecrement:0.0
	}
	Other: {
		random:org.numenta.nupic.util.UniversalRandom@16b4a017
		seed:42
		n:500
		w:21
		minVal:0.0
		maxVal:1000.0
		radius:21.0
		resolution:1.0
		periodic:false
		clipInput:false
		forced:false
		fieldName:UNSET
		fieldType:int
		encoderType:ScalarEncoder
		fieldEncodings:{value={fieldName=value, fieldType=float, resolution=0.01, encoderType=RandomDistributedScalarEncoder}, timestamp={fieldName=timestamp, formatPattern=yyyy-MM-dd HH:mm:ss, fieldType=datetime, encoderType=DateEncoder, timeOfDay='21':1.0}}
		hasClassifiers:true
		inferredFields:{value=class org.numenta.nupic.algorithms.CLAClassifier}
	}
}

I don’t have much experience with HTM. I set these parameters by following the Network examples and some of the documentation.
Thanks in advance for any help.


#3

Update:
My bad: I had left out the TemporalMemory. I added that component to the Network and re-ran the model:

record_num :10271 raw_value :26000.0 prediction :26579.4935567617 raw_score :0.0
record_num :10272 raw_value :25778.0 prediction :26405.64548973319 raw_score :0.025
record_num :10273 raw_value :23304.0 prediction :26217.35184281323 raw_score :0.0
record_num :10274 raw_value :21318.0 prediction :25343.34628996926 raw_score :0.0
record_num :10275 raw_value :19024.0 prediction :24135.74240297848 raw_score :0.0
record_num :10276 raw_value :17022.0 prediction :22602.21968208494 raw_score :0.025
record_num :10277 raw_value :14733.0 prediction :20928.153777459454 raw_score :0.0
record_num :10278 raw_value :12593.0 prediction :19069.607644221614 raw_score :0.0
record_num :10279 raw_value :11048.0 prediction :4582.055976811634 raw_score :0.425
record_num :10280 raw_value :9364.0 prediction :4582.055976811634 raw_score :1.0
record_num :10281 raw_value :5209.0 prediction :6016.639183768143 raw_score :0.0
record_num :10282 raw_value :3683.0 prediction :5774.3474286376995 raw_score :0.025
record_num :10283 raw_value :3329.0 prediction :5146.943200046389 raw_score :0.0
record_num :10284 raw_value :3714.0 prediction :4601.560240032472 raw_score :0.0
record_num :10285 raw_value :4531.0 prediction :4335.292168022731 raw_score :0.0
record_num :10286 raw_value :4803.0 prediction :4394.004517615911 raw_score :0.025
record_num :10287 raw_value :7049.0 prediction :4516.703162331138 raw_score :0.05
record_num :10288 raw_value :8363.0 prediction :5276.392213631796 raw_score :0.025
record_num :10289 raw_value :11899.0 prediction :17126.62535095513 raw_score :0.05

The raw anomaly score looks fine now. What I can hardly understand is how the raw score can be 0.0 when the prediction deviates so far from the raw value.
Example:
record_num :10275 raw_value :19024.0 prediction :24135.74240297848 raw_score :0.0
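
One possible explanation, sketched under an assumption: that raw_score follows the usual HTM definition, i.e. the fraction of currently active columns that the TemporalMemory did not predict on the previous step, and is not derived from the classifier's numeric prediction at all. If so, the two numbers can disagree, and a network without a TemporalMemory (no predicted columns) would score 1.0 on every record. The class and method names below are hypothetical and are not part of HTM.java; the column indices are made up for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class RawAnomalyScoreSketch {

    // Assumed definition: 1 - |active AND predicted| / |active|.
    // activeColumns is expected to contain distinct column indices.
    static double rawScore(int[] activeColumns, int[] prevPredictedColumns) {
        if (activeColumns.length == 0) {
            return 0.0; // convention assumed here: nothing active, nothing unexpected
        }
        Set<Integer> predicted = Arrays.stream(prevPredictedColumns)
                .boxed()
                .collect(Collectors.toSet());
        long hits = Arrays.stream(activeColumns)
                .filter(c -> predicted.contains(c))
                .count();
        return 1.0 - (double) hits / activeColumns.length;
    }

    public static void main(String[] args) {
        int[] active = {3, 17, 42, 99};

        // No predictions at all (e.g. no TemporalMemory in the Network).
        System.out.println(rawScore(active, new int[] {}));                   // 1.0

        // Every active column was predicted; extra predictions do not hurt.
        System.out.println(rawScore(active, new int[] {3, 17, 42, 99, 120})); // 0.0

        // One of four active columns was not predicted.
        System.out.println(rawScore(active, new int[] {3, 17, 42}));          // 0.25
    }
}
```

Under this assumption, a raw score of 0.0 only says that the encoded input was among the patterns the TemporalMemory expected; the value from getMostProbableValue(1) comes from the classifier and can still be far from the actual value.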

By the way, I got the prediction value by calling

inference.getClassification("value").getMostProbableValue(1);

and I set AUTO_CLASSIFY to TRUE with

alterParameter(Parameters.KEY.AUTO_CLASSIFY, Boolean.TRUE)