I am trying to observe how the predictability of my data changes when I manually add fields with different levels of correlation to my target field. I’m using the simple Hot Gym example script as a template (https://github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/simple/hotgym.py) and monitoring the hourly altMAPE (data resolution is 5 seconds). I tried adding dayOfWeek and Solar_Panel_Voltage_X to the model params file like so (my predicted field is Total_Photo_Current):
However, I get exactly the same altMAPE as before (screenshot below), when I only had Total_Photo_Current and timeOfDay. Am I missing something? Any help would be much appreciated :))
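For readers unfamiliar with the metric: as far as I understand it, altMAPE is the sum of absolute errors divided by the sum of absolute actual values, times 100, over a trailing window. Below is a minimal standalone sketch of that definition, not the NuPIC implementation. The function name `windowed_alt_mape` is made up for illustration, and the default window of 720 records assumes one hour of data at 5-second resolution.

```python
from collections import deque


def windowed_alt_mape(actuals, predictions, window=720):
    """Return the trailing-window altMAPE after each record.

    altMAPE = 100 * sum(|actual - predicted|) / sum(|actual|),
    computed over the most recent `window` records.
    """
    errors = deque(maxlen=window)      # absolute errors in the window
    magnitudes = deque(maxlen=window)  # absolute actual values in the window
    results = []
    for actual, predicted in zip(actuals, predictions):
        errors.append(abs(actual - predicted))
        magnitudes.append(abs(actual))
        total = sum(magnitudes)
        # Undefined when every actual value in the window is zero.
        results.append(100.0 * sum(errors) / total if total else float("nan"))
    return results
```

If two model configurations give identical values here record after record, the predictions themselves are identical, which points at the added fields not reaching the model rather than at them being uninformative.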
If you grep the NuPIC examples directory, you’ll find examples of using dayOfWeek. Here is one:
Try adding 'dayOfWeek': (21, 1), which tells the OPF how to dimension the encoding.
Also, it could be that neither of these new fields makes a difference to the algorithm’s ability to predict. We have found many times that fields we thought would contribute to better predictions do not help.
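As a rough illustration of where that setting goes, here is a sketch of the encoders section of a model params file with a dayOfWeek DateEncoder next to the existing timeOfDay one. The field name `timestamp` and the encoder key names are assumptions; use whatever your own params file and input records call the date field.

```python
# Hypothetical excerpt of the "encoders" dict in an OPF model params file.
# The "timestamp" field name is an assumption and must match the key used
# in the records passed to the model.
encoders = {
    "timestamp_timeOfDay": {
        "fieldname": "timestamp",
        "name": "timestamp_timeOfDay",
        "type": "DateEncoder",
        "timeOfDay": (21, 1),   # (width, radius)
    },
    "timestamp_dayOfWeek": {
        "fieldname": "timestamp",
        "name": "timestamp_dayOfWeek",
        "type": "DateEncoder",
        "dayOfWeek": (21, 1),   # (width, radius)
    },
}
```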
Yeah, in this case I know the Solar Panel Voltage is 70% correlated, so although I didn’t expect a huge change, the fact that I was getting exactly the same % error was suspicious. As for dayOfWeek, the model’s % error increases every Sunday, probably due to a mode change that I don’t have telemetry for, so I do expect dayOfWeek to be important here. I’ll try what you suggested, thanks :))
I think it must be the script or my usage of the ‘simple’ hotgym example script. It seems it only needs the predicted field itself to work. Even when I remove all the date/time encoders it works exactly the same. I think I’ll try the ‘prediction’ example script, which is more similar to the ‘anomaly’ example script that I have had success with when adding and removing features.
Yeah, in this case the Solar Panel Voltage ranges from 0 to ~4000 and the Current from 0 to ~400 so I guess a higher resolution value for the Voltage would be more suitable.
Here’s how they find the resolution value for each metric in NAB (to my knowledge):
class NumentaTMDetector(NumentaDetector):
  """
  This detector uses the implementation of temporal memory in
  https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.hpp.
  It differs from its parent detector in temporal memory and its parameters.
  """

  def __init__(self, *args, **kwargs):
    super(NumentaTMDetector, self).__init__(*args, **kwargs)

  def initialize(self):
    # Get config params, setting the RDSE resolution
    rangePadding = abs(self.inputMax - self.inputMin) * 0.2
    modelParams = getScalarMetricWithTimeOfDayAnomalyParams(
      metricData=[0],
      minVal=self.inputMin-rangePadding,
      maxVal=self.inputMax+rangePadding,
      minResolution=0.001,
      tmImplementation="tm_cpp"
    )["modelConfig"]
The ‘minVal’, ‘maxVal’ & ‘minResolution’ inputs are used to calculate the resolution value within getScalarMetricWithTimeOfDayAnomalyParams():
def _fixupRandomEncoderParams(params, minVal, maxVal, minResolution):
  """
  Given model params, figure out the correct parameters for the
  RandomDistributed encoder. Modifies params in place.
  """
  encodersDict = (
    params["modelConfig"]["modelParams"]["sensorParams"]["encoders"]
  )
  for encoder in encodersDict.itervalues():
    if encoder is not None:
      if encoder["type"] == "RandomDistributedScalarEncoder":
        resolution = max(minResolution,
                         (maxVal - minVal) / encoder.pop("numBuckets")
                        )
        encodersDict["c1"]["resolution"] = resolution
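To make the arithmetic concrete for the two fields in this thread, here is a small standalone sketch that mirrors the padding and max() steps from the two snippets above. The helper name `rdse_resolution` is made up, and the default of 130 buckets is an assumption on my part; check the numBuckets value in the params file you actually load.

```python
def rdse_resolution(input_min, input_max, num_buckets=130.0,
                    min_resolution=0.001, padding_fraction=0.2):
    """Resolution for a RandomDistributedScalarEncoder, NAB-style.

    Pads the observed range by 20% on each side, divides the padded
    range by the bucket count, and never goes below min_resolution.
    num_buckets=130 is an assumed default, not read from NuPIC.
    """
    padding = abs(input_max - input_min) * padding_fraction
    min_val = input_min - padding
    max_val = input_max + padding
    return max(min_resolution, (max_val - min_val) / num_buckets)


# Approximate ranges quoted earlier in the thread.
voltage_resolution = rdse_resolution(0.0, 4000.0)  # padded range 5600
current_resolution = rdse_resolution(0.0, 400.0)   # padded range 560
```

Under these assumptions the voltage resolution comes out ten times the current resolution, simply because its range is ten times wider.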
This leads me to wonder, how much of each metric’s data is being used in NAB to get the min & max values? Maybe this is the irreducible batch-like element of NuPIC at this point, that you need some chunk to get the min & max from. I use a parameter for what proportion of a metric’s data will be used to find its min & max.
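The proportion-based approach I describe could look roughly like the sketch below. It is only an illustration of the idea, not anything from NAB or NuPIC; `estimate_range` and its `proportion` parameter are hypothetical names.

```python
def estimate_range(values, proportion=0.5):
    """Estimate (min, max) from the leading `proportion` of a series.

    Only the first chunk of the data is inspected, so values that
    appear later may fall outside the returned range.
    """
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    count = max(1, int(len(values) * proportion))  # always use at least one value
    head = values[:count]
    return min(head), max(head)
```

The returned pair could then be padded and turned into a resolution in the same way as inputMin and inputMax in the detector code above. A smaller proportion keeps the setup closer to online operation but increases the chance of later values falling outside the estimated range.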
Yes, my focus was anomaly detection, but I’m waiting on an external source to validate the anomalies, which would allow me to quantify the performance of the model. I thought that until that happens I could experiment with predictability, since it’s pretty easy to quantify a model’s effectiveness in that sense. I also understand that anomaly detection doesn’t require predictions to be as precise, but I still think it helps to understand why certain anomalies are flagged.