Issue recognising multiple inputs for prediction

mh01223 · July 17, 2019, 8:43am

Hi,

I'm currently trying to observe how the predictability of my data changes when I manually add fields with different levels of correlation to my target field. I'm using the simple Hot Gym example script as a template (https://github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/simple/hotgym.py) and monitoring the hourly altMAPE (data resolution is 5 seconds). I tried adding dayOfWeek and Solar_Panel_Voltage_X to the model params file as follows (my predicted field is Total_Photo_Current):

and simply added another modelInput to the run file:

However, I get exactly the same altMAPE (screenshot below) as when I only had Total_Photo_Current and timeOfDay. Am I missing something? Any help would be much appreciated :))
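For reference, NuPIC's altMAPE ("alternative" MAPE), as I understand its `MetricAltMAPE` implementation, sums the absolute errors and divides by the summed absolute actual values instead of averaging per-sample percentage errors, so near-zero actual values do not blow it up. A minimal pure-Python sketch (the function names are my own, not NuPIC's), including an hourly window, which at a 5-second resolution is 3600 / 5 = 720 records:

```python
def alt_mape(actual, predicted):
    """100 * sum(|actual - predicted|) / sum(|actual|)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("altMAPE is undefined when all actual values are 0")
    return 100.0 * total_error / total_actual


# One hour of records at a 5-second sampling interval.
RECORDS_PER_HOUR = 3600 // 5


def windowed_alt_mape(actual, predicted, window=RECORDS_PER_HOUR):
    """altMAPE over only the most recent `window` records."""
    return alt_mape(actual[-window:], predicted[-window:])
```

If two runs with different encoders produce identical windowed values record for record, that suggests the predictions themselves are identical, not just similar in quality.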

rhyolight · July 17, 2019, 2:55pm

If you grep the NuPIC examples directory, you’ll find examples of using dayOfWeek. Here is one:

github.com/numenta/nupic/blob/master/examples/opf/experiments/opfrunexperiment_test/simpleOPF/hotgym_1hr_agg/description.py#L135-L138


u'timestamp_dayOfWeek': {'dayOfWeek': (21, 1),
                         'fieldname': u'timestamp',
                         'name': u'timestamp_dayOfWeek',
                         'type': 'DateEncoder'},

Try adding 'dayOfWeek': (21, 1), which tells the OPF how to dimension the encoding (a width of 21 and a radius of 1 day).
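When a new field seems to have no effect at all, one cheap check is that the encoder config and the input record agree on field names. The sketch below assumes, as I understand the OPF, that each encoder reads its `fieldname` from the dict passed to `model.run()` and that fields without an encoder are not fed to the model. The entries follow the style of the snippet above, but the field names come from this thread and every width, radius and resolution is a placeholder; `missing_fields` and `unencoded_fields` are hypothetical helpers, not NuPIC functions.

```python
# Illustrative OPF-style encoder entries; all widths, radii and
# resolutions are placeholders, not tuned values.
ENCODERS = {
    "timestamp_timeOfDay": {"fieldname": "timestamp",
                            "name": "timestamp_timeOfDay",
                            "timeOfDay": (21, 1),
                            "type": "DateEncoder"},
    "timestamp_dayOfWeek": {"fieldname": "timestamp",
                            "name": "timestamp_dayOfWeek",
                            "dayOfWeek": (21, 1),
                            "type": "DateEncoder"},
    "Total_Photo_Current": {"fieldname": "Total_Photo_Current",
                            "name": "Total_Photo_Current",
                            "resolution": 3.0,
                            "seed": 42,
                            "type": "RandomDistributedScalarEncoder"},
    "Solar_Panel_Voltage_X": {"fieldname": "Solar_Panel_Voltage_X",
                              "name": "Solar_Panel_Voltage_X",
                              "resolution": 30.0,
                              "seed": 42,
                              "type": "RandomDistributedScalarEncoder"},
}


def encoder_fields(encoders):
    """Field names read by the (non-None) encoder entries."""
    return {e["fieldname"] for e in encoders.values() if e is not None}


def missing_fields(encoders, model_input):
    """Fields an encoder expects but the input record does not provide."""
    return sorted(encoder_fields(encoders) - set(model_input))


def unencoded_fields(encoders, model_input):
    """Fields in the input record that no encoder reads."""
    return sorted(set(model_input) - encoder_fields(encoders))
```

For example, a record that forgets the voltage, `{"timestamp": ts, "Total_Photo_Current": 12.0}`, makes `missing_fields` return `["Solar_Panel_Voltage_X"]`, while a field added only to the run file shows up in `unencoded_fields`.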

Also, it could be that neither of these new fields makes a difference to the algorithm's ability to predict. We have found many times that fields we expected to improve predictions do not help.

sheiser1 · July 17, 2019, 5:16pm

I wonder whether the RDSE resolutions should be set according to each field's distribution.

mh01223 · July 17, 2019, 8:44pm

Yeah, in this case I know the Solar Panel Voltage is 70% correlated with Total_Photo_Current, so although I didn't expect a huge change, getting exactly the same % error was suspicious. As for dayOfWeek, the model's % error increases every Sunday, probably due to a mode change that I don't have telemetry for, so I do expect dayOfWeek to be important here. I'll try what you suggested, thanks :))
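A figure like "70% correlated" is presumably a Pearson coefficient. Because the model predicts ahead, it can also be worth checking the correlation between the input now and the target some steps later. A pure-Python sketch; `pearson` and `lagged_pearson` are illustrative helpers, and the choice of lag (for example, the number of 5-second steps ahead being predicted) is an assumption:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sd_x * sd_y)


def lagged_pearson(inputs, target, lag):
    """Correlate inputs[t] with target[t + lag] (lag >= 0 steps)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(inputs, target)
    return pearson(inputs[:-lag], target[lag:])
```

A field that is strongly correlated at lag 0 but weakly at the prediction horizon may add little to a multi-step forecast.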

mh01223 · July 18, 2019, 12:54pm

I think the problem must be the script, or my usage of the 'simple' hotgym example script. It seems to need only the predicted field itself: even when I remove all the date/time encoders, it gives exactly the same results. I think I'll try the 'prediction' example script, which is closer to the 'anomaly' example script that I have used successfully when adding and removing features.

mh01223 · July 18, 2019, 12:57pm

Yeah, in this case the Solar Panel Voltage ranges from 0 to ~4000 and the Current from 0 to ~400, so I guess a larger resolution value (wider buckets) for the Voltage would be more suitable.

rhyolight · July 18, 2019, 2:47pm

Are you trying to make predictions or get anomalies? NuPIC is much better at detecting anomalies than at making predictions.

sheiser1 · July 18, 2019, 7:24pm

Here’s how they find the resolution value for each metric in NAB (to my knowledge):

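from nab.detectors.numenta.numenta_detector import NumentaDetector
from nupic.frameworks.opf.common_models.cluster_params import (
  getScalarMetricWithTimeOfDayAnomalyParams)


class NumentaTMDetector(NumentaDetector):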
  """
  This detector uses the implementation of temporal memory in
  https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.hpp.
  It differs from its parent detector in temporal memory and its parameters.
  """

  def __init__(self, *args, **kwargs):

    super(NumentaTMDetector, self).__init__(*args, **kwargs)


  def initialize(self):
    # Get config params, setting the RDSE resolution
    rangePadding = abs(self.inputMax - self.inputMin) * 0.2

    modelParams = getScalarMetricWithTimeOfDayAnomalyParams(
      metricData=[0],
      minVal=self.inputMin-rangePadding,
      maxVal=self.inputMax+rangePadding,
      minResolution=0.001,
      tmImplementation="tm_cpp"
    )["modelConfig"]

The ‘minVal’, ‘maxVal’ & ‘minResolution’ inputs are used to calculate the resolution value within getScalarMetricWithTimeOfDayAnomalyParams():

def _fixupRandomEncoderParams(params, minVal, maxVal, minResolution):
  """
  Given model params, figure out the correct parameters for the
  RandomDistributed encoder. Modifies params in place.
  """
  encodersDict = (
    params["modelConfig"]["modelParams"]["sensorParams"]["encoders"]
  )

  for encoder in encodersDict.itervalues():
    if encoder is not None:
      if encoder["type"] == "RandomDistributedScalarEncoder":
        resolution = max(minResolution,
                         (maxVal - minVal) / encoder.pop("numBuckets")
                        )
        encodersDict["c1"]["resolution"] = resolution

This makes me wonder how much of each metric's data NAB uses to get the min and max values. Maybe this is the irreducible batch-like element of NuPIC at this point: you need some initial chunk of data to get the min and max from. I use a parameter that sets what proportion of a metric's data is used to find its min and max.
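A standalone sketch of that idea, combining a leading-fraction calibration window with the NAB arithmetic quoted above (20% padding on each side, then the padded range divided into buckets with a `minResolution` floor). `calibration_range` and its `fraction` argument are hypothetical, and `num_buckets=130` is an assumption: I believe it is the `numBuckets` value in the template that `getScalarMetricWithTimeOfDayAnomalyParams()` loads, but check your NuPIC version.

```python
def calibration_range(values, fraction):
    """Min and max over the leading `fraction` of `values` (hypothetical)."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = max(1, int(len(values) * fraction))  # always use at least one value
    head = values[:n]
    return min(head), max(head)


def rdse_resolution(input_min, input_max, num_buckets=130.0,
                    min_resolution=0.001, padding_fraction=0.2):
    """NAB-style RDSE resolution, reimplemented without NuPIC."""
    # Pad the observed range by 20% on each side, as in initialize().
    range_padding = abs(input_max - input_min) * padding_fraction
    min_val = input_min - range_padding
    max_val = input_max + range_padding
    # Split the padded range into buckets, as in _fixupRandomEncoderParams().
    return max(min_resolution, (max_val - min_val) / float(num_buckets))
```

With the ranges mentioned earlier in the thread, `rdse_resolution(0, 400)` gives about 4.3 for the current and `rdse_resolution(0, 4000)` about 43 for the voltage, so the voltage encoder gets buckets roughly ten times wider.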

mh01223 · July 18, 2019, 8:16pm

Yes, my focus was anomaly detection, but I'm waiting on an external source to validate the anomalies, which would let me quantify the model's performance. Until then, I thought I could experiment with predictability, since it's fairly easy to quantify a model's effectiveness that way. I also understand that anomaly detection doesn't require predictions to be as precise, but I still think predictions help explain why certain anomalies are flagged.
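On why anomaly detection tolerates imprecise predictions: to my understanding, HTM's raw anomaly score (computed in NuPIC by `computeRawAnomalyScore` in `nupic.algorithms.anomaly`) only measures what fraction of the currently active columns were predicted, not how close a predicted scalar value was. A minimal pure-Python sketch of that formula, with columns given as plain index lists:

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """1 - |active & predicted| / |active|; 0.0 when nothing is active."""
    active = set(active_columns)
    if not active:
        return 0.0
    predicted_and_active = active & set(predicted_columns)
    return 1.0 - len(predicted_and_active) / float(len(active))
```

A score of 0.0 means every active column was anticipated; 1.0 means none were.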

Related topics

Topic | Category/tags | Replies | Views | Activity
Doubt: Predicted field, anomaly Likelihood and multiple inputs | NuPIC | 13 | 1948 | March 30, 2017
Run model without Predicted field input | NuPIC, question | 3 | 993 | February 24, 2017
Anomaly detection for multi features | NuPIC | 15 | 1873 | May 15, 2019
Anamoly Model Detection | NuPIC | 10 | 930 | January 23, 2019
Predict anomalies on more than a single input field | NuPIC, htm | 2 | 770 | November 8, 2018