Issue recognising multiple inputs for prediction

Hi,

I am trying to observe how the predictability of my data changes when I manually add fields with different levels of correlation to my target field. I’m using the simple Hot Gym example script as a template (https://github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/simple/hotgym.py) and am monitoring the hourly altMAPE (data resolution is 5 seconds). I tried adding dayOfWeek and Solar_Panel_Voltage_X to the model params file like so (my predicted field is Total_Photo_Current):


and simply added another modelInput to the run file:

However, I get exactly the same altMAPE as before (screenshot below), when I only had Total_Photo_Current and timeOfDay. Am I missing something? Any help would be much appreciated :))


If you grep the NuPIC examples directory, you’ll find examples of using dayOfWeek. Here is one:

Try adding 'dayOfWeek': (21, 1), which tells the OPF how to dimension the encoding.
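For illustration, here is a rough sketch of where that setting would sit in the encoders section of a model params file. The column name `timestamp` and the `timeOfDay` values are assumptions made for the example, not taken from the original poster's file.

```python
# Sketch of the 'encoders' section of an OPF model params file.
# Assumption: the date column in the input data is called 'timestamp'.
encoders = {
    'timestamp_timeOfDay': {
        'fieldname': 'timestamp',
        'name': 'timestamp_timeOfDay',
        'type': 'DateEncoder',
        'timeOfDay': (21, 1),   # (width in bits, radius in hours)
    },
    'timestamp_dayOfWeek': {
        'fieldname': 'timestamp',
        'name': 'timestamp_dayOfWeek',
        'type': 'DateEncoder',
        'dayOfWeek': (21, 1),   # (width in bits, radius in days)
    },
}
```

An encoder entry left as None is skipped, so the dayOfWeek entry has to be a dict like the one above to take effect.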

Also, it could be that neither of these new fields makes a difference to the algorithm’s ability to predict. We have found many times that fields we thought would contribute to better predictions do not help.


I wonder if the RDSE resolutions should be sensitive to the different fields’ distributions?


Yeah, in this case I know the Solar Panel Voltage is 70% correlated, so although I didn’t expect a huge change, the fact that I was getting exactly the same % error was suspicious. As for dayOfWeek, the model’s % error increases every Sunday, probably due to a mode change that I don’t have telemetry for, so I do expect dayOfWeek to be important here. I’ll try what you suggested, thanks :))

I think it must be the script or my usage of the ‘simple’ hotgym example script. It seems it only needs the predicted field itself to work. Even when I remove all the date/time encoders it works exactly the same. I think I’ll try the ‘prediction’ example script, which is more similar to the ‘anomaly’ example script that I have had success with when adding/removing features.

Yeah, in this case the Solar Panel Voltage ranges from 0 to ~4000 and the Current from 0 to ~400 so I guess a higher resolution value for the Voltage would be more suitable.

Are you trying to make predictions or get anomalies? NuPIC is much better at getting anomalies than at making predictions.

Here’s how they find the resolution value for each metric in NAB (to my knowledge):

class NumentaTMDetector(NumentaDetector):
  """
  This detector uses the implementation of temporal memory in
  https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.hpp.
  It differs from its parent detector in temporal memory and its parameters.
  """

  def __init__(self, *args, **kwargs):

    super(NumentaTMDetector, self).__init__(*args, **kwargs)


  def initialize(self):
    # Get config params, setting the RDSE resolution
    rangePadding = abs(self.inputMax - self.inputMin) * 0.2

    modelParams = getScalarMetricWithTimeOfDayAnomalyParams(
      metricData=[0],
      minVal=self.inputMin-rangePadding,
      maxVal=self.inputMax+rangePadding,
      minResolution=0.001,
      tmImplementation="tm_cpp"
    )["modelConfig"]

The ‘minVal’, ‘maxVal’ & ‘minResolution’ inputs are used to calculate the resolution value within getScalarMetricWithTimeOfDayAnomalyParams():

def _fixupRandomEncoderParams(params, minVal, maxVal, minResolution):
  """
  Given model params, figure out the correct parameters for the
  RandomDistributed encoder. Modifies params in place.
  """
  encodersDict = (
    params["modelConfig"]["modelParams"]["sensorParams"]["encoders"]
  )

  for encoder in encodersDict.itervalues():
    if encoder is not None:
      if encoder["type"] == "RandomDistributedScalarEncoder":
        resolution = max(minResolution,
                         (maxVal - minVal) / encoder.pop("numBuckets")
                        )
        encodersDict["c1"]["resolution"] = resolution
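To make the arithmetic concrete, here is a minimal standalone sketch of the same calculation applied to the two ranges mentioned earlier in the thread (Voltage 0 to ~4000, Current 0 to ~400). The helper name `rdse_resolution` is hypothetical, and `num_buckets=130.0` is an assumed value for illustration; check the params file you actually load for the real number.

```python
def rdse_resolution(min_val, max_val, min_resolution=0.001,
                    num_buckets=130.0, padding_fraction=0.2):
    """Mirror the padding + resolution steps shown in the snippets above.

    num_buckets is an assumed value, not read from any NuPIC params file.
    """
    # Pad the observed range by 20% on each side, as in initialize().
    padding = abs(max_val - min_val) * padding_fraction
    padded_min = min_val - padding
    padded_max = max_val + padding
    # Same formula as _fixupRandomEncoderParams().
    return max(min_resolution, (padded_max - padded_min) / num_buckets)


# Approximate ranges quoted in the thread.
voltage_resolution = rdse_resolution(0.0, 4000.0)
current_resolution = rdse_resolution(0.0, 400.0)
```

Under these assumptions the Voltage field ends up with a resolution about ten times larger than the Current field, simply because its range is ten times wider.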

This leads me to wonder, how much of each metric’s data is being used in NAB to get the min & max values? Maybe this is the irreducible batch-like element of NuPIC at this point, that you need some chunk to get the min & max from. I use a parameter for what proportion of a metric’s data will be used to find its min & max.
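As a rough sketch of that last idea, the min & max could be taken from only the leading chunk of a metric’s data. The function name `estimate_range` and the default `proportion=0.15` are hypothetical choices for this example, not something taken from NAB.

```python
def estimate_range(values, proportion=0.15):
    """Return (min, max) of the first `proportion` of `values`.

    `proportion` is a hypothetical tuning parameter for this sketch.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    # Always look at one record at least, even for tiny inputs.
    n = max(1, int(len(values) * proportion))
    sample = values[:n]
    return min(sample), max(sample)


readings = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
observed_min, observed_max = estimate_range(readings, proportion=0.5)
```

Here only the first five readings are inspected, so the later value 9.0 falls outside the estimated range; that is the trade-off of using a leading chunk rather than the full dataset.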


Yes, my focus was anomaly detection, but I’m waiting on an external source to validate the anomalies, which would allow me to quantify the performance of the model. Until that happens, I thought I could experiment with predictability, since it’s pretty easy to quantify a model’s effectiveness in that sense. I also understand that anomaly detection doesn’t require such precise predictions, but I still think it helps to understand why certain anomalies are flagged.
