Why can't I encode time in very small increments with DateEncoder?

rhyolight · September 2, 2016, 4:05pm

This question came up on Github, but I thought I would answer here.

The kinds of patterns that datetime encoding captures are things like “time of day”, “day of week”, “is weekend”, “minute of hour”, etc. These are human concepts (for the most part), and we are projecting our human organization of time (hours, weeks, minutes) into SDRs. That is great for timing that works at a scale that humans can reason about.

When you get into sub-second timing, this idea of encoding breaks down.

There may be sub-second patterns in your data that HTM could pick up, but it wouldn’t need additional semantic encodings (like “thousandth of second”) because the patterns within the data would not necessarily align with these human concepts. Computer-generated data that arrives many times per second should not be encoded with our standard datetime encoding mechanism.

Also, there is the scaling problem. On our laptops, NuPIC takes something like 20ms on average to process one data point, so if you are expecting to live-stream nanosecond-frequency data into an HTM system today, it is going to get bogged down very quickly and will not be able to keep up.
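To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the ~20ms figure above and a single model processing records one at a time; the function names and the 1000 Hz arrival rate are made up for illustration.

```python
# Back-of-the-envelope throughput check (illustrative numbers only).
PROCESSING_TIME_S = 0.020  # ~20 ms per record, the rough figure quoted above


def max_records_per_second(processing_time_s):
    """Upper bound on records/second one model can keep up with."""
    return 1.0 / processing_time_s


def backlog_after(duration_s, arrival_rate_hz, processing_time_s):
    """Records still waiting after streaming live for `duration_s` seconds."""
    arrived = arrival_rate_hz * duration_s
    processed = min(arrived, duration_s / processing_time_s)
    return arrived - processed


# About 50 records per second at 20 ms each.
print("max rate: %.0f records/s" % max_records_per_second(PROCESSING_TIME_S))
# A hypothetical 1000 Hz stream leaves most of each second's data unprocessed.
print("backlog after 1 s at 1000 Hz: %.0f records"
      % backlog_after(1.0, 1000, PROCESSING_TIME_S))
```

At that rate even a 50 Hz stream sits right at the limit, and anything faster falls behind unless it is aggregated or downsampled first.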

Austin_Marshall · September 2, 2016, 4:27pm

Technically, you can add the nanosecond component separately as either a scalar or an RDSE (random distributed scalar encoder) field. The “time of day” component of the date encoder is a scalar encoder, after all. That is, after you’ve addressed the question of whether or not you need it in the first place, independent of whether you use the date encoder (which may not be necessary, given the use case).

In other words, if there are patterns that require sub-second resolution, then use a traditional scalar encoder and ignore the date encoder. Of course, if you’re processing the data in real time, you might have performance issues, as Matt points out.
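As a rough illustration of what encoding a sub-second value as a plain scalar means, here is a toy scalar encoder. It is a simplified stand-in and not NuPIC's ScalarEncoder; the function names, the `n`/`w` defaults, and the "milliseconds within the second" field are assumptions for the example.

```python
def encode_scalar(value, minval, maxval, n=100, w=11):
    """Return an n-bit list with one contiguous run of w active bits.

    Toy stand-in for a scalar encoder (NOT NuPIC's implementation):
    nearby values share active bits, distant values share none.
    """
    if not minval <= value <= maxval:
        raise ValueError("value outside [minval, maxval]")
    n_buckets = n - w + 1
    fraction = (value - minval) / float(maxval - minval)
    bucket = int(round(fraction * (n_buckets - 1)))
    bits = [0] * n
    for i in range(bucket, bucket + w):
        bits[i] = 1
    return bits


def overlap(a, b):
    """Number of bit positions active in both encodings."""
    return sum(x & y for x, y in zip(a, b))


# Hypothetical field: milliseconds elapsed within the current second (0..999).
ms_500 = encode_scalar(500, 0, 999)
ms_505 = encode_scalar(505, 0, 999)
ms_900 = encode_scalar(900, 0, 999)
print("500 vs 505 overlap: %d" % overlap(ms_500, ms_505))
print("500 vs 900 overlap: %d" % overlap(ms_500, ms_900))
```

The point is only that the sub-second value is treated like any other number, with no calendar semantics attached. Whether such a field helps at all depends on whether the patterns in your data actually line up with it.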

Komal · September 8, 2016, 2:45am

Thank you for your suggestion.

Komal · September 8, 2016, 2:47am

Thank you for your suggestion. I am planning to do Distributed Denial of Service attack detection using NuPIC and have a dataset whose time field has sub-second resolution. So, if I don’t use the time field when building the model, do I have to use NonTemporalAnomaly?

sheiser1 · September 8, 2016, 3:56pm

I’d like to get in line with a follow-up question as well, as my project has a related issue. In a nutshell, the data that I have comes in with 1 metric 50 times per second but in the wrong format. There are 2 columns, one for the metric and one for time, with time in fractions of a second, so the first time step is at .02 seconds, then .04, .06, .08 etc (hence 50 per second). I have a script which converts these time steps to the DateTime format friendly to NuPIC. I hadn’t realized though that it makes a difference to NuPIC how far apart the time steps are. I thought it just mattered what their sequential order was, so in my conversion each time step (representing 1/50 of a second) is given its own day (as if each value is coming in once daily).

Question is: Will this affect my results? Again I hadn’t realized that it mattered whether the time steps were one second, one day or one year apart, as long as they took place in the same order. Thanks!!

– Sam

rhyolight · September 8, 2016, 4:10pm

Yes, this will absolutely affect your results. If you are tricking NuPIC into thinking that one second of data is actually occurring over 50 days, it will be encoding weekly and monthly semantics into that data that don’t mean anything. This will mask the semantic encodings you actually want to convey with meaningless encodings that will just confuse the system.

I have not experimented with any sub-second data streams in NuPIC, but I would suggest that if you are using them, don’t use a DateEncoder at all.
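A small sketch of why the one-day-per-sample trick is a problem. It uses only Python's `datetime` module, not NuPIC; the start date is arbitrary, and `date_features` is a made-up stand-in for the kind of information a date encoder represents (the real encoder's weekend definition may differ).

```python
import datetime

START = datetime.datetime(2016, 9, 1)  # arbitrary start date for illustration
SAMPLE_PERIOD_S = 0.02                 # 50 readings per second, as in the question


def real_timestamps(count):
    """Timestamps spaced at the true 0.02 s sampling interval."""
    return [START + datetime.timedelta(seconds=SAMPLE_PERIOD_S * i)
            for i in range(count)]


def one_day_per_sample(count):
    """Timestamps where each reading is pretended to be one day apart."""
    return [START + datetime.timedelta(days=i) for i in range(count)]


def date_features(ts):
    """Rough stand-in for date-encoder semantics: weekday, weekend flag, hour."""
    return (ts.weekday(), ts.weekday() >= 5, ts.hour)


real = set(date_features(t) for t in real_timestamps(50))
fake = set(date_features(t) for t in one_day_per_sample(50))
print("distinct date features, real spacing: %d" % len(real))
print("distinct date features, one day per sample: %d" % len(fake))
```

With the true spacing, one second of data maps to a single combination of date features. With one day per sample, the same second of data cycles through every day of the week and crosses several weekends, none of which has anything to do with the signal.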

sheiser1 · September 9, 2016, 6:02pm

That’s critical to know, thank you! Just to verify that I have it right, since my sampling rate is sub-second I should skip the DateEncoder entirely, and I’ll do this by simply passing in the metric values one after another with no timestamp column? Thanks!!

rhyolight · September 9, 2016, 7:14pm

Yes, just be sure that the readings are at the same interval. It won’t make sense unless the same time interval is between each row you are feeding in.
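One way to check that before dropping the time column is a quick test like the one below. This is a minimal sketch: the function name and the tolerance are assumptions, and it expects the time column as a list of floats in seconds.

```python
def is_evenly_sampled(times, tolerance=1e-6):
    """True if every consecutive pair of time steps is the same interval apart.

    `times` is a list of floats (seconds); `tolerance` absorbs float rounding.
    """
    if len(times) < 3:
        return True
    expected = times[1] - times[0]
    return all(abs((later - earlier) - expected) <= tolerance
               for earlier, later in zip(times, times[1:]))


print(is_evenly_sampled([0.02, 0.04, 0.06, 0.08]))  # regular 50 Hz steps
print(is_evenly_sampled([0.02, 0.04, 0.08, 0.10]))  # one reading missing
```

If the check fails, the gaps would need to be handled first (for example by resampling), since the model only sees the order of the rows once the timestamp is gone.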

sheiser1 · September 9, 2016, 10:23pm

They are yes. Thanks again Matt!!

sheiser1 · October 3, 2016, 3:57pm

Hi Matt and everybody,
So I just got NuPIC running without the datetime column or date encoder, and I want to make sure I’m doing it in a viable way (which I may well not be). Here are the changes I made to the runIoThroughNupic function in the run.py file. I basically just commented out anything with ‘timestamp’. When I removed ‘timestamp’ from the output.write() function I got an error for missing argument, so instead of deleting it outright I just replaced it with a blank space ‘___’. I can’t imagine this is the best way to do this, just a quick patch to get it running, which it did at least.

def runIoThroughNupic(inputData, model, gymName, plot):
  """
  Handles looping over the input data and passing each row into the given model
  object, as well as extracting the result object and passing it into an output
  handler.
  :param inputData: file path to input data CSV
  :param model: OPF Model object
  :param gymName: Gym name, used for output handler naming
  :param plot: Whether to use matplotlib or not. If false, uses file output.
  """
  inputFile = open(inputData, "rb")
  csvReader = csv.reader(inputFile)
  # skip header rows
  csvReader.next()
  csvReader.next()
  csvReader.next()

  shifter = InferenceShifter()
  if plot:
    output = nupic_anomaly_output.NuPICPlotOutput(gymName)
  else:
    output = nupic_anomaly_output.NuPICFileOutput(gymName)

  counter = 0
  for row in csvReader:
    counter += 1
    if (counter % 100 == 0):
      print "Read %i lines..." % counter
    #timestamp = datetime.datetime.strptime(row[0], DATE_FORMAT)
    consumption = float(row[0])  #float(row[1])
    result = model.run({
      #"timestamp": timestamp,
      "kw_energy_consumption": consumption
    })

    if plot:
      result = shifter.shift(result)

    prediction = result.inferences["multiStepBestPredictions"][1]
    anomalyScore = result.inferences["anomalyScore"]
    output.write('___', consumption, prediction, anomalyScore)  #(timestamp, consumption, prediction, anomalyScore)

  inputFile.close()
  output.close()

I also deleted the date encoders from the model_params.py file, which now looks like this. Is this how you’d actually do this? Also I’m sure there’s a better way to post code, I just forget what it is, so if there’s a preferred way to view it please let me know and I’ll have it up. Thanks again!!

MODEL_PARAMS = \
{ 'aggregationInfo': { 'days': 0,
                       'fields': [],
                       'hours': 0,
                       'microseconds': 0,
                       'milliseconds': 0,
                       'minutes': 0,
                       'months': 0,
                       'seconds': 0,
                       'weeks': 0,
                       'years': 0},
  'model': 'CLA',
  'modelParams': { 'anomalyParams': { u'anomalyCacheRecords': None,
                                      u'autoDetectThreshold': None,
                                      u'autoDetectWaitRecords': None},
                   'clParams': { 'alpha': 0.01962508905154251,
                                 'verbosity': 0,
                                 'regionName': 'SDRClassifierRegion',
                                 'steps': '1'},
                   'inferenceType': 'TemporalAnomaly',
                   'sensorParams': { 'encoders': { '_classifierInput': { 'classifierOnly': True,
                                                                         'clipInput': True,
                                                                         'fieldname': 'kw_energy_consumption',
                                                                         'maxval': 1900.0,
                                                                         'minval': 250.0,
                                                                         'n': 115,
                                                                         'name': '_classifierInput',
                                                                         'type': 'ScalarEncoder',
                                                                         'w': 21},
                                                   u'kw_energy_consumption': { 'clipInput': True,
                                                                               'fieldname': 'kw_energy_consumption',
                                                                               'maxval': 1900.0,
                                                                               'minval': 250.0,
                                                                               'n': 29,
                                                                               'name': 'kw_energy_consumption',
                                                                               'type': 'ScalarEncoder',
                                                                               'w': 21}},
                                     'sensorAutoReset': None,
                                     'verbosity': 0},
                   'spEnable': True,
                   'spParams': { 'columnCount': 2048,
                                 'globalInhibition': 1,
                                 'inputWidth': 0,
                                 'maxBoost': 2.0,
                                 'numActiveColumnsPerInhArea': 40,
                                 'potentialPct': 0.8,
                                 'seed': 1956,
                                 'spVerbosity': 0,
                                 'spatialImp': 'cpp',
                                 'synPermActiveInc': 0.05,
                                 'synPermConnected': 0.1,
                                 'synPermInactiveDec': 0.08568228006654939},
                   'tpEnable': True,
                   'tpParams': { 'activationThreshold': 12,
                                 'cellsPerColumn': 32,
                                 'columnCount': 2048,
                                 'globalDecay': 0.0,
                                 'initialPerm': 0.21,
                                 'inputWidth': 2048,
                                 'maxAge': 0,
                                 'maxSegmentsPerCell': 128,
                                 'maxSynapsesPerSegment': 32,
                                 'minThreshold': 10,
                                 'newSynapseCount': 20,
                                 'outputType': 'normal',
                                 'pamLength': 1,
                                 'permanenceDec': 0.1,
                                 'permanenceInc': 0.1,
                                 'seed': 1960,
                                 'temporalImp': 'cpp',
                                 'verbosity': 0},
                   'trainSPNetOnlyIfRequested': False},
  'predictAheadTime': None,
  'version': 1}
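On the ‘___’ placeholder in runIoThroughNupic above: one possible alternative is to write the row counter in the first column, so each output row can still be matched to its input row. The sketch below is written for Python 3 and uses a hypothetical stand-alone writer built on the `csv` module; it is not the `nupic_anomaly_output` handler from the example, whose `write()` arguments are not shown here.

```python
import csv
import io


def write_results(rows, out):
    """Write (value, prediction, anomaly_score) tuples with a record index.

    Hypothetical stand-in for an output handler: the index takes the place
    of the timestamp column that was removed from the input.
    """
    writer = csv.writer(out)
    writer.writerow(["record", "value", "prediction", "anomaly_score"])
    for index, (value, prediction, score) in enumerate(rows):
        writer.writerow([index, value, prediction, score])


# Made-up values, just to show the output shape.
buffer = io.StringIO()
write_results([(5.0, 5.1, 0.0), (5.2, 5.0, 0.3)], buffer)
print(buffer.getvalue())
```

Since the readings are evenly spaced, the record index can be turned back into elapsed time later by multiplying by the sampling interval.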
