Imputing numeric missing values using HTM algorithm

SH.Bayan · June 10, 2019, 5:03pm

Dear all

I want to use the HTM algorithm to impute missing data in a data set. I have read the “NuPIC 1.0.5 API Documentation”. As the documentation states, there are two types of classifier for predicting data categories, but I need to predict the actual values of the missing data in my data set.

Is it possible to use the HTM algorithm to do so?

Could you please point me to a resource that explains how to do that?

Does the data set need to include a “timestamp” field, as “gymdata.csv” does?

Best regards,

Shaghayegh Bayan

sheiser1 · June 10, 2019, 5:45pm

What data type(s) are you dealing with? Any type that the Classifier can handle can be predicted in its raw data form, and you could use the top prediction as the imputed value. I remember reading that there was no classifier for coordinate data, but if you’re dealing with numeric or categorical data it should be doable in the current NuPIC setup.
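A minimal sketch of "use the top prediction as the imputed value", assuming the classifier output for one step is available as a mapping from candidate value to likelihood (in NuPIC's OPF this appears to be the shape of `inferences['multiStepPredictions'][1]`). The helper names `top_prediction` and `impute`, and the sample numbers, are hypothetical and only illustrate the selection step:

```python
def top_prediction(likelihoods):
    """Return (value, likelihood) for the most likely candidate.

    `likelihoods` maps candidate value -> likelihood,
    e.g. {21.4: 0.7, 19.8: 0.3} (made-up numbers).
    """
    if not likelihoods:
        raise ValueError("no predictions available")
    value = max(likelihoods, key=likelihoods.get)
    return value, likelihoods[value]


def impute(series, predictions):
    """Fill None entries in `series` with the top prediction for that position.

    `predictions[i]` is assumed to hold the likelihoods the model produced
    for position i (that is, the prediction made one step earlier).
    """
    filled = []
    for i, value in enumerate(series):
        if value is None:
            value, _ = top_prediction(predictions[i])
        filled.append(value)
    return filled


# Hypothetical data: the second reading is missing.
readings = [1.0, None, 3.0]
step_predictions = [{}, {2.0: 0.9, 5.0: 0.1}, {}]
imputed = impute(readings, step_predictions)
```

The same selection works for categorical fields, since the keys of the mapping can be category labels instead of numbers.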

SH.Bayan · June 10, 2019, 6:36pm

Thank you! The data types in my data set are numeric and categorical.
Do you mean using the output of a defined classifier (for example, the SDR classifier) as the top prediction?

sheiser1 · June 10, 2019, 7:36pm

Right, the classifier will output a set of predictions with different confidence levels, so the highest confidence would be the top. Key to wielding NuPIC in general is to understand the model parameters config structure, which is used to instantiate every model. Here’s an example of one used for anomaly detection:

github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/anomaly/one_gym/model_params/rec_center_hourly_model_params.py

MODEL_PARAMS = \
{ 'aggregationInfo': { 'days': 0,
                       'fields': [],
                       'hours': 0,
                       'microseconds': 0,
                       'milliseconds': 0,
                       'minutes': 0,
                       'months': 0,
                       'seconds': 0,
                       'weeks': 0,
                       'years': 0},
  'model': 'HTMPrediction',
  'modelParams': { 'anomalyParams': { u'anomalyCacheRecords': None,
                                      u'autoDetectThreshold': None,
                                      u'autoDetectWaitRecords': None},
                   'clParams': { 'alpha': 0.01962508905154251,
                                 'verbosity': 0,
                                 'regionName': 'SDRClassifierRegion',
                                 'steps': '1'},
                   'inferenceType': 'TemporalAnomaly',

This file has been truncated.

All you really need to do is replace the ‘TemporalAnomaly’ inferenceType with ‘TemporalMultiStep’, and change the field names of the encoders, plus their min and max values, to match your data. That applies to your numeric columns at least. The categorical ones can generate predictions too; they just need to be encoded with a category encoder. I’d use a separate NuPIC model for each field you’re imputing.
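The changes described above can be sketched as plain dictionary edits. This is only an illustration under some assumptions: that the encoders live under `modelParams['sensorParams']['encoders']` (as in the hotgym example files), that a `ScalarEncoder` entry is appropriate for a numeric field, and that the `n` and `w` values shown are placeholders to be tuned. The function name `make_imputation_params`, the stub `BASE_PARAMS`, and the field names and ranges are hypothetical.

```python
import copy


def make_imputation_params(base_params, field, minval, maxval):
    """Return a copy of `base_params` set up to predict one numeric field."""
    params = copy.deepcopy(base_params)  # leave the original untouched
    model_params = params["modelParams"]

    # Switch from anomaly detection to multi-step prediction.
    model_params["inferenceType"] = "TemporalMultiStep"

    # Add (or overwrite) a scalar encoder matching the field's range.
    model_params["sensorParams"]["encoders"][field] = {
        "fieldname": field,
        "name": field,
        "type": "ScalarEncoder",
        "minval": minval,
        "maxval": maxval,
        "n": 115,  # placeholder width, not tuned
        "w": 21,   # placeholder number of active bits, not tuned
        "clipInput": True,
    }
    return params


# Stand-in for the full MODEL_PARAMS dict from the example file.
BASE_PARAMS = {
    "model": "HTMPrediction",
    "modelParams": {
        "inferenceType": "TemporalAnomaly",
        "sensorParams": {"encoders": {}},
    },
}

# One parameter set per field to impute (hypothetical fields and ranges).
field_ranges = {"temperature": (0.0, 40.0), "humidity": (0.0, 100.0)}
per_field_params = {
    field: make_imputation_params(BASE_PARAMS, field, lo, hi)
    for field, (lo, hi) in field_ranges.items()
}
```

Each resulting dict would then be passed to NuPIC's model factory to create one model per field, with that field set as the predicted field; that step is left out here because it requires a NuPIC installation.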

SH.Bayan · June 13, 2019, 12:06pm

Thank you so much.
