I want to use the HTM algorithm to impute missing data in a data set. I have read the “NuPIC 1.0.5 API Documentation”; as stated there, two types of classification are available to predict data categories, but I need to predict the actual values of the missing data in my data set.
Is it possible to use the HTM algorithm to do so?
Could you please point me to a resource that explains how to do that?
Does the data set need to include a “timestamp” field, as “gymdata.csv” does?
What data type(s) are you dealing with? Any type that the Classifier can handle can be predicted in its raw form, and you could use the top prediction as the imputed value. I remember reading that there was no classifier for coordinate data, though if you’re dealing with numeric or categorical data it should be doable in the current NuPIC setup.
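To make "use the top prediction as the imputed value" concrete, here is a minimal sketch in plain Python. It assumes the classifier output has already been collected into a dict mapping each candidate value to its likelihood; that shape, the helper name `top_prediction`, and the sample numbers are illustrative assumptions, not something taken from the NuPIC documentation.

```python
def top_prediction(predictions):
    """Return (value, confidence) for the most likely candidate.

    `predictions` is assumed to map candidate values to likelihoods,
    e.g. {37.5: 0.70, 42.0: 0.15}. This shape is an assumption.
    """
    if not predictions:
        raise ValueError("no predictions to choose from")
    # Pick the key whose likelihood is highest.
    value = max(predictions, key=predictions.get)
    return value, predictions[value]


# Made-up likelihoods for one missing numeric cell.
example = {42.0: 0.15, 37.5: 0.70, 51.2: 0.15}
imputed_value, confidence = top_prediction(example)
print(imputed_value, confidence)
```

The same helper works for a categorical field, since the keys can be category labels instead of numbers. Keeping the confidence alongside the value lets you flag low-confidence imputations for review.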
Thank you! The data types in my data set are numeric and categorical.
By the top prediction, do you mean the output of a particular classifier (for example, the SDR classifier)?
Right, the classifier will output a set of predictions with different confidence levels, so the highest confidence would be the top. Key to wielding NuPIC in general is to understand the model parameters config structure, which is used to instantiate every model. Here’s an example of one used for anomaly detection:
All you really need to do is replace the ‘TemporalAnomaly’ inferenceType with ‘TemporalMultiStep’, and change the field names of the encoders plus their min and max values to match your data. That applies to your numeric columns at least. The categorical ones can generate predictions too; they just need to be encoded using the category encoder. I’d have a separate NuPIC model for each field you’re doing imputation for.
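The edits described above could be sketched roughly as follows. The dict here is a trimmed, hypothetical stand-in for a real model params file (a real one has many more sections), the key names follow my recollection of the usual NuPIC layout and should be checked against your own params, and `adapt_for_imputation` is a helper name made up for this example.

```python
import copy

# Trimmed, hypothetical stand-in for an anomaly-detection params dict.
ANOMALY_PARAMS = {
    "model": "HTMPrediction",
    "modelParams": {
        "inferenceType": "TemporalAnomaly",
        "sensorParams": {
            "encoders": {
                "value": {
                    "fieldname": "value",
                    "name": "value",
                    "type": "ScalarEncoder",
                    "minval": 0.0,
                    "maxval": 100.0,
                    "n": 50,
                    "w": 21,
                },
            },
        },
    },
}


def adapt_for_imputation(params, old_field, new_field, minval, maxval):
    """Return a copy of `params` set up to predict `new_field`."""
    new_params = copy.deepcopy(params)  # leave the original untouched
    model_params = new_params["modelParams"]
    # Switch from anomaly detection to multi-step prediction.
    model_params["inferenceType"] = "TemporalMultiStep"
    # Re-point the numeric encoder at the column being imputed.
    encoders = model_params["sensorParams"]["encoders"]
    encoder = encoders.pop(old_field)
    encoder.update(fieldname=new_field, name=new_field,
                   minval=minval, maxval=maxval)
    encoders[new_field] = encoder
    return new_params


# One params dict per field to impute; "temperature" is a made-up column.
params = adapt_for_imputation(ANOMALY_PARAMS, "value", "temperature",
                              -10.0, 45.0)
print(params["modelParams"]["inferenceType"])
```

Calling the helper once per column gives you the separate model configuration per field suggested above. For a categorical column you would also swap the encoder type and its settings rather than only the min and max values.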