The CLA Classifier
The CLA Classifier is not biologically inspired, but it is a useful tool for interpreting the SDR output from the temporal memory and generating predictions. Essentially, it attempts to learn a function of an SDR at time (t), such that it produces a probability distribution over the predicted field (PF), k steps into the future:
$$f(\mathrm{SDR}_t) \rightarrow P(\mathrm{PF}_{t+k})$$
The CLA Classifier takes the following parameters:
alpha: The value used to compute the moving average. Lower alpha values give a longer memory.
steps: The set of steps into the future that the classifier will learn and predict for, e.g. (1, 3, 7, 12).
To do this, for each predicted step (k), the CLA Classifier maintains a mapping from the input SDR seen at time t to the value of the PF at time t + k.
This mapping essentially stores a history of the input SDRs it has seen, so that, given a new input, it can refer to that history and determine the probability distribution over the PF. It does this by storing two arrays, H and A, each with shape N x B, where N is the number of bits in the SDR and B is the number of buckets on the PF as defined by the input encoding (if we are predicting a categorical value, the moving average array A is omitted):
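H: A histogram that stores the relative frequency of bucketed input values from when its corresponding SDR bit (n) is active. That is:
$$H_{n,b} = \frac{\text{number of steps with bit } n \text{ active and the PF value in bucket } b}{\text{number of steps with bit } n \text{ active}}$$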
A: A moving average of the input values, whose length is defined by alpha. When this array’s corresponding SDR bit n is active with a given predicted field value v that falls into bucket b, the array is updated by:
$$A_{n,b} \leftarrow (1 - \alpha)\, A_{n,b} + \alpha\, v$$
This ensures that when a bucket covers a range of values (i.e. non-categorical values), the prediction is not merely a range, but the average of the values that fell into that bucket.
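The two updates described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the classifier's actual implementation: the function names (`make_tables`, `learn`, `histogram_row`) are hypothetical, the histogram is kept as raw counts and normalised on demand, the moving average is assumed to be a standard exponential moving average, and the choice to seed each average with the first value seen is also an assumption.

```python
def make_tables(num_bits, num_buckets):
    """Create a count table and a moving-average table, both of shape N x B."""
    counts = [[0] * num_buckets for _ in range(num_bits)]
    averages = [[None] * num_buckets for _ in range(num_bits)]
    return counts, averages


def learn(counts, averages, active_bits, bucket, value, alpha):
    """Record that `value`, which falls into `bucket`, was seen with these active bits."""
    for n in active_bits:
        counts[n][bucket] += 1
        previous = averages[n][bucket]
        if previous is None:
            # Assumption: the first observation seeds the moving average.
            averages[n][bucket] = float(value)
        else:
            # Exponential moving average; a lower alpha forgets more slowly.
            averages[n][bucket] = (1 - alpha) * previous + alpha * value


def histogram_row(counts, n):
    """Relative frequency of each bucket over the steps where bit n was active."""
    total = sum(counts[n])
    if total == 0:
        return [0.0] * len(counts[n])
    return [c / total for c in counts[n]]


# Hypothetical data: 4 SDR bits, 3 buckets, three learning steps.
counts, averages = make_tables(num_bits=4, num_buckets=3)
learn(counts, averages, active_bits=[0, 2], bucket=1, value=10.0, alpha=0.5)
learn(counts, averages, active_bits=[0, 3], bucket=1, value=20.0, alpha=0.5)
learn(counts, averages, active_bits=[0], bucket=2, value=30.0, alpha=0.5)
```

After these three steps, bit 0 has seen bucket 1 twice and bucket 2 once, and its moving average for bucket 1 sits halfway between the two values it was given.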
For a given input SDR of length N with a set of active bits, predictions are generated for each bucket (b) of the predicted field, at each timestep (k), by averaging, over the active bits, the product of the associated histogram value and moving average table entry.
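As a rough illustration of the averaging step, the sketch below covers only the categorical case, where the moving average array is omitted and the prediction reduces to averaging the histogram rows of the active bits. The function name `predict` and the table values are hypothetical, and the sketch assumes each histogram row already sums to 1.

```python
def predict(histograms, active_bits):
    """Average the histogram rows of the active bits into one distribution."""
    num_buckets = len(histograms[0])
    if not active_bits:
        # No active bits means there is no evidence for any bucket.
        return [0.0] * num_buckets
    totals = [0.0] * num_buckets
    for n in active_bits:
        for b in range(num_buckets):
            totals[b] += histograms[n][b]
    return [t / len(active_bits) for t in totals]


# Hypothetical table for one step k: 4 SDR bits, 3 buckets, each row sums to 1.
H = [
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.25, 0.25, 0.5],
    [0.0, 0.0, 1.0],
]
distribution = predict(H, active_bits=[0, 1])
print(distribution)  # [0.25, 0.75, 0.0]
```

Because every row sums to 1, the average of any number of rows also sums to 1, so the result can be read directly as a distribution over buckets.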
Thus we have a probability for each bucket of the predicted field, which may end up being very low for all buckets. We can use the bucket with the highest probability as our prediction, or not, depending on the context and the significance of the prediction. For example, the highest prediction may be for 100% engine load with a 0.1% probability. Such a low probability would not necessitate the same response as a 95% probability for the same load.
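One simple way to act on this is to accept the most probable bucket only when its probability clears a threshold. The helper below is a hypothetical sketch: the name `best_bucket` and the `min_probability` parameter are illustrative and are not part of the classifier itself.

```python
def best_bucket(distribution, min_probability):
    """Return the index of the most probable bucket, or None if it is too unlikely."""
    best = max(range(len(distribution)), key=lambda b: distribution[b])
    if distribution[best] < min_probability:
        return None
    return best


# The top bucket has only a 0.1% probability, so no prediction is reported.
weak = best_bucket([0.001, 0.0005, 0.0002], min_probability=0.5)

# The top bucket has a 95% probability, so its index is reported.
strong = best_bucket([0.03, 0.95, 0.02], min_probability=0.5)
```

A suitable threshold depends on the application and on how costly a wrong response would be.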
Another useful property of these predictions is that they essentially form an ensemble, since each on-bit’s associated prediction makes a small contribution to the final probability distribution.
- The CLA Classifier was introduced as an alternative to Reconstruction and has shown better results in general.