Regarding the index codes used for short-term memory: this jibes well with what I have been learning recently about sparse representation theory as applied to image processing. I will take some liberties with those concepts and try to apply them to HTM/TBT.
An agent possesses a set of (Hebbian) learned filters that are representative of commonly occurring input patterns (e.g. edges, corners, textures). Collectively these are referred to as the dictionary (an overcomplete basis set), and the individual filters are referred to as atoms (basis functions). Initially this dictionary is not very good, but the agent attempts to encode new sensory input as a sparse linear combination of these atoms, so the top k matching atoms can be composed to form an approximation of the input signal. At some point the agent is permitted to update its dictionary, usually in an offline context (e.g. while sleeping). During this process the filters are adjusted so that they represent the most commonly occurring input signals more accurately and efficiently. Besides improving overall accuracy, this consolidation also lets the agent form better approximations using fewer atoms (i.e. sparser representations).
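To make that concrete for myself, here is a minimal sketch of the encode/consolidate loop I have in mind: plain greedy matching pursuit plus a crude Hebbian-style dictionary update. The function names and the update rule are my own simplification, not taken from any particular paper.

```python
import numpy as np

def sparse_encode(x, D, k=5):
    """Greedy matching pursuit: pick the k atoms that best explain x."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(k):
        scores = D.T @ residual              # correlation of each atom with what's left
        best = np.argmax(np.abs(scores))     # best-matching atom
        coeffs[best] += scores[best]
        residual -= scores[best] * D[:, best]
    return coeffs, residual

def consolidate(D, batch, k=5, lr=0.05):
    """Offline ('sleep') update: nudge each used atom toward reducing its residual."""
    for x in batch:
        coeffs, residual = sparse_encode(x, D, k)
        for j in np.nonzero(coeffs)[0]:
            D[:, j] += lr * coeffs[j] * residual        # Hebbian-style correction
            D[:, j] /= np.linalg.norm(D[:, j]) + 1e-12  # keep atoms unit-norm
    return D
```

With a randomly initialized dictionary and a batch of image patches, repeated calls to `consolidate` will gradually pull the atoms toward the recurring structure in the data, which is the sense in which the dictionary "gets better" over time.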
I’ve also been reading through the work of Laurent Perrinet (https://laurentperrinet.github.io/), and he has really taken this approach to the next level with his biologically inspired vision algorithms. He has papers going back to 2010 and earlier which describe a process whereby certain non-linear attenuation functions can be used to improve the distribution of representations across the filters, so that they all have very similar probabilities of being activated at any given time. This jumped right out at me as being essentially a boosting operation.
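The parallel to boosting might be clearer with a toy version: keep a running activation frequency per atom and scale each atom's match score so under-used atoms get a temporary advantage. This is my own hypothetical simplification (it mirrors HTM-style boosting), not Perrinet's actual attenuation function.

```python
import numpy as np

def boosted_scores(raw_scores, usage, target_rate=0.02, strength=10.0):
    """Scale match scores so rarely used atoms win more ties, pushing every
    atom toward a similar probability of activation."""
    gain = np.exp(strength * (target_rate - usage))  # usage above target -> gain < 1
    return raw_scores * gain

# `usage` would be an exponential moving average of how often each atom was selected,
# updated after every call to the sparse encoder.
```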
In more recent work, he’s published some articles with Karl Friston that begin to touch on the area of active sensing (i.e. sensorimotor interactions), where the goal of motor movements is to gather more information with the sensors in a way that minimizes surprise. In other words, an agent moves to gather additional data and context to reduce the amount of uncertainty in its predictions about what is coming next. If I’m understanding and interpreting their formulation correctly, they are treating representational efficiency (i.e. sparseness) as a proxy for predictability (lack of surprise). They refer to it as minimizing Free Energy.
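As a toy caricature of the "move to reduce uncertainty" piece (very much my own illustration, not their actual free-energy formulation): score each candidate sensing action by how much it is expected to shrink the entropy of the agent's beliefs, and pick the best one.

```python
import numpy as np

def entropy(p):
    p = p / p.sum()
    return -np.sum(p * np.log(p + 1e-12))

def expected_info_gain(prior, likelihood):
    """likelihood[o, s] = p(observation o | hidden state s) for one sensing action."""
    joint = likelihood * prior                   # p(o, s)
    p_obs = joint.sum(axis=1)                    # p(o)
    posterior = joint / (p_obs[:, None] + 1e-12)
    expected_post_H = sum(p_obs[o] * entropy(posterior[o]) for o in range(len(p_obs)))
    return entropy(prior) - expected_post_H      # expected reduction in uncertainty

def choose_fixation(prior, likelihoods):
    """Pick the saccade/sensing action expected to reduce uncertainty the most."""
    return max(range(len(likelihoods)), key=lambda a: expected_info_gain(prior, likelihoods[a]))
```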
After listening to Florian describe short-term memory late in this interview, I began to see how this sparse representation theory could be applied. Consider this: an HTM temporal memory encodes, stores, and predicts the set of coefficients used to address the stored atomic filters. The association between the stored filters and the active SDR pattern in the TM is then used to drive predictions about future input signals. The degree to which these predictions are satisfied (or not) can then be used to further refine the filters at a later time, by examining the most recent updates to the HTM permanences. A rough sketch of that loop is below.
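Very roughly, the glue I'm picturing looks something like this. All of the names here are stand-ins (the `ToyTM` class is a trivial first-order predictor, not a real temporal memory with segments and permanences), and `sparse_encode`/`consolidate` refer to the sketch earlier in this post.

```python
import numpy as np

class ToyTM:
    """Trivial stand-in for an HTM temporal memory: remembers which set of atoms
    tended to follow the previous set. A real TM would do this with SDRs,
    distal segments, and permanence updates."""
    def __init__(self):
        self.transitions = {}      # frozenset(prev atoms) -> set(next atoms)
        self.prev = None

    def predicted(self):
        return self.transitions.get(self.prev, set())

    def compute(self, active_atoms, learn=True):
        if learn and self.prev is not None:
            self.transitions.setdefault(self.prev, set()).update(active_atoms)
        self.prev = frozenset(active_atoms)

def step(tm, D, x, recent_errors, k=5):
    coeffs, residual = sparse_encode(x, D, k)      # which atoms explain this input?
    active_atoms = set(np.nonzero(coeffs)[0])      # active atom indices stand in for the SDR

    surprise = active_atoms - tm.predicted()       # atoms the TM failed to anticipate
    tm.compute(active_atoms, learn=True)           # TM learns the sequence of atom codes

    recent_errors.append((x, residual, surprise))  # bookkeeping for the next offline pass
    return surprise

# Offline ("sleep"): refine the dictionary using the inputs that produced the largest
# residuals/surprises -- in the HTM picture this would be guided by the most recent
# permanence updates rather than this crude bookkeeping.
# D = consolidate(D, [x for x, _, s in recent_errors if s], k=5)
```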
I’d love to hear any feedback you might have on these thoughts.