Really enjoyed this episode: an interesting topic, with so many good analogies and illustrations (excavator mishaps, looking in the fridge for missing food, parrots called Charlie, flashing cards).
Regarding the index codes used for short-term memory: this jibes well with what I have been learning recently about sparse representation theory as applied to image processing. I will take some liberties with those concepts and try to apply them to HTM/TBT.
An agent possesses a set of (Hebbian) learned filters that represent commonly occurring input patterns (e.g. edges, corners, textures). These are collectively referred to as the dictionary (or overcomplete basis set), and the individual filters are referred to as atoms (basis functions). Initially this dictionary is not very good, but the agent attempts to encode each new sensory input as a sparse linear combination of these atoms: the top k matching atoms are combined to form an approximation of the input signal. At some point, the agent is permitted to update its dictionary, usually offline (e.g. while sleeping). During this process, the filters are adjusted so that they represent the most commonly occurring input signals more accurately and efficiently. Besides improving overall accuracy, this consolidation also lets the agent form good approximations using fewer atoms (i.e. sparser representations).
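A minimal sketch of this awake/asleep cycle, assuming plain matching pursuit as the "top-k" encoder and a single gradient step on reconstruction error as the offline update. That update is a Hebbian-like outer product of residual error and coefficient activity; real dictionary-learning methods differ in detail. All function names and parameter values are hypothetical.

```python
import numpy as np

def normalize_columns(D):
    """Scale every atom (column) of the dictionary to unit L2 norm."""
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def matching_pursuit(x, D, k):
    """Greedy sparse code: pick the best-matching atom k times.

    Returns a coefficient vector with at most k nonzero entries such that
    D @ coeffs approximates x.
    """
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(k):
        scores = D.T @ residual                  # correlation of each atom with what is left
        best = int(np.argmax(np.abs(scores)))
        coeffs[best] += scores[best]
        residual -= scores[best] * D[:, best]    # remove the explained part
    return coeffs

def update_dictionary(D, X, C, lr=0.1):
    """Offline ("sleep") step: nudge atoms to reduce reconstruction error.

    X holds one signal per column and C the matching sparse codes. This is one
    gradient step on ||X - D C||^2 (residual times activity), then renormalize.
    """
    error = X - D @ C
    D = D + lr * error @ C.T / X.shape[1]
    return normalize_columns(D)

rng = np.random.default_rng(0)
D = normalize_columns(rng.standard_normal((16, 64)))   # 64 atoms for 16-dim signals
X = rng.standard_normal((16, 200))                      # stand-in for sensory input
for epoch in range(5):
    C = np.column_stack([matching_pursuit(x, D, k=3) for x in X.T])  # awake: encode
    D = update_dictionary(D, X, C)                                     # asleep: consolidate
```

With random stand-in data there is no structure to discover, so this only demonstrates the mechanics; on natural image patches the atoms would be expected to drift toward edge-like filters.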
I’ve also been reading through the work of Laurent Perrinet (https://laurentperrinet.github.io/), and he has really taken this approach to the next level with his biologically inspired vision algorithms. He has papers dating back to 2010 and earlier that describe how certain non-linear attenuation functions can be used to distribute activity across the filters so that they all have very similar probabilities of being active at any given time. This jumped right out at me as being essentially a boosting operation.
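The equalizing effect can be illustrated with the exponential boosting rule of the HTM spatial pooler, where each unit's input is multiplied by exp(-strength * (duty_cycle - target)). This is a stand-in for comparison, not Perrinet's own gain-control method; the biased, noisy responses, the parameter values, and the function names are hypothetical.

```python
import numpy as np

def update_duty_cycles(duty, active, period=100):
    """Exponential moving average of how often each unit has been active."""
    return duty + (active.astype(float) - duty) / period

def boost_factors(duty, target, strength):
    """HTM-style boosting: under-used units (duty < target) get factors > 1,
    over-used units get factors < 1; strength=0 disables boosting."""
    return np.exp(-strength * (duty - target))

def select_top_k(scores, k):
    """Boolean mask marking the k highest-scoring units (k-winners-take-all)."""
    active = np.zeros(scores.shape, dtype=bool)
    active[np.argsort(scores)[-k:]] = True
    return active

def simulate(strength, n=50, k=5, steps=3000, seed=0):
    """Run k-WTA on noisy, biased responses and return the final duty cycles."""
    rng = np.random.default_rng(seed)
    target = k / n
    bias = np.linspace(0.5, 1.0, n)   # some units respond more strongly by default
    duty = np.full(n, target)
    for _ in range(steps):
        raw = bias + 0.1 * rng.standard_normal(n)
        active = select_top_k(raw * boost_factors(duty, target, strength), k)
        duty = update_duty_cycles(duty, active)
    return duty

print("spread without boosting:", simulate(0.0).std())
print("spread with boosting:   ", simulate(10.0).std())
```

Without boosting, the strongly biased units win almost every time; with it, the negative feedback on duty cycles pulls every unit's activation frequency toward the same target.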
In more recent work, he’s published articles with Karl Friston that touch on active sensing (i.e. sensorimotor interactions), where the goal of motor movements is to gather more information with the sensors in a way that minimizes surprise. In other words, an agent moves to gather additional data and context that reduce the uncertainty in its predictions about what is coming next. If I’m understanding and interpreting their formulation correctly, they treat representational efficiency (i.e. sparseness) as a proxy for predictability (lack of surprise). They describe this as minimizing free energy.
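One simple way to make "moving to reduce uncertainty" concrete is to score each candidate sensing action by the uncertainty about the hidden cause it is expected to leave behind, and pick the lowest. This sketch covers only that information-seeking ingredient, assuming discrete hypotheses and a known likelihood; it is not the full free-energy formulation, and the toy objects and actions are invented.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def posterior(prior, likelihood, o):
    """Bayes rule: p(s | o) for observation index o, with likelihood[o, s] = p(o | s)."""
    unnorm = likelihood[o] * prior
    return unnorm / unnorm.sum()

def expected_posterior_entropy(prior, likelihood):
    """Uncertainty about the hidden state that is expected to remain after one look."""
    p_obs = likelihood @ prior            # predicted probability of each observation
    return sum(p_obs[o] * entropy(posterior(prior, likelihood, o))
               for o in range(len(p_obs)) if p_obs[o] > 0)

def choose_action(prior, likelihoods):
    """Pick the sensing action expected to leave the least uncertainty."""
    scores = [expected_posterior_entropy(prior, L) for L in likelihoods]
    return int(np.argmin(scores)), scores

# Three hypothetical objects, equally likely a priori.
prior = np.full(3, 1 / 3)
likelihoods = [
    np.full((2, 3), 0.5),                 # look away: uninformative
    np.array([[1.0, 1.0, 0.0],            # look at the top: rim vs. flat
              [0.0, 0.0, 1.0]]),
    np.eye(3),                            # look at the handle: identifies the object
]
best, scores = choose_action(prior, likelihoods)   # best -> 2
```

Minimizing expected posterior entropy is equivalent to maximizing expected information gain, since the prior entropy is the same for every action.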
After listening to Florian describe short-term memory late in this interview, I began to see how sparse representation theory could be applied. Consider this: an HTM temporal memory encodes, stores, and predicts the set of coefficients used to address the stored atomic filters. The association between the stored filters and the active SDR pattern in the TM is then used to drive predictions about future input signals. The degree to which these predictions are (or are not) satisfied can then be used to further refine the filters at a later time by examining the most recent updates to the HTM permanences.
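The index-code idea can be sketched by turning each sparse code into an SDR of active atom indices and learning transitions between consecutive SDRs. The class below is a deliberately simplified, hypothetical first-order stand-in for an HTM temporal memory (real TM uses cells within minicolumns and dendritic segments, which give it high-order context); the increment, decrement, and threshold values are arbitrary.

```python
import numpy as np

def code_to_sdr(coeffs, k):
    """Binary SDR marking the k atoms with the largest |coefficient|."""
    sdr = np.zeros(coeffs.shape[0], dtype=bool)
    sdr[np.argsort(np.abs(coeffs))[-k:]] = True
    return sdr

class ToySequenceMemory:
    """Hypothetical first-order sequence memory over atom indices.

    perm[i, j] is a permanence-like weight from atom i active at time t
    to atom j active at time t + 1.
    """

    def __init__(self, n_atoms, inc=0.1, dec=0.02, threshold=0.3):
        self.perm = np.zeros((n_atoms, n_atoms))
        self.inc, self.dec, self.threshold = inc, dec, threshold
        self.prev = None

    def predict(self, sdr):
        """Atoms whose mean connection strength from the active atoms is high enough."""
        support = self.perm[sdr].sum(axis=0) / max(int(sdr.sum()), 1)
        return support >= self.threshold

    def step(self, sdr):
        """Score the previous prediction against `sdr`, then learn the transition."""
        overlap = None
        if self.prev is not None:
            overlap = int((self.predict(self.prev) & sdr).sum())
            # Hebbian-style update: weaken all links out of the previous atoms,
            # then strengthen the ones that led to the current atoms.
            self.perm[self.prev] -= self.dec
            self.perm[np.ix_(self.prev, sdr)] += self.inc + self.dec
            np.clip(self.perm, 0.0, 1.0, out=self.perm)
        self.prev = sdr.copy()
        return overlap
```

Here `overlap` counts how many of the currently active atoms were predicted at the previous step, which is one possible measure of how well predictions were satisfied; in the scheme above, such a signal could later flag which atoms deserve refinement offline.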
I’d love to hear any feedback you might have on these thoughts.
Thanks for the compliments, guys. I just love explaining things.
Yes, I’m very fascinated by Karl Friston’s work and its implications for advanced AI frameworks (active inference!). These frameworks integrate sensorimotor inference so that maximizing reward and exploring the environment to reduce uncertainty are handled not by separate mechanisms but by one unifying principle: the minimization of free energy. This makes earlier debates about that balancing act (exploration versus exploitation) obsolete and lets us think anew about the neuroanatomical correlates of the underlying mathematical description. I highly, highly recommend his CCN-2016 talk: https://www.youtube.com/watch?v=b1hEc6vay_k
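The "single objective" point can be seen in the risk-plus-ambiguity form of expected free energy used in discrete active-inference models: G(pi) = KL[Q(o|pi) || P(o)] + E_Q(s|pi) H[P(o|s)]. Preferences over outcomes P(o) stand in for reward (the risk term), while the ambiguity term penalizes policies that lead to uninformative observations. Rearranged, the same G equals minus the expected log-preference minus the mutual information between states and outcomes, which is where the exploratory drive comes from. This sketch assumes one time step, discrete states, and a known likelihood; the scenario and numbers are invented.

```python
import numpy as np

EPS = 1e-16  # avoids log(0); zero-probability terms then contribute 0

def expected_free_energy(q_states, likelihood, log_preferences):
    """Risk + ambiguity for one policy.

    q_states[s]        : predicted state distribution under the policy, Q(s|pi)
    likelihood[o, s]   : P(o|s), each column sums to 1
    log_preferences[o] : log P(o), the agent's preferred outcomes ("reward")
    """
    q_obs = likelihood @ q_states                                # Q(o|pi)
    risk = float(np.sum(q_obs * (np.log(q_obs + EPS) - log_preferences)))
    # Entropy of P(o|s) for each state, averaged under Q(s|pi).
    obs_entropy_per_state = -np.sum(likelihood * np.log(likelihood + EPS), axis=0)
    ambiguity = float(obs_entropy_per_state @ q_states)
    return risk + ambiguity

def choose_policy(policies, likelihood, log_preferences):
    """Return the index of the policy with the lowest expected free energy."""
    scores = [expected_free_energy(q, likelihood, log_preferences) for q in policies]
    return int(np.argmin(scores)), scores

likelihood = np.array([[1.0, 0.0],    # o0: preferred outcome, only from state 0
                       [0.0, 0.5],    # o1: noise, from state 1
                       [0.0, 0.5]])   # o2: noise, from state 1
prefs = np.array([3.0, 0.0, 0.0])
log_preferences = prefs - np.log(np.exp(prefs).sum())   # normalized log P(o)
policies = [np.array([1.0, 0.0]),     # policy 0 leads to state 0
            np.array([0.0, 1.0])]     # policy 1 leads to state 1
best, scores = choose_policy(policies, likelihood, log_preferences)   # best -> 0
```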
A lot of this has implications for working-memory delay activity that I haven’t had time to wrap my head around yet. It’s clear that minimizing variational free energy requires a bunch of index-like fast buffers to work properly, and I feel it’s a useful lens on much of what’s happening lately at the intersection of cognitive science and computational neuroscience. It seems to me that ML companies are a bit asleep at the switch here, and should hire more computational cognitive neuroscientists if they want to get ahead of the soon-ending amortization of “Deep Neural Networks” (which should more properly be referred to as “compositional function approximators”). It’s an exciting time to join this field, though.