Deep learning meets sparse distributed representations: The Kanerva Machine



“We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. Inspired by Kanerva’s sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is analytically tractable, which enables optimal on-line compression via a Bayesian update-rule. We formulate it as a hierarchical conditional generative model, where memory provides a rich data-dependent prior distribution. Consequently, the top-down memory and bottom-up perception are combined to produce the code representing an observation. Empirically, we demonstrate that the adaptive memory significantly improves generative models trained on both the Omniglot and CIFAR datasets. Compared with the Differentiable Neural Computer (DNC) and its variants, our memory model has greater capacity and is significantly easier to train.”

The Kanerva Machine: A Generative Distributed Memory

Kanerva, Pentti (1988). Sparse Distributed Memory
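For anyone wondering what a "Bayesian update-rule" for a memory might look like in practice, here is a minimal sketch. It assumes the memory is a linear-Gaussian model (a mean matrix `R` plus a row covariance `U`) and applies a standard Kalman-style posterior update for one stored code. The function names, the random addressing weights `w`, and the sizes `K` and `C` are all illustrative assumptions, not taken from the paper's code; in the paper the addressing weights and the codes come from trained networks.

```python
import numpy as np

def bayes_write(R, U, w, z, noise_var=1.0):
    """One online posterior update of a linear-Gaussian memory (illustrative).

    R: (K, C) posterior mean of the memory
    U: (K, K) covariance shared across the K memory rows
    w: (K,)   addressing weights (assumed given here)
    z: (C,)   code to store
    """
    delta = z - w @ R                    # prediction error for this code
    sigma_c = U @ w                      # cross-covariance between memory and reading
    sigma_z = w @ sigma_c + noise_var    # predictive variance of the reading (scalar)
    gain = sigma_c / sigma_z             # Kalman-style gain
    R_new = R + np.outer(gain, delta)    # move the mean toward the new code
    U_new = U - np.outer(gain, sigma_c)  # shrink uncertainty along w
    return R_new, U_new

def read(R, w):
    """Read the expected code stored at address w."""
    return w @ R

rng = np.random.default_rng(0)
K, C = 16, 8                             # hypothetical memory size and code size
R, U = np.zeros((K, C)), np.eye(K)       # prior: zero mean, identity covariance
w = rng.normal(size=K)
z = rng.normal(size=C)
R, U = bayes_write(R, U, w, z, noise_var=1e-6)
recalled = read(R, w)                    # close to z because the noise is tiny
```

The write is distributed: every row of `R` changes a little, in proportion to its addressing weight, rather than one slot being overwritten.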

The results are still not as good as those from GANs.


The adversarial component that makes GANs work is orthogonal to the contribution proposed here. You could add an adversarial loss to this model to encourage it to generate higher quality samples in the same way that people add adversarial losses to other generative models like regular VAEs. The VAE-GAN is a successful example of this, and the same could be applied to the KM generator.
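As a rough illustration of what "adding an adversarial loss" means, here is a minimal sketch of a combined generator objective: the usual VAE terms plus a weighted adversarial term. Everything here is an assumption for illustration: the function names, the squared-error reconstruction, the `adv_weight` value, and the use of the non-saturating GAN loss on discriminator logits. It is a simplification and not the exact objective from the VAE-GAN paper or from the Kanerva Machine.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, I) ) per example, diagonal Gaussians."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)

def generator_loss(x, x_recon, mu, logvar, d_logits_fake, adv_weight=0.1):
    """VAE loss plus an adversarial term (hypothetical weighting).

    d_logits_fake: discriminator logits on generated samples; a higher
    logit means the discriminator thinks the sample looks real.
    """
    recon = np.sum((x - x_recon) ** 2, axis=-1)  # reconstruction error
    kl = kl_to_standard_normal(mu, logvar)       # prior-matching term
    adv = np.logaddexp(0.0, -d_logits_fake)      # -log sigmoid(logit), computed stably
    return float(np.mean(recon + kl + adv_weight * adv))
```

The adversarial term only touches the generator's samples, which is why it can be bolted onto a memory-based generator without changing how the memory itself is written or read.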


Fun fact: Pentti Kanerva actually worked for Jeff at one point.


WOW! This is excellent. It is easier to understand the world if one has a memory of the world, and feedforward perception combines synergistically with feedback predictions (Hawkins). There is also the paper below; DeepMind papers seem to come in chunks.
Demis is on a decades-long march. Immensely impressive.

Demis Hassabis, Dharshan Kumaran, Seralynne D Vann, and Eleanor A Maguire. Patients with hippocampal amnesia cannot imagine new experiences.
Proceedings of the National Academy of Sciences, 104(5):1726–1731, 2007.


Can we use this on temporal data? If not, what would the limitations be?


I think there is more clarity to be had by simplifying neural networks and memory systems to the greatest extent possible. They tend to be more machine efficient anyway.
Can you simplify something that is already simple? Sure you can, whether that is by architectural changes or using bit hacks and ideas that reduce the number of hardware logic gates. There is always something you can do.
Hence I object to complex AI papers, which in any case build human-designed biases and restrictions into systems, and those are ultimately very limiting.
I’m kind of looking at ‘spinner’ projection based neural networks at the moment.
Of course you can and should simplify what they have done.
Basically you can have fully connected neural networks with only one weight per neuron if you like. Or a few weights.
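To make the "one weight per neuron" idea concrete, here is a minimal sketch of one possible reading of it. The assumption is that a fixed, weight-free transform does the mixing between neurons (here a random sign flip followed by a fast Walsh-Hadamard transform) and each neuron then has a single adjustable scale. The names `fwht` and `one_weight_layer` are hypothetical, and this may not match the exact 'spinner' projection scheme meant above.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b           # butterfly: sums
            x[i + h:i + 2 * h] = a - b   # butterfly: differences
        h *= 2
    return x / np.sqrt(n)                # scale so the transform preserves length

def one_weight_layer(x, signs, weights):
    """Fixed mixing (sign flip + transform), then one scale per neuron and a ReLU."""
    mixed = fwht(signs * x)              # every output depends on every input
    return np.maximum(0.0, weights * mixed)

rng = np.random.default_rng(0)
n = 8
signs = rng.choice([-1.0, 1.0], size=n)  # fixed random sign pattern, not trained
weights = np.ones(n)                     # the only adjustable parameters: n of them
x = rng.normal(size=n)
y = one_weight_layer(x, signs, weights)
```

The layer is still fully connected in the sense that each output sees all inputs, but it has n parameters instead of n squared, and the transform costs n log n additions rather than n squared multiplies.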


I hear you. The reason we are working on DL / HTM mashups is that we want people to pay more attention to the lessons we have learned from the brain. I’m not sure the enhancements we’re uncovering by applying HTM ideas to DL platforms will go anywhere in production systems. But these are ways of rethinking the current AI landscape, and they help get other people thinking in our direction.