Grid Cell Path Integration For Movement-Based Visual Object Recognition

Hi there! I started reading the preprint of the latest Numenta paper on grid cells, and early on they mention an encoder for MNIST images (figure 2).
The encoder is a CNN with a k-winners layer, whose activations are the encoding.
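
For readers unfamiliar with the term, here is a minimal sketch of what a k-winners step does to a vector of activations. It assumes plain top-k selection with no boosting or duty-cycle term, so the layer in the paper may well differ in its details. The function names (`k_winner_indices`, `k_winners`, `to_sdr`) are made up for illustration.

```python
import numpy as np

def k_winner_indices(activations, k):
    """Return the (sorted) indices of the k largest activations."""
    a = np.asarray(activations, dtype=float).ravel()
    if not 0 < k <= a.size:
        raise ValueError("k must be between 1 and the number of units")
    # argpartition leaves the indices of the k largest values in the last k slots
    return np.sort(np.argpartition(a, -k)[-k:])

def k_winners(activations, k):
    """Keep the k largest activations and set every other unit to zero."""
    a = np.asarray(activations, dtype=float).ravel()
    idx = k_winner_indices(a, k)
    sparse = np.zeros_like(a)
    sparse[idx] = a[idx]
    return sparse

def to_sdr(activations, k):
    """Binary encoding: 1 for the k winning units, 0 everywhere else."""
    a = np.asarray(activations, dtype=float).ravel()
    sdr = np.zeros(a.size, dtype=np.uint8)
    sdr[k_winner_indices(a, k)] = 1
    return sdr
```

Calling `to_sdr` on the final layer's activations gives a fixed-sparsity binary vector, which is the kind of representation downstream HTM-style components usually expect.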

I was wondering if it would make sense to use a sparse autoencoder for the encoding instead.

Yes. One could probably use a sparse autoencoder as long as its output is stable. Did you have a specific algorithm or implementation in mind?


I’m thinking of implementing this paper alongside the older algorithms I have already written in HTM.jl, probably in July.
Apart from the general interest, I’m starting an anomaly detection project in my day job, and it will be an opportunity for me to play with HTM on real data.

As for the autoencoder, I would start here, but instead of regularizing the loss, I would use the same k-winners layer as the paper’s CNN for the latent space.
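
A rough sketch of that idea, forward pass only: the sparsity comes from a top-k step on the latent layer and the loss is plain reconstruction error with no penalty term. Everything here is an assumption made for illustration (a single linear layer with tied weights, the class name `KSparseAutoencoder`, the initialization scale), and training is left out.

```python
import numpy as np

def top_k(z, k):
    """Zero all but the k largest entries in each row of a 2-D array."""
    idx = np.argpartition(z, -k, axis=1)[:, -k:]   # k winners per row
    mask = np.zeros(z.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return np.where(mask, z, 0.0)

class KSparseAutoencoder:
    """Single-layer autoencoder with tied weights and a k-winners latent space."""

    def __init__(self, n_inputs, n_hidden, k, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, size=(n_inputs, n_hidden))
        self.b_hidden = np.zeros(n_hidden)
        self.b_out = np.zeros(n_inputs)
        self.k = k

    def encode(self, x):
        # x has shape (batch, n_inputs); only k latent units per sample survive
        return top_k(x @ self.W + self.b_hidden, self.k)

    def decode(self, z):
        return z @ self.W.T + self.b_out

    def loss(self, x):
        # plain mean squared reconstruction error, no sparsity regularizer
        return float(np.mean((self.decode(self.encode(x)) - x) ** 2))
```

For flattened MNIST images this would be something like `KSparseAutoencoder(784, 256, k=20)`, where the hidden size and `k` are placeholders rather than values taken from the paper.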

A couple of weeks ago, I put together an app to demonstrate a simple image encoder here that uses Orthogonal Matching Pursuit. Each MNIST digit is broken down into a 4x4 array of 7x7 pixel patches, and each patch is encoded and then reconstructed as a linear combination of a very small number of basis filters (atoms). The point of this app was just to show that you can generate a decent-quality reconstruction of unseen MNIST digits as a composite image from something as simple as a small number of random sub-samples (atoms) taken directly from the target image set. At the moment, the filters are static (i.e. there is no learning enabled yet).
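
The pipeline described above can be sketched roughly as follows. This is not the app's code, just a minimal NumPy version written under a few assumptions: images are 28x28 arrays, the dictionary is an array of shape (49, number of atoms) with unit-norm columns, and the helper names (`split_into_patches`, `join_patches`, `omp`, `encode_image`) are invented for this example.

```python
import numpy as np

def split_into_patches(image, patch=7):
    """Split a 28x28 image into a 4x4 grid of flattened 7x7 patches."""
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    return (image.reshape(rows, patch, cols, patch)
                 .swapaxes(1, 2)
                 .reshape(rows * cols, patch * patch))

def join_patches(patches, grid=4, patch=7):
    """Inverse of split_into_patches: reassemble the full image."""
    return (patches.reshape(grid, grid, patch, patch)
                   .swapaxes(1, 2)
                   .reshape(grid * patch, grid * patch))

def omp(dictionary, signal, n_atoms):
    """Greedy Orthogonal Matching Pursuit; returns a sparse coefficient vector."""
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    chosen, coeffs = [], np.zeros(0)
    for _ in range(n_atoms):
        correlations = np.abs(dictionary.T @ residual)
        if chosen:
            correlations[chosen] = 0.0          # never pick the same atom twice
        best = int(np.argmax(correlations))
        if correlations[best] < 1e-12:          # nothing left to explain
            break
        chosen.append(best)
        # the "orthogonal" step: re-fit all chosen atoms jointly by least squares
        coeffs, *_ = np.linalg.lstsq(dictionary[:, chosen], signal, rcond=None)
        residual = signal - dictionary[:, chosen] @ coeffs
    code = np.zeros(dictionary.shape[1])
    code[np.array(chosen, dtype=int)] = coeffs
    return code

def encode_image(image, dictionary, n_atoms):
    """Encode each patch independently and rebuild the composite image."""
    patches = split_into_patches(image)
    codes = np.array([omp(dictionary, p, n_atoms) for p in patches])
    reconstruction = join_patches(codes @ dictionary.T)
    return codes, reconstruction
```

Here `n_atoms` plays the role of the slider: it caps how many atoms are combined within each patch.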

You can adjust how many atoms are composed together within each patch using the slider in the control box.

As a point of comparison: if you click on “random atoms” in the control box, the dictionary will repopulate with randomly generated atoms, and the quality of the fit drops dramatically. With this comparison I was trying to demonstrate just how much useful information we can extract by simply sampling relatively few features from the target image space and using them as a set of basis filters. This gives me hope that this technique can be useful for one-shot / few-shot learning.

The next step would be to implement a form of Hebbian learning or K-means averaging to improve the quality of the filters (atoms). Right now all of the atoms are just segments of MNIST images, but ideally some of them would start adapting to model the features of the image residuals that remain after the first atom has been removed from the input. Ultimately, I’d like to work out an algorithm that can learn/adapt to efficiently encode the relevant features of both the image data set and the residuals.
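
One possible shape for the K-means variant is sketched below. It is a guess at how such an update could look (essentially a spherical K-means pass), not something that exists in the app. It assumes a dictionary of shape (features, atoms) with unit-norm columns and patches of shape (patches, features); the names `kmeans_update`, `first_atom_residuals` and `learning_rate` are hypothetical.

```python
import numpy as np

def kmeans_update(dictionary, patches, learning_rate=1.0):
    """One K-means-style pass: move each atom toward the mean of the patches it wins."""
    similarity = patches @ dictionary                  # (n_patches, n_atoms)
    owner = np.argmax(np.abs(similarity), axis=1)      # best-matching atom per patch
    updated = dictionary.astype(float).copy()
    for j in range(dictionary.shape[1]):
        mine = owner == j
        if not mine.any():
            continue                                   # unused atoms stay as they are
        # flip negatively correlated patches so they do not cancel in the mean
        signs = np.where(similarity[mine, j] >= 0, 1.0, -1.0)[:, None]
        target = (signs * patches[mine]).mean(axis=0)
        target /= max(np.linalg.norm(target), 1e-12)
        updated[:, j] = (1 - learning_rate) * dictionary[:, j] + learning_rate * target
    # keep atoms at unit norm so correlations stay comparable
    norms = np.linalg.norm(updated, axis=0, keepdims=True)
    return updated / np.maximum(norms, 1e-12)

def first_atom_residuals(dictionary, patches):
    """What is left of each patch after removing its best-matching atom."""
    similarity = patches @ dictionary
    owner = np.argmax(np.abs(similarity), axis=1)
    coeff = similarity[np.arange(len(patches)), owner]
    return patches - coeff[:, None] * dictionary[:, owner].T
```

To let some atoms specialize on residuals, one could feed the output of `first_atom_residuals` into `kmeans_update` for a separate subset of the dictionary. A `learning_rate` below 1 turns the hard replacement into a gradual, more Hebbian-flavored drift.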

Well, I just gave a final exam to my students today. So, hopefully life will begin to calm down enough for me to get back to work on this project.