Grid Cell Path Integration For Movement-Based Visual Object Recognition

Oblynx · April 19, 2021, 2:33am

Hi there! I started reading the preprint of the latest Numenta paper on grid cells, and early on they mention an encoder for MNIST images (figure 2).
The encoder is a CNN with a k-winners layer, whose activations are the encoding.

I was wondering if it would make sense to use a sparse autoencoder for the encoding instead.

CollinsEM · April 21, 2021, 12:11am

Yes. One could probably use a sparse autoencoder as long as its output is stable. Did you have a specific algorithm or implementation in mind?

Oblynx · April 26, 2021, 7:31pm

I’m thinking of implementing this paper alongside the older algorithms I have already written in HTM.jl, probably in July.
Apart from general interest, I’m starting an anomaly detection project at my day job, and it will be an opportunity for me to play with HTM on real data.

Oblynx · April 26, 2021, 7:35pm

As for the autoencoder, I would start here, but instead of regularizing the loss, I would use the same k-winners layer as the paper’s CNN for the latent space.
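To make the idea concrete, here is a minimal sketch of a k-winners step applied to a latent vector. It is plain Python with made-up example values; the function name `k_winners` is hypothetical, and the layer in the paper may include extra details (boosting, for example) that are left out here.

```python
def k_winners(activations, k):
    """Keep the k largest activations and set every other unit to zero."""
    if k <= 0:
        return [0.0] * len(activations)
    # Rank units by activation (largest first); ties go to the lower index.
    ranked = sorted(range(len(activations)), key=lambda i: (-activations[i], i))
    keep = set(ranked[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


# Hypothetical latent vector coming out of the autoencoder's encoder half.
latent = [0.1, 0.9, -0.3, 0.5, 0.7, 0.0]
sparse_latent = k_winners(latent, k=2)
print(sparse_latent)  # [0.0, 0.9, 0.0, 0.0, 0.7, 0.0]
```

In an autoencoder this step would sit between the encoder and the decoder, so sparsity is enforced by construction rather than by a penalty term in the loss.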

CollinsEM · April 26, 2021, 11:04pm

A couple of weeks ago, I put together an app to demonstrate a simple image encoder here that uses Orthogonal Matching Pursuit. Each MNIST digit is broken down into a 4x4 array of 7x7-pixel patches, and each patch is encoded and then reconstructed as a linear combination of a very small number of basis filters (atoms). The point of this app was just to show that you can generate a decent-quality reconstruction of unseen MNIST digits, as a composite image, from something as simple as a small number of random sub-samples (atoms) taken directly from the target image set. At the moment, the filters are static (i.e., no learning is enabled yet).

You can adjust how many atoms are composed together within each patch using the slider in the control box.
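A rough sketch of that patch-wise encoding, in case it helps. This is not the app's code: it assumes NumPy, uses random arrays as stand-ins for MNIST images, and all function names are hypothetical. `n_atoms` plays the role of the slider.

```python
import numpy as np


def split_patches(image, size=7):
    """Split a square image into non-overlapping size x size patches (row-major)."""
    n = image.shape[0] // size
    return [image[r * size:(r + 1) * size, c * size:(c + 1) * size].reshape(-1)
            for r in range(n) for c in range(n)]


def omp(dictionary, signal, n_atoms):
    """Greedy Orthogonal Matching Pursuit over unit-norm dictionary columns."""
    residual = signal.astype(float).copy()
    chosen, coeffs = [], np.zeros(0)
    for _ in range(n_atoms):
        # Pick the atom most correlated with what is still unexplained.
        scores = np.abs(dictionary.T @ residual)
        if chosen:
            scores[chosen] = -1.0  # never pick the same atom twice
        chosen.append(int(np.argmax(scores)))
        # Re-fit all chosen atoms jointly (the "orthogonal" step).
        sub = dictionary[:, chosen]
        coeffs, *_ = np.linalg.lstsq(sub, signal, rcond=None)
        residual = signal - sub @ coeffs
    return chosen, coeffs


def encode_image(image, dictionary, n_atoms):
    """Encode every patch as (atom indices, coefficients)."""
    return [omp(dictionary, patch, n_atoms) for patch in split_patches(image)]


def reconstruct(codes, dictionary, size=7, grid=4):
    """Rebuild the composite image from the per-patch codes."""
    image = np.zeros((size * grid, size * grid))
    for i, (idx, coef) in enumerate(codes):
        r, c = divmod(i, grid)
        patch = (dictionary[:, idx] @ coef).reshape(size, size)
        image[r * size:(r + 1) * size, c * size:(c + 1) * size] = patch
    return image


rng = np.random.default_rng(0)
train = rng.random((20, 28, 28))  # stand-in for MNIST training images
atoms = np.stack([p for img in train for p in split_patches(img)], axis=1)
pick = rng.choice(atoms.shape[1], size=64, replace=False)
dictionary = atoms[:, pick]
# Real MNIST has blank patches; zero-norm atoms would need to be skipped here.
dictionary = dictionary / np.linalg.norm(dictionary, axis=0)

test_image = rng.random((28, 28))  # stand-in for an unseen digit
errors = {}
for k in (1, 4, 8):
    recon = reconstruct(encode_image(test_image, dictionary, k), dictionary)
    errors[k] = float(np.linalg.norm(test_image - recon))
print(errors)
```

Because each extra atom is fitted jointly with the earlier ones, the reconstruction error for a patch can only stay the same or shrink as `n_atoms` grows.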

As a point of comparison: if you click on “random atoms” in the control box, the dictionary will repopulate with randomly generated atoms, and the performance of the fitting algorithm drops dramatically. With this comparison I was trying to demonstrate just how much useful information we can extract by simply sampling relatively few features from the target image space and using them as a set of basis filters. This gives me hope that this technique can be useful for one-shot / few-shot learning.

The next step would be to implement a form of Hebbian learning or K-means averaging to improve the quality of the filters (atoms). Right now all of the atoms are just segments of MNIST images, but ideally some of them would start adapting to model the features of the image residuals that remain after the first atom has been removed from the input. Ultimately, I’d like to work out an algorithm that can learn to efficiently encode the relevant features of both the image data set and the residuals.
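One possible shape for the K-means averaging step, as a hedged sketch only. It assumes NumPy, non-negative patch data, and a hypothetical function name; it operates on raw patches, though the same pass could be run on residuals instead.

```python
import numpy as np


def kmeans_atom_update(dictionary, patches):
    """One K-means-style pass: replace each atom with the normalized mean
    of the patches that match it best.

    dictionary: (n_features, n_atoms) array with unit-norm columns.
    patches:    (n_features, n_patches) array of non-negative patch vectors.
    """
    # Assign every patch to the atom it correlates with most strongly.
    best = np.argmax(dictionary.T @ patches, axis=0)
    updated = dictionary.copy()
    for j in range(dictionary.shape[1]):
        members = patches[:, best == j]
        if members.shape[1] == 0:
            continue  # an atom that won no patches is left unchanged
        mean = members.mean(axis=1)
        norm = np.linalg.norm(mean)
        if norm > 0:
            updated[:, j] = mean / norm
    return updated


rng = np.random.default_rng(1)
patches = rng.random((49, 200))  # stand-in for flattened 7x7 patches
dictionary = patches[:, :16] / np.linalg.norm(patches[:, :16], axis=0)
new_dictionary = kmeans_atom_update(dictionary, patches)
print(new_dictionary.shape)  # (49, 16)
```

Repeating the pass a few times would give a simple spherical K-means; a Hebbian variant would instead nudge each winning atom a small step toward the patch rather than jumping to the mean.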

Well, I just gave a final exam to my students today. So, hopefully life will begin to calm down enough for me to get back to work on this project.
