Allocentric location inference from feature data

Given a network trained on feature+location data as discussed in the recent SMI paper, is it possible to present a model with only feature data and have it infer the corresponding predicted location data from the input? If so, how would the model be structured?

Also, I haven’t quite grasped how this location data is presented to the input layer. Would it be encoded in the same SDR as the feature data, for example 8 bits of feature data and 8 bits of location data? Going off intuition alone, it would seem reasonable to then pass in the same SDR with only the 8 bits of feature data and 8 zeroed location bits, and let the network infer the location, presumably as the predicted cells for that feature.
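
To make that concrete, here’s the kind of thing I’m picturing (toy bit widths, nothing taken from the actual paper or code):

```python
import numpy as np

# Toy version of what I'm picturing: 8 feature bits concatenated with
# 8 location bits, with the location half zeroed at inference time.
# (Bit widths and layout are just for illustration.)

feature  = np.array([1, 0, 0, 1, 0, 1, 0, 0], dtype=np.int8)
location = np.array([0, 1, 0, 0, 1, 0, 0, 1], dtype=np.int8)

training_input  = np.concatenate([feature, location])               # learning
inference_input = np.concatenate([feature, np.zeros(8, np.int8)])   # feature only

print(training_input)
print(inference_input)
```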

I may be under some wrong assumptions, so please excuse any ignorance.


Yes, that is what we do in our code supporting the paper, and that’s what’s happening in your brain too.

This happens naturally in the “input layer” of the SMI circuit we detailed in the paper. Every proximal sensation entering the layer causes predictions representing where, across the library of learned objects, that sensation has been felt before.

No, the location data is a distal signal: it is presented to the layer through the distal input. If it were encoded in the same SDR as the sensory input, it would be part of the proximal signal. See How the allocentric locations are encoded for SMI? for a similar discussion.
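
A minimal toy sketch of that distinction may help (this is not the htmresearch code; the class, names and SDRs below are invented): the location SDR arrives as distal context and depolarizes cells, while the feature SDR arrives proximally and decides which of those cells actually fire.

```python
# Toy sketch of the proximal/distal split. The location SDR puts cells into a
# predictive state via distal segments; the feature SDR activates cells via
# proximal input. All names and structures are invented for illustration.

class ToyInputLayer:
    def __init__(self):
        self.proximal = {}   # feature SDR  -> cells that respond to it
        self.distal = {}     # location SDR -> cells it depolarizes (predicts)

    def learn(self, feature, location, cells):
        self.proximal.setdefault(frozenset(feature), set()).update(cells)
        self.distal.setdefault(frozenset(location), set()).update(cells)

    def compute(self, feature, location=None):
        candidates = self.proximal.get(frozenset(feature), set())
        if location is None:
            # No distal context: every cell that has ever represented this
            # feature becomes active, i.e. a union over all learned locations.
            return candidates
        predicted = self.distal.get(frozenset(location), set())
        narrowed = candidates & predicted
        # If nothing was predicted, fall back to the full union (akin to bursting).
        return narrowed if narrowed else candidates


layer = ToyInputLayer()
layer.learn(feature={1, 5, 9}, location={2, 7}, cells={"mug@rim"})
layer.learn(feature={1, 5, 9}, location={4, 8}, cells={"mug@handle"})

print(layer.compute(feature={1, 5, 9}))                   # union of both cells
print(layer.compute(feature={1, 5, 9}, location={2, 7}))  # narrowed to mug@rim
```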

I can add a bit more to what Matt said. How the representation of location is determined is a bit tricky. As we suggest in the paper, grid cells in the entorhinal cortex solve a very similar problem, and we have a detailed, grid-cell-based model for how this is done in the cortex. That model is not described in the latest paper; it will be in our next one.

The basic method for determining the location is to combine sensory input and movement. For a single column, sensory data on its own is not sufficient. Sensory input selects a union of possible locations that match the input. Movement then predicts a union of new locations, and the next sensory input narrows that union down to the locations that match the new input. Movement predicts the next union of locations, and so on. A single column can iterate like this until it knows both the object and the location. Imagine trying to recognize an object by looking through a straw: you have to move the straw to do it.
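
A rough way to see why this narrows things down, in plain Python (the objects, features and coordinates are invented; this is a sketch of the idea, not the model in the paper):

```python
# Toy illustration of the sense/move loop: sensing selects a union of candidate
# (object, location) pairs, movement shifts every candidate, and the next
# sensation prunes the candidates that no longer match.

objects = {
    "mug": {(0, 0): "flat", (1, 0): "edge", (1, 1): "curve"},
    "box": {(0, 0): "flat", (1, 0): "edge", (1, 1): "corner"},
}

def sense(candidates, feature):
    """Keep only candidates whose current location has the sensed feature."""
    return {(obj, loc) for obj, loc in candidates
            if objects[obj].get(loc) == feature}

def move(candidates, delta):
    """Path-integrate every candidate location by the same movement."""
    dx, dy = delta
    return {(obj, (x + dx, y + dy)) for obj, (x, y) in candidates}

# Start with every location on every object as a candidate.
candidates = {(obj, loc) for obj, locs in objects.items() for loc in locs}

candidates = sense(candidates, "edge")    # both objects still possible
candidates = move(candidates, (0, 1))     # move the "straw"
candidates = sense(candidates, "curve")   # only the mug matches now
print(candidates)                         # {('mug', (1, 1))}
```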

Multiple columns working together can determine their locations much faster. In the paper we showed that columns can eliminate possible objects by voting across columns. There is another means of voting that Marcus Lewis described in his talk at the recent HTM meetup. If each column knows its location relative to the body, the columns can vote on a shared representation of where the object is relative to the body. Once they know where the object is relative to the body, each column can determine where it is relative to the object. I believe Marcus’ presentation is posted somewhere.
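
Here is a toy sketch of the first kind of voting, eliminating objects across columns. Treating the vote as roughly an intersection of each column’s candidate objects is a simplification, and the column contents below are invented:

```python
# Each column keeps its own set of objects consistent with what it has sensed
# so far; voting is (roughly) an intersection across columns.

def vote(column_candidates):
    """Return the objects every column still considers possible."""
    shared = None
    for candidates in column_candidates:
        shared = set(candidates) if shared is None else shared & candidates
    return shared

column_1 = {"mug", "box", "can"}   # felt an edge: many objects still fit
column_2 = {"mug", "can"}          # felt a curve somewhere else
column_3 = {"mug"}                 # felt the handle

print(vote([column_1, column_2, column_3]))   # {'mug'} -- resolved in one step
```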

All of this relies on a trick that grid cells do, which allows them to make predictions of new locations based on movement. We believe there are grid cell equivalents in L6 of the cortex.

The representation of location in L6a is just an SDR. It is presented to the distal synapses on L4 cells. The location SDR is a bit strange to comprehend at first: it is unique to both the object and the location on the object. You can’t look at two location SDRs and determine where they are relative to each other; their metric relationship is defined by movement. This is called “path integration” in the entorhinal cortex. It takes a while to fully understand, but Marcus’ presentation is a good place to start.
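
A toy way to picture that last point (the sizes, random codes and transition table below are invented, not the actual mechanism): two location SDRs look unrelated on their own, and their adjacency only exists in the movement transitions, which is what path integration provides.

```python
import numpy as np

# Each (object, location) pair gets an arbitrary, unique SDR, so comparing two
# SDRs tells you nothing about distance. Their metric relationship lives only
# in a mapping from (current location SDR, movement) to the next location SDR.

rng = np.random.default_rng(0)
SDR_SIZE, ACTIVE_BITS = 256, 10

def random_sdr():
    return frozenset(rng.choice(SDR_SIZE, size=ACTIVE_BITS, replace=False))

# Arbitrary codes for two neighbouring locations on the same object.
loc_a = random_sdr()
loc_b = random_sdr()

# The codes overlap no more than chance -- nothing about them says "adjacent".
print("overlap:", len(loc_a & loc_b))

# Adjacency lives only in the movement transitions (path integration).
transitions = {(loc_a, ("right", 1)): loc_b,
               (loc_b, ("left", 1)): loc_a}

print(transitions[(loc_a, ("right", 1))] == loc_b)   # True: movement relates them
```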
Jeff


Much appreciated clarification, guys.

Here’s a link to the talk Jeff mentioned for anyone interested: https://youtu.be/c6U4yBfELpU?t=6774
And the corresponding slides here: https://www.dropbox.com/s/kb40zk3jd0yeemi/Recognizing-locations-on-objects.pdf
