First-time poster and a Math & Stats student who’s very new to neuroscience, so let me know if I’ve put this under the wrong topic or have misunderstood something about the Thousand Brains Theory.
So I’ve recently read the “Locations in the Neocortex” paper and I get the sense that a lot of what the paper proposes can be put in terms of coding theory. The purpose of that would be to create a mathematical framework for this form of intelligent inference, one that could be used to derive theoretical results on the computational side of cognitive science.
For instance, there is a large body of literature by GOFAI researchers, cognitive scientists, and roboticists dealing with how to create, say, rule-based/computationalist models of intelligent reasoning. A mathematical generalization of Numenta’s TBT could be really appealing to those audiences, because it deals a lot with encodings and representations that can be designed at a high level, as opposed to the ‘cram it into an ML model and cross your fingers’ approach to AI that’s getting most of the attention today.
What I want to argue here is that the framework described in the Locations paper has a compelling number of similarities to ideas in (mathematical) coding theory. Bringing in results from coding theory could inform future research on how these cortical mechanisms work, including theoretical bounds on how many errors they can detect and correct, and perhaps lead to computational models that extend into other application domains in AI.
To explain why I think the theory can be generalized, my high-level understanding of the model in the Locations paper is:
- The brain has natural mechanisms to encode continuous spaces of arbitrary dimension as finite codes, up to a certain degree of precision. This happens when the sensory input to the sensory layer (SL) is encoded in grid cell modules in the location layer (LL).
- When a movement happens and no grid cells are active in the LL, nothing happens. However, when they are active, information about the movement is fed into the LL, which updates the grid cell modules to make an encoded prediction about the next sensory input. This prediction is fed back to the SL.
- From the intersection of the activations of the now-decoded prediction and a new sensory input to the SL, the model can decide on a unique representation of the identity of the object, provided that no ambiguity arises from this pattern.
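To make the first two bullets a bit more concrete, here is a toy sketch of a modular location code. It is my own simplification, not code from the paper: I assume a 1-D integer position instead of a 2-D continuous one, and the module periods in `PERIODS` are hypothetical values picked to be pairwise coprime. Each module only keeps the position modulo its period, and a movement shifts every module's phase by the same amount.

```python
from math import prod

# Hypothetical module periods (pairwise coprime); not values from the paper.
PERIODS = (3, 5, 7)

def encode(position: int) -> tuple[int, ...]:
    """Encode a 1-D integer position as one phase per module."""
    return tuple(position % p for p in PERIODS)

def move(code: tuple[int, ...], delta: int) -> tuple[int, ...]:
    """Shift every module's phase by the movement, without decoding first."""
    return tuple((phase + delta) % p for phase, p in zip(code, PERIODS))

def capacity() -> int:
    """Number of positions that get distinct codes (product of the periods)."""
    return prod(PERIODS)

start = encode(11)       # (2, 1, 4)
after = move(start, 4)   # same code as encode(15)
print(start, after, capacity())
```

The point of the sketch is that the movement update works directly on the encoded representation, and that a few small modules together distinguish many more positions than any single module could (here 3 * 5 * 7 = 105).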
What I believe to be the essence of the model, in terms of coding theory, is then:
A) We have the sensory features of the object that we want to encode from the world and decode into the brain, where the message is the identity of that object.
B) We have a source code that represents the input to the sensory layer, i.e. the space we want to make predictions on (say, an array of binary values representing sensory input to the model). We also have a channel code over a finite field that corresponds to our location layer (grid cell modules).
C) When an action is taken, the model knows how to calculate the difference in the encoded messages in the channel code based on this action, and can therefore calculate an updated representation of what is supposed to be the next message in terms of the channel code (recall the channel code is the state of the location layer, and the source code is the state of the sensory layer). In the case of the TBT, this is the ‘motor’ aspect of the sensorimotor input to the model updating the representation in the grid cell modules in the location layer.
D) With this updated representation in the channel code, the model can decode it back into what the new input is predicted to be in the original source code (i.e. the input layer represented as binary values), and measure the similarity between this predicted input and the actual new input by taking the intersection of the two patterns in the source code.
E) Using this intersection, we can form a uniquely decoded message (the identity of the object) given that no ambiguity arises between two possible messages in this code.
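Steps C) to E) can be sketched as a small candidate-elimination loop. This is only an illustration under my own assumptions: the objects, feature labels, and 1-D integer locations in `OBJECTS` are made up, and plain Python sets stand in for the neural activations. Each movement shifts every candidate location, the feature predicted at the new location is compared with the new input, and only the matching candidates survive.

```python
# Hypothetical learned objects: each maps a 1-D location to a feature label.
OBJECTS = {
    "mug":  {0: "curve", 1: "flat", 2: "handle"},
    "bowl": {0: "curve", 1: "flat", 2: "rim"},
    "can":  {0: "flat", 1: "flat", 2: "rim"},
}

def initial_candidates(feature):
    """All (object, location) pairs consistent with the first sensation."""
    return {(name, loc)
            for name, layout in OBJECTS.items()
            for loc, feat in layout.items()
            if feat == feature}

def step(candidates, delta, feature):
    """Shift each candidate location by the movement, predict the feature
    there, and keep only candidates whose prediction matches the new input."""
    survivors = set()
    for name, loc in candidates:
        new_loc = loc + delta
        if OBJECTS[name].get(new_loc) == feature:
            survivors.add((name, new_loc))
    return survivors

def decode(candidates):
    """Return the object identity if it is unambiguous, else None."""
    names = {name for name, _ in candidates}
    return names.pop() if len(names) == 1 else None

cands = initial_candidates("curve")   # still ambiguous: mug or bowl
cands = step(cands, 2, "handle")      # only the mug predicts a handle here
print(decode(cands))                  # mug
```

If two objects agree on every sensed feature along the path, `decode` keeps returning `None`, which is the ambiguity case mentioned in E).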
If we were to conceptualize this process at a high level using coding theory, we could say that:
Identity of an object in the world (Message) ->
Sensory features of the object in the world (Source encoding I) ->
Neural representation of sensory input (Source encoding II) ->
Location layer representation (Channel encoding) ->
Updated location from movement (Channel decoding I) ->
Intersection of new input with prediction (Channel decoding II) ->
Neural representation of object’s identity (Source decoding) ->
Identity of object in the brain (Decoded message)
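As an example of the kind of result coding theory gives for a pipeline like this: if the representations of any two objects differ in at least d positions (minimum Hamming distance d), then up to d - 1 corrupted positions can always be detected and up to floor((d - 1) / 2) can be corrected by picking the nearest codeword. The sketch below checks this on a made-up codebook. The 7-bit codewords in `CODEBOOK` are hypothetical stand-ins for object representations, not anything taken from the paper.

```python
from itertools import combinations

# Hypothetical binary codewords standing in for object representations.
CODEBOOK = {
    "mug":  "0000000",
    "bowl": "1110100",
    "can":  "0111010",
    "pen":  "1001110",
}

def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(codebook: dict[str, str]) -> int:
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming(a, b) for a, b in combinations(codebook.values(), 2))

def nearest(word: str, codebook: dict[str, str]) -> str:
    """Decode a possibly corrupted word to the closest object."""
    return min(codebook, key=lambda name: hamming(word, codebook[name]))

d = min_distance(CODEBOOK)       # 4 for this codebook
detectable = d - 1               # errors that are always detected
correctable = (d - 1) // 2       # errors that are always corrected

noisy_bowl = "1110101"           # "bowl" with its last bit flipped
print(d, detectable, correctable, nearest(noisy_bowl, CODEBOOK))
```

Whether the sparse activations in the actual model behave like codewords with a useful minimum distance is exactly the sort of question I think this framing would let us ask.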
Using this framework, we could prove additional formal results about the model in more abstract ways. Maybe some of the members here are better versed in coding theory or TBT and can critique or help extend the analogy. I know this is very hand-wavy in terms of the actual mathematics, but let me know what you think nonetheless.