Numenta released the Thousand Brains Theory (TBT) more than a year ago. However, there has been no new development on TBT from Numenta, and the community has no way to experiment with it. It has also been a while since the community last came up with well-designed, impressive demos. Building demos is an important way to explore what TBT can and can’t do, and to spread the word. In short, the Thousand Brains Theory looks amazing, but we (as a community) can’t use it without actionable code.
So I want to start a new project to build TBT in a way that lets people start experimenting with the theory. Here are some ideas:
Implement it using either Etaler or HTM.core
Hopefully resulting in a library everyone can use
It will definitely have to be a community project due to its large scope. And there are a lot of open questions:
How to verify the implementation is correct?
What can we use TBT for?
How to formulate lateral connections?
How do we know whether we have implemented something wrong or it’s just something TBT can’t do?
The first step is going to be figuring out how to implement lateral connections. I know people have been toying with TBT and related subjects for a very long time. We could start by brainstorming what we should/could do and see where we end up.
The basic computing unit of TBT (a cortical column) requires a function for combining moving sensory input over time to form object representations. From my perspective, one of the main components missing for a workable model is an “object pooling” algorithm that pools both spatial and temporal information. The SP algorithm alone is insufficient for this purpose, because it can only pool related bits that are spatially separated within a single timestep. It won’t pool related bits that occur in different timesteps.
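To make the gap concrete, here is a minimal sketch of pooling across timesteps: a union-style pooler with decay, in plain Python. It is not Numenta's algorithm and not part of htm.core; the name `pool_over_time` and the `decay` and `threshold` parameters are made up for illustration. Bits that were active recently stay in the output, so the pooled representation stops changing once every feature of the object has been visited.

```python
def pool_over_time(inputs, decay=0.5, threshold=0.2):
    """Pool a sequence of input SDRs, each given as a set of active bit indices.

    Every bit carries a persistence value: it is set to 1.0 when the bit is
    active and multiplied by `decay` on each later step. The pooled output of
    a step is the set of bits whose persistence is at least `threshold`.
    (Illustrative assumption, not an algorithm from the TBT papers.)
    """
    persistence = {}
    outputs = []
    for active in inputs:
        # Let everything seen so far fade a little.
        for bit in persistence:
            persistence[bit] *= decay
        # Bits active right now are refreshed to full strength.
        for bit in active:
            persistence[bit] = 1.0
        outputs.append({b for b, p in persistence.items() if p >= threshold})
    return outputs


# Three features of one object sensed in turn, then the first one again.
steps = [{1, 2}, {3, 4}, {5, 6}, {1, 2}]
pooled = pool_over_time(steps)
print(pooled[-2] == pooled[-1])  # the pooled output no longer changes
```

With these settings a bit survives two steps of inactivity (persistence 0.25) and drops out on the third (0.125), so the window length is set by `decay` and `threshold` together.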
Personally, I think the hex grid idea will lead to an algorithm for the “output layer” which best fits this purpose. Because the hex grids transmit information over long distances, they fill the role of formulating lateral connections for voting between cortical columns. Because their activity is impacted by a larger context involving multiple cortical columns, their activity will inherently be more temporally stable than activity in the “input layer”. This temporal differential will support pooling over multiple time steps.
Of course there may be simpler ways to implement the object pooling and voting functions. I’m just sharing one area that I have been interested in for a while.
The voting is a relaxation process where multiple puzzle pieces are tried in each locale, and the pieces that survive are the ones that fit best in the local neighborhood. If each locale learned some global pattern, then reconstructing that pattern (or a generalization) should be one of the lowest energy configurations.
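As a toy illustration of that relaxation, assume each column holds a set of candidate objects consistent with its own input, and that every other column counts as its neighborhood (both are simplifying assumptions, as is the function name `vote`). Each round, a column keeps only the candidates that the most neighbors also hold, and the process stops when nothing changes:

```python
def vote(candidates, max_iters=10):
    """Iteratively prune each column's candidates toward a consistent set.

    candidates: list of sets, one per column, of object names that are
    consistent with that column's local input.
    Returns the list of per-column candidate sets after relaxation.
    """
    state = [set(c) for c in candidates]
    for _ in range(max_iters):
        new_state = []
        for i, own in enumerate(state):
            # Support of a candidate = number of other columns that hold it too.
            support = {
                obj: sum(obj in other for j, other in enumerate(state) if j != i)
                for obj in own
            }
            best = max(support.values(), default=0)
            new_state.append({obj for obj, s in support.items() if s == best})
        if new_state == state:  # settled: no column changed its mind
            break
        state = new_state
    return state


columns = [{"mug", "bowl"}, {"mug", "can"}, {"mug", "bowl", "can"}]
print(vote(columns))  # every column ends up holding only "mug"
```

Here the "energy" is just the amount of disagreement between columns; a real implementation would vote on SDRs through lateral connections rather than on object names.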
Yes, or (as I am currently exploring) a Spatial-Temporal Pooler. Temporal pooling is necessary to recognize objects, because the object is sensed over multiple time steps.
Maybe, but I have been struggling to visualize how it would work. Where I always get stuck is that the order of inputs is not always the same. Unlike TM, the input layer gets its context from a location signal. The current theory says that this location signal is based on grid cells, and that the locations they represent are specific to the object. I think for that to work, there has to be a representation (or series of representations) for “the object” somewhere. These representation(s) need to be consistent while attending to the object, so they must be something more temporally stable than the inputs themselves.
Yes, one way to understand it in the context of TBT is probably to trace through the theory, starting from the Frameworks Paper. From the section “Rethinking Hierarchy, the Thousand Brains Theory of Intelligence”:
The reference to “Lewis et al. 2018” is the Columns Plus Paper, which goes into more technical detail on object recognition and grid cells. This paper discusses the method of voting between many sensory patches:
This “additional population of cells” is of course a reference to the output layer described in the Columns Paper. This is where the concept of lateral voting was introduced.
@dmac, I took a quick look at your “Prototype of Stability Mechanism for Viewpoint Invariance”. It appears that the input to the algorithm consists of the calculated input from the previous timestep plus the weighted sensory input in the current timestep, correct? In its current form (assuming I understand it correctly), the algorithm seems like it would be applicable to a single cortical column. We would probably need to extend it to allow voting across multiple cortical columns (maybe by adding output from other cortical columns as additional weighted input to the algorithm, for example).
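To show what I mean by that extension, here is a rough sketch. It is not @dmac's code; it only follows my reading of it as a running activation that mixes the previous value with weighted sensory input, and then adds the mean output of other columns as a third weighted term. The names `step` and `top_k` and the weights `retain`, `w_sensory`, and `w_lateral` are all assumptions.

```python
def step(prev, sensory, lateral=(), retain=0.7, w_sensory=0.3, w_lateral=0.2):
    """One update of a column's running activation (hypothetical sketch).

    prev    : list of floats, activation carried over from the last timestep
    sensory : list of 0/1 values, sensory input in the current timestep
    lateral : outputs of other columns, each a list of 0/1 values
    """
    lateral = list(lateral)
    out = []
    for i in range(len(prev)):
        value = retain * prev[i] + w_sensory * sensory[i]
        if lateral:
            # Average vote of the other columns for this cell.
            value += w_lateral * sum(col[i] for col in lateral) / len(lateral)
        out.append(value)
    return out


def top_k(activation, k):
    """Indices of the k largest activations (ties go to the lower index)."""
    order = sorted(range(len(activation)), key=lambda i: (-activation[i], i))
    return sorted(order[:k])


prev = [0.5, 0.5, 0.0, 0.0]
sensory = [0, 0, 1, 0]
others = [[0, 0, 1, 0], [0, 0, 1, 1]]
print(top_k(step(prev, sensory), 2))          # single column: old cells win
print(top_k(step(prev, sensory, others), 2))  # with votes: cell 2 gets in
```

In the example the lateral term is what tips cell 2 over the previously stable cells, which is the kind of influence voting across columns would need to have.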
Distant cortical columns are connected through distal dendrites, which the SP does not deal with. The “columns paper” has a prototype for distal dendrites, but I did not successfully combine the proximal & distal dendrites. I had meant to but then my research changed direction.
It is the most simplified experiment that still challenges these issues.
A moving agent equipped with more than one sensor explores a “2D object”.
The goal is to achieve a stable representation in the object pool.
I have written the base code in Python for working with htm.core.
I have used a grid cell encoder to encode the agent position; the location layer is now just this encoder.
See plots and results in the thread.
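For readers who have not used a grid cell encoder, here is a stripped-down sketch of the idea in plain Python. It is not the htm.core encoder: it uses square modules instead of hexagonal ones, one active cell per module, and hypothetical names (`grid_module_cell`, `encode_position`). Each module sees the position modulo its own period, so a single module is ambiguous, but several modules with different periods together give a distinct code.

```python
import math


def grid_module_cell(x, y, period, angle, cells_per_axis=4):
    """Index of the active cell in one square grid module (simplified)."""
    # Rotate the position into the module's own axes.
    u = x * math.cos(angle) + y * math.sin(angle)
    v = -x * math.sin(angle) + y * math.cos(angle)
    # Phase within one period along each axis, in [0, 1).
    phase_u = (u / period) % 1.0
    phase_v = (v / period) % 1.0
    col = min(int(phase_u * cells_per_axis), cells_per_axis - 1)
    row = min(int(phase_v * cells_per_axis), cells_per_axis - 1)
    return row * cells_per_axis + col


def encode_position(x, y, modules, cells_per_axis=4):
    """Set of active bit indices for (x, y); one active bit per module.

    modules: list of (period, angle) pairs, one per grid module.
    """
    cells = cells_per_axis ** 2
    return {
        m * cells + grid_module_cell(x, y, period, angle, cells_per_axis)
        for m, (period, angle) in enumerate(modules)
    }


modules = [(4.0, 0.0), (6.0, 0.0)]  # (period, orientation in radians)
print(encode_position(1, 1, modules))
print(encode_position(5, 1, modules))  # same bit in module 0, different in module 1
```

Moving by exactly one period of the first module leaves its active cell unchanged, while the second module's cell shifts, which is why the combined code still tells the two positions apart.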
How to implement the object pool, i.e. the Spatial-Temporal Pooler
How to implement distal input to the location layer (arrow no. 4 in the picture)
Any cooperation on current code is welcome.
Anybody can also write their own code from scratch; we can share findings and improvements.
I am now enhancing HTMpandaVis to work with data history, to be able to better see what is going on inside the layers. The primary mission of the vis is to get better insight into this project and push it forward.
The current code is in my fork.