“Thinking is a form of movement.” What directs the movement?
The 2019 paper “Locations in the Neocortex” suggests a fundamental benchmark for macrocolumn research: a rat (or mouse) navigating dark 2-D environments, where only features local to the animal's current location can be sensed. The mouse is first introduced to a set of environments, say 20 to 50, and has the opportunity to explore and learn each of them. Then it is dropped into a random environment at a random location. It orients itself by moving about in the environment and associating features it senses with what it previously learned. It eventually converges to a unique location within its environment: it is oriented.
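This convergence to a unique location can be pictured as candidate elimination over learned (environment, location) hypotheses: each sensed feature, together with the movement that follows it, discards the hypotheses it contradicts. A minimal sketch, in which the environments, feature names, and grid coordinates are all invented for illustration:

```python
# Hypothetical learned environments: each maps (x, y) locations to features.
ENVS = {
    "A": {(0, 0): "wall", (1, 0): "food", (0, 1): "water"},
    "B": {(0, 0): "wall", (1, 0): "wall", (0, 1): "food"},
}

def orient(observations):
    """Narrow down (environment, location) hypotheses from a sequence of
    (sensed_feature, movement) pairs, where movement is the (dx, dy)
    step taken after sensing."""
    # Start by hypothesizing every known (environment, location) pair.
    candidates = {(env, loc) for env, grid in ENVS.items() for loc in grid}
    for feature, (dx, dy) in observations:
        kept = set()
        for env, (x, y) in candidates:
            if ENVS[env].get((x, y)) == feature:   # consistent so far?
                kept.add((env, (x + dx, y + dy)))  # advance the hypothesis
        candidates = kept
    return candidates

# Sensing "wall", stepping right, then sensing "food" leaves a single
# hypothesis: environment "A" at location (1, 0) -- the mouse is oriented.
result = orient([("wall", (1, 0)), ("food", (0, 0))])
```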
A macrocolumn contains the learned information. As initial explorations take place, the macrocolumn learns spatial relationships among features belonging to each of the environments. A single macrocolumn can learn and hold multiple environments at the same time. This capability is demonstrated via simulations in the 2019 paper.
Say we have a macrocolumn that stores an environment as a directed displacement graph, as proposed by Lewis in his “Hippocampal Spatial Mapping As Fast Graph Learning” paper. The environment is stored in the synapses as a directed graph with labeled edges, each label giving the spatial displacement between two features. The graph is not complete (a direct edge between every pair of nodes would be costly), but it should be connected, so that at a minimum there is a multi-edge path from any feature to any other.
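Such a graph might be sketched as an adjacency structure whose edge labels are displacement vectors; summing the labels along a multi-edge path gives the net displacement between two features that share no direct edge. The feature names and vectors below are invented for illustration, not taken from the paper:

```python
# Directed displacement graph: feature -> {neighbor_feature: (dx, dy)}.
# Not complete (no direct edge between every pair), but connected:
# every feature reaches every other via some multi-edge path.
graph = {
    "nest":        {"wall_corner": (2, 0)},
    "wall_corner": {"water": (0, 3), "nest": (-2, 0)},
    "water":       {"cheese": (4, 1)},
    "cheese":      {"nest": (-6, -4)},
}

def path_displacement(graph, path):
    """Sum edge labels along an explicit multi-edge path, yielding the
    net spatial displacement from the first feature to the last."""
    dx, dy = 0, 0
    for a, b in zip(path, path[1:]):
        ex, ey = graph[a][b]
        dx, dy = dx + ex, dy + ey
    return (dx, dy)

# nest -> wall_corner -> water -> cheese accumulates (2,0)+(0,3)+(4,1).
offset = path_displacement(graph, ["nest", "wall_corner", "water", "cheese"])
```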
This macrocolumn can support three basic tasks:
- Exploration: It can learn environments through exploration;
- Orientation: When placed in a learned environment, it supports the orientation function;
- Navigation: After orientation has taken place, it supports navigation through the environment.
Regarding navigation: one of the features may be “cheese”, so if the mouse is placed in a learned environment in a “hungry” state, it can use the macrocolumn to navigate to the cheese. There may not be a single graph edge from its initial oriented location to the cheese, however, so at a minimum it can use some simple trial-and-error method to travel along a series of edges until it finds the cheese.
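That trial-and-error method could be as simple as a random walk along graph edges until the goal feature turns up. A sketch over the same kind of invented displacement graph (names and vectors are illustrative, not from the paper):

```python
import random

# Hypothetical displacement graph: feature -> {neighbor_feature: (dx, dy)}.
graph = {
    "nest":        {"wall_corner": (2, 0)},
    "wall_corner": {"water": (0, 3), "nest": (-2, 0)},
    "water":       {"cheese": (4, 1), "wall_corner": (0, -3)},
    "cheese":      {"nest": (-6, -4)},
}

def random_walk_to(graph, start, goal, rng, max_steps=1000):
    """Trial-and-error navigation: repeatedly follow a random outgoing
    edge until the goal feature is reached (or the step budget runs out).
    Returns the sequence of features visited, or None on failure."""
    path, node = [start], start
    for _ in range(max_steps):
        if node == goal:
            return path
        node = rng.choice(sorted(graph[node]))  # pick an edge at random
        path.append(node)
    return None

path = random_walk_to(graph, "nest", "cheese", random.Random(0))
```

A smarter agent would shorten such walks over time, which is where reinforcement learning becomes relevant.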
In the modeling work that I am doing, I employ an explicit, architected agent that generates the movements necessary to achieve the three basic tasks. The efficiency with which the three tasks are performed depends on the movements used to achieve them, and those movements are chosen by the agent. The agent is as important as the macrocolumn; neither would work without the other, and overall efficiency is determined by the quality of the agent.
So an agent is an essential part of the overall system, and implementing a biologically plausible agent becomes a research project in its own right. For example, one might use neurons to implement a plausible reinforcement learning method that can optimize (shorten) paths to the cheese. Or the initial exploration phase might be part of an overall optimization plan to reduce path length. An agent might re-invoke exploration from time to time so that the short path to the cheese can evolve.
Where in the biological brain is this agent functionality performed? It is not performed by the macrocolumns as described above.
Q-Learning is a classic method for implementing reinforcement learning. It is built around a data structure known as the Q-Table. During exploration, an agent moves through an environment, receiving rewards and punishments as it goes, and the outcomes of many exploratory episodes are recorded as updates to the Q-Table's entries. During exploration (at least in its purely random form), the Q-Table only takes in the information provided to it; it does not affect the exploration path. Then, after sufficient exploration, the agent may consult the Q-Table to decide on movements. In the classic algorithm, the agent simply takes the action with the highest Q-value for its current state. However, the agent can use the Q-Table in any way it sees fit and can even ignore what the Q-Table “suggests” in favor of some heuristic-based move.
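The division of labor described above can be made concrete with a tiny tabular example: a one-dimensional corridor with a reward at the right end. The corridor size, reward, and learning parameters are invented for the sketch; exploration here is purely random, so the table records outcomes without steering the exploratory moves:

```python
import random

N = 5                 # corridor states 0..4; reward only on reaching state 4
ACTIONS = (-1, +1)    # move left / move right
ALPHA, GAMMA = 0.5, 0.9

# The Q-Table: Q[state][action_index], the data structure the agent
# later consults when deciding on movements.
Q = [[0.0, 0.0] for _ in range(N)]

rng = random.Random(0)
for _ in range(500):                              # exploratory episodes
    s = 0
    while s != N - 1:
        a = rng.randrange(2)                      # purely random move choice
        s2 = min(max(s + ACTIONS[a], 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0
        # Classic Q-Learning update: the table absorbs the outcome
        # but plays no part in choosing the exploratory move above.
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# After exploration the agent may use the table greedily: in every
# interior state, "move right" (index 1) now has the higher Q-value.
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(N - 1)]
```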
Macrocolumns may play a role similar to the Q-Table in Q-Learning methods. That is, they are large (sophisticated) data structures that support functions (exploration, orientation, navigation) under the direction of an agent.
Say the metaphorical space aliens come to earth and examine a state-of-the-art microprocessor. Most of what they will see is SRAM – multi-level on-chip caches and predictors. And they may observe that the more SRAM, the better the performance (sometimes by huge amounts depending on working set sizes).
They would then endeavor to discover how SRAM works, motivated by the belief that it is the most important part of the computer. It is an essential part, to be sure, but one can argue that the CPU is where the real magic takes place – it uses SRAM as a large data structure to support its operation.
What all this may mean
An overall research approach is to co-develop macrocolumn architecture and agent architecture.
Given a working macrocolumn as described above, there are (at least) two major research directions. One is to lash together multiple macrocolumns to form a region that can be used for achieving higher level objectives. The Numenta group uses lateral connections to implement a form of distributed consensus (“voting”) amongst groups of macrocolumns (see 04/05/2022 Numenta Research Meeting video).
The other direction is to pursue biologically plausible optimized agents, with emphasis on plausible reinforcement learning methods. This can give insight regarding the capabilities that macrocolumns should provide. And advanced agents will be essential for demonstrating the capabilities of human-engineered neocortices as they develop.