Generalizing the Numenta Neuron

I have recently been trying to imagine how generally intelligent structural phenomena, such as hierarchy, slowly mutating protocols, and an affinity toward increased choice, can flow out of the raw elements of networks themselves (or out of the mathematical principles of graph theory).

It's got me thinking about how the Numenta model of a neuron differs from the neurons used in neural networks today.

As I understand it (and I only understand what I have accumulated in passing), the Numenta neuron has proximal and distal connections which are treated differently; mainly, they have different effects on whether the cell fires or not. Moreover, the Numenta neuron has essentially three states instead of two: inactive, active, and predictive. If this summary is not accurate, please let me know.
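For concreteness, here is a minimal sketch of that summary as I understand it. The class, the string state names, and the threshold values are all my own placeholders, not Numenta's actual implementation:

```python
# A minimal sketch of my understanding of the Numenta neuron.
# All names and thresholds here are my own placeholders.

class HTMNeuronSketch:
    """Toy three-state neuron: inactive, active, or predictive."""

    def __init__(self, proximal_threshold=8, distal_threshold=12):
        self.proximal_threshold = proximal_threshold  # feedforward drive needed to fire
        self.distal_threshold = distal_threshold      # contextual drive needed to predict
        self.state = "inactive"

    def step(self, proximal_active, distal_active):
        """Take counts of currently firing inputs on each connection type."""
        if proximal_active >= self.proximal_threshold:
            # Proximal (feedforward) input alone can drive the cell to fire.
            self.state = "active"
        elif distal_active >= self.distal_threshold:
            # Distal (contextual) input cannot fire the cell by itself;
            # it only depolarizes it, putting it into a predictive state.
            self.state = "predictive"
        else:
            self.state = "inactive"
        return self.state

n = HTMNeuronSketch()
print(n.step(proximal_active=10, distal_active=0))   # active
print(n.step(proximal_active=0, distal_active=15))   # predictive
```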

(TL;DR:)
It occurred to me last night that the Numenta model could be generalized as an 'activation weighting over distance' function. That is, a neuron has three signatures (sketched in code just after this list):

  1. a number of other neurons it listens to,
  2. a signature (distance ratios) of how many far-away vs. nearby neurons it listens to, and
  3. a signature (distance weights) of how much effect those neurons have, by distance, on whether this neuron fires. (That is, weights indicating how many neurons must fire at the same time, at each distance and combination of distances, to activate this neuron.)
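Pulling those three signatures together, here is a rough sketch of the 'activation weighting over distance' idea. The function names, weight curve, and threshold are placeholder assumptions of mine:

```python
# Sketch of a neuron as an 'activation weighting over distance' function.
# The weight curve and threshold below are placeholder assumptions.

def fires(inputs, distance_weight, threshold=1.0):
    """inputs: (distance, is_firing) pairs for every neuron this cell
    listens to -- signatures 1 and 2 are implicit in this list.
    distance_weight: the contribution of one firing neuron at a given
    distance -- signature 3."""
    drive = sum(distance_weight(d) for d, firing in inputs if firing)
    return drive >= threshold

# One possible weighting: nearby inputs count heavily (proximal-like),
# distant inputs only weakly (distal-like).
proximal_like = lambda d: 1.0 / (1.0 + d)

print(fires([(1, True), (1, True), (5, True)], proximal_like))  # True
```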

Allow me to explain:

I'm first of all working off the assumption that anything can be modeled by/as a network, which I think we should take as an initial assumption because we use a physical network (the brain) to understand all that we understand. Thus we actually know that anything we can model can be modeled by/as a network (anything we can't model, we have no choice but to ignore).

Any generic network has two parts, of course: nodes and edges. In a fully connected network, every node listens to every other node. Most networks are not fully connected, though, and therefore each neuron has some signature (distance ratios) of how far away the neurons it listens to are.

Let's talk about this distance metric. I've shown this image on this forum before but I think it does a good job of showing distance:

A is close to C, B, and D, but pretty far away from E. Perhaps you ask, "Why? A has a direct connection to all of them; shouldn't they be equally distant with respect to A?" In a sense, yes, but the distance metric I mean to highlight is 'interconnectedness.' If A didn't have a connection to D, it would mean very little, because A can get to D pretty quickly by going through B. However, if A didn't have a direct connection to E, it would have to make quite the journey to get at the information disseminating from E; there are six nodes in between A and E without that direct connection.

So you can see that the "distance signature" of any node, meaning how many nodes it connects to that are far away vs. how many are nearby, is highly affected by how many nodes it connects to and by how many nodes the average node connects to, and furthermore by the "distance signatures" of the specific nodes it connects to.
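Here is one way to make both ideas concrete, using networkx for brevity. The graph below is my own stand-in for the image, wired to match the description above:

```python
# Stand-in for the image above (my own wiring, chosen to match the
# description: A connects directly to B, C, D, and E, but only E is
# "far away" in the interconnectedness sense).
import networkx as nx
from collections import Counter

G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),  # A's direct links
    ("B", "C"), ("B", "D"),                          # local redundancy
    ("D", "n1"), ("n1", "n2"), ("n2", "n3"),         # the long way
    ("n3", "n4"), ("n4", "n5"), ("n5", "E"),         # around to E
])

def interconnectedness(g, u, v):
    """Distance between u and v if their direct edge were removed,
    i.e. how costly losing that one connection would actually be."""
    h = g.copy()
    if h.has_edge(u, v):
        h.remove_edge(u, v)
    return nx.shortest_path_length(h, u, v)

def distance_signature(g, node):
    """Histogram of interconnectedness-distances to every node
    this node listens to."""
    return Counter(interconnectedness(g, node, nbr) for nbr in g.neighbors(node))

print(interconnectedness(G, "A", "D"))  # 2: an easy detour through B
print(interconnectedness(G, "A", "E"))  # 7: six nodes in between
print(distance_signature(G, "A"))       # Counter({2: 3, 7: 1})
```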

What does a network with highly variable distance signatures for its nodes end up looking like in the end? A brain with different "cell types." That is why I wonder if this metric, along with an activation signature over distance, is perhaps, in basic terms, the way to generalize the Numenta (or brain) model of a neuron.

I'm suggesting (and I'd like to know if I'm up in the night or onto something) that with perhaps only three dials you can generate a network of any structure, or repeating structure, possible. Those dials or parameters being the signature of each cell:

  1. connection variable: how many other cells does this cell listen to?
  2. distance signature: what is the ratio of distances of the nodes it listens to? (What is its affinity for listening to distant nodes?)
  3. weighting signature: by distance, how much does an activation count toward activating this node?

All of these metrics can be expressed in terms of the metrics of the nodes each node connects to; they don't have to be constants. In this way, can't you simply specify a distribution by percentage and determine the shape the network would take?

Can't you say, "Give me a network where x% of nodes have this parameter signature, y% have this parameter signature, and z% have this parameter signature," and produce a network that attempts to come nearest those signatures and those percentages, given the actual data provided by the environment? Doesn't it feel like, by giving these metrics, you define a sphere in the state space of the network's shape that it attempts to revolve around?
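As a toy illustration of what such a specification might look like: the sketch below assigns signatures to nodes by percentage and wires them up accordingly. The ring distance is a crude stand-in for real interconnectedness, and every number and the acceptance rule are placeholders of mine:

```python
# Toy generator for "x% of nodes get signature S1, y% get S2, ...".
# Ring distance is a crude stand-in for real interconnectedness;
# all numbers below are placeholders.
import random

random.seed(0)
N = 100

# Each signature: (fraction of nodes, fan-in, preference for far nodes 0..1).
signatures = [
    (0.70, 5,  0.1),  # 70%: few connections, mostly local
    (0.25, 15, 0.5),  # 25%: medium fan-in, mixed reach
    (0.05, 40, 0.9),  # 5%: hub-like cells that mostly listen far away
]

# Assign a signature to each node according to the percentages.
assignment = []
for frac, fan_in, far_pref in signatures:
    assignment += [(fan_in, far_pref)] * round(frac * N)
random.shuffle(assignment)

edges = set()
for node, (fan_in, far_pref) in enumerate(assignment):
    while sum(1 for e in edges if node in e) < fan_in:
        other = random.randrange(N)
        if other == node:
            continue
        ring_dist = min(abs(node - other), N - abs(node - other))
        is_far = ring_dist > N // 4
        # Accept far candidates with probability far_pref, near ones otherwise.
        if random.random() < (far_pref if is_far else 1 - far_pref):
            edges.add(frozenset((node, other)))

print(len(edges), "edges among", N, "nodes")
```

A more faithful version would measure distance as interconnectedness on the growing graph itself rather than as position on a ring, but the shape of the specification is the same.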

Anyway, I know this is quite the tangential theory, but I'm just trying to find a way to generalize the Numenta neuron, because I think by doing so you might be able to express the possibility space of the shape of the whole network most generally (which in turn would help with automatically generating AGI structures).

What are your thoughts?


I can't say it better than this quote from @Paul_Lamb regarding the hex-grid model:

"For some time it has been clear to me that distributed semantics can be distilled from an input stream in two ways ā€“ spatially (relating two or more bits which are physically distant from each other in a single input of the stream) and temporally (relating two or more bits which are active in separate inputs of the stream).

In order to bring together these relationships in a hierarchy, it is necessary to "pool" them in some way. Classic HTM has addressed half of the problem via the Spatial Pooler algorithm. My quest for the last three years has been to address the other half of the problem by developing a Temporal Pooler algorithm. I have developed a few TP algorithms, but so far all have had flaws and fallen short of the concept that I have in my mind for how TP should work.

**Performing both SP and TP in the same algorithm is a new epiphany for me**, so feeling like I am at least on the right track now."

I bolded the part most relevant to the current thread.


Can you make it more explicit how this quote connects to what I've described above? I'm not seeing a specific connection, though I get the feeling it's talking about generally the same ideas. What triggered the memory of this quote for you?


"Can you make it more explicit how this quote connects to what I've described above?"

Certainly.

In a meta sense, over and above the connectivity details (I will come back to this), what are all these connections trying to do? Why are they connected?

I start with the assertion that the network is looking for patterns in time and space.
Coincidences, parsed from the micro level to the macro level along some continuum, hopefully with some useful semantic meaning.

Bits that are in some spatial arrangement or in some temporal arrangement. Preferably, a continuum where you could think of matching up this bit pattern in this place at this time with that bit pattern at that time.

Spatially, we have the SDRs, which are some pattern of bits on one tiny segment of one dendrite over a relatively small window of time. HTM establishes a time element, which is THIS SDR before THAT SDR.
One SDR pattern on the apical dendrite primes the prediction that some SDR will be matched on the dendrites at the cell soma. This is a time/space match, but it is still on a very small scale.

The voting mechanism expands the number (size) of time/space patterns that can be matched in space. The Thousand Brains theory expands this to a fabric of cells that vote laterally. As time progresses, the sheet of matching TM cells marches in goose-step with the sensed pattern. As a very useful side effect, the spatial extent of this match is signaled down to the thalamus as activation of the tonic mode.
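As a cartoon of that lateral voting (my own toy illustration, not Numenta's actual algorithm): each column keeps the set of patterns consistent with its own patch of input, and voting keeps only what all columns can agree on.

```python
# Toy lateral voting: each column holds the candidate patterns
# consistent with its local input; lateral votes intersect them.
columns = [
    {"coffee mug", "bowl", "soda can"},  # column 1's candidates
    {"coffee mug", "soda can"},          # column 2's candidates
    {"coffee mug", "bowl"},              # column 3's candidates
]

consensus = set.intersection(*columns)
print(consensus)  # {'coffee mug'}: the match that survives voting
```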

In the hex-grid pattern, the voting mechanism is proposed to be more of an organized pattern, and the stable hex-grid is both spatial and temporal pooling. It goes from a collection of tiny time/space patterns to a more stable hex-grid pattern that covers a larger space within the map/region. The other cells that did not win the competition are suppressed, so there is maximum separation between the winning and non-winning patterns. The ratio of lateral axon length to basket cell size sets the spatial extent of the hex-grid stride. You drew a fixed connectivity graph; I say that this is a dynamic connection pattern that is sized to just straddle the typical apical arbor's sensing span.

This pooled TM & SP pattern is formed with reciprocal connections to the next map/area in parallel with other maps so that coincidences between these two maps/areas can be computed. This extends the time/space correlation to a much larger portion of the cortical sheet, and likely, in concert with an area that has extracted some other semantic sense of the incoming sensory stream.

So: you are trying to work out the connectivity as a fine-pitched graph, as if that is what is repeated at all scales. I am thinking of this from the micro view (single cell) all the way up to massive numbers of cells in multiple regions. There are multiple mechanisms depending on the scale of the representation. The Numenta TM cell is the basic building block of the TM part of the system, the lateral voting mechanism connects cells within an area/map, and reciprocal map-to-map connections form the large-scale connections using layer 2.

Opinions vary on the details of the lateral voting. I say that it is roughly fixed-length connections in a rather stereotyped pattern.

What triggered my memory of Paul's quote was the line between the micro-graphs you are drawing and the next logical unit of lateral voting with SP & TM pooling. This is transmitted as the active nodes to the next map to repeat the process. With a distributed representation, the connections never get longer than about 10 mini-columns apart, and this mini-column never knows what a cell 20 mini-columns away might be thinking, only that it is seeing (or not) something it has learned in the past, so some larger pattern of resonance is formed.


Don't worry, Einstein went through the same thing. :slight_smile: Can we start calling it the Spacetime Pooler?
