Connecting Layers in HTM.core

Hey Everybody,

I am currently working on a ‘mouse in the box’ experiment with htm.core. That is, a ‘mouse’ moves randomly in a box and has vision sensing.

I encode and pool vision both spatially and temporally to create an L4 layer, and do the same for motion to create an L6a layer.

Numenta’s papers and videos theorize the following about a specific flow of information in the cortical column:

  1. Encode motor input, pool to L6a
  2. Pass information from L6a to L4
  3. Encode sensory input, pool to L4
  4. Pass information from L4 to L6a

My question deals with steps 2) and 4). How exactly are these steps enacted? I initially assumed that calling the tm.compute() method with externalPredictiveInputsActive and externalPredictiveInputsWinners could do the job, but I am not quite satisfied with the result.

How does one pass context between layers? How would one do this between layers L4 and L2/3?


Yes, the externalPredictiveInputs are the correct way to pass contextual inputs to the TM.
In practice it can be a pain to tune the parameters to work well.


Numenta experiments with L4 and L2/3 have used the “Apical Tiebreak Temporal Memory” (originally called Extended TM) and “Column Pooler” algorithms, which are in htm.core’s advanced folder.


Thanks for the quick feedback :slight_smile:

To students of HTM school, and others here even slightly interested in the theory, it may be obvious that I am trying to replicate the mouse grid cell experiment of Moser and Moser. If such a… simple (heh) experimental result can be duplicated in some sense using HTM.core, I would feel much more confident in the architecture everyone discusses here.


The (two-dimensional) experiment so far:

One single Cortical Column

Feedforward input to L4 and L6a:
The ‘mouse’ casts rays to the walls of a ‘box.’ Each ray returns the distance to the wall. That distance, plus the egocentric angle of the ray, are encoded. The encoded input of all rays is concatenated into a single ‘sensory SDR’ and pooled to layer 4.
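For concreteness, here is a toy sketch of that concatenation. The one-hot bucket encoder below is a made-up stand-in for htm.core’s real scalar encoders (e.g. RDSE); only the concatenate-per-ray structure is the point.

```python
import math

def encode_scalar(value, min_v, max_v, size=64, active=8):
    """Toy bucket encoder: a contiguous run of `active` bits whose
    position encodes `value` within [min_v, max_v]."""
    span = size - active
    start = round((value - min_v) / (max_v - min_v) * span)
    bits = [0] * size
    for i in range(start, start + active):
        bits[i] = 1
    return bits

def encode_rays(rays, max_dist=100.0):
    """Each ray is (egocentric_angle_rad, distance); encode both and
    concatenate everything into one flat 'sensory SDR'."""
    sdr = []
    for angle, dist in rays:
        sdr += encode_scalar(angle, -math.pi, math.pi)
        sdr += encode_scalar(dist, 0.0, max_dist)
    return sdr

rays = [(-math.pi / 4, 12.5), (0.0, 30.0), (math.pi / 4, 12.5)]
sensory_sdr = encode_rays(rays)
# 3 rays x 2 encoded scalars x 64 bits each = 384 bits total
```

The concatenated SDR would then be fed to the spatial pooler for L4.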

The mouse also moves forward at a constant speed. Its x-y direction of motion is pseudo-randomly generated using Perlin noise. Every ten timesteps, the egocentric distance travelled, as well as the change in ‘head direction’, are encoded and pooled to layer 6a.
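Roughly, the motion side looks like this (a bounded random walk on the heading stands in for Perlin noise here, and the encoding step itself is elided):

```python
import math
import random

random.seed(42)

def smooth_heading(prev_heading, step=0.15):
    """Stand-in for Perlin noise: a bounded random walk on the heading
    angle, so the direction of motion changes smoothly."""
    return prev_heading + random.uniform(-step, step)

speed = 1.0          # constant forward speed per timestep
heading = 0.0
x = y = 0.0
last_heading = heading
travelled = 0.0

motion_samples = []  # (distance_travelled, heading_change), every 10 steps
for t in range(1, 101):
    heading = smooth_heading(heading)
    x += speed * math.cos(heading)
    y += speed * math.sin(heading)
    travelled += speed
    if t % 10 == 0:
        motion_samples.append((travelled, heading - last_heading))
        travelled = 0.0
        last_heading = heading
# each sample would then be encoded (like the sensory inputs above)
# and pooled to layer 6a
```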

Contextual Input between L4 and L6a:
In intermediate steps, the external active cells and winners from L6a are passed to L4, and the activation of L4’s temporal memory cells is computed on L4’s active columns; and vice versa for L4 to L6a.
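To make the ordering concrete, here is a pure-Python sketch of that handshake. StubTM is only a stand-in for htm.core’s TemporalMemory (whose compute() accepts externalPredictiveInputsActive and externalPredictiveInputsWinners); what matters here is the call order, not the internals.

```python
class StubTM:
    """Minimal stand-in for a temporal memory layer, just to show the
    order in which the two layers exchange context each timestep."""
    def __init__(self, name):
        self.name = name
        self.active_cells = set()
        self.winner_cells = set()

    def compute(self, active_columns, external_active, external_winners):
        # A real TM would grow and activate distal segments against the
        # external cells; here we just record a tagged activation.
        self.active_cells = {(self.name, c) for c in active_columns}
        self.winner_cells = set(self.active_cells)

l4, l6a = StubTM("L4"), StubTM("L6a")

def timestep(sensory_columns, motor_columns):
    # 1) motor context first: L6a computes with L4's *previous* cells
    l6a.compute(motor_columns, l4.active_cells, l4.winner_cells)
    # 2) then L4 computes with the cells L6a just produced
    l4.compute(sensory_columns, l6a.active_cells, l6a.winner_cells)

timestep({1, 2, 3}, {7, 8})
```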


I had been hoping to get some semblance of grid cells in L6a with this architecture. The results are interesting, but I haven’t yet achieved the main goal.

At this point I am considering extending to an object layer, L2/3. I am unsure, though, with which active columns to compute L2/3 temporal memory cell activation. Those from L4 (i.e. L4’s spatial pooler)? Perhaps I should spatially and temporally pool the active cells of L4…

My other option is to begin tweaking the parameters of my current L4/L6a layers.
Play with topology a bit perhaps.

My other idea is to introduce some form of ‘motivation’ or ‘incentive’ for the mouse, rather than random motion. I suspect adding this concept to the simulation may introduce, let’s say, local maxima to cell activation.

…but I don’t know, so I figured I would run it by the community. See what you all think :slight_smile:


If I recall correctly, L5 is where you should be computing movement output. For a small network like this, it should probably suffice to associate each neuron in L5 with a small delta of agent motion. Simply sum the deltas from each active neuron to get the cumulative motion of the agent. An efference copy of these signals should also be sent to L6 so that it can learn to anticipate changes in its position from these motor signals.
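For example (the delta table here is made up, just to show the summation):

```python
# Hypothetical mapping from each L5 neuron to a small motion delta
# (dx, dy); the agent's movement is the sum over the active neurons.
motion_deltas = {
    0: (0.1, 0.0),
    1: (0.0, 0.1),
    2: (-0.1, 0.0),
    3: (0.0, -0.1),
}

def cumulative_motion(active_l5_neurons):
    dx = sum(motion_deltas[n][0] for n in active_l5_neurons)
    dy = sum(motion_deltas[n][1] for n in active_l5_neurons)
    return dx, dy

# neurons 0 and 1 active: the agent moves diagonally up and right
dx, dy = cumulative_motion({0, 1})
# the same (dx, dy) would also be sent to L6 as the efference copy
```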

I made a model of grid cells using a modified spatial pooler, you can read about it here: Video Lecture of Kropff & Treves, 2008

This model used much simpler inputs than yours: instead of having vision & head direction, it just has “location-cell” inputs.

So I have yet to take a deep dive into the advanced sections of htm.core, partially out of fear of its complexity, but also out of a reluctance to complicate the algorithm.

> This model used much simpler inputs than yours: instead of having vision & head direction, it just has “location-cell” inputs.

One major goal I have is to observe the formation of grid cells using completely egocentric inputs. I don’t know what goes into a ‘location cell’ input. Would you consider it egocentric?

In my current iteration, I would say that movement is automatically generated. I have not taken the time to write movement output into the cortical column.

I would consider L6a to be the interpreter of movement input, that is, ‘how much have I moved since the last iteration?’ ‘Where do I want to move?’ would be layer 5.

Perhaps this assumption is incorrect, though.

No, the model does not include sensory inputs. I should have said “place cells” instead of “location cells”, my bad. Place cells are a type of cell in the hippocampus: Place cell - Wikipedia

I think your assumption is basically correct. The following might be a reasonable approach to a sensor-motor feedback loop.

L5: Based on my current state, I will move my sensor by some small delta. I will send this information to the motor execution area, and to L6.

L6: Based on my current state, and the input from L5, I will update my estimate of where I am at in my environment. I will send this information to L4.

L4: Based on my current state and context provided by L6 and L4 from other nearby columns, I will make a prediction about what I expect to see next. If I am surprised by what I see my column will burst. Otherwise, my predicted neuron(s) will activate.

L5: If a column bursts, then I will preferentially update my delta movement to direct the sensor towards the location of the novel input. Otherwise, my neurons will fire with some predetermined amount of randomness. (Noisy leaky neurons will help keep the sensor from getting stuck.)
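A toy version of that loop, with made-up stand-ins for prediction and bursting (a one-dimensional sensor position, and "burst" meaning a seen column that nothing predicted), might look like:

```python
def step(position, predicted, seen, noise=0):
    """One pass of the L5 policy sketched above: steer toward novel
    (bursting) input, otherwise wander with injected noise."""
    bursting = seen - predicted      # columns that fired unpredicted
    if bursting:
        # L5: steer toward the novel input (here: toward the lowest
        # bursting column index, purely illustrative)
        delta = -1 if min(bursting) < position else 1
    else:
        # L5: predictions held, so move by the "noisy leaky neuron"
        # jitter supplied by the caller
        delta = noise
    # L6 would receive an efference copy of delta and update its
    # position estimate accordingly
    return position + delta

pos = step(5, predicted={4, 5}, seen={4, 5}, noise=1)  # no burst: drifts to 6
pos = step(pos, predicted={4, 5}, seen={2, 5})         # column 2 bursts: steers back to 5
```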