Hypothesis for voting in output layer cells

Background: In Numenta's 2017 paper "A Theory of How Columns in the Neocortex Enable Learning the Structure of the World", they describe an output layer where stable representations of objects are formed, and explain how output layer cells communicate via distal dendrites with the goal of determining which of many possible objects they are sensing.

I hypothesize that the output layer's distal dendrites could be implemented using a temporal memory.

I have not tested this idea, but I thought I would put it out here because other people have been discussing this topic.


One possible flaw with my hypothesis is that there is no competition among the distal dendrite activity. In Numenta's algorithm, the cells compete to activate on the basis of how many active distal segments they have.

One solution might be to implement a competition inside the mini-columns of the TM.
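To make the idea concrete, here is a minimal sketch of what a per-minicolumn competition might look like. This is purely hypothetical: the function name, the data structures, and the winner rule (keep the cell(s) with the most active distal segments) are my assumptions, not anything from Numenta's code.

```python
# Hypothetical sketch: competition inside TM minicolumns based on
# distal segment activity. Not Numenta's actual implementation.

def minicolumn_winners(minicolumns, active_segment_counts):
    """For each minicolumn (a list of cell ids), keep only the cell(s)
    with the most active distal segments. Cells with no active
    segments never win."""
    winners = set()
    for cells in minicolumns:
        best = max(active_segment_counts.get(c, 0) for c in cells)
        if best > 0:
            winners.update(
                c for c in cells if active_segment_counts.get(c, 0) == best
            )
    return winners

# Example: two minicolumns of three cells each; cell 1 dominates its
# minicolumn, cells 3 and 4 tie in theirs.
minicolumns = [[0, 1, 2], [3, 4, 5]]
counts = {0: 2, 1: 5, 3: 1, 4: 1}
print(minicolumn_winners(minicolumns, counts))  # {1, 3, 4}
```

Ties are kept here rather than broken randomly; a real implementation would need to decide how to handle that case.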


IMO, I don’t think this is necessary. If we leverage what we’ve learned from vanilla HTM, we would set up the system such that predictions are always tested against reality. With that in mind, the driving input should always be derived from reality, allowing wrong predictions to be quickly eliminated via the TM algorithm.


What do you mean by output layer? Are you referring to the output of the temporal pooling algorithm? Or perhaps the more recent voting mechanism?


(references for background): the 2017 experiments reported in the “columns” paper used Numenta’s “column pooler” algorithm for the “output layer”. This has proximal segments connecting to (apical tiebreak) temporal memory output, and lateral segments for inter-column (cortical column) “voting”; CP distal segments are for intra-column connections. Column pooler output provided the input to TM apical segments.


HTM-scheme re-implemented the apical tiebreak temporal memory (“ATTM”) and column pooler algorithms and replicated Numenta’s figures 3B/C, showing faster convergence with 3 cortical columns voting on object representation.

I’m currently looking at extending ATTM to replace CP as the output layer.


@rogert I am interested in knowing what modifications you made to the original CP and ATTM, and why they improve object classification.
Could you please give me more information?


You're definitely correct for a single cortical area, but I worry that multiple areas could allow wrong answers to persist.


I might be thinking of it wrong, but I believe more CCs would lead to faster convergence on the right answer, not slower. I think I actually have an old experiment using htm.js set up the way I described in this post. I'll see if I can dig it up and post a link when I get back home from vacation next week.


In the text of the paper they describe how the output layer is supposed to
work. Notice the similarities between this and the temporal memory algorithm.

Output Layer (Page 3)

The output layer also contains HTM neurons. The set of
active cells in the output layer represents objects. Cells in the
output layer receive feedforward driver input from the input
layer. During learning, the set of cells representing an object
remains active over multiple movements and learns to recognize
successive patterns in the input layer. Thus, an object comprises
a representation in the output layer, plus an associated set of
feature/location representations in the input layer.

The modulatory input to cells in the output layer comes from
other output cells representing the same object, both from within
the column as well as from neighboring columns via long-range
lateral connections. As in the input layer, the modulatory input
acts as a bias. Cells with more modulatory input will win and
inhibit cells with less modulatory input. Cells representing the
same object will positively bias each other. Thus, if a column
has feedforward support for objects A and B at time t, and
feedforward support for objects B and C at time t+1, the output
layer will converge onto the representation for object B at time
t+1 due to modulatory input from time t. Similarly, if column 1
has feedforward support for objects A and B, and column 2 has
feedforward support for objects B and C, the output layer in both
columns will converge onto the representation for object B.

Network Convergence (Page 6)

As discussed earlier, the representation in the output layer
is consistent with the recent sequence of sensed features
and locations. Multiple output representations will be active
simultaneously if the sensed features and locations are not unique
to one particular object. The output converges to a single object
representation over time as the object is explored via movement.

A Theory of How Columns in the Neocortex Enable Learning the Structure of the World.
Hawkins J, Ahmad S and Cui Y (2017)
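The A/B/C convergence described in the quoted text can be reduced to set intersection over candidate objects. This is a deliberate simplification of mine (real representations are SDRs whose unions encode multiple candidates), but it captures the logic of both the within-column temporal case and the across-column voting case:

```python
# Simplified illustration of the convergence described in the paper:
# candidate-object sets shrink by intersection, whether the supports
# come from successive timesteps or from neighboring columns.
from functools import reduce

def converge(supports):
    """Intersect a sequence of candidate-object sets."""
    return reduce(lambda a, b: a & b, supports)

# Within one column over time: support {A, B} at t, {B, C} at t+1.
print(converge([{"A", "B"}, {"B", "C"}]))  # {'B'}

# Across columns at one time: column 1 supports {A, B, C},
# column 2 supports {B, C}, column 3 supports {B, D}.
print(converge([{"A", "B", "C"}, {"B", "C"}, {"B", "D"}]))  # {'B'}
```

The intersection can only shrink, which is why exploring more of the object (or adding more columns) converges on a unique answer when the feature-locations are ambiguous individually.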


I think it’s explained more clearly in Tutorial and Discussion on Cortical Column Voting Mechanisms Developed by Numenta - 5 April, 2022 - YouTube. It’s long so I’ll try to summarize.

Purpose / Scenario

  • An object is a set of features, each at a location on the object. The output layer receives an SDR which represents the feature in context of the location.

  • Each cortical column receives features sensed by e.g. a fingertip, or whatever patch of the sensor.

  • Initially, it has no clue what the object’s identity is. Over time, the fingertips (or whatever) sense different parts of the object. The output layer contains the set of objects consistent with the feature-locations sensed by the fingertips.


  • The neuron is slightly different from the temporal memory neuron.

  • To learn an object, there’s a learning mode. For the object, a random set of cells in the output layer are forced to stay on. They learn the feedforward inputs (feature-location), which are from the cortical column’s corresponding fingertip. As a result, when the feature-location is part of the object which a given output layer cell represents, that cell receives feedforward input.

  • Feedforward input is just one type of bias towards firing. The cells with the highest bias fire. For example, if the highest bias score is 4, only the cells with that score fire. It’s not top-k like in the spatial pooler.

  • Another bias is whether the cell fired last timestep. There’s some persistence.

  • So with the feedforward bias and the persistence bias, here’s what happens. When it starts trying to identify an object, it receives feedforward input (feature-location). That input is consistent with a bunch of different objects, and all the output layer cells which represent those objects fire. Then, the second feedforward input arrives, and the output layer cells only fire if they represent an object consistent with that input AND the previous input. That process continues, disambiguating the object.

  • The last bias is for voting between cortical columns. Each output layer cell has a dendritic segment for each cortical column. It learns lateral inputs which represent the same object. Each dendritic segment adds 1 to the bias score, so columns vote on the object. If a column thinks it’s either object A or B, and another column thinks it’s B or C, then object B wins.
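Putting the three biases together, here is a toy sketch of the firing rule as I understand it from the video summary above. The scoring weights and names are my assumptions; the key point it illustrates is that only the cells at the *maximum* bias score fire, rather than a top-k selection as in the spatial pooler:

```python
# Hypothetical sketch of the bias rule summarized above: each cell sums
# a feedforward bias, a persistence bias, and one vote per active
# lateral (inter-column) segment; only cells at the max score fire.

def active_cells(feedforward, fired_last_step, lateral_votes, cells):
    scores = {}
    for c in cells:
        score = 0
        score += 1 if c in feedforward else 0       # feedforward driver bias
        score += 1 if c in fired_last_step else 0   # persistence bias
        score += lateral_votes.get(c, 0)            # one per voting column
        scores[c] = score
    best = max(scores.values())
    # Max-score selection, not top-k: every cell at the peak fires.
    return {c for c in cells if scores[c] == best and best > 0}

cells = ["a", "b", "c"]
# "b" has feedforward support, fired last step, and two columns vote for it;
# "a" only has feedforward support; "c" only has one lateral vote.
print(active_cells({"a", "b"}, {"b"}, {"b": 2, "c": 1}, cells))  # {'b'}
```

With this rule, a cell representing the object most columns agree on accumulates the highest score and suppresses the alternatives, which is the "column 1 says A or B, column 2 says B or C, so B wins" behavior.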

There are some interesting questions, because it’s incomplete. They have to be able to make something, so the learning mode and location signal might just be placeholders.

The location signal is supposed to be relative to the object, but the system doesn't yet know what the object is. In research meetings (and maybe papers too), they describe the location signal as itself ambiguous until it figures out exactly which object it is on.

L5tt cells seem to burst both to detect a stimulus, and to convert to a representation of location (described in a, b). That’d be an anchoring signal, initializing the set of possible objects. It’s attentional, so significance matters. So maybe instead of objects, things of significance. Those things could be sequential in nature. So maybe some sort of mix of the output layer and temporal memory. It might not need strict sequences (e.g. path integration is multiple sequences between locations, so transitions between things of significance). So it could be better than sequence memory for long term sequences.