Hypothesis for voting in output layer cells

I think it’s explained most clearly in Tutorial and Discussion on Cortical Column Voting Mechanisms Developed by Numenta - 5 April, 2022 - YouTube. It’s long, so I’ll try to summarize.

Purpose / Scenario

  • An object is a set of features, each at a location on the object. The output layer receives an SDR which represents the feature in the context of its location (see the sketch after this list).

  • Each cortical column receives the features sensed by one patch of the sensor, e.g. a fingertip.

  • Initially, the output layer has no clue what the object’s identity is. Over time, the fingertips (or other sensor patches) sense different parts of the object, and the output layer comes to represent the set of objects consistent with the feature-locations sensed so far.
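
To make this concrete, here’s a minimal sketch of that setup in Python. Everything here (the SDR width, the sparsity, and the `random_sdr` helper) is my own illustration, not Numenta’s code:

```python
import random

SDR_SIZE = 1024    # assumed SDR width, for illustration
SDR_ON_BITS = 20   # assumed number of active bits

def random_sdr():
    """A sparse distributed representation: a small set of active bit indices."""
    return frozenset(random.sample(range(SDR_SIZE), SDR_ON_BITS))

# An object is a set of features, each at a location on the object;
# each sensed input is one SDR standing for "this feature at this location".
coffee_mug = {
    ("rim",    "top"):    random_sdr(),
    ("handle", "side"):   random_sdr(),
    ("base",   "bottom"): random_sdr(),
}
```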

Mechanisms

  • The neuron is slightly different from the temporal memory neuron.

  • To learn an object, there’s a learning mode: for each object, a random set of cells in the output layer is forced to stay on. Those cells learn the feedforward (feature-location) inputs arriving from the cortical column’s corresponding fingertip. As a result, whenever a sensed feature-location is part of the object that a given output-layer cell represents, that cell receives feedforward input (see the first sketch after this list).

  • Feedforward input is just one source of bias towards firing. The cells with the highest bias fire: for example, if the highest bias score is 4, only the cells with that score fire. Note that this is not top-k selection like in the spatial pooler.

  • Another bias is whether the cell fired on the previous timestep, so there’s some persistence.

  • So with the feedforward bias and the persistence bias, here’s what happens. When the network starts trying to identify an object, it receives a feedforward (feature-location) input. That input is consistent with a bunch of different objects, and all the output-layer cells which represent those objects fire. When the second feedforward input arrives, output-layer cells keep firing only if they represent an object consistent with that input AND the previous one. That process continues, progressively disambiguating the object (see the second sketch after this list).

  • The last bias is for voting between cortical columns. Each output-layer cell has a dendritic segment for each other cortical column, and it learns the lateral inputs which represent the same object. Each active dendritic segment adds 1 to the bias score, so the columns vote on the object. If one column thinks it’s either object A or B, and another column thinks it’s B or C, then object B wins (see the third sketch after this list).
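
Here’s a rough sketch of the learning mode as I understand it from the video. All the names and numbers (`NUM_OUTPUT_CELLS`, `CELLS_PER_OBJECT`, `learn_object`) are mine, not from any actual implementation:

```python
import random
from collections import defaultdict

NUM_OUTPUT_CELLS = 4096   # assumed output layer size, for illustration
CELLS_PER_OBJECT = 40     # assumed size of the random set forced on

connections = defaultdict(set)   # cell -> feedforward bits it has learned
object_cells = {}                # object name -> cells forced on for it

def learn_object(name, feature_location_sdrs):
    """Learning mode: force a random set of output cells to stay on for this
    object, and have them learn every feedforward (feature-location) SDR."""
    cells = frozenset(random.sample(range(NUM_OUTPUT_CELLS), CELLS_PER_OBJECT))
    object_cells[name] = cells
    for sdr in feature_location_sdrs:
        for cell in cells:
            connections[cell] |= set(sdr)   # grow synapses to the input bits

# e.g. learn a toy "mug" from two feature-location SDRs
learn_object("mug", [{1, 5, 9}, {2, 6, 10}])
```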
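
Next, a toy sketch of the inference dynamics: each cell’s bias is feedforward evidence plus persistence plus lateral votes, and only the cells tied for the single highest bias fire (not top-k). Because firing then requires both current feedforward support and having fired last step, the active set is effectively intersected with each new input. Again, this is my own simplification:

```python
def infer_step(prev_active, feedforward_cells, lateral_votes=None):
    """One timestep of the output layer.

    prev_active:       cells that fired last timestep (persistence bias)
    feedforward_cells: cells receiving feedforward (feature-location) input
    lateral_votes:     optional {cell: number of active lateral segments}
    """
    lateral_votes = lateral_votes or {}
    candidates = feedforward_cells | prev_active | set(lateral_votes)
    bias = {
        cell: (cell in feedforward_cells)    # feedforward bias
            + (cell in prev_active)          # persistence bias
            + lateral_votes.get(cell, 0)     # +1 per active lateral segment
        for cell in candidates
    }
    top = max(bias.values())
    return {cell for cell, b in bias.items() if b == top}   # not top-k

# Disambiguation: cells for objects {A, B} get input first, then {B, C}.
cells_A, cells_B, cells_C = {1, 2}, {3, 4}, {5, 6}
active = infer_step(set(), cells_A | cells_B)    # A and B both possible
active = infer_step(active, cells_B | cells_C)   # only B's cells stay on top
print(active)  # {3, 4}
```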
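
Finally, a toy version of the inter-column vote, at the level of candidate objects rather than cells (my simplification): each column contributes one vote per candidate object, and the best-supported object wins:

```python
from collections import Counter

def vote(column_candidates):
    """Each cortical column adds 1 to the bias of every object it still
    considers possible; the objects with the highest total win."""
    tally = Counter()
    for candidates in column_candidates:
        tally.update(candidates)
    top = max(tally.values())
    return {obj for obj, votes in tally.items() if votes == top}

# One column thinks it's A or B, another thinks it's B or C: B wins.
print(vote([{"A", "B"}, {"B", "C"}]))  # {'B'}
```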

There are some interesting open questions, because the model is incomplete. Numenta had to build something workable, so the learning mode and the location signal might just be placeholders.

The location signal is supposed to be relative to the object, but the system doesn’t yet know what the object is. In the research meetings (and maybe the papers too), the location signal is itself ambiguous until the network figures out exactly which object it is sensing.

L5tt cells seem to burst both to detect a stimulus and to convert to a representation of location (described in a, b). That would be an anchoring signal, initializing the set of possible objects. It’s attentional, so significance matters, which suggests that instead of objects, the representations might be things of significance. Those things could be sequential in nature, so maybe some sort of mix of the output layer and temporal memory. It might not need strict sequences (e.g. path integration is multiple sequences between locations, so transitions between things of significance), so it could be better than sequence memory for long-term sequences.
