A question about temporal memory, I guess


Let’s say we have supplied two different sequences of tones to the HTM, such as [B#, C, G] and [B#, C, D].

As far as I understand, after some repetition, if we give the sequence [B#, C] as input, then the cells responsible for both G and D will be in a predictive state.
What I do not understand is this: when I encounter such sequences as a human, after hearing [B#, C] I think “G OR D is the next tone.” The representation in HTM, though, seems to be not “G OR D” but a union of the two, which is not an “or” statement as I understand it.
I think that this union of predictions represents a tone that is a sort of “mental” mixture of G and D, not an “or” of two separate tones.

Basically, I mean that the HTM predicts both of the tones successfully, but the representation feels incomplete in terms of keeping these tones separate.

I don’t know if I explained myself clearly or whether the question makes sense.
Please feel free to help me with my confusion.

The system “predicts” both as possible.
If neither is seen then it reacts with surprise as this is a novel sequence.

A predictive state is not the same thing as the activation state. The predictive state just prepares those neurons to fire sooner if a previously learned pattern appears on the proximal inputs. There may be multiple such learned patterns that share common features. In that case, the network would not be surprised to see any of them. However, if a new pattern arrives and a set of neurons in a minicolumn fires without being in the predictive state, then we register that as surprise (bursting) and that triggers learning of the new pattern by growing new distal synapses.

TLDR: The activation state responds to the actual input, while the predictive state merely prepares the network to anticipate previously learned sequences of inputs.

EXAMPLE: When you flip a coin, I fully expect that the result will either be heads or tails. I am not surprised to see either condition. The union of those two states is the predictive state (i.e. all neurons participating in either representation are put into the predictive state). When the coin lands on either heads or tails, one of those two predictions is satisfied and the neurons representing that state are now in the activated state. The neurons for the other representation just fail to activate.
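
To make the coin-flip picture concrete, here is a minimal sketch of the union idea using the tones from the original question. It is not Numenta's implementation: each tone is treated as a flat random set of cells, and the population size, sparsity, and surprise threshold are all assumed values chosen for illustration.

```python
import random

N_CELLS = 2048   # assumed size of the cell population
ACTIVE = 40      # assumed number of cells per tone representation

def random_sdr(rng):
    """Pick a random sparse set of cell indices to stand in for one tone."""
    return frozenset(rng.sample(range(N_CELLS), ACTIVE))

rng = random.Random(0)
sdr = {tone: random_sdr(rng) for tone in ("G", "D", "W")}

# After [B#, C], the cells for BOTH learned continuations are predictive.
predictive = sdr["G"] | sdr["D"]

def step(actual_tone):
    """Return the active cells that were predicted and a surprise flag."""
    active = sdr[actual_tone]
    predicted_active = active & predictive
    # Assumed rule: surprise if fewer than half the active cells were predicted.
    surprise = len(predicted_active) < 0.5 * ACTIVE
    return predicted_active, surprise

g_cells, g_surprise = step("G")   # a learned continuation
w_cells, w_surprise = step("W")   # a tone that was never learned here
print("G surprise:", g_surprise, "| W surprise:", w_surprise)
```

The union only exists in the predictive state. Once G actually arrives, the active cells are just G's cells, and the D cells that were predicted simply never fire, so the two tones are not blended in the activation.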


I think the OP may be unclear (just like me) on how the predictive state can lead to something like memory recall with Hopfield-network-like properties, where it can recall and settle on a specific memory even with ambiguous inputs.

It’s an inclusive ‘or’ (like ‘and/or’ as opposed to ‘either/or’). You can’t predict precisely G or D, but you wouldn’t be surprised by either. You would be surprised by another letter like W tho.


The addition of the “thousand brains” lateral connections brings about pattern completion behavior.

The current HTM/thousand brains state of the art network has these distinctive properties:

  • sparsity, where the representation is widely distributed across the network,
  • detecting/reporting a novel step in a sequence,
  • learning a novel sequence in very few presentations,
  • pattern completion across the 2D structure of the network.

Important emergent properties:
The combination of pattern completion and sequence prediction is an extraordinary filtering mechanism.
The “surprise” (bursting) is an important orienting signal for attention.


Thank you for the clarification. I’d like to learn more about HTM’s pattern completion capabilities. Is the code with said capabilities available for the public to peruse?

The pattern completion part of the Thousand Brain Theory (TBT) is only hinted at indirectly in Numenta papers such as this one:

See figure 3, showing that the small pattern matches in columns are combined by voting. What is not shown is that this voting is happening all over the sheet of columns at the same time. The end effect is that the final activation state is the one that satisfies the largest number of columns at the same time.
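
A toy sketch of that voting step, under heavy assumptions: each column is reduced to the set of object labels consistent with what it currently senses, and the winner is whichever object the most columns agree on. The object names and the `vote` helper are made up for illustration and are not taken from the Numenta code.

```python
from collections import Counter

def vote(column_candidates):
    """Return the object(s) supported by the most columns, plus the tally."""
    tally = Counter()
    for candidates in column_candidates:
        tally.update(candidates)   # each column votes once per candidate
    if not tally:
        return set(), tally
    best = max(tally.values())
    return {obj for obj, n in tally.items() if n == best}, tally

# Hypothetical candidates per column, each seeing a different part of an object.
columns = [
    {"mug", "bowl"},
    {"mug", "can"},
    {"mug", "bowl", "can"},
    {"ball"},               # one noisy column that disagrees
]
winners, tally = vote(columns)
print(winners, dict(tally))
```

No single column is certain here, and one is plainly wrong, yet the collective vote still settles on one object. That is the pattern completion effect in miniature.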

Don’t let the wording that suggests that every column learns everything about every object mislead you. It makes more sense to think of it as every column learning what it sees while that object is being sensed. In practice, in the eye, this sense stream is driven by highly stereotyped saccades over objects. That means that the column associated with a given part of the visual field will be fed sequences that correspond to what it sees of the object as that sequence is executed. It votes with other columns that are seeing other parts of the object. Collectively they vote that this stream is consistent with the previously learned object.

As to sample code, I don’t do python so I have not made any effort to work with the Numenta codebase; I can’t offer any advice in this area.

Thanks for the link and explanations. After a quick read I think I understand some of Numenta’s concept of pattern completion. It seems that it must have all the minimally necessary information before it can filter out and/or output a single object type.

But I don’t think I read anything on how it would guess the object type given some incomplete or blended information of different objects. And once it identifies the object type in SDR format would that SDR be used as a predictive state or as an output to another column’s input dendrite (sort of like proximal dendrite inputs but instead of sensory data it takes processed abstract outputs from other columns)?

You are now venturing into the realm of the H of HTM.

Note that in the brain there are many (100+) regions or maps connected together in rough hierarchies. Each of these areas is a “stand-alone” collection of cortical columns. These areas are interconnected by bundles of fibers. The details are an unnecessary distraction from the answer you are looking for in your questions.

Information is passed from the sensory areas to hub areas, and counter-flowing streams add to disambiguation. Lateral tracts add further disambiguation between sensory modalities. These counter-flowing streams are able to add context to pick one “object” out of many possible interpretations. You can experience this directly in the “cocktail party” effect.

Perception is actually an active memory task - you construct your internal reality to “match” perception. Learning adds more detail to this internal representation.

Understanding this will help put the “prediction” and pattern completion into context. Trying to predict two or more steps of processing runs into the combinatorial explosion[1] wall; we don’t do that in the brain. We “only” predict the next step. This is still an awesomely powerful filter and driver of attention. This prediction and filter combine into the pattern completion that results in an activation state in the cortex that corresponds to the perception of “reality.”
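
For a rough sense of the scale of that wall, here is a small arithmetic sketch. It assumes every step has the same number of equally plausible continuations (two, like G or D in the original question), which is a simplification for illustration only.

```python
def candidate_paths(branching, depth):
    """Number of distinct futures when each step has `branching` options."""
    return branching ** depth

# With two possible next tones at every step:
for depth in (1, 2, 5, 10, 20):
    print(f"{depth:>2} steps ahead: {candidate_paths(2, depth):>9,} candidate sequences")
```

Predicting only the next step keeps the set of candidates at the branching factor itself, however long the sequence runs.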

I could go on about this but I will just say that this could be a good starting point to help you grasp the utility of this pattern completion process.

As far as trying to isolate proximal/distal/temporal portions of the process, they all act together in an integrated unit.

Where does this leave you, the student of HTM? I have been criticized by others in this forum for suggesting that you will have to learn a great deal more about how the brain works before any of this fits into the bigger picture. That said, some of the features of cortical columns will seem hard to grasp until you see how they fit into this larger picture.

I would suggest that you look at spatial and temporal pooling as your next step.

Keep asking and members of the forum may be able to help you in your studies.

[1] combinatorial explosion


Lol, you read my mind on being not that keen on getting a neuroscience degree. And on the combinatorial explosion part, too, as I was thinking about Google’s RL algo where it evaluates several potential next steps ahead. Thanks for taking the time to reply at length. Will need to research more to digest your insights.
