Predictive Coding of Novel versus Familiar Stimuli in the Primary Visual Cortex

Here are some results that agree strongly with the specific details of HTM theory.

“Using 2-photon calcium imaging to record from layer 2/3 neurons in the primary visual cortex, we found that novel images evoked excess activity in the majority of neurons. When a new stimulus sequence was repeatedly presented, a majority of neurons had similarly elevated activity for the first few presentations, which then decayed to almost zero activity. *The decay time of these transient responses was not fixed, but instead scaled with the length of the stimulus sequence.* However, at the same time, we also found a small fraction of the neurons within the population (~2%) that continued to respond strongly and periodically to the repeated stimulus. Decoding analysis demonstrated that both the transient and sustained responses encoded information about stimulus identity. We conclude that the layer 2/3 population uses a two-channel predictive code: a dense transient code for novel stimuli and a sparse sustained code for familiar stimuli.”

I’m particularly interested in the italicized part. Do we observe a relationship between sequence length and time to learn that sequence in HTM? With perfect input sequences I don’t think we do. But maybe with slightly noisy sequences? It could also be due to distal segments requiring repeated presentations to form new connections, or unlearn old ones.


Yes, for first order sequences the time to learn in the TM is about 2n, where n is the length of the sequence. This matches the result in the paper quite closely. In the TM we can slow it down or speed it up a bit by tweaking the increments/decrements.

For high order sequences, the TM takes O(2nh), where h is the length of the longest shared subsequence. In this paper they did not try high order sequences.
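The linear scaling is easy to see with a toy model. This is a sketch, not Numenta's TM implementation: `presentations_to_learn` and its propagation rule are my own simplification. The intuition is that the high-order context for element i can only be learned once the transition into element i-1 is already learned in context, so learned context propagates roughly one element per presentation; the constant factor (e.g. the ~2 in 2n) would come from the permanence increments/decrements, which this sketch ignores.

```python
def presentations_to_learn(n):
    """Toy model: count full-sequence presentations until every
    transition is learned in its high-order context.

    learned[i] means the transition into element i is learned in
    the context of everything before it.
    """
    learned = [False] * n
    presentations = 0
    while not all(learned):
        presentations += 1
        prev = learned[:]  # context available at the start of this pass
        for i in range(n):
            # A transition can be learned in context only if the
            # previous one already was (element 0 starts the
            # sequence, so it needs no prior context).
            if i == 0 or prev[i - 1]:
                learned[i] = True
    return presentations

for n in (4, 8, 16):
    print(n, presentations_to_learn(n))  # scales linearly with n
```

Doubling the sequence length doubles the number of presentations needed, which is the qualitative relationship the paper reports for the decay time of the transient responses.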

HTM theory agrees with many of the results in this paper - they included some of these comparisons in a discussion paragraph on page 12.


And in case you did not realize it, one of the authors of this paper is the recently interviewed Michael Berry.


Thanks for those points; I hadn’t read the paper in detail until now.

In terms of points of disagreement with HTM, it looks like they’re seeing a sort of temporal pooling going on in L2/3: this agrees with broad HTM theory, but I think the current implementations wouldn’t exhibit this.

In the current implementation, the set of active cells is selected based on the previous set. So even for sequences that are themselves first-order, an L2/3 population modeled as a high-order sequence memory would preserve contextual information about the entire history of the sequence.

Do you think this is an analysis effect introduced by the fact that they’re averaging over trials? Maybe on average you’ll see the same set of cells active at the same times, even though on a trial-by-trial basis they depend on the whole history of the TM population.

Yeah, I agree. The current temporal memory by itself wouldn’t exhibit it, but temporal pooling and the feedback paper we are working on would be consistent with this. We would see a stable representation that would be unique to the sequence, just as in their paper.

This is actually a point of consistency between the TM and their findings. Their “sustained code” includes a periodic component. That is, specific cells become active at specific points in the sequence, just like the active cells in the TM. If you repeated the same short sequence over and over again in the TM (with either noise or resets in between), you’d see these “periodic” cells.

That’s true for sequences with resets or unpredictable noise between them. In these experiments however, the sequences repeat with no separation indicator. And of course the idea of an externally imposed boundary indicator is not very realistic. Has anyone at Numenta examined the behavior of high-order TM activity on short, repeated first-order sequences?

What I would expect to see is a diverging activity pattern that, if it ever repeats, does so only by chance. This would disagree with the results in the paper.
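That expectation can be sketched with a toy state model (not an actual TM: here the high-order state is just a hash of the previous state and the current input, which captures only the history-dependence being discussed):

```python
def run(sequence, repeats, reset_between=False):
    """Toy high-order state trace: each state depends on the
    entire preceding history via hash((prev_state, input))."""
    states = []
    state = 0  # start/"burst" state
    for _ in range(repeats):
        if reset_between:
            state = 0  # externally imposed sequence boundary
        for symbol in sequence:
            state = hash((state, symbol))
            states.append(state)
    return states

seq = ["A", "B", "C"]
with_reset = run(seq, repeats=3, reset_between=True)
no_reset = run(seq, repeats=3, reset_between=False)

# With resets, every repeat produces the identical trajectory;
# without them, the same position in successive repeats gets a
# different state (barring astronomically unlikely collisions).
print(with_reset[0:3] == with_reset[3:6])  # True
print(no_reset[0:3] == no_reset[3:6])      # False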

Noise (or an extra delay) between sequences is pretty realistic, but I don’t think they are doing that here.

With pure repeats, the TM representation might not diverge if you had feedback from the stable representation. You could use feedback to get the TM to stay locked on to the same high order pattern. Another possible tweak is a hysteresis on winner cells. The tradeoff would be a reduced ability to handle high order sequences containing repeated elements.
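A minimal sketch of the hysteresis idea, purely illustrative (the function name, score representation, and bonus value are my own, not Numenta code): when selecting a winner cell within a column, give a small bonus to the cell that won last time, so pure repeats stay locked onto one high-order representation instead of drifting.

```python
def pick_winner(scores, last_winner, hysteresis_bonus=0.3):
    """Pick the winner cell in a column from per-cell match
    scores, favoring the previous winner by a fixed bonus."""
    best_cell, best_score = None, float("-inf")
    for cell, score in enumerate(scores):
        if cell == last_winner:
            score += hysteresis_bonus  # hysteresis: favor last winner
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell

# Cell 2 wins on raw score the first time...
print(pick_winner([0.1, 0.5, 0.6], last_winner=None))  # 2
# ...and keeps winning even when cell 1 edges slightly ahead,
# because it now carries the hysteresis bonus.
print(pick_winner([0.1, 0.7, 0.6], last_winner=2))     # 2
```

This also makes the tradeoff concrete: if the bonus is large relative to real score differences, the column will fail to switch representations when the sequence genuinely changes, which is exactly the reduced ability to handle high order sequences with repeated elements.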

We’ve tried a few variations in how feedback impacts which cells become active. Each variation comes with some tradeoffs.