Deep Predictive Learning: A Comprehensive Model of Three Visual Streams


A very recent competing / complementary model of deep predictive coding in the brain.

O’Reilly, Randall C., Dean R. Wyatte, and John Rohrlich. “Deep Predictive Learning: A Comprehensive Model of Three Visual Streams.”


How does the neocortex learn and develop the foundations of all our high-level cognitive abilities? We present a comprehensive framework spanning biological, computational, and cognitive levels, with a clear theoretical continuity between levels, providing a coherent answer directly supported by extensive data at each level. Learning is based on making predictions about what the senses will report at 100 msec (alpha frequency) intervals, and adapting synaptic weights to improve prediction accuracy. The pulvinar nucleus of the thalamus serves as a projection screen upon which predictions are generated, through deep-layer 6 corticothalamic inputs from multiple brain areas and levels of abstraction. The sparse driving inputs from layer 5 intrinsic bursting neurons provide the target signal, and the temporal difference between it and the prediction reverberates throughout the cortex, driving synaptic changes that approximate error backpropagation, using only local activation signals in equations derived directly from a detailed biophysical model. In vision, predictive learning requires a carefully-organized developmental progression and anatomical organization of three pathways (What, Where, and What * Where), according to two central principles: top-down input from compact, high-level, abstract representations is essential for accurate prediction of low-level sensory inputs; and the collective, low-level prediction error must be progressively and opportunistically partitioned to enable extraction of separable factors that drive the learning of further high-level abstractions. Our model self-organized systematic invariant object representations of 100 different objects from simple movies, accounts for a wide range of data, and makes many testable predictions.
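To make the abstract's learning cycle concrete, here is a minimal toy sketch (not the authors' actual Leabra implementation; the layer sizes, names, and learning rate are invented for illustration) of the plus/minus-phase idea: on each ~100 ms alpha cycle the network generates a prediction (minus phase), the actual driving input arrives (plus phase), and weights change using only locally available pre-synaptic activity and the phase difference.

```python
import numpy as np

# Toy sketch of one plus/minus-phase learning cycle (illustrative only;
# the paper's model is the full biologically detailed Leabra framework).
rng = np.random.default_rng(0)
n_ctx, n_sense = 16, 8                    # invented sizes
W = rng.normal(0.0, 0.1, (n_ctx, n_sense))
lr = 0.5

def alpha_cycle(ctx, actual, W):
    """One ~100 ms cycle: predict, then learn from the outcome."""
    minus = ctx @ W                       # minus phase: prediction on the "pulvinar"
    plus = actual                         # plus phase: driving input (layer-5 IB cells)
    err = plus - minus                    # temporal difference between the two phases
    W += lr * np.outer(ctx, err)          # local rule: pre-activity x phase difference
    return W, float(np.mean(err ** 2))

# With a fixed context, the prediction converges on the sensory frame.
ctx = rng.normal(size=n_ctx)
ctx /= np.linalg.norm(ctx)                # normalize so the toy rule is stable
target = rng.normal(size=n_sense)
errs = []
for _ in range(200):
    W, e = alpha_cycle(ctx, target, W)
    errs.append(e)
print(errs[0], errs[-1])                  # prediction error shrinks over cycles
```

For a single linear layer this reduces to the delta rule, which is the simplest case of the "backprop-approximating, locally computable" update the abstract describes; the paper's contribution is making this work through a deep, biologically constrained hierarchy.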

The authors note similarity to an older incarnation of HTM:

Hawkins’ Model
The importance of predictive learning and temporal context are central to the theory advanced by Jeff Hawkins (Hawkins & Blakeslee, 2004). This theoretical framework has been implemented in various ways, and mapped onto the neocortex (George & Hawkins, 2009). In one incarnation, the model is similar to the Bayesian generative models described above, and many of the same issues apply (e.g., this model predicts explicit error coding neurons, among a variety of other response types). Another more recent incarnation diverges from the Bayesian framework, and adopts various heuristic mechanisms for constructing temporal context representations and performing inference and learning. We think our model provides a computationally more powerful mechanism for learning how to use temporal context information, and learning in general, based on error-driven learning mechanisms. At the biological level, the two frameworks appear to make a number of distinctive predictions that could be explicitly tested, although enumerating these is beyond the scope of this paper.



I do wish they would have provided reasoning for this claim:

We think our model provides a computationally more powerful mechanism for learning how to use temporal context information, and learning in general, based on error-driven learning mechanisms.


I am reading this in the context of my “dumb boss, smart advisor” model and I have to say - it’s sending shivers down my spine.

One of the parts that the authors point out as needing more work is the source of the high-level training patterns to generate the seed errors.

If you assume that the older lizard brain is going about its normal behavior in a mewling infant - looking, feeling, tasting, and living in general - and the cortex is getting this as the higher-order input for the pattern to seed training - the explanations match up very nicely.



Does the learning model in this paper include the HTM neuron? That is, is there a predictive state (dendritic spike modeling)?




Very similar, perhaps close enough to use it directly. Since the comparison comes down to the phase relationship between the prediction and the update within a single wave, the order of evaluation would serve much the same function.
It also works with a scanning pattern similar to the biological process that convolution tries to emulate.
This may well be the first “killer app” that the deep-learning naysayers of HTM need to see to be convinced that a biologically based model is as capable as the applications where statistically based point neurons are typically used. It learns in an “unsupervised” manner in a few hundred presentations, not epochs of thousands, and without the forms of backprop that I think everyone can agree are something of a crutch.
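For reference, here is the HTM-side mechanism being compared: a minimal sketch (standard HTM temporal-memory behavior, not code from the paper; cell names and the threshold value are invented) in which a cell enters the predictive state when any of its distal dendritic segments has enough active synapses to fire a dendritic (NMDA) spike.

```python
# Minimal sketch of HTM's predictive state: a cell is depolarized
# ("predictive") when any distal segment matches the current activity
# closely enough to trigger a dendritic spike.
ACTIVATION_THRESHOLD = 3   # active synapses needed for a dendritic spike

def predictive_cells(segments, active_cells):
    """segments: {cell_id: [set of presynaptic cell ids, ...]}"""
    predicted = set()
    for cell, segs in segments.items():
        for syns in segs:
            if len(syns & active_cells) >= ACTIVATION_THRESHOLD:
                predicted.add(cell)   # dendritic spike -> predictive state
                break
    return predicted

segments = {
    "c0": [{"a", "b", "c", "d"}],     # segment tuned to context {a, b, c, d}
    "c1": [{"x", "y", "z"}],          # tuned to an unrelated context
    "c2": [{"a", "b"}],               # too few synapses to ever spike
}
active = {"a", "b", "c"}
print(predictive_cells(segments, active))   # -> {'c0'}
```

The analogy to the paper is that both mechanisms use prior context to depolarize (bias) the representation of what should come next, before the driving input arrives.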

I posted something to Jeff recently that ties in with this:

Please note that this also includes some of the oscillatory/phase involvement we were touching on in a different thread:

In this you stated that your research is looking into phase-related processing. This model has it in spades.

Lastly - they mention in passing that this same general system would be applicable to sensorimotor systems. When you get the overall scheme it does seem extensible.

How extensible?

I am struggling to see how one could combine the cortical-IO system with speech hearing and production using this general approach. It may take some time to work out, but I will keep mulling it over to see if it could make sense.

Emotional coloring from the amygdala seems to be an important feature of what has to be stored in the word sequence-grammar and word store.

What goes on in the lizard brain grows ever more important to understanding the cortex.



This paper adds some powerful support to the proposed “three visual streams” model.

Now if I can find some papers supporting the proposed plus/minus-phase temporal learning mechanism …



Nobody ever said that understanding what the brain does is going to be simple or easy to explain.

I find it hard to imagine that modeling the entire visual hierarchy and all the sub-mechanisms at each level could be much simpler. This is an enormously complicated task and this paper combines years of prior papers into a masterwork showing how all the parts can work together as a system.

The emergent property of counterflowing streams of information interacting to guide memory formation can only emerge at this level of implementation. This is a highly desirable goal.

I am impressed by how much of the visual system they have captured and how well it does work. I am trying to see how to incorporate many of the features of this system in my own work. I think that combining that insight with the HTM SDRs and predictive behavior should work even better. It would be great to see the training time drop from hundreds of thousands of trials to a few dozen.



I guess we disagree on that. It should be easy; otherwise, a minor change in the operating conditions will make it fall apart. Robustness and flexibility come from simplicity.

My best analogy is a modern processor: if you take it apart, it looks extremely complex, with many billions of transistors. But the basic architecture (stored-program) can be explained in a couple of sentences. You need all those transistors to solve “engineering” problems, mainly the memory and bandwidth walls. You will never understand how a stored-program machine works by studying the latest IBM Power9 (with 8 B transistors). Numenta’s approach is much better: try to replicate how the Power9 behaves by connecting sparse pieces of knowledge together.



If we are arguing just to score points then you win. The brain can be described in a child’s picture book.

I’m not sure that level of description will let you build an entire visual learning system, but - yes - there should always be a simple overview. As a teacher of technical topics I will agree with you that the Feynman principle should apply: “If you can’t explain something in simple terms, you don’t understand it.”

I was assuming that forum members are the ones digging into the 8 B transistor-level models to make things work and are still working out how to pull this off. At this level, the explanations can get complicated. As you well know, elsewhere on the forum they are hashing out how to do branch prediction, and while the basics of a CPU instruction execution unit are conceptually straightforward, the tweaks to make it work better are proving to be less so.



No … just to learn a bit more 🙂

I know something about that 🙂 I’ve worked on cache coherence… although at first sight it is really complex, the basic principles - why one thing will work and another won’t - aren’t that hard. (Computer architecture is half intuition/experience, half engineering.)



It’s been 40 years since I did bit-slice level CPU design; I made a nice little 8/16-bit CPU based on the 74170/74181/74182/74200 chips. Microcode used to be chip logic and not Verilog code. FPGAs were not really a thing yet. Things have gotten much more complicated since then.

That said - do you have a pointer to different papers that do deep predictive learning?
As far as I know - this is an emergent property of a “larger” multi-layered processing system.

I have been a long-time fan of the Global workspace theory, in particular the work of Stanislas Dehaene. This is also a large scale model of interconnected maps. There are also interactions between the counter-flowing streams that drive much of the interesting behavior in these systems. I see this emergent behavior as a recurring theme in larger systems.

My focus in all this is more oriented to the larger system level engineering and how the maps have to work together. The HTM model is surely part of this on one end of the size scale. The engineering scale at this level covers several orders of magnitude. You have to consider everything from individual synapses to large scale fiber tracts and population densities of fiber projections.

I could be way off base but I do think it’s time to take off the training wheels and put the HTM model in play as part of larger systems. I expect that this will change some of the current notions of what various parts are doing and refine the model.

In particular, I expect that the focus will shift from object representation to communications of object representation and deeper consideration of the distributed nature of those representations. I also expect that the coordination functions of cortical waves will take on more importance than it currently holds in the HTM canon.



Nope… sorry, but I don’t.

The “theater metaphor” is nice.

Indeed… we need to advance, but step by step.