Deep Predictive Learning: A Comprehensive Model of Three Visual Streams

The reading is not easy, but this “3 visual streams” paper combines many interesting ideas: L4/L5/L6/thalamus functions, temporal learning, error coding, alpha oscillations, top-down shortcuts that speed up learning, and the sequential development of the visual pathways (“where”, then “where & what”, then “what”).

Combined with other readings, it helps me conceptualize a different potential explanation of cortico-thalamic interactions (mainly based on an action-based interpretation of perception, and on cortical efferent copies to the thalamus). I’ll post about it once I have formalized the mess in my mind.

Concerning the paper, I would like to mention some key points that deserve discussion:


The thalamus has a double function: attention (not studied in the paper) and supporting cortical learning (the main focus of the paper). The authors use the visual cortex to illustrate their theory, but it could be generalized to other areas as well.


There are two kinds of learning:

  • Self-organizing learning, which extracts statistical regularities (like classic auto-encoders). This is the kind of learning used to create our internal models of our body and the environment.
  • Error-driven learning, which leverages differences between expectations and outcomes. This is the kind of learning used to shape our “alpha-long” predictive abilities (the term is mine) and our longer-term reward system.

In predictive learning, the learning can be both self-organizing and error-driven.
At the beginning of development, cortical areas learn slowly and independently in this manner, until some “fast-learning” cortical areas can help the others by providing top-down inputs for more efficient error-driven learning (more on this later).
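To make the distinction concrete, here is a toy sketch in Python (my own illustration, not the paper’s implementation; all names and numbers are arbitrary) contrasting a Hebbian-style self-organizing update with an error-driven update:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 8, 4
w = rng.normal(0, 0.1, (n_out, n_in))   # toy synaptic weights
lr = 0.05

x = rng.random(n_in)                     # input pattern
y = w @ x                                # unit activity

# Self-organizing (Hebbian-style) update: strengthen co-active pairs,
# extracting statistical regularities without any error signal.
w_hebb = w + lr * np.outer(y, x)

# Error-driven update: move activity toward an observed outcome,
# using the difference between expectation (y) and outcome (target).
target = rng.random(n_out)               # placeholder outcome
w_err = w + lr * np.outer(target - y, x)
```

The key difference is that the Hebbian update only needs the activity itself, while the error-driven update needs an outcome to compare against.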

Learning over alpha oscillations

Let’s focus on the “alpha-long” predictive ability of the neocortex.
By “alpha-long”, I mean the very short-term predictions of what will be experienced in the next ~100ms (one alpha oscillation cycle).

The alpha oscillation creates two timeframes:

  • A 75ms “minus phase” (= 3 gamma ticks) during which the deep layers are isolated from L4 inputs, allowing the computation of a pure prediction in L5/L6
  • A 25ms “plus phase” (= 1 gamma tick) during which the current state of the environment and the ongoing internal mental state of the organism are shared with the deep layers.
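A minimal sketch of this cycle, assuming 25ms gamma ticks and the 3-tick/1-tick split described above (the function and its arguments are my own illustration, not the authors’ implementation):

```python
# Toy timeline of one ~100ms alpha cycle split into 4 gamma ticks (25ms each):
# ticks 0-2 form the "minus phase" (pure prediction, L4 input gated off),
# tick 3 is the "plus phase" (actual input reaches the deep layers).
GAMMA_MS = 25
TICKS_PER_ALPHA = 4

def run_alpha_cycle(predict_step, actual_input):
    """predict_step: callable evolving the deep-layer prediction state
    (called with None on the first tick). Returns (expectation, outcome)."""
    state = None
    for tick in range(TICKS_PER_ALPHA):
        if tick < 3:                      # minus phase: prediction only
            state = predict_step(state)
        else:                             # plus phase: clamp to the outcome
            prediction = state
            state = actual_input
    return prediction, state
```

For example, with a counter as a stand-in predictor, `run_alpha_cycle(lambda s: 0 if s is None else s + 1, 99)` returns the prediction built over three ticks together with the clamped outcome.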


The error signal is then computed in higher-order nuclei of the thalamus (the pulvinar, in the case of vision), which receive both:

  • The current state from L5IB cells (which relay the “ground truth” signal of the current moment, coming from superficial layers L2/3 & L4)
  • The prediction from L6CT cells (which results from an interaction between local L5IB and L6CC cells and long-distance top-down inputs from L6CC).

Importantly, this error signal is encoded by the thalamus as an STDP-compatible temporal difference, so that the cortex knows which synaptic weights to increase or decrease. Spike-Timing-Dependent Plasticity (STDP) is a biologically supported learning process that allows local backpropagation of errors to neighboring neurons.
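As I understand it, this temporal-difference coding amounts to a contrastive, Hebbian-style rule: each synapse moves in the direction of the plus-phase (outcome) co-activity minus the minus-phase (prediction) co-activity. A hedged sketch, with function name and arguments of my own invention:

```python
import numpy as np

def pulvinar_error_update(w, sender_minus, recv_minus,
                          sender_plus, recv_plus, lr=0.01):
    """Contrastive-style update: the difference between plus-phase
    (outcome, driven by L5IB) and minus-phase (prediction, driven by
    L6CT) co-activity tells each synapse which way to move."""
    return w + lr * (np.outer(recv_plus, sender_plus)
                     - np.outer(recv_minus, sender_minus))
```

A synapse that was active in the prediction but not in the outcome gets weakened, and vice versa, without any global error-backpropagation machinery.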

Getting around the credit assignment problem with development phases

However, we have to keep in mind that an error can have several causes, originating from the given area itself, from other areas, or from both!

If each cortical area corrected its synaptic weights in parallel, we could end up with a non-convergent system where connected areas iteratively adapt and un-adapt their weights in response to each other’s changes. That would explain why we have critical periods in brain development.

The authors take the example of the different visual pathways:

  1. First, the “where” pathway can learn pretty easily on its own
  2. Then, it helps the development of a hypothesized “where & what” intermediate pathway thanks to top-down inputs from the now-stabilized “where” pathway
  3. Finally, the “what” pathway can construct complex spatially-invariant object representations
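One way to picture this staged development is a training schedule where each pathway is stabilized (frozen) before the next one learns, with already-stabilized pathways contributing only top-down context. This is purely my own illustrative sketch, not the authors’ implementation:

```python
class Pathway:
    """Minimal stand-in for a cortical pathway (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.frozen = False
        self.updates = 0

    def learn(self, data, context):
        if not self.frozen:
            self.updates += 1      # placeholder for a weight update

    def output(self, data):
        return self.name           # placeholder top-down signal

    def freeze(self):
        self.frozen = True


def develop(pathways, data, epochs_per_stage=3):
    """Train pathways one developmental stage at a time; stabilized
    pathways only provide frozen top-down context to later ones,
    avoiding the non-convergence that fully parallel updates risk."""
    stabilized = []
    for pathway in pathways:       # e.g. "where", "where & what", "what"
        for _ in range(epochs_per_stage):
            pathway.learn(data, context=[p.output(data) for p in stabilized])
        pathway.freeze()
        stabilized.append(pathway)
    return pathways
```

The point of the sketch is only the scheduling: later pathways learn against a stable target instead of a moving one.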


Overall, their explanations and their computer implementation are full of smart ideas that look very promising. And their model would fit well on top of an L2/3 object-representation model (for instance Bitking’s hexgrid, or others).

My doubts

However, I still have some reservations on my side, mainly on these two points:

  • Given my other readings, I don’t buy their interpretation of L6CT cells. For now, I prefer to stick to Sherman & Guillery’s view of L6 cortico-thalamic cells as playing a modulatory role. I may have a different view that could reconcile those two positions, but I need to take more time to think about it.
  • It is not yet clear to me how the error signal from higher areas is adapted to be understandable by lower areas. Did someone get it?

If you want further reading about the paper:

Still thinking about it!