Big picture of supervised learning in the cerebellum and amygdala

Hi folks, I wrote this blog post as I tried to understand the big picture of supervised learning in the cerebellum and amygdala—what’s it for, how does it work, where do the supervisory signals come from, how does it interact with other systems, etc.:

I feel like I must be reinventing a lot of wheels here—can anyone suggest references that cover similar ground (or present different perspectives)? Anything big that I’m missing or misunderstanding? Also, while there’s loads of theorizing about what the cerebellum is doing and how, I’ve found much less about the amygdala; any references or ideas?

Thanks in advance, ~~Steve


I think that the reward mechanisms are closely associated with the basal ganglia [1]. My guess is that cortico-BG-thalamo-cortical loops might be more relevant than your approach.

The cerebellum might be closer to an in/out filter (involved in learning by enhancing distinctive sensory information and suppressing superfluous information, such as self-generated signals [3]). That sounds more aligned with the Universal Cerebellar Transform [2] than your take.

Anyway, thanks for the contribution.



Thanks! Strong agree that reward mechanisms are related to basal ganglia and are important for understanding what the neocortex does. Indeed, I have this earlier blog post that discusses that:

I don’t know what you mean by “this might be more relevant than your approach”. It’s not a competition, right? Reward learning is happening in the neocortex / thalamus / BG system, AND supervised learning is happening in the cerebellum and amygdala, AND the hypothalamus is doing something else entirely, AND, … right? Or do you think that “reward mechanisms” are a unified theory that explains absolutely everything in the brain?

I’m not sure what the difference is between “my take” and “Universal cerebellar transform”. The paper you linked seems to be mainly about whether or not the whole cerebellum does the same type of calculation (I currently think “Yes it does”), and barely mentions in passing the question of what that calculation actually is. The little it says on the topic is mainly endorsing the Marr-Albus-Ito model, I think. I think that my post is also endorsing the Marr-Albus-Ito model, or at least something awfully similar. Unless I’m misunderstanding what the Marr-Albus-Ito model is, which is absolutely possible, maybe even likely!
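For concreteness, here's a toy delta-rule sketch of the kind of learning I take the Marr-Albus-Ito model to describe: parallel-fiber context drives a Purkinje-like unit, and a climbing-fiber error signal adjusts the parallel-fiber synapses. All the sizes, the learning rate, and the linear readout are illustrative assumptions, not claims about real cerebellar numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes illustrative): 10 "parallel fiber" inputs driving one
# Purkinje-like linear unit, with a climbing-fiber error as the teaching signal.
n_inputs = 10
w = np.zeros(n_inputs)                  # parallel-fiber -> Purkinje weights
target_w = rng.normal(size=n_inputs)    # the input-output map we want learned

lr = 0.05
for step in range(1000):
    x = rng.normal(size=n_inputs)       # parallel-fiber activity (the "context")
    y = w @ x                           # Purkinje-cell output
    error = target_w @ x - y            # climbing-fiber "teaching" signal
    w += lr * error * x                 # delta rule (cf. LTD/LTP at PF synapses)

# After training, w approximates target_w: the circuit has learned the map
# purely from error feedback, which is the sense of "supervised" I mean.
```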

I’m intrigued by the idea of an “in/out filter”…

You say that for the “in” part of the in/out filter it “enhances distinctive sensory information and suppresses superfluous”. How does it know which is which? Like, what counts as “superfluous”? If I trip on the sidewalk now and then, that’s a problem that needs to be corrected, whereas if I drag my feet now and then, that’s fine; I meant to do that. So how does the cerebellum figure out which is which?

And for the “out” part of the in/out filter, what does it filter? Any ideas?


Of course not, but comparing ideas might be useful to everyone :slight_smile:

Perhaps I understood the term “supervised” in the wrong way.

My (wild) guess is that the cerebellum adjusts sensory input to match cortical expectations (to keep the cortex from running out of synapses, and to distinguish seemingly similar sensory inputs arriving in clearly different contexts).

Perhaps an “easier” way to explain myself is the Dorsal Cochlear Nucleus (DCN). It is a sort of cerebellum-like circuit (with Purkinje-like cells) sitting in the auditory pathway. A significant L5 input comes from auditory cortex, visual cortex, and other deeper regions of the cortex (via the inferior colliculus). The DCN affects what from the auditory nerve “passes” to the MGN (and hence A1).

My intuition is that the cortex is conditioning, via L5 projections, what passes and what does not (for example, cancelling self-generated noise, compensating for head movement, cancelling “echo”, etc.). I think it not only “cancels” but also “amplifies” the distinctive part of the auditory nerve signal.
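A minimal sketch of this cancellation idea, using a classic LMS adaptive filter as a crude DCN stand-in (the filter taps, signal sizes, and the LMS rule itself are all illustrative assumptions, not claims about the real circuit): an efference-copy reference correlated with the self-generated noise lets the filter learn to subtract it, leaving the external signal intact.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy illustration: the "auditory nerve" carries an external signal plus
# self-generated noise; an efference-copy reference lets an adaptive filter
# (a crude DCN stand-in) learn to cancel the self-generated part.
T, n_taps = 20000, 5
external = rng.normal(size=T)           # the signal we want to keep
reference = rng.normal(size=T)          # efference copy of self-generated activity
h_true = np.array([0.8, -0.4, 0.2, 0.1, -0.05])  # unknown "acoustics" filter
self_noise = np.convolve(reference, h_true)[:T]
nerve = external + self_noise           # what the auditory nerve carries

w = np.zeros(n_taps)                    # adaptive filter weights
lr, out = 0.01, np.zeros(T)
for t in range(n_taps, T):
    x = reference[t - n_taps + 1: t + 1][::-1]  # recent reference samples
    out[t] = nerve[t] - w @ x           # cancelled signal passed on to MGN/A1
    w += lr * out[t] * x                # LMS update, driven by the residual

# The residual self-generated noise in the output is far smaller than it
# was in the raw nerve signal.
residual = out[-5000:] - external[-5000:]
print(np.var(self_noise), np.var(residual))
```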

In the case of the cerebellum, this might be quite a bit more complex, since the “sensory” input flow has feedback (saccadic movement, via the SC, affecting V1).

“Focus” the L5 motor cortex output? Cancel output noise by using L5 from other regions of the cortex? Purkinje cells are just “frame” interpreters, and the whole cortical state is the “frame”. That would be why the cerebellar peduncles are the most fiber-dense structure in the whole brain.

Don’t take it seriously :slight_smile:


@vpuente, After reading what you wrote (especially about the DCN) and thinking about it, I am now convinced that cerebellum-like supervised learning systems can be used as input filters. I already started writing a follow-up blog post. Thanks! :slight_smile:


Be careful… the DCN is a beast! I suggest you take a look at tinnitus papers.


As promised:

I’m now up to 4 different reasons (that I understand) for why and how the brain can benefit from supervised learning, listed near the beginning of this new post. I didn’t dive very deep into this one, I just looked into it enough to convince myself that it’s a sensible and plausible design, at a big-picture level. I’m still open to ideas and feedback and references. I linked to your comment @vpuente :slight_smile:

Sorry, but I still can’t see it. If the cortex is fed with errors, how does it learn the context provided to the DCN?

The cortex already knows everything in the DCN’s “context” data (= input to the DCN trained model). So DCN tells cortex: “After accounting for everything I know, auditory signal #17 is more active than I expected”. The “everything I know” part of that includes the states of neurons in the cortex itself, and maybe things like jaw motion signals which are also going directly to the cortex. So the cortex isn’t really losing any information, or in other words, the cortex has all the information it needs to understand what the DCN is doing to the signal.
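To make the “the cortex isn’t losing any information” point concrete, here’s a trivial numerical sketch (the shared linear predictor `W` is an illustrative assumption): since the cortex has the same context and the same learned prediction, it can always add the prediction back to the error signal and recover the raw input exactly.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy illustration: if the DCN forwards only the deviation from a prediction
# computed from context the cortex also has, the cortex can re-add its own
# copy of the prediction, so no information is lost.
W = 0.1 * rng.normal(size=(8, 16))   # shared context -> prediction map
context = rng.normal(size=16)        # cortical state, jaw-motion signals, etc.
raw = rng.normal(size=8)             # raw auditory-nerve activity

error = raw - W @ context            # what the DCN forwards ("surprises" only)
recovered = error + W @ context      # cortex re-adds the prediction it knows

print(np.allclose(recovered, raw))   # True: the raw signal is fully recoverable
```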

I guess my concern is more general… I don’t get the idea of learning only from the errors of “predictive coding” (the experimental evidence of a rise in activity after a prediction error might be quite a different thing).

In this particular case, there is a feedback loop between the DCN and the cortex. Both progressively move toward a stable state. The DCN can’t be “trained” before the cortex (the distal dendrites of fusiform cells (FC) and cartwheel cells (CWC) keep learning from parallel fibers, and the parallel-fiber input includes the cortical context). Note that the system is continuously learning, so the “trained model” notion does not apply here.