Burst as a local learning rule in apical but not basal dendrites

bkaz · November 1, 2021, 1:11pm

"
But for this teaching signal to solve the credit assignment problem without hitting “pause” on sensory processing, their model required another key piece. Naud and Richards’ team proposed that neurons have separate compartments at their top and bottom that process the neural code in completely different ways.

“[Our model] shows that you really can have two signals, one going up and one going down, and they can pass one another,” said Naud.

To make this possible, their model posits that treelike branches receiving inputs on the tops of neurons are listening only for bursts—the internal teaching signal—in order to tune their connections and decrease error. The tuning happens from the top down, just like in backpropagation, because in their model, the neurons at the top are regulating the likelihood that the neurons below them will send a burst. The researchers showed that when a network has more bursts, neurons tend to increase the strength of their connections, whereas the strength of the connections tends to decrease when burst signals are less frequent. The idea is that the burst signal tells neurons that they should be active during the task, strengthening their connections, if doing so decreases the error. An absence of bursts tells neurons that they should be inactive and may need to weaken their connections.

At the same time, the branches on the bottom of the neuron treat bursts as if they were single spikes—the normal, external world signal—which allows them to continue sending sensory information upward in the circuit without interruption.
"

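The strengthening and weakening described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's code: the rule `dw = eta * (burst - p_bar) * pre_trace`, the function name `burst_plasticity`, and every constant are hypothetical choices made to show the effect.

```python
import random

def burst_plasticity(burst_prob, steps=2000, eta=0.01, avg_rate=0.01, seed=0):
    """Return the net weight change of one synapse whose postsynaptic cell
    bursts with probability `burst_prob` on each firing event.

    Assumed rule (illustrative only): on every event,
        dw = eta * (burst - p_bar) * pre_trace
    where p_bar is a slowly moving average of past bursting.
    """
    rng = random.Random(seed)
    w = 0.0
    p_bar = 0.2        # assumed baseline burst probability before the task
    pre_trace = 1.0    # presynaptic eligibility trace, held constant here
    for _ in range(steps):
        burst = 1.0 if rng.random() < burst_prob else 0.0
        w += eta * (burst - p_bar) * pre_trace   # more bursts than expected: strengthen
        p_bar += avg_rate * (burst - p_bar)      # slowly track the burst probability
    return w

print(burst_plasticity(0.5))    # bursting above baseline: positive weight change
print(burst_plasticity(0.05))   # bursting below baseline: negative weight change
```

Because the running average catches up with the actual burst rate, the weight stops drifting once bursting settles at a new level; only changes relative to the baseline drive learning in this sketch.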
dmac · November 1, 2021, 2:56pm

DOI link for the original publication: Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits | Nature Neuroscience

Random thought: An alternative explanation for all of this is that burst firing and the apical dendrites implement the “Global Neuronal Workspace” theory.

maxerbubba · November 3, 2021, 4:12pm

Random: the cerebellum generally has two output types: simple spikes and burst spikes.

bkaz · November 3, 2021, 10:16pm

So, the difference from Global Neuronal Workspace is that the latter assumes that basal dendrites also learn from bursts, vs. processing new inputs concurrently?

dmac · November 4, 2021, 3:02pm

The global neuronal workspace (GNW) theory is a theory of how consciousness works.

The GNW theory is biologically constrained.

The best article about GNW is:
(and I highly recommend reading at least the introduction & overview)

The Global Neuronal Workspace Model of Conscious Access: From Neuronal Architectures to Clinical Applications. Stanislas Dehaene, Jean-Pierre Changeux, and Lionel Naccache, 2011.
DOI 10.1007/978-3-642-18015-6_4

Excerpt:
[GNW’s] main postulate is that conscious access is global information availability (see Baars 1989): what we subjectively experience as conscious access is the selection, amplification and global broadcasting, to many distant areas, of a single piece of information selected for its salience or relevance to current goals.

The GNW theory does not attribute specific mechanisms to the implementation of these two networks, instead hand-waving at “layer 2/3 pyramidal neurons”. We will do the work of figuring out which mechanisms implement the GNW.

The GNW theory goes on to describe two distinct networks in the brain:

- Local bottom-up processing of sensory stimuli (the unconscious mind),
- Global broadcasting of selected information (referred to as conscious access).

So how does the brain implement two networks? And using only one set of cells? How can a neuron, with only one axon, transmit two different types of information?

Answer:

- Burst firing indicates that the activity is part of the GNW.
- Synapses have the ability to detect burst firing and respond appropriately.
- The mechanism for this is called short-term plasticity.
  - Strongly facilitating synapses do not respond to single APs but rather only respond to burst firing.
  - Strongly depressing synapses respond to single APs and then stop responding to any following APs.
- There are two sets of dendrites for the two networks.
  - Basal dendrites are for local information processing, and contain depressing synapses.
  - Apical dendrites are for the GNW, and contain facilitating synapses.
- Apical dendrites have a mechanism which causes them to burst-fire, but only when both the basal and apical dendrites are simultaneously active. The effect of this mechanism is to limit burst-firing to the set of cells which were already going to activate at least once anyway.
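To make the facilitating vs. depressing distinction concrete, here is a minimal sketch using a Tsodyks-Markram-style short-term plasticity model. The choice of that model, the function name `synaptic_responses`, and all parameter values are assumptions for illustration; nothing here is taken from a specific paper or simulator.

```python
import math

def synaptic_responses(spike_times_ms, U, tau_fac_ms, tau_rec_ms):
    """Relative response of one synapse to each presynaptic spike.

    u: release probability, raised by every spike (facilitation).
    x: fraction of synaptic resources available (depression).
    """
    u, x = 0.0, 1.0
    last = None
    responses = []
    for t in spike_times_ms:
        if last is not None:
            dt = t - last
            u *= math.exp(-dt / tau_fac_ms)                    # facilitation decays
            x = 1.0 - (1.0 - x) * math.exp(-dt / tau_rec_ms)   # resources recover
        u += U * (1.0 - u)       # this spike increases release probability
        release = u * x          # response is proportional to resources released
        x -= release             # released resources are temporarily used up
        responses.append(release)
        last = t
    return responses

burst = [0.0, 5.0, 10.0, 15.0]   # four APs, 5 ms apart

# Assumed "apical-like" synapse: weak first response, grows during the burst.
facilitating = synaptic_responses(burst, U=0.05, tau_fac_ms=500.0, tau_rec_ms=20.0)

# Assumed "basal-like" synapse: strong first response, then mostly silent.
depressing = synaptic_responses(burst, U=0.7, tau_fac_ms=10.0, tau_rec_ms=500.0)

print(facilitating)
print(depressing)
```

With these made-up parameters the depressing synapse effectively treats a burst as a single spike, while the facilitating synapse barely registers an isolated AP and only builds up a response over the course of a burst.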

dmac · November 4, 2021, 3:41pm

Yes basal dendrites learn from burst firing, but they also learn from single APs, and they might (depending on how the synapses are tuned) treat bursts as though they were single spikes.

bkaz · November 5, 2021, 12:45pm

" if the stimulus is selected for its adequacy to current goals and attention state, it is amplified in a top–down manner and becomes maintained by sustained activity of a fraction of GNW neurons, the rest being inhibited. The entire workspace is globally interconnected in such a way that only one such conscious representation can be active at any given time"

Ok, that sounds like working memory. I am guessing competitive inhibition is mediated by thalamus, probably TRN, while “sustained activity” is through direct lateral connections in L2-L3? But I think WTA inhibition requires that the whole conscious representation is mediated by a single thread in TRN, probably formed by another WTA in sub-TRN thalamic nuclei? Sorry if that’s covered somewhere.

Still, as you noted, GNW is not specific enough at the single-neuron level to be an alternative to the OP paper, so there are no contradictions?

dmac · November 5, 2021, 4:35pm

It’s not a working memory. The conscious activity (burst-firing) is sustained for as long as the stimulus is sustained, but once the stimulus stops the activity should also stop.

Now interestingly, if the stimulus moves then the activity should move with it.
This is possible because the conscious activity is visible to the entire cortex.
I have a prototype of this effect in action: A Model of Apical Dendrites

Stanislas Dehaene’s 2011 paper does have a neuron-level model.
I’m putting forward my neuron-level model.
But I haven’t read & understood your article well enough to say.

bkaz · November 5, 2021, 8:59pm

I think anything relayed by thalamus counts as stimulus, be that primary or higher-order relays? Surely, we can have conscious experience with our eyes closed?

Further, how do you implement winner-take-all directly? It requires real-time competitive inhibition, which I think is only possible if you localize it in places like thalamus?

dmac · November 5, 2021, 11:15pm

The claustrum is a likely candidate for modulating the GNW.

bkaz · November 5, 2021, 11:40pm

I am sure claustrum plays a role, but TRN is where you can directly block non-WTA stimuli.
And it’s much smaller / faster / better-connected.

dmac · November 6, 2021, 3:55pm

While I’m sure that the brain modulates and does gain control on the GNW, I don’t think you really need a strict WTA competition. Multiple cortical areas can be broadcasting at the same time on the GNW.

The prototype that I linked to does not use a competition.

Remember that cells (normally) only burst-fire from apical input if they also spike at least once from their basal & somatic inputs, so local inhibition can prevent burst-firing and control the sparsity (at least within a local area). There is also an inhibitory interneuron, the Martinotti cell (see Wikipedia), which targets apical dendrites.
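The gating described above can be sketched as follows. This is a toy illustration, not the linked prototype: local inhibition is approximated by a k-winners-take-all on basal drive, and the function name `firing_modes`, the threshold, and the example numbers are all assumptions.

```python
def firing_modes(basal_drive, apical_drive, k_active, apical_threshold=0.5):
    """Classify each cell as 'silent', 'spike', or 'burst'.

    A cell can only burst if it already won the local (basal) competition
    and its apical input also crosses a threshold.
    """
    n = len(basal_drive)
    # Local inhibition, sketched as k-winners-take-all on basal drive.
    ranked = sorted(range(n), key=lambda i: basal_drive[i], reverse=True)
    winners = {i for i in ranked[:k_active] if basal_drive[i] > 0}

    modes = []
    for i in range(n):
        if i not in winners:
            modes.append("silent")   # inhibited: apical input alone does nothing
        elif apical_drive[i] >= apical_threshold:
            modes.append("burst")    # basal and apical input coincide
        else:
            modes.append("spike")    # basal input only
    return modes

basal = [0.9, 0.1, 0.8, 0.0]
apical = [0.9, 0.9, 0.1, 0.9]
print(firing_modes(basal, apical, k_active=2))
# ['burst', 'silent', 'spike', 'silent']
```

Cells 1 and 3 receive strong apical input but stay silent because they lost the local competition, so the number of bursting cells can never exceed the number of locally active cells.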

“The entire workspace is globally interconnected in such a way that only one such conscious representation can be active at any given time”

This means that if multiple areas are active at the same time then they all become part of (are concatenated into) a single large SDR, as opposed to multiple discrete SDRs representing different areas. There can only be one representation active in the GNW at a time because there is only one GNW.
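As a small illustration of what "concatenated" means here, assuming each area's SDR is stored as a size plus a set of active cell indices (a representation chosen only for this sketch, with a hypothetical helper `concatenate_sdrs`):

```python
def concatenate_sdrs(areas):
    """Combine per-area SDRs into one large SDR.

    areas: list of (size, active_indices) pairs, one per cortical area.
    Returns (total_size, active_indices) for the combined representation.
    """
    total = 0
    combined = set()
    for size, active in areas:
        # Shift each area's indices so they occupy their own slice of the big SDR.
        combined.update(total + i for i in active)
        total += size
    return total, combined

visual = (10, {1, 3})     # 10 cells, cells 1 and 3 bursting
auditory = (5, {0, 4})    # 5 cells, cells 0 and 4 bursting
print(concatenate_sdrs([visual, auditory]))
```

Each area keeps its own slice of the index space, so adding or removing an active area changes the content of the one combined SDR rather than creating a second representation.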

HTH

bkaz · November 7, 2021, 1:34am

I don’t see what makes it “one” then. Not everything that happens at the same time in the brain is one representation. Composition of GNW is changing all the time, globally active neurons can become locally active and vice versa?
Also, that “concatenated” SDR probably won’t be in the same dendrite? It’s likely to be multiple concurrently active dendrites, with outputs summed in the soma?

I think the reason why singular consciousness exists, vs. just massively parallel processing in the brain, is a brain-to-body bottleneck, implemented in thalamus. Brain neurons can work in parallel, exploring many scenarios subconsciously. But the body can generally execute only one, even two hands can’t work separately, or two eyes look in different directions.
So these scenarios / threads / motor patterns have to compete for the control of the body, even if that control is imaginary. That requires WTA, and the winning thread becomes conscious.

We like to think about the brain as an information processor, but it evolved for a single purpose: to guide the body. That informs the whole process, even if we are thinking about abstract math.

dmac · November 7, 2021, 7:00pm

So I finished reading the OP’s article and I have to say I’m impressed by their model’s faithfulness to biology.
They get a lot of things right, including:

- The role of burst-firing to signify task relevant / salient information.
- Dendrites being tuned to respond to burst-firing or single spikes, using short term plasticity.

I think their results are mostly correct, but I still disagree with their interpretation.
They argue that their network approximates back-propagation and solves the credit assignment problem.
And maybe it does, but I think that misses the bigger picture of what apical dendrites are capable of doing?

Regardless, it is interesting to see when other people start with the same information and then come to very different conclusions.

bkaz · November 8, 2021, 12:03am

A bit off-topic wrt OP article, but I think it supports my opinion about WTA in TRN:

" Intrathalamic Inhibition of HO Thalamus

A central regulator of thalamic function is feedback inhibition via thalamic reticular nucleus (TRN), a thin layer of GABAergic neurons partially encapsulating the relay nuclei which project to cortex (Pinault, 2004). As well as TC afferents to cortex, thalamic relay neurons also send thalamoreticular projections to TRN which in turn provide feedback inhibition to relay neurons (Figure 1); the temporal scale of this inhibition is sensitive to spiking patterns (Figure 2), with high-frequency bursts triggering long-lasting IPSCs due to GABA “spillover” to extrasynaptic receptors, while tonic spiking patterns trigger shorter IPSCs (Halassa and Acsady, 2016). A recent pair of milestone studies in the somatosensory thalamus reveal that properties of HO and FO intrathalamic inhibitory circuitry differ significantly: HO nucleus POm excites and is inhibited by a discrete shell population of TRN neurons; furthermore, the synaptic dynamics of POm-TRN connections as well as the intrinsic properties of POm-connected TRN neurons are functionally distinct from those in VP-TRN circuits (Li et al., 2020; Martinez-Garcia et al., 2020). Thus, it may be that the dynamics of intrathalamic inhibition are matched to the distinct signal processing requirements of HO and FO circuits carrying L5tt and sensory information, respectively.

Given its role in gating thalamocortical transmission as well as its positional and physiological properties, the TRN has been implicated in the regulation of attention in the “searchlight hypothesis” (Crick, 1984; Crabtree, 2018). Regions in the TRN show increased activity in response to attentional stimuli, and the specific region in which this response is found is modality-dependent (McAlonan et al., 2000, 2006). Moreover, limbic TRN projections correlate with arousal states, while sensory TRN projections are suppressed by attentional states (Halassa et al., 2014). Work by Halassa et al. (2011) demonstrates TRN-dependent control of thalamocortical firing mode and state regulation, where selective drive of TRN causes a switch from tonic to burst firing and generates state-dependent neocortical spindles (Halassa et al., 2011).

Likewise, there is evidence for an attentional role of HO thalamus. For example, the MD is activated in humans during tasks requiring a rule-dependent shift in attentional allocation (i.e., set-shifting), such as the Wisconsin card-sorting task (Monchi et al., 2001; Halassa and Kastner, 2017). Human and monkey studies also point to a role of the pulvinar in visual attention. Pulvinar lesions in patients result in impairments in filtering distracting information, while pulvinar inactivation in monkey impairs spatial attention (Danziger et al., 2004; Snow et al., 2009; Wilke et al., 2010; Halassa and Kastner, 2017). In addition, Yu et al. (2008) describe the pulvinar’s role in sustained attention, employing the five-choice serial reaction time task to show that half of recorded units in this nucleus were attention-modulated (Yu et al., 2018). However, TRN control of HO thalamus in the context of attention and arousal has yet to be systematically investigated.
"

They are talking about its role in attention, which I think is just a different POV on GNW or working memory: “spotlighted” or globally active areas.

dmac · November 9, 2021, 5:05pm

I much appreciate a new review article! Very interesting, and I see what you mean about WTA in the Thalamus & TRN.

I had a different line of thinking about what the thalamus does, but I think it’s compatible with yours. My hypothesis is that the thalamus uses some kind of reinforcement learning to control attention and the GNW.

By the way: here is another good review article about the thalamus:

Functioning of Circuits Connecting Thalamus and Cortex
S. Murray Sherman, 2017
DOI: 10.1002/cphy.c160032

bkaz · November 9, 2021, 11:13pm

“My hypothesis is that the thalamus uses some kind of reinforcement learning to control attention and the GNW.”

Thalamus is very complex, so I am sure there is some RL in it. As in any region that gets phasic dopamine. The way I think of it, thalamus is close to a map-reduced brain. And it’s reduced to enable efficient real-time inhibition among competing stimuli and responses, basically a battleground.

That’s in the real brain, but the brain is a kluge; we need to distinguish functional from biologically plausible.
Most cognitive processes are “hypothetical”, they don’t actually need this real-time WTA for the body. To the extent that they do in the brain, it’s probably just an evolutionary artifact.

So, you may not need WTA in your model, depending on application. But then, you also don’t need this binary distinction between globally active GNW and local processes. It will just be a continuous spectrum of activity scope for various co-activated ensembles, Fuster’s “cognits”.

Casey · November 10, 2021, 4:01am

I don’t necessarily disagree, but I don’t think those findings are enough to say anything specific about generic cortex. [9 levels of asterisks about a neuroscience fact in context of generic cortex, yeah I’m cutting the rest of this.]

I vaguely recall the basal ganglia only influences cortex indirectly, mainly by inhibiting thalamus, which would make thalamus a key part of cortical RL.

bkaz · November 10, 2021, 1:13pm

It’s speculative, but thalamus does seem to have all the mechanisms needed for map-reduced competition. Which is necessary, considering that only one motor pattern can be implemented by the body at any given time.

Striatum seems to have direct cortical afferents but not efferents. Which is strange, these connections are usually bidirectional.

dmac · November 10, 2021, 2:09pm

Agreed. The rules of short term plasticity can be tuned so that synapses respond in arbitrary ways to an input spike, as a function of the recent history of spikes. So there is a lot of wiggle room for different synapses to have different responses based on their particular situation.

I think that both interpretations of the thalamus can be correct:

- The thalamus does reinforcement learning.
- The thalamus does a WTA competition.

About WTA in the thalamus:
The winner is going to be the representation of a thing, and that representation is going to be multi-modal and so will show up in many areas of the thalamus.

For example: I have a big ball in each hand and I toss one of them to you and you catch it. Both balls have a representation in your thalamus, but when you go to catch one ball you need to not pay attention to the other ball. This is the WTA in action.

Another example: I experience this one a lot. When playing a first-person-shooter video game (cs:s): two enemy targets appear on my screen at the same time.

- Sometimes I will focus on one of them and I am able to aim directly at them. In this case I ignore the other enemy; I can see the other and I’m vaguely aware of what they’re doing but I’m incapable of dealing with them until I mentally disengage from my current task of shooting the first enemy that I initially targeted.
- Other times I will focus on both enemies at the same time! This does not work and I usually split the difference and shoot halfway between the two enemies, missing both of them. This could be a failure of WTA inhibition in the thalamus?
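The two outcomes can be mimicked with a toy model: with WTA only the most salient target drives the aim, and without it the motor output is a salience-weighted blend of both. The function `aim_point`, the coordinates, and the salience values are made up for illustration and say nothing about how the thalamus actually computes this.

```python
def aim_point(targets, saliences, wta=True):
    """Return the (x, y) screen position to aim at.

    targets: list of (x, y) positions; saliences: matching positive weights.
    """
    if wta:
        # Winner-take-all: the most salient target suppresses all others.
        winner = max(range(len(targets)), key=lambda i: saliences[i])
        return targets[winner]
    # No competition: every target contributes in proportion to its salience.
    total = sum(saliences)
    x = sum(s * t[0] for s, t in zip(saliences, targets)) / total
    y = sum(s * t[1] for s, t in zip(saliences, targets)) / total
    return (x, y)

enemies = [(0.0, 0.0), (10.0, 0.0)]
salience = [0.6, 0.4]

print(aim_point(enemies, salience, wta=True))    # lands on the first enemy
print(aim_point(enemies, salience, wta=False))   # lands between the two, hitting neither
```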

And to control the body with a purpose!
The reason I did not like the phrase “WTA” is that, in theory, a lot of the cells could be active in the GNW as long as they’re all in agreement about what task you’re doing.
