Larkum 2013 & A State of Attention


Larkum 2013:
A basic feature of intelligent systems such as the cerebral cortex is the ability to freely associate aspects of perceived experience with an internal representation of the world and make predictions about the future. Here, a hypothesis is presented that the extraordinary performance of the cortex derives from an associative mechanism built in at the cellular level to the basic cortical neuronal unit: the pyramidal cell. The mechanism is robustly triggered by coincident input to opposite poles of the neuron, is exquisitely matched to the large- and fine-scale architecture of the cortex, and is tightly controlled by local microcircuits of inhibitory neurons targeting subcellular compartments. This article explores the experimental evidence and the implications for how the cortex operates.

A State of Attention
This is my response to the article (Larkum 2013), which discusses backpropagation-activated Ca²⁺ (BAC) spike firing in pyramidal neurons in the neocortex. Larkum argues that BAC firing is a critical component of cortical feedback. I disagree with this assessment; I argue that BAC firing is a critical component of attention. I hypothesize that BAC firing is a distinct computational state of a neuron which represents a state of attention. This additional state fits into the HTM paradigm and explains how to control an HTM system.

BAC Firing
Here I summarize key evidence from (Larkum 2013). Pyramidal neurons have a calcium-spike initiation zone in their apical dendrite which is electrically isolated from the lower parts of the neuron. The apical dendrite is a separate compartment from the basal dendrites and is usually unable to initiate APs on its own. If the basal dendrites first initiate an AP, however, the apical dendrite suddenly becomes very sensitive, and if it is then activated the neuron emits 2-4 APs at approximately 200 Hz, an event known as backpropagation-activated Ca²⁺ (BAC) spike firing. Apical dendrites, like basal dendrites, utilize NMDA receptors and dendritic spines. Some inhibitory neurons specifically target apical dendrites.

A State of Attention
BAC firing in pyramidal neurons represents a state of attention. This state is in addition to the HTM states of inactive, unpredicted-active, and predicted-active; the lower half of the neuron behaves like a state-of-the-art HTM neuron. Pyramidal neurons are either at attention or at ease. Neurons enter the state of attention when they are activated by both their basal and apical dendrites. This rule may have exceptions, such as when there is no feed-forward proximal input and the whole cortical area becomes inactive and unused, or when there is overwhelming apical input. Apical dendrites utilize NMDA receptors, which suggests that they operate in a similar manner to basal dendrites, including constantly learning in an unsupervised manner. This means that even when you aren't paying attention, the apical dendrites are getting ready to pay attention.
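To make the hypothesis concrete, here is a minimal sketch of the proposed state classification: the standard HTM states plus the at-attention state described above. The function name and inputs are illustrative, not part of any HTM implementation.

```python
# Hypothetical sketch of the extended HTM neuron states proposed above.
# All names are made up for illustration.

def neuron_state(proximal_active, distal_predicted, apical_active):
    """Classify a pyramidal neuron's state for one time step.

    proximal_active:  feed-forward (proximal) input drove an AP
    distal_predicted: distal basal segments predicted this activation
    apical_active:    the apical dendrite received matching input
    """
    if not proximal_active:
        return "inactive"
    if apical_active:
        # Basal AP coinciding with apical input -> BAC firing -> attention
        return "at-attention"
    if distal_predicted:
        return "predicted-active"
    return "unpredicted-active"
```

The key design point is that attention is a fourth state layered on top of the usual HTM states, triggered only by the coincidence of basal and apical activation.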

BAC firing could be reliably detected by its signature short, high-frequency bursts. It's possible that apical dendrites are tuned to respond primarily to BAC-firing (at-attention) inputs, with other inputs ignored or having a lesser effect. This would have an important consequence: it would allow assemblies of neurons at attention to persist over time by forming stable local recurrent networks through apical dendrites, and it would block assemblies of neurons at ease from interfering with or spreading through apical dendrites. Attention would also persist across cortical areas, assuming that topologically related areas are connected through apical dendrites. This matters because the "thousand brains" theory states that an object is represented by all neurons which contain the object in their proximal receptive field: as an object moves across an animal's field of vision, its representation in the brain moves through many cortical areas, and attention follows it without any intervention.
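As a sketch of how a downstream reader could flag BAC firing from spike times alone: 2-4 APs at roughly 200 Hz implies inter-spike intervals near 5 ms, so a threshold of about 6 ms is assumed here purely for illustration.

```python
def detect_bursts(spike_times_ms, max_isi_ms=6.0, min_spikes=2):
    """Group spikes into bursts: runs of consecutive spikes whose
    inter-spike interval is at most max_isi_ms. Thresholds are
    illustrative, chosen from the ~200 Hz / 2-4 spike figures above."""
    bursts, current = [], []
    for t in spike_times_ms:
        if current and t - current[-1] <= max_isi_ms:
            current.append(t)
        else:
            if len(current) >= min_spikes:
                bursts.append(current)
            current = [t]
    if len(current) >= min_spikes:
        bursts.append(current)
    return bursts
```

For example, a 200 Hz triplet at 0, 5, and 10 ms is grouped as one burst, while isolated tonic spikes 50 ms apart are not flagged.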

Attention would turn the entire neocortex into a single global workspace where ideas come together in a controlled fashion to produce a desired result. All neurons which are at attention would participate by broadcasting pertinent information. This would allow arbitrary areas of the cortex to cooperate, while areas which aren't currently participating quietly watch and wait.

Methods of Cortical Control
The basal ganglia perform reinforcement learning, which concerns what good or bad things may befall the animal. The basal ganglia project to the thalamus, which in turn projects to the cortex, which finally projects to the muscle neurons. I hypothesize that the thalamus uses reinforcement learning to associate conditions in the basal ganglia and cortex with ways to manipulate the cortex. Cortical pathways which connect to muscles or other brain outputs would be especially tightly controlled. Here are three ways that the thalamus could manipulate the cortex:

  1. Turn off all of the feed forward proximal input, which will shut down a whole cortical area. This could be used either to restart the network or to allow attention to inject an idea into an unused cortical area. This mechanism could allow attention to recruit unused cortical areas to help with problems which the unused areas have experience solving. This could also prevent the motor cortex from performing an undesired action.

  2. Turn off attention. There are inhibitory interneurons which target the apical dendrites, both interneurons in layer 1 and Martinotti cells throughout the neocortex. This inhibition would prevent activity from participating in the global workspace. Activity suppressed in this way is still present in the cortex and can be brought back to attention if needed.

  3. Promote attention in a cortical area by increasing the overall magnitude of the feed forward proximal input so that neurons activate as strongly as BAC firing neurons. This could allow an idea to gain attention where before there was no attention.


I’m going to focus on the small scale details. The overall picture you describe is interesting, so don’t take any of this as me disagreeing with you.

It makes sense for bursting to signal attention, but it probably isn't all-or-nothing. I think of attention as how aware the neurons are of each thing. There are probably some aspects which are sort of all-or-nothing, like switching attention to something for which there was no awareness whatsoever.

Be careful when researching bursting and dendrites. In my experience, it’s very slow to read the articles because each one is pretty different and technical, and the results from different studies are often quite different. Interpreting things in context of the rest of the cortical circuitry (what inputs are on which parts of the cell and what types of synapses they use and the sparsity of those inputs firing) is another nightmare.

Don't trust results based on calcium imaging. They're useful for some things but they're the bane of my existence.

A lot of studies on bursting and apical events use calcium imaging, where a dye is injected that fluoresces in response to calcium. Calcium imaging is not good for showing anything except that there was an increase in calcium. In practice it can't be used to identify the timing, duration, or magnitude of an event. It can theoretically show those things, but it causes all sorts of artifacts, and those artifacts are often reported in studies because they seem new and interesting. For example, the dye can produce signals lasting seconds, possibly because of saturation (I don't really know what that means), and repetitive calcium events (e.g. at burst frequencies) can produce disproportionately large signals. The usage of calcium imaging also varies a lot between studies: the concentration of the dye in the cell, the sensitivity of the dye, and probably the time scale. I don't know much about chemistry so I might be totally wrong, but it seems like calcium imaging is very misleading.

Bursting probably isn't a binary event, based on L6 and the thalamus.
In thalamic cells, bursting isn't an all-or-nothing event in terms of magnitude, although whether or not it happens is all-or-none. Thalamic bursts are caused by low-threshold calcium spikes (different calcium channels than the main ones on the distal apical trunk in cortex), and they only occur if the cell has been hyperpolarized for a while, because that de-inactivates T-type calcium channels. When the cell is then depolarized by an input, the magnitude of the calcium spike depends on how hyperpolarized the cell was and for how long, up to some limits. This can even produce single-spike bursts.
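A toy model of that graded dependence, assuming a saturating effect of both hyperpolarization depth and duration on T-type channel de-inactivation. Every constant here is made up for illustration; this is a sketch of the shape of the relationship, not a biophysical model.

```python
import math

def t_type_availability(hyperpol_mv, duration_ms, tau_ms=100.0, half_mv=10.0):
    """Toy fraction of de-inactivated T-type channels.

    Grows with how far below rest the cell sat (hyperpol_mv, mV below
    rest) and how long it stayed there, each saturating toward 1.0.
    Constants are illustrative only."""
    depth = hyperpol_mv / (hyperpol_mv + half_mv)   # saturates with depth
    time = 1.0 - math.exp(-duration_ms / tau_ms)    # saturates with time
    return depth * time

def burst_spike_count(availability, max_spikes=4):
    """Graded burst size: more available T-type channels means a bigger
    low-threshold calcium spike and more spikes per burst, down to a
    single-spike 'burst'; no burst at all below a small threshold."""
    return max(1, round(availability * max_spikes)) if availability > 0.05 else 0
```

The point of the sketch is that whether a burst happens is all-or-none (the threshold), while its magnitude is graded by the prior hyperpolarization.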

In L6 cells, and presumably in L5, it's sort of linear. I strongly disagree with the idea of a critical frequency, which for L5 thick-tufted cells is generally around 100 Hz, both for artificially induced spikes generating large tuft calcium signals and for a spike train to be considered a burst.

ADPs (afterdepolarizing potentials) are recorded at the soma and are what directly trigger bursts. This study measured the amplitude of the ADP triggered by different frequencies of artificially induced firing. From around 40 Hz up to about 80 Hz (so practically all firing rates above 40 Hz, since L6 cells fire slower than L5 cells), the ADP amplitude is basically linear (fig. 6d,e). That's not the same as a bAP and apical depolarization together triggering a burst. However, in L5 TT cells, 100 Hz is the supposed critical frequency both for the artificially induced firing and for the burst that results from an apical calcium event, so there's likely no difference. Also, since the effect only starts at a fairly high firing frequency, it is almost certainly caused by a calcium event in the apical dendrite, so the calcium event that results from a coincident bAP and dendritic input is probably also not all-or-nothing.

My sources are in google docs linked here: Layer 6 Notes Summary. This is mostly based on memory so I might be wrong about some things. Some things might be hard to find so I can give you a list of sources if you want.

Some studies (maybe just one) found that the apical dendrite can produce APs on its own at probably unrealistic levels of input, but also that at much lower levels of input it increases the firing rates evoked by somatic injection, if I recall correctly in a multiplicative manner.

The proximal region of the dendritic arbor causes APs directly. This includes the proximal apical dendrite, something like 0.2 mm I think. Oblique dendrites are basically ignored in the literature, so they probably also complicate things.

200 Hz is an exaggeration by the researchers. It can happen, but probably only because of injecting way more current than is realistic.

The NMDA receptors are on the apical tuft, whereas the distal apical trunk* is the location of the voltage-gated calcium channels. The tuft branches seem to perform localized summation like distal basal dendrites and HTM dendritic segments.
*and possibly the first order branches of the tuft, possibly because L5 TT cells can branch their apical shaft around L4 into effectively two apical shafts

If you haven't read about them yet and want to, try searching for Martinotti cells and somatostatin-positive cells, although be careful because not all of those are the same. They're pretty interesting with regard to bursting. They can be inhibited by other interneurons, possibly including some which are activated by motor cortex signals to L1.

I haven't heard of apical dendrites only responding to burst-frequency inputs, but this is possible indirectly. Higher-order thalamus synapses onto primary cortex in L1 and L5a (probably not much onto L5 TT cells, but perhaps even proximally onto L5 slender-tufted cells, which might modulate L2/3 and L5 TT cells). It activates metabotropic receptors, which might be frequency-sensitive. This mirrors feedback from L6 to primary thalamus, which is thought to be involved in attention.
NMDA receptors can also be frequency-dependent, maybe. This is all pretty complicated though, so it's hard to tell what happens in the brain.

Those interneurons don’t receive input from the thalamus, at least most or all Martinotti cells don’t.


You bring up some great points about how this might actually work in the brain. I’m going to continue painting the larger picture (fewer calcium imaging studies that way). The next article I hope to read and respond to is:

Fundamental Components of Attention, Knudsen 2007
A mechanistic understanding of attention is necessary for the elucidation of the neurobiological basis of conscious experience. This chapter presents a framework for thinking about attention that facilitates the analysis of this cognitive process in terms of underlying neural mechanisms. Four processes are fundamental to attention: working memory, top-down sensitivity control, competitive selection, and automatic bottom-up filtering for salient stimuli. Each process makes a distinct and essential contribution to attention. Voluntary control of attention involves the first three processes (working memory, top-down sensitivity control, and competitive selection) operating in a recurrent loop. Recent results from neurobiological research on attention are discussed within this framework.


I would be interested in your feedback on this alternative view of the conscious experience:


I place the attention function in sub-cortical structures with expression of attention as drive to the fore-brain.


I think that the brain has a much better "body needs" sensor than the limbic system. The basal ganglia (BG) perform reinforcement learning, which predicts the outcome of a situation in terms of how good or bad it will be for the animal. I think that information in the basal ganglia is routed to the PFC, where it is processed and analysed. The result is a loop between unsupervised learning and reinforcement learning.

You should read my post in that thread too:

Some more details: the Striatum has two populations of cells, D1 and D2; one population only activates in the presence of dopamine, the other in its absence. Dopamine here encodes the expected value, which is an output of reinforcement learning. D1 and D2 cells learn to represent their input from the cortex, as Hebbian learning tends to do, but their ability to learn is gated on the current level of dopamine, so they learn to detect only the things which are present when they're allowed to learn. These cells then represent things which have a value to the animal; everything the animal cares about should show up somewhere in the Striatum.
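A toy sketch of that gating idea: a Hebbian outer-product update scaled by a dopamine gate, with the gate flipped for D2-like cells. The function name, the learning rate, and the assumption that dopamine is scaled to [0, 1] are all illustrative.

```python
def gated_hebbian_update(weights, pre, post, dopamine, lr=0.01, d1=True):
    """Toy dopamine-gated Hebbian step for one striatal cell type.

    weights:  list of rows, one per postsynaptic cell
    pre/post: presynaptic (cortical) and postsynaptic activity vectors
    dopamine: assumed scaled to [0, 1]; D1-like cells learn when it is
              high, D2-like cells when it is low (gate flipped).
    """
    gate = dopamine if d1 else (1.0 - dopamine)
    return [
        [w + lr * gate * po * pr for w, pr in zip(row, pre)]
        for row, po in zip(weights, post)
    ]
```

The design point is that the Hebbian rule itself is unchanged; only the permission to learn is gated, so each population ends up representing inputs that co-occur with its preferred dopamine level.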

N.B. I think that working memory cells could activate themselves via the Cortex->BG->Thalamus->Cortex pathway, which can only persist if the BG deems the subject matter valuable.


I find it quite surprising that you assert that the BG is somehow a better choice for sensing body needs. The body needs --> limbic --> PFC circuits are widely documented by many sources. Some examples:



Heat and cold (comfort seeking):

Overview of all body sensing based drives:

You may wish to reconsider this.


I did read your consciousness post when you first offered it. I certainly am familiar with the neural structures you describe and the documented ties between function and behavior.

I had trouble working out how the loop you described ends up in conscious perception, and particularly in the widely documented internal voice.

Perhaps you can help me understand how the mechanism you propose explains the blending of perception and internal life? Also - the creation of the internal narration?

While you are at it - cover the well-known delay between the initiation of action and perception of that decision. I am having some trouble matching that delay up to your description.

Lastly - the built-in behaviors in many mammals are active minutes after birth and certainly long before any learning of the environment could occur. Can you explain how that reinforcement learning based behavior could arise?


Perhaps I spoke too soon. Is it possible that both the limbic system and the basal ganglia project to areas of the forebrain?

I agree with your ideas about the forebrain injecting sensory input into other areas of the cortex, and then evaluating the results. What my attention hypothesis offers is an idea about how that top down control might happen.


Please reflect on the possibility that genetic programming built into the subcortical structures kick-starts the animal by driving the PFC in the early days.

I will accept that learning of all kinds, including RL, improves these basic built-in behaviors as the critter experiences the world. The loops you describe could certainly be the mechanism to do some of the tuning.

I read a lot of papers on these topics. I do try to match up any new information with the bulk of prior papers that I have encountered. It is entirely possible that everyone up to this very day had it wrong. Science does work that way but the new proposal does have to explain all the prior wrong stuff better to sway me to accept it.


I know you were talking about my Striatum hypothesis, but I'm going to direct the conversation back to my attention hypothesis. One of the big ideas behind attention is that you have the ability to manipulate your own brain. I proposed some mechanisms which could manipulate the cortex. The internal narrative is the result of this manipulation, and is part of a larger phenomenon of cortical manipulation.

To understand the internal narrative you first need to understand how actions work and how attention controls actions. The cortex accepts sensory input and outputs a representation of what it is looking at and how it is moving. The cortex connects to muscle neurons in the brain stem, which drive movement.

Controlling actions: Let’s start with a simple example: how to think the word “apple”. First training: look at an apple, pay attention to the sensory experience of the apple, and speak the word apple over and over again. By attending to the apple, the apical dendrites in the motor cortex associate speaking “apple” with the concept of an apple in other areas of the cortex. Then testing: I will present you with an apple and a lemon and ask you which you’d like to have. Your basal ganglia tells you that the apple is what you want, and the result of this decision is to pay attention to the apple and not to the lemon. Although your motor cortex has been trained to say both apple and lemon, you say apple because the SDR for apple is at attention so it is recognised by apical dendrites, whereas lemon is not at attention so it is filtered out by apical dendrites.
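The apple/lemon example can be sketched as a filter: the motor cortex has learned both actions, but only an action whose trigger SDR is both active and at attention passes the apical filter. All names here are made up for illustration.

```python
# Hypothetical sketch of attention-gated action selection, as described
# in the apple/lemon example above. SDRs are represented by simple labels.

def select_action(learned_actions, active_sdrs, attended_sdrs):
    """Return the actions whose trigger SDR is both active in the cortex
    and currently at attention (i.e. recognised by apical dendrites)."""
    selected = []
    for action, trigger_sdr in learned_actions.items():
        if trigger_sdr in active_sdrs and trigger_sdr in attended_sdrs:
            selected.append(action)
    return selected
```

Here both "apple" and "lemon" representations are active (you can see both fruits), but only the attended one drives speech, which is the claimed role of the apical filter.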

Internal Narrative: Speech is a very complex action, but I think it's controlled in the same way as other muscle movements. I think that part of the reason the internal narrative is special is that words have so much meaning. Words can concisely express or invoke any idea in the brain. The internal monologue is a really useful tool for moving information around, and it makes sense to keep these circuits running pretty much non-stop.

This makes a lot of sense to me. What evolutionary benefit is there to being immediately notified of a decision? Stop hesitating, no more thinking, just do the thing you've decided to do. If my brain gave me a moment of conscious reflection after every decision, I'd second-guess myself every time. Every area of the cortex is potentially critical to the decision-making process as well as to the action which immediately follows the decision. Notifying a cortical area that "a decision has been made" would distract that area with something that isn't related to the task at hand.

I think that the question should instead be: “How are we able to perceive a decision?”

I think you're right here, that the limbic system drives animal behaviour from the start, before the animal has time to learn. The limbic system certainly has a large impact on animal behaviour even after reinforcement learning happens. The limbic system is in charge of keeping the animal alive, so it can over-ride cognitive functions even after the brain has learned a lot (example: angry people don't think well). Just a guess here, but I think that many of those built-in behaviours are handled by the midbrain/brainstem. Birds don't learn to fly; it works the first time or they fall and die.


This sounds testable.
What do you propose should change as any cortex-possessing critter goes from limbic-driven to cortex-driven as the cortex is trained up? Keep in mind that this should apply to both speaking and non-speaking critters. There has been lots of conjecture on what kind of internal mental activity exists in animals.

I would expect that an fMRI should be able to see some sort of change in activity.

If you can make a specific enough prediction I can put this to some of the neuroscientists that I am following on twitter and see if it can be tested.


Several years ago I heard about this phenomenon on the radio (NPR): babies' eyes sometimes move weirdly for a few hours while they switch from instinct-driven eye control to conscious control.


This is behind a paywall. If you are able to see it you may notice that the visual attention is all driven by sub-cortical structures. I don’t know how that affects your hypothesis but it may be time for a re-think.

More accessible overview:


No paywall here:


I have no real biological evidence for my Striatum hypothesis. I argue for it because I think it explains a lot about how I think people think.

The Striatum filters for salient information: information which is directly related to the tasks the animal is trying to solve. The result is a much smaller and more useful representation of the world. That result is then sent to another area of the cortex, because why not? It seems like it would be a useful thing to have.
The input data to the sensory cortices comes from the real 3D world; the input data to the frontal brain would be more like a subway map.

That's a lot of evidence and it will take me time to work through. But at first glance it doesn't really contradict my attention hypothesis, since I describe how attention affects the cortex and this describes the areas which control attention.


I don’t think the idea of attention is exactly right, at least not attention as a filter. At the very least, there are probably multiple kinds. It might be worth listing different drives for and forms of attention. These are just off the top of my head and the neural implementations might do some of these with the same tricks and some of these with multiple tricks.

  1. Sensory surprise
  2. Novelty even once you know it’s there
  3. Shiny/food/poisonous colors
  4. Learned reward
  5. Ongoing cognitive processes, possibly themselves triggered by reward learning or behavior, or just as parts of normal sensory processing
  6. Levels of attention based on brain state
  7. Probably others.

Types of Attentional Targets:

  1. Precise timing
  2. Location on sensor
  3. Location on object
  4. Location in egocentric reference frames (which are much more varied than just sensor), and all other reference frames
  5. Continuously moving locations
  6. Particular objects, sensory characteristics, etc. regardless of their locations (like looking or waiting for a particular cue.)
  7. One’s own behaviors (how much do you think about walking? Walking cycles? The angle of your ankles at each point in time? You can choose.)
  8. Attention to particular sensory/cognitive modalities and more instinct-like modalities which words like emotion don’t describe well (what is the amygdala saying? How hungry am I? What are potential targets for eye movements according to the superior colliculus?)
  9. Dozens of combos of those things.
  10. Probably others.

I think of attention as the level of awareness/representation of something out there. It can't only involve sensory filtering, because then you wouldn't be able to pay attention to objects which seem like possible matches as you search for an object in the dark among many other objects. Based on the thalamus and L6, hypothetically: when the cortex has no representation of something which can cause part of a new sensory input, that input is surprising, so it gets more representation. When the cortex has a lot of representation of it, that is attention, so the input also gets a lot of impact on the representation (that was a possible sensory input given these possible identities for this object, so it rules out some other possible objects). When there's just a bit of representation, the cortex has already noticed whatever is causing that input and chosen to ignore it; or, if you reframe this as attention towards each possible explanation for a sensory input, an explanation has weak representation because whatever recently seemed like a potential explanation has mostly been ruled out. For example, you see part of a wheel and infer a bus and a car as possibilities. Then you see another part of a wheel, but at this point you know it's not a bus, so the representation of "bus" from earlier is weak.

With that in mind, the basal ganglia inhibiting the thalamus has a slightly different implication, although this is pretty speculative. If a thalamic signal gets inhibited a little, it gets ignored. If it doesn't get inhibited at all, it gets through the thalamus, of course. But if it gets inhibited very strongly, possibly even if that inhibition continues, an unexpected input will cause those thalamic cells to burst.

Possibly, another way to think about this: if a cell is just inhibited, the system expects those thalamic cells have no chance of turning on, so when they get input, they respond strongly. But if there's also feedback from the cortex, the cortex has already considered the possibility and decided it's probably not going to happen, so the cells respond weakly.

Those two extremes (lots/little inhibition or representation) aren’t equivalent in what they do, but they both do more to the cortex than the middle.
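A toy decision table for those two extremes, just to make the speculation explicit. The thresholds and labels are entirely arbitrary; the only point encoded here is the nonmonotonic shape.

```python
def thalamic_response(inhibition, cortical_feedback):
    """Toy nonmonotonic response rule for a thalamic relay cell.

    inhibition:       BG inhibition strength, assumed scaled to [0, 1]
    cortical_feedback: whether cortex already represents the possibility
    """
    if inhibition == 0.0:
        return "tonic-relay"        # passes through normally
    if inhibition < 0.5:
        return "ignored"            # mild inhibition, input gets ignored
    # Strong/persistent inhibition: an unexpected input triggers a burst,
    # unless cortical feedback has already considered and discounted it.
    return "weak" if cortical_feedback else "burst-on-input"
```

Both extremes (no inhibition, strong inhibition) produce a strong effect on cortex, while the middle regime produces the least, matching the point above.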

This is sort of my understanding/interpretation of some things that BitKing has written.


I think you misunderstand me. On this thread I've discussed two hypotheses: one about attention, and another about the connectivity of the forebrain. The Striatum was involved in the forebrain hypothesis, where I think it filters for salient information using reinforcement learning, not attention.

This is close to what I think too. All activations in the cortex represent information, but large activations represent information which is called to attention. In this way the signal for attention is embedded inside the information signal.


I found this interesting paper about Martinotti cells:

Martinotti cells (MCs) are activated by pyramidal cells (PCs), and the PC->MC synapses are strongly facilitating. Figure 3A in this article shows that MCs do not activate in response to inputs firing below 30 Hz, and that their response increases proportionally to the input firing frequency. MCs inhibit nearby PCs via their apical dendrites and tufts in layer 1. The article concludes that MCs provide rate-dependent negative feedback to PCs, and that this feedback is both triggered by and targets apical calcium spiking. My interpretation of this article is that MCs control the sparsity of neurons at attention, which may be different from the overall sparsity of a population of PCs.
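A toy short-term facilitation model shows how rate dependence like that can emerge without any explicit frequency threshold: each presynaptic spike builds up a facilitation variable that decays between spikes, so only high-rate trains strongly recruit the Martinotti cell. The time constant and increment are made up for illustration.

```python
import math

def facilitating_response(spike_times_ms, tau_ms=200.0, increment=0.3):
    """Toy facilitating synapse: each presynaptic spike adds to a
    facilitation variable that decays with time constant tau_ms. The
    postsynaptic drive of each spike is proportional to the facilitation
    accumulated so far, so high-rate trains deliver much more total
    drive than low-rate trains of the same length."""
    f, total_drive, last_t = 0.0, 0.0, None
    for t in spike_times_ms:
        if last_t is not None:
            f *= math.exp(-(t - last_t) / tau_ms)  # decay since last spike
        total_drive += f                           # drive from facilitation
        f += increment                             # each spike facilitates
        last_t = t
    return total_drive
```

With these assumed constants, a ten-spike 50 Hz train drives the cell far more than a ten-spike 5 Hz train, which is qualitatively the rate dependence described above.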


I've read articles saying that Martinotti cells operate in terms of bursts (something like competition for bursts), but that isn't right, because they facilitate over many spikes, not just a few, and can facilitate a lot in response to moderate firing rates. It would make sense for them to operate in terms of bursts, because they inhibit distal apical dendrites, but they don't. Do distal apical dendrites just cause bursts, or do they add to or multiply firing rate, or something else? I don't know, but it doesn't work if distal apical dendrites just produce bursts.

So I wouldn’t think of this as a binary burst/no burst thing. Maybe some sort of graded firing rate code of attention, or rather, a graded degree of multiplication so it responds better to whatever is in its field of attention (or e.g. a cue it’s waiting for to do something. The type of attention is unimportant).

Or what would it mean if you don't put it in terms of a firing-rate code? If the apical dendrite performs a multiplication function, it's probably because of coincidence detection between backpropagating action potentials and brief excitatory inputs to the apical dendrite, controlled by (probably longer-lasting) inhibitory signals. Since, if you don't think of it as a rate code, that's a single isolated event, the facilitating Martinotti cells would be operating in terms of how much the cell has fired recently. So I guess it goes along the lines of: the cell sees something in its field of attention, causing wider receptive fields / a denser SDR, so Martinotti cells reduce that density for a while, and low firing rates let the facilitation die down. As opposed to a fixed attentional SDR sparsity, this would be an SDR of level of perception (firing sparsity) and receptiveness (based on receptive-field width / firing selectivity); if it perceives too much during a period of time, now that it knows what it could pay attention to or what could be out there, Martinotti cells narrow its attention to pick something.

Just throwing some ideas out there. The specifics aren’t really the point. The point is that bursting isn’t binary and that might be important for the concepts describing attention. Or not.


I found another experiment related to my attention hypothesis. They stimulated thalamo-cortical projections to study the effects of different thalamic relay cell firing modes. Their findings and discussion are interesting and agree with my hypothesis, but their final conclusion regarding a "frequency-dependent gate" is overly simplistic.

Thalamocortical Bursts Trigger Recurrent Activity in Neocortical Networks: Layer 4 as a Frequency-Dependent Gate, by Michael Beierlein, Christopher P. Fall, John Rinzel, and Rafael Yuste, 2002

Abstract: Sensory information reaches the cortex via thalamocortical (TC) synapses in layer 4. Thalamic relay neurons that mediate information flow to cortex operate in two distinct modes, tonic and burst firing. Burst firing has been implicated in enhancing reliability of information flow between individual neurons. However, little is known about how local networks of neocortical neurons respond to different temporal patterns of TC activity. We studied cortical activity patterns evoked by stimulating TC afferents at different frequencies, using a combination of electrophysiology and calcium imaging in TC slices that allowed for the reconstruction of spatiotemporal activity with single-cell resolution. Stimulation of TC axons at low frequencies triggered action potentials in only a small number of layer 4 neurons. In contrast, brief high-frequency stimulus trains triggered widespread recurrent activity in populations of neurons in layer 4 and then spread into adjacent layers 2/3 and 5. Recurrent activity had a clear threshold, typically lasted 300 msec, and could be evoked repetitively at frequencies up to 0.5 Hz. Moreover, the spatial extent of recurrent activity was controlled by the TC pattern of activity. Recurrent activity triggered within the highly interconnected networks of layer 4 might act to selectively amplify and redistribute transient high-frequency TC inputs, filter out low-frequency inputs, and temporarily preserve a record of past sensory activity.

The main take-away from this experiment is that thalamic relay cells can use their firing mode to control how information is processed in the cortex, without altering the underlying information being transmitted. This is another example of a control signal embedded within an information signal.

They conclude that layer 4 acts as a frequency dependent gateway because when they stimulated thalamic relay cells in tonic-mode (low frequency inputs to cortex) it caused layer 4 to have a small amount of activity which did not spread beyond layer 4 and which quickly died out when stimulation ceased. I think that the results of this experiment would be different if they had increased the stimulus duration (they used 100-200 usec). Perhaps a longer stimulus duration would have resulted in activity throughout the cortical layers instead of just this truncated response in layer 4.