Can the brain do backprop?


How do we teach a machine to program itself ? — NEAT learning.


I remember Hinton once saying that gradient descent is essentially evolution: it tries many variations of the weights, keeps those that are good, and changes those that are bad. Search and selection.
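To make the "search and selection" comparison concrete, here is a minimal toy sketch, not anything Hinton presented. It assumes a made-up quadratic loss and compares plain gradient descent with a simple mutate-and-keep-the-best loop. The names `loss`, `gradient_descent`, `evolve`, and all parameter values are hypothetical.

```python
import numpy as np

def loss(w, target):
    """Squared distance between the weights and a hypothetical optimum."""
    return float(np.sum((w - target) ** 2))

def gradient_descent(w, target, lr=0.1, steps=200):
    """Follow the analytic gradient of the toy loss."""
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w -= lr * grad
    return w

def evolve(w, target, rng, sigma=0.1, offspring=20, steps=200):
    """Search and selection: mutate the parent, keep the best candidate."""
    parent = w.copy()
    for _ in range(steps):
        noise = rng.standard_normal((offspring, parent.size))
        candidates = parent + sigma * noise
        losses = [loss(c, target) for c in candidates]
        best = candidates[int(np.argmin(losses))]
        if loss(best, target) < loss(parent, target):  # keep only improvements
            parent = best
    return parent

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])   # assumed optimum, for illustration only
start = np.zeros(3)

w_gd = gradient_descent(start, target)
w_es = evolve(start, target, rng)
print("gradient descent loss:", loss(w_gd, target))
print("evolution loss:", loss(w_es, target))
```

Both loops reduce the loss. The difference is that gradient descent computes which direction is better, while the evolutionary loop has to sample variations and select among them.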

They are awesome. Genetic algorithms on neural networks (neuroevolution) can have a lot more going on. This is my favorite GA work, wherein they do generative encoding of the genome (weights) by mimicking the idea of chemical gradients during embryonic development. These guys really went to town on neuroevolution.



There are other ways.
Like there could be this big, dense volume of neurons connected every which way, oscillating like a constant storm. A data stream is fed in from the real world, along with another channel that feeds back in. It is also paired with highly organized transmission lines. It works by searching through the well-ordered transmission lines for the answers. It just has to remember where to look, depending on the situation, before outputting into the real world or back into the dense mass of chaotic nerves. All the answers are already there; we just need to find them.
Is that what the internet is all about?

Reservoir computing?


That doesn’t describe neuroevolution, nor does reservoir computing. My question wasn’t actually serious (I was just poking fun at your deviation from the question at hand). But if you want a serious answer, then The Cerebral Code is probably the best theory for how a brain could hook into the process of evolution.


No neuroscientist I’ve ever talked to has said this. Hinton himself said we should look for another way less than a year ago, didn’t he? Now he’s flipped? I’m confused.

Numenta does not pursue backprop because there’s no evidence of it in the brain (that we have seen, anyway). If we had some experimental neuroscience evidence of this, that would be another story. If anyone has any references to scientific papers providing evidence, please post them!

I don’t think it’s worth the time theorizing how HTM could perform backprop without solid evidence of its biological plausibility. We don’t consider implementing backprop in Numenta’s HTM model because we don’t see a need for it.


It’s a crucial question to ask for anybody who is totally and completely invested in modern-day deep learning, for sure. My problem is that even assuming the existence of a common cortical circuit in the neocortex, the brain consists of a myriad of highly coupled, unique functional and structural pieces that nearly all interact and thus at least influence each other’s function, if they are not outright inseparable from one another. What exactly does Hinton mean to suggest? Does he mean to say cortex could be doing backprop? Where in cortex, exactly? That is to say, where in the cortical column and its connections to other brain regions does he suggest it is happening? Does it arise through a combination of structures? The number of possible manifestations of such a claim seems unwieldy at best.

Or is he suggesting every single neuron in the brain is constantly doing this brand of spike-frequency backprop? In that case, what about the many different functional and structural types of neurons in the CNS? Does that change anything? Also, not all neurons change their spike frequency in the same way. For instance, thalamic relay neurons have a special capability of shifting their mode of firing between tonic and bursting as a property of their T-type calcium channels. This switching is orchestrated by other components in the thalamocortical system, like the thalamic reticular nucleus, which provides inhibitory input to the relay neurons; this hyperpolarizes them and causes the inactivation gate of the T-type calcium channels to open, thus switching the firing mode from tonic to bursting. In contrast, the brainstem modulatory inputs release acetylcholine, which depolarizes the relay neurons and has the opposite effect. Input from cortex and its effect on the mode of firing in relay neurons is more complicated, and I’ll save the long explanation, but through processes known as facilitation and synaptic depression the switch of firing modes is believed to modulate the degree of detail and the manner in which information from the cortex ought to be relayed (tonic firing expresses a linear relationship with firing frequency and thus signal strength, while bursting loses this signal strength information but helps strongly reinforce relevant synapses quickly). I find it very unlikely that a topic as multiform and complex as synaptic firing rate modulation has an answer as simple as “because backprop…backprop everywhere.”

My problem with ANNs and their relationship to neuroscience is that they don’t actually have one. It seems to me the only detail that links ANNs to neuroscience is an extremely high-level concept of distinct computational units connecting to other distinct computational units that pass signals of some kind to each other. No further connection to neuroscience exists, whether in the details of the computational units themselves or in the architecture of their connections. So forgive me if I find searching for backprop in the brain to be almost silly. Not to mention, there are many aspects of HTM that are also lacking relative to (or even inconsistent with) the neuroscience literature. Most obviously, neocortex comes in 6 layers, and standard HTM networks have ventured to explain perhaps the function of layers 2/3. Layer 4 typically receives input from the thalamus, and layers 5 and 6 are widely known to project back to the thalamus and other sub-cortical structures; this functionality cannot be ignored. Moreover, HTM has yet to explain layer 1, which is arguably inseparable from the idea of multiple, distinct cortical areas of the same “brain” unit talking to each other (likely in a hierarchical processing fashion), which has yet to be realized, to my knowledge. Different modes of firing are also not modeled in HTM networks. In fact, the whole temporal component of neuron firing rates is ignored, to my knowledge. An HTM neuron has either “fired” or “not fired” for a timestep, and that information is not carried to the next timestep with regard to whether it should or should not fire in the next timestep. Boosting could potentially make my statement false, but that is an effort to implement homeostatic excitability control and not realistic temporal firing characteristics, either way.

ANNs of any modern type perform a single function, defined by labeled data, which they approximate through nonlinear optimization techniques (backprop). Every “neuron” in the ANN has dedicated all its representational and computational resources to this function. It makes sense in such an environment that a comprehensible error signal can be generated and used when your model has a singular, crystal clear goal in mind (curve fitting). In my experience, it is never so clear-cut in the brain. As I mentioned before, structures in the brain in general have connections going in many directions and do all kinds of different things simultaneously. Goal-driven learning akin to backprop doesn’t make sense to me at this low of a level, considering the possible breadth of different purposes to which each multi-polar neuron contributes in general. In HTM, each dendritic branch is believed to be an independent pattern detector. HTM neurons (and real-life multi-polar neurons) are associated with a large number of dendritic branches and thus can potentially recognize a large number of different patterns. This stands a chance in the end because of large, sparse pattern encoding. Biological (and HTM) neurons work together, but they do so independently of one another. In contrast, the characteristics (weights) of ANN neurons are dependent on the characteristics of every other neuron in the ANN that comes before them. If you flip a weight in an ANN, then it will impact the function of every neuron it talks to, which will impact the function of the neurons those talk to, cascading forward. The entire ANN has been determined with regard to the single optimization function. To my knowledge, there is no evidence in neuroscience of such a global supervisory influence (perhaps whose purpose is to perform some kind of optimization) on synaptic plasticity, and the existence of one is not consistent with the concept of neurons acting independently.
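As a rough illustration of the "independent pattern detector" idea, here is a minimal sketch of one neuron with several dendritic segments matching sparse binary patterns. It is not Numenta's implementation. The sizes, the threshold, and the helper names (`random_sdr`, `learn_segment`, `matching_segments`) are all assumptions chosen for the example.

```python
import numpy as np

N_INPUTS = 2048      # size of the sparse binary input (assumed)
ACTIVE_BITS = 40     # roughly 2% sparsity (assumed)
SYNAPSES = 20        # synapses sampled per segment (assumed)
THRESHOLD = 12       # active synapses needed for a segment to fire (assumed)

rng = np.random.default_rng(42)

def random_sdr():
    """Return the indices of the active bits of a random sparse pattern."""
    return rng.choice(N_INPUTS, size=ACTIVE_BITS, replace=False)

def learn_segment(pattern):
    """Grow one segment by subsampling synapses from an active pattern."""
    return rng.choice(pattern, size=SYNAPSES, replace=False)

def matching_segments(segments, pattern):
    """List the segments whose overlap with the pattern reaches THRESHOLD."""
    active = np.zeros(N_INPUTS, dtype=bool)
    active[pattern] = True
    return [i for i, seg in enumerate(segments)
            if active[seg].sum() >= THRESHOLD]

# One hypothetical neuron with ten segments, each tuned to its own pattern.
patterns = [random_sdr() for _ in range(10)]
segments = [learn_segment(p) for p in patterns]

hits = [matching_segments(segments, p) for p in patterns]
novel = matching_segments(segments, random_sdr())
print("segments matched per learned pattern:", hits)
print("segments matched by a novel pattern:", novel)
```

Because the patterns are large and sparse, the chance that a random pattern overlaps a segment by 12 or more synapses is negligible. Each segment therefore responds only to its own pattern, and none depends on what the others have learned.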


The video I saw was from 2016, I didn’t know he changed his mind (glad to see a video/paper).

I fully agree: HTM is a biological theory and must follow empirical biological facts. I didn’t claim I had evidence for it, only that Hinton was theorizing about how the brain might do backprop and that he was not satisfied with the standard arguments against that.



Great discussion. Yes, Blake and Hinton are saying the brain does backprop. I, for myself, will say there is no such thing as unsupervised learning. The brain learns to predict. The learning signal, the feedback, is the constant stream of input from the environment, i.e., if I do this, will my arm move? Holy cow, yes! Or, darn, it did not move; try something else.
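A minimal sketch of that idea, using the next input as the teaching signal. It assumes a toy sine-wave "environment" and a two-weight linear predictor trained with a delta rule. This is an illustration only, not a model of cortex, and the name `predict_stream` and the learning rate are hypothetical.

```python
import numpy as np

def predict_stream(stream, lr=0.5):
    """Learn online to predict the next sample from the two previous ones."""
    w = np.zeros(2)
    errors = []
    for t in range(1, len(stream) - 1):
        context = np.array([stream[t], stream[t - 1]])
        prediction = w @ context
        error = stream[t + 1] - prediction   # the next input is the teacher
        w += lr * error * context            # delta-rule update
        errors.append(abs(error))
    return w, np.array(errors)

# Hypothetical sensory stream: a plain sine wave.
stream = np.sin(0.3 * np.arange(5000))
weights, errors = predict_stream(stream)

early_error = errors[:100].mean()
late_error = errors[-100:].mean()
print("mean error, first 100 steps:", early_error)
print("mean error, last 100 steps:", late_error)
```

No labels are supplied anywhere. The only feedback is whether the stream did what the predictor expected.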

I am not clear if we predict up the hierarchy or down the hierarchy. Isn’t prediction part of Jeff’s book On Intelligence?


Blake Richards proposes the apical dendritic arbor of pyramidal neurons as the input for top-down learning signals. He duplexes using events and bursts as the two signals (events = bursts plus spikes). A burst signals the basal dendrites. I think what they do is a form of thermal annealing; that is, the values are randomly changed a bit.

Hinton proposes signal and time derivative of signal as the duplexed signals.

For myself I prefer Blake’s approach.


I have an interview scheduled with Blake in November. I’m certainly going to discuss this with him.

Questions for Neuroscientist Blake Richards

See also Bengio’s take on STDP as gradient descent on a cost function similar to that of a denoising autoencoder. The authors also discuss the biological limitations of backprop and suggest that it can be avoided by propagating “targets” through local training.


You’ll find in “Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex” that top-down signals from other regions of cortex coming in on apical dendrites of pyramidal neurons are theorized to play a similar role to lateral input on basal dendrites, in terms of causing NMDA dendritic spikes that lead to predictive neuron states. So, in that way, top-down feedback is theorized to influence the predicted state for a column’s own input. Each cortical column is constantly predicting its own input, and only its own input, regardless of where exactly that input is coming from. Assuming a common cortical circuit, each cortical column has no way of knowing where its input is coming from.


See also “Dendritic error backpropagation in deep cortical microcircuits”, co-authored by Bengio.




I’m actually coming to accept the cortex could be doing something like this but on a more abstract level.

DeepMind has produced a sort of awesome autoencoder that uses recurrent nets to generate output. It is all unsupervised, like you said, using the inputs themselves as training data.


Seems like the wrong “Marr” in his illustration photograph, sadly.

Nice technology, though ^^


Saw it today also… amazing work.


Yes, amazing work. DM is able to learn strong representations, even of jointed robot arms! It should be able to do a dog with a jointed head, legs, and tail. Now all we need to do is associate words with the representations.


If you check out the “three visual streams” paper, they have a good account of temporally predictive cells based on the brain wave timing principle.

They break the wave into PLUS and MINUS phases where the plus phase is the upper layers forming an opinion about the “ground truth” of sensation and the minus phase (at the end) comparing a prediction in the lower layers to this ground truth.
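A minimal sketch of phase-based learning in that spirit, assuming the minus phase holds the prediction and the plus phase holds the ground truth. It is a toy delta-rule layer on a made-up repeating sequence, not the paper's model. The variable names and the learning rate are hypothetical.

```python
import numpy as np

patterns = np.eye(3)          # three one-hot "sensations": A, B, C (assumed)
W = np.zeros((3, 3))          # prediction weights of a hypothetical layer
lr = 0.5                      # learning rate (assumed)

# The toy world cycles A -> B -> C -> A -> ...
for step in range(60):
    current = patterns[step % 3]
    outcome = patterns[(step + 1) % 3]
    minus = W @ current                        # minus phase: the prediction
    plus = outcome                             # plus phase: what actually arrived
    W += lr * np.outer(plus - minus, current)  # learn from the phase difference

predicted_after_A = W @ patterns[0]
print("prediction after A:", np.round(predicted_after_A, 3))
```

The update uses only the difference between the two phases at the same units, so the error signal is local in space and separated in time rather than carried by a separate backward pass.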

The plumbing involves a pass through part of the pulvinar but that does not materially affect the basic mechanism of using timing of the wave to do temporal prediction.

They do take the outcome of this test and fire it back to the pulvinar to be distributed to other maps that are processing this same stream.

Why am I going on about this?

This particular pulvinar-based predictive mechanism is part of the only plausible scheme I have seen that accomplishes the long-sought goal of biologically plausible backprop behavior. If you are interested in this topic, you owe it to yourself to do the hard work of reading the paper and its references. Some very good stuff going on there.