Blake Richards proposes the apical dendritic arbor of pyramidal neurons as the input for top-down learning signals. He duplexes using events and bursts as the two signals (events = bursts plus single spikes). A burst signals the basal dendrites. I think what they do is a form of thermal annealing, that is, the values are randomly changed a bit.
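To make the events/bursts split concrete, here is a rough sketch of how a spike train could be read out as two signals. The function name and the 16 ms grouping window are my own assumptions for illustration, not values taken from the paper.

```python
def split_events_and_bursts(spike_times_ms, burst_isi_ms=16.0):
    """Group spikes into events; an event with 2+ spikes counts as a burst.

    burst_isi_ms is an assumed grouping window, chosen only for illustration.
    """
    events = []
    for t in sorted(spike_times_ms):
        # A spike close enough to the previous one joins the same event.
        if events and t - events[-1][-1] <= burst_isi_ms:
            events[-1].append(t)
        else:
            events.append([t])
    n_events = len(events)
    n_bursts = sum(1 for e in events if len(e) > 1)
    # Fraction of events that are bursts: the second, duplexed signal.
    burst_fraction = n_bursts / n_events if n_events else 0.0
    return n_events, n_bursts, burst_fraction


spikes = [10.0, 14.0, 18.0, 100.0, 200.0, 205.0, 400.0]
n_events, n_bursts, burst_fraction = split_events_and_bursts(spikes)
```

The event count and the burst fraction can vary independently, which is what lets one spike train carry two signals.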
Hinton proposes signal and time derivative of signal as the duplexed signals.
See also Bengio’s take: https://arxiv.org/abs/1502.04156. It treats STDP as gradient descent on a cost function similar to that of a denoising autoencoder. The authors also discuss the biological limitations of backprop and suggest that it can be avoided by propagating “targets” through local training.
You’ll find in “Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex” that top-down signals from other regions of cortex, coming in on the apical dendrites of pyramidal neurons, are theorized to play a role similar to that of lateral input on basal dendrites: they cause NMDA dendritic spikes that put neurons into a predictive state. So, in that way, top-down feedback is theorized to influence the predicted state for a column’s own input. Each cortical column is constantly predicting its own input, and only its own input, regardless of where exactly that input is coming from. Assuming a common cortical circuit, each cortical column has no way of knowing where its input is coming from.
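As a minimal sketch of that idea: a cell goes predictive when any one of its dendritic segments sees enough active inputs, and the cell has no idea whether those inputs are lateral or top-down. The data layout, the function name, and the threshold of 2 are assumptions made up for this example.

```python
def predictive_cells(segments_by_cell, active_inputs, threshold=2):
    """Return the cells with at least one segment over threshold.

    segments_by_cell maps a cell id to a list of segments; each segment is a
    list of input ids it has synapses to. Basal (lateral) and apical
    (top-down) segments are treated identically here, which is an assumption.
    """
    active = set(active_inputs)
    return {
        cell
        for cell, segments in segments_by_cell.items()
        if any(len(active & set(seg)) >= threshold for seg in segments)
    }


cells = {
    "a": [[1, 2, 3]],          # e.g. a basal segment
    "b": [[4, 5], [6, 7]],     # e.g. one basal and one apical segment
    "c": [[8, 9]],
}
predicted = predictive_cells(cells, active_inputs=[1, 2, 6, 7, 8])
```

Here cells "a" and "b" become predictive; "c" has only one active input on its segment and stays below threshold.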
Yes, amazing work. DM is able to learn strong representations! Even jointed robot arms. It should be able to do a dog with a jointed head, legs, and tail. Now all we need to do is associate words with the representations.
If you check out the “three visual streams” paper, they have a good model of temporally predictive cells based on the brain wave timing principle.
They break the wave into PLUS and MINUS phases where the plus phase is the upper layers forming an opinion about the “ground truth” of sensation and the minus phase (at the end) comparing a prediction in the lower layers to this ground truth.
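Here is a toy version of that two-phase comparison, assuming the simplest possible rule: each weight moves in proportion to its input times the difference between the plus-phase (ground truth) and minus-phase (prediction) activity. This is a generic delta-style sketch, not the paper's actual learning rule; all names and the learning rate are made up for illustration.

```python
def predict(weights, pre):
    """Minus phase: linear prediction from the presynaptic activity."""
    return [sum(w * x for w, x in zip(row, pre)) for row in weights]


def phase_update(weights, pre, post_minus, post_plus, lr=0.1):
    """Return new weights nudged by the plus/minus phase difference.

    weights[i][j] connects presynaptic unit j to postsynaptic unit i.
    """
    return [
        [w + lr * (plus - minus) * x for w, x in zip(row, pre)]
        for row, minus, plus in zip(weights, post_minus, post_plus)
    ]


weights = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.5]
target = [1.0, -1.0]  # plus-phase "ground truth" (illustrative values)
for _ in range(50):
    prediction = predict(weights, pre)  # minus-phase prediction
    weights = phase_update(weights, pre, prediction, target)
final_prediction = predict(weights, pre)
```

The update uses only quantities available at the synapse in the two phases, so no separate backward pass is needed in this toy case.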
The plumbing involves a pass through part of the pulvinar but that does not materially affect the basic mechanism of using timing of the wave to do temporal prediction.
They do take the outcome of this test and fire it back to the pulvinar to be distributed to other maps that are processing this same stream.
Why am I going on about this?
This particular pulvinar based/predictive mechanism is part of the only plausible scheme that I have seen that accomplishes the long sought goal of a biologically plausible back-prop behavior. If you are interested in this topic you owe it to yourself to do the hard work of reading the paper and references. Some very good stuff going on there.