Physical models typically assume time-independent interactions, whereas neural networks and machine learning incorporate interactions that function as adjustable parameters. Here we demonstrate a new type of abundant cooperative nonlinear dynamics where learning is attributed solely to the nodes, instead of the network links, whose number is significantly larger. The nodal, neuronal, fast adaptation follows its relative anisotropic (dendritic) input timings, as indicated experimentally, similarly to the slow learning mechanism currently attributed to the links, the synapses. It represents a non-local learning rule, where effectively many incoming links to a node concurrently undergo the same adaptation. The network dynamics are now counterintuitively governed by the weak links, which were previously assumed to be insignificant. This cooperative nonlinear dynamic adaptation presents a self-controlled mechanism to prevent divergence or vanishing of the learning parameters, as opposed to learning by links, and also supports self-oscillations of the effective learning parameters. It hints at a hierarchical computational complexity of nodes, following their number of anisotropic inputs, and opens new horizons for advanced deep learning algorithms and artificial-intelligence-based applications, as well as a new mechanism for enhanced and fast learning by neural networks.
Is HTM being lumped into the broad “neural networks and machine learning” category by implication here?
The T of HTM is all about temporal processing. Phase-to-phase signaling does temporal processing in HTM networks. There is reciprocal signaling via the central body & dendrite to force learning in the distant dendrite - a local model driven by neighborhood signaling, both temporal and spatial.
The “three visual streams” model does temporal learning via the plus/minus phase reinforcement in spike timing.
In either case, excluding these (and other temporal models such as RNNs) sets up a strawman. I will be happy to base judgment on what the model can do on its own - but it does not stand alone; there are significant existing examples to compare it with.
I’m interested to see whether those two papers could be combined to some extent, or whether some elements could be integrated into Numenta’s HTM.
The paper “Adaptive nodes enrich nonlinear cooperative learning beyond traditional adaptation by links” demonstrates (in-vitro) that dendrites learn. This is a departure from the usual models where the synapses actively exhibit Hebbian learning and the dendrites are passive devices which gather synaptic input in a static manner. The paper finds that dendrites exhibit Hebbian learning which persists for at least a few minutes.
Further work is needed to fully characterize the learning effects in the dendrites.
I can see a few possible benefits of dendritic learning as it’s described:
It could ‘prime’ dendrites which are used often. This would be useful if, after a dendrite has successfully detected something once, it is likely to see that same thing again in the near future.
It could block off dendrites which are making false-positive predictions (depolarized segments in inactive neurons). Numenta’s HTM prunes away these segments, but dendritic learning could suppress these segments without removing them.
Another explanation for the dendrite learning which this paper found is that it ‘tunes’ the dendrites such that the dendrite’s predictions arrive at the soma at precisely the same time as the neuron initiates an AP.
The paper found that a dendrite depolarization which comes before a somatic AP by 2-5 ms causes a weaker dendrite response in the future. A depolarization which follows an AP by 2-5 ms causes a stronger dendrite response in the future.
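The timing rule as described reduces to a small update function. Here is a minimal sketch; the function name, the multiplicative update, and the `rate` value are my own illustrative assumptions, not something specified in the paper - only the 2-5 ms windows and the weaken/strengthen directions come from the finding above.

```python
def adapt_dendrite(strength, dt_ms, rate=0.1):
    """Update a dendrite's response strength after one pairing event.

    dt_ms = (time of dendritic depolarization) - (time of somatic AP).
    Depolarization 2-5 ms BEFORE the AP (dt_ms in [-5, -2]) -> weaker response.
    Depolarization 2-5 ms AFTER the AP (dt_ms in [2, 5])    -> stronger response.
    The rate and the multiplicative form are illustrative assumptions.
    """
    if -5.0 <= dt_ms <= -2.0:
        return strength * (1.0 - rate)  # depress: dendrite led the AP
    if 2.0 <= dt_ms <= 5.0:
        return strength * (1.0 + rate)  # potentiate: dendrite lagged the AP
    return strength                     # outside the window: no change

# Example pairings:
print(adapt_dendrite(1.0, -3.0))  # early depolarization -> 0.9 (weaker)
print(adapt_dendrite(1.0, 3.0))   # late depolarization  -> 1.1 (stronger)
print(adapt_dendrite(1.0, 0.0))   # outside the window   -> 1.0 (unchanged)
```

Note the sign convention is the opposite of classic synaptic STDP, where a presynaptic spike shortly *before* the postsynaptic spike strengthens the connection.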
My hypothesis is that the dendrite is changing its internal properties so that, in the future, ‘late’ depolarizations will happen sooner and ‘early’ depolarizations will happen later.
A benefit of this hypothesis is that it allows neurons to be more selective about the relative timings of their inputs, on a millisecond time scale as opposed to a 100 ms time scale. This idea is referred to in the literature as “synfire chains”.
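The tuning hypothesis above can be sketched as a delay that drifts toward coincidence with the somatic AP. Everything here is an illustrative assumption (the function name, linear update, and `rate`); it just shows that the stated rule - early arrivals get later, late arrivals get earlier - converges so the depolarization lands on the AP.

```python
def tune_delay(delay_ms, dt_ms, rate=0.5):
    """Nudge a dendrite's conduction delay after one pairing event.

    dt_ms = (arrival time of depolarization) - (somatic AP time).
    Early arrivals (dt_ms < 0) lengthen the delay; late arrivals
    (dt_ms > 0) shorten it. Linear update and rate are assumptions.
    """
    return max(0.0, delay_ms - rate * dt_ms)

# Repeated pairings pull the arrival time onto the somatic AP:
t_input, t_ap = 0.0, 10.0   # input fires at 0 ms, soma spikes at 10 ms
delay = 4.0                 # dendrite initially arrives 6 ms early
for _ in range(20):
    dt = (t_input + delay) - t_ap
    delay = tune_delay(delay, dt)
print(round(delay, 3))      # converges to ~10.0, i.e. coincident arrival
```

Once delays converge like this, a neuron only fires strongly when its inputs arrive within a millisecond-scale window, which is the selectivity the synfire-chain literature describes.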