Blake Richards Ensembles of Neocortical Microcircuits

Blake Richards: Deep Learning with Ensembles of Neocortical Microcircuits ICLR 2018

How does this compare with the learning rules used in HTM?


The broader approach of neurology and machine learning working hand in hand, each benefiting from the other, is something which ought to resonate here, I believe.

His differentiation between basal and apical dendrites leans towards the HTM neuron.
His meso scale, with its different instances, is reminiscent of the TM “minicolumn”, which has different possible bursting cells in different contexts.

Now the DL stuff is well over my head, and so are learning equations, but results seem nice.


Thanks gmirey, that makes sense: minicolumns == ensembles.

Residual Networks Behave Like Ensembles of Relatively Shallow Networks, NIPS 2016 Spotlight:

What surprised me was the finding he talks about at 24:00. It says that the ratio of bursts to all events is proportional to the amount of apical depolarization, which, to me, doesn’t make sense in the context of HTM, where I am used to thinking about bursts as an indication of prediction error, and about apical depolarization as a prediction of imminent activation, in much the same way as distal depolarization. Blake, on the other hand, suggests that this is used to distinguish bottom-up from top-down signals (the latter used for credit assignment, not as a noise-reducing context bias as in HTM), so that they can be processed in parallel.

As for the learning rules, in the talk they base their model on difference target propagation, with some convex nudges in the output layer and inhibitory SST cells, and compare it with backprop. I am struggling to see where HTM does anything remotely close to changing permanences so as to minimize any kind of loss function (comparing to local or any other targets).

But overall a great talk, full of interesting ideas, some of which may be used in a future iteration of HTM.
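For anyone unfamiliar with difference target propagation, here is a minimal sketch of its target rule only. It is the generic rule (hidden target = h + g(y_target) - g(y)), not the model from the talk, and the toy functions `f` and `g` are hypothetical choices made up for illustration.

```python
# Sketch of the difference target propagation (DTP) target rule.
# f: forward mapping hidden -> output.
# g: learned approximate inverse, output -> hidden.
# Both are toy, hypothetical functions chosen so the arithmetic is easy to follow.

def f(h):
    """Toy forward layer."""
    return 2.0 * h + 1.0


def g(y):
    """Deliberately imperfect inverse of f (the exact inverse is (y - 1) / 2)."""
    return y / 2.0


def dtp_hidden_target(h, y, y_target):
    """DTP target for the hidden layer.

    The '- g(y)' term cancels the error of the imperfect inverse around
    the current activity, which is the 'difference' correction.
    """
    return h + g(y_target) - g(y)


h = 3.0                    # current hidden activity
y = f(h)                   # current output
y_target = 5.0             # nudged output target
h_target = dtp_hidden_target(h, y, y_target)
naive_target = g(y_target)  # plain target propagation, no correction

print(h_target, f(h_target))          # corrected target and the output it produces
print(naive_target, f(naive_target))  # uncorrected target and the output it produces
```

In this toy case the corrected hidden target maps exactly onto the output target even though `g` is wrong, while the uncorrected one does not. Each layer then only needs a local loss between its activity and its own target, which is the part that has no obvious counterpart in HTM's permanence updates.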


What is called minicolumn bursting in HTM really has to be distinguished from the bursting of a cell discussed here. A cell burst means that the cell spikes at a very high frequency, and it is also assumed to happen in HTM, right when a cell that predicted something gets activated. This cell bursting in HTM prevents the minicolumnar “bursting” (the latter meaning that all cells of a minicolumn are firing, which is indeed an unfortunate name imho).

As for the other differences between the two models, I don’t find them surprising. What’s interesting is that they try to take into account the same clues from neurological research, which are not yet fully understood anyway.
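To make the HTM meaning of minicolumn "bursting" concrete, here is a rough sketch of the cell activation step as I understand it from the TM description. The function name, the `(column, index)` cell representation and the default of 4 cells per column are my own assumptions, not the NuPIC API.

```python
def activate_cells(active_columns, predictive_cells, cells_per_column=4):
    """Return (active_cells, bursting_columns) for one time step.

    Cells are (column, index) pairs. This is a hypothetical helper for
    illustration, not a function from any HTM library.
    """
    active_cells = set()
    bursting_columns = set()
    for col in sorted(active_columns):
        # Cells in this column that were predicted at the previous step.
        predicted = {cell for cell in predictive_cells if cell[0] == col}
        if predicted:
            # Correct prediction: only the predicted cells become active,
            # which suppresses the rest of the minicolumn.
            active_cells |= predicted
        else:
            # No prediction: every cell in the minicolumn fires ("bursts").
            bursting_columns.add(col)
            active_cells |= {(col, i) for i in range(cells_per_column)}
    return active_cells, bursting_columns


# Column 0 was predicted (cell 1), column 2 was not; the prediction in
# column 5 is ignored because that column did not become active.
active, bursting = activate_cells({0, 2}, {(0, 1), (5, 3)})
print(sorted(active))
print(sorted(bursting))
```

So a minicolumn "burst" here is the absence of any predicted cell in an active column, which is a different thing from a single cell firing a high-frequency burst.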


I see, so the same name for almost the exact opposite - nice.

Currently, I believe the correctly predicted cells in TM are used in union / column / temporal pooling (learning, to be specific). So these ideas might inform / inspire further developing the TP (L2) layer.

I vote for renaming it from “burst” to Shakkahou :stuck_out_tongue:


This is ignoring the fact that L2/3 cells don’t have plateau potentials [1].

[1] M. E. Larkum, J. Waters, B. Sakmann, and F. Helmchen, “Dendritic spikes in apical dendrites of neocortical layer 2/3 pyramidal neurons,” J. Neurosci., vol. 27, no. 34, pp. 8999–9008, Aug. 2007.

What I picked up on was the use of “feedback alignment” as the mechanism for deep learning, though he declined to elaborate because they don’t really know how it works. Sometimes it is reported to work well for deep learning, sometimes not.
I would suggest it is a form of unsupervised feature learning, despite being used in a supervised setting.
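For reference, a minimal sketch of what a single feedback alignment step looks like next to backprop, for a two-layer linear network. All names and the tiny hand-picked matrices are assumptions for illustration; in practice the feedback matrix `B` would be drawn randomly once and then held fixed.

```python
def matvec(M, v):
    """Matrix-vector product on plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def outer(u, v):
    """Outer product u v^T on plain lists."""
    return [[a * b for b in v] for a in u]


def transpose(M):
    return [list(col) for col in zip(*M)]


def feedback_alignment_updates(W1, W2, B, x, target):
    """Gradient-style updates for y = W2 (W1 x) with squared error.

    The only difference from backprop is the hidden error signal:
    backprop uses W2^T e, feedback alignment uses the fixed matrix B.
    """
    h = matvec(W1, x)
    y = matvec(W2, h)
    e = [yi - ti for yi, ti in zip(y, target)]  # output error
    dW2 = outer(e, h)                           # same as backprop
    delta_h_fa = matvec(B, e)                   # feedback alignment signal
    delta_h_bp = matvec(transpose(W2), e)       # backprop signal, for comparison
    dW1 = outer(delta_h_fa, x)
    return dW1, dW2, delta_h_fa, delta_h_bp


W1 = [[1.0, 0.0], [0.0, 1.0]]  # hidden x input
W2 = [[1.0, 2.0]]              # output x hidden
B = [[1.0], [-1.0]]            # hidden x output, fixed, unrelated to W2
dW1, dW2, d_fa, d_bp = feedback_alignment_updates(W1, W2, B, [1.0, 2.0], [3.0])
print(d_fa, d_bp)
```

The two hidden error signals differ, so the hidden layer is not following the true gradient at first. The empirical claim is that the forward weights tend to drift into rough alignment with `B` during training, which is the part that is reported to work in some settings and not in others.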

Also, this money-making concern seems to be using unsupervised feature learning in a compounding fashion, where further features are learned from already learned features:

Unfortunately, the company name Neurala is very similar to the name of a famous product in Europe:


The big question is who “provides” the feedback. All those DL-centric things are assuming unbounded energy, perfect reliability and perfect precision. IMHO that is only another epicycle…


It seems Neurala has found a way to fuse associative memory with deep neural networks, rather than what I said. They are claiming very short training times. The critical question is if the nets can retain the capacity to generalize well.

So, what are the main differences between this and HTM?


Could any of you, @jhawkins, @subutai or @rhyolight, address this question?

I think that in HTM it is fair to say that “surprise” forms the training signal. If the thing being perceived is not recognized, this is signaled by bursting. You could say that bursting is a signal that “I did not know what it is” that I am sensing, and that it triggers learning if we have learning turned on. If we don’t, then it is just an anomaly compared to my prior training.

This surprise is perfectly aligned with the sensed signal as whatever HTM assembly is processing either recognizes it or is surprised and then proceeds to learn “it” on whatever terms it is using to sense “it.”

Contrast that with some form of “feedback alignment” where you have to align something; with HTM it is automatic. This alignment between error and memory store is one of the thorny problems in deep learning.
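As a rough illustration of "surprise as the signal", the raw anomaly score in HTM can be sketched as the fraction of active columns that were not predicted, i.e. the fraction that burst. The function name is hypothetical, and returning 0.0 when nothing is active is a convention assumed here.

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were not predicted (these burst).

    0.0 means everything was expected, 1.0 means complete surprise.
    Hypothetical helper for illustration; the empty-input value is an
    assumed convention.
    """
    active = set(active_columns)
    if not active:
        return 0.0
    unpredicted = active - set(predicted_columns)
    return len(unpredicted) / len(active)


# Columns 1 and 2 were predicted, 3 and 4 were not; the prediction for
# column 9 does not matter because that column did not become active.
print(raw_anomaly_score({1, 2, 3, 4}, {1, 2, 9}))
```

The same quantity that says "this is an anomaly" also marks exactly which columns should learn, which is why no separate alignment step between error and memory is needed.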


I agree 100%. I think that not only in HTM but in general, bursting is the Gordian knot.