HTM superior to NN?

I’ve always thought HTM was superior to NN in at least one major way and I’d like to know if the community feels the same way or if I’m not thinking about it correctly.

HTM does not need to do any backpropagation because it gets updated as the information flows through the structure the first time.

To me, this seems like a massive feature - that the structure essentially achieves the same ability to update itself, but can do it in real time rather than needing data to flow all the way to the end of the structure then get bounced back all the way to the front.

Not only does this seem like a good feature of the algorithm, it also feels like it should be entirely self-evident. Am I thinking about this correctly?

Furthermore, the larger-scale HTM structure, the hierarchy of regions, does seem to have this ping-ponging of information from the bottom of the hierarchy to the top and back down. But this is, as I understand it, done primarily to produce appropriate motor behavior, rather than simply to adjust the weights of connections between regions. I bring this up to make the point that there clearly is a place for this reverberation of information throughout the entire structure, but that it can be used for so much more than what NNs use it for today.

Am I over-simplifying everything or are these principles in play in about the same way I’ve articulated them?

That’s one way to put it. I’d put it slightly differently (and I think I’ve heard Jeff say a variation on this).

Traditional neural networks are about function approximation, but HTM is about memory.

In function approximation, you want to accumulate many pieces of evidence to find a set of parameters that will generalize to new parts of the state space that you haven’t seen.

In memory systems, you want to memorize the salient details of every piece of evidence, and recall those details that are relevant in new situations.
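To make the contrast concrete, here is a minimal toy sketch (not HTM and not any particular deep learning library). The data, the one-parameter model `y = w * x`, and the `EpisodicMemory` class are all made up for illustration. The parametric learner accumulates many small updates into one number that generalizes to unseen inputs; the memory stores each example once and recalls the closest one.

```python
# Hypothetical toy data: noisy samples of y = 2x.
samples = [(0.0, 0.1), (1.0, 2.1), (2.0, 3.9), (3.0, 6.0)]


def fit_slope(data, lr=0.01, epochs=500):
    """Function approximation: many small gradient steps on one parameter w."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            # Gradient of the squared error (w*x - y)^2 with respect to w.
            w -= lr * 2 * (w * x - y) * x
    return w


class EpisodicMemory:
    """Memory: store every example once, recall the nearest stored one."""

    def __init__(self):
        self.items = []

    def store(self, x, y):
        self.items.append((x, y))

    def recall(self, x):
        # Return the stored y whose x is closest to the query.
        return min(self.items, key=lambda item: abs(item[0] - x))[1]


w = fit_slope(samples)

memory = EpisodicMemory()
for x, y in samples:
    memory.store(x, y)  # one-shot: a single pass, no iteration

print("parametric guess at x=10:", w * 10.0)         # extrapolates the learned slope
print("memory recall at x=10:", memory.recall(10.0))  # nearest stored episode (x=3)
```

The parametric model needs many passes but gives a sensible answer far outside the data; the memory learns in one shot but can only return what it has already seen.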

One is not superior to the other in all cases. I consider them complementary. In the brain, there is evidence that we have similar slow parametric systems (maybe areas like neocortex and cerebellum) and fast memorizing systems (hippocampal regions).

Machine reinforcement learning of the sort that DeepMind and OpenAI do is usually based on slow parametric systems (traditional deep neural networks trained with backprop). But some recent work on episodic control has shown that you can learn much more quickly if you introduce a fast memory-based system to learn new experiences in a one-shot manner.

So I’m not sure whether backprop is an advantage or a disadvantage. I just consider it an implementation detail that allows you to make slow parametric updates. It’s worth noting that the spatial pooler of HTM also makes relatively slow parametric updates, but, as is usual for Hebbian learning, it generally performs worse than it would if it were trained with backpropagation on batches of uncorrelated samples. Depending on your dogma, you may or may not accept that a backprop-like process can happen in the brain.

Either way, I think both types of systems are useful, and a combination of both may ultimately be necessary.


Hi Jake,
Can I ask whether the feedback process in HTM to the soma (I think it is via the basal dendrite) is a form of backprop?

By the way, I could not tell from Jeff’s papers whether the feedback is carried by an axon. What transports the signal?

Rgds Finn

Hey Finn,

I’d be careful to distinguish backpropagation in the mathematics/machine learning sense from backpropagation in the biological sense. In machine learning, what we mean by backpropagation is the calculation of the partial derivatives of the global error signal with respect to the weights at each layer (for the purpose of doing gradient descent learning). In biology it refers to the transfer of electrical potential or chemical transmitters from one part of the cell to another (typically soma back to synapse).
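As a rough illustration of the machine learning sense, here is a sketch of backprop on a made-up two-weight network (`y = w2 * tanh(w1 * x)` with a squared-error loss). The network, the values, and the function names are assumptions chosen only to keep the example tiny. The point is that the gradient for the first weight can only be computed after the error at the output has been passed back through the second weight.

```python
import math


def forward(w1, w2, x):
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # network output
    return h, y


def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2


def backprop(w1, w2, x, target):
    """Chain rule from the global error back to each weight."""
    h, y = forward(w1, w2, x)
    dL_dy = y - target                # error at the output
    dL_dw2 = dL_dy * h                # gradient for the last weight
    dL_dh = dL_dy * w2                # error passed back to the hidden unit
    dL_dw1 = dL_dh * (1 - h * h) * x  # tanh'(z) = 1 - tanh(z)^2
    return dL_dw1, dL_dw2


# Hypothetical values, chosen arbitrarily.
w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
g1, g2 = backprop(w1, w2, x, target)

# Sanity check against central finite differences.
eps = 1e-6
n1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
n2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)

print("analytic:", g1, g2)
print("numeric: ", n1, n2)
```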

HTM doesn’t explicitly model the transmission of learning signals across a cell. But the idea is the standard one in computational neuroscience: input that causes a cell to fire results in a strengthening of the synapse that conveyed the input. This is generally assumed to be through some kind of backpropagation of cellular messengers or electrical activity, but is not explicitly modeled or specified.

Could you elaborate on it a bit? It’s hard to imagine something like backprop in the brain, and if there are any alternative ideas how to make an efficient parametric system, it would be very interesting to have some clues.

There are people who argue for the plausibility of backprop-like processes in the brain. Recently a couple of papers [1][2] formulated some reasonable ideas in this direction.

But you don’t need backprop; you can get slow parametric updates using any kind of Hebbian learning as well. Whether it can adjust weights across entire hierarchies is unknown, but synaptic processes often require multiple presentations of a pattern in order to achieve increasing levels of permanence, and this usually means physically larger synapses as well. Whether that also means varying connection strength is an ongoing debate. I know the thinking behind HTM assumes that the strength is binary, but this is not yet the consensus.
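Here is a minimal sketch of that idea: a scalar permanence that creeps up with repeated presentations, and a binary connection that is read off by thresholding it. The constants and function names are hypothetical, picked for illustration; they are not NuPIC defaults or its API.

```python
# Hypothetical parameters, not taken from any HTM implementation.
CONNECTED_THRESHOLD = 0.45
PERMANENCE_INC = 0.1
PERMANENCE_DEC = 0.02


def hebbian_update(permanences, active_inputs, cell_fired):
    """Local rule: if the cell fired, reinforce synapses whose input was
    active and weaken the rest. Only information local to the cell is used."""
    if not cell_fired:
        return list(permanences)
    updated = []
    for p, active in zip(permanences, active_inputs):
        p = p + PERMANENCE_INC if active else p - PERMANENCE_DEC
        updated.append(min(1.0, max(0.0, p)))  # keep permanence in [0, 1]
    return updated


def connected(permanences):
    """Binary synapse strength: connected or not, nothing in between."""
    return [p >= CONNECTED_THRESHOLD for p in permanences]


permanences = [0.2, 0.2, 0.2]
pattern = [True, False, True]  # which inputs are active in the pattern

presentations = 0
while not connected(permanences)[0]:
    permanences = hebbian_update(permanences, pattern, cell_fired=True)
    presentations += 1

print("presentations needed:", presentations)
print("connected:", connected(permanences))
```

With these made-up numbers a single presentation is not enough to connect a synapse, which is the sense in which even a purely local rule behaves like a slow parametric update.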

Anyway, I didn’t mean to strongly imply backprop as a learning rule in the brain, but just that even Hebbian learning processes are thought to be slow and parametric, e.g. the original STDP paper.

See a post I made a while ago (Complementary Learning Systems theory and HTM as a theory of the hippocampus) for details about the parallel parametric/episodic systems, largely based on [3].

[1] Lillicrap TP, Cownden D, Tweed DB, Akerman CJ. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications. 2016;7:13276. doi:10.1038/ncomms13276.

[2] Huh, Dongsung, and Terrence J. Sejnowski. “Gradient Descent for Spiking Neural Networks.” arXiv preprint arXiv:1706.04698 (2017).

[3] Kumaran, Dharshan, Demis Hassabis, and James L. McClelland. “What learning systems do intelligent agents need? Complementary learning systems theory updated.” Trends in cognitive sciences 20.7 (2016): 512-534.


Now I see - you meant some indirect and slow processes, like hippocampal involvement in transforming short-term memories into long-term ones, or synaptic pruning during sleep. I just don’t think such operations can be put in the same category as backprop.

Anyway, thanks for the links and thoughts!

The other huge advantage of HTM is that every one of us carries the evidence that HTM will develop into conscious, intelligent systems eventually. Maybe NN will too, but we know for sure that there is a path from here to there with HTM.

I would say there are only two differences between HTM and NN:

  1. the learning rule used
  2. static synapse topology for NN versus growable and killable synapses for HTM

Both assume the building blocks are neurons. Both use a single neuron type. Both will need to move to a richer palette of neuron types as we move along.

HTM uses a structure with mini-column-like features. NNs just arrange neurons in arrays, with connectivity that is not biologically motivated.

I am looking forward to more focus on the hierarchy in HTM work and in NN using capsules.

I think more in terms of pieces and parts: for example a dog, with a tail at one end of the trunk, four legs below the trunk, one head at the other end of the trunk, and so on. I want a vision processing network, NN or otherwise, when shown a three-legged dog, to say “that is a dog, and by the way it is missing one leg” with high confidence, rather than the current situation where it says “dog” with low confidence. I’m not sure how HTM deals with vision. HTM seems to be working at a higher level of abstraction currently.

Not just the learning rule. HTM requires modeling active dendrites, or dendritic spikes. That is essential.

Yes, I should have included that difference. I also hope both approaches will someday use the apical dendrites for layer N+1 to layer N signaling for local learning rules. That requires including neuron types that have apical dendrites.

I believe local learning between layers N and N+1, back and forth, and stacked to give full end-to-end propagation, will be better than simple end-to-end backpropagation: faster, with a higher learning rate, and more flexible.
