Why is HTM Ignored by Google DeepMind?

As the number of learned tasks goes to infinity, the number of important weights (and therefore the sparsity of gradient updates) approaches 100%. I stand by my claim.

The immediate advantage seems clear. If I memorize a new transition by creating a distal segment in an HTM network, then I haven’t forgotten any previously learned transitions. This is not commonly done in deep learning, although there are examples of similar things ([1606.04460] Model-Free Episodic Control, [1703.01988] Neural Episodic Control).

I don’t have any mature ideas on specific research to do in this direction to imbue deep nets with the sparsity advantages of HTM, but perhaps taking an approach similar to Hinton’s Capsules in which there is a winner-take-all routing process that decides which subnetworks to train for each example and leaves the others alone. That sort of thing should mitigate catastrophic forgetting, encourage modular representations and specialization, and could facilitate generalization across these learned modules if combined with techniques for pruning and consolidation.

1 Like