Why is HTM Ignored by Google DeepMind?

I’m not a trained data scientist, and you have given me some specific areas in which I can do my own research, thank you! :slight_smile:

I’m not sure what “experience replay” is in any depth, but a cursory survey of the topic doesn’t quite imply the same autonomy as biological network plasticity? [1511.05952] Prioritized Experience Replay

I’ve heard of supervised learning, but my take on it is that it is very much dependent on situational assumptions within specific boundaries? And that “online” as meant by classical learners is a bit different?

I haven’t made any claims that haven’t been made by Jeff or representatives from Numenta (that’s where I got them from :slight_smile:)? If my statements are exaggerated to mean perfect performance when I say HTMs can learn a different problem paradigm just by being presented with the data, then yes, but of course I never implied that the technology was complete and perfect in its performance?

If this were true, then we would be a lot further along in the 70 years we’ve been at this, and there would have been no AI winter? Of course, there is the possibility that we can “invent” intelligence, but the evidence so far suggests that development is very slow, if not stuttering, in its expansion.

I’m not an anti-DL person, but I definitely think that HTM technology, not DL, is the underdog in terms of public sentiment. Personally I have simply been repeating what I have heard are the shortcomings of DL (and only to point out the advantages of HTM research). I definitely think that the academic environment acts like weeds, killing off anything that opposes it, and that DL hasn’t received any bullying from HTM advocates; it is predominantly the other way around.

I wish that I had investigated DL and classic NNs before I had heard about HTMs, because it is difficult to devote any serious time to a curriculum that to me represents the past; it’s just hard to generate the necessary enthusiasm to learn technologies that don’t, in my mind, represent the optimum or ideal. I have been within a hair’s breadth of signing up for some courses, and maybe I will, so that I can get up to speed on the current state; we’ll see.

Again, I phrase my statements in question form (if you look at the original sentences you quoted) because I acknowledge that I am not a data scientist and want to learn with more granularity what the differences actually are. I did take a quick cursory glance at the networks you mentioned, and there are differences that imply more manual intervention (I think) than what is talked about in an HTM context?

Anyway we’re all here because we are advocates of the potential of HTMs, true.

I’m not sure what you mean, but experience replay works as follows. First a bit of background.

In both the brain, and in deep networks, learning is not immediate. HTM abstracts the forming and strengthening of synapses into a single step process, but it’s actually a multiple-step chain of biochemical interactions that requires many repetitions of the pattern you want to learn. So each time a neuron sees a pattern, its connections are only updated by a small amount, and many repetitions are required to fully solidify the pattern detector. Deep networks are trained in an analogous way, where each presentation of a pattern only updates the synapses a small amount.

So to actually learn anything, you need to replay your experiences many times. In the brain, this is believed to be done in the hippocampus, which is capable of very fast one-shot learning, but does not store patterns for long timescales. As you sleep, and at other relevant times, the hippocampus replays your experiences and gradually consolidates those patterns more permanently in parts of the brain such as the neocortex.

In deep networks, this is done by keeping a buffer of past experiences during training time, and randomly sampling from this buffer for each training update. This is done for two reasons. First, learning is achieved through small changes, so you need to see each experience multiple times to sufficiently adjust your connections. Second, in order to avoid biasing your network toward only recent experiences and forgetting the past (“catastrophic forgetting”), you need to sample evenly across your past experiences.

That’s experience replay. It’s not “part” of the network and it’s not used at evaluation time, it’s just a training-time mechanism for evenly sampling the possible things you can learn.
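For concreteness, here is a minimal replay-buffer sketch in Python (the class name, capacity, and batch size are illustrative choices, not from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity=100_000):
        # Oldest experiences fall off the end once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling spreads each training update evenly across
        # past experience instead of only the most recent transitions.
        return random.sample(list(self.buffer), batch_size)
```

During training, each update draws a batch from `sample()`; the buffer itself is thrown away at evaluation time.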

Do you mean unsupervised learning? In supervised learning, you are mapping inputs to known outputs, like classifying digits. In unsupervised learning, you are discovering structure in the input data, and encoding it with useful representations, like an SDR encoder or the spatial pooler. And online learning just means you’re processing an ongoing input stream of indefinite length, and updating your model as you go.
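A tiny sketch of what “online” means in this sense (the running-mean “model” is just an assumed toy, chosen to keep the example self-contained):

```python
def online_mean(stream, alpha=0.01):
    """Toy online learner: the model (a running mean) is updated after
    every input, with no separate training phase and no fixed dataset."""
    estimate = 0.0
    for x in stream:
        yield estimate                      # predict before seeing the value
        estimate += alpha * (x - estimate)  # then update the model and move on
```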

Machine learning is capable of generalizing remarkably well, but it always depends on the training data you give it. There’s no magic algorithm that can learn things it wasn’t shown. The same is true of HTM, and the brain.

I strongly disagree with this. Deep learning as we see it today was mostly invented around 1980. The problem is not with engineering approaches to AI; we just didn’t have fast computers and tons of training data back then. And before anyone complains about the huge amount of training data required, take a second to think about how much training data humans have. There are 30 million seconds in a year, and if we assume we process input at 10 Hz or more, that’s hundreds of millions of training examples per year, and humans don’t become useful until after more than a decade of training.
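A back-of-envelope check of that arithmetic (the 10 Hz rate is the assumption stated above):

```python
seconds_per_year = 60 * 60 * 24 * 365        # about 31.5 million seconds
samples_per_second = 10                      # assumed sensory update rate
examples_per_year = seconds_per_year * samples_per_second
print(f"{examples_per_year:,}")              # 315,360,000 -> hundreds of millions
```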

If you think development in AI is very slow right now, then I’m not sure what to tell you. Look at a newspaper? And almost none of the researchers driving this development are neuroscientists right now; many don’t know or care much at all about the brain. I think the brain is a useful source of inspiration. But it is absolutely not certain that it’s the only way, or even necessarily the best way, to make working AI systems.

That kind of thing in academia can seem like bullying. We call it rational criticism, and it’s the reason science works. It provides an incentive to prove your claims beyond a doubt, and that’s a healthy goal for all developing technologies. No one can argue with you when your technology is beating all the competition. Until then, more work is needed!

I highly recommend doing that. I’ve obtained dozens of useful insights in the process of understanding how deep learning works. Check out the universal encoder I posted a while ago. It’s written in TensorFlow which is powerful enough to build any deep network you want, and my code should be easy to understand.

It’s very hard for me to understand how the very cutting edge of AI technology, beating every benchmark it’s been tried on, solving problems computer scientists wouldn’t have dreamed were possible to solve even ten years ago, can represent the past. I can only assume that sentiment is coming from inexperience. Definitely have a look at the technology. Play around with it. And take what you learn and use it to improve HTM!

1 Like

Yes, I mistyped that. I meant to say “unsupervised”. But I’m not talking about a “magic” algorithm, just that the network will start to learn changes in the input data and then start predicting them. Learning a new problem with HTMs doesn’t depend on a magic-bullet training set that anticipates all future problem domains it might encounter; they simply start learning the current data being introduced. What in DL land can learn new data without a properly prepared training set? This is the way HTMs work; maybe not perfectly, and maybe not accomplishing much given the processing constraints, but they “constantly” learn. Is there a DL circuit that does that? This is the major advantage that Jeff Hawkins talks about with HTMs, isn’t it? I did do some reading on the DQNs you referred to and some others, but I haven’t used these of course.

I don’t have the background to say definitively what the differences are (other than the obvious structural differences), but I also don’t fully trust that there is parity between the capabilities of DL tech and HTMs, as I’m sure Jeff Hawkins wouldn’t be propounding HTMs as an advancement on the current state of ML technology in terms of general applicability and online-learning advantages.

But I am grateful that I can talk to Siri, and for other NLP advancements; still, you won’t get me in a self-driving car in human traffic in the near future, I can tell you that! :stuck_out_tongue:

This is the essence of all online learning algorithms, DL and HTM included. The training set is the continuously arriving input stream. When the statistics of the input stream change, the model adjusts to fit the new patterns. DQN is one example of learning online*, and if the task domain changes (let’s say you change how rewards are delivered) then the model will adapt to that. In fact, this is the concept behind a technique in RL called “reward shaping”, where you give the network an easier task to solve at first in order to learn useful preliminary behaviors, and then progressively increase the difficulty of the tasks in order to coax it into learning more complex skills.
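A hypothetical sketch of such a shaping schedule (the staged thresholds are invented purely for illustration):

```python
def shaped_reward(distance_to_goal, stage):
    """Reward that starts easy and gets stricter as training progresses."""
    if stage == 0:
        # Easy: dense reward for any progress toward the goal.
        return -distance_to_goal
    elif stage == 1:
        # Medium: reward only when the agent gets close.
        return 1.0 if distance_to_goal < 1.0 else 0.0
    else:
        # Hard: sparse reward, only for actually reaching the goal.
        return 1.0 if distance_to_goal == 0.0 else 0.0
```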

So HTM has no monopoly on adapting to a changing world. It does it in a much more biologically appealing way, certainly, and as a result it has the potential to explain the brain, which most deep networks do not. But the capability to adapt is not beyond mainstream machine learning. As I mentioned above, however, HTM has a silver bullet that should in principle help it adapt more quickly to a changing world: the sparsity of its learning updates. Each HTM synaptic update is less likely to interfere with previously learned things than a dense update in a standard deep network would be, so you can be more aggressive with your updates and therefore learn faster.
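A toy numpy sketch of the contrast (sizes and step sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)

# Dense update (typical deep net): every weight moves a little, so each
# step must stay small to avoid disturbing previously learned structure.
weights_dense = weights + 0.001 * rng.normal(size=1000)

# Sparse update (HTM-like): only a handful of synapses change, so each
# change can be much larger without touching the rest of the network.
active = rng.choice(1000, size=20, replace=False)
weights_sparse = weights.copy()
weights_sparse[active] += 0.1
```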

I really do view the benefits of sparsity as the sole performance advantage of HTM. But that advantage is beyond massive, so it is certainly worth exploiting.

(*DQN is a bit of a special case because of experience replay: when the task changes the replay buffer will get stale, full of data about the world as it used to be, rather than it is now. A better example would be A3C, the new-ish hotness in reinforcement learning, which does on-policy updates using the present experience instead of a replay buffer, in a much more “online” way.)

1 Like

Really enjoying this discussion, and very useful to me as a relative newcomer to HTM. Thank you!

I just wanted to pick up on something that @Charles_Rosenbauer mentioned in an earlier post:

Does anyone have any thoughts about this, or how we might approach a solution?

… that is, in addition to this post by @Charles_Rosenbauer which deals with some of the biological aspects.

This is one area I have been exploring recently, and there are others on the forum who are actively working on it as well. To me this is the obvious next function to understand after sensory-motor integration (or rather part of it, IMO), so I wouldn’t be surprised if Numenta begins tackling it themselves in the not-too-distant future. You can follow my project on this thread. Lately I’ve gotten a little side-tracked writing my own implementation of semantic folding to try to understand that concept better, but I’ll be getting back to RL soon.

2 Likes

“A significant component of the DQN training algorithm is a mechanism called experience replay [5]. Transitions experienced from interacting with the environment are stored in the experience replay memory. These transitions are then uniformly sampled from to train on in an offline manner. From a theoretical standpoint this breaks the strong temporal correlations that would affect learning online.” (torch Dueling Deep Q-Networks)

That does not appear to be online.

“In both the brain, and in deep networks, learning is not immediate. HTM abstracts the forming and strengthening of synapses into a single step process, but it’s actually a multiple-step chain of biochemical interactions that requires many repetitions of the pattern you want to learn. So each time a neuron sees a pattern, its connections are only updated by a small amount, and many repetitions are required to fully solidify the pattern detector. Deep networks are trained in an analogous way, where each presentation of a pattern only updates the synapses a small amount.”

Within seconds, brain tissue experiences structural changes that can affect performance. Even without a hippocampus, short-term memory works and learning can occur; it is just quickly forgotten and not permanent. The need for the hippocampus probably has to do with metaplasticity rules in the brain. If the higher areas have neurons active over longer periods of time than areas lower in the hierarchy, they must have different requirements for making changes to permanence. Metaplasticity can supposedly solve catastrophic forgetting and drastically improve the memory capacity of a neural system.

As regards online learning, the brain can learn while active in the environment, and it can change and adapt to novel information quickly, even within seconds, without being taken offline; that is, it can undergo changes, even drastic ones, without interrupting waking activity. True, in some cases it takes time to improve performance, but in simpler cases even drastic improvement is possible in seconds.

“There are 30 million seconds in a year, and if we assume we process input at 10Hz or more, that’s hundreds of millions of training examples per year, and humans don’t become useful until after more than a decade of training.”

Those are not millions of unique labelled examples. A baby may spend most of its hours asleep, and in the few hours it is awake it may spend a lot of time looking at one or two toys, or perhaps even a blank wall. The number of unique voices and sentences can be quite limited. Yet in a few years it will eclipse almost anything, and some can even go on to do advanced mathematics and learn multiple languages.

1 Like

As far as I know, @Paul_Lamb and I have been working on it for quite some time on the HTM forum. You can just click on his name and check out the discussions he is mainly involved in :slight_smile: Other than that, there are the works of Otahal [1] and Gomez [2], which involve coupling HTM with reinforcement learning. So we are hopefully getting there :slight_smile:

In 10 days I will present my MS thesis, titled “Hierarchical Temporal Memory Based Autonomous Agent For Partially Observable Video Game Environments”. It is a real-time, online HTM architecture combined with TD(Lambda) and guided by research on computational models of the basal ganglia. It can solve simple navigation tasks in a 3D video game environment, with some actual results via its visual sensor. I will for sure share it here when it is presented.

[1] https://dspace.cvut.cz/bitstream/handle/10467/21143/F3-DP-2014-Otahal-Marek-prace.pdf
[2] http://studentnet.cs.manchester.ac.uk/resources/library/3rd-year-projects/2016/antonio.sanchezgomez.pdf

5 Likes

I can’t wait to see it!

3 Likes

It works, right? I mean, many people above have emphasized more practical reasons, but even if you neglect the pragmatics, HTM is capable of results even at this stage. When compared to cutting-edge machine learning techniques, HTM is supposed to perform worse at any single problem, but it is also supposed to perform with considerable accuracy over multiple types of streaming data, inference types, and problems without any parameter tweaks to its overall model, which I believe ML cannot do; HTM can, even at this point. Again, not hating on ML, even though I’d like to, because it gets marketed as AI. I felt as though you are not being objective and fair about this.
I also feel as though some noteworthy properties of HTM are being added individually to ML algorithms, and they do work there. I’m not saying those ideas are taken from HTM, but that those principles are already at work in HTM.
Apologies if it seems rude. I think it deserves clarification.

I see a lot of people saying that HTM doesn’t work. Would anyone care to explain, then, why DARPA built their own version into their 2015 budget, including plans in 2016 for dedicated hardware on those neural chips? Maybe we are getting the civilian version :sweat_smile:

“Develop a hierarchical temporal memory (HTM) algorithm including new data representations, low precision and ability to adapt and scale.”

1 Like

I think the confusion is with the use of the term “works”.

A metaphor: it’s as if your goal was the commercial transport of more than 50 passengers by airliner. Classical ML techniques would be like propeller airplanes, and HTM would be a jet engine still in its development stages (not yet shipped with commercial airliners). Everybody “knows” the potential of jets, but more research and development is needed to make them viable for everyday application.

In HTM’s case, a lot more development is needed, but its development is very accelerated, its potential is long-range, and it is expected to deliver on its promise.

The term “works” is not really the right one here. HTM “works”, but not yet to the level of its potential. It’s that GREAT potential that thrusts it into comparison with techniques that are more applicable at the moment.

It’s like someone has the idea of a “wheel” and you can immediately see the (conceptual) improvement over “skids”, but then someone asks right away whether it works! Well, you first have to build the thing before you can judge that! :slight_smile: HTM is more “developed” than that, but the same mechanism applies.

To me it seems there are two differences between neural networks and HTM: the learning rule and the data representation. Neural networks trained by backpropagation require many small changes to the weights, because we do not know how changes to the neighboring weights will impact the weight we want to change; we only know the sign of the needed change. In HTM we know the needed change.

The representation in HTM is doubly sparse and less shared; in neural networks it is more inter-meshed, and it is not clear to me how sparse it is.
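To make the learning-rule difference concrete, here is a toy contrast between the two kinds of update (the increment and decrement values are assumptions, roughly in the spirit of HTM permanence rules, not taken from any implementation):

```python
LEARNING_RATE = 0.01

def backprop_step(weight, gradient):
    # Backprop: we only have a local slope estimate, so each step must be
    # small and repeated over many presentations of the pattern.
    return weight - LEARNING_RATE * gradient

def permanence_step(permanence, presynaptic_active, inc=0.1, dec=0.05):
    # HTM-style rule: the needed change is explicit and local. Strengthen
    # synapses whose presynaptic cell was active, weaken the rest.
    if presynaptic_active:
        return min(1.0, permanence + inc)
    return max(0.0, permanence - dec)
```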

Jake, sparsity has been tried in a simple neural network to overcome catastrophic forgetting [1], and it was later shown to be not very effective compared to other methods [2]. What is different about HTM that would make it work, in your opinion?

To me, approaches like [3] seem attractive, because they might eventually lead to complex hierarchical systems of diverse models, each of which would be trained in a traditional way to perform a single well defined task, and higher level controllers could be trained (perhaps using RL) to coordinate those “leaf” models and make decisions based on combinations of their outputs.

[1] http://web.eecs.utk.edu/~itamar/Papers/TNNLS_Coop_2013.pdf
[2] [1708.02072] Measuring Catastrophic Forgetting in Neural Networks
[3] [1706.05137] One Model To Learn Them All

1 Like

Backprop is all well and good with the data representation used in “those” models.

With sparsity I suggest a different direction: look at how the brain does it, with modification of the learning rate.
We know that the amygdala releases learning-rate modifiers, and I believe that the RAC gates the stream of senses, again with the effect of greatly enhancing the learning rate for novel presentations.
In this case the bursting drives more gating in that area; in effect, sipping from the firehose.
There are numerous lines of evidence pointing to this mechanism in the brain.

One other method to work around the learning-rate bottleneck is fast learning in the hippocampus, replayed into the slower-learning cortex.

Last but not least, the “three visual streams” paper proposes that the top-down temporal stream does an effective backprop-like learning-rate modification, since in that direction we do have a local target value to train against.

I just want to clarify that HTM is a neural network. It is a Hebbian NN, which is different from most deep learning systems, which use backpropagation or Bayesian inference. But these are all examples of neural networks.

Perhaps I do not represent the norm in this community, but I am interested in HTM and neuroscience in general for the purposes of understanding the human brain and intelligence itself. There are very fundamental constraints and limitations with classical machine learning (including DNNs) that are underwhelming to me and any other person interested in true AGI.

Human brains are the ultimate general purpose machine. Classical ML, especially with the advent of deep nets for certain tasks, is fantastic at solving highly constrained and very targeted cornerstone tasks in AI like classification, explicit logic manipulation, playing games, etc., but it cannot even attempt to explain the vast mysterious complexities of creativity, high-level reasoning, cognitive flexibility and so forth that we humans rely on every day to operate in an extremely dynamic world. To one day see intelligent agents interacting with the world in a totally autonomous and unconstrained manner will require fundamentally different algorithms and thinking strategies. I think it was Einstein who said that the problems of today cannot be solved with the same thinking that created them. In other words, deep nets cannot be expected to solve the fundamental limitations of deep nets and other classical ML.

DNNs and classical ML are basically all we’ve got right now in the field of AI. The animal brain is still largely a mystery. HTM and other neuroscience-inspired methodologies are focused on long-term advancement. Business-focused and even some academically focused individuals and entities are going to pour their effort into achieving success tomorrow and not 50 years from now. That’s fine.

In my view, neuroscience-inspired models are almost like a fork in the roadmap of AI development that seeks to answer more sophisticated questions about intelligence and, if successful, will wrap back around and solve those ultra-targeted classical ML tasks in a similar way that humans do.

3 Likes

Hey Michael, I would argue that different kinds of sparsity have produced some degree of success in avoiding catastrophic forgetting in DNNs, perhaps most notably Elastic Weight Consolidation, in which the weights that are important to previous tasks are identified and frozen when learning future tasks. This is sparsity in gradient updates, although not in activity or in pattern-matching zones on neurons, etc.

Just simply adding sparsity to your DNN won’t solve the problem of course, but research in this direction seems promising.
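For reference, the core of EWC is just a quadratic penalty that anchors the weights that mattered for previous tasks (a sketch of the published idea, with numpy arrays standing in for the parameters):

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1000.0):
    """Elastic Weight Consolidation penalty (sketch).

    `fisher` estimates how important each weight was to previous tasks;
    important weights are pulled back toward their old values, which in
    effect restricts useful updates to the unimportant remainder.
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

def total_loss(task_loss, theta, theta_old, fisher):
    # The new task's loss plus the penalty for disturbing old knowledge.
    return task_loss + ewc_penalty(theta, theta_old, fisher)
```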

You should take a look at the second link I posted, where EWC is compared to other methods.

Not really. The number of important weights is tiny (usually 1-2%). If we stop updating them, gradients will remain almost as dense.

If you mean EWC, sure, it’s a good idea (in fact, variations of it have been tried a few times before), but I fail to see the importance of sparsity (e.g. as it was used in the first link I posted). That’s why I’m curious about your belief that it’s a big advantage of HTM over DL models.

As the number of learned tasks goes to infinity, the number of important weights (and therefore the sparsity of gradient updates) approaches 100%. I stand by my claim.

The immediate advantage seems clear. If I memorize a new transition by creating a distal segment in an HTM network, then I haven’t forgotten any previously learned transitions. This is not commonly done in deep learning, although there are examples of similar things ([1606.04460] Model-Free Episodic Control, [1703.01988] Neural Episodic Control).
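A toy illustration of why that is non-destructive (the class and threshold are invented for the example, not taken from NuPIC):

```python
class Cell:
    """Toy HTM-style cell: each distal segment is its own pattern detector."""

    def __init__(self):
        self.segments = []  # each segment is a set of presynaptic cell indices

    def learn_transition(self, active_presynaptic_cells):
        # Memorizing a new transition appends a new segment; existing
        # segments are untouched, so nothing already learned is forgotten.
        self.segments.append(set(active_presynaptic_cells))

    def predicts(self, active_presynaptic_cells, threshold=10):
        active = set(active_presynaptic_cells)
        return any(len(seg & active) >= threshold for seg in self.segments)
```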

I don’t have any mature ideas on specific research to do in this direction to imbue deep nets with the sparsity advantages of HTM, but perhaps an approach similar to Hinton’s Capsules could work, in which a winner-take-all routing process decides which subnetworks to train for each example and leaves the others alone. That sort of thing should mitigate catastrophic forgetting, encourage modular representations and specialization, and could facilitate generalization across these learned modules if combined with techniques for pruning and consolidation. A rough sketch of that kind of routing follows.
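Everything in this sketch is an assumed toy: the gating weights, the number of experts, and the update rule are made up purely to show the winner-take-all idea.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, dim = 4, 8
experts = rng.normal(size=(n_experts, dim))  # one weight vector per subnetwork
gate = rng.normal(size=(n_experts, dim))     # toy gating weights

def train_step(x, lr=0.1):
    winner = int(np.argmax(gate @ x))              # winner-take-all routing
    experts[winner] += lr * (x - experts[winner])  # only the winner is updated
    return winner

winner = train_step(rng.normal(size=dim))  # the other experts are left alone
```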

1 Like