Why is HTM Ignored by Google DeepMind?

question
applications

#1

Some of you will probably have already seen the paper from Google DeepMind published this week which gives their take on the relationship between AI and Neuroscience.

I guess I wasn’t that surprised not to see HTM feature, since DeepMind is all about conventional machine learning with DNNs, but it did get me thinking about why it wasn’t mentioned at all:

  1. Is it the case that they just don’t know about it?!

  2. Is HTM not that relevant to their objectives (which seem to be mostly about navigating 3D environments, transfer learning and development of DNN technology in general)

  3. Is it that HTM has yet to be proven on any conventional performance measure for machine learning?

Pretty sure someone here has a better idea than me! :face_with_raised_eyebrow:


#2

I think a lot of 1. But there’s a 4 too, a bit similar to 3: the other
disciplines are very interested in a mathematical formalism, and HTM
doesn’t seem to lend itself to one, though I know some people have
done a lot of work on it. It’s as if you made an amazing toaster that
makes great toast, but no one used it because they couldn’t justify its
operation with maths.


#3

They know about it. Jeff has talked at Google.

(That’s Ray Kurzweil who stands up to ask the first question.)


#4

It’s this. Demis has an old NIPS talk about their relationship to neuroscience (https://www.youtube.com/watch?v=zvBoOF01MY4) that touches on a lot of the same issues as that paper you cited. They respect neuroscience and are interested in drawing ideas from it. But the bottom line is that DeepMind and other AI businesses will use whatever works, so for them, biology is an inspiration rather than a design constraint.

HTM in its current form a) doesn’t work on any significant problem and b) is constrained by considerations other than whether it works (which is, of course, why it doesn’t work).

As for whether it’s relevant to their objectives, animals are very good at things like navigating in 3D and transfer learning so it’s very relevant. But, again, HTM doesn’t work.

(yet).


#5

That’s very interesting, thanks. Especially this:

Do you think that there are less constrained HTM-like approaches that would perform better?


#6

This is an alarmingly shallow and incomplete response - especially from you, Jake! I have read over your responses on the forum and have come to respect your insights and contributions here, so I’m really rather aghast at the inaccuracy and near-disdain vomited forth by your answer?

Please allow me to summarize what I believe is the “state” of things, and clarify what I believe to be a radically irresponsible response.

Let me start with an analogy:

Let’s say that the task is to come up with a truly inspired solution to heavier-than-air flight. Of course the current context of human innovation is constrained by the paradigm of economics, so any technological development is organized by the mandate of making money. That being the case, human beings develop the dirigible (a quick and expedient money-making solution), and while it isn’t truly flying, it is a roughly tolerable solution to the transport of humans across great distances - though the speed suffers, the susceptibility to wind direction and strength makes it untenable in some situations, and of course there is the danger of explosion.

That is Machine Learning…

Now, along come the Wright brothers, and let’s say they’re at the point where they’ve discovered the use of wings with a camber on the top surface. The camber gives the air going over the top of the wing a longer distance to travel to the trailing edge than the air passing under the bottom; to arrive at the trailing edge at the same time, the air on top must travel faster, which means lower pressure above the wing - and lift (Bernoulli’s principle).

Right away, everyone sees the enormous potential of the development of the wing, and papers are written about it and there is a huge celebratory convergence of understanding that as soon as the whole “plane” is developed it will be the true way to accomplish heavier than air flight.

But it is not yet an entire Plane!

…and so cannot be compared to the fully functional but hapless dirigible!

HTM Theory and the import of it as the eventual context for Machine Intelligence is OBVIOUSLY going to be the inevitable solution - but it is not yet complete, and so again - it cannot be compared fully to Machine Learning!

Jake, please forgive my emotional response. Again, I totally respect you as a person and as a contributor and knowledgeable researcher in your own right. I just cannot abide the laziness of your response…

Anyone can have a bad day, and I’m a perfect example, in that I’m offering such a clumsy and inflammatory/inconsiderate response - but this tech is too important to let statements like that go unanswered…

Again, please forgive me…

Cheers,
David


#7

I would clarify to say that HTM doesn’t currently solve any problems that can’t be solved by current Weak AI methods. That’s the way I read it. I took no offense.


#8

Hey Buddy,

Understood - but the issue isn’t whether you or I took offense, but how HTM Theory is presented to newcomers who have no context with which to interpret Jake’s response… IMHO…?


#9

I wonder if this is also why we don’t see grasshoppers flying commercial airplanes yet :smiley: (reference to this study)

(sorry for the off-topic comment… thought I would lighten the mood of this thread a little. I still LOL every time I read the first line of that paper: “Locusts were flown in a flight simulator…”)


#10

Very true! I’d never claim to be any good at marketing. I should have clarified that I’m essentially all-in on HTM and I do expect something that evolves from HTM ideas to be the ultimate solution. But with that said, we should be careful not to fool ourselves or others into thinking the technology is anything more than preliminary research (although promising!) in its current state.

Sorry for the bluntness. I’m confident enough in the value of the basic ideas behind HTM that I’ve spent a lot of time investigating ways to use them in my own research (robot perception and navigation), and the technology is really not there yet. It, to be blunt, doesn’t work.

We can talk about why (the spatial pooler is an unstable learner and highly dependent on sample ordering; the resource demands are massive for a region of useful size, and the cache coherency of the algorithm is unavoidably terrible; the temporal memory is a pattern separator that scrambles all of the information in a sequence beyond recognition), and we can discuss ways to improve it (hippocampal replay, dedicated hardware, temporal pooling), but this is the reason why people don’t pay attention to it in the mainstream yet. Use it to solve a 3D maze game in fewer than a million timesteps and DeepMind will start paying attention.

Again, sorry for the tone. It was meant to be deadpan humor. I’m confident in the potential of HTM-like approaches and as interested in improving the technology as anyone.


#11

In two words, deep learning. It’s kind of a taboo topic on this forum, but it shouldn’t be. It works. It’s also not that far from HTM. It really doesn’t deserve all the fear and derision that it gets from the HTM community.

Here’s what I have experienced as the one difference between deep learning and HTM.

It’s not online learning. Deep nets do this quite well by using a bastardized interpretation of hippocampal replay.

It’s actually not one-shot learning either. HTM does this by forming sparse connections. Gradient descent learning has to do this by running a bunch of training iterations per input sample, but both can do it equally well (actually backprop is better).

It’s not sequence learning, or multiple predictions, temporal hierarchy, efference copy for self predictions, or temporal-memory style context splitting. Recurrent deep networks have been used to do all this stuff.

It’s sparsity. Without sparsity of population activity, sparsity of connections, and independently plastic connection sites on neurons, you will catastrophically forget as a result of your dense gradient descent updates. No matter how big your replay minibatches are, they can’t be big enough to replay your entire life.

So I’m actually a big fan of deep learning, and even learning by gradient descent. But the magic of HTM that hasn’t been fully exploited in mainstream machine learning is the heavy application of sparsity in many forms, which will enable true online one-shot learning for the first time in machines.
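
To make the sparsity argument concrete, here’s a minimal sketch (the layer size, sparsity level, and update rule are illustrative, not taken from any HTM or DL codebase) of a k-winner-take-all layer with local Hebbian-style updates. The point is that each pattern only modifies a small fraction of the weights, which is why sparse learners interfere with old memories far less than dense gradient updates do:

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 256
k = 8  # ~3% of units active, in the spirit of HTM-style sparsity

def k_winners(x, k):
    """Keep only the k most active units; zero the rest."""
    active = np.argsort(x)[-k:]
    out = np.zeros_like(x)
    out[active] = x[active]
    return out

W = rng.normal(scale=0.1, size=(n_neurons, 64))

def forward(inp):
    return k_winners(W @ inp, k)

def hebbian_update(inp, lr=0.5):
    """Update only the winning units' weights (a sparse, local change)."""
    act = forward(inp)
    winners = np.nonzero(act)[0]
    W[winners] += lr * (inp - W[winners])
    return winners

a = rng.random(64)
winners_a = set(hebbian_update(a))
touched_fraction = len(winners_a) / n_neurons
# Only k of n_neurons weight rows were modified, so learning the next
# pattern can disturb at most this small fraction of what `a` stored.
```

A dense layer trained by gradient descent would instead update every row of `W` on every sample, which is exactly what makes large replay buffers necessary there.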


#12

I really don’t want Deep Learning to be a taboo topic here. I only want to keep those conversations in #other-topics and not #htm-theory. Please, don’t be afraid to bring up DL topics anywhere you like. But I may move them.


#13

Skipping over 90% of the above, I can add one data point. A few weeks ago I was talking after a meeting with someone from DeepMind about my proposed nucleic acid-based short-term memory for the cell (see off-topic post here), to give an idea of the level of rapport, and went on to describe HTM in my layman’s terms vis-a-vis ideas for living intelligence in another post here. His curiosity was piqued, and he asked me to send a link to HTM. Then I remembered - it’s Numenta - and he said “oh” in a way that implied it was not new, and we went on to another topic. I wish I had asked more. So I think it is known, and has been discussed at least.


#14

Are you saying that Deep Learning can be an “online” learner without training?

Does this mean that “labelled training data” in huge amounts isn’t needed in a huge preparatory step? And that once trained, the network can be used for solving problems outside of that trained problem domain, like HTMs can be?

I think this greatly simplifies and understates the difference: an HTM can be “reused” for completely different problem domains and can totally “re-write” itself in realtime to handle a different problem?

HTMs don’t require pre-labelled and painstakingly acquired training data - they learn on the problem itself, so to speak…

I don’t ask these questions to be “argumentative”, just to restate the advantages that make it worth working out the resource-consumption and size-related processing issues present in HTMs sized large enough to do “real” useful tasks? I think the “magic” of HTMs sets them extremely far apart from DL networks in reality? I just think we need to be patient and eventually the rough spots will be worked out?


#15

Oh and the most crucial issue to me is the fact that DL’s rate of innovation can’t keep pace with HTMs because there is no guarantee that the technology can be continuously extended until general AI is reached. Biological solutions have a roadmap, and as long as the rigor of biological constraint is dutifully kept, there is a full expectation of eventually making it to the end-goal of Strong AI.

This doesn’t mean we shouldn’t use DL, but that there should be more emphasis on grooming academic talent toward the development of HTMs or other biological strategies?


#16

I think there’s actually a lot of overlap between how HTM and DL work, at least if you just look at the spatial pooler.

Compare this explanation of deep learning to this explanation of SDRs. In both cases, neurons are partitioning a possibility space and labeling a particular subspace as the space of recognized patterns. The difference is that DL uses linear algebra and topology while HTM uses combinatorics and binary vectors.
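
As a toy illustration of the two kinds of “recognized subspace” (the dimensions, bit counts, and threshold below are arbitrary, chosen only to make the arithmetic visible): a DL-style unit recognizes everything on one side of a hyperplane, while an SDR-style unit recognizes everything whose binary overlap with a stored pattern clears a threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 1024

# DL-style unit: a real-valued hyperplane splitting the input space.
w = rng.normal(size=dim)
def dl_recognizes(x, bias=0.0):
    return w @ x + bias > 0

# HTM-style unit: a stored binary SDR compared by overlap count.
stored = np.zeros(dim, dtype=bool)
stored[rng.choice(dim, size=20, replace=False)] = True  # 20 active bits

def sdr_recognizes(x, theta=12):
    # Recognize if at least theta of the stored bits are active in x.
    return np.count_nonzero(stored & x) >= theta

# A noisy copy of the stored pattern: drop 5 on-bits, add 5 random ones.
noisy = stored.copy()
on_bits = np.nonzero(stored)[0]
noisy[on_bits[:5]] = False
noisy[rng.choice(np.nonzero(~stored)[0], size=5, replace=False)] = True

# 15 of the 20 stored bits survive, which clears theta=12: the SDR unit
# still recognizes the pattern despite 25% bit noise.
```

Both units carve out a subspace of inputs they respond to; the SDR version just does it with combinatorics and integer overlap rather than linear algebra, which is where the noise robustness comes from.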

Then, take into account the fact that DL is starting to use less and less precision for synapses (DL has been one of the major reasons that AMD and nVidia are supporting half-precision floating point, and I think Google’s TPUs might even be using 8-bit values), and the fact that modern DL often uses sparse vectors, and I think it goes to show that HTM and DL have quite a bit in common.

The differences now are that HTM takes low-precision synapses to an extreme, and that HTM has temporal pooling and feedback while DL does not. Sure, DL has backprop, but that’s just what you need to do its topological transforms efficiently. I’m quite sure that you could swap out the learning algorithm for something else if you changed the way it partitions the input space, and still wind up with a very efficient learning algorithm.

That said, there is another big difference between HTM and DL: reinforcement learning. Currently, if you want to train a DL network to do a useful task, it’s very easy. If you want to train HTM to do the same thing, you have very few options. I think what HTM needs is some form of reinforcement learning. There’s plenty of evidence that it occurs in the brain; we just need to find out how it works. I’ve suggested a potential starting place before; it may not be a perfectly biologically accurate method for reinforcement learning in HTM, but I think it would be a good place to start for anyone who wants to try to implement it.

I would be working on this myself, but I’m too busy right now. I’ve read a lot about HTM, and made a few small toy implementations in the past, but I haven’t had time to get into NuPIC at all, so I’m not sure what it would take to do this with the current framework. If no one else is willing to try, I’ll probably get around to it in a few months. My main side project right now is an experimental compiler, so I’ll need some code to test it on anyway.

TL;DR: HTM and DL are very similar when you ignore the temporal pooler, and are mostly solving the same kind of classification problem. The main advantage DL has is reinforcement learning, making it very easy to train DL networks to do useful things. I’ve suggested a biologically-plausible way to implement this in HTM before, but lack the time to try it now. If anyone else wants, feel free to try to implement it, as it’s probably a good place to start.


#17

No, I’m saying a deep network can be trained online, just like HTM is. See Deep Q-Networks for example. The form of experience replay they use may be distasteful if you want biological plausibility, but the fact that it successfully learns online is beyond doubt.

There’s a machine learning paradigm called “unsupervised learning”, in which you don’t need labels on your training data. Autoencoders, generative adversarial networks, and self-supervised learning in the form of prediction are all ways to do machine learning without labelled training data. And then there’s reinforcement learning, as Charles points out.

A lot of people have demonstrated transfer learning in deep networks already, including myself, in which you take a network that was trained for a particular task, e.g. ImageNet classification, and use it for a new task, like robot navigation, with or without fine-tuning the weights.

And you seem to be suggesting that HTM is somehow better at this? The brain is better at this, sure. HTM in its current state absolutely does not in any way improve over the state of the art in transfer learning. This would be a big deal and there would be Nature papers to read if it did.

See my comments above. These properties are far from unique to HTM. And you greatly overstate how well HTM, in its current state, can actually do them.

Of course. It’s a work in progress. But a developing technology has never been done any favors by overstating its capabilities. That’s how you overhype things and alienate people who take very seriously the kind of claims you’re making.

There is no guarantee that DL can “keep pace” with HTMs, but there is also no guarantee that HTM is a correct theory of the neocortex. Furthermore there’s a reasonable possibility that, by ignoring the constraints of biology, machine learning will far outpace bio-inspired solutions. It would be an act of pure unjustified faith to claim otherwise. Obviously I think a lot of inspiration can be found in the brain. But it’s unquestionably not the only way to solve intelligence, because there exists an infinite number of equivalent algorithms to solve any particular problem (proof left as an exercise for the reader).

I’ve heard from the HTM community a lot of criticism of machine learning, and of deep learning in particular. I’m all for scientific criticism, but unfortunately the specific criticism tends to betray a shallow understanding of the field and the technology. I do recommend going out and implementing solutions to real problems to evaluate the merits of these different technologies. The hype over any particular idea can safely be ignored in favor of real performance on real problems.

After all this I feel the need to reiterate my commitment to HTM. I think forming sparse connections on independently plastic connection sites to sparse population activity patterns is going to be the key. So HTM is going in the right direction. But claiming anything more grandiose at this time is just marketing, and therefore can and should be ignored.


#18

I’m not a trained data scientist, and you have given me some specific areas in which I can do my own research, thank you! :slight_smile:

I’m not sure what “experience replay” is in any depth, but a cursory survey of the topic doesn’t quite imply the same autonomy as biological network plasticity? https://arxiv.org/abs/1511.05952

I’ve heard of supervised learning, but my take on it is that it is very much dependent on situational assumptions within specific boundaries? And that “online” as meant by classical learners is a bit different?

I haven’t made any claims that haven’t been made by Jeff or representatives from Numenta (that’s where I got them from :slight_smile:) ? If my statements are exaggerated to mean perfect performance when I say HTMs can learn a different problem paradigm by just being presented with the data - then yes - but of course I never implied that the technology was complete and perfect in its performance?

If this were true, then we would be a lot further along in the 70 years we’ve been at this, and there would have been no AI winter? Of course, there is the possibility that we can “invent” intelligence, but the evidence so far suggests that development is very slow, if not stuttering, in its expansion.

I’m not an anti-DL person, but I definitely think that HTM technology, not DL, is the underdog in terms of public sentiment. Personally I have simply been repeating what I have heard are the shortcomings of DL (and only to point out the advantages of HTM research). I definitely think that the academic environment acts like weeds, killing off anything that opposes it - and DL hasn’t received any bullying from HTM advocates; it’s predominantly the other way around.

I wish that I had investigated DL and classic NNs before I had heard about HTMs, because it is difficult to devote any serious time to a curriculum that to me represents the past - it’s just hard to generate the necessary enthusiasm to learn technologies that don’t in my mind represent the optimum or ideal. I have been within a hair’s breadth of signing up for some courses - and maybe I will, so that I can get up to speed on the current state - we’ll see.

Again, I phrase my statements in question form (if you look at the original sentences you quoted) because I acknowledge that I am not a data scientist and want to actually learn with more granularity what the differences are - but I did take a quick glance at the networks you mentioned, and there are differences which imply more manual intervention (I think) than what is talked about in an HTM context?

Anyway we’re all here because we are advocates of the potential of HTMs, true.


#19

I’m not sure what you mean, but experience replay works as follows. First a bit of background.

In both the brain and in deep networks, learning is not immediate. HTM abstracts the forming and strengthening of synapses into a single-step process, but it’s actually a multi-step chain of biochemical interactions that requires many repetitions of the pattern you want to learn. So each time a neuron sees a pattern, its connections are only updated by a small amount, and many repetitions are required to fully solidify the pattern detector. Deep networks are trained in an analogous way, where each presentation of a pattern only updates the synapses a small amount.

So to actually learn anything, you need to replay your experiences many times. In the brain, this is believed to be done in the hippocampus, which is capable of very fast one-shot learning, but does not store patterns for long timescales. As you sleep, and at other relevant times, the hippocampus replays your experiences and gradually consolidates those patterns more permanently in parts of the brain such as the neocortex.

In deep networks, this is done by keeping a buffer of past experiences during training time, and randomly sampling from this buffer for each training update. This is done for two reasons. First, learning is achieved through small changes, so you need to see each experience multiple times to sufficiently adjust your connections. Second, to avoid biasing your network toward only recent experiences and forgetting the past (“catastrophic forgetting”), you need to sample evenly across your past experiences.

That’s experience replay. It’s not “part” of the network and it’s not used at evaluation time, it’s just a training-time mechanism for evenly sampling the possible things you can learn.
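
For the curious, the mechanism is simple enough to sketch in a few lines (a minimal illustration in the spirit of DQN-style replay; the capacity, batch size, and placeholder transitions are arbitrary):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience-replay buffer. Names and sizes are
    illustrative, not taken from any specific paper or library."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest entries fall off

    def add(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size):
        # Uniform sampling mixes older and newer experiences into each
        # update, which is what counters recency bias and forgetting.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=100)
for step in range(500):
    buf.add(("state", step))  # placeholder transitions

batch = buf.sample(8)
# The buffer only retains the last 100 steps, and each training batch
# is drawn uniformly from them rather than from the newest step only.
```

At evaluation time the buffer is simply not used; it exists only to spread each training update evenly over stored history.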

Do you mean unsupervised learning? In supervised learning, you are mapping inputs to known outputs, like classifying digits. In unsupervised learning, you are discovering structure in the input data, and encoding it with useful representations, like an SDR encoder or the spatial pooler. And online learning just means you’re processing an ongoing input stream of indefinite length, and updating your model as you go.

Machine learning is capable of generalizing remarkably well, but it always depends on the training data you give it. There’s no magic algorithm that can learn things it wasn’t shown. The same is true of HTM, and the brain.

I strongly disagree with this. Deep learning as we see it today was mostly invented around 1980. The problem is not with engineering approaches to AI; we just didn’t have fast computers and tons of training data back then. And before anyone complains about the huge amount of training data required, take a second to think about how much training data humans have. There are about 30 million seconds in a year, and if we assume we process input at 10Hz or more, that’s hundreds of millions of training examples per year - and humans don’t become useful until after more than a decade of training.

If you think development in AI is very slow right now, then I’m not sure what to tell you. Look at a newspaper? And almost none of the researchers driving this development are neuroscientists right now, many don’t know or care much at all about the brain. I think the brain is a useful source of inspiration. But it is absolutely not certain that it’s the only way, or even necessarily the best way, to make working AI systems.

That kind of thing in academia can seem like bullying. We call it rational criticism, and it’s the reason science works. It provides incentive to prove your claims beyond a doubt, and that’s a healthy goal for all developing technologies. No one can argue with you when your technology is beating all the competition. Until then, more work is needed!

I highly recommend doing that. I’ve obtained dozens of useful insights in the process of understanding how deep learning works. Check out the universal encoder I posted a while ago. It’s written in TensorFlow which is powerful enough to build any deep network you want, and my code should be easy to understand.

It’s very hard for me to understand how the very cutting edge of AI technology, beating every benchmark it’s been tried on, solving problems computer scientists wouldn’t have dreamed were possible to solve even ten years ago, can represent the past. I can only assume that sentiment is coming from inexperience. Definitely have a look at the technology. Play around with it. And take what you learn and use it to improve HTM!


#20

Yes, I mistyped that. I meant to say “Unsupervised”. But I’m not talking about a “magic” algorithm, just that the network will start to learn changes in the input data and then start predicting them. Learning a new problem with HTMs doesn’t depend on a magic-bullet training set that anticipates all future problem domains it might encounter; they merely start learning the current data being introduced. What in DL land can learn new data without a properly prepared training set? This is the way HTMs work - maybe not perfectly, and maybe not accomplishing much given the processing constraints, but they “constantly” learn. Is there a DL circuit that does that? This is the major advantage that Jeff Hawkins talks about with HTMs, isn’t it? I did do some reading on the DQNs you referred to, and some others, but I haven’t used these of course.

I don’t have the background to say definitively what the differences are (other than the obvious structural differences), but I also don’t entirely trust that there is parity between the capabilities of DL tech and HTMs - I’m sure Jeff Hawkins wouldn’t be propounding HTMs as an advancement on the current state of ML technology, in terms of general applicability and online-learning advantages, if there were.

But I am grateful that I can talk to Siri, and other NLP advancements, but you won’t get me in a self driving car in human traffic in the near future, I can tell you that! :stuck_out_tongue: