Explain Like I'm 5 - Basics of Numenta Philosophy

A couple of weeks ago I wrote an ELI5 explanation about HTM and Numenta.

I meant to post it here for others to add better insights but I forgot til today. Anyway, here it is for anyone who has more color they’d like to add. And please feel free to correct anything I got wrong.


I think you did a pretty great job!


The link is not working. Perhaps you could post it here @jordan.kay ?


I’ll do my best to give an ELI5 explanation. I’ll focus on the concepts rather than specifics, thus giving you a context by which to understand the specifics later.

Also, it seems as though I’ve made the post too long so I’ll post the last section (Reasoning by Analogy) as a reply.

HTM, Hierarchical Temporal Memory, is version 2.0 of their attempt to reverse engineer the brain. They first called it the Cortical Learning Algorithm, but that name wasn’t descriptive enough to mean anything. Since they changed the name, we might benefit by exploring the new one first of all:


Think about the brain not as a computer, but as a hard drive: think of it first as memory. The brain is, first and foremost, a memory structure. The brain is made of neurons. Neurons connect to each other. The entire premise of Numenta is that there is a repeating pattern of neuron connections throughout the brain. This repeating pattern, this circuit, is essentially what they’re trying to reverse engineer.

Why would they reverse engineer that circuit? Because it can learn anything (well, it can learn anything you can learn, for it is the thing that does the learning): it’s generally intelligent. The circuit mutates and adapts itself to fit the data it sees much more dynamically than anything we have today.


This memory structure changes over time. We’re very used to computers where instructions are carried out linearly, sequentially. But this is not natural to the world the brain evolved in. In the real world, everything is changing all the time, and nothing waits on anything else.

In fact, just about the only thing the brain can count on is that everything will change, and soon. So the brain uses time as a hook to gain intelligence: it makes predictions in time. If some coordinating neurons can predict a change before it happens, the network rewards them and pays them more attention the next time a similar situation arises.
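To make the "predict, then get rewarded for being right" idea concrete, here is a toy sketch in Python. This is not real HTM (no SDRs, no columns), just a minimal predictor I made up that learns which value tends to follow which, and whose correct transitions get strengthened over time:

```python
# Toy sketch, NOT real HTM: a predictor that learns transitions in a
# stream over time and checks itself against what actually arrives.
from collections import defaultdict

class ToySequencePredictor:
    def __init__(self):
        # counts[prev][nxt] = how often nxt has followed prev so far
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def step(self, value):
        """Predict the incoming value from the previous one, then learn."""
        correct = None
        if self.prev is not None:
            followers = self.counts[self.prev]
            prediction = max(followers, key=followers.get) if followers else None
            correct = (prediction == value)
            followers[value] += 1  # strengthen this transition ("reward")
        self.prev = value
        return correct

predictor = ToySequencePredictor()
results = [predictor.step(v) for v in "ABCABCABC"]
# First element has no context; the first pass through "ABC" is all misses;
# after that, every transition has been seen once and is predicted correctly.
print(results)  # [None, False, False, False, True, True, True, True, True]
```

The point of the sketch is only the shape of the loop: prediction first, then comparison against reality, then adjustment, at every single timestep.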


The ultimate result of HTM theory is a memory structure that works together to produce predictions through time. The natural shape of this memory structure is a hierarchy. This is probably best explained by an analogy:

Consider the employees of a large company. Many, many employees handle day-to-day, even moment-to-moment, operations. Information from customers, vendors, and other systems flows into the organization, and these employees typically handle it. They know where to place things on shelves, they know how to deliver the mail, etc.

Then there is middle management. Their job is to handle situations or problems that often take weeks to resolve. Their attention is not taken up by details, so they can each oversee several people. Finally, there are very few executives; they make plans for months and years ahead (long-term predictions).

Most information flows into the structure at the bottom of the hierarchy. Here is where the information is most detailed. When employees run into a problem they take it to their manager. Novel patterns flow up the hierarchy until they can be resolved. Those nodes at the top of the hierarchy can see broadly, but don’t see any details. Orders and directions (which contain implicit predictions about the future) flow down the hierarchy.

This hierarchical informational pattern is ubiquitous in intelligent (coordinated) structures, and the brain is no exception. In fact, the brain is probably the best example of this pattern. In some areas of the brain there are about ten times as many feedback connections as feed-forward connections. Why? Perhaps because higher-level memory structures see the broad context of what’s going on and broadcast predictions about the future down to lower levels that don’t see such a broad picture.

Notice that specific decisions require specific information. Specifics don’t have to travel up the hierarchy unless something happens that your muscle memory isn’t capable of handling. Thus the bottom of the hierarchy is where most of the input and most of the output of the structure occurs.
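The company analogy can be sketched in a few lines of Python. Everything here is invented for illustration (the level names, the "known pattern" sets): the point is just that each level resolves what it already knows and escalates only novel patterns upward:

```python
# Hedged sketch of the company analogy, not an HTM implementation.
# Each level handles inputs it recognizes and escalates novelty upward.

class Level:
    def __init__(self, name, known, above=None):
        self.name = name
        self.known = set(known)   # patterns this level can resolve itself
        self.above = above        # next level up the hierarchy, if any

    def handle(self, pattern):
        if pattern in self.known:
            return f"{self.name} resolved {pattern!r}"
        if self.above is not None:
            # Novel pattern: pass it up the hierarchy.
            return self.above.handle(pattern)
        return f"{self.name} make a new long-term plan for {pattern!r}"

executives = Level("executives", {"new market"})
managers   = Level("managers", {"late shipment"}, above=executives)
employees  = Level("employees", {"stock shelf", "deliver mail"}, above=managers)

# All input enters at the bottom; only novelty climbs.
print(employees.handle("stock shelf"))    # employees resolved 'stock shelf'
print(employees.handle("late shipment"))  # managers resolved 'late shipment'
print(employees.handle("recession"))      # executives make a new long-term plan ...
```

Notice that the executives never see "stock shelf" at all, just as the top of a cortical hierarchy sees broad, slow patterns but none of the moment-to-moment detail.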

Compare and Contrast

There is, of course, a lot of overlap between the concepts of traditional machine learning and HTM, but I think the most significant part of the paradigm is that it seems to start with slightly different assumptions across the board, almost as if it turns much of ML thought on its head.

In traditional machine learning, spatial pattern recognition was developed first (think CNNs); in HTM, temporal patterns are primary. In ML, algorithms are first-class citizens; in HTM, data structures are fundamental (more on SDRs in a moment). In ML, sequential calculations are natural; in HTM, hierarchy is the key to efficiency.

Of course, these differences are due mainly to ML evolving out of (inherently sequential) computational paradigms, while HTM considers the forces that have shaped the (inherently parallel, or distributed) structure of the brain. As our computers have become more parallel and the need for computation over and across networks has become more ubiquitous, we’ve begun to see aspects of the brain’s (distributed) paradigm appear more and more in ML technologies.


Perhaps the most fundamental concept in HTM theory (besides the general ideas expressed by the name itself) is the concept of the Sparse Distributed Representation. I actually wish it were known as SDSR, for Sparse Distributed Semantic Representation. This is the data structure of the brain. Think of it as “1’s and 0’s, but not binary.”

Firstly, Representation means you’re giving the concept a ‘name’ in this data structure. Sparse means there are lots of 0’s and just a few 1’s. I think the term ‘Distributed’ is essentially a stand-in for Semantic; that is to say, similar things should have similar representations. Let me give a few examples.

In the context of all possible objects, apples are a lot like oranges. They’re both fruit, they’re both sweet, they’re both round and about the same size, they don’t really complain if you eat them, etc. So if you have a representation of the concept ‘apple’ it should have lots of overlap with the representation for the concept ‘orange.’



Perhaps that first location (1 for apple, 0 for orange) typically indicates the color red. Oranges are not red; they’re orange. So the two representations don’t share that overlapping bit. Maybe the next index indicates that the object has bright colors, so they’re both 1 there.

Compare this with binary. Since every possible combination of 1’s and 0’s is taken (that is, binary representations are dense, not sparse), individual indexes don’t really mean anything at all; they only mean something in aggregate. The ASCII code for the letter U is 85, which in binary is 01010101. Which one of those 1’s means the letter is a vowel? None of them. See what I mean? You need the representation to be sparse in order to give meaning to the individual bits.
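Here is the apple/orange example as a toy Python sketch. The bit meanings below are ones I invented for illustration, and the vectors are far denser and far shorter than real SDRs (which might be thousands of bits with only ~2% active); the only point is that overlapping active bits equal shared meaning:

```python
# Illustrative toy, NOT a real SDR: invented bit meanings, tiny vector.
BIT_MEANINGS = ["red", "bright", "round", "sweet", "edible",
                "fruit", "grows_on_trees", "fits_in_hand", "has_peel", "citrus"]

def sdr(active_meanings):
    """Build a bit vector with a 1 wherever the meaning applies."""
    return [1 if m in active_meanings else 0 for m in BIT_MEANINGS]

apple  = sdr({"red", "bright", "round", "sweet", "edible",
              "fruit", "grows_on_trees", "fits_in_hand"})
orange = sdr({"bright", "round", "sweet", "edible",
              "fruit", "grows_on_trees", "fits_in_hand", "has_peel", "citrus"})

def overlap(a, b):
    """Count shared active bits: shared bits = shared meaning."""
    return sum(x & y for x, y in zip(a, b))

print(overlap(apple, orange))  # 7 -- apple and orange are semantically close
print(apple[0], orange[0])     # 1 0 -- only the apple has the "red" bit on

# Contrast with a dense code: 'U' is 85 = 0b01010101 in ASCII, but no single
# bit of that pattern means "vowel"; meaning exists only in the aggregate.
```

Because each index carries meaning on its own, you can compare two representations bit-by-bit and the overlap count directly measures how similar the two concepts are, which is exactly what a dense binary code can’t give you.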

Anyway, that’s what SDRs are. Why are they so important? Imagine you’re a neuron for a moment (or a cortical column, if you’d like). You’re connected to a bunch of other neurons. Those connections are a lot like the individual indexes of the SDR above. Most of your connections at any given moment are off (0), but some are firing (1). Since you know what each neuron means (more or less), you know what the rest of the brain is seeing: something round, something edible, something colorful, etc.

Of course, this is an oversimplification of everything, but that’s what ELI5 is.

Reasoning by Analogy

So if you want to relate HTM to stuff you know already (Neural Networks), we can try to put it in that context and see where it breaks the box.

You can think of HTM as a neural net whose smallest whole unit isn’t really a neuron, but a collection of neurons connected in a specific yet malleable way (analogous to one single circuit in the repeating circuitry pattern of the neocortex, often called a cortical column), with those units arranged in a hierarchy.

In other words: take a neural net, make each neuron a bundle of neurons and each layer a multilayered circuit, then tip the whole net up on its side so data comes in at the bottom instead of on the left. Data will flow up the hierarchy and back down, where the output implies some motor function, because this neural net inherently exists through time and is therefore in a feedback loop with the environment (making it a sensorimotor inference engine). Make long-range connections too, so that some nodes connect to layers many levels above and below them. Then modify the backpropagation algorithm so that data gets bounced back and modifies connections at every timestep. Also add feedback predictions, so that higher layers in the hierarchy can give shorter-timescale layers a heads-up about what they should expect to see, thus affecting how they interpret new data. Lastly (and this may be going a bit beyond HTM now), stretch out the bottom of that pyramid so the two corners meet next to the peak, making a circle of nodes. Now you have a memory structure where data can flow into any node on the edge and, like in an eye, get focused on the other side before getting bounced back to modify behavior.

Perhaps that wasn’t as helpful as I had hoped. Well, anyway, let me conclude by saying: like Numenta, many are searching for (what I call) the “smallest unit of intelligence.” Think of Geoffrey Hinton’s capsule networks as an example.

Lots of AI (especially AGI) researchers are hitting on the same theme from different angles. I’m looking forward to seeing how they all converge.