Is the topology in HTM similar to the convolution in a CNN (Convolutional Neural Network)?

I am studying the HTM course.
What if we use a topological HTM region to classify pictures? How well does it work? Have there been any experiments?


Keep watching. You’re going to need motion (saccades) in order to explore pictures. There are some posts on the forum related to “nupic vision” you might want to search.


Thank you for your reply. Your HTM course is great!


There is much more to vision than most starting researchers usually consider; the brain is assembling tiny little snapshots of the visual field into the thing we think of as perception. Keep in mind that all these little pictures are registered in the exact same place in the visual cortex, sequentially.
Here is a little post to help you think about biologically plausible vision:


Definitely. Achieving similar functions in the visual cortex is a big challenge.
I think the feature-integration theory coincides with the hierarchical theory of HTM.
Feature extraction and pattern recognition in images are problems that must be solved.

The “pointing” functions of the motor portions of the cortical and sub-cortical structures (such as the frontal eye fields) are paired with the feature snap-shots as the location/displacement functions. This is a form of tagging that combines locations with features as is commonly described in the classic HTM theory.

In some ways this opens the door to the same explanatory theory that is used in “ordinary” convolution networks.

Good knowledge.
Is there a neural network that has been implemented that can combine learned features without re-learning?
For example, a network already trained to recognize knives and scissors would not need to learn knives and scissors again when learning a Swiss army knife.

First of all nice handle @OhMyHTM, I’m diggin it. Also to your question on

this idea sounds to me like “transfer learning” in mainstream DNN/CNN’s.

You may be interested in this post:

In terms of HTM, the current NuPIC system (Encoders–>SP–>TM) is great at doing this with temporal features, as long as the new data scenario (that the model has been ‘transferred’ to) can be mapped by the encoders to qualitatively equivalent SP columns from the original scenario.

There are also ways to combine the activations of multiple NuPIC regions using the Network API. As you may know this general idea has been called ‘Temporal Pooling’ in the HTM lexicon. There have been some extensive threads on the topic as people have taken different approaches. I remember @Paul_Lamb and @sebjwallace having quite an in-depth exchange on it, if you guys don’t mind me name-dropping.

One of the common questions is about HTM “learning something.” That may bring in some pre-conceived notions about what it might mean to learn something.

On one level, the HTM system either triggers a response of a predictive cell to indicate that the current perception or sequence has been seen before, or a burst to indicate that this is a novel perception. This bursting is a signal of novelty. It is local to the scope of that pool of perception and, depending on the configuration of the HTM system, may be some topological portion of the map or the entire map as a whole.

This is the heart of the anomaly detection that is commonly touted as the strength of the HTM system.
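To make the predicted-versus-bursting distinction concrete, here is a minimal sketch in plain Python. It is not NuPIC code: the function names, the `(column, cell_index)` representation, and the toy numbers are all assumptions made for illustration. The score shown is the fraction of active columns that were not predicted, which is the usual idea behind a raw anomaly score.

```python
def activate_columns(active_columns, predictive_cells, cells_per_column):
    """Return (active_cells, bursting_columns) for one time step.

    predictive_cells is a set of (column, cell_index) pairs that were
    predicted at the previous step. All names here are hypothetical.
    """
    active_cells = set()
    bursting = set()
    for col in sorted(active_columns):
        predicted = {(c, i) for (c, i) in predictive_cells if c == col}
        if predicted:
            # The input was anticipated: only the predicted cells fire.
            active_cells |= predicted
        else:
            # Novel input: every cell in the column fires (a burst).
            bursting.add(col)
            active_cells |= {(col, i) for i in range(cells_per_column)}
    return active_cells, bursting


def anomaly_score(active_columns, bursting_columns):
    """Fraction of active columns that were not predicted."""
    if not active_columns:
        return 0.0
    return len(bursting_columns) / len(active_columns)


active_columns = {1, 4, 7, 9}
predictive_cells = {(1, 0), (4, 2), (7, 3)}  # column 9 was not predicted
cells, bursting = activate_columns(active_columns, predictive_cells, cells_per_column=4)
score = anomaly_score(active_columns, bursting)
print(bursting, score)
```

Here three of the four active columns were predicted, so only one column bursts and a quarter of the input counts as novel.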

As far as a pattern that can be passed to some outside viewer that transforms a perception to a unique output pattern that signals the perception of a learned pattern - not so much.

At least as currently formulated.

I have been proposing a hex-grid signaling method to replace the spatial pooling portion of the HTM system, but that is certainly not widely adopted by the HTM community. One of the strengths of the Hex-Grid proposal is a stable grid pattern that does communicate a fixed pattern to signal recognition. This is an extension to the standard HTM model and not in competition. Several members have mentioned that they were going to try this extension - I have not heard of anyone reporting any positive results at this time.


thank you @sheiser1, I feel that HTM is very powerful.

I think what I am asking about may be transfer learning. This is what I will try in the future, along with NuPIC, of course.


5 posts were split to a new topic: Reality for machine intelligence: internal or consensus

@Bitking “Hex-grid” sounds very interesting. I think sparse pools is the way the cortex works.

Here’s a pretty extensive article I just saw on this too in case you’re interested:


CNNs aren’t at all similar to HTMs.

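A convolution matrix is an image processing matrix; the classic hand-designed one finds edges by increasing the value when the center element is very different from its neighbors. That matrix is usually a 3x3 with all elements being -1 except the center, which is 8 so that the weights sum to zero. In a CNN the kernel values are learned rather than fixed, but many early-layer kernels end up looking like edge detectors.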
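As a rough illustration of that 3x3 kernel, here is a small sketch in plain Python (no libraries). The `convolve2d` helper and the toy image are made up for this example; a real CNN would learn its kernel values and use an optimized library routine instead.

```python
# Hand-designed 3x3 edge kernel: -1 everywhere, 8 in the center (sums to zero).
EDGE_KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def convolve2d(image, kernel):
    """Slide the kernel over the image ("valid" mode, no padding).

    The kernel is symmetric, so convolution and cross-correlation coincide.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy image: dark on the left, bright on the right (one vertical edge).
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

edges = convolve2d(image, EDGE_KERNEL)
for row in edges:
    print(row)  # each row is [0, -27, 27, 0]
```

Flat regions come out as 0 because the weights cancel, while the two positions straddling the edge produce large negative and positive responses.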

You can look up Image Processing Kernels on Wikipedia (had a link but I can only do 2 links). You can also test your own out in a program called GIMP under Filters > Generic > Convolution Matrix. It’s a pretty standard edge detection method. The max pooling in the CNN then downsamples the result, keeping only the strongest response in each small window and discarding the rest, which is mostly black space.

In normal neural networking and machine learning all you are doing is finding a probability that a state or collection of states is a label/next state. This method came about because the early methods of AI were about hand programming trees and responses and states. So the industry figured out how to automate that process through statistical analysis on labeled information.

So if you imagine the MNIST test, all of the data is nice and centered and you can get a good probability of an image being a 1 or an 8 just based on the position and strength of the pixels in a 28x28 grid. But in real life not everything is centered and not everything is a small 28x28 sized image.

So what the Convolution Matrix does is find the important parts, which are the edges of an object. Then the max pool removes 75% or more (depending on the pool size) of all of the pixels, mostly black or low value pixels, because we want only the high value parts passed down. Eventually you get an image that has been roughly centered and reduced to a more manageable “state” that you can do normal probability based analysis on.
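The 75% figure corresponds to a 2x2 pool, where one value out of every four survives. A minimal sketch, assuming non-overlapping tiles and a made-up 4x4 feature map:

```python
def max_pool(image, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size tile."""
    pooled = []
    for r in range(0, len(image) - size + 1, size):
        row = []
        for c in range(0, len(image[0]) - size + 1, size):
            tile = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(tile))
        pooled.append(row)
    return pooled


# Toy feature map, mostly zeros ("black space") with a few strong responses.
feature_map = [
    [0, 0, 0, 27],
    [0, 3, 0,  0],
    [0, 0, 0,  0],
    [5, 0, 0,  1],
]

pooled = max_pool(feature_map)
kept = len(pooled) * len(pooled[0])
total = len(feature_map) * len(feature_map[0])
discarded_fraction = 1 - kept / total
print(pooled, discarded_fraction)  # [[3, 27], [5, 1]] 0.75
```

A 3x3 pool would keep one value in nine, which is where the "or more" comes from.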

The idea of using that method came about from research into eye neurons. They seem to only fire when their neighbors are very different from themselves. If you manually program that out, you will get something almost exactly like a convolution kernel.

There were lots of experiments with LSD and some people came out with some interesting insights. One insight, which I have personally witnessed because I had brain surgery and now sometimes get headaches with accompanying peripheral visions, was useful for understanding how the eye sees edges. There are many people who have ocular migraines as well that see different patterns. Mine are jagged bright edges just outside the corner of my eye.

My personal theory is that if you have enough patterns overlaid, each responsible for a different distribution of the vision and a different orientation or shape, like placing multiple fingers on a cup to get a better idea, you can get more relevant features of what something is. Coupled with saccades you can get a pretty quick and immediate reaction to what something is.

You might be able to use convolution and max pooling, though, to reduce how much data HTMs have to work on. But in general they are not related.

As for the “knives and scissors” question, normal neural networks will capture the blades, and they will be overlaid under the same features. You can check this Google Seed Bank Tutorial out to see how normal neural networks save features. If a blade triggers the “average” blade then it will cascade more features down the line that need to match up. But only the ones it has seen and been told are a blade, and only in the states it has seen them in. This is the reason normal neural networks are so training data heavy. Even then, they fail against noisy versions unless you introduce noise and variation in the training.

But that’s all getting outside the context of HTM school so I’ll stop. Also this was very long.


@Cairo so, convolution in CNN is not for spatial pattern detection?

Well, spatial is a pretty ambiguous term and means different things in different contexts. In a way you could say that’s its only function if you look at it at a higher level of abstraction: to find relations in space. In other words, to find the edges of objects.

But it’s definitely not building the relation the same way HTMs will be building them. CNNs will literally be collecting averages. It’s part of the backpropagation method. You find the gradient and direction that the weights need to be adjusted, you add all of the adjustments of the batch, then you divide by the batch size. That by definition is an average. Unless of course you are using pure stochastic gradient descent with one sample per update, but you get pretty close to the same result, because you are moving back and forth finding the middle ground.
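The "add the adjustments, divide by the batch size" step can be sketched with a one-weight model. Everything here is a toy assumption (a model `y = w * x`, squared error, made-up data and learning rate); real frameworks compute the gradients by backpropagation through many layers, but the averaging is the same idea.

```python
def per_sample_gradient(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x


def batch_update(w, batch, learning_rate):
    """One mini-batch step: sum the per-sample gradients, divide by batch size."""
    grads = [per_sample_gradient(w, x, y) for x, y in batch]
    mean_grad = sum(grads) / len(grads)
    return w - learning_rate * mean_grad


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x
w = 0.0
for _ in range(50):
    w = batch_update(w, batch, learning_rate=0.05)
print(w)  # converges toward 2.0
```

Each sample pulls the weight by a different amount, and the update follows the mean of those pulls.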

I’m newer to the HTM community but it seems that they are building relation by building unique identifiers of concepts/objects then slowly increasing a relational value. Similar to reinforcement learning but with noise robust “hashes” (kind of a hash) instead of defined states. The HTMs seem to build their own states, which is exactly what the AI community has been chasing for years.
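One way to picture the noise robust "hash" idea is with sparse binary patterns compared by overlap. The sketch below is only an assumption-laden toy: the size (2048 bits, 40 active), the seed, and the helper names are picked for illustration, not taken from any HTM library.

```python
import random

rng = random.Random(42)
N, ACTIVE = 2048, 40  # hypothetical pattern size and number of active bits


def random_sdr():
    """A sparse pattern represented as the set of its active bit indices."""
    return set(rng.sample(range(N), ACTIVE))


def add_noise(sdr, flips):
    """Turn off `flips` active bits and turn on the same number of inactive ones."""
    removed = set(rng.sample(sorted(sdr), flips))
    added = set(rng.sample(sorted(set(range(N)) - sdr), flips))
    return (sdr - removed) | added


original = random_sdr()
noisy = add_noise(original, flips=10)
unrelated = random_sdr()

overlap_noisy = len(original & noisy)          # 30 of 40 bits still match
overlap_unrelated = len(original & unrelated)  # expected to be close to 0
print(overlap_noisy, overlap_unrelated)
```

Unlike a cryptographic hash, where one flipped bit changes everything, a quarter of the bits can be corrupted here and the pattern is still far closer to the original than to an unrelated one.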

On top of defining their own states, they understand the temporal relation between those states, which is context that, among conventional networks, only recurrent architectures really capture. HTMs seem to couple the ability to capture state and temporal information, which is everything you need for intelligence.

CNNs wouldn’t be able to tell you what an object was if you gave them tiny pieces at a time. They have to see the whole thing at one time. If there is a way to do that, it’s very convoluted (pun definitely not intended) and hand crafted. Just like most of the methods in AI. For every new biological characteristic the industry wants (mathematical, temporal, image, NLP), they have to invent a new type of net or function or method to approximate it.

So, in short, it is a type of spatial pattern detector, but not in the way HTM is. Its primary purpose is to capture separate features and reduce the system down to an easier probabilistic problem space.


Really wonderful answer. Many thanks.

According to my understanding, HTM is unsupervised learning: no labels, no backpropagation, driven by sample data, with both spatial and temporal patterns emerging automatically. However, the sample data must first be encoded as a sparse representation.
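As a rough picture of that encoding step, here is a simplified scalar encoder in plain Python. It is written in the spirit of HTM encoders but is not NuPIC's implementation; the function names and the sizes (`n=20` bits, `w=5` active) are assumptions chosen to keep the output readable.

```python
def encode_scalar(value, min_val, max_val, n=20, w=5):
    """Encode a number as n bits containing one run of w active bits."""
    value = max(min_val, min(max_val, value))  # clip into range
    buckets = n - w + 1                        # possible start positions
    start = int(round((value - min_val) / (max_val - min_val) * (buckets - 1)))
    return [1 if start <= i < start + w else 0 for i in range(n)]


def overlap(a, b):
    """Number of bit positions that are active in both encodings."""
    return sum(x & y for x, y in zip(a, b))


a = encode_scalar(20, 0, 100)
b = encode_scalar(25, 0, 100)
c = encode_scalar(80, 0, 100)
print(overlap(a, b), overlap(a, c))  # 4 0
```

Nearby values share most of their active bits and distant values share none, which is the property the downstream layers rely on.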


One shot learning. It’s really fast.


Quite different, at least in terms of intent.

Convolution in a CNN is used for automated feature learning. Topology in HTM (at least in NuPIC) is instead a way of organizing receptive fields, which may lead columns to prefer or focus on local, topological features of an input.

Convolution kernels in a CNN are learned (dynamic and computational) so that they prefer convolved input signals that help maximize performance on a prediction task. Topology in HTM, on the other hand, is simply an organization of receptive fields (static and structural).
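A small sketch of the "static and structural" side, assuming a one-dimensional input and a hypothetical `potential_pool` helper (this is not NuPIC's API): topology only decides which inputs a column is allowed to connect to, and nothing in it is trained.

```python
def potential_pool(column, num_columns, num_inputs, radius=None):
    """Input indices a column may connect to.

    radius=None models a global pool (no topology); an integer radius
    restricts the column to a neighbourhood around its mapped center.
    """
    if radius is None:
        return list(range(num_inputs))
    # Map the column onto the input space, then take a local window.
    center = (column * num_inputs) // num_columns
    low = max(0, center - radius)
    high = min(num_inputs - 1, center + radius)
    return list(range(low, high + 1))


local = potential_pool(column=2, num_columns=4, num_inputs=16, radius=2)
everywhere = potential_pool(column=2, num_columns=4, num_inputs=16)
print(local)            # [6, 7, 8, 9, 10]
print(len(everywhere))  # 16
```

The window is fixed when the region is built, whereas a CNN kernel's weights are adjusted by training.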