Relation between HTM and convolutional network

Hi everyone, I’m currently learning about convolutional neural networks and wonder if there is a mapping between CNN and HTM?

Hi @sudent,


This is a great question, and the fact that you’re asking it leads me to conclude that you haven’t yet begun your investigation of HTM Theory?

There are some great resources for quickly coming up to speed…
Here are some links to get you started…

General Overview

New HTM School Resource:

I’m envious that you get to discover this for the first time! Have fun!



@cogmission Hi cogmission, thanks for replying. I already have some knowledge of HTM, and I know there are some similarities between HTM and CNN, as they both deal with invariant representations of features. When I compare HTM with CNN, I’m talking in the context of vision problems. What I’m looking for is a more exact match between the algorithm behind the HTM hierarchy and the one behind the CNN hierarchy, so that I can build a real hierarchical temporal structure that learns images.


@sudent As far as I know there are absolutely no correlations between HTM and CNN - except on a very superficial level, but you can wait for other responses to verify this if you like?

They both deal with invariant representation because that invariance is the problem - but they are two very orthogonal approaches to solving the problem of invariance. (imho). :slight_smile:

@cogmission Yeah, after some research I have reached a similar conclusion, that they only have superficial connections. But I have talked to Subutai, and he said that NuPIC has not yet built a model for image learning that has both a hierarchical and a temporal memory structure. The existing algorithm has only the spatial pooler and temporal memory implemented. My question is: how, algorithmically, can I build a hierarchy of layers and really achieve what Jeff described about how HTM learns an image? Thank you.

I think you may have to wait for them to publish a new version of the theory that includes “Hierarchy”? I have heard that they are currently working on it…

ok, @cogmission thanks for the information



Please keep in mind that I do not work for Numenta, and can’t really say for sure what is intended for future development - I can only speak from my “intuition” about these things… ok? :wink:

Hi @sudent,
The Spatial Pooler (SP) in HTM systems can be compared with CNNs, specifically the max-pooling layer of CNNs. The max-pooling layer computes the max value of a selected set of output neurons from the preceding convolutional layer and uses these as inputs to higher layers; i.e., it subsamples the input layer. The SP creates a sparse, distributed representation of the input, learning connections to the input space. With an input space larger than the HTM column-space (e.g. a 10,000-pixel video frame to an HTM with 2000 columns), each HTM column learns to represent a subset of the input space; see the SP pseudocode for more. Beyond this there is little comparison between CNNs and HTM systems. In HTMs the SP precedes the Temporal Memory (TM), which learns temporal sequences in the data representations. TM can be compared to RNNs, namely LSTMs; see “Continuous online sequence learning with an unsupervised neural network model” for an in-depth comparison.
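To make the subsampling comparison a bit more concrete, here is a minimal sketch in Python with NumPy. It is not NuPIC code: `max_pool_2x2` and `sp_like_activation` are hypothetical helper names, and the SP-like step is a heavily simplified assumption (fixed random connections, a plain top-k over overlap scores, no learning, no boosting, no local inhibition).

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max-pooling: keep the max of each 2x2 block."""
    h, w = feature_map.shape
    # Trim odd edges, then view the map as a grid of 2x2 blocks.
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def sp_like_activation(input_bits, connections, num_active):
    """Toy SP-style step (assumption: no learning or boosting).

    Each column counts the active input bits it is connected to,
    and the num_active columns with the highest counts become active.
    """
    overlaps = connections @ input_bits            # one overlap score per column
    winners = np.argsort(overlaps, kind="stable")[-num_active:]
    active = np.zeros(connections.shape[0], dtype=int)
    active[winners] = 1
    return active

# Max-pooling subsamples a 4x4 feature map down to 2x2.
fmap = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(fmap)

# The SP-like step maps 100 input bits onto 20 columns, 2 of them active.
rng = np.random.default_rng(0)
input_bits = rng.integers(0, 2, size=100)
connections = (rng.random((20, 100)) < 0.3).astype(int)
active_columns = sp_like_activation(input_bits, connections, num_active=2)
```

Both functions reduce a larger input to a smaller output, which is the level at which the comparison holds; how they get there is quite different.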



Hi Alex,

Can I ask a question to shore up my understanding about what you’re saying here?

Would it be correct to say that the comparisons you speak of are about the effect of their functioning and not about the way in which their functions are carried out?

  • …such as the SP and the max-pooling layer of CNNs being compared in that they both subsample their respective input.

  • and wherein recurrent nets are a type of artificial neural network designed to recognize patterns in sequences of data - but there is no overt temporal component?

So for my edification (and edjumacation :blush:) , would it be fair to say that if one were looking for similarities in their function, there wouldn’t be any? But if one were looking to compare the problems they are applied to - then you could force a comparison between them?

Yes, the comparisons allude to the effects, not the underlying algorithms. HTM and DL are very different under the hood.

The sequence may be a document of words, which would not have a temporal component. Or it could be a sequence of sensor readings, which may have discrete timesteps between each value. For more info on LSTMs I recommend this blog post by Chris Olah.


Wow Alex, that was a great link thanks! :wink:


Just came across this issue on aymericdamien/TopDeepLearning, which is “a list of popular github projects related to deep learning (ranked by stars).”

@rhyolight might want to comment on the issue?


From what I have understood, SP finds a pattern in an image the same way a CNN does. SP is optimized for cheap and slow but very distributed processors and memory, like neurons. CNN is optimized for a centralized, very powerful processor built for matrix calculations. They both try to get the best out of their target processor architectures.

Simplistically, SP finds a pattern, say a horizontal line, by watching some pixels located horizontally next to each other. CNN finds a horizontal line by multiplying a subimage with a matrix that has a horizontal pattern and checking how well the subimage corresponds. To my understanding, SP’s pixel references are roughly the same as CNN’s mask matrix.
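A rough sketch of the two ways of spotting a horizontal line, as I picture them. The 5x5 image, the 3x3 mask values, the helper `correlate_valid`, and the list `connected_pixels` are all made up for illustration; real CNN kernels are learned, and a real SP column learns its connections through permanences rather than having them hard-coded.

```python
import numpy as np

# A 5x5 binary image with one horizontal line segment on row 2.
image = np.zeros((5, 5), dtype=int)
image[2, 1:4] = 1

# CNN-style: slide a 3x3 mask with a horizontal pattern over the image
# and take the sum of element-wise products at each position.
mask = np.array([[-1, -1, -1],
                 [ 1,  1,  1],
                 [-1, -1, -1]])

def correlate_valid(img, kernel):
    """Plain sliding-window cross-correlation without padding."""
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=int)
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

response = correlate_valid(image, mask)
best_position = np.unravel_index(response.argmax(), response.shape)

# SP-style (toy): one column "watches" three horizontally adjacent pixels.
# Its overlap is simply how many of those pixels are on.
connected_pixels = [(2, 1), (2, 2), (2, 3)]
overlap = sum(int(image[r, c]) for r, c in connected_pixels)
```

The mask responds most strongly where the window is centred on the line, and the column's overlap is highest when all of its watched pixels are on, so in this toy case the list of pixel references plays a role similar to the positive entries of the mask.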

For me it is much harder to see a clear connection between LSTM and TM, not least because LSTM is so complex that the ‘time aspect’ of LSTM feels like a hack. To me LSTM is time-invariant: you give it the same input (previous output + new data) and you always get the same output. Is LSTM purely functional? TM is time-variant, or stateful.
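To illustrate what I mean by "purely functional", here is a sketch using a generic tanh recurrent step. It is an assumption-laden stand-in: it is neither an LSTM (no gates, no cell state) nor TM, and `rnn_step` and the random weights are hypothetical. It only shows that the step itself is a pure function of (previous state, new data), while the time dependence comes from feeding the previous output back in.

```python
import numpy as np

def rnn_step(prev_state, x, w_state, w_input):
    """Pure function: the result depends only on the arguments passed in."""
    return np.tanh(w_state @ prev_state + w_input @ x)

rng = np.random.default_rng(0)
w_state = rng.normal(size=(4, 4))   # fixed (already trained) recurrent weights
w_input = rng.normal(size=(4, 3))   # fixed (already trained) input weights
x = np.array([1.0, 0.0, 0.0])
state0 = np.zeros(4)

# Same (previous state, new data) pair twice: identical output.
out_a = rnn_step(state0, x, w_state, w_input)
out_b = rnn_step(state0, x, w_state, w_input)

# Same new data twice in a row: the second output differs,
# because the previous state that gets fed back in has changed.
state1 = rnn_step(state0, x, w_state, w_input)
state2 = rnn_step(state1, x, w_state, w_input)
```

So whether the network looks time-invariant depends on whether you count the fed-back state as part of the input or as internal state.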

A criticism of my “optimized differently” theory: it can be applied to everything. A cat is a dog, but optimized differently - true!
