Recommendation Systems


After learning about HTM for a while, I am attempting to build my first HTM system, a recommendation system. The purpose is to predict user ratings for items based on previous item ratings. An example database is the MovieLens ratings. This database has user ratings, from 0 to 5, of some movies, and there is additional information on each movie (year, genre, etc.). The network should predict the user rating for movies a user did not explicitly rate.

I am interested to know if anyone has given such a problem any thought. Since this problem does not have a temporal aspect to it, could HTM even be a possible approach?
Are there any suggestions on how to structure the network so that it can predict the user rating from the context of previous user encodings?
Beginning with the encoder, an initial encoder(user) can be a concatenation of the encodings of each movie rating, where a completely inactive SDR(movie) corresponds to an unrated movie.
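
The encoder idea above can be sketched in plain NumPy. This is only an illustration under assumptions: the slot width, the number of active bits, and the sliding-run rating encoding are hypothetical choices, not something from nupic.core.

```python
import numpy as np

BITS_PER_MOVIE = 12   # hypothetical width of one movie's slot in the user encoding
ACTIVE_BITS = 4       # hypothetical number of on-bits for a rated movie
MAX_RATING = 5        # ratings run from 0 to 5, as described above

def encode_rating(rating):
    """Encode one rating as a contiguous run of active bits; None gives all zeros."""
    sdr = np.zeros(BITS_PER_MOVIE, dtype=np.uint8)
    if rating is None:
        return sdr  # unrated movie: completely inactive SDR
    # Slide the run across the slot so that nearby ratings share bits.
    start = round(rating / MAX_RATING * (BITS_PER_MOVIE - ACTIVE_BITS))
    sdr[start:start + ACTIVE_BITS] = 1
    return sdr

def encode_user(ratings):
    """Concatenate the per-movie encodings into one user encoding."""
    return np.concatenate([encode_rating(r) for r in ratings])

def overlap(a, b):
    """Number of bits that are active in both encodings."""
    return int(np.sum(a & b))

# One user, three movies: rated 5, unrated, rated 3.
user = encode_user([5, None, 3])
```

With these parameters a rating of 4 shares bits with a rating of 5, while 0 and 5 share none, so similar ratings stay similar in the encoding.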

Any insight is greatly appreciated!

  • I am using nupic.core with Python 3.7

Hi Shahar,

Welcome! Great to hear you’re ready to start building something. HTM theory is inherently temporal, being modelled on our neocortex. So I’m having trouble seeing this problem as a good fit, without some temporal pattern to recognise.

Someone else might see it otherwise, but I’d encourage you to keep looking for a good application. There’s nothing like building it to cement your understanding!


The only temporal component to predicting movie ratings that I can think of is that a person’s interests often change over time. That is usually over fairly long time scales though. It might be interesting to see if there are identifiable temporal patterns to these changes in preference when comparing different people.

That said, I think there are other ML algorithms which would be better suited for this particular task.


Hi @Shahar, welcome to the forum.

If you consider these items as objects, and each object as having features, then in principle it should be possible to connect the features of one object in a way that lets the system recognise a particular object and, based on a limited number of features, predict not only which object it is observing, but also which other features it must have and how they relate to the features already observed.

These features are believed to be connected by equilateral triangles. Grid cells don’t only code for spatial information, but all information. Even if an object has a temporal component, the SDRs representing the features of the object first and foremost have relations encoded using grid cells.

(I don’t know if this is much help, but it is a very important point to understand).


@Shahar Thanks for your question! I agree with others in this thread that HTM is not the best approach to solving a spatial pattern recognition problem like this. However, you can go a long way with simple semantic representations like Cortical.io uses for words and simple overlap comparisons.


Thank you for your response!
If I understand correctly, are you suggesting that clustering similar items would be possible if the items are encoded using a grid cell encoder?
I did not think of grid cells in that way, so thank you for the new point of view! Now, the current grid cell encoder converts a 2-D coordinate to an SDR. Does this mean we are restricted to 2 features per item?

Thank you! I am unfamiliar with Cortical.io, so I will start learning as much as I can about their methods.

Be sure to watch this video: Cortical IO new video on semantic folding!


I think it’s more fundamental than that. But I don’t understand how it works. It’s fascinating and terribly frustrating at the same time.

Encoders encode raw data into SDRs, but the semantic information embedded into the SDRs must be connected through grid cells. If I’m allowed to speculate (that’s all I can do) I would say the grid cells are the mesh onto which the beads of information are strung.

This mesh is not just a metaphor. There must be a neuronal structure that supports the activation of neurons that make up the SDRs. (Numenta researchers think this happens in layer 6b of the neocortex). So, a certain (unknown) part of the action potential for a particular neuron to fire must come from this grid cell structure, while the rest comes from the sensorial information itself. (When I say sensorial, I mean the encoded information that was once input from a sense. It can be stored information too, long after the feature was observed. And it can also be abstract information, deduced from other stored information. Some people call this the what information, while the grid cell data is the where information. @Bitking knows a lot more about this).

I had a good reason to think that this grid cell information comes in before the Spatial Pooler and the Temporal Memory, but I have forgotten it right now. (This happens a lot unfortunately). I’ll get back to you if I find it.

Please take all this with a serious grain of salt. I am very much out of my depth here. But it’s obsessively fascinating to me. And if someone could prove me wrong, I would be delighted to learn more.


Let’s first have @rhyolight explain how Numenta suggests using grid cells with HTM.
My recommendations are a bit off the Numenta canon and I don’t want to warp the forum without letting Numenta show the preferred way first.


I don’t think this is a problem I would confound with grid cells. Simple semantic encoding should work fine if you want to find an HTM-related solution. Just encode aspects of movies into semantic binary representations, OR favorite movies into a union representation and compare with the rest of the movie library via overlap scoring.
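
A minimal sketch of that union-and-overlap idea, assuming a toy one-bit-per-genre encoding. The genre list and the movie titles are hypothetical placeholders, and a real encoding would use many more bits per aspect.

```python
import numpy as np

# Hypothetical vocabulary of movie aspects; each aspect gets one bit.
GENRES = ["action", "sci-fi", "romance", "drama", "comedy"]

def encode_movie(genres):
    """Binary representation with one active bit per genre the movie has."""
    return np.array([g in genres for g in GENRES], dtype=bool)

# Hypothetical library of encoded movies.
library = {
    "Movie A": encode_movie({"action", "sci-fi"}),
    "Movie B": encode_movie({"action", "sci-fi", "drama"}),
    "Movie C": encode_movie({"romance", "drama"}),
    "Movie D": encode_movie({"comedy"}),
    "Movie E": encode_movie({"sci-fi", "comedy"}),
}

favorites = ["Movie A", "Movie C"]

# OR the favorites together into a single union representation.
profile = np.logical_or.reduce([library[title] for title in favorites])

# Score every other movie by its overlap with the union.
scores = {
    title: int(np.sum(profile & sdr))
    for title, sdr in library.items()
    if title not in favorites
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Movies that share more aspects with the favorites rank higher, and no temporal memory is involved at all.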


@rhyolight is right that there are definitely easier and more established ways to solve your initial problem. Maybe I’m overthinking this.

But to answer your question (again, speculatively): a grid cell module makes a connection between two features. Between each pair of features. So if a simple object has n features, then there are n(n-1)/2 pairwise connections, each represented by a grid cell module.

That’s in principle. It’s possible that in the messy reality certain connections are omitted, or perhaps even that certain connections are redundant with more than one grid cell module.

As I think about grid cells, I see them serving the role of modeling a continuum of configurations (or state space). That is to say that they are useful for representing properties which can take on specific values from a continuous range of values. Grid cell modules allow the agent to smoothly transition (or transform) from one state to another, and could potentially allow the agent to explore a neighborhood of possible configurations. Such a representation would be useful for continuous optimization problems.

Your recommendation system appears to work on discrete data points that have properties that can only take on discrete (categorical) values. I do not see these properties being particularly well represented by grid cells. If you wanted to shoe-horn them in somehow to the grid cell framework, then you would probably be looking at something more like displacement modules. However, at that point you are essentially just creating maps that jump from one SDR to another, perhaps following network connections between similar representations from one item that is similar to another. However, the structure connecting the items in this space (topology) will still be discrete. For example, there is no way to smoothly transition from Avengers Endgame to Avatar to Titanic, even though they are adjacent to one another in the “all time highest grossing film” dimension.

All that being said, there are ways of generating representations of these discrete items that somehow manage to incorporate their similarity. Matt mentioned Cortical.io as one possibility. They use a large corpus of text data to generate their semantic fingerprints for words. These fingerprints are determined by the context and the usage of the words in the corpus documents. For your recommendation algorithm to be successful, you would have to come up with a way of extracting properties that express the semantic similarity between the items you are recommending. These properties can be intrinsic to the items (e.g. movie genres, actors, directors, percentage of time in action scenes, etc.), or extrinsic (e.g. users that liked this movie also liked other movies). The encoded items will then express their similarity through the amount of overlap in their SDRs.
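
As a rough illustration of the extrinsic case, each item can be represented by the set of users who liked it, so that overlap between two items counts the fans they have in common. The ratings matrix and the like threshold below are made-up toy values.

```python
import numpy as np

# Rows are users, columns are items; 0 means "not rated" (hypothetical toy data).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
    [5, 0, 0, 2],
])

LIKE_THRESHOLD = 4  # assumption: a rating of 4 or more counts as "liked"

# One binary vector per item, with one bit per user who liked it.
item_sdrs = (ratings >= LIKE_THRESHOLD).T.astype(int)

# overlap[i, j] is the number of users who liked both item i and item j.
overlap = item_sdrs @ item_sdrs.T
```

Here items 0 and 1 share two fans and items 2 and 3 share two fans, while items 0 and 2 share none, so the overlap matrix already separates the items into two groups.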


I don’t think grid cells have that much in common with a state space. (Unless I have a wrong idea of a state space).

When a system observes an object with a number of features, then that system should find in memory a set of SDRs that comply pretty much exactly with each of the features sensed. If it does not sense certain features of the model (because its sensors can’t scan each feature at the same time for instance) then it still should have a predictive state for the unsensed features. So the grid cells do not really change states.

If a system hesitates between a number of models for the observed object, then it does not start to cycle between states. It searches for the one set for which each sensed feature complies with the object’s. Again, it does not change states. If anything, it discards sets of states that do not totally comply.

If the system can’t find a model that complies completely, only then does it create a new set of feature SDRs to model the new object, perhaps with some features for which it has SDRs and with some new features for which it has to create new SDRs. But again, it does not change states.

It is kind of intuitive to understand this when we consider a physical 3D object. But remember that grid cells are more fundamental than helping represent 3D objects. At least if we consider that neocortical columns must have a general purpose algorithm. This is why I’m so interested in finding ways to model non-spatial objects using grid cells.

Why should that be the case? Doesn’t it depend on the encoder?

Also @Shahar didn’t say he was making a recommendation system for movies. That was just an example. If one wants to build a general purpose recommendation system, it kind of makes sense to generate connections without knowing what the sensed data represents.

In that regard, even though there are off-the-shelf solutions for this kind of problem, I think it is very interesting to at least think about it with grid cells in mind.


You might think of each dimension in space as a state. If GCMs can represent N-dimensional space, it would be possible to represent a rich feature in each dimension.

Hey, I’m thinking outside the box here. I’m not an expert in HTM, but I’d like to share my idea of how one might use HTM for this particular prediction task.

The idea is to have a pipeline that is Cluster Features -> Classify Features -> Predict Rating.

One may use the HTM Spatial Pooler (SP) to cluster the Movie_Features + User_Features and then use a classifier (e.g. the SDR Classifier) to label these clusters with the true rating value. If there are enough correlations between these features and the user ratings (true values) then one might get good results. After training, the SDR Classifier can be used to infer the label (rating) of the SP’s output (active columns).
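
The shape of this pipeline can be sketched with simplified stand-ins. Note that this is not the HTM Spatial Pooler or the SDR Classifier: the pooler here is a fixed random projection with k-winners-take-all and no learning, and the classifier just returns the label of the stored SDR with the most overlap. All sizes and names are hypothetical; in practice the library's own Spatial Pooler and SDR Classifier would take the place of these pieces.

```python
import numpy as np

rng = np.random.default_rng(0)
INPUT_BITS = 64       # hypothetical size of the concatenated feature encoding
COLUMNS = 128         # hypothetical number of pooler columns
ACTIVE_COLUMNS = 8    # winning columns kept per input

# Each column is connected to a random half of the input bits (fixed, no learning).
connections = (rng.random((COLUMNS, INPUT_BITS)) < 0.5).astype(int)

def pool(encoding):
    """Return a boolean SDR marking the columns with the highest input overlap."""
    overlaps = connections @ encoding.astype(int)
    winners = np.argsort(overlaps)[-ACTIVE_COLUMNS:]
    sdr = np.zeros(COLUMNS, dtype=bool)
    sdr[winners] = True
    return sdr

class OverlapClassifier:
    """Stand-in classifier: remembers labelled SDRs, infers by best overlap."""

    def __init__(self):
        self.sdrs = []
        self.labels = []

    def learn(self, sdr, label):
        self.sdrs.append(sdr)
        self.labels.append(label)

    def infer(self, sdr):
        overlaps = [int(np.sum(sdr & stored)) for stored in self.sdrs]
        return self.labels[int(np.argmax(overlaps))]

# Two hypothetical movie+user feature encodings with their true ratings.
liked = np.zeros(INPUT_BITS, dtype=bool)
liked[:16] = True
disliked = np.zeros(INPUT_BITS, dtype=bool)
disliked[-16:] = True

classifier = OverlapClassifier()
classifier.learn(pool(liked), 5)
classifier.learn(pool(disliked), 1)

predicted = classifier.infer(pool(liked))
```

Whether a learned pooler clusters similar feature encodings well enough for rating prediction is exactly the open question in this thread; the sketch only shows how the stages connect.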

But couldn’t you do the same thing with a simple logistic regression neural network? A logistic regression classifier should be able to cluster features, right?

More or less. But there is a subtle difference: the SP can encode the inputs using the bits that matter most, which is a kind of dimension reduction stage. When trained well, it can “locally generalize” similar inputs and hence they are clustered. The SDRClassifier is then used to simply label these clusters. I don’t think that a vanilla logistic/softmax regression classifier cares about dimension reduction.

As we know the SP takes in an encoding vector and outputs an SDR. The encoding vector is the concatenation of all input fields’ encoding vectors, each of which has its own size (‘n’ and ‘w’ parameters).

Since each SP column links to a subset of all encoding bits (the receptive field), it seems that the sizes of the encoders make all the difference in terms of which input fields will most strongly impact SP column overlap.

So if all input field encodings were of equal size (same ‘n’ and ‘w’), then each field would have equal representation in the total encoding and thus equal impact on the SP. I’m not disagreeing that the SP can do dimensionality reduction as you say or find the bits that ‘matter most’. I’m just saying that this judgement call of which bits ‘matter most’ seems subject to another judgement call initially made when creating the SP - the relative sizes of the encoders.
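
A back-of-the-envelope version of this argument, assuming each column samples its receptive field uniformly at random from the concatenated encoding and ignoring learning and boosting: a field's expected share of a column's inputs is its 'n' over the total 'n', and its expected share of the active bits is its 'w' over the total 'w'. The field names and sizes below are hypothetical.

```python
# Hypothetical input fields, each given as (n, w): total bits and active bits.
fields = {
    "genre": (100, 10),
    "year": (400, 20),
}

def field_shares(fields):
    """Return each field's share of the input bits and of the active bits."""
    total_n = sum(n for n, _ in fields.values())
    total_w = sum(w for _, w in fields.values())
    return {
        name: (n / total_n, w / total_w)
        for name, (n, w) in fields.items()
    }

shares = field_shares(fields)
for name, (input_share, active_share) in shares.items():
    print(f"{name}: {input_share:.0%} of input bits, {active_share:.0%} of active bits")
```

In this example the year field supplies 80% of the input bits and two thirds of the active bits, so before any learning it would dominate column overlap simply because of its encoder size.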

They may or may not, this depends on the input field bit values and their occurrences so far. Maybe I misunderstood this statement.

Yes and I believe other parameters as well.