A Theory of How Columns in the Neocortex Enable Learning the Structure of the World

I also have a question related to this one. In that paper there is the following text in section “Simulation Details”:

We encode each location as a 2,400-dimensional sparse binary vector with 10 random bits active. Each sensory feature is similarly encoded by a vector with 10 random bits active. The length of the sensory feature vector is the same as the number of mini-columns of the input layer N_in.

This does not explain how the feature vector is built, which is only briefly shown in fig. 2. In figure 2 the feature-location pair is used as input to the input layer.
I also think the wording might be a bit confusing: what is the difference between a feature, a sensory feature, and a feature-location pair?
How exactly is the input created from the 2,400 bits and the N_in bits? Can you provide an example of the input to the SP in the input layer?

Figure 2 depicts the feature as proximal input to the input layer (In classic HTM, the encoded feature would have passed through the SP algorithm). For some reason, Figure 2 doesn’t show the location SDR, but that would be what is providing the distal input to the input layer (i.e. the context for which specific cells in the minicolumns are chosen). So in Figure 2, activity in the input layer is meant to represent a feature in the context of its location.

I’m pretty sure they didn’t encode any semantics in this particular case (if anyone knows otherwise, please correct me). I believe they simply chose random SDRs to represent each feature (i.e. the features were all semantically dissimilar from each other).
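
To make that concrete, here is a minimal sketch (my own illustration, not code from the paper or NuPIC) of how such random SDR encodings could be generated. The 2,400-bit location vectors and the 10 active bits come from the quoted text; the value of N_IN and the feature/location names are placeholders:

```python
import numpy as np

def random_sdr(size, num_active, rng):
    """Sparse binary vector with `num_active` randomly chosen bits set to 1."""
    sdr = np.zeros(size, dtype=np.int8)
    sdr[rng.choice(size, size=num_active, replace=False)] = 1
    return sdr

rng = np.random.default_rng(0)

N_IN = 150            # number of mini-columns in the input layer (placeholder value)
LOCATION_SIZE = 2400  # from the paper: 2,400-dimensional location vectors

# One random SDR per sensory feature and per location. Because the SDRs are
# random and very sparse, any two of them overlap in at most a bit or two,
# i.e. they carry no shared semantics.
features  = {name: random_sdr(N_IN, 10, rng) for name in ("A", "B", "C")}
locations = {i: random_sdr(LOCATION_SIZE, 10, rng) for i in range(5)}
```

The feature SDR (length N_in) would then serve as the proximal input to the input layer, while the 2,400-bit location SDR provides the distal context, as described above.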


Thanks @Paul_Lamb for the quick answer.
I’m looking for the exact vector that was used as input. Reading the paper at a high level, everything looks great, but some details seem to be missing or not completely described. For example, figure 1A suggests that location is a modulatory input. As far as I know, the SP in the input layer receives the proximal (FF) input, and modulatory input inside the SP is handled internally between mini-columns.
Knowing this, figure 1 is not clear to me. It suggests that location is used as input, but how? This is also not described later in the paper.

Then figure 2 shows nicely how features are defined and encoded as input to the SP.

But this is definitely not how fig. 1 suggests it works. Perhaps it is, but it is either not described or I didn’t understand it. This is why I’m raising this question.

There is also something somewhere in the paper about the output layer using 4096 cells that are not organized in mini-columns. What does that mean? Is it possible to create an HTM layer without mini-columns?

To recap, I’m looking for the exact input used in the experiment when working with a single column (fig. 2, I guess) and when working with 3 columns.


The active minicolumns in the input layer exactly align with the active cells for the feature in Figure 2, implying that the feature is providing proximal (FF) input. I think you got that part, just making sure (the FF input in Figure 2 is the feature only, not a combination of feature and location).

Now to the location input. It is the modulatory (distal) input to the layer. It is not depicted in Figure 2, so I can see how that omission could be confusing.

Regarding your comment that modulatory input inside the SP is handled internally between mini-columns: I think you meant TM here (not SP). The algorithm described in the paper is slightly different from TM. In TM, the modulatory (distal) input comes from cells in other minicolumns in the same layer (i.e. active cells in the input layer at time T-1 provide the context for time T).

In the paper, the algorithm works a bit differently. A separate population of cells somewhere else (not in the input layer, and not the encoded feature) depicts the location. Unfortunately, these cells are not drawn in Figure 2 (which, I believe, is the cause of your confusion). In this modification of the TM algorithm, active cells in the input layer at time T grow distal connections to the active LOCATION cells at time T (NOT to cells in other minicolumns in the same layer).
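
To illustrate the difference, here is a rough sketch (my own simplification with made-up names and parameters, not Numenta’s implementation) of that cell-selection step: the feature SDR picks the active mini-columns, and the location SDR determines which cells within them become active via their distal segments:

```python
CELLS_PER_COLUMN = 16  # placeholder value

def select_active_cells(active_columns, active_location_cells, distal_synapses,
                        threshold=8):
    """active_columns: mini-columns activated by the feature (proximal) input.
    active_location_cells: set of currently active location cells (distal input).
    distal_synapses: dict (column, cell) -> set of location cells the cell
    has grown distal synapses to."""
    active_cells = []
    for col in active_columns:
        # cells whose distal segments match the current location
        matching = [
            (col, cell)
            for cell in range(CELLS_PER_COLUMN)
            if len(distal_synapses.get((col, cell), set()) & active_location_cells)
               >= threshold
        ]
        if matching:
            active_cells.extend(matching)
        else:
            # No cell recognizes this location: the whole mini-column bursts.
            # During learning, a winner cell would then grow distal synapses
            # to the active location cells (not to other input-layer cells,
            # as classic TM would do).
            active_cells.extend((col, cell) for cell in range(CELLS_PER_COLUMN))
    return active_cells
```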

The output layer in this paper is meant to represent an object. Keep in mind what minicolumns are used for in HTM: representing something in context. In the circuit described in this paper, the output layer only needs to represent the object (not the object in some context), so there was no need for minicolumns there. Obviously, taking this beyond the paper, to move to a hierarchy, objects in context become a requirement. So just keep in mind that the paper is attempting to convey a single concept from one small piece of the cortical circuit, not to fully describe it. There is still much work to be done…
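
As a rough sketch of that idea (again my own simplification, with illustrative sparsity and helper names), the output layer keeps a single stable SDR active for an object while the input layer changes with every sensation, and each output cell grows feedforward connections to the input-layer cells active for each feature-at-location:

```python
import numpy as np

OUTPUT_CELLS = 4096  # size mentioned above; no mini-column structure
rng = np.random.default_rng(0)

def new_object_representation(num_active=40):
    """Pick a stable set of output cells for one object (sparsity is illustrative)."""
    return set(rng.choice(OUTPUT_CELLS, size=num_active, replace=False))

def learn_object(object_cells, sensations, ff_synapses):
    """sensations: one set of active input-layer cells per sensed
    feature@location. ff_synapses: dict output_cell -> set of input cells.
    The same object_cells stay active for every sensation of the object."""
    for active_input_cells in sensations:
        for out_cell in object_cells:
            ff_synapses.setdefault(out_cell, set()).update(active_input_cells)
```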

I don’t have them myself (perhaps others can comment if they do), but as I mentioned earlier, I’m pretty sure they just used random SDRs. The exact encodings are not important to the concept they are trying to convey in the paper (and even if random SDRs weren’t used in the original experiment, they could be used to repeat it). The source code for this and other Numenta papers is also available on GitHub. There is also this project, which takes the concepts a bit further.
