HTM School Episode 7: Spatial Pooling Input Space & Connections

Sorry this episode is about 10 minutes longer than I prefer them to be, but I did not want to skimp on the first SP episode.


Oh by the way, folks, I would love to try to get a lot of exposure for this episode of HTM School. You can help by sharing it on social media. Here are some places to start:

Twitter: https://twitter.com/Numenta/status/756503465531015168
FB: https://www.facebook.com/permalink.php?story_fbid=10154838109257119&id=321559142118
G+: https://plus.google.com/b/100642636108337517466/+NumentaOrg/posts/EJ2fpS29B4f
Reddit: https://www.reddit.com/r/MachineLearning/comments/4u34nm/spatial_pooling_input_space_connections/
HN: https://news.ycombinator.com/item?id=12143867

But also, send it to user groups that might be interested, post on news sites, etc. I would appreciate any promotion you can do. :smiley:

9 Likes

Matt,

It’s nice to be reminded of just how fascinating HTM Theory is; and that’s just what this episode does! Thank you Matt!

Brilliant new visualizations, btw! Love the new ideas to show the intricacies of Spatial Pooling.

1 Like

Excellent breakdown of the foundation for spatial pooling… I am really enjoying HTM School :slight_smile:

I have a quick question on the “potential percent” parameter. The HTM Cheat Sheet thread mentions 2% as a standard for sparsity. I interpret that to be equivalent to setting “potential percent” to 0.02 (the idea being that 2% of columns, with one cell per column active, would represent an output SDR with 2% sparsity). Just checking whether I am interpreting that correctly (I have misinterpreted this point before, so I want to make sure I have it right this time).

1 Like

Actually, the “potential percent” parameter affects the number of cells in the input space that a column could possibly connect to, not the output sparsity. It also affects the number of initial connections each column has, because I believe the SP randomly connects about 50% of the potential connections (or at least distributes the permanence values for each cell near the “connection threshold” so that about 50% of them are initially connected).

The “2% sparsity” number that you are talking about is controlled by the “number of active columns per inhibition area” parameter, which you would set to about 2% of the total columns in the SP.
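For example (a quick sketch using typical NuPIC numbers, not values from this thread):

```python
# With global inhibition, output sparsity is just this parameter
# divided by the total column count.
numColumns = 2048
numActiveColumnsPerInhArea = 40   # ~2% of 2048
print("%.1f%%" % (100.0 * numActiveColumnsPerInhArea / numColumns))  # 2.0%
```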

Got it… so I pick the best 2% of columns, which is a separate exercise from picking potentials. Is there a good standard starting value for “potential percent”?

That’s a good question for @alavin. Hey Alex, what are the effects of changing the potential percent parameter in the spatial pooler? My guess would be that it would affect how the SP learns over time. I would have to put together a couple of experiments to find out.

On that note, I’m setting up a testbed for running several SPs with different parameters on the same data and visualizing them side-by-side. This particular case could make a good study.

I can answer that…

From: https://github.com/numenta/nupic/blob/master/src/nupic/research/spatial_pooler.py#L1275

You can see that it specifically determines the number of initial “members” of a single SP bit’s pool of input bits. Said differently, each SP bit has a pool of input bits it could potentially connect to, and this value determines the size of that pool.
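Roughly speaking, I believe the relationship looks like this (a sketch of the idea, not the actual NuPIC code; `inputsInRadius` is a hypothetical count):

```python
potentialPct = 0.5
inputsInRadius = 1000   # hypothetical: input bits within the potential radius
poolSize = int(round(potentialPct * inputsInRadius))
print(poolSize)         # -> 500 potential synapses for this SP bit
```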

Thanks, to be specific, line 116 appears to indicate a default potential percent of 0.5 (i.e. 50%):
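If you have NuPIC installed, a quick way to check (this assumes the `_potentialPct` attribute mentioned below is still there):

```python
from nupic.research.spatial_pooler import SpatialPooler

sp = SpatialPooler(inputDimensions=(1000,), columnDimensions=(2048,))
print(sp._potentialPct)   # -> 0.5 with the defaults at the time of writing
```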

Yes, and I do believe that is the default setting for the _potentialPct parameter… :wink:

Yes, but I would not trust that those defaults are the best ones. They need some inspection (at least, that’s what @subutai told me).

Actually, there appears to be a bit more to it than that. There is also the concept of a potential radius. From the comments there, it looks like you connect to a potential percent of the cells within a potential radius. I should be patient and wait for future episodes of HTM school :smile:

Yes, I believe the potential radius comes into play when there is a topology involved. I will need to talk about that at some point.

Thanks for the informative video - I’m getting a clearer idea of what HTMs are. I’m here from the machinelearning reddit.

This random mapping reminds me of the Johnson–Lindenstrauss lemma, which states that a random linear map between vector spaces of different dimensions preserves distances between points very well. They differ in certain details, but I think they are the same in spirit.

The steps towards a proof that spatial pooling works are already out there. I’m curious whether there has been work in this direction in the neuroscience literature?
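To illustrate what I mean, here is a toy check (sizes are made up):

```python
# A random linear map from 1000 dimensions down to 100 roughly
# preserves pairwise distances, per Johnson-Lindenstrauss.
import numpy as np

rng = np.random.RandomState(42)
X = rng.randn(50, 1000)                  # 50 points in 1000-d
R = rng.randn(1000, 100) / np.sqrt(100)  # scaled random projection
Y = X.dot(R)                             # the same 50 points in 100-d

ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i in range(10) for j in range(i + 1, 10)]
print("distance ratios: min %.2f, max %.2f" % (min(ratios), max(ratios)))
```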

You’re welcome, and also welcome to this forum. I’m happy to see someone from the ML subreddit. HTM posts are usually not well-received there.

You might also note that SDR unions are comparable to Bloom filters.
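A toy sketch of the analogy (illustrative sizes, nothing from NuPIC): like a Bloom filter, a union of SDRs never gives false negatives but can give false positives.

```python
import random

random.seed(1)
N, W = 2048, 40                              # SDR width, active bits

def random_sdr():
    return frozenset(random.sample(range(N), W))

stored = [random_sdr() for _ in range(20)]
union = frozenset().union(*stored)           # OR together all stored SDRs

assert all(sdr <= union for sdr in stored)   # stored members always match
probe = random_sdr()                         # an SDR we never stored
print("false positive?", probe <= union)     # almost certainly False
```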

It seems the brain is several million years ahead of us when it comes to math / number / set theory. :slight_smile:

Has the HTM School visualizer moved?

Update: found it… http://htm-community.github.io/htm-school-viz/site/index.html

@Paul_Lamb and @rhyolight sorry for the delayed response. I can try to clear up some of the confusion here, although future HTM School episodes and our upcoming Spatial Pooler chapter in BaMI (shameless plug) that I’m writing will help provide a complete understanding of the SP.

Yes, but more specifically “… the number of cells within a column’s inhibition radius that the column could possibly connect to.”

A column’s potential synapses are a random set of inputs selected from the column’s input space. A synapse is connected if its permanence is above the connected perm threshold, and the initial permanence values are selected such that they’re in a small range around this threshold, where 50% are above and 50% are below. Also, the initial permanence values are higher towards the center of the column’s input space, giving the column a natural center over its receptive field. Initializing the SP this way enables potential synapses to become connected (or disconnected) after a small number of training iterations. Sorry if that’s too much info :stuck_out_tongue_winking_eye:
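A minimal sketch of that initialization (illustrative only, not NuPIC’s actual code; the sizes and the bias term are made up):

```python
import numpy as np

rng = np.random.RandomState(0)
inputSize, poolSize = 400, 200        # made-up sizes
synPermConnected = 0.10               # NuPIC's default connected threshold
center = inputSize // 2               # this column's natural center

# potential synapses: a random subset of the input space
pool = np.sort(rng.choice(inputSize, size=poolSize, replace=False))
# permanences start in a narrow band around the connected threshold...
perms = synPermConnected + rng.uniform(-0.05, 0.05, size=poolSize)
# ...nudged higher toward the center of the receptive field
perms += 0.02 * (1.0 - np.abs(pool - center) / float(inputSize))

connected = perms >= synPermConnected
print("initially connected: %d of %d" % (connected.sum(), poolSize))
```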

This should be set so that on average, at least 15-20 input bits are connected when the spatial pooler is initialized. If the input to a column contains 40 ON bits, and permanences are initialized such that 50% of the synapses are initially connected, then you will want potentialPct to be at least 0.75, because 40 × 0.5 × 0.75 = 15.
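Working that arithmetic out explicitly:

```python
# expected initially-connected ON bits =
#   numOnBits * initialConnectedFraction * potentialPct
numOnBits = 40          # ON bits in the input to a column
initConnected = 0.5     # fraction of potential synapses connected at init
potentialPct = 0.75
print(numOnBits * initConnected * potentialPct)   # -> 15.0
```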

Yes, this would affect the spatial pooler’s ability to self-adjust the columns’ receptive fields as it learns over time; maintaining a large pool of potential synapses is important for SP plasticity. The effects of changing the potential percent are of course very dependent on the sizes of the input space and the SP, and the inhibition radius.

You’re correct. A small potential radius will keep a column’s receptive field local, while a very large potential radius will give the column global coverage over the input space. In practice we typically use the latter, where a column can cover the entire input space.

3 Likes

I also want to point out this comment from YouTube user Sam Gallagher:

It’s misleading to say that a column is active if it has a high enough overlap. The columns are all assigned some overlap; if the overlap is below the threshold, the overlap is 0. If the overlap is above the threshold, it is kept as-is (and multiplied with the ‘boost’, haven’t watched the video to see if this is covered!). So far, in the process, NO columns have been selected as active. The segments, on the other hand, have been made active. Those are proximal segments: segments which go from region to region. Each column owns exactly 1 segment, which connects it to the input space. A column with an overlap > 0 is another way of saying that the column’s proximal segment is active. BUT the column is not.

To see which columns are active, you take the ENTIRE array of columns and their calculated overlap scores. Then, you break the whole thing up into sections called “neighborhoods”. A column array of 200 columns might have a neighborhood size of 30. That means columns 1-30 are in neighborhood 1, columns 31-60 are in neighborhood 2, and so on. This is the first part of generating sparsity. Now you list out and compare each of the neighborhood overlaps, and take, say, the top 3 per neighborhood. These columns are the selected active columns. Perhaps in neighborhood 1, columns 3, 5, and 19 have overlaps higher than any others. Then they are selected active, and all others are “inhibited”, or marked inactive. The number of columns per neighborhood that are actually selected is called the “desired local activity”, and in our example of 200 columns and neighborhoods of 30, we might say that desired local activity is 3 (remember we chose the top 3 columns, these are the same number!). From this, we expect a sparsity of:
[(200 / 30) × 3] / 200 = 3 / 30 = 10%
where (200/30) is the number of neighborhoods, and 3 is the number of active columns per neighborhood.

Hope this clears things up!
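For reference, here is Sam’s selection procedure sketched in code (illustrative only; the names mirror his description, not NuPIC’s API):

```python
import numpy as np

rng = np.random.RandomState(0)
numColumns, hoodSize, desiredLocalActivity = 200, 30, 3
overlaps = rng.randint(0, 50, size=numColumns)      # pretend overlap scores

active = []
for start in range(0, numColumns, hoodSize):        # each neighborhood
    hood = overlaps[start:start + hoodSize]
    top = np.argsort(hood)[-desiredLocalActivity:]  # top 3 in this hood
    active.extend(start + i for i in top)

print("sparsity: %.1f%%" % (100.0 * len(active) / numColumns))  # ~10%
```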

As I responded on the video, I over-simplified for a reason. I probably should have mentioned that I was over-simplifying, but I could not introduce all these concepts in the first video. We will talk about how boosting and inhibition work in later episodes. In this initial example, global inhibition is turned on, so there is just one global neighborhood. (That does make sense, doesn’t it @alavin?)

2 Likes

Yes, with global inhibition the columns are in one big neighborhood. In practice, we typically set the inhibition to be global because computing inhibition relative to local neighborhoods of columns can be computationally expensive. However, if we want the SP to capture topological information, like in image data, then defining neighborhoods of columns through local inhibition would be useful.
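With global inhibition, the activation step reduces to a single top-k over all the columns, which is part of why it is cheap. A sketch (using NuPIC’s typical 2048/40 numbers, not code from this thread):

```python
import numpy as np

rng = np.random.RandomState(0)
overlaps = rng.randint(0, 50, size=2048)   # one big neighborhood
numActive = 40                             # ~2% of 2048 columns
active = np.argpartition(overlaps, -numActive)[-numActive:]
print("%d active columns of %d" % (len(active), overlaps.size))
```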

3 Likes

Has anyone looked at doing the local inhibition “neighborhood” calculations in parallel, all at once? I have been drawn to this idea since I heard about the computational overhead of local inhibition, and it seems like one of the few places where there is a real opportunity for concurrency.
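One data-parallel way to picture it (a sketch under the assumption that neighborhoods tile the columns evenly, which NuPIC does not require): every neighborhood’s top-k can run as one row-wise operation.

```python
import numpy as np

rng = np.random.RandomState(0)
numHoods, hoodSize, k = 8, 30, 3
overlaps = rng.randint(0, 50, size=(numHoods, hoodSize))

# argpartition performs an independent top-k on each row (neighborhood)
winners = np.argpartition(overlaps, -k, axis=1)[:, -k:]
print(winners.shape)   # (8, 3): three winning columns per neighborhood
```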

1 Like

Hi Matt. I think these videos could even be longer.

Q:

When you say “columns” in the video, do you mean

  1. a cell in this 2D matrix (the spatial pooler on the right)
  2. a column in this 2D matrix
  3. the 2D matrix is actually a 3D matrix, and one cell from the “top perspective” which we can see in the video has a whole column beneath (behind) it

?

I’m asking because you’ve used the word “column” a lot, but you haven’t stressed any correlations among matrix cells within the same column, so I don’t really see a column anywhere in the spatial pooler.

2 Likes