Sorry this episode is about 10 minutes longer than I prefer them to be, but I did not want to skimp on the first SP episode.
Oh by the way, folks, I would love to try to get a lot of exposure for this episode of HTM School. You can help by sharing it on social media. Here are some places to start:
Excellent breakdown of the foundation for spatial pooling… I am really enjoying HTM School.
I have a quick question on the "potential percent" parameter. The HTM Cheat Sheet thread mentions 2% as a standard for sparsity. I interpret that to be equivalent to setting the "potential percent" to 0.02 (the idea being that 2% of columns with one cell per column active would represent an output SDR with 2% sparsity). Just checking if I am interpreting that correctly (I have misinterpreted this point previously, so I am making sure I have it correct in my mind this time).
Actually, the "potential percent" parameter only affects the number of cells in the input space that a column could possibly connect to. It also affects the number of initial connections each column has, because I believe the SP randomly connects about 50% of the potential connections (or at least distributes the permanence values for each cell near the "connection threshold" such that about 50% of them are initially connected).
The "2% sparsity" number that you are talking about can be changed by setting the "number of active columns per inhibition area" setting to about 2% of the total columns in the SP.
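For example, here is roughly how that looks when constructing a SpatialPooler in NuPIC (a sketch from memory; the import path and exact parameter names may differ across NuPIC versions, so double-check against your version's docs):

```python
# Sketch: configuring ~2% output sparsity with NuPIC's SpatialPooler.
from nupic.research.spatial_pooler import SpatialPooler  # import path may vary

NUM_COLUMNS = 2048

sp = SpatialPooler(
    inputDimensions=(1024,),
    columnDimensions=(NUM_COLUMNS,),
    potentialPct=0.85,       # fraction of the input each column *could* connect to
    globalInhibition=True,   # one big neighborhood (see discussion further down)
    # ~2% of columns win the inhibition competition for each input:
    numActiveColumnsPerInhArea=int(0.02 * NUM_COLUMNS),  # = 40
)
```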
Got it… so I pick the best 2% of columns, which is a separate exercise from picking potentials. Is there a good standard starting value for "potential percent"?
That's a good question for @alavin. Hey Alex, what are the effects of changing the potential percent parameter in the spatial pooler? My guess would be that it would affect how the SP learns over time. I would have to put together a couple of experiments to find out.
On that note, I'm setting up a testbed for running SPs with different parameters side by side, sending the same data to each one and visualizing the results. This particular case could make a good study.
You can see that it specifically determines the number of initial "members" of a single SP bit's pool of input bits. Said differently, each SP bit has a pool of input bits it could potentially connect to, and this value determines the size of that pool.
Actually, there appears to be a bit more to it than that. There is also the concept of a potential radius. From the comments there, it looks like you connect to a potential percent of cells within a potential radius. I should be patient and wait for future episodes of HTM School.
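If I'm reading it right, the scheme would look something like this sketch (purely my guess at the logic; 1-D, no wrap-around, simplified):

```python
import numpy as np

def potential_pool(column_center, input_size, potential_radius, potential_pct, rng):
    """All inputs within potential_radius of the column's center are
    candidates; a random potential_pct fraction of them become the pool
    of potential synapses. Purely illustrative."""
    lo = max(0, column_center - potential_radius)
    hi = min(input_size, column_center + potential_radius + 1)
    candidates = np.arange(lo, hi)
    n_pool = int(round(potential_pct * len(candidates)))
    return rng.choice(candidates, size=n_pool, replace=False)

rng = np.random.default_rng(42)
print(sorted(potential_pool(column_center=50, input_size=100,
                            potential_radius=8, potential_pct=0.5, rng=rng)))
# -> roughly half of the 17 inputs within the radius
```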
Thanks for the informative video - I'm getting a clearer idea of what HTMs are. I'm here from the machinelearning reddit.
This random mapping reminds me of the Johnson-Lindenstrauss lemma, which states that a random linear map between vector spaces of different dimensions preserves distances between the two spaces very well. They differ in certain details, but I think they are the same in spirit.
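For reference, one standard statement of the lemma:

```latex
% Johnson-Lindenstrauss lemma (one standard form): for any 0 < \epsilon < 1
% and any n points x_1, \dots, x_n \in \mathbb{R}^d, there exists a linear
% map f : \mathbb{R}^d \to \mathbb{R}^k with k = O(\epsilon^{-2} \log n)
% such that for all pairs i, j:
(1 - \epsilon)\,\lVert x_i - x_j \rVert^2
  \le \lVert f(x_i) - f(x_j) \rVert^2
  \le (1 + \epsilon)\,\lVert x_i - x_j \rVert^2
```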
The steps toward a proof that spatial pooling works are already out there. I'm curious whether there has been work in this direction in the neuroscience literature?
@Paul_Lamb and @rhyolight sorry for the delayed response. I can try to clear up some of the confusion here, although future HTM School episodes and our upcoming Spatial Pooler chapter in BaMI (shameless plug) that I'm writing will help provide a complete understanding of the SP.
Yes, but more specifically "… the number of cells within a column's inhibition radius that the column could possibly connect to."
A column's potential synapses are a random set of inputs selected from the column's input space. A synapse is connected if its permanence is above the connected perm threshold, and the initial permanence values are selected such that they're in a small range around this threshold, where 50% are above and 50% are below. Also, the initial permanence values are higher towards the center of the column's input space, giving the column a natural center over its receptive field. Initializing the SP this way enables potential synapses to become connected (or disconnected) after a small number of training iterations. Sorry if that's too much info!
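A toy sketch of that initialization (my own simplification for illustration, not NuPIC's actual routine, and the threshold value is assumed):

```python
import numpy as np

SYN_PERM_CONNECTED = 0.2  # assumed value for the connected threshold

def init_permanences(pool_indices, column_center, radius, rng):
    """Permanences start in a narrow band around the connected threshold
    (about half above, half below), with a small bonus for inputs near
    the column's natural center."""
    pool = np.asarray(pool_indices)
    connected = rng.random(len(pool)) < 0.5
    perms = np.where(connected,
                     SYN_PERM_CONNECTED + rng.uniform(0.0, 0.05, len(pool)),
                     SYN_PERM_CONNECTED - rng.uniform(0.0, 0.05, len(pool)))
    closeness = 1.0 - np.abs(pool - column_center) / float(radius)
    return np.clip(perms + 0.02 * closeness, 0.0, 1.0)

rng = np.random.default_rng(0)
perms = init_permanences(np.arange(42, 58), column_center=50, radius=8, rng=rng)
print((perms >= SYN_PERM_CONNECTED).mean())  # a bit over half connected,
                                             # thanks to the center bonus
```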
This should be set so that, on average, at least 15-20 input bits are connected when the spatial pooler is initialized. If the input to a column contains 40 ON bits, and permanences are initialized such that 50% of the synapses are initially connected, then you will want potentialPct to be at least 0.75, because 40 × 0.5 × 0.75 = 15.
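Turned around into a quick rule-of-thumb calculation (a hypothetical helper that just restates the arithmetic above):

```python
def min_potential_pct(on_bits, target_connected=15, init_connected_frac=0.5):
    """Smallest potentialPct that yields the target number of
    initially-connected ON bits, on average."""
    return target_connected / (on_bits * init_connected_frac)

print(min_potential_pct(40))  # 0.75, matching the example above
```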
Yes, this would affect the spatial pooler's ability to self-adjust the columns' receptive fields as it learns over time; maintaining a large pool of potential synapses is important for SP plasticity. The effects of changing the potential percent are of course very dependent on the sizes of the input space and the SP, and on the inhibition radius.
You're correct. A small potential radius will keep a column's receptive field local, while a very large potential radius will give the column global coverage over the input space. In practice we typically use the latter, where a column can cover the entire input space.
It's misleading to say that a column is active if it has a high enough overlap. The columns are all assigned some overlap; if the overlap is below the threshold, it is set to 0. If the overlap is above the threshold, it is kept as-is (and multiplied by the "boost" - I haven't watched the video to see if this is covered!). So far in the process, NO columns have been selected as active. The segments, on the other hand, have been made active. Those are proximal segments: segments which go from region to region. Each column owns exactly one segment, which connects it to the input space. A column with an overlap > 0 is another way of saying that the column's proximal segment is active, BUT the column is not.
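In code, the overlap step might look something like this (a simplified sketch of the logic I just described, not actual NuPIC code):

```python
import numpy as np

def column_overlaps(connected, input_bits, boost, stimulus_threshold):
    """connected: (numColumns, numInputs) 0/1 matrix of connected synapses
    on each column's single proximal segment.
    input_bits: (numInputs,) 0/1 vector of active input bits."""
    overlaps = connected.dot(input_bits).astype(float)
    # Overlaps under the threshold are zeroed out...
    overlaps[overlaps < stimulus_threshold] = 0.0
    # ...the rest are scaled by each column's boost factor.
    # Note: no column is *active* yet; that happens during inhibition.
    return overlaps * boost

# Tiny example: 3 columns over 5 input bits.
connected = np.array([[1, 1, 0, 0, 1],
                      [0, 1, 1, 1, 0],
                      [1, 0, 0, 1, 1]])
x = np.array([1, 1, 0, 1, 0])
print(column_overlaps(connected, x, boost=np.ones(3), stimulus_threshold=2))
```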
To see which columns are active, you take the ENTIRE array of columns and their calculated overlap scores. Then you break the whole thing up into sections called "neighborhoods". A column array of 200 columns might have a neighborhood size of 30: columns 1-30 are in neighborhood 1, columns 31-60 are in neighborhood 2, and so on. This is the first part of generating sparsity. Now you compare the overlaps within each neighborhood and take, say, the top 3 per neighborhood. These columns are the selected active columns. Perhaps in neighborhood 1, columns 3, 5, and 19 have overlaps higher than any others; then they are selected active, and all the others are "inhibited", or marked inactive.

The number of columns per neighborhood that are actually selected is called the "desired local activity". In our example of 200 columns and neighborhoods of 30, we might say that the desired local activity is 3 (remember we chose the top 3 columns - same number!). From this, we expect a sparsity of:
[(200/30)*3]/200 = 10%

where (200/30) is the number of neighborhoods and 3 is the number of active columns per neighborhood, so roughly 10% of the columns end up active in this example. (A simplified sketch of this selection step is below.)
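Here is that sketch (simplified; it ignores tie-breaking and the stimulus-threshold check):

```python
import numpy as np

def local_inhibition(overlaps, neighborhood_size, desired_local_activity):
    """Keep the top-k overlap scores within each fixed neighborhood."""
    active = np.zeros(len(overlaps), dtype=bool)
    indices = np.arange(len(overlaps))
    for start in range(0, len(overlaps), neighborhood_size):
        hood = slice(start, start + neighborhood_size)
        # Positions of the top-k overlaps within this neighborhood...
        top = np.argsort(overlaps[hood])[-desired_local_activity:]
        # ...become the active (non-inhibited) columns.
        active[indices[hood][top]] = True
    return active

rng = np.random.default_rng(1)
overlaps = rng.integers(0, 20, size=200).astype(float)
active = local_inhibition(overlaps, neighborhood_size=30, desired_local_activity=3)
# 7 neighborhoods (the last holds only 20 columns) * 3 winners = 21 active,
# close to the ~10% sparsity computed above.
print(active.sum())
```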
Hope this clears things up!
As I responded on the video, I over-simplified for a reason. I probably should have mentioned that I was over-simplifying, but I could not introduce all these concepts in the first video. We will talk about how boosting and inhibition work in later episodes. In this initial example, global inhibition is turned on, so there is just one global neighborhood. (That does make sense, doesn't it @alavin?)
Yes, with global inhibition the columns are in one big neighborhood. In practice, we typically set the inhibition to be global because computing inhibition relative to local neighborhoods of columns can be computationally expensive. However, if we want the SP to capture topological information, like in image data, then defining neighborhoods of columns through local inhibition would be useful.
How much has anyone looked at doing local inhibition "neighborhood" calculations in parallel, all at once? I have been drawn to this idea since I heard about the computational overhead of local inhibition, and it seems like one of the few places where there's a real opportunity for concurrency.
Hi Matt. I think these videos could even be longer.
Q: When you say "columns" in the video, do you mean:

1. a cell in this 2D matrix (the spatial pooler on the right),
2. a column in this 2D matrix, or
3. the 2D matrix is actually a 3D matrix, and one cell from the "top perspective" which we can see in the video has a whole column beneath (behind) it?
I'm asking because you've used the word "column" a lot, but you haven't pointed out any correlations among matrix cells within the same column, so I don't really see a column anywhere in the spatial pooler.