Sorry this episode is about 10 minutes longer than I prefer them to be, but I did not want to skimp on the first SP episode.
Oh by the way, folks, I would love to try to get a lot of exposure for this episode of HTM School. You can help by sharing it on social media. Here are some places to start:
Excellent breakdown of the foundation for spatial pooling… I am really enjoying HTM School.
I have a quick question on the "potential percent" parameter. The HTM Cheat Sheet thread mentions 2% as a standard for sparsity. I interpret that to be equivalent to setting the "potential percent" to 0.02 (the idea being that 2% of columns with one cell per column active would represent an output SDR with 2% sparsity). Just checking if I am interpreting that correctly (I have misinterpreted this point previously, so I am making sure I have it correct in my mind this time).
Actually, the "potential percent" parameter only affects the number of cells in the input space that a column could possibly connect to. It also affects the number of initial connections each column has, because I believe the SP randomly connects about 50% of the potential connections (or at least distributes the permanence values for each cell near the "connection threshold" such that about 50% of them are initially connected).
The "2% sparsity" number that you are talking about can be changed by setting the "number of active columns per inhibition area" setting to about 2% of the total columns in the SP.
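For example, here is roughly how that looks when constructing a SpatialPooler in NuPIC (a sketch from memory; the import path and exact parameter names may differ across NuPIC versions, so double-check against your version's docs):

```python
# Sketch: configuring ~2% output sparsity with NuPIC's SpatialPooler.
from nupic.research.spatial_pooler import SpatialPooler  # import path may vary

NUM_COLUMNS = 2048

sp = SpatialPooler(
    inputDimensions=(1024,),
    columnDimensions=(NUM_COLUMNS,),
    potentialPct=0.85,       # fraction of the input each column *could* connect to
    globalInhibition=True,   # one big neighborhood (see discussion further down)
    # ~2% of columns win the inhibition competition for each input:
    numActiveColumnsPerInhArea=int(0.02 * NUM_COLUMNS),  # = 40
)
```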
Got it… so I pick the best 2% of columns, which is a separate exercise from picking potentials. Is there a good standard starting value for "potential percent"?
That's a good question for @alavin. Hey Alex, what are the effects of changing the potential percent parameter in the spatial pooler? My guess would be that it would affect how the SP learns over time. I would have to put together a couple of experiments to find out.
On that note, I'm setting up a testbed for running SPs with different parameters side by side, sending the same data to each one and visualizing the results. This particular case could make a good study.
You can see that it specifically determines the number of initial "members" of a single SP bit's pool of input bits. Said differently, each SP bit has a pool of input bits it could potentially connect to, and this value determines the size of that pool.
Actually, there appears to be a bit more to it than that. There is also the concept of a potential radius. From the comments there, it looks like you connect to a potential percent of cells within a potential radius. I should be patient and wait for future episodes of HTM School.
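If I'm reading it right, the scheme would look something like this sketch (purely my guess at the logic; 1-D, no wrap-around, simplified):

```python
import numpy as np

def potential_pool(column_center, input_size, potential_radius, potential_pct, rng):
    """All inputs within potential_radius of the column's center are
    candidates; a random potential_pct fraction of them become the pool
    of potential synapses. Purely illustrative."""
    lo = max(0, column_center - potential_radius)
    hi = min(input_size, column_center + potential_radius + 1)
    candidates = np.arange(lo, hi)
    n_pool = int(round(potential_pct * len(candidates)))
    return rng.choice(candidates, size=n_pool, replace=False)

rng = np.random.default_rng(42)
print(sorted(potential_pool(column_center=50, input_size=100,
                            potential_radius=8, potential_pct=0.5, rng=rng)))
# -> roughly half of the 17 inputs within the radius
```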
Thanks for the informative video - I'm getting a clearer idea of what HTMs are. I'm here from the machinelearning reddit.
This random mapping reminds me of the Johnson-Lindenstrauss lemma, which states that a random linear map between vector spaces of different dimensions preserves distances between the two spaces very well. They differ in certain details, but I think they are the same in spirit.
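For reference, one standard statement of the lemma:

```latex
% Johnson-Lindenstrauss lemma (one standard form): for any 0 < \epsilon < 1
% and any n points x_1, \dots, x_n \in \mathbb{R}^d, there exists a linear
% map f : \mathbb{R}^d \to \mathbb{R}^k with k = O(\epsilon^{-2} \log n)
% such that for all pairs i, j:
(1 - \epsilon)\,\lVert x_i - x_j \rVert^2
  \le \lVert f(x_i) - f(x_j) \rVert^2
  \le (1 + \epsilon)\,\lVert x_i - x_j \rVert^2
```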
The steps toward a proof that spatial pooling works are already out there. I'm curious whether there has been work in this direction in the neuroscience literature?
@Paul_Lamb and @rhyolight sorry for the delayed response. I can try to clear up some of the confusion here, although future HTM School episodes and our upcoming Spatial Pooler chapter in BaMI (shameless plug) that I'm writing will help provide a complete understanding of the SP.
Yes, but more specifically "… the number of cells within a column's inhibition radius that the column could possibly connect to."
A column's potential synapses are a random set of inputs selected from the column's input space. A synapse is connected if its permanence is above the connected perm threshold, and the initial permanence values are selected such that they're in a small range around this threshold, where 50% are above and 50% are below. Also, the initial permanence values are higher towards the center of the column's input space, giving the column a natural center over its receptive field. Initializing the SP this way enables potential synapses to become connected (or disconnected) after a small number of training iterations. Sorry if that's too much info!
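A toy sketch of that initialization (my own simplification for illustration, not NuPIC's actual routine, and the threshold value is assumed):

```python
import numpy as np

SYN_PERM_CONNECTED = 0.2  # assumed value for the connected threshold

def init_permanences(pool_indices, column_center, radius, rng):
    """Permanences start in a narrow band around the connected threshold
    (about half above, half below), with a small bonus for inputs near
    the column's natural center."""
    pool = np.asarray(pool_indices)
    connected = rng.random(len(pool)) < 0.5
    perms = np.where(connected,
                     SYN_PERM_CONNECTED + rng.uniform(0.0, 0.05, len(pool)),
                     SYN_PERM_CONNECTED - rng.uniform(0.0, 0.05, len(pool)))
    closeness = 1.0 - np.abs(pool - column_center) / float(radius)
    return np.clip(perms + 0.02 * closeness, 0.0, 1.0)

rng = np.random.default_rng(0)
perms = init_permanences(np.arange(42, 58), column_center=50, radius=8, rng=rng)
print((perms >= SYN_PERM_CONNECTED).mean())  # a bit over half connected,
                                             # thanks to the center bonus
```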
This should be set so that, on average, at least 15-20 input bits are connected when the spatial pooler is initialized. If the input to a column contains 40 ON bits, and permanences are initialized such that 50% of the synapses are initially connected, then you will want potentialPct to be at least 0.75, because 40 × 0.5 × 0.75 = 15.
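Turned around into a quick rule-of-thumb calculation (a hypothetical helper that just restates the arithmetic above):

```python
def min_potential_pct(on_bits, target_connected=15, init_connected_frac=0.5):
    """Smallest potentialPct that yields the target number of
    initially-connected ON bits, on average."""
    return target_connected / (on_bits * init_connected_frac)

print(min_potential_pct(40))  # 0.75, matching the example above
```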
Yes, this would affect the spatial pooler's ability to self-adjust the columns' receptive fields as it learns over time; maintaining a large pool of potential synapses is important for SP plasticity. The effects of changing the potential percent are of course very dependent on the sizes of the input space and the SP, and on the inhibition radius.
You're correct. A small potential radius will keep a column's receptive field local, while a very large potential radius will give the column global coverage over the input space. In practice we typically use the latter, where a column can cover the entire input space.
It's misleading to say that a column is active if it has a high enough overlap. The columns are all assigned some overlap; if the overlap is below the threshold, it is set to 0. If the overlap is above the threshold, it is kept as-is (and multiplied by the "boost" - I haven't watched the video to see if this is covered!). So far in the process, NO columns have been selected as active. The segments, on the other hand, have been made active. Those are proximal segments: segments which go from region to region. Each column owns exactly one segment, which connects it to the input space. A column with an overlap > 0 is another way of saying that the column's proximal segment is active, BUT the column is not.
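In code, the overlap step might look something like this (a simplified sketch of the logic I just described, not actual NuPIC code):

```python
import numpy as np

def column_overlaps(connected, input_bits, boost, stimulus_threshold):
    """connected: (numColumns, numInputs) 0/1 matrix of connected synapses
    on each column's single proximal segment.
    input_bits: (numInputs,) 0/1 vector of active input bits."""
    overlaps = connected.dot(input_bits).astype(float)
    # Overlaps under the threshold are zeroed out...
    overlaps[overlaps < stimulus_threshold] = 0.0
    # ...the rest are scaled by each column's boost factor.
    # Note: no column is *active* yet; that happens during inhibition.
    return overlaps * boost

# Tiny example: 3 columns over 5 input bits.
connected = np.array([[1, 1, 0, 0, 1],
                      [0, 1, 1, 1, 0],
                      [1, 0, 0, 1, 1]])
x = np.array([1, 1, 0, 1, 0])
print(column_overlaps(connected, x, boost=np.ones(3), stimulus_threshold=2))
```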
To see which columns are active, you take the ENTIRE array of columns and their calculated overlap scores. Then you break the whole thing up into sections called "neighborhoods". A column array of 200 columns might have a neighborhood size of 30: columns 1-30 are in neighborhood 1, columns 31-60 are in neighborhood 2, and so on. This is the first part of generating sparsity. Now you compare the overlaps within each neighborhood and take, say, the top 3 per neighborhood. These columns are the selected active columns. Perhaps in neighborhood 1, columns 3, 5, and 19 have overlaps higher than any others; then they are selected active, and all the others are "inhibited", or marked inactive.

The number of columns per neighborhood that are actually selected is called the "desired local activity". In our example of 200 columns and neighborhoods of 30, we might say that the desired local activity is 3 (remember we chose the top 3 columns - same number!). From this, we expect a sparsity of:
[(200/30)*3]/200 = 10%

where (200/30) is the number of neighborhoods and 3 is the number of active columns per neighborhood, so roughly 10% of the columns end up active in this example. (A simplified sketch of this selection step is below.)
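Here is that sketch (simplified; it ignores tie-breaking and the stimulus-threshold check):

```python
import numpy as np

def local_inhibition(overlaps, neighborhood_size, desired_local_activity):
    """Keep the top-k overlap scores within each fixed neighborhood."""
    active = np.zeros(len(overlaps), dtype=bool)
    indices = np.arange(len(overlaps))
    for start in range(0, len(overlaps), neighborhood_size):
        hood = slice(start, start + neighborhood_size)
        # Positions of the top-k overlaps within this neighborhood...
        top = np.argsort(overlaps[hood])[-desired_local_activity:]
        # ...become the active (non-inhibited) columns.
        active[indices[hood][top]] = True
    return active

rng = np.random.default_rng(1)
overlaps = rng.integers(0, 20, size=200).astype(float)
active = local_inhibition(overlaps, neighborhood_size=30, desired_local_activity=3)
# 7 neighborhoods (the last holds only 20 columns) * 3 winners = 21 active,
# close to the ~10% sparsity computed above.
print(active.sum())
```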
Hope this clears things up!
As I responded on the video, I over-simplified for a reason. I probably should have mentioned that I was over-simplifying, but I could not introduce all these concepts in the first video. We will talk about how boosting and inhibition work in later episodes. In this initial example, global inhibition is turned on, so there is just one global neighborhood. (That does make sense, doesn't it @alavin?)
Yes, with global inhibition the columns are in one big neighborhood. In practice, we typically set the inhibition to be global because computing inhibition relative to local neighborhoods of columns can be computationally expensive. However, if we want the SP to capture topological information, like in image data, then defining neighborhoods of columns through local inhibition would be useful.
How much has anyone looked at doing local inhibition "neighborhood" calculations in parallel, all at once? I have been drawn to this idea since I heard about the computational overhead of local inhibition, and it seems like one of the few places where there's a real opportunity for concurrency.
Hi Matt. I think these videos could even be longer.
Q: When you say "columns" in the video, do you mean:

1. a cell in this 2D matrix (the spatial pooler on the right),
2. a column in this 2D matrix, or
3. the 2D matrix is actually a 3D matrix, and one cell from the "top perspective" which we can see in the video has a whole column beneath (behind) it?
I'm asking because you've used the word "column" a lot, but you haven't pointed out any correlations among matrix cells within the same column, so I don't really see a column anywhere in the spatial pooler.