Experimenting with stacking Spatial Poolers

marty1885 · August 7, 2019, 2:32pm

@rhyolight I have been classifying the SP as a dimension reduction algorithm. But this behavior seems to indicate that it is also a clustering algorithm. What do you think?

rhyolight · August 7, 2019, 2:40pm

Sure. It can take many different “meaning vectors” encoded into one space and distribute their joint meaning across another space.

Bitking · August 7, 2019, 8:28pm

Post 35 updated:

Jose_Cueto · August 7, 2019, 9:00pm

Let me share my thoughts.

as Dimensionality Reduction algorithm
The SP can intuitively be considered a dimension reduction algorithm. However, it is agnostic to the dimensions and structure of the input, and hence unsupervised, similar to an autoencoder. The SP uses a metaheuristic algorithm rather than gradient descent, so it is not searching for something; it’s simply reorganizing itself.

as Classifier, Clustering algorithm
In most use cases, the SP is used as a classifier. However, the SP is really just encoding the inputs as a result of the algorithm (see Dimensionality Reduction above); it doesn’t know or care about the meaning of the inputs. Because it encodes inputs, and the output set (number of columns) is usually forced to be smaller than the input size, it reuses encodings for inputs that are semantically similar. Hence the results are groups/columns instead of individual encodings. But note that these groupings have no meaning at all. Users like us perceive them as classifications by putting labels on them using a different algorithm (e.g. softmax). Therefore the SP is simply clustering by definition, because it groups inputs, and classification is only realized when another algorithm interprets its groupings.
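
To make the "another algorithm interprets its groupings" step concrete, here is a minimal sketch. It swaps the softmax mentioned above for a much simpler overlap lookup, and the SDRs, labels and function names are all made up for illustration; none of this comes from Etaler or NuPIC.

```python
def overlap(a, b):
    """Number of active bits two SDRs (sets of column indices) share."""
    return len(a & b)


def label_by_overlap(sdr, labelled):
    """Return the label whose stored SDR shares the most bits with `sdr`."""
    return max(labelled, key=lambda name: overlap(sdr, labelled[name]))


# Hypothetical SP outputs recorded for two known inputs.
labelled = {
    "A": {1, 4, 9, 12},
    "B": {3, 7, 20, 31},
}

# An unseen SP output that shares three bits with "A" and none with "B".
unseen = {1, 4, 9, 30}
predicted = label_by_overlap(unseen, labelled)
print(predicted)
```

The SP output itself carries no label here; the label only appears once the lookup compares it against stored groupings.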

Encoder, Feature Extractor
As with dimensionality reduction above, it is an unsupervised encoder. Encoding means representing the core features in a reduced space (e.g. compression). This is why it is also a feature extractor: due to space constraints it is forced to extract the bits that matter most.

when Stacked together
When stacked together, think of them as a stacked encoder.
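
As a rough illustration of "stacked encoder", the sketch below chains several toy poolers so that each layer's active columns become the next layer's input. `ToySP` is a hypothetical stand-in with fixed random connections and no learning or boosting. It is not the Etaler or NuPIC Spatial Pooler, and the sizes are arbitrary assumptions.

```python
import random


class ToySP:
    """Toy stand-in for a Spatial Pooler: fixed random connections, no learning."""

    def __init__(self, input_size, num_columns, num_active, seed):
        rng = random.Random(seed)
        # Each column connects to a random half of the input bits.
        self.connections = [
            set(rng.sample(range(input_size), input_size // 2))
            for _ in range(num_columns)
        ]
        self.num_columns = num_columns
        self.num_active = num_active

    def compute(self, active_bits):
        """Global inhibition: the num_active columns with the largest overlap win."""
        scores = [len(conn & active_bits) for conn in self.connections]
        # Sort by score (descending), breaking ties by column index.
        ranked = sorted(range(self.num_columns), key=lambda i: (-scores[i], i))
        return set(ranked[: self.num_active])


def run_stack(stack, active_bits):
    """Feed an SDR through every layer and return each layer's output."""
    outputs = []
    for sp in stack:
        active_bits = sp.compute(active_bits)
        outputs.append(active_bits)
    return outputs


# Four layers; each layer's input size equals the previous layer's column count.
stack = [ToySP(input_size=64, num_columns=64, num_active=8, seed=s) for s in range(4)]
layer_outputs = run_stack(stack, set(range(10)))
for depth, sdr in enumerate(layer_outputs, start=1):
    print(depth, sorted(sdr))
```

Because this toy has no learning, it only shows the wiring of a stack; it says nothing about the banding or stability seen in the real experiments.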

Generalization
It is counterintuitive to think about generalization here, because encoders don’t necessarily generalize in the ML sense of the word. In my personal opinion, seeking generalization is a double-edged sword: when the algorithm is a DNN it is OK and works for now, but otherwise it is fiction.

In the DL world, the stacked SP is similar to a convolution kernel. Why? The kernels (there are many, by the way) in a CNN extract features and are learned in an unsupervised manner; the stacked SP is learned by SP training (unsupervised), and when tested it encodes/extracts features. The stacked SP strongly stabilizes its outputs/groupings (at least in this example), so it is a better encoder and intuitively much more similar to an autoencoder; hence it is not good for classification tasks.

Jose_Cueto · August 7, 2019, 11:44pm

I’m curious, are you suggesting that the active columns as SDRs in HTM SP output are not biologically plausible?

hsgo · August 8, 2019, 12:13am

AFAIK, the kernels in CNN are learned by gradient descent.
How is that unsupervised?
Am I missing something?

Jose_Cueto · August 8, 2019, 12:40am

You are correct. My statement was counterintuitive; sorry, English is my second language and I tend to sound naive when I use it. The main business of the CNN is to learn the “best” features by GD, and I think this is also the basic goal of a supervised ANN. There is ground truth for what is A or B (e.g. a cat or a dog) at the start and end of supervised learning (e.g. a CNN); however, there is no ground truth (fully labeled example) for a kernel. The kernels are learned along the way, and this is what I meant by unsupervised. Depending on perspective, most CNN users don’t really care about the kernels but care more about the set of extracted features, which are the ones used in the next fully connected layer. These features are learned with supervision because they are compared to the ground truth. I would say it’s probably both supervised and unsupervised from a lower-level perspective.

Bitking · August 8, 2019, 12:59am

Connecting to every node in the array is not biologically accurate. I have whined about this in the past but this is defended as being close enough that training fixes the discrepancy.

Also - the models that are being built are not being connected to sensors where the topology is that important.

The HTM models in use now are small enough that the divergence is not critical. There is some topology setting that would become important if the models become larger.

Even then, the deviation from the linear nature of a dendrite may also be missing some important part of the biology. I could see that two mini-columns that have dendrites passing each other in opposite directions could have some useful mutually reinforcing behavior.

Jose_Cueto · August 8, 2019, 5:06am

Thanks so much for your explanations. Most of the time when I read them I feel like I’ve been shrunk and suddenly dragged into a dark room where voices can be heard. I can only listen and try to open my eyes as wide as I can so I can see what I’ve hopefully understood. I think the dark room is neurobiology, and I’m far from that context. I appreciate it though, don’t get me wrong; it’s fun to play with our imaginations.

As far as I can tell there is not yet any HTM equivalent for this reinforcing behavior? Do you think this can be achieved by another “specialized” HTM component, or is it just a matter of rearranging the existing HTM components in some form? The results shown above by the OP also show that rearranging these HTM components may result in some interesting discoveries.

Bitking · August 8, 2019, 5:09am

I have been talking about this here for a long time. Check out:

Bitking · August 8, 2019, 5:16am

BTW: don’t get me wrong - there are all kinds of tools that stray from the biology that accentuate some useful properties to make useful tools.

Deep learning springs to mind.

My comment is just to point out that this is straying far enough from the biology that it may not be helping to model and understand that biology.

Bitking · August 8, 2019, 5:21am

As far as the biology being hard to follow - please forgive my sketchy descriptions. I have been studying neurobiology since my teens in the late 1960’s. At this point I don’t remember any part of it being hard to follow. I assume that these details are common knowledge and that we are just quibbling over minor details.

My bad.

Jose_Cueto · August 11, 2019, 12:24am

I know, no worries. However, given my knowledge and experience in computing, I do not believe that neuroscience/biology will always come first in the discovery of novel brain-like algorithms. What if these brain algorithms are counterintuitive to what we have seen so far in the cortex? What if these brain algorithms (sorry for the lack of terms) already exist in other parts of nature, and it’s just that they don’t look like the cortex? One of the reasons I’m equally interested in the results of algorithms such as these stacked SPs is that they are usually agnostic to where they can be seen and applied. Deep learning started with simple biology, and all later additions are math; I’m not that interested in it, and I have reached a DL winter myself, at least in my thoughts. What I’m interested in is algorithm results (from math or discovered in nature), and then, if it is worth it, I formulate questions: does this make sense for the cortex? Why does it improve itself?

bkaz · August 11, 2019, 11:19am

Can you define “improve” at a node level? Without going through a ridiculously long SGD chain, or, god forbid, evolution?

Jose_Cueto · August 11, 2019, 9:11pm

You are looking at the SP; it does not use SGD. How about the ACO algorithm? Just because you don’t know about something does not mean it does not exist.

bkaz · August 11, 2019, 11:12pm

You were talking about a lot of things, including deep learning. And I didn’t mean only SGD, but anything that involves higher structures, beyond the node. Whatever.

Jose_Cueto · August 12, 2019, 12:14am

It was a hypothetical question about a discovered algorithm, so the answer depends on the problem at hand. For example, here the stacked SP has improved itself with respect to the stability of its outputs. I can’t help talking about relevant things here; after all, they are all computing machines.

marty1885 · August 13, 2019, 6:24am

I’m back with more results.

Out of curiosity, I plotted what each SP is generating. (Note that SPs in Etaler by default have an infinite receptive field.)

It is interesting that distinct bands form as more and more SPs are stacked together.

This is a plot of what the Grid Cell encoder generates.

What the 1st SP generates

What the 2nd SP generates

What the 4th SP generates

What the 8th SP generates

What the 16th SP generates

Here is a folder containing high-res versions of the plots.

Jose_Cueto · August 13, 2019, 7:20am

What exactly is the x-axis (SDR) value here?

marty1885 · August 13, 2019, 9:58am

The y axis is what I send into the stack of SPs. And the x-axis is what the SP generates (blue being a 0 and yellow being a 1).

I flipped the x/y axes because vertical charts are difficult to read.
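
For readers who want to build a similar picture, the layout described above (one row per input value, one column per output bit) can be sketched as below. `toy_scalar_encoder` is a hypothetical placeholder for whatever produces the SDR, such as the grid cell encoder or the Nth SP in the stack. It is not the code used for the plots in this thread.

```python
def toy_scalar_encoder(value, size=16, width=4):
    """Hypothetical encoder: a block of `width` active bits starting at `value`."""
    return {(value + i) % size for i in range(width)}


def sdr_raster(values, encode, size):
    """One row per input value; each row is a list of 0/1 bits of the SDR."""
    rows = []
    for v in values:
        active = encode(v)
        rows.append([1 if bit in active else 0 for bit in range(size)])
    return rows


raster = sdr_raster(range(8), toy_scalar_encoder, size=16)

# Text rendering: '#' marks a 1 (yellow in the plots), '.' marks a 0 (blue).
for row in raster:
    print("".join("#" if bit else "." for bit in row))
```

Passing the resulting list of rows to any image-plotting routine gives the same kind of 2D view, with inputs on the y-axis and SDR bits on the x-axis.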
