Question about Spatial pooler

shiva · December 9, 2021, 1:07pm

Please explain the overlap score measure.
Thanks.

jacobeverist · December 9, 2021, 4:27pm

I’m just guessing what your numbers for overlap mean since you didn’t define them here, nor did you define the training/testing protocol or what classification algorithm you’re using.

I think the small overlap here represents the learning and boosting algorithm efficiently allocating neurons for representing objects. This minimizes the number of bits needed to represent the object.

You can probably think of the overlap here as “noise” in the representation. The boosting/learning here reduces the amount of noise in the representation as much as possible.

Actually, it may be something completely different. If it's the average overlap per activated neuron, it could be an artifact of the boosting process, which keeps neurons from being overcommitted to a particular input signal. This favors neurons that activate over a broad array of inputs.

This is one of the reasons I don’t like boosting for practical applications and have never found a way to keep it stable enough to use.

dmac · December 10, 2021, 12:26pm

I’m not sure that the overlap is really what you want to be analyzing here…

The overlap is a measure of similarity between two SDRs. If you take the overlap between two representations of different things then the overlap should be very small. Between similar things it should be very large.
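To make the definition concrete, here is a minimal sketch of overlap as the count of bits that are active in both SDRs. It uses plain NumPy boolean arrays rather than htm.core's own SDR type, and the helper names (`overlap`, `make_sdr`) and the toy "cat/kitten/truck" patterns are made up for illustration.

```python
import numpy as np

def overlap(a, b):
    """Number of bits that are active in both binary SDRs."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        # Overlap is only defined for SDRs of the same size.
        raise ValueError("SDRs must have the same shape to be compared")
    return int(np.count_nonzero(a & b))

def make_sdr(size, active_indices):
    """Hypothetical helper: dense boolean SDR with the given bits set."""
    sdr = np.zeros(size, dtype=bool)
    sdr[list(active_indices)] = True
    return sdr

# Toy patterns: two similar things and one unrelated thing.
cat = make_sdr(64, [1, 5, 9, 20, 33, 40])
kitten = make_sdr(64, [1, 5, 9, 20, 33, 41])
truck = make_sdr(64, [2, 7, 9, 50, 51, 60])

print(overlap(cat, kitten))  # 5 shared bits: similar
print(overlap(cat, truck))   # 1 shared bit: dissimilar
```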

The MNIST training data is shuffled. If you take the overlap between random pairs of training data then the overlap does not really have any meaning, because sometimes the two SDRs represent the same thing and sometimes they represent different things.

Maybe try splitting your overlap measurement into two measurements:
one for when both samples are from the same input classification,
and a second for when they’re from different input classifications?
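One way that split could look, as a rough sketch: compute the overlap for every pair of SDRs, then average separately over same-label and different-label pairs. The function name `class_overlaps` and the tiny hand-made dataset are assumptions for illustration; for MNIST you would pass the spatial pooler outputs and digit labels, probably on a subsample since the pairwise matrix grows quadratically.

```python
import numpy as np

def class_overlaps(sdrs, labels):
    """Return (mean same-class overlap, mean different-class overlap).

    sdrs:   (n_samples, n_bits) array of 0/1 values, one SDR per row
    labels: (n_samples,) class label for each SDR
    """
    sdrs = np.asarray(sdrs, dtype=np.int64)
    labels = np.asarray(labels)
    pairwise = sdrs @ sdrs.T                   # pairwise[i, j] = shared active bits
    i, j = np.triu_indices(len(labels), k=1)   # every unordered pair exactly once
    same = labels[i] == labels[j]
    values = pairwise[i, j]
    # Note: the mean is NaN if one of the two groups has no pairs.
    return values[same].mean(), values[~same].mean()

# Tiny made-up dataset: two classes, two samples each.
sdrs = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1],
]
labels = [0, 0, 1, 1]

same_mean, diff_mean = class_overlaps(sdrs, labels)
print(same_mean, diff_mean)  # 2.0 0.5
```

If the representation is doing its job, the first number should be clearly larger than the second.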

By the way: htm.core’s SDR_Metrics class will automatically measure the overlap, but not in a way that is useful to you here!
It measures the overlap between consecutive inputs, and is intended for use with time-series datasets.
MNIST is not a time-series dataset, so the built-in SDR_Metrics overlap will probably not be useful.

I hope this helps.

dmac · December 10, 2021, 2:37pm

No, the spatial pooler does not have any overlap between inputs and outputs.
Usually those two SDRs are different sizes, so it's not even really possible to measure that overlap.

Two similar inputs will have similar outputs, but there is almost no similarity between an input and its corresponding output.

cezar_t · December 10, 2021, 2:38pm

What is the relevance of this? I thought the only comparison with some meaning is between the overlap of two outputs and the overlap of their respective inputs (the relationship between the two overlaps might be roughly linear).

overlap(x1,x2) ~ overlap(y1,y2)

Otherwise the output of a network is a totally different encoding than its input.

shiva · December 10, 2021, 3:11pm

please help me. please

dmac · December 10, 2021, 5:58pm

Yes, that comparison is wrong. You cannot compare the SP input & output like that.

I think that the best way to compare those things is by measuring the classification accuracy on the dataset. IMO it should score at least 95% if you use both learning and boosting.

dmac · December 13, 2021, 1:03pm

Ok, so if you have a sparse distributed representation (SDR) there are a lot of different ways to measure it and analyze its contents.

htm.core has pieces of code to measure SDRs in the following ways: sparsity, activation-frequency, entropy, and overlap. htm.core has a class named Metrics which applies all of these different ways to measure an SDR. It is supposed to be a convenience: it's supposed to be easier than making each type of measurement individually.

The Metrics class measures the overlap between every two consecutive assignments to the Metrics class. (The order in which you give SDRs to the Metrics class matters for the overlap.) In the context of time-series datasets this can be quite useful. However, MNIST is not a time-series dataset.

In the MNIST example, the overlap is not useful. It shouldn’t really be part of the example, but it gets printed anyway alongside the other, more useful, information.
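Here is a small sketch of why the order matters. It is not the htm.core code, just a plain NumPy stand-in that counts shared bits between each SDR and the one before it (the library may normalise its overlap differently). The function name `consecutive_overlaps` and the sliding-window toy sequence are made up for illustration.

```python
import numpy as np

def consecutive_overlaps(sdrs):
    """Shared active bits between each SDR and the one presented just before it."""
    sdrs = np.asarray(sdrs, dtype=bool)
    return np.count_nonzero(sdrs[1:] & sdrs[:-1], axis=1)

# Toy "time series": a window of 4 active bits sliding one step at a time.
n_bits, width = 16, 4
sequence = np.zeros((12, n_bits), dtype=bool)
for start in range(12):
    sequence[start, start:start + width] = True

ordered = consecutive_overlaps(sequence)
# Same SDRs, presented in an order with no temporal structure.
reordered = consecutive_overlaps(sequence[[0, 8, 1, 9, 2, 10]])

print(ordered)    # every entry is 3: neighbouring time steps are similar
print(reordered)  # every entry is 0: neighbours are now unrelated
print(ordered.min(), ordered.max(), ordered.mean(), ordered.std())
```

With a shuffled dataset like MNIST, the number you get depends on which samples happen to land next to each other, not on anything the spatial pooler learned.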

dmac · December 24, 2021, 1:23pm

Yes, that’s correct, that’s what the code does.
In addition to the mean, it also prints the min, max, and standard deviation of the overlap.

That code is also accessible from python via: htm.Metrics

cezar_t · December 25, 2021, 4:40am

He means to compare an image of 1 with another 1, 2 with 2, etc…
Overlap apples with apples, oranges with oranges. Same class.
It doesn’t help much: a simple KNN or k-means clustering will show that there are quite a few areas (and where they are) in which the digits intermingle, e.g. some 4s and 9s have better overlap with each other than with their own categories.

cezar_t · December 26, 2021, 3:40pm

I don’t think we need to. Similarity has been shown to be preserved for arbitrary inputs and arbitrary sparse neuron connections in cases like fly hash. Whatever connection structure an SP learns is just a particular case of that general one (arbitrary connections, no learning), so yes, similarity of inputs will be transferred to outputs for any inputs.
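A quick way to see this for yourself, as a rough sketch: project inputs through fixed random sparse connections and keep the k strongest outputs, in the spirit of fly hash. This is not htm.core's SpatialPooler and nothing is learned; the sizes, connection density, and weights below are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, k = 1000, 2000, 40

# Fixed random sparse connections with random weights; no learning, no boosting.
mask = rng.random((n_out, n_in)) < 0.1
weights = rng.random((n_out, n_in)) * mask

def project(x):
    """Random projection followed by k-winners-take-all."""
    scores = weights @ x.astype(float)
    winners = np.argsort(scores)[-k:]    # indices of the k largest scores
    y = np.zeros(n_out, dtype=bool)
    y[winners] = True
    return y

def overlap(a, b):
    return int(np.count_nonzero(a & b))

x1 = np.zeros(n_in, dtype=bool)
x1[0:50] = True
x2 = np.zeros(n_in, dtype=bool)
x2[5:55] = True        # shares 45 of its 50 active bits with x1
x3 = np.zeros(n_in, dtype=bool)
x3[500:550] = True     # shares no active bits with x1

y1, y2, y3 = project(x1), project(x2), project(x3)
print("inputs :", overlap(x1, x2), overlap(x1, x3))
print("outputs:", overlap(y1, y2), overlap(y1, y3))
```

The similar pair should keep a much larger output overlap than the unrelated pair, which is the overlap(x1,x2) ~ overlap(y1,y2) relationship mentioned earlier in the thread.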
