please explain about overlap score measure
I’m just guessing what your numbers for overlap mean since you didn’t define them here, nor did you define the training/testing protocol or what classification algorithm you’re using.
I think the small overlap here represents the learning and boosting algorithm efficiently allocating neurons for representing objects. This minimizes the number of bits needed to represent the object.
You can probably think of the overlap here as “noise” in the representation. The boosting/learning here reduces the amount of noise in the representation as much as possible.
Actually, it may be something completely different. If it's the average overlap per activated neuron, it could be an artifact of the boosting process, which keeps neurons from becoming overcommitted to a particular input signal and so favors neurons that activate across a broad array of inputs.
This is one of the reasons I don’t like boosting for practical applications and have never found a way to keep it stable enough to use.
I’m not sure that the overlap is really what you want to be analyzing here…
The overlap is a measure of similarity between two SDRs. If you take the overlap between two representations of different things then the overlap should be very small. Between similar things it should be very large.
The MNIST training data is shuffled. If you take the overlap between random pairs of training data then the overlap does not really have any meaning, because sometimes the two SDRs represent the same thing and sometimes they represent different things.
Maybe try splitting your overlap measurement into two measurements: one for when both samples are from the same input classification, and a second for when they're from different input classifications?
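The suggested split can be sketched in a few lines, assuming SDRs are stored as sets of active-cell indices. The samples below are toy data, not real spatial pooler output:

```python
# Minimal sketch: split pairwise overlap into same-class vs. different-class.
# SDRs are represented as sets of active cell indices (toy data, not MNIST).

def overlap(a, b):
    """Number of active cells the two SDRs share."""
    return len(a & b)

# Toy data: (label, active-cell set) pairs.
samples = [
    (0, {1, 2, 3, 4}),
    (0, {1, 2, 3, 9}),
    (1, {5, 6, 7, 8}),
    (1, {5, 6, 7, 2}),
]

same_class, diff_class = [], []
for i in range(len(samples)):
    for j in range(i + 1, len(samples)):
        (li, si), (lj, sj) = samples[i], samples[j]
        (same_class if li == lj else diff_class).append(overlap(si, sj))

print(sum(same_class) / len(same_class))   # mean same-class overlap
print(sum(diff_class) / len(diff_class))   # mean different-class overlap
```

If the representation is doing its job, the same-class mean should come out clearly larger than the different-class mean.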
By the way: htm.core's SDR_Metrics class will automatically measure the overlap, but not the overlap you want! It measures the overlap between consecutive inputs, and is intended for use with time-series datasets. MNIST is not a time-series dataset, so the built-in SDR_Metric.overlap will probably not be useful.
I hope this helps.
No, the spatial pooler does not have any overlap between inputs and outputs.
Usually those two SDRs are different sizes, so it's not even really possible to measure that overlap.
Two similar inputs will have similar outputs, but there is almost no similarity between an input and its corresponding output.
What is the relevance of this? I thought that only comparing the differences between two outputs against the differences between their respective inputs has some equivalence (there might be a roughly linear relation between the two overlaps):
overlap(x1,x2) ~ overlap(y1,y2)
Otherwise the output of a network is a totally different encoding than its input.
please help me. please
Yes, that comparison is wrong. You cannot compare the SP input & output like that.
I think that the best way to compare those things is by measuring the classification accuracy on the dataset. IMO it should score at least 95% if you use both learning and boosting.
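One minimal way to get such an accuracy number, sketched here with toy SDRs rather than MNIST, and with a simple 1-nearest-neighbour-by-overlap rule as an illustrative stand-in (not the classifier the htm.core example actually uses):

```python
# Hedged sketch: score the representation by classification accuracy
# instead of raw overlap. Each test SDR is labelled by the training SDR
# it overlaps most (1-NN by overlap). Toy data, not MNIST.

def overlap(a, b):
    return len(a & b)

train = [(0, {1, 2, 3}), (1, {7, 8, 9})]
test  = [(0, {1, 2, 4}), (1, {7, 8, 2}), (0, {1, 3, 9})]

correct = 0
for label, sdr in test:
    # Pick the training sample with the highest overlap to this SDR.
    predicted = max(train, key=lambda t: overlap(t[1], sdr))[0]
    correct += predicted == label

print(correct / len(test))  # accuracy on the toy set
```

On real MNIST you would of course use the full train/test split and whatever classifier the example ships with; the point is only that accuracy, not raw overlap, is the quantity worth reporting.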
Ok, so if you have a sparse distributed representation (SDR) there are a lot of different ways to measure it and analyze its contents.
htm.core has pieces of code to measure SDRs in the following ways: sparsity, activation frequency, entropy, and overlap. htm.core has a class named Metrics which applies all of these measurements to an SDR. It is supposed to be a convenience: it's supposed to be easier than making each type of measurement individually.
The Metrics class measures the overlap between every two consecutive assignments to it. (The order in which you give SDRs to the Metrics class matters for the overlap.) In the context of time-series datasets this can be quite useful. However, MNIST is not a time-series dataset.
In the MNIST example, the overlap is not useful. It shouldn't really be part of the example, but it gets printed anyway alongside the other, more useful, information.
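A pure-Python sketch of that consecutive-overlap behaviour (not the htm.core implementation itself) might look like:

```python
# Sketch of what a Metrics-style consecutive overlap measures: each SDR
# is compared against the one assigned just before it, then summarized.
# SDRs are sets of active cell indices; the stream below is toy data.

def overlap(a, b):
    return len(a & b)

stream = [{1, 2, 3}, {1, 2, 4}, {9, 8, 7}, {9, 8, 1}]  # SDRs in arrival order

overlaps = [overlap(stream[i - 1], stream[i]) for i in range(1, len(stream))]
print(min(overlaps), max(overlaps), sum(overlaps) / len(overlaps))
```

Notice that the numbers depend entirely on the order of the stream, which is why this measurement is meaningful for time series but not for shuffled MNIST digits.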
Yes, that’s correct, that’s what the code does.
In addition to the mean, it also prints the min, max, and standard deviation of the overlap.
That code is also accessible from python via:
He means to compare an image of 1 with another 1, 2 with 2, etc…
Overlap apples with apples, oranges with oranges. Same class.
It doesn't help much; a simple KNN or K-means clustering will show that there are (and where there are) quite a few areas where the digits intermingle, e.g. some 4s and 9s have higher overlap with each other than with their own classes.
I don't think we need to. Similarity was proved to be preserved for arbitrary inputs and arbitrary sparse neuron connections in cases like the fly hash, so whatever connection structure an SP (for example) learns is just a particular case of the general one (arbitrary connections, without learning). So yes, similarity of inputs will be transferred to outputs for any inputs.
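A toy illustration of this similarity-preservation argument, in the fly-hash style of a fixed random sparse projection followed by winner-take-all. All sizes, the seed, and the connection scheme here are arbitrary choices for the sketch, not anything from htm.core:

```python
# Fly-hash-style sketch: project inputs through fixed random sparse
# connections, keep the k most activated output cells, and check that
# similar inputs tend to stay similar after projection.
import random

random.seed(0)
n_in, n_out, k = 50, 200, 10

# Each output cell connects to a fixed random subset of input cells.
connections = [set(random.sample(range(n_in), 10)) for _ in range(n_out)]

def project(active_inputs):
    """Winner-take-all: the k output cells with the most active synapses."""
    scores = [len(c & active_inputs) for c in connections]
    winners = sorted(range(n_out), key=lambda i: scores[i], reverse=True)[:k]
    return set(winners)

x1 = set(range(0, 20))    # two similar inputs (15 shared bits)...
x2 = set(range(5, 25))
x3 = set(range(30, 50))   # ...and one disjoint from x1

y1, y2, y3 = project(x1), project(x2), project(x3)
print(len(y1 & y2), len(y1 & y3))  # similar pair should overlap more
```

No learning is involved here at all, which is the point being made: similarity preservation already holds for arbitrary fixed sparse connections, and a learned SP is a special case of that.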