Variable-sparsity SDRs and similarity?

The brain does not run on fixed sparsity; fixed sparsity is just a good rule of thumb for implementation.
How would you interpret OVERLAP in the context of VARIABLE sparsity?

For example, if you have vectors (a, b, c) with sparsities (5%, 2%, 2%) and overlaps:

         a/b = 20
         b/c = 20

which one is closer to b: a or c?

In general, how do you interpret different combinations of sparsity and overlap?
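To make the example concrete, here is a rough sketch in Python (the vector size n = 2000 is a made-up value, since the post only gives percentages):

```python
# Made-up concrete version of the a/b/c example above.
n = 2000
size_a = round(0.05 * n)   # 100 active bits at 5% sparsity
size_b = round(0.02 * n)   # 40 active bits at 2% sparsity
size_c = round(0.02 * n)   # 40 active bits at 2% sparsity

overlap_ab = 20
overlap_bc = 20

# Raw overlap says a and c are equally close to b (20 bits each),
# but those 20 bits are half of b's and c's bits, and only a fifth of a's.
print(overlap_ab / size_b, overlap_ab / size_a)   # 0.5 vs 0.2
print(overlap_bc / size_b, overlap_bc / size_c)   # 0.5 vs 0.5
```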

Random high-dimensional vectors are pretty much orthogonal most of the time, i.e. their dot product is very close to zero.
If the vectors are also sparse, there is virtually no chance they are anything but (basically) orthogonal.
This fact also translates to random sparse binary vectors, given the high dimensionality and extreme sparsity.
Even if the sparsity varies a little bit, the probability they overlap in a meaningful way is astronomically low.
Now, learned representations are not random. But if they overlap heavily, that means they have very high semantic similarity and they probably would have overlapped even without the varying sparsity.
So I don’t see how it makes any significant difference, and I don’t think it matters at all in practice.
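A rough numeric sketch of that point (the dimensionality and bit counts are made-up, illustrative values): the expected overlap of two independent random sparse vectors is roughly w_a * w_b / n, which stays tiny even when the sparsities differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2048                # illustrative dimensionality
w_a, w_b = 102, 41      # ~5% and ~2% sparsity (illustrative)

# Expected overlap of two independent random sparse vectors: w_a * w_b / n
print("expected overlap:", w_a * w_b / n)   # ~2.04 bits

# Monte Carlo check of the same quantity
overlaps = [
    len(np.intersect1d(rng.choice(n, size=w_a, replace=False),
                       rng.choice(n, size=w_b, replace=False)))
    for _ in range(1000)
]
print("mean simulated overlap:", np.mean(overlaps))
```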


I think the primary purpose of fixed sparsity is to prevent the network from giving more weight to any particular feature. By allowing the SDR of one feature to have more active bits, you are effectively giving it more chances to activate other neurons in the network.

Now sparsity is not the same thing as active bit count. For instance, you could have variable sparsity and fixed bit count by adjusting the sizes of the vectors. In that case, the variable sparsity will only affect the probability of random vectors overlapping, as noted by the previous poster.
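A tiny sketch of that distinction, with made-up numbers: the active-bit count stays fixed while the vector size (and therefore the sparsity) varies, so each feature gets the same number of chances to activate downstream neurons, and only the expected accidental overlap between random vectors changes.

```python
# Fixed active-bit count, variable sparsity (illustrative numbers).
w = 40                  # active bits, the same for every SDR
for n in (1000, 2000, 4000):
    print(f"n={n}: sparsity={w / n:.1%}, "
          f"expected random overlap={w * w / n:.2f} bits")
```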


You are probably right… I’m thinking that in such cases maybe Jaccard distance would be better if the sparsity difference is bigger. Just a thought.

Or better yet, the overlap coefficient: Similarity in Graphs: Jaccard Versus the Overlap Coefficient | NVIDIA Developer Blog
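For what it’s worth, here is a rough sketch of both metrics applied to the a/b/c example from the top of the thread (same assumed sizes as before: |a| = 100, |b| = |c| = 40, overlaps of 20 bits):

```python
def jaccard(inter, size_x, size_y):
    # |X ∩ Y| / |X ∪ Y|
    return inter / (size_x + size_y - inter)

def overlap_coef(inter, size_x, size_y):
    # |X ∩ Y| / min(|X|, |Y|)
    return inter / min(size_x, size_y)

size_a, size_b, size_c = 100, 40, 40
print(jaccard(20, size_a, size_b), jaccard(20, size_b, size_c))            # ~0.17 vs ~0.33: c is closer to b
print(overlap_coef(20, size_a, size_b), overlap_coef(20, size_b, size_c))  # 0.5 vs 0.5: a and c tie
```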


Jaccard distance is definitely a better similarity metric in general cases.
But wouldn’t it break the union properties?


Hmm, interesting… maybe the opposite, because Jaccard / the overlap coefficient will more fully represent the merge of two vectors…
By the way, with Jaccard / the overlap coefficient you can compare the union vector a|b with c; with plain overlap you are not really comparing but checking for existence, right? Which is subtly semantically different.

 J(a|b,c) vs olap(a|b,c) !!!

I have to think about it! What can you do with one that you can’t do with the other?
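Here is a rough sketch of what I mean, with made-up bit sets: plain overlap only counts how many of c’s bits exist in the union a|b, while Jaccard also penalizes all the extra bits the union carries.

```python
# Made-up bit positions, just to compare the two operations on a union.
a = set(range(0, 40))
b = set(range(30, 70))
c = set(range(20, 60))

union_ab = a | b
raw_overlap = len(union_ab & c)               # how many of c's bits exist in a|b
jacc = len(union_ab & c) / len(union_ab | c)  # also penalizes the extra bits in a|b

print(len(union_ab), raw_overlap, round(jacc, 3))   # 70, 40, 0.571
```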


The Jaccard index might hurt the union/sub-sampling properties that HTM relies on regularly.
Even if the input SDR has every bit active that the reference SDR has, the similarity score still decreases just because the input also has many other bits active, which might hurt the union operation of the TM/GCM and the sub-sampling operation of dendrite segments.
This overlap coefficient, on the other hand, might not hurt the properties that HTM utilizes.
But I still don’t understand why one would prefer this over the simple bit-AND-and-count operation. The reference SDR would pretty much always have hardly varying sparsity and far fewer active bits than the input SDR, so the denominator of the overlap coefficient would barely matter, making the operation virtually equal to bit-AND-and-count (the cardinality of the intersection).
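A quick sketch of that point (all sizes made up): when a reference SDR is fully contained in a growing union, the raw overlap and the overlap coefficient stay maximal, while the Jaccard score keeps dropping as the union picks up more bits.

```python
import random

random.seed(0)
n, w = 2048, 40                                  # illustrative SDR size / active-bit count
reference = set(random.sample(range(n), w))

union = set(reference)                           # the union always contains the reference
for _ in range(5):
    union |= set(random.sample(range(n), w))     # union in another random SDR
    inter = len(reference & union)
    jacc = inter / len(reference | union)
    ocoef = inter / min(len(reference), len(union))
    print(f"|union|={len(union)}: overlap={inter}, "
          f"Jaccard={jacc:.2f}, overlap-coef={ocoef:.2f}")
```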


I don’t prefer overlap over Jaccard; I’m just wondering whether there could be some advantage.

I’m thinking of a different task, more along the lines of distinguishing/organizing many (millions of) vectors with variable sparsity, say from 20/100_000 to 200/100_000 (not exactly SDRs, maybe not brain related :wink: ).

With pure overlap I think it will be hard to find clumps of vectors… with Jaccard you use all the bits! So you could, say, do a targeted merge to compress/unionize selectively.

As we know, every bit has semantic meaning, and overlap only accounts for similarities. But if, say, A has the same overlap with B and C, and B has more bits than C, it would seem that A should be more similar to C, because it has fewer missing features!

Something like long-term memory, where such variability is possible!
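A rough numeric sketch of that intuition (all sizes made up): with the same raw overlap of 20 bits, Jaccard ranks A closer to the smaller vector C than to the larger vector B.

```python
# Illustrative numbers only: A overlaps B and C by the same 20 bits,
# but B carries many more active bits than C.
size_a, size_b, size_c, overlap = 40, 200, 40, 20

jaccard_ab = overlap / (size_a + size_b - overlap)   # 20/220 ≈ 0.09
jaccard_ac = overlap / (size_a + size_c - overlap)   # 20/60  ≈ 0.33
print(jaccard_ab, jaccard_ac)                        # raw overlap alone would call them a tie
```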