Maximum overlap between the SDRs of two distant values

Hi!

I have a question related to the maximum overlap of the SDRs of two distant values (Ideally 0). In the source code of the Random Distributed Scalar Encoder (RDSE) appears the following:

  1. Dissimilar scalars should have very low overlap so that the SP does not
    confuse representations. Specifically, buckets that are more than w indices
    apart should have at most maxOverlap bits of overlap. We arbitrarily (and
    safely) define “very low” to be 2 bits of overlap or lower.
    Properties 1 and 2 lead to the following overlap rules for buckets i and j:
  If abs(i-j) < w then:
    overlap(i,j) = w - abs(i-j)
  else:
    overlap(i,j) <= maxOverlap

Is it possible to define a “very low” max overlap as a function of the number of active bits (w) of the SDRs? Or is it a (possibly fixed) value independently of w?

For example, consider the function f(x) = floor(0.05x) and a sparsity of 2%, then for n = 2048 and w = 41, the max overlap would be f(41) = 2. With n = 10000 and w = 200 the max overlap would be f(200) = 10. In this particular case, would the SP performance suffer with this maximum overlap?

The greater w, the more difficult is to find representations for distant values ​​that do not have overlap choosing bits with a uniform distribution (for example RDSE method). So, it is necessary to relax the maximum overlap as w increases. Correct me if I’m wrong.

Thak you very much,
Caesar.

1 Like

You could make it a function of w, but it won’t matter much in practice. The chance of having 2 bits of overlap actually decreases very fast as n increases, so that condition will rarely be encountered. It will be extremely unlikely to ever get an overlap of 10 with n=10000. Also it does not make too much difference if you increase w as you increase n. w of 21 is a very safe value for almost any usage. I do not recommend using n smaller than 400 with this implementation.

See [1] for some of the math behind this.

[1] https://arxiv.org/abs/1601.00720

Thank you @subutai. I am currently reading [1]. I need some time to fully understand some ideas contained in it.

I asked the original question because when I was studying the operation of RDSE, I came up with an idea to make a stateless encoder. Although it was not useful it serves me to better understand the properties and generation of SDRs. I did not implement it because I first wanted to test the idea by doing a statistical study and quick simulations. According to the tests I have done, the value of maxOverlap between SDRs of distant values ​​is ~5% of w (there could be an overlap of 5% of w or less bits between random SDRs) using 1000 buckets.

So, can this be modeled as a 5% noise level (in the worst case) and apply the false positive probability function (eq. (4)) of [1]?

Regards,
Caesar.