I have a question related to the maximum overlap of the SDRs of two distant values (Ideally 0). In the source code of the Random Distributed Scalar Encoder (RDSE) appears the following:
Dissimilar scalars should have very low overlap so that the SP does not
confuse representations. Specifically, buckets that are more than w indices
apart should have at most maxOverlap bits of overlap. We arbitrarily (and
safely) define “very low” to be 2 bits of overlap or lower.
Properties 1 and 2 lead to the following overlap rules for buckets i and j:
If abs(i-j) < w then:
overlap(i,j) = w - abs(i-j)
else:
overlap(i,j) <= maxOverlap
Is it possible to define a “very low” max overlap as a function of the number of active bits (w) of the SDRs? Or is it a (possibly fixed) value independently of w?
For example, consider the function f(x) = floor(0.05x) and a sparsity of 2%, then for n = 2048 and w = 41, the max overlap would be f(41) = 2. With n = 10000 and w = 200 the max overlap would be f(200) = 10. In this particular case, would the SP performance suffer with this maximum overlap?
The greater w, the more difficult is to find representations for distant values that do not have overlap choosing bits with a uniform distribution (for example RDSE method). So, it is necessary to relax the maximum overlap as w increases. Correct me if I’m wrong.
You could make it a function of w, but it won’t matter much in practice. The chance of having 2 bits of overlap actually decreases very fast as n increases, so that condition will rarely be encountered. It will be extremely unlikely to ever get an overlap of 10 with n=10000. Also it does not make too much difference if you increase w as you increase n. w of 21 is a very safe value for almost any usage. I do not recommend using n smaller than 400 with this implementation.
Thank you @subutai. I am currently reading [1]. I need some time to fully understand some ideas contained in it.
I asked the original question because when I was studying the operation of RDSE, I came up with an idea to make a stateless encoder. Although it was not useful it serves me to better understand the properties and generation of SDRs. I did not implement it because I first wanted to test the idea by doing a statistical study and quick simulations. According to the tests I have done, the value of maxOverlap between SDRs of distant values is ~5% of w (there could be an overlap of 5% of w or less bits between random SDRs) using 1000 buckets.
So, can this be modeled as a 5% noise level (in the worst case) and apply the false positive probability function (eq. (4)) of [1]?