If your only criteria for choosing sparsity are:
1. avoiding ambiguity when multiple patterns overlap on the same SDR
2. resistance to noise
then 40 ON bits per pattern would be at least as reliable in a larger SDR, so the SDR size can be increased without changing the number of ON bits needed to activate a single pattern.
This is because the probability of overlap (criterion 1) decreases as the SDR space gets larger.
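For a rough numerical illustration (my own sketch, not from the paper): two independent random SDRs with w ON bits each out of n total bits share about w*w/n ON bits on average, so with w fixed at 40 the expected accidental overlap shrinks as n grows.

# Expected overlap (shared ON bits) between two independent random SDRs,
# each with w ON bits out of n total: E[overlap] = w*w/n (hypergeometric mean).
w = 40
for n in (1000, 2000, 4000, 8000, 16000):
    print(f"n = {n:6d}  expected overlap = {w * w / n:.2f} bits")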
@subutai once showed a pretty descriptive graphic in which he depicted each pattern (SDR) as a circle. The radius of each circle was proportional to the number of ON bits, and the size of the space, and hence the distance between the centers of the circles, was proportional to the total number of bits. The point of the graphic was to show the trade-off between the total number of representations that could be stored in the space and the probability of overlap (random collisions) between any two of those representations.
See the paper I linked to above for several formulas describing the number of representations and the probability of collisions for a given number of bits and sparsity.
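For anyone who wants to plug in numbers without opening the paper, the two usual quantities can be computed directly in Python (a sketch of the standard formulas for SDRs; the function names are mine and the notation may differ slightly from the paper's):

from math import comb

def num_patterns(n, w):
    # Number of distinct SDRs with exactly w ON bits out of n total.
    return comb(n, w)

def p_false_match(n, w, theta):
    # Probability that a random SDR with w ON bits shares at least theta
    # ON bits with a fixed SDR that also has w ON bits (hypergeometric tail).
    total = comb(n, w)
    return sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1)) / total

print(num_patterns(2048, 40))       # astronomically many unique patterns
print(p_false_match(2048, 40, 10))  # chance of a >= 10-bit accidental overlap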
From a statistical/mathematical POV, in an SDR space of orthogonal patterns with a sparsity of 0.05 (5%), the random chance of any selected bit being flipped ON "by mistake" (noise) is 1/20;
for two bits it is ~1/400, for three ~1/8000, and so on. Theoretically, matching 4-5 ON bits from a given pattern gives high confidence that the corresponding pattern is active.
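The arithmetic behind those numbers, for anyone who wants to check it (my own illustration, assuming the bits are independent):

sparsity = 0.05
for k in range(1, 6):
    # Chance that k particular bits are all ON by coincidence in a random
    # pattern at 5% sparsity (independence approximation): sparsity ** k.
    print(f"{k} bits: ~1 in {round(1 / sparsity ** k):,}")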
Yet in real applications these bits represent actual features and properties of related things; e.g. the pattern for dogs will have a high overlap with the pattern for cats.
So my personal guess is that it's highly unlikely you'll get a very useful implementation by simply picking a fixed threshold of ON bits to activate all possible patterns, considering the real world is made of simple and complex things: you might need only a couple of bits to identify some simple objects with high confidence, and many more for others.
from htm.bindings.sdr import SDR
import random

size = 2000
sparsity = 0.02
num_synapses = 8
synapse_threshold = num_synapses / 2
num_trials = 100 * 1000

# Make a test SDR to try detecting.
target_pattern = SDR(size)
target_pattern.randomize(sparsity)

# Make some synapses to detect the target_pattern.
synapses = SDR(size)
synapses.sparse = sorted(random.sample(list(target_pattern.sparse), num_synapses))

# Measure how often the synapses mistake a random SDR for the target_pattern.
false_detections = 0
for i in range(num_trials):
    random_sdr = SDR(size)
    random_sdr.randomize(sparsity)
    if synapses.getOverlap(random_sdr) >= synapse_threshold:
        false_detections += 1

print(f"False detections: {false_detections} / {num_trials}")
I’d rather not actually.
The reason I posted this code example was so that you could play around with it and gain an intuitive understanding of how the concepts work.
You should run this example with different combinations of size and num_synapses.
BTW: The example is Python code, and it uses the "htm.core" library.
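In case it saves anyone some typing, here is one way to wrap the example into a small sweep over size and num_synapses (my own refactoring of the code above, assuming the same htm.core SDR API; a sketch to play with rather than a reference implementation):

from htm.bindings.sdr import SDR
import random

def false_detection_rate(size, num_synapses, sparsity=0.02, num_trials=10_000):
    # Re-run the experiment above for one (size, num_synapses) combination.
    target_pattern = SDR(size)
    target_pattern.randomize(sparsity)
    synapses = SDR(size)
    synapses.sparse = sorted(random.sample(list(target_pattern.sparse), num_synapses))
    threshold = num_synapses / 2
    false_detections = 0
    for _ in range(num_trials):
        random_sdr = SDR(size)
        random_sdr.randomize(sparsity)
        if synapses.getOverlap(random_sdr) >= threshold:
            false_detections += 1
    return false_detections / num_trials

for size in (1000, 2000, 4000):
    for num_synapses in (4, 8, 16):
        rate = false_detection_rate(size, num_synapses)
        print(f"size={size:5d}  synapses={num_synapses:2d}  false detection rate={rate:.4%}")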
I asked because I got module conflicts when I last tried to install htm… (had to downgrade numpy or something…)
Anyway, thanks for the code… I will try it.