Weighted encoding semantics

I’ve been thinking about how to implement a MIDI encoder, and I re-watched the HTM School videos on SDRs and encoders. One thing I’m confused about is how the two come together.

If an SDR typically has a sparsity of 2%, and an encoder needs a certain amount of overlap between values to represent their semantic similarity, aren’t those two requirements in tension? In practice, how should I combine a low percentage of active bits with a high overlap?

To make it a little more concrete, let’s consider the datetime encoder shown in Episode 6. I know it’s an example meant to make the principles clear, but it has an n of 328 and a w of 48, which gives a sparsity of 14.6%. Would it be enough to increase n to 2048 without changing w? (The sparsity would then be 2.3%, which is close enough.) The overlap between values can’t be changed, so wouldn’t that cause other problems, for example affecting the risk of false positives or the noise tolerance?
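
The arithmetic in the question can be checked directly. The sketch below computes the sparsity w/n for both settings and, to get at the false-positive part, the probability that a uniformly random SDR with the same w shares at least θ active bits with a given encoding (the standard hypergeometric count). The threshold θ = 24, half of w, is an arbitrary illustrative choice and not a value taken from the datetime encoder.

```python
from math import comb

def sparsity(n: int, w: int) -> float:
    """Fraction of the n bits that are active."""
    return w / n

def false_match_prob(n: int, w: int, theta: int) -> float:
    """Probability that a uniformly random SDR with w active bits out of n
    overlaps a fixed SDR (also w active bits) in at least theta bits."""
    matches = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return matches / comb(n, w)

W = 48
THETA = 24  # hypothetical match threshold: half of the active bits

for n in (328, 2048):
    print(f"n={n:5d}  sparsity={sparsity(n, W):.1%}  "
          f"P(overlap >= {THETA})={false_match_prob(n, W, THETA):.2e}")
```

With w and θ fixed, two related values still share the same number of bits, and a stored pattern still matches after losing up to w − θ of its bits, so that kind of noise tolerance does not depend on n. What a larger n changes is how unlikely it is that an unrelated pattern reaches the threshold by accident.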


In the brain, typically yes, but encodings come from the senses, which don’t play by the same rules. They can be a lot denser than 2%; I would say you are safe up to 50%.


Ok, thanks. That makes sense.

Another practical question I’ve been wondering about: is there a way to calculate the relative importance of the different semantic parts of an encoding?

Again, in the example of the datetime encoder, the weekend component takes up 42 bits: 12 of them are on for weekends and 12 others are on for business days. (There is no overlap here, although I noticed that some bits are never on, depending on the bucket size.) So I interpret weekend days as being relatively more important than business days, because they take up more than 2/7 of the value space: they are only 2 of the 7 days, yet they get as many bits as the 5 business days.
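
To make the weekend bookkeeping concrete, here is a hypothetical layout matching the numbers in the question: a 42-bit field in which “weekend” and “business day” each switch on their own block of 12 bits. The exact bit positions (and the name `encode_is_weekend`) are assumptions for illustration; the point is the counting: zero overlap between the two categories, 18 bits that are never used, and weekends receiving half of the used bits although they are only 2 of the 7 days.

```python
from fractions import Fraction

N_FIELD = 42  # width of the weekend field, as described in the question
W = 12        # active bits per category, as described in the question

def encode_is_weekend(is_weekend: bool) -> list[int]:
    """Hypothetical layout: bits 0-11 for weekend, bits 12-23 for business day."""
    start = 0 if is_weekend else W
    return [1 if start <= i < start + W else 0 for i in range(N_FIELD)]

def overlap(a: list[int], b: list[int]) -> int:
    """Number of positions where both encodings have an active bit."""
    return sum(x & y for x, y in zip(a, b))

weekend = encode_is_weekend(True)
business = encode_is_weekend(False)

used = [x | y for x, y in zip(weekend, business)]  # bits that are ever on
print("overlap:", overlap(weekend, business))                             # 0
print("never-used bits:", N_FIELD - sum(used))                            # 18
print("weekend share of used bits:", Fraction(sum(weekend), sum(used)))  # 1/2
print("weekend share of days:", Fraction(2, 7))                           # 2/7
```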

Meanwhile, the time-of-day component takes up 54 bits, roughly as many as the weekend component.
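
For the overlap between values of one component, a periodic (wrap-around) scalar encoding is a natural fit for time of day. The sketch below is hypothetical: it reuses the 54-bit width, but w = 9 and the one-bucket-per-bit layout are assumptions chosen for illustration. Each value turns on a contiguous, wrapping run of w bits, so two times share w minus their distance in buckets, and nothing once they are w or more buckets apart.

```python
N = 54          # time-of-day field width from the question
W = 9           # hypothetical number of active bits per value
PERIOD = 24.0   # hours in a day

def encode_time_of_day(hour: float) -> set[int]:
    """Periodic scalar encoding: a run of W active bits that wraps around
    the end of the field. Returns the set of active bit indices."""
    bucket = int(round(hour / PERIOD * N)) % N  # one bucket per bit
    return {(bucket + i) % N for i in range(W)}

def overlap(a: set[int], b: set[int]) -> int:
    """Number of active bits two encodings share."""
    return len(a & b)

base = encode_time_of_day(9.0)
for other in (9.0, 9.5, 12.0, 21.0):
    print(f"9h vs {other}h: {overlap(base, encode_time_of_day(other))} shared bits")
```

In this layout the similarity radius is set by W relative to the bucket width: a larger W, or wider buckets, makes more distant times overlap, while n mainly controls resolution.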

I understand that, depending on how the datetime encoder is used, these sizes can be adjusted to suit the application. But is there a formal way to calculate them? Both relative between different meanings (e.g. weekend vs. time of day), and relative between values of the same semantic characteristic (e.g. the overlap between the representations of different time-of-day values).
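
One way to make the between-component weighting explicit, assuming the components are simply concatenated as in the datetime example: the overlap between two full encodings is the sum of the per-component overlaps, and component i can contribute at most its own w_i bits. Its maximum share of the total similarity is therefore w_i / Σ w_j, which depends on w_i rather than on the n_i bits it occupies. The `Component` type is hypothetical, and the time-of-day w = 9 is an assumption; only the weekend figures come from the question.

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    n: int  # bits reserved for this component
    w: int  # active bits per value

# Hypothetical sub-encoders of a concatenated datetime encoding.
components = [
    Component("weekend", n=42, w=12),     # figures from the question
    Component("time_of_day", n=54, w=9),  # w is an assumption
]

def max_similarity_share(parts: list[Component]) -> dict[str, float]:
    """Largest fraction of the total possible overlap each component can supply.
    For a concatenated encoding, overlap(A, B) is the sum of the per-component
    overlaps, and component i contributes at most w_i shared bits."""
    total_w = sum(p.w for p in parts)
    return {p.name: p.w / total_w for p in parts}

shares = max_similarity_share(components)
for name, share in shares.items():
    print(f"{name:12s} {share:.0%}")
```

Under this view, w_i is the knob for how strongly a component counts in the combined overlap, while the ratio of w_i to the bucket width controls how much neighbouring values within that component overlap.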

Or doesn’t it matter?