Is there a brief answer to this question? I apologize if the answer to this requires a lot of setup, but it’s my intuition that “Overlap” and “Correlation” in the context of HTMs carry different meanings not found in everyday statistics or computer science; and so would defy attempts to find a resource for this answer on my own.

Yes, that is exactly correct. If the SDRs are correlated the chance of error is higher. It could be significantly worse if there is a lot of correlation. That is why it is better to be de-correlated. There is a fair amount of evidence that neural activity tends to get de-correlated in a number of ways. Good encoders are one way, but within the cortex there is pressure towards de-correlation.

I was not using the term “correlation” carefully. I’m still not sure what the exact difference is between “correlation” and “overlap” of SDRs, but I’ll leave this question for another day.

I was hoping today (or next few days… no rush) could be the day?

Given two specific SDRs, “overlap” is the number of bits that are ON in the same location. For binary vectors, you can compute this using the dot product. Or you can do a Boolean AND and see how many bits are left.
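As a minimal sketch of the two ways to compute overlap described above, assuming SDRs are stored as plain Python lists of 0/1 values (the function names and the example vectors are made up for illustration):

```python
def overlap_dot(a, b):
    """Overlap as the dot product of two binary vectors."""
    return sum(x * y for x, y in zip(a, b))


def overlap_and(a, b):
    """Overlap as a Boolean AND followed by counting the bits that remain ON."""
    return sum(1 for x, y in zip(a, b) if x and y)


# Two hypothetical 8-bit SDRs (real SDRs are much larger and sparser).
a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [0, 0, 1, 1, 0, 1, 1, 0]

print(overlap_dot(a, b))  # bits 2, 3 and 6 are ON in both -> 3
print(overlap_and(a, b))  # same count -> 3
```

Both functions give the same number for binary vectors; which one to use is mostly a matter of how the SDRs are stored.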

“Correlation” is a statistical term that denotes how any two variables relate. It is typically computed over the whole space of possible values for the variables. It is a property of the underlying representation and the underlying input probabilities, not about two specific vectors. Given an SDR representation with n dimensions, you can look at all possible inputs and then compute the correlation coefficient between any two bits. If it is 0 then the two bits don’t vary together. At the other extreme if it is -1 or 1, then they vary perfectly together.
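To make the per-bit view concrete, here is a rough sketch that computes the Pearson correlation coefficient between two bit positions over a set of inputs. It assumes every listed input is equally likely (the real input probabilities would change the numbers), and the helper name `bit_correlation` and the tiny 4-bit input set are invented for illustration:

```python
from math import sqrt


def bit_correlation(sdrs, i, j):
    """Pearson correlation between bit i and bit j across all given SDRs.

    Assumes each SDR in `sdrs` is equally likely. Returns None when one of
    the bits never changes, because the coefficient is undefined in that case.
    """
    n = len(sdrs)
    xi = [s[i] for s in sdrs]
    xj = [s[j] for s in sdrs]
    mean_i = sum(xi) / n
    mean_j = sum(xj) / n
    cov = sum((x - mean_i) * (y - mean_j) for x, y in zip(xi, xj)) / n
    var_i = sum((x - mean_i) ** 2 for x in xi) / n
    var_j = sum((y - mean_j) ** 2 for y in xj) / n
    if var_i == 0 or var_j == 0:
        return None
    return cov / sqrt(var_i * var_j)


# A toy "space of all inputs": four 4-bit vectors.
inputs = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]

print(bit_correlation(inputs, 0, 1))  # bits 0 and 1 are always equal -> 1.0
print(bit_correlation(inputs, 0, 2))  # bits 0 and 2 are always opposite -> -1.0
print(bit_correlation(inputs, 0, 3))  # bit 3 varies independently of bit 0 -> 0.0
```

Note that the function takes the whole input set and two bit indices, not two SDRs: that is the sense in which correlation is a property of the representation rather than of two specific vectors.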

I can create a corner case where the overlap between inputs is zero yet you get pockets of strongly correlated bits. I can also create a case where you get high overlap between pairs of inputs but there is hardly any correlation between individual bits when you look at the space of all inputs.
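One possible pair of toy constructions for these two corner cases is sketched below. These are illustrative guesses, not necessarily the cases the author had in mind; they assume all inputs in each set are equally likely, and the names `disjoint`, `dense`, and `phi` are made up. For binary variables the correlation coefficient can be written directly in terms of activation probabilities (the phi coefficient), which is what the helper uses:

```python
from itertools import combinations
from math import sqrt


def overlap(a, b):
    """Number of positions where both binary vectors are ON."""
    return sum(x & y for x, y in zip(a, b))


def phi(inputs, i, j):
    """Correlation of bits i and j over equally likely inputs.

    Uses (p_ij - p_i * p_j) / sqrt(p_i (1 - p_i) p_j (1 - p_j)); it assumes
    neither bit is constant across the input set.
    """
    n = len(inputs)
    p_i = sum(s[i] for s in inputs) / n
    p_j = sum(s[j] for s in inputs) / n
    p_ij = sum(s[i] & s[j] for s in inputs) / n
    return (p_ij - p_i * p_j) / sqrt(p_i * (1 - p_i) * p_j * (1 - p_j))


# Case 1: four inputs, each turning on its own block of two adjacent bits.
# No two inputs share an ON bit, but the bits within a block always fire together.
disjoint = [[1 if k // 2 == block else 0 for k in range(8)] for block in range(4)]
max_overlap_disjoint = max(overlap(a, b) for a, b in combinations(disjoint, 2))
corr_within_block = phi(disjoint, 0, 1)

# Case 2: the all-ones vector plus every vector with exactly one bit OFF.
# Every pair of inputs shares at least n - 2 ON bits, yet any two bits
# are only weakly (negatively) correlated, at -1/n.
n = 16
dense = [[1] * n] + [[0 if k == z else 1 for k in range(n)] for z in range(n)]
min_overlap_dense = min(overlap(a, b) for a, b in combinations(dense, 2))
corr_dense = phi(dense, 0, 1)

print(max_overlap_disjoint, corr_within_block)  # zero overlap, correlation about 1
print(min_overlap_dense, corr_dense)            # overlap of at least 14, correlation about -0.0625
```

The second set is dense rather than sparse, so it is not a realistic SDR encoding; it only shows that pairwise overlap and per-bit correlation can move independently of each other.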

As such, I think the two concepts are not identical.

Thank you so much @subutai. I quoted the above because now it all makes sense to me. With my new definition of “correlation” (in other words, not just another term for overlap), I can see how de-correlation is preferred, how it impacts results, and maybe how and why the neocortex would have a bias toward de-correlation.

OK, this helps a lot. We also have to be careful when using the terms “de-correlated” and “uncorrelated” (see [1],[2],[3]). Decorrelation in signal processing means reducing the cross-correlation of signals.

Reading these Wikipedia articles increased my understanding of what we are talking about a lot!

My conclusion:

With correlation we describe a kind of statistical dependency between bits.

If two bits have activation probabilities that are independent of each other, they are uncorrelated.

I am now speaking about linear decorrelation:

Decorrelation of a series of SDRs means finding regular patterns and representing them in a new way to increase sparsity.

When we speak about decorrelation inside an SDR, it means something different. Going through time becomes going through indices, and we compare just two SDRs (or one, in the case of autocorrelation). Reducing cross-correlation depends on the shift. If we want similar patterns to have similar SDRs in space (and thus overlap between bits with nearby indices), we can partially express this with decorrelation:
Two SDRs with different patterns should have low cross-correlation for smaller shifts. Two SDRs with similar patterns should have low cross-correlation for larger shifts.

Conversely, increasing cross-correlation for small shifts inside SDRs increases the correlation of nearby bits. Increasing cross-correlation for zero shift increases the overlap of SDRs.
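A small sketch of what cross-correlation over indices could look like for two binary vectors. It assumes an unnormalized, circular (wrap-around) definition, which is only one of several possible conventions, and the function name and example vectors are invented for illustration:

```python
def cross_correlation(a, b, shift):
    """Unnormalized circular cross-correlation of two binary vectors.

    Counts the indices i where a[i] and b[(i + shift) % n] are both ON.
    With shift = 0 this is exactly the overlap of the two vectors.
    """
    n = len(a)
    return sum(a[i] & b[(i + shift) % n] for i in range(n))


# b is a copy of a with every ON bit moved one index to the right.
a = [1, 1, 0, 0, 0, 0, 1, 0]
b = [0, 1, 1, 0, 0, 0, 0, 1]

print(cross_correlation(a, b, 0))  # overlap at zero shift -> 1
print(cross_correlation(a, b, 1))  # the shifted copy lines up completely -> 3
print(cross_correlation(a, b, 2))  # -> 1
```

Under this definition the value at zero shift is the overlap, and a peak at a small nonzero shift means the two vectors have ON bits at nearby indices.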

Put simply, we want correlation for learning, robustness against false negatives, and generalization, and we want decorrelation for sparsity, robustness against false positives, and (?)

These are quite complicated concepts, and I don’t know if my description is correct or even understandable. If someone who has more experience with the terms would take a look at it, I would appreciate it a lot.