I noticed something similar in cortical.io word fingerprints – words that appear in more contexts have denser representations than words that appear in fewer contexts. In the case of their word fingerprints, though, it isn’t actually the density itself that carries the semantics. Density is just a side effect of an easy way to encode semantics.
As a simple example, suppose I want to encode an input “Diagonal Up/Left” to be semantically similar to two other inputs, “Up” and “Left”. An easy way to encode this is to make it a union of the other two inputs. This is a trivial case, but it is basically the same thing that happens with the word fingerprints: words that appear in many contexts end up being encoded more densely as a result. I don’t know if this is the case with your data set, but I thought I would point it out.
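To make the union idea concrete, here is a minimal Python sketch, assuming a hypothetical 1000-bit space with 20 active bits per input (the sizes are illustrative, not from any particular implementation):

```python
import random

random.seed(42)
N = 1000  # total bits in the SDR space
W = 20    # active bits per input (2% sparsity)

# Two base inputs, each a random set of active bit indices.
up = set(random.sample(range(N), W))
left = set(random.sample(range(N), W))

# Union encoding: "Diagonal Up/Left" keeps every bit of both parents,
# so it overlaps fully with each of them -- but it is roughly twice
# as dense as either one.
diag = up | left

print(len(up), len(left), len(diag))   # density grows with each union
print(len(diag & up), len(diag & left))  # full overlap with both parents
```

The semantics come entirely from the shared bits; the increased density is just what falls out of taking the union.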
If you think about the properties of SDRs, remember that the same semantics can still be encoded after a loss of bits. What matters when encoding semantics is the proportion of active bits devoted to each semantic meaning, not the absolute count. So rather than encoding “Diagonal Up/Left” as a union of “Up” and “Left”, it could instead be encoded with a random 50% of the “Up” bits and a random 50% of the “Left” bits, maintaining a fixed sparsity while still encoding the same semantics.
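A sketch of that fixed-sparsity alternative, using the same hypothetical 1000-bit / 20-active-bit setup as before:

```python
import random

random.seed(0)
N = 1000  # total bits in the SDR space
W = 20    # active bits per representation (fixed 2% sparsity)

up = set(random.sample(range(N), W))
left = set(random.sample(range(N), W))

# Instead of a union, subsample: take a random half of "Up"'s bits and a
# random half of "Left"'s remaining bits. The result stays at W active
# bits while still sharing ~50% of its bits with each parent.
half_up = set(random.sample(sorted(up), W // 2))
half_left = set(random.sample(sorted(left - half_up), W // 2))
diag = half_up | half_left

print(len(diag))                          # still exactly W bits
print(len(diag & up), len(diag & left))   # ~50% overlap with each parent
```

The overlap with each parent is halved relative to the union encoding, but it is still far above chance for a space this sparse, so the semantic similarity survives.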
I’m not an expert, but I’m pretty sure overly dense representations will have a negative impact on the SP process. The denser the representations, the more input bits a column becomes specialized on, which could lead to a higher frequency of false positives. In the case of my example above, a “Diagonal Up/Left” representation containing a union of “Up” and “Left” might end up being represented by nearly identical columns to “Up” if that input happened to be better trained than “Left”. I don’t have any mathematics to back this claim up though, so take it with a grain of salt.
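To illustrate the false-positive worry with a toy calculation (a deliberately simplified stand-in for the SP's overlap computation, not the real algorithm): imagine a column whose connected synapses happened to specialize on exactly the “Up” bits during training, and compare how strongly it responds to the two encodings of “Diagonal Up/Left”.

```python
import random

random.seed(1)
N = 1000
W = 20

up = set(random.sample(range(N), W))
left = set(random.sample(range(N), W))

# Toy stand-in for a trained column: its connected synapses exactly
# match the "Up" bits (i.e. it specialized on "Up").
up_column_synapses = up

def overlap(inp, synapses):
    """Number of active input bits the column is connected to."""
    return len(inp & synapses)

union_diag = up | left  # dense union encoding
fixed_diag = (set(random.sample(sorted(up), W // 2))
              | set(random.sample(sorted(left), W // 2)))  # fixed sparsity

print(overlap(up, up_column_synapses))          # maximal response to "Up"
print(overlap(union_diag, up_column_synapses))  # identical response -> false positive
print(overlap(fixed_diag, up_column_synapses))  # roughly half the response
```

From this column’s point of view, the union encoding is indistinguishable from “Up” itself, while the fixed-sparsity encoding only half-activates it, which is the distinction I suspect matters here.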