So I know that the SDR is here because of the sparsity of neuron firing in the brain, but what caused this sparsity? Not from an energy consumption perspective, but why might recognition require such sparsity?
The inhibition mechanism is enforcing the sparsity.
Other than the energy consumption, the sparsity helps in:
- Reducing the over-fitting
- Identifying semantic similarities
- Reducing the impact of noise
It’s all about competition between neurons. For example, in the visual cortex each neuron represents a feature. A feature could be anything from the smallest line or corner to a whole face or building. A neuron that represents a feature is physically close to the other neurons that represent similar features. So, based on the visual input, the neuron that matches the input most closely wins the competition for similarity. The winning neuron then inhibits the surrounding (similar-featured) neurons. This is what causes the sparsity. Without this local inhibition you’d get confused and semantic propagation wouldn’t occur.
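To make the competition idea concrete, here is a minimal sketch of local inhibition on a 1-D row of neurons. Everything in it is an assumption for illustration: the `local_inhibition` name, the `radius` parameter, the made-up match scores, and the rule that a neuron fires only if nothing within its neighborhood matches better. Real cortical inhibition is far messier than this.

```python
def local_inhibition(match_scores, radius, winners_per_neighborhood=1):
    """Return a 0/1 list: a neuron fires only if fewer than
    `winners_per_neighborhood` neurons within `radius` match the input better.
    (Hypothetical helper, not taken from any library.)"""
    n = len(match_scores)
    active = [0] * n
    for i, score in enumerate(match_scores):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        # Count neighbors that beat this neuron; ties go to the lower index.
        stronger = sum(
            1
            for j in range(lo, hi)
            if j != i
            and (match_scores[j] > score or (match_scores[j] == score and j < i))
        )
        if stronger < winners_per_neighborhood:
            active[i] = 1
    return active


# Made-up match scores: neighboring neurons respond to similar features.
scores = [0.1, 0.9, 0.8, 0.2, 0.3, 0.7, 0.1, 0.05]
active = local_inhibition(scores, radius=2)
print(active)  # [0, 1, 0, 0, 0, 1, 0, 0]
```

Note how neuron 2 matches the input almost as well as neuron 1 (0.8 vs 0.9) but stays silent because its neighbor won. Only 2 of the 8 neurons end up active, which is the sparsity being described.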
My 2 cents. I think it’s a golden middle between two things:
extreme sparsity - each “thing” is encoded with just one neuron. Each neuron is highly important. N neurons can represent N “things”. It’s too strict/discrete and prone to errors due to noise. Also, there’s no natural way to encode similarity between “things”. And it’s not clear whether a combination of “things” should be encoded with a combination of neurons or with yet another single neuron.
extreme density - each “thing” can be encoded with any subset of neurons (= any binary vector). N neurons can represent 2^N “things”, which is huuuuge (and is the maximum you can get from N neurons and a binary encoding scheme). Each neuron is still very important. And again it’s too strict/discrete and prone to errors due to noise (flip one bit and you get another vector encoding another “thing”). The question is how to measure similarity. Let’s tackle that a bit later.
Reasonable sparsity has all the pros of extreme sparsity, but it’s more stable against noise, as each neuron is less important. It has much more encoding capacity and a natural way to measure similarity: the overlap score (the number of active bits two vectors share). Even more, a combination of sparse vectors is just their union, and it makes perfect semantic sense.
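A rough sketch of the three capacities, plus the overlap score and union, in case numbers help. The sizes `N = 2048` and `K = 40` (about 2% active) are assumed values picked for illustration, and the `cat` / `dog` / `car` vectors are invented by hand, stored as sets of active bit indices.

```python
from math import comb

N = 2048  # number of neurons (assumed for illustration)
K = 40    # active neurons per pattern, roughly 2% (assumed)

one_neuron_per_thing = N        # extreme sparsity
k_of_n_patterns = comb(N, K)    # reasonable sparsity: exactly K of N active
any_binary_vector = 2 ** N      # extreme density

print(len(str(one_neuron_per_thing)), "digits")
print(len(str(k_of_n_patterns)), "digits")
print(len(str(any_binary_vector)), "digits")


def overlap(a, b):
    """Overlap score: how many active bits two sparse vectors share."""
    return len(a & b)


def union(*sdrs):
    """Combine several sparse vectors by OR-ing their active bits."""
    return set().union(*sdrs)


# Tiny hand-made vectors (5 active bits each, just to keep it readable).
cat = {3, 17, 250, 900, 1500}
dog = {3, 17, 251, 901, 1999}
car = {8, 42, 777, 1203, 2000}

print(overlap(cat, dog))  # 2, they share bits 3 and 17
print(overlap(cat, car))  # 0, nothing in common

cat_or_car = union(cat, car)
print(overlap(cat_or_car, cat))  # 5, the union still fully matches cat
print(overlap(cat_or_car, dog))  # 2, and only partly matches dog
```

The point of the last two lines: the union keeps a full overlap with each of its members while staying sparse, so "cat or car" is still a meaningful vector.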
Btw, reasonable sparsity is almost equivalent to high density - you can apply the NOT operator to switch between them (e.g. NOT of a 96%-dense vector is a 4%-dense one), but the former is cheaper in terms of energy.
Reasonable sparsity vs ~50% density. The latter is worse because similarity becomes much noisier - any two random vectors are expected to have a large overlap, and the semantic union of even a small number of vectors is highly likely to become extremely dense (= semantically meaningless, as it is very similar to all vectors).
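The arithmetic behind that can be sketched quickly. This assumes each bit is switched on independently with probability equal to the density, which is a simplification (a fixed number of active bits behaves almost the same when the vector is large). The function names are made up for this example.

```python
def expected_overlap_fraction(density):
    """Expected share of one random vector's active bits that are also
    active in another independent random vector of the same density."""
    return density


def union_density(density, m):
    """Expected density of the union (bitwise OR) of m independent
    random vectors: a bit stays off only if it is off in all m."""
    return 1 - (1 - density) ** m


for density in (0.02, 0.5):
    print(f"density {density:.0%}: "
          f"random pair shares ~{expected_overlap_fraction(density):.0%} of active bits")
    for m in (2, 5, 10):
        print(f"  union of {m:2d} vectors -> ~{union_density(density, m):.1%} dense")
```

At 2% density a union of 5 vectors is still under 10% dense, while at 50% density the same union is already about 97% dense, i.e. it overlaps heavily with pretty much any vector you compare it to.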
We could measure similarity with the overlap score in the extremely dense case too. Then basic things should be encoded with sparser vectors, while complex things get denser ones, as they’re combinations of basic things and should reasonably overlap with all of them. But to keep similarity meaningful, all vectors should still stay reasonably sparse (we reached this conclusion when comparing with 50% density above). Sooo… it seems that this particular similarity metric somehow makes sparsity nearly obligatory.
I’d say sparse encoding is a really powerful thing and has very fruitful properties. It’s not so surprising that evolution chose it.