Nice. This does indeed seem like one of the measures to optimize for.
300 may be a bit low, though that depends on the specifics of your algorithm. Some phenomena only start to emerge after a longer run (bumping of under-active columns, etc.).
I find that stability of the output from one exposition to a particular code (among your 100) to the next exposition to the same code is surely something to aim for. The subsequent quality of TM predictions would depend on it, and on your overlap measurement as well.
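To make that concrete, here is a minimal sketch of how one could score stability: compare the SP's output SDR for a given code against its output on the next exposition of the same code, using bit overlap. The function names and the toy SDRs are hypothetical, not from any particular HTM implementation.

```python
import numpy as np

def sdr_overlap(a, b):
    """Fraction of active bits shared by two equal-size binary SDRs."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    active = max(a.sum(), b.sum())
    return (a & b).sum() / active if active else 1.0

def stability(exposures):
    """Mean overlap between consecutive SP outputs for the same input code."""
    pairs = zip(exposures, exposures[1:])
    return float(np.mean([sdr_overlap(x, y) for x, y in pairs]))

# toy example: the SP output for one code drifts by a single bit
e1 = np.zeros(100, dtype=bool); e1[[3, 10, 42, 77]] = True
e2 = e1.copy(); e2[77] = False; e2[78] = True
print(stability([e1, e2]))  # 3 shared bits out of 4 active -> 0.75
```

A stability near 1.0 over the whole training run would suggest the SP has settled; a value that keeps dropping would show up later as degraded TM predictions.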
The other duty of the SP (conflicting with the above) is to distribute activations as evenly as possible; hence the boosting and bumping techniques… It's hard to balance it all.
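For reference, a minimal sketch of the exponential boosting rule along the lines of Numenta's spatial pooler: columns whose active duty cycle falls below the target get their overlap scores multiplied by a factor above 1, pushing activations toward the quieter columns. The duty-cycle values and the use of a global mean as the target (rather than a local neighborhood) are simplifying assumptions here.

```python
import numpy as np

def boost_factors(active_duty_cycles, boost_strength=2.0):
    """Exponential boosting: under-active columns (duty cycle below the
    target) get a factor > 1, over-active columns a factor < 1."""
    target = np.mean(active_duty_cycles)  # simplification: global target
    return np.exp(boost_strength * (target - np.asarray(active_duty_cycles)))

duty = np.array([0.02, 0.05, 0.10, 0.03])  # hypothetical per-column duty cycles
print(boost_factors(duty).round(3))
```

Note the tension the post describes: a large `boost_strength` evens out column usage faster, but it also perturbs which columns win for a given code, which works directly against output stability.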
Is it unreasonable to expect that the data pre-processing hardware arranges the data in some topologically meaningful way?
If it does, then I'm not sure that expecting the network to distribute the resulting activations over the entire map is the correct behavior.
Likewise, couldn't there be two or more streams being processed by the same map at the same time, each a local focus of activity attached to some separate part of perception?