Other successful implementations of HTM have also used this optimization (Etaler, for example). @scott commented on this optimization idea in another thread. He said it is a reasonable optimization and that the algorithm would still work; the cost is an increased likelihood of false positives once you get up to around 10-15 predicted patterns, which he mentioned could be mitigated with the right learning rates.
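To make that cost concrete, here is a minimal Python sketch (all sizes and thresholds are made-up illustrative numbers, not taken from Etaler or any other implementation). It merges the synapses of several learned patterns onto a single segment and estimates how often a completely unrelated random input crosses the activation threshold:

```python
import random

random.seed(0)

N_PRESYN    = 1024  # presynaptic population size (illustrative)
ACTIVE      = 32    # active cells in a random, unrelated input (illustrative)
SYN_PER_PAT = 20    # synapses grown per learned pattern (illustrative)
THRESHOLD   = 10    # segment activation threshold (illustrative)

def false_positive_rate(n_patterns, trials=5000):
    """Estimate how often one merged segment (holding the union of
    synapses from n_patterns learned patterns) fires on random noise."""
    merged = set()
    for _ in range(n_patterns):
        merged |= set(random.sample(range(N_PRESYN), SYN_PER_PAT))
    hits = 0
    for _ in range(trials):
        noise = random.sample(range(N_PRESYN), ACTIVE)
        if sum(1 for c in noise if c in merged) >= THRESHOLD:
            hits += 1
    return hits / trials

for n in (1, 5, 10, 15, 20):
    rate = false_positive_rate(n)
    print(f"{n:2d} patterns on one segment -> false-positive rate {rate:.4f}")
```

With these toy numbers the false-positive rate stays near zero for a handful of patterns and climbs steeply somewhere past ten, which matches the intuition above.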
An important aspect is that segments let a cell be reused in multiple contexts without those contexts conflicting and overwriting each other. This increases capacity and helps address “catastrophic forgetting”.
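As a toy illustration of that point (a sketch only, not any library's actual API): with one segment per context, learning a new context grows a new segment rather than modifying the synapses that encode older contexts, so the old contexts stay intact:

```python
THRESHOLD = 3  # illustrative activation threshold

class Cell:
    def __init__(self):
        self.segments = []  # each segment: a set of presynaptic cell indices

    def learn_context(self, active_presyn):
        # Grow a fresh segment for this context instead of
        # overwriting an existing one.
        self.segments.append(set(active_presyn))

    def is_predicted(self, active_presyn):
        # The cell becomes predictive if ANY segment has enough active synapses.
        active = set(active_presyn)
        return any(len(seg & active) >= THRESHOLD for seg in self.segments)

cell = Cell()
cell.learn_context({1, 2, 3, 4})      # e.g. the "B after A" context
cell.learn_context({50, 51, 52, 53})  # e.g. the "B after X" context

print(cell.is_predicted({1, 2, 3, 4}))      # True  -- first context intact
print(cell.is_predicted({50, 51, 52, 53}))  # True  -- second context intact
print(cell.is_predicted({7, 8, 9, 10}))     # False -- unrelated input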
This is somewhat subjective to the use case, but you are correct that in most cases 1 is not going to be a very good value. In vanilla TM this setting is called the activation threshold, and it is primarily associated with balancing capacity against noise tolerance. I did a quick analysis of this property in another thread here.
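Here is a toy simulation of that balance (the population size, segment size, and noise level are all illustrative assumptions). A threshold of 1 fires on nearly any random input, which destroys capacity, while a threshold close to the full segment size misses noisy presentations of the learned pattern, which destroys noise tolerance:

```python
import random

random.seed(1)

N_PRESYN = 1024                                     # presynaptic population (illustrative)
SEGMENT  = set(random.sample(range(N_PRESYN), 20))  # one learned segment of 20 synapses
PATTERN  = sorted(SEGMENT)                          # the pattern that segment encodes

def fires(active_cells, threshold):
    return len(SEGMENT & set(active_cells)) >= threshold

def rates(threshold, trials=5000):
    """Return (noisy-match rate, false-fire rate) for a given threshold."""
    tp = fp = 0
    for _ in range(trials):
        # Noisy presentation: 60% of the learned pattern plus random cells.
        noisy = random.sample(PATTERN, 12) + random.sample(range(N_PRESYN), 8)
        tp += fires(noisy, threshold)
        # Unrelated random input of the same size.
        fp += fires(random.sample(range(N_PRESYN), 20), threshold)
    return tp / trials, fp / trials

for t in (1, 5, 10, 13, 18):
    tp, fp = rates(t)
    print(f"threshold={t:2d}  noisy-match rate={tp:.3f}  false-fire rate={fp:.3f}")
```

In this sketch the sweet spot sits somewhere in the middle: high enough that random coincidences rarely cross it, low enough that a degraded presentation of a learned pattern still does.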