We define the entropy as

$$S = -\sum_{i=1}^{n}\Big[P(a_i)\log_2 P(a_i) + \big(1 - P(a_i)\big)\log_2\big(1 - P(a_i)\big)\Big]$$

where $P(a_i) = \frac{1}{M}\sum_{t=1}^{M} a_i(t)$ is the average activation frequency of the *i*'th mini-column during *M* input timesteps, with $a_i(t) \in \{0, 1\}$ indicating whether mini-column *i* was active at timestep *t*.

The per-column term is the binary entropy function, whose curve starts at 0 for $P(a_i) = 0$, peaks at $P(a_i) = 0.5$, and returns to 0 at $P(a_i) = 1$.
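If I code the metric up as I understand it (this is my own sketch, not the paper's reference implementation; the name `sp_entropy` and the M×n binary activity-matrix layout are my assumptions), it would look like:

```python
import numpy as np

def sp_entropy(activity):
    """Entropy metric sketch: sum of per-mini-column binary entropies.

    activity: (M, n) binary array; activity[t, i] = 1 if the i'th
    mini-column was active at input timestep t.
    Returns the entropy S in bits.
    """
    p = activity.mean(axis=0)         # P(a_i): activation frequency per column
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log2(0)
    return float(-np.sum(p * np.log2(p) + (1 - p) * np.log2(1 - p)))
```

A mini-column that is active half the time contributes the maximum 1 bit; one that is always on or always off contributes essentially 0.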

From the paper "*The HTM Spatial Pooler—A Neocortical Algorithm for Online Sparse Distributed Coding*", we know that:

The SP will have low entropy if a small number of the SP mini-columns are active very frequently and the rest are inactive. Therefore, the entropy metric quantifies whether the SP efficiently utilizes all mini-columns.

Here is my doubt. Obviously, the entropy is maximized when P(a_i) equals 0.5. Suppose we set the activation density to 2% (i.e., the sparsity should be 2%), but some error causes the actual sparsity to be 50%. Then the entropy will be much larger than that of the correct output, and by this metric we would conclude that the SP efficiently utilizes all mini-columns. That doesn't seem reasonable, does it?
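To put numbers on this: with the per-column binary entropy H(p) = −p·log₂(p) − (1−p)·log₂(1−p), a column running at the intended 2% activation frequency contributes far less entropy than one running at 50%. A quick check (my own illustration, using only the formula above):

```python
import math

def column_entropy(p):
    """Binary entropy (in bits) of one mini-column active with frequency p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

h_intended = column_entropy(0.02)  # ~0.141 bits at the 2% sparsity target
h_faulty = column_entropy(0.50)    # exactly 1.0 bit at the erroneous 50% sparsity
# The faulty SP scores roughly 7x higher per column, so the metric alone
# cannot distinguish "uses all columns evenly" from "lost its sparsity".
```

So the entropy metric seems meaningful only when compared between SPs running at the same target sparsity, which is the crux of the question.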