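We define the entropy of the SP as

$$S = \sum_{i=1}^{N} \left[ -P(a_i)\log_2 P(a_i) - \left(1 - P(a_i)\right)\log_2\left(1 - P(a_i)\right) \right]$$

where N is the number of mini-columns and P(a_i) is defined by

$$P(a_i) = \frac{1}{M}\sum_{t=1}^{M} a_i^t$$

with a_i^t equal to 1 if the i-th mini-column is active at timestep t and 0 otherwise, so P(a_i) indicates the average activation frequency of the i-th mini-column during M input timesteps.
Each term of the sum is the binary entropy of P(a_i). As a function of P(a_i), its curve is 0 at P(a_i) = 0 and P(a_i) = 1 and peaks at 1 bit when P(a_i) = 0.5.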
From the paper "The HTM Spatial Pooler—A Neocortical Algorithm for Online Sparse Distributed Coding", we know that:
The SP will have low entropy if a small number of the SP mini-columns are active very frequently and the rest are inactive. Therefore, the entropy metric quantifies whether the SP efficiently utilizes all mini-columns.
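To make the quoted statement concrete, here is a minimal sketch that computes the entropy from a record of binary activations. The function names (`binary_entropy`, `sp_entropy`) and the toy data are assumptions made for illustration; they do not come from the paper or from any HTM library.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # the limit of p*log2(p) as p -> 0 is 0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def sp_entropy(activations):
    """Sum of per-column binary entropies.

    activations: list of M timesteps, each a list of N values in {0, 1}.
    """
    m = len(activations)
    n = len(activations[0])
    total = 0.0
    for i in range(n):
        # P(a_i): fraction of timesteps in which column i was active
        p_i = sum(step[i] for step in activations) / m
        total += binary_entropy(p_i)
    return total


# Hypothetical toy data: N = 4 columns, M = 4 timesteps, one active column per step.
skewed = [[1, 0, 0, 0]] * 4  # the same column wins every time
balanced = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

print(sp_entropy(skewed))    # 0.0: one column always active, the rest never
print(sp_entropy(balanced))  # 4 * binary_entropy(0.25), every column used equally
```

Both toy recordings have the same sparsity (one active column out of four per timestep), yet the balanced one scores higher. That matches the paper's description.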
This raises a doubt. Each mini-column's entropy term is clearly at its maximum when P(a_i) equals 0.5. Suppose we set the activation density to 2% (i.e. the sparsity should be 2%), but some error causes the actual sparsity to be 50%. The entropy would then be much larger than in the correct case, and we would conclude that the SP efficiently utilizes all mini-columns. That is not reasonable, is it?
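The size of the gap in this scenario can be checked numerically. The sketch below assumes every mini-column is active equally often, so P(a_i) equals the sparsity, and it uses N = 2048 columns purely as an illustrative value; neither assumption comes from the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


N = 2048  # assumed number of mini-columns, for illustration only

for sparsity in (0.02, 0.5):
    # With perfectly even column usage, P(a_i) equals the sparsity for every i.
    per_column = binary_entropy(sparsity)
    print(f"sparsity={sparsity:.2f}  per-column={per_column:.4f} bits  "
          f"total={N * per_column:.1f} bits")
```

At 2% sparsity each column contributes roughly 0.14 bits, against exactly 1 bit at 50%, so the raw entropy does grow with sparsity as the question suspects. Because binary entropy is concave, the largest entropy reachable at a fixed sparsity s is N times the binary entropy of s. One possible reading is therefore that entropy values are comparable only between SPs running at the same sparsity.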