Does Entropy correctly evaluate whether the SP efficiently utilizes all mini-columns?

bowen · December 15, 2018, 10:19am

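We define the entropy as

$$S = \sum_{i=1}^{N} \left[ -P(a_i)\log_2 P(a_i) - \bigl(1 - P(a_i)\bigr)\log_2\bigl(1 - P(a_i)\bigr) \right]$$

where N is the number of mini-columns and P(a_i) is defined by

$$P(a_i) = \frac{1}{M}\sum_{j=1}^{M} a_i^j$$

which indicates the average activation frequency of the i’th mini-column during M input timesteps (a_i^j is 1 if mini-column i is active at timestep j, and 0 otherwise).

and the function curve for a single mini-column is

[Figure: entropy of one mini-column as a function of P(a_i)]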
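A minimal sketch of how this metric could be computed from a recorded activation history, assuming the binary-entropy form summed over mini-columns. The function names and the toy data are made up for illustration and are not taken from any HTM library.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a column that is active with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a column that is always on or always off carries no information
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def activation_frequencies(activations):
    """activations: list of M timesteps, each a list of N values in {0, 1}.
    Returns P(a_i) for every mini-column i."""
    m = len(activations)
    n = len(activations[0])
    return [sum(step[i] for step in activations) / m for i in range(n)]

def sp_entropy(activations):
    """Sum of the per-column binary entropies."""
    return sum(binary_entropy(p) for p in activation_frequencies(activations))

# Toy history: 4 timesteps, 4 mini-columns, exactly one active column per step.
balanced = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
skewed = [[1, 0, 0, 0]] * 4  # the same column wins every time

print(sp_entropy(balanced))  # every column is used 25% of the time
print(sp_entropy(skewed))    # one column always on, the rest never
```

Both histories have the same sparsity per timestep, but only the balanced one spreads activity across columns, so it gets the higher entropy.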

From the paper "The HTM Spatial Pooler—A Neocortical Algorithm for Online Sparse Distributed Coding", we know that:

The SP will have low entropy if a small number of the SP mini-columns are active very frequently and the rest are inactive. Therefore, the entropy metric quantifies whether the SP efficiently utilizes all mini-columns.

This raises a doubt. The entropy of a mini-column is clearly at its maximum when P(a_i) equals 0.5. Suppose we set the activation density to 2% (i.e. the sparsity should be 2%), but some error causes the sparsity to be 50%. Then the entropy will be much larger than in the correct case, and we would say the SP efficiently utilizes all mini-columns. That is not reasonable, is it?
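To put numbers on this concern, here is a small sketch comparing the total entropy at the intended 2% sparsity with the faulty 50% case. It assumes every mini-column fires at exactly the same frequency, and the column count of 2048 is only an illustrative choice.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a column that is active with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

n_columns = 2048        # assumed number of mini-columns
target_sparsity = 0.02  # intended activation density
faulty_sparsity = 0.5   # density produced by the hypothetical error

# With perfectly even usage, every column has P(a_i) equal to the sparsity.
entropy_target = n_columns * binary_entropy(target_sparsity)
entropy_faulty = n_columns * binary_entropy(faulty_sparsity)

print(f"per-column entropy at 2%:  {binary_entropy(target_sparsity):.4f} bits")
print(f"per-column entropy at 50%: {binary_entropy(faulty_sparsity):.4f} bits")
print(f"total entropy ratio (faulty / target): {entropy_faulty / entropy_target:.2f}")
```

The faulty run scores roughly seven times higher, so raw entropy values are only comparable between runs that share the same sparsity.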

marty1885 · December 15, 2018, 3:25pm

Good skeptical thinking!
I think the assumption is that the density of the SDRs an SP generates is constant. Under that condition I don’t think there is a way to exploit the formula.

rhyolight · December 17, 2018, 5:03pm

Without boosting, the SP will not efficiently utilize all mini-columns. And yes, we do use a constant activation sparsity throughout the process. We don’t change it as time passes or depending on what is being represented.

dmac · December 18, 2018, 4:26pm

It is possible to “normalize” the entropy into the range 0-1. To do this, divide by the entropy that would result if every mini-column fired at the average activation frequency (this is either the hardcoded target frequency OR it can be calculated from the data). A result of 1 or 100% indicates maximum utilization, and 0% means the program has serious problems. This normalization makes entropy into a useful debugging tool.
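A rough sketch of this normalization, assuming the binary-entropy definition from the first post. The function `normalized_entropy` and its `target` parameter are hypothetical names chosen for this example, not part of any existing API.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a column that is active with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def normalized_entropy(freqs, target=None):
    """Total entropy divided by the entropy of N columns all firing at `target`.
    If `target` is None, the mean of `freqs` is used (calculated from the data)."""
    if target is None:
        target = sum(freqs) / len(freqs)
    max_entropy = len(freqs) * binary_entropy(target)
    if max_entropy == 0.0:
        return 0.0  # degenerate case: no column ever changes state
    return sum(binary_entropy(p) for p in freqs) / max_entropy

uniform = [0.02] * 100             # every column used equally at 2%
skewed = [0.2] * 10 + [0.0] * 90   # same 2% mean, but 90 columns never fire
faulty = [0.5] * 100               # the 50% sparsity error from the first post

print(normalized_entropy(uniform))              # close to 1
print(normalized_entropy(skewed))               # well below 1
print(normalized_entropy(faulty))               # close to 1 when the mean comes from the data
print(normalized_entropy(faulty, target=0.02))  # above 1 against the hardcoded target
```

In this sketch the two choices of denominator behave differently on the faulty run: with the mean taken from the data the result stays at 1, while against the hardcoded 2% target it leaves the 0-1 range, which would make the sparsity error visible.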

rhyolight · December 18, 2018, 4:31pm

Perhaps an interesting technique to fine-tune boosting.

ExoBlue · December 18, 2018, 7:44pm

Distantly related to this thread, folks have looked at classes of logic circuits that maintain the same number of “0-nodes” and “1-nodes” for stable power consumption.

What might be more relevant – if thinking about entropy as a metric correlated with efficiency – is whether higher layers of cognition follow some sort of Boltzmann distribution in physical count or functional activity.

For example, incoming audio at 16 bits of resolution and a 5 kHz cutoff – i.e. 10K samples per second – shrinks from 20 KB per second to about 20 bytes per second if reduced to a single voice speaking.

And, intuitively contrasting today’s speech recognition with that of a couple of decades ago, there’s much more exception-archiving and real-time comparison today.

Which, to this non-expert, is somewhat how a child learns. A couple of dozen or hundred approximate rules to get the gist – and then fine-tuning to the mainstream word and idiom levels of understanding.
