When is the HTM Spatial Pooler full?

Good question @sjoerdsommen. Adding to what @subutai said: the SP does have a limit to the number of SDRs it can distinguish, but this theoretical limit is very high (e.g. 2048 choose 40, which is about 2.37 × 10^84), and it is not the limit you will run into in practice with real data. There are two reasons for this.
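
To put a number on that theoretical capacity, here is a quick check (a sketch; n = 2048 and w = 40 are just the typical HTM defaults mentioned above):

```python
# Number of distinct SDRs with w active columns out of n: "n choose w".
import math

n, w = 2048, 40  # typical SP output size and active-column count
print(f"{math.comb(n, w):.2e}")  # -> 2.37e+84
```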

The first is that real data tends to occupy a lower-dimensional manifold within the full space of possible inputs, so real inputs tend to be closer together, and therefore harder to distinguish, than values chosen at random from the whole space.
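
As a rough illustration (a toy sketch, not NuPIC code; the 200-bit "manifold" is a made-up assumption), compare the average overlap between SDRs drawn uniformly from the full input space and SDRs drawn from a small subspace:

```python
# If real data only ever activates a small subset of the input bits (a crude
# stand-in for a low-dimensional manifold), pairs of inputs overlap far more,
# and are therefore harder to tell apart, than uniformly random inputs.
import random

N, W = 2048, 40        # input width and number of active bits
MANIFOLD = range(200)  # hypothetical: real data only touches 200 of the bits

def random_sdr(pool):
    return set(random.sample(pool, W))

def mean_overlap(pool, trials=1000):
    return sum(len(random_sdr(pool) & random_sdr(pool)) for _ in range(trials)) / trials

print("uniform inputs: ", mean_overlap(range(N)))  # ~0.8 bits shared on average
print("manifold inputs:", mean_overlap(MANIFOLD))  # ~8 bits shared on average
```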

The second is that the SP tends to partially contract its space of outputs by over-using successful columns. @floybix is studying this second issue and discusses it here. While suboptimal from an information-theoretic point of view, this property of the SP is not as damaging as it would be if the inputs were dense, arbitrary representations (e.g. ASCII or float64) rather than semantically encoded pseudo-SDRs.

Any finite encoding already “throws away” information about small differences between inputs, and a finite SP generalises to some degree from the outset. As the SP “fills up”, its ability to represent the space of inputs degrades gradually: it generalises more and more across very similar inputs. This graceful degradation through increasing generalisation is exactly how you would want the SP to behave.
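
A toy scalar encoder makes the first point concrete (a sketch only, not NuPIC's ScalarEncoder; the parameters are illustrative): inputs that differ by less than the encoder's resolution map to identical SDRs, so information about that difference is gone before the SP ever sees it.

```python
def encode(value, minval=0.0, maxval=100.0, n=400, w=21):
    """Map a scalar to w contiguous active bits out of n (a minimal encoder)."""
    buckets = n - w + 1
    i = int((value - minval) / (maxval - minval) * buckets)
    i = max(0, min(buckets - 1, i))  # clip out-of-range values
    return set(range(i, i + w))

a, b, c = encode(50.0), encode(50.1), encode(60.0)
print(len(a & b))  # 21 -> a 0.1 difference is absorbed entirely (same SDR)
print(len(a & c))  # 0  -> a large difference yields a disjoint SDR
```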

It’s important to bear in mind that the whole “design philosophy” of the cortex and of HTM systems is about not distinguishing between tiny details in inputs. In the real world, most of these differences are due to irrelevant factors such as measurement error, signal noise, etc. Cortical and HTM systems are built to filter out such distractors and learn the common spatial and temporal structure that persists in the data. If your task requires distinguishing the fine structure of a large number of examples (e.g. a web log with thousands of individual users), HTM is not going to be of use.
