When is the HTM Spatial Pooler full?

Hi all,

I understand the statistical properties of SDRs as explained in the whitepaper and in Matt’s excellent YouTube tutorials (I’m surprised by the low view counts because I find them very good!). A 2048-bit Spatial Pooler with around 2% of its bits “on” can represent a huge number of patterns. However, the true memory of the SP in NuPIC lies in the connections and potential connections (synapses and potential synapses) it has with the input space. Do we have any idea when such a (say 2048-column) SP is “full”, meaning that it starts to fail to distinguish between different patterns? Of course this would all depend on how many potential synapses a column in the SP has with the input space.
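For concreteness, the raw combinatorial capacity of such an SDR is easy to compute (this is only the theoretical upper bound, not the practical limit I am asking about):

```python
# Theoretical number of distinct 40-of-2048 SDRs (upper bound only).
from math import comb
print(f"{comb(2048, 40):.3e}")   # roughly 2.4e+84
```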

2 Likes

I was going to answer that this is covered by Jeff & Subutai’s SDR paper, but I guess the “practical” limitations might be more restrictive than the theoretical number of different representations you can get with 2048 columns and 2% sparsity.

Incidentally, the paper estimates it to be on the order of billions or maybe larger (I don’t have the paper in front of me). So large, in fact, that the question becomes a bit moot.

In theory there are indeed billions of possible representations with 2048 columns and 2% sparsity. However, consider a practical case where, before training, these 2048 columns together have 2048 potential synapses onto a 2048-bit input space. Each column here is connected to one unique bit in the input space. In this case this specific SP is not going to be able to represent a lot of different input values, probably only 10 or so. If this SP is trained and all potential synapses have formed a synapse with the associated input bit, this SP is not going to be able to distinguish between different inputs; this will already be the case after 50 or so different input patterns have been learned. Of course this number can be increased a lot if we connect the 2048 columns to a larger input space, allow the columns’ potential pools to overlap, and give each column multiple potential synapses onto that input space.
I was wondering by how much this number would increase. Hence the question: when is the Spatial Pooler full?
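Here is a toy sketch of that degenerate scenario (purely illustrative, not NuPIC code): with one-to-one wiring the columns add no mixing at all, so the “winning” columns are just an arbitrary subset of the active input bits.

```python
# Toy illustration of the one-to-one wiring described above (not NuPIC code).
# With an identity connection matrix, each column's overlap equals its single
# input bit, so the SP adds no mixing: the top-k selection just picks an
# arbitrary subset of the active input bits, and distinct inputs quickly
# collide onto indistinguishable outputs.
import numpy as np

n = 2048                       # columns == input bits in this hypothetical
connections = np.eye(n)        # column i connects only to input bit i

def sp_output(input_bits, n_active=40):
    overlaps = connections @ input_bits      # here: just a copy of the input
    return np.argsort(overlaps)[-n_active:]  # arbitrary 40 among the ties

x = np.zeros(n)
x[np.random.choice(n, 200, replace=False)] = 1
print(sorted(sp_output(x)))    # simply 40 of the 200 active input bits
```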

This assumption isn’t quite true.

There are two states for every pairing of input bit and column. First, each column has multiple input bits in its “pool”, meaning that it has designated input bits to which it can become “connected”, even though they are not connected yet. Second, every input bit in a column’s pool can have its synapse permanence incremented, whether or not it is currently connected.

Of all the input bits in a column’s pool, only those whose permanence exceeds a “connected threshold” are treated as connected and contribute to that column’s overlap.
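To make that concrete, here is a minimal sketch of the pool / permanence / connected-threshold distinction (parameter values are made up for illustration; NuPIC’s defaults differ):

```python
# Minimal sketch of "potential pool" vs. "connected" synapses.
# All parameter values here are made up for illustration.
import random

CONNECTED_THRESHOLD = 0.2      # permanence above this => synapse is "connected"

class Column:
    def __init__(self, input_size, pool_fraction=0.5):
        pool = random.sample(range(input_size), int(pool_fraction * input_size))
        # every potential synapse starts with a permanence near the threshold,
        # so learning can push it above (connect) or below (disconnect) later
        self.permanence = {i: random.uniform(0.1, 0.3) for i in pool}

    def connected_bits(self):
        return {i for i, p in self.permanence.items() if p >= CONNECTED_THRESHOLD}

col = Column(input_size=500)
print(len(col.permanence), "potential synapses,",
      len(col.connected_bits()), "currently connected")
```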

Each column here is connected to one unique bit in the input space. In this case this specific SP is not going to be able to represent a lot of different input values.

In this specific hypothetical case (remember that most often there will be multiple synapses whose permanence exceeds the “connected threshold”), synapses belonging to other bits can still have their permanences incremented above the threshold. In fact, there is also a boosting mechanism, external to this “fairness” policy, which when employed (its idiosyncrasies are currently the subject of some debate) greatly increases the likelihood of synapses which aren’t currently connected becoming connected.
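The gist of boosting can be sketched like this; the exact formula differs between implementations and has been revised over time, so treat the exponential form and the names below as illustrative only:

```python
# Illustrative boosting rule: columns that have won less often than the target
# get their overlap score multiplied up, raising the chance that their
# not-yet-connected synapses get a turn to learn.
import numpy as np

def boost_factors(active_duty_cycles, target_density=0.02, boost_strength=2.0):
    # under-used columns (duty cycle below target) get a factor > 1.0
    return np.exp(boost_strength * (target_density - active_duty_cycles))

duty = np.array([0.00, 0.01, 0.02, 0.05])  # how often each column has been active
print(boost_factors(duty))
```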

If this SP is trained and all potential synapses have formed a synapse with the associated input bit this SP is not going to be able to distinguish between different inputs.

The SP is an “online” learning algorithm, which means that (due to the decrementing of connected synapses whose input bits are zero) it will, over time, adapt to an entirely different space of input bits, potentially from different problem domains.

Now, when encountering a totally different space of inputs, the SP will need cycles in order to “adapt”, and the columns initially selected to represent the new input won’t necessarily be correct for a period of time following the change of input domain. By “correct”, I mean that the SP’s job is to let similar inputs be represented by similar sets of columns, so that the resulting SDRs share columns in proportion to how similar the inputs are to one another. See Raul’s explanation

Of course this number can be increased a lot if we connect the 2048 columns to a larger input space, allow the columns’ potential pools to overlap, and give each column multiple potential synapses onto that input space.
I was wondering by how much this number would increase. Hence the question: when is the Spatial Pooler full?

Remember, there are a number of different “criteria” used to decide which “active” columns will represent a given input (a rough sketch in code follows this list):

  1. Each column has potential synapses to a random subset of about 50% of the input bits (its “pool”).
  2. The same input bits are very likely to appear in several columns’ pools.
  3. A pooled input bit’s synapse permanence must be above the connection threshold for that synapse to count as “connected” and take part in the next selection step.
  4. Only about 2% of the columns are selected to become active: either the columns with the highest overlap globally, or those with the highest overlap within a given inhibition radius.
  5. For the winning columns, all synapses whose input bits are “1” are incremented, and all connected synapses whose input bits are zero are decremented.
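Putting those five criteria together, a compact and simplified sketch of one SP step might look like the following; all names and parameter values are illustrative, and this is not the NuPIC API:

```python
# Simplified sketch of one Spatial Pooler step covering the five criteria above.
# Names and parameter values are illustrative, not NuPIC's.
import numpy as np

rng = np.random.default_rng(0)
N_COLS, N_INPUTS = 2048, 500
POOL_FRACTION, SPARSITY = 0.5, 0.02
CONNECTED, PERM_INC, PERM_DEC = 0.2, 0.03, 0.015

# 1 & 2: each column pools a random ~50% of the input bits; pools overlap freely
potential = rng.random((N_COLS, N_INPUTS)) < POOL_FRACTION
permanence = np.where(potential, rng.uniform(0.1, 0.3, (N_COLS, N_INPUTS)), 0.0)

def step(input_bits, learn=True):
    # 3: only synapses above the connected threshold count toward overlap
    connected = (permanence >= CONNECTED) & potential
    overlaps = connected @ input_bits
    # 4: global inhibition: the top 2% of columns by overlap become active
    k = int(SPARSITY * N_COLS)
    active = np.argsort(overlaps)[-k:]
    if learn:
        # 5: for the winning columns, increment synapses on "1" input bits
        #    and decrement synapses on "0" input bits (within the pool)
        for c in active:
            delta = np.where(input_bits > 0, PERM_INC, -PERM_DEC) * potential[c]
            permanence[c] = np.clip(permanence[c] + delta, 0.0, 1.0)
    return np.sort(active)

x = (rng.random(N_INPUTS) < 0.1).astype(float)
print(step(x)[:10], "...")
```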

Then there is the eventual inclusion of @fergalbyrne’s paCLA idea, which is to further reward or punish those SP/column synapses which arise from predicted or unpredicted inputs, allowing the TM to influence the spatial distribution. This enhancement to the theory is deemed significant enough to consider as an addition to the algorithm, but it has not been thoroughly examined by Numenta to date, and has only been partially implemented in comportex and htm.java
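To be clear, I have not studied paCLA in depth; the fragment below is only my guess at the general shape of the idea (temporal-memory feedback scaling the SP’s permanence updates), not @fergalbyrne’s actual algorithm and not what comportex or htm.java implement:

```python
# Guess at the shape of the paCLA idea (assumption, not the real algorithm):
# let TM prediction feedback scale the SP permanence update, rewarding
# synapses that contributed to a correctly predicted input and dampening the
# reward for synapses that contributed to an unpredicted (surprising) one.
def permanence_delta(base_inc, base_dec, input_bit_active, was_predicted,
                     predicted_gain=1.5, surprise_gain=0.5):
    if input_bit_active:
        return base_inc * (predicted_gain if was_predicted else surprise_gain)
    return -base_dec

print(permanence_delta(0.03, 0.015, input_bit_active=True, was_predicted=True))
```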

Hence the question: when is the Spatial Pooler full?

I suppose that theoretically it is possible for the SP’s compression of the entire input domain to become “saturated”, meaning that, when encountering an input set which varies widely enough, the resulting SDRs could erroneously end up sharing the same columns due to the lossy compression?

I think the robustness reported in the paper addresses this possibility too? I’m not sure, maybe not? I would like this confirmed as well.

If I understand the original question, you are asking in practice how many input patterns can lead to unique SP outputs? In theory there are 2048 choose 40 outputs, but in a practical scenario how many input patterns can it actually distinguish? Is this correct?

If so, the answer is hard to quantify exactly, but it is very large. Consider a randomly initialized SP (no training) with 2048 columns. Say the input vector is 500 bits, and each column is randomly connected to 50% of those input bits. So each column contains 250 synapses connecting to 250 random input bits. There is no learning in this example.

For a given input vector, the 40 columns with the highest overlap will be chosen as the winners and this will be the output of the SP. It turns out that even if you change a few bits in the input, the output will change by at least one column.

As such there are a very large number of inputs that can be distinguished. The exact number will depend on the various parameters but I don’t know how to calculate it exactly - good exercise for someone to figure out!!
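Here is a small empirical version of that exercise, following the parameters in the example above (everything else is illustrative):

```python
# Randomly initialized SP, no learning: flip a few input bits and check
# whether the set of 40 winning columns changes.
import numpy as np

rng = np.random.default_rng(42)
N_COLS, N_INPUTS, N_WINNERS = 2048, 500, 40

# each column randomly connects to 50% of the input bits (~250 synapses)
connections = rng.random((N_COLS, N_INPUTS)) < 0.5

def winners(input_bits):
    overlaps = connections @ input_bits
    return set(np.argsort(overlaps)[-N_WINNERS:])

x = (rng.random(N_INPUTS) < 0.2).astype(float)
y = x.copy()
flip = rng.choice(N_INPUTS, 5, replace=False)   # change a few bits
y[flip] = 1.0 - y[flip]

print("columns changed:", len(winners(x) ^ winners(y)))   # usually >= 1
```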

2 Likes

Thanks!
The reason for asking this question is that I was wondering about the following. Consider 2048 columns connected to some input sensor, so this array of columns sits at the bottom of the hierarchy. Say there is a huge number of patterns flowing into this array: so many, in fact, that the columns cannot distinguish them all. I was wondering whether this array of columns would have some decentralized way of recognizing that it is actually “full” and needs more memory. In order to figure out such a method, we must first know when it is full; hence the question.
I think that in a practical situation (say an app which uses HTM, serves a large number of users and processes huge amounts of data) this is a valid question. Am I right? Or am I perhaps not understanding some of the basics here?
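For what it’s worth, one crude way to probe this kind of “fullness” (my own suggestion, not something NuPIC provides) would be to watch how much the output SDRs for clearly different inputs start to overlap; sp_output() below is a stand-in for whichever pooler you are using:

```python
# Crude saturation probe (an assumption, not a NuPIC feature): if outputs for
# clearly different inputs start sharing most of their active columns, the
# pooler is effectively "full" for that input space.
import numpy as np

def mean_pairwise_overlap(outputs):
    """Average fraction of shared active columns between pairs of output SDRs."""
    sets = [set(o) for o in outputs]
    pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    return np.mean([len(a & b) / len(a) for a, b in pairs]) if pairs else 0.0

# usage sketch: outputs = [sp_output(x) for x in distinct_inputs]
# a mean pairwise overlap that keeps rising as you add inputs suggests the
# SP is filling up for that input space
```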

If you use something like what I do here:

http://ifni.co/spatial_mapper.html

there should be no limit, i.e. as long as learning mode is ON the SOM should adapt to the incoming data. The right question in this case is how well it represents the last N unique data items.

If you need a better representation, you “just” need to change the algorithm to use multiple seeker-SDRs per bit and to change how you pick the “winners”.
I’m currently testing a modification that behaves as both a Spatial-Pooler-like and a Spatial-Mapper-like algorithm, plus a classifier with n-step prediction, with a very minimal change in the code.
(It seems more and more probable that the Spatial Pooler, Mapper, Classifier and Temporal Memory are almost the same thing, with tweaks to the learning mode and to how the winning 2% is picked, plus some preprocessing.)

I would also speculate that Numenta’s Spatial Pooler should be adaptive too!?

2 Likes

Good question @sjoerdsommen. Adding to what @subutai said: the SP does have a limit to the number of SDRs it can distinguish, but this limit is very high (e.g. 2048 choose 40 which is 2.37*10^84) and is not the practical limit found with real data. There are two reasons for this.

The first is that real data tends to occupy a lower-dimensional manifold in the full space of possible inputs, so the statistics of the distances between real inputs tend to make them less distinguishable than random values chosen from the possible space.

The second is that the SP tends to partially contract its space of outputs by over-using successful columns. @floybix is studying this second issue and discusses it here. While suboptimal from an information theory point of view, this property of the SP is not as damaging as it would be if the inputs were dense, arbitrary representations (e.g. ASCII or float64) rather than semantically encoded pseudoSDRs.

Any finite encoding is already “throwing away” information about small differences in inputs, and a finite SP begins with some generalising property too. As the SP “fills up”, its ability to represent the space of inputs degrades gradually, and it tends to generalise more and more across very similar inputs. This degradation by increasing generalisation is exactly how you would want the SP to operate.

It’s important to bear in mind that the whole “design philosophy” of cortex and HTM systems is about not distinguishing between tiny details in inputs. In the real world, most of these differences are due to irrelevant factors such as measurement errors, signal noise etc. Cortical and HTM systems are built to filter out distracting factors and learn the common spatial and temporal structure which persists in the data. If your task involves distinguishing the fine structure of a large number of examples (e.g. a web log with thousands of users), HTM is not going to be of use.

3 Likes