Encoder and Spatial Pooler Confusion

I’m just starting to program this system following the many materials available, but I’m confused about how encoders -> Spatial Pooler -> neurons works. So we need an encoder to turn the data into an SDR, and then we run that through the Spatial Pooler to get a different SDR?

I might be confused as to what encoding is. If I were encoding hue, I could have a 100-bit array representing the hue, and let’s say a block of 12 bits that moves across it and then wraps around to the start. Would this be considered encoded data? Am I supposed to do something to this array before feeding it to the Spatial Pooler, or can I just let the Spatial Pooler use it directly and work its magic?
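For what it’s worth, that sliding-block scheme is easy to sketch in code. This is just a hypothetical illustration of the encoder described above (the function name and parameters are mine, not from any HTM library):

```python
import numpy as np

def encode_hue(pos, n_bits=100, block=12):
    """Encode a hue bucket (0..n_bits-1) as a binary array with a
    `block`-wide run of ones that wraps around the end."""
    out = np.zeros(n_bits, dtype=np.int8)
    out[(pos + np.arange(block)) % n_bits] = 1  # wraparound indices
    return out

# Nearby hues share most of their active bits (semantic overlap):
a, b = encode_hue(10), encode_hue(12)
print(int(np.sum(a & b)))  # 10 overlapping bits out of 12
```

Since nearby positions share most of their ones, this encoding does capture semantic similarity, which is the main thing an encoder has to get right.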

Why does the data passed into the Spatial Pooler need to be sparse in advance if the Spatial Pooler is going to give you an SDR anyway?

Sorry if this is a stupid question, but I’m having a bit of trouble wrapping my head around this.

I have taken this from a scientific paper; I hope it helps you ^_^:

SDRs are the neocortically inspired data structure used in HTMs; they represent neurons and their state by means of a binary vector. For such a vector to be an SDR, only a small percentage, typically two percent, of its entries should be active (i.e., set to one). This is based on the fact that the relative number of neurons that are active in the neocortex at any given time is low. SDRs encode the semantic meaning of the data and have some valuable properties: among others, they are fault and noise tolerant, they can be easily compressed, and the semantic similarity between two inputs can be quickly determined via a bit comparison.
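A concrete illustration of that noise tolerance (the sizes here are assumptions typical of HTM examples, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_active = 2048, 40                     # ~2% sparsity

x = np.zeros(n, dtype=np.int8)
x[rng.choice(n, n_active, replace=False)] = 1

# A noisy copy: turn 5 active bits off and 5 inactive bits on.
y = x.copy()
y[rng.choice(np.flatnonzero(x == 1), 5, replace=False)] = 0
y[rng.choice(np.flatnonzero(x == 0), 5, replace=False)] = 1

overlap = int(np.sum(x & y))               # 35 of 40 bits still shared
```

Even with a quarter of the active bits corrupted, the overlap is far higher than two random 2%-sparse vectors would ever reach, so the two patterns are still unambiguously “the same thing”.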

Transforming input data (I_t), such as scalar values, dates, and categorical values, into binary arrays (x_t) is achieved by using encoders. They are responsible for determining which entries in the array, or which bits, will be ones and which will be zeros. This must be done in such a way that the semantic characteristics of the data are captured in the encoding. In this way, encodings for semantically similar data will have a larger number of overlapping bits than dissimilar ones.

Each encoded pattern x_t is transformed into an SDR representing a set of neurons arranged in columns by the spatial pooler. For the purpose of this component, all neurons in a column are treated as a unit, since it is assumed that all neurons within a column detect identical feedforward input patterns. Similarly to the input binary array, each column can be either active or inactive. Furthermore, the resulting SDR maintains the semantic properties of the input array; that is, SDRs of input arrays that are semantically dissimilar will also be dissimilar, and vice versa. This is achieved by assigning each neuron a set of potential connections (or synapses) to a random subset of the input array. Each potential synapse has a permanence value associated with it, and only when this permanence value is greater than a threshold is the synapse said to be connected. A neuron is then said to be active in the output SDR if the number of connected synapses to active entries in the input array is greater than a threshold. The learning process in the spatial pooler consists of adjusting these permanence values so that the model learns to represent spatial properties of the input data using SDRs in a way in which the semantic properties of the input vector are maintained.
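The mechanism in that paragraph can be sketched as a toy single step in plain numpy. All sizes, thresholds, and the learning-rule details below are illustrative; real spatial pooler implementations add boosting, local inhibition, potential-pool restrictions, and more:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_cols = 100, 64                     # encoder bits, pooler columns
perm = rng.random((n_cols, n_in))          # permanence of each potential synapse
threshold = 0.5                            # permanence >= threshold => connected

x = np.zeros(n_in, dtype=np.int8)          # an encoded input: a 12-bit block
x[10:22] = 1

connected = (perm >= threshold).astype(int)
overlap = connected @ x.astype(int)        # connected synapses on active bits
k = max(1, round(0.02 * n_cols))           # keep roughly 2% of columns active
active_cols = np.argsort(overlap)[-k:]     # winning (active) columns

# Learning: nudge the winners' permanences toward the input pattern.
lr = 0.05
perm[active_cols] += lr * (2 * x - 1)      # +lr on active bits, -lr on inactive
np.clip(perm, 0.0, 1.0, out=perm)
```

The key idea survives even in this toy form: columns compete on overlap with the input, only a fixed small fraction win, and winners move their synapses toward the patterns they respond to.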

The SDR produced by the spatial pooler, a(x_t), is fed as input to the temporal memory. In this case, individual cells or neurons within a column are differentiated, since at any given point in time a subset of neurons in the active columns will be used to represent the temporal context of the current spatial pattern. Neurons within a column can be in three different states: active, inactive, or predictive. Neurons in a predictive state become active in the next time step if they receive sufficient feedforward input, that is, if their column becomes active. In this way, neurons in such a state are anticipating that they will be active in the next time step, and hence are predicting what the next input may be. The output of the temporal memory, π(x_t), can then be interpreted as a prediction for a(x_t+1), that is, columns that could potentially be active in the next time step.


I had a similar question before. There can be a little confusion caused by inconsistencies in how SDRs are described in some areas of this forum. Conceptually, SDR simply means Sparse Distributed Representation; you can Google it and find consistent definitions.

By implementation, the Spatial Pooler (SP) accepts an input SDR, which is the encoded value of the raw input. The SP also outputs an output SDR, which is basically the set of activated columns at a given SP step, and can be an input to another HTM component (e.g. Temporal Memory).

Yes, and I don’t think there are strict rules for encoding data, but there are guidelines about maintaining sparsity and semantic meaning. And of course the encodings have to fit the desired input dimensions.


Encoders do not need SDRs for input. Sensory input is also not sparse. Encoders must output SDRs though.

So you get your data, you “encode” it into a sparse representation, and then the Spatial Pooler essentially turns it into another SDR? I guess I’m having a hard time understanding why you need to encode the data into a sparse representation; it seems the Spatial Pooler does this just fine.

Because HTM can’t handle data types other than binary vectors, and those vectors would carry vague meanings if they were too dense.

So with the Spatial Pooler a good density is 2%. Could the encoded vector be, say, 20% dense, or does it need to be around the 2% mark as well to be useful? The HTM School videos (lesson 7) showed some encoded data of dates and times with the gym power-usage example, and they looked around 40% dense. Is this the data that’s being passed to the Spatial Pooler, or is there another encoding step in between not shown in that video?

An SDR that an encoder outputs can indeed be quite dense, and its density can vary a little.
But the spatial pooler turns it into another SDR that has a fixed and also very low density (typically about 2%), so the temporal memory can process it.
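A minimal sketch of why the output density stays fixed, assuming a simple global top-k winner selection (the overlap scores are faked here just to vary the input density; a real SP computes them from its synapses):

```python
import numpy as np

rng = np.random.default_rng(2)
n_cols = 2048
k = int(0.02 * n_cols)                      # ~2% of columns win

densities = []
for input_density in (0.05, 0.20, 0.40):    # encoder outputs of varying density
    # Stand-in overlap scores; denser input just raises them uniformly.
    overlap = rng.poisson(100 * input_density, n_cols)
    winners = np.argsort(overlap)[-k:]
    densities.append(len(winners) / n_cols)

print(densities)   # the same ~2% every time, regardless of input density
```

Because exactly k columns are kept no matter how high the overlap scores get, the SP output sparsity is pinned at k / n_cols even when the encoder output is 40% dense.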

Well, thanks very much everyone, that answers my questions. Confusion cured!

You do not have to do this. Encoder input does not have to be sparse.

I got SDRs from the cortical.io API; can I feed many SDRs at once into the spatial pooler?

I’m not totally sure, but I think since they’re already SDRs they can bypass the SP and go right into the TM.

My understanding of why the spatial pooler is sometimes necessary is that it makes more efficient use of the bits in the SDR.

Consider an encoder that must allow input from zero to one hundred, because that’s the range you expect your sensor to return. However, in actual use it typically only operates between 20 and 50, with some occasional outliers throughout the rest of the range. You could custom-tune your encoder to give it greater resolution in the most-used bandwidth, but you’d have to know that ahead of time, and the encoder wouldn’t be able to adapt if the sweet spot in the data shifts.

The spatial pooler does each of these things automatically. It adapts to the statistics of the input data and provides a set of representations that give you greater resolution where the data is concentrated and less where the data appears less frequently.


This is an interesting point to me, since I’m dealing with this issue of long-tail distributions with certain sweet spots. What I’ve been doing is setting the RDSE min and max values to the 5th and 95th percentile values of the metric, so the encoding space doesn’t get stretched out too much. Do you think this is unnecessary? Or how long-tailed does the distribution need to be for this to become well advised?
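For reference, the percentile clamping described above is a couple of lines of numpy (the actual min/max parameter names your RDSE implementation takes may differ; this only shows the computation, on synthetic long-tailed data):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.lognormal(mean=3.0, sigma=0.8, size=10_000)   # long-tailed metric

lo, hi = np.percentile(data, [5, 95])    # clamp range to the bulk of the data
clipped = np.clip(data, lo, hi)          # outliers collapse onto the ends
```

Everything beyond the 5th/95th percentiles lands on the range boundaries, so the encoder’s resolution is spent on the middle 90% of the distribution.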

I guess it would depend on whether you want those outliers to be clamped to the ends of the range or (potentially) treated as something special/anomalous and thus have a (possibly) distinct SDR.

Well, I’d rather have them stand out if they really are extreme; I just don’t want the encoder to have a harder time distinguishing between nearer values as a result. So in your example I’d like to set the min/max to 0 and 100 so that those outliers get unique representations, though wouldn’t this make, say, 31 and 34 harder to distinguish? And since most of the action falls between 20 and 50, might this be a compromising trade-off? I appreciate your intuitions on this.

Well, there are a couple of requirements that the encoder and pooler must satisfy.

The first one is fairly obvious. The encoder must be capable of the resolution that you need. The spatial pooler acts on the encoded signal, and will therefore only be able to distinguish between two sensed values if the encoder is also able to resolve them.

The second requirement is also obvious, but has some subtle caveats. The spatial pooler must have the capacity to uniquely describe a sufficient number of states. The subtle caveat is that this doesn’t necessarily mean it has to have at least as many states as the encoder is capable of providing. Some of those states may never actually be used; in fact, most of them will probably never be seen by the pooler.

The behavior we would like to see is that the sensor generates some kind of output, which the encoder uses to produce uniquely encoded states at the desired resolution and with an appropriate amount of semantic overlap for similar values. The spatial pooler will then learn to associate its unique SDRs with the states that it actually receives from the encoder.

I’ve not actually done this experiment, but my intuition is that in areas where there are many input states that are semantically similar, the pooler will begin to generate SDRs that gradually differentiate themselves. By this I mean that they will gradually utilize more and more of the available SDRs. States from less frequently used ranges will consume a proportionately smaller number of SDRs. I think in this way the spatial pooler actually begins to learn and encode some useful semantic information about the input space, above and beyond what was originally provided by the encoder.


I certainly like this concept! I wonder if other experts might weigh in on this idea? Maybe @rhyolight and/or @subutai and/or @scott :grin:
This min/max setting question seems baked into any application on scalar data, and I want to have the best possible intuitions for handling it! I think the default in ‘getScalarMetricWithTimeOfDayAnomalyParams’ is to add padding to the min and max of the metric, basically the opposite of what I’m doing by constraining them to the 5th and 95th percentile values.