What do the bits of an SDR currently represent in the HTM theory?

SDRs have useful mathematical properties. I find them a very simple and interesting (even though not new at all) data structure (from a CS perspective).

An encoder produces an SDR (from some raw input), where each bit represents a feature of the encoded input. On the other hand, the spatial pooling algorithm produces an SDR whose bits represent (mini?)columns. However, in the BAMI book (p. 12), it’s stated:

“The bits in an SDR correspond to the neurons in the neocortex”

SDRs should, AFAIK, represent the activity of the brain. In particular, they should represent the sparse number of neuron spikes at any point in time. In this regard, saying that the bits of an SDR correspond to neurons in the neocortex, intuitively, makes sense.

What do the bits of an SDR currently represent in the context of the HTM theory?

Of course, the bits can represent whatever we like. In the HTM theory, though, they apparently represent different things depending on the context, and in the current theory they do not seem to represent neurons in any algorithm (but I’m also not an expert yet).


I like to think of SDRs as dendrites, with the synapses as the one bits.

From a logical perspective, SDRs represent semantic meaning. The ultimate purpose is to enable generalization and categorization of novel input based on memory of previous experience. Two things that the system interprets as similar to each other by some percentage will have that percentage of overlapping bits. Thus, an individual bit doesn’t carry much meaning on its own. But a collection of bits (due to the mathematical properties you mentioned) can have meaning associated with it.
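To make the overlap idea concrete, here is a minimal sketch. The three SDRs and their names are made up for illustration, and the SDRs are stored as sets of the indices of their 1-bits; nothing here is taken from an actual HTM library.

```python
def overlap_fraction(a, b):
    """Fraction of the active bits in `a` that are also active in `b`."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0


# Hypothetical SDRs, each given as the indices of its 1-bits.
cat = {3, 17, 42, 58, 91, 120, 133, 201}
dog = {3, 17, 42, 58, 77, 140, 150, 222}
car = {5, 29, 64, 88, 101, 160, 175, 230}

print(overlap_fraction(cat, dog))  # 0.5 (4 of 8 active bits shared)
print(overlap_fraction(cat, car))  # 0.0 (no shared bits)
```

No single bit says "cat" here; the similarity only shows up when the sets of bits are compared.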

Functionally, the bits of an SDR can represent a number of things in the HTM algorithms. As @Bitking mentioned, they represent the synapses on a dendrite segment (i.e. which sub-sampling of cells it is connected to). They can represent the “potential pool” of a minicolumn. They can represent the winning minicolumns in SP. And of course they can represent activity and predictions at a given time step.


Keep in mind that one efficient way of representing an SDR is as a collection of indexes to the “1” bits. There are two such SDRs (“activeCells” and “winnerCells”) in the TM algorithm where those “1” bits represent neurons (in fact the functions return the neurons themselves, rather than indexes to them, but I would argue that the same principle applies).
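As a rough illustration of that index-based representation (the helper names `to_indices` and `to_dense` are my own, not functions from the TM code), converting between a dense bit vector and a list of active indexes could look like this:

```python
def to_indices(dense):
    """Return the positions of the 1-bits in a dense bit list."""
    return [i for i, bit in enumerate(dense) if bit]


def to_dense(indices, n):
    """Rebuild a dense bit list of length n from the active positions."""
    dense = [0] * n
    for i in indices:
        dense[i] = 1
    return dense


dense = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
active = to_indices(dense)
print(active)  # [2, 6, 9]

# The round trip loses nothing, so only the active indexes need to be stored.
assert to_dense(active, len(dense)) == dense
```

For a sparse vector (say 40 active out of 2048) the index list is much smaller than the dense form, and each index can just as well be a reference to the object it stands for, such as a cell.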


The term ‘SDR’ can apply at more than one scope, though SDRs are often discussed in reference to the set of 40 columns (out of 2048) chosen for activation by the SP. This is the input to the TM. These can be thought of as column-level SDRs, each 1-bit representing one of the 40 columns chosen.

The SP algorithm selects this set of 40/2048 columns. Each of the 2048 columns is connected to a subset of the encoding vector. Each column receives an overlap score, and the 40 highest-overlap columns are activated. So each active column (1-bit of this column-level SDR) represents a subset of the input (encoding) space that was highly active at the current time step.
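Here is a stripped-down sketch of that selection step. It only covers random connections, overlap counting and picking the top 40; it leaves out permanences, boosting and learning, so it is not the full SP algorithm. The encoding size and the number of input bits each column samples are assumptions of mine.

```python
import heapq
import random

N_COLUMNS = 2048   # from the description above
N_ACTIVE = 40      # from the description above
INPUT_SIZE = 400   # assumption: length of the encoding vector
POOL_SIZE = 60     # assumption: input bits sampled by each column

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Each column is connected to a random subset of the encoding vector.
pools = [frozenset(rng.sample(range(INPUT_SIZE), POOL_SIZE))
         for _ in range(N_COLUMNS)]


def spatial_pool(active_input_bits):
    """Return the indexes of the N_ACTIVE columns with the highest overlap."""
    active = set(active_input_bits)
    overlaps = [len(pool & active) for pool in pools]
    winners = heapq.nlargest(N_ACTIVE, range(N_COLUMNS),
                             key=overlaps.__getitem__)
    return sorted(winners)


encoding = range(100, 121)           # hypothetical encoding: bits 100-120 on
column_sdr = spatial_pool(encoding)  # column-level SDR as a list of indexes
print(len(column_sdr))               # 40
```

The returned list is the column-level SDR in index form: 40 column indexes out of 2048.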


What do you mean by an “active” subset of the input space? In the sense that each of these active columns “recognizes” that subset of the input space?

Ok, so let’s say the input encoding looks like this:


There are 50 total bits (n) with 10 active (w). The 10 active for this input are bits 11-20. Each of the 2048 SP columns is connected to a random subset of this encoding.

For example, column 1 may connect to bits: [1,5,7,13,16,17,22,27,31,34,38,42,43,46,49]. Of these, only 3 are active (13,16,17). This would give column 1 an overlap score of 3/15. Likewise, if column 2 is connected to bits: [3,6,8,10,12,15,16,19,20,25,28,32,37,41,45], its overlap score would be 5/15 (bits 12, 15, 16, 19 and 20 are active). These sets of encoding bits that the SP columns link to constitute the columns’ ‘proximal dendrite segments’.
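The two overlap scores can be checked with a few lines of code. This uses exactly the bit lists from the example (1-based bit numbers); the variable and function names are just for illustration.

```python
# Active encoding bits for this input: bits 11-20 (1-based numbering).
active_bits = set(range(11, 21))

# Encoding bits each column is connected to.
column_pools = {
    1: [1, 5, 7, 13, 16, 17, 22, 27, 31, 34, 38, 42, 43, 46, 49],
    2: [3, 6, 8, 10, 12, 15, 16, 19, 20, 25, 28, 32, 37, 41, 45],
}


def overlap(pool, active):
    """Number of connected bits that are currently active."""
    return len(active.intersection(pool))


scores = {col: overlap(pool, active_bits) for col, pool in column_pools.items()}

for col, pool in column_pools.items():
    print(f"column {col}: {scores[col]}/{len(pool)}")
# column 1: 3/15
# column 2: 5/15
```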

The overlap scores thereby measure how ‘active’ each column’s sampling of the encoding is (its respective ‘subset of the input space’). When an SP column becomes active (makes the top 40 overlap scores of 2048), it’s basically saying: “A lot of the input I’m looking at is activating, so I’m recognizing this input especially well.”

The set of encoding bits that activate (bits 11-20 in this example) is a direct, deterministic mapping from the raw input. So if the raw input ‘13.5’ maps to bits 11-20, maybe the input ‘15’ would map to bits 13-22 (still 10 active bits, since w is fixed). The two inputs share many of the same active bits (13-20) because they are qualitatively similar (13.5 is pretty near 15). The level of similarity between 13.5 and 15 depends on the encoding parameters. With encoding bits 13-22 active, each SP column would have a different overlap score, so probably some, though not all, of the same columns would activate as for encoding bits 11-20.
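A toy scalar encoder shows how that deterministic mapping might work. The minimum value and resolution below are assumptions I picked so that 13.5 lands on bits 11-20; a real encoder would take them as configuration. With w fixed at 10, the input 15 then comes out as the 10-bit window 13-22, which shares bits 13-20 with the first encoding.

```python
N = 50             # total bits, as in the example
W = 10             # active bits, as in the example
MIN_VALUE = 6.0    # assumption: smallest value the encoder represents
RESOLUTION = 0.75  # assumption: value range covered by one bit of shift


def encode(value):
    """Map a number to a contiguous run of W active bits (1-based numbers)."""
    bucket = int((value - MIN_VALUE) / RESOLUTION)
    bucket = max(0, min(bucket, N - W))  # clamp so the run fits inside N bits
    return set(range(bucket + 1, bucket + W + 1))


a = encode(13.5)
b = encode(15)
shared = sorted(a & b)

print(min(a), max(a))  # 11 20
print(min(b), max(b))  # 13 22
print(shared)          # [13, 14, 15, 16, 17, 18, 19, 20]
```

A smaller resolution would push the two windows further apart and reduce the shared bits, which is the sense in which the similarity depends on the encoding parameters.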