Size of SDR

Looking toward the implementations of HTM systems, I wonder why SDRs are often large, e.g. 512x32 or 1024x1024 (two implementations I’ve heard of). To be able to explore the hierarchy of HTM, would we not want a small SDR and small macro-columns to limit compute resources?

No doubt, most people implementing HTM think of this. What are the differences in the properties of small vs. large SDRs that make this scaling impossible?

If a cell were modeled with 16 inputs, a minicolumn consisted of 16 cells, the input were 8x8, and a macrocolumn consisted of 8 minicolumns, this would give an SDR of 32x32 into the macrocolumn. As long as the encoder does not have to encode more information than can be represented in 1024 bits, this seems sufficient.

Obviously a cell with 16 inputs is very unrealistic in terms of bio-mimicry, but then the whole idea of modelling a neuron as a digital component is unrealistic. As a bio-inspired algorithm this might be an acceptable trade-off if hierarchy gives a lot of benefits?

This is a naive question, not a proposed solution. So please don’t take offense :slight_smile:

[Edit: my point is not that a small SDR is the right implementation for a complex system; my question is why not scale down the problem space and the SDR size, so concepts like hierarchy can be explored with less compute overhead]

In a nutshell, the data structure of an SDR is a long word, and the active coding is only a few percent of that long data structure.

In digital computers, we have values in small words that represent things like numbers or letters. The semantic meaning of individual bits is derived from outside the word - a given bit pattern, say 00001000, does not code for anything in particular; it has no inherent semantic meaning by itself.

In an SDR each of the bits codes for some unique thing - a given “pixel” or the output of a spatial pool in a region. The location of the bit conveys semantic content. Nearby bits should encode similar things.

In sensory streams, that turns into a bit, or an area of nearby bits, coding for something that has semantic meaning.
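To make that concrete, here is a toy sketch of my own (the bit assignments are made up, not from any real encoder):

```python
# Toy sketch: in a dense word the bit pattern means nothing by itself,
# while in an SDR each bit position is assigned a meaning up front.
dense_value = 0b00001000  # just the number 8; bit 3 carries no meaning on its own

# In an SDR, the *position* of each active bit codes for something.
# These assignments are hypothetical examples, not from any real encoder.
bit_meaning = {
    100: "edge at ~45 degrees in the upper-left of the visual field",
    101: "edge at ~50 degrees in the upper-left of the visual field",  # nearby bit, similar meaning
    612: "high-frequency tone from the left",
}

active_bits = {100, 612}  # a (tiny) SDR with two active bits
for b in sorted(active_bits):
    print(b, "->", bit_meaning.get(b, "unassigned"))
```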

In the BAMI book, the section on SDRs goes into this in much greater detail.

For an even deeper dive, this paper goes into the mathematical properties and explains why it is important for the SDR size to be as large as it is.

Or video, if that is how you like to learn:



I understand the point of sparsity (the S in SDR…). I am saying a smaller SDR would still be sparse. There would be 32x32 bits in the SDR, so at a sparsity of 2% this gives approx 20 bits out of 1024 that are “on”. That can still represent a lot of information about the input.
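As a rough back-of-the-envelope check on that (my own sketch, not from BAMI), the number of distinct 20-of-1024 patterns is enormous:

```python
from math import comb

n = 1024
w = round(0.02 * n)   # ~2% sparsity -> about 20 active bits
print(w)              # 20
print(comb(n, w))     # distinct 20-of-1024 patterns, roughly 5.5e41
```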

Do you get the critical nature of position coding inside the SDR?

WHERE the bits are in the SDR is important to convey semantic content.

Rather than going back and forth and eventually repeating the section of BAMI in this thread, I would look there first. The referenced technical paper does a good job of explaining why SDRs end up being a relatively large size for any meaningful coding scheme.

Yes and you are not answering the question :slight_smile:

[Edit: I will assume the question is poorly phrased. Why not prototype with an input that can be represented with a 32x32 SDR? You scale down the problem but could still demonstrate “intelligent” behavior - perhaps]

That little 1K SDR holds one value.
Now what?

[Edit: It kind of depends what you are trying to do with it. As the model gets smaller it gets more and more toy-like. There have been a large number of small models built. The one-shot learning and related anomaly detection feature has been amply demonstrated, to the point where very little is being added to the understanding of HTM. There is a possibility that you may discover some previously overlooked “killer app” of one-shot learning and the related anomaly detection, but nothing has turned up so far. I am not aware of any other important behavior being reported in small models.

The one significant application that I am aware of is Cortical.IO, and they do not use HTM to build the model. The “retina” structure is built using plain old-fashioned SOM networks, and the readout of the data is just a practical application of sparse data processing.

It may just be my personal opinion but the next advance in the HTM model will be adding in lateral connections between columns and the thalamus connections. This points in the direction of adding the H to HTM and building larger models.]

(snip)

It holds far more than one bit of information. Each of the 20 “on” bits can have semantic value. So it is 20 bits that can be distributed across up to 1024 different categories - if the SDR is able to make optimal use of the 1024 bits.

Can you point me to where it says that? I have only seen mentions of sets of bits, unions and fuzzy matching. Nothing about meanings for single bits.

Do you have a reference for that (about holding several values at the same time)? I didn’t see it so far.

This is a consequence of the models being “fully connected.” I will let you work out how jamming several SDRs together causes the semantic meaning to be fused.
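If it helps, here is a minimal sketch of that union idea as described in BAMI (the bit indices are made up):

```python
# Union of SDRs: OR several sparse patterns together, then test membership by overlap.
# The stored union stays sparse enough that each original pattern still matches it
# almost perfectly, which is how one structure can hold several values at once.
sdr_a = {3, 57, 212, 640, 901}   # made-up active-bit indices
sdr_b = {14, 57, 333, 702, 980}
sdr_c = {9, 120, 415, 640, 1001}

union = sdr_a | sdr_b | sdr_c    # the semantic content of all three is fused here

def matches(sdr, stored, threshold):
    """An SDR 'matches' the stored union if enough of its bits are present."""
    return len(sdr & stored) >= threshold

print(matches(sdr_a, union, threshold=5))              # True: all of sdr_a's bits are in the union
print(matches({1, 2, 4, 8, 16}, union, threshold=5))   # False: an unrelated pattern does not match
```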

In the paper “How do neurons operate on sparse distributed representations? A mathematical theory of sparsity, neurons and active dendrites,” figures 5A & B and 6 are directly related to the question of SDR size and reliability of the representation. Both the size of the SDR and the number of bits sampled, and their relationship to the accuracy of both positive and false-positive cases, are probed.

The accompanying text essentially outlines that the performance goes up more as the SDR length is increased than it does from increasing the number of message bits in that SDR. A sweet spot is described where the error performance was better than one error in 10^9. In the discussion, it was explained that there is a point of diminishing returns where going beyond a certain size does not add very much to the performance.
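Here is a simplified sketch of that calculation (my own code, not from the paper; the paper’s formulas also account for subsampling and noise, but the trend is the same):

```python
# Simplified false-positive calculation: probability that a random SDR
# overlaps a stored one in >= theta bits, for SDRs of size n with w active bits.
from math import comb

def false_positive_rate(n, w, theta):
    """n = SDR size, w = active bits in both patterns, theta = match threshold."""
    total = comb(n, w)
    overlapping = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return overlapping / total

# Roughly 2% sparsity, matching on half the active bits.
print(false_positive_rate(n=256,  w=5,  theta=3))    # small SDR: relatively easy to fool (~1e-5)
print(false_positive_rate(n=1024, w=20, theta=10))   # larger SDR: dramatically lower error (~1e-13)
print(false_positive_rate(n=2048, w=40, theta=20))   # larger still: error already negligible - diminishing returns
```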

Executive summary: very small SDRs will have worse performance. Adding more “on” bits to that small network does not help.

That is the underlying math on the selection of SDR size. There are practical concerns with implementation. As Martin pointed out, the models that are practical are rather limited in scope compared to the number of columns needed to exhibit the theoretical behaviors observed in the wetware. Various band-aids are employed, such as “boosting” and “fully connecting” the arrays, to get the models to do anything.

None of the models I have looked at are large enough to demonstrate topological behavior. This is a key in-vivo behavior documented by researchers such as Moser. The topology switch in the models just controls how the cells are interconnected locally. All the models I have looked at have to have topology turned off to work at all - they are fully connected. All cells sample all other cells in the model.

Without being able to duplicate and manipulate what the brain does, there is limited capability to test theories on how the wetware is doing what it does. When Jeff says that we really don’t know how this stuff works, he is correct. I have read many papers with subtle variations on how the cells interact in the CC, and even tiny changes lead to major shifts in the proposed mechanisms. There is some convergence on possible mechanisms, driven by better in-vivo probing and recording technology, but having a high-fidelity simulation would be very helpful in sorting out what is going on.

There is a threshold below which the networks don’t do very much at all. I don’t know what that threshold is. Small (toy?) networks do “something,” but I don’t know if they actually convey any useful understanding of what the brain is doing or how it does it. The hot gym model shows a type of learning, but it is a very far bridge to get from there to a deeper understanding of how that builds to things like the formation of Gabor filters in the V1 region.

Without the interactions that come from larger models, the investigation of the CC by itself is useless - the large-scale interactions with other cells are missing and you have no way of knowing if it is working. We can tweak how the CC works in these tiny models, but that may end up being exactly the wrong thing when the model is increased in size. We just don’t know because we can’t test the ideas.

I expect that as the technology to do the simulations gets better, the band-aids won’t be needed and the emulation of the wetware will get good enough that we will start to see better agreement with what has been observed in the wetware. We seem to be very far from that now.
