Capacity of a Macrocolumn


The size of a cortical column / macrocolumn varies with sensory area, level in the hierarchy, and species. The exact number of minicolumns per cortical column is hard to pin down precisely.

All our sparse distributed codes use an M-out-of-N coding strategy. In our Columns paper we assumed a cortical column contained between 150 and 250 minicolumns. See the section on Capacity in that paper - there, M was 10.

The robustness of these codes increases dramatically as N increases (see this paper).
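As a rough illustration of why these codes have such capacity, a few lines of Python count the number of distinct M-out-of-N codes for the N range and M value quoted above (the exact values are taken from the text; this is just a sketch of the combinatorics, not the paper's full capacity analysis):

```python
import math

# M-out-of-N coding: each code activates M minicolumns out of N.
# N = 150..250 minicolumns per column, M = 10 active, per the text above.
M = 10
for n in (150, 250):
    codes = math.comb(n, M)  # number of distinct M-out-of-n codes
    print(f"N={n}: {codes:.3e} possible codes")
```

Even at the low end of the range the number of distinct codes is astronomically large, which is the basic reason the representation is so robust to noise and cell loss.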



I think that this is a rather short sighted way to look at capacity.

The number of glyphs in the English language is small - perhaps 60 or so in common usage. Yet with the right arrangement these can be grouped and sequenced into a virtually infinite number of representations.

Bringing this concept to the neural representation requires the consideration of inter-map connections and serial transitions. This last item is an area where the HTM concept shines. The contribution of hierarchy is yet to be seen but I expect good things.


Actually, I think I found my mistake…
Apparently there was a big misunderstanding about what is meant by “macrocolumns’ RFs do not overlap”.
I thought it meant that the intersection of neighboring macrocolumns’ RFs would be zero, but what it seems to mean is that the ratio of intersection is (slightly) less than one (so they don’t FULLY overlap but are offset, yet still have strongly overlapping RFs).
Can somebody who is pretty firm in neuroscience confirm or deny ?
Cheers !

This has been the topic of much discussion.

I think that this post may give you some answers:


Hmm, but that would mean that my initial assumption was correct and that there is roughly only one minicolumn per input fiber, which I believe makes it rather hard to develop proper feature detectors for natural input in V1 :confused: Or I’m still missing something here…

The reason why I thought I had misinterpreted that “overlap” was a paper.
There, below Eq. 1, it reads:

In the results described in this paper the number of mini-columns at each retinotopic location in the non-granular layer was set to 100.

which I thought can be interpreted as 100 minicolumns per input fiber.

Each mini-column has a small number of branching dendrites - perhaps six. As they branch, the number of active dendrite segments increases to a much larger number.
An ascending axon is potentially sampled by dendrites from layers 2/3, 4, 5, and 6.
With the link I just posted it should be clear that this ascending fiber is available to approximately 400 mini-columns for potential sampling.
The dendrites have been observed to change their shape to wire to new axonal connections as needed.
Based on this I really don’t know how you are getting your rather limited numbers.

Well, I agree that every minicolumn has a much larger RF, but that doesn’t matter as long as the ratio is only 1 minicolumn per 1 input.
If a minicolumn extracts one feature of its surrounding inputs, neighboring minicolumns could easily extract other features of those same inputs as their RFs could surely span quite some range and fully overlap with the first minicolumn’s RF. I agree on that.
BUT since you also want to detect features on neighboring patches of input, you run into a trade-off: if you use minicolumns to detect more features from the initial patch, you will lack them on the neighboring patch. No matter how you turn it, a ratio of one feature detector per input greatly limits the fidelity of the generated feature maps (either in spatial resolution or in number of features).

Maybe I am seeing your problem. The 100 or so cells that make up a mini-column compete to sense a transition.
The same mini-column is capable of sensing a vast number of transitions. These mini-columns compete to “be the one” to recognize the local transition feature in the macro-column. In my post on Hex-grids I postulate that layer 2/3 is hitting on fixed patterns and that there is a form of voting going on between layers in the mini-column (temporal and spatial patterns).
In any case - the feature recognizers compete to signal that they are “the one” to win in this macro-column. Here is where I differ from the HTM canon - HTM as written picks a winner based on some sparsity metric; every macro-column is an island of recognition.
In my Hex-grid proposal the competition between potential grid-forming pattern recognizers binds the local recognition to adjacent patterns, and sparsity is a natural by-product.


With transitions you step into the temporal domain. My post is purely about spatial pooling. This must be done before any temporal sequencing can be done. So first things first :wink:

I am not sure how you intend to divorce HTM from time. It is exactly what the HTM model is all about - signaling that a predicted pattern has happened - or not.

This pattern prediction is what is being detected and grouped at the macro-column level.

Unless I have things completely wrong, you have to start with the dendrites of a cell detecting some learned pattern and biasing the soma into a predictive state. On the next cycle that same cell sees the pattern “faster” than its neighbors and fires first, triggering the local inhibitory interneurons to silence the “slower” losers.
This winning neuron is the response of that column for a successfully predicted pattern. That is what is available to be spatially pooled.

If nobody is successful then nobody is a winner and bursting at the mini-column level drives all the cells to learn if they synapse on a firing axonal projection.

Sure, HTM is about temporal prediction. But the basis for doing so is spatial pooling. First there is spatial pooling, and then there is temporal prediction on the spatially pooled patterns. You can run just the spatial pooling process (with a single cell per minicolumn) and completely abandon the temporal pooler. Of course you then lose temporal prediction, but you still learn the set of spatial patterns which occur most often in the input space and which therefore form nice labels to spatially pool against. On the other hand, you cannot run just a temporal pooler without the spatial pooler, as the temporal pooling is based on the spatial one.
I talk solely about spatial pooling in this thread and the number of spatial patterns the system can store and recognize. I do NOT talk about sequential prediction of those patterns in any way. Forget about the temporal stuff for now.
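For reference, the spatial-pooling-only setup I mean is something like this minimal sketch: one cell per minicolumn, random connected synapses, and a global k-winners-take-all. All sizes and thresholds here are arbitrary toy choices, not anything from the papers:

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_cols, k = 64, 128, 5           # toy sizes, chosen arbitrarily
# each column gets random permanences to every input bit;
# a permanence above the threshold counts as a connected synapse
perm = rng.random((n_cols, n_inputs))
connected = perm > 0.5

def spatial_pool(input_bits):
    # overlap score = number of connected synapses on active input bits
    overlap = connected @ input_bits
    winners = np.argsort(overlap)[-k:]     # global k-winners-take-all
    sdr = np.zeros(n_cols, dtype=int)
    sdr[winners] = 1
    return sdr

x = (rng.random(n_inputs) < 0.2).astype(int)   # a random sparse input
print(spatial_pool(x).sum())                   # always exactly k active columns
```

No learning or boosting here; the point is only that the output is a fixed-sparsity spatial code regardless of the input, which is the property my capacity question is about.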

This sentence catches me wrong. Spatial Pooling does not require minicolumns or defining how many cells are in them. That is the Temporal Memory algorithm. And there is no temporal pooler today.

And if you want all the details about the spatial pooler, read this: The HTM Spatial Pooler: a neocortical algorithm for online sparse distributed coding.

Oh, you are right. I meant temporal prediction (not pooling). And yes, no need for minicolumns at all; those could simply be cells (e.g. simple cells of V1). I was just trying to convey to bitking that my question is not at all about temporal prediction and its capacity, but only about the capacity of spatial pooling (so spatial feature recognition). There is no temporal aspect in my question.
And thanks for the paper link. This seems to be an updated SP, I believe? The one I implemented (given some pseudocode on the Numenta website) used global inhibition and was globally connected, which worked, but I didn’t think it was very biologically plausible. I’ll give that paper a read…


Not really updated. The SP has been the same since pretty much forever. There are versions with global inhibition and local inhibition. But when you add local inhibition, you are adding topology and so the input data needs to also respect some topology. It makes things trickier.

Well, I do use topological inputs (images) for my tests.
Actually I was wondering: Is there a benefit to using, let’s say, a 10-winners-take-all with a large inhibition radius versus a single-winner-takes-all with a smaller inhibition radius (so much smaller that its area of inhibition is only 1/10th of the first version’s)? The latter would be much easier to implement and much faster. Does the former perform better? And if so, why?
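To make the two schemes concrete, here is a toy comparison on a 1-D arrangement of columns with made-up overlap scores (all numbers arbitrary): global top-10 over all columns versus one winner per local block of 10:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cols, k = 100, 10
overlap = rng.random(n_cols)            # hypothetical overlap scores

# (a) global 10-winners-take-all over all columns
global_winners = set(np.argsort(overlap)[-k:])

# (b) single winner per local neighborhood of n_cols // k columns
block = n_cols // k
local_winners = set()
for start in range(0, n_cols, block):
    local_winners.add(start + int(np.argmax(overlap[start:start + block])))

print(len(global_winners), len(local_winners))
```

Both give the same sparsity (k active columns), but version (b) guarantees the winners are spread across the topology - one per neighborhood - whereas version (a) can cluster all its winners in one region if the overlap scores happen to concentrate there. That spreading seems to be the main functional difference, as far as I can tell.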

I’m curious if you read the linked Yuwei Cui, Subutai Ahmad, and Jeff Hawkins paper above?

When you ask about benefits there are several answers depending on what you are looking for.

The paper lists fixed-sparsity representation, utilization of all available resources, robustness to noise, flexibility, and fault tolerance as desirable metrics.

There is some exploration of these properties and the math to show the metrics for these properties. You should be able to manipulate these to optimize whatever property you are looking for.

Rather than me telling you what I think is better it may be best if you decide what trade-offs you consider the best.

I did read it.
I do see the benefits of using SDRs, but you can achieve fixed sparsity either way. Robustness to noise, flexibility, and fault tolerance are intrinsic to using SDRs, which happens in either case. Utilization of all available resources is done by boosting, which is independent of the way sparsity is enforced.
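On the boosting point, for concreteness: as I understand the SP paper, a column’s overlap is multiplied by a boost factor that decays exponentially as its active duty cycle rises above the target, so under-used columns get pulled into the competition. A toy illustration with arbitrary numbers (beta and the duty cycles are made up here):

```python
import numpy as np

target_density = 0.02   # desired fraction of time a column is active
beta = 10.0             # boost strength, arbitrary for this illustration
duty = np.array([0.0, 0.01, 0.02, 0.05])   # example active duty cycles

# columns active less often than the target get boost > 1,
# columns active more often than the target get boost < 1
boost = np.exp(-beta * (duty - target_density))
print(boost)
```

A column sitting exactly at the target density gets a boost of 1; the mechanism works the same whether sparsity is enforced by global k-winners or by local inhibition, which was my point.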

I thought that you asked and answered the efficiency question on an earlier thread:

Right. I just asked it here again because rhyolight linked the paper in which they still pick k winners instead of narrowing the inhibition radius, so I thought he might know.