The size of a cortical column / macrocolumn varies somewhat with sensory area, level in the hierarchy, and species, so the exact number of minicolumns per cortical column is hard to pin down precisely.
All our sparse distributed codes use an M-out-of-N coding strategy. In our Columns paper we assumed a cortical column contained between 150 and 250 minicolumns. See the section on Capacity in this paper - in that section M was 10.
The robustness of these codes increases dramatically as N increases (see this paper).
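To make the M-out-of-N capacity and robustness point concrete, here is a small sketch. The N = 200, M = 10 figures are the mid-range of the Columns-paper numbers quoted above; the false-match formula is the standard SDR overlap calculation, and the overlap threshold of 5 is an arbitrary illustrative choice.

```python
from math import comb

# The number of distinct M-out-of-N sparse codes is C(N, M).
def num_codes(n, m):
    return comb(n, m)

# With M = 10 active minicolumns out of N = 200, the code space is
# already astronomically large:
print(num_codes(200, 10))  # ~2.2e16 codes

# Probability that a random M-out-of-N code overlaps a fixed code in
# at least `theta` active positions (an accidental "false match").
def false_match_prob(n, m, theta):
    total = comb(n, m)
    matches = sum(comb(m, k) * comb(n - m, m - k) for k in range(theta, m + 1))
    return matches / total

# Holding M fixed while N grows makes accidental overlap collapse,
# which is why robustness increases dramatically with N:
print(false_match_prob(200, 10, 5))
print(false_match_prob(400, 10, 5))
```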
I think that this is a rather short sighted way to look at capacity.
The number of glyphs in the English language is small - perhaps 60 or so in common usage. With the correct arrangement they can be grouped and sequenced into a virtually infinite number of representations.
Bringing this concept to the neural representation requires the consideration of inter-map connections and serial transitions. This last item is an area where the HTM concept shines. The contribution of hierarchy is yet to be seen but I expect good things.
Actually, I think I found my mistake…
Apparently there was a big misunderstanding on what is meant by “macrocolumns RF do not overlap”.
I thought it meant that neighboring macrocolumns' RFs would have zero intersection, but what it seems to mean is that their ratio of intersection is (slightly) less than one - so they don't FULLY overlap but are offset, yet still have strongly overlapping RFs.
Can somebody who is pretty firm in neuroscience confirm or deny?
Hmm, but that would mean my initial assumption was correct: there is roughly only one minicolumn per input fiber, which I believe makes it rather hard to develop proper feature detectors for natural input in V1. Or am I still missing something here…
Each mini-column has a small number of branching dendrites - perhaps six. As they branch, the number of active dendrite segments increases to a much larger number.
An ascending axon is potentially sampled by dendrites from layers 2/3, 4, 5, and 6.
With the link I just posted it should be clear that this ascending fiber is available to approx 400 mini-columns for potential sampling.
The dendrites have been observed to change their shape to wire to new axonal connections as needed.
Based on this I really don’t know how you are getting your rather limited numbers.
Well, I agree that every minicolumn has a much larger RF, but that doesn’t matter as long as the ratio is only 1 minicolumn per 1 input.
If a minicolumn extracts one feature of its surrounding inputs, neighboring minicolumns could easily extract other features of those same inputs as their RFs could surely span quite some range and fully overlap with the first minicolumn’s RF. I agree on that.
BUT since you also want to detect features on neighboring patches of input, you run into the same problem: having used minicolumns to detect more features from the initial patch, you now lack them for the neighboring patch. No matter how you turn it, a ratio of one feature detector per input greatly limits the fidelity of the generated feature maps (either in spatial resolution or in number of features).
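The resolution/feature-count trade-off described above is just arithmetic; a toy example with made-up numbers:

```python
# With a 1:1 ratio of minicolumns to input fibers, every extra feature
# detected per patch is a minicolumn unavailable to the next patch.
n_inputs = 1024           # hypothetical number of input fibers
features_per_patch = 16   # hypothetical number of features to extract

# Spending 16 minicolumns per patch leaves only this many distinct
# patch positions in the resulting feature map:
map_resolution = n_inputs // features_per_patch
print(map_resolution)  # 64 positions instead of 1024
```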
Maybe I am seeing your problem. The 100 or so cells that make up a mini-column compete to sense a transition.
The same mini-column is capable of sensing a vast number of transitions. These mini-columns compete to “be the one” to recognize the local transition feature in the macro-column. In my post on Hex-grids I postulate that layer 2/3 is hitting on fixed patterns and that there is a form of voting going on between layers in the mini-column (temporal and spatial patterns).
In any case - the feature recognizers compete to signal that they are “the one” to win in this macro-column. Here is where I differ from the HTM canon - HTM as written picks a winner based on some sparsity metric; every macro-column is an island of recognition.
In my Hex-grid proposal the competition between potential grid-forming pattern recognizers binds the local recognition to adjacent patterns, and sparsity is a natural by-product.
I am not sure how you intend to divorce HTM from time. It is exactly what the HTM model is all about - signaling that a predicted pattern has happened - or not.
This pattern prediction is what is being detected and grouped at the macro-column level.
Unless I have things completely wrong, you have to start with the dendrites of a cell detecting some learned pattern and biasing the soma into a predictive state. On the next cycle that same cell sees the pattern “faster” than its neighbors and fires first, triggering the local inhibitory inter-neurons to silence the “slower” losers.
This winning neuron is the response of that column for a successful predicted pattern. That is what is available to be spatially pooled.
If nobody is successful then nobody is a winner and bursting at the mini-column level drives all the cells to learn if they synapse on a firing axonal projection.
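The predict/fire/inhibit cycle described in the last few paragraphs can be sketched for a single minicolumn as follows. The function name and the 32-cell count are illustrative choices, not Numenta's API (the thread says roughly 100 cells per minicolumn; any number works).

```python
NUM_CELLS = 32  # cells per minicolumn (illustrative)

def activate_minicolumn(predictive_cells, column_is_active):
    """Return the set of cells that fire this time step."""
    if not column_is_active:
        return set()
    if predictive_cells:
        # Cells put into the predictive state last step fire first and
        # (via inhibitory interneurons) silence the slower losers.
        return set(predictive_cells)
    # Nobody predicted the input: the minicolumn bursts, and every cell
    # is eligible to learn on whatever axonal projections are firing.
    return set(range(NUM_CELLS))

print(activate_minicolumn({5}, True))         # predicted winner: {5}
print(len(activate_minicolumn(set(), True)))  # burst: all 32 cells
```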
Sure, HTM is about temporal prediction. But the basis for doing so is spatial pooling: first there is spatial pooling, and then there is temporal prediction on the spatially pooled patterns. You can run just the spatial pooling process (with a single cell per minicolumn) and completely abandon the temporal pooler. Of course you then lose temporal prediction, but you still learn the set of spatial patterns which occur most often in the input space, and these form nice labels to spatially pool against. On the other hand, you cannot run a temporal pooler without the spatial pooler, as the temporal pooling is based on the spatial one.
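The spatial-pooling-only process just described (one cell per minicolumn, no temporal memory) can be sketched in a few lines. The sizes, the random ~30% connectivity, and the global k-winners-take-all are illustrative choices, not the full SP algorithm (no learning, no boosting):

```python
import numpy as np

rng = np.random.default_rng(0)
N_INPUTS, N_COLUMNS, K_WINNERS = 256, 128, 8

# Each minicolumn samples a random ~30% of the input fibers.
connections = (rng.random((N_COLUMNS, N_INPUTS)) < 0.3).astype(int)

def spatial_pool(input_bits):
    """Return the K minicolumns with the highest overlap (global inhibition)."""
    overlaps = connections @ input_bits
    return np.argsort(overlaps)[-K_WINNERS:]

x = (rng.random(N_INPUTS) < 0.1).astype(int)  # a sparse input pattern
active = spatial_pool(x)
print(sorted(active))  # a fixed-sparsity 8-column SDR
```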
I talk solely about spatial pooling in this thread and the amount of spatial patterns the system can store and recognize. I do NOT talk about sequential prediction of those patterns in any way. Forget about the temporal stuff for now.
Oh, you are right - I meant temporal prediction (not pooling). And yes, no need for minicolumns at all; those could simply be cells (e.g. simple cells of V1). I was just trying to convey to bitking that my question is not at all about temporal prediction and its capacity, but only about the capacity of spatial pooling (i.e. spatial feature recognition). There is no temporal aspect in my question.
And thanks for the paper link. This seems to be an updated SP, I believe? The one I implemented (from some pseudocode on the numenta website) used global inhibition and was globally connected, which worked, but I didn't think it was very biologically plausible. I'll give that paper a read…
Not really updated. The SP has been the same since pretty much forever. There are versions with global inhibition and local inhibition. But when you add local inhibition, you are adding topology and so the input data needs to also respect some topology. It makes things trickier.
Well, I do use topological inputs (images) for my tests.
Actually I was wondering: is there a benefit to using, say, a 10-winners-take-all with a large inhibition radius versus a single-winner-takes-all with a smaller inhibition radius (so much smaller that its area of inhibition is only 1/10th of the first version's)? The latter would be much easier to implement and much faster. Does the former perform better? And if so, why?
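One partial answer, looking only at representational capacity (learning dynamics also matter): a joint k-winners-take-all can produce winner sets that the partitioned single-winner scheme cannot, so its code space is strictly larger. With illustrative numbers, not taken from the thread:

```python
from math import comb

n = 100  # minicolumns in the region (illustrative)

# (a) 10-winners-take-all over the whole region: any 10-of-100 subset.
codes_kwta = comb(n, 10)

# (b) single-winner-take-all in 10 disjoint 10-column sub-regions:
#     one independent winner per sub-region.
codes_local = 10 ** 10

print(codes_kwta)   # 17,310,309,456,440 (~1.7e13)
print(codes_local)  # 10,000,000,000 (1e10)
```

Scheme (b) can never place two winners in the same sub-region, which is why it expresses fewer patterns; whether that loss matters in practice depends on the statistics of the input.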
I did read it.
I do see the benefits of using SDRs, but you can achieve fixed sparsity either way. Robustness to noise, flexibility, and fault tolerance are intrinsic to using SDRs, which happens in either case. Utilization of all available resources is done by boosting, which is independent of the way sparsity is enforced.
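For reference, the boosting mentioned above multiplies each column's overlap score by a factor that grows as the column's recent duty cycle falls below the target density. The exponential form below follows the boost rule in Numenta's SP implementation, to my knowledge; `boost_strength` is a free parameter.

```python
import numpy as np

def boost_factors(duty_cycles, target_density, boost_strength=2.0):
    # Under-active columns (duty < target) get factors > 1, over-active
    # columns get factors < 1, pushing all columns toward equal use.
    return np.exp(-boost_strength * (duty_cycles - target_density))

duty = np.array([0.0, 0.02, 0.10])  # recent activation frequency per column
factors = boost_factors(duty, target_density=0.02)
print(factors)  # roughly [1.04, 1.00, 0.85]
```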