As much as I like the local learning rules and biological plausibility of HTM, I dislike the explicit k-winner approach it uses to determine which minicolumns should be used for learning. It is neither a biologically plausible implementation nor well suited to being ported to a GPU (which is my current concern), as it requires either sorting or k rounds of some sort of max-masking.
Clearly the cortex accomplishes sparse activation differently, most likely by balancing excitatory connections with (unspecific) inhibitory connections, which could effectively implement input normalization (forward inhibition) as well as sparsification (feedback inhibition) in a local and more biologically plausible manner.
Have there been any attempts or discussions regarding this, or can anyone recommend some reading on this subject?
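One way to meet the "no sorting, no k rounds of max-masking" requirement, at least in a toy form, is to treat inhibition as a single global threshold that is raised while too many units are active and lowered while too few are. The sketch below finds that threshold by bisection. `inhibition_threshold_kwta` and `n_iter` are hypothetical names, and this is a generic selection trick, not something from the HTM spec or a model of real inhibitory circuits. Each round is one elementwise comparison plus one sum, so the number of rounds depends on the desired precision rather than on k.

```python
import numpy as np

def inhibition_threshold_kwta(x, k, n_iter=40):
    """Select (roughly) the k most active units without sorting.

    A single global 'inhibition level' theta is adjusted by bisection:
    raise it while more than k units are above it, lower it otherwise.
    Each step is an elementwise compare plus one reduction, both of
    which parallelize well on a GPU.
    """
    x = np.asarray(x, dtype=np.float64)
    lo = x.min() - 1.0  # every unit is above lo (more than k active if k < n)
    hi = x.max()        # no unit is strictly above hi
    for _ in range(n_iter):
        theta = 0.5 * (lo + hi)
        n_active = np.count_nonzero(x > theta)
        if n_active > k:
            lo = theta  # too many units active: strengthen inhibition
        else:
            hi = theta  # k or fewer active: inhibition may be too strong
    return x > hi       # boolean mask of surviving units
```

With distinct input values that are not extremely close together, this returns exactly k winners; ties at the boundary can leave fewer than k active.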
I am proposing to discipline the HTM model with this behavior in the L2/3 cells. Sparsification and lateral binding are natural outcomes of this process.
Yes, but I see this as an additional feature (quite an interesting one), not as a solution to the sparsification problem by itself.
How does this solve sparsification, in your opinion?
Depending on how far out you set the mutual connections, you will get a hex-grid spacing of x cells.
If my memory is correct, with the biologically plausible 16 cell diameter this works out to 7 cells out of about 220, or about 3% activation.
I don’t understand that. The mutual connections are excitatory. How can this lead to sparsification?
Some sort of suppression (inhibition) is required to silence non-winning cells.
The excitation is ONLY long range. The balance of (mutual excitation plus local activation) AGAINST (inhibitory inter-neurons/local inhibition field) allows the hex-grid with the strongest activation to form and suppress all others. (Yes - the inter-neuron inhibition is activated by local cells activating. More activation triggers more inhibition.)
Note that there could be local pockets of strong activation, but they are not getting any love from neighboring cells. Once training has learned parts of a distributed pattern, it should always be able to recall that pattern when presented with it and suppress lower-matching competitors.
Ah, so you are saying that there is a constant inhibitory background?
But how does this solve the problem when the network is just about to learn something? Let’s consider the initial setup of minicolumns, which are randomly connected to the input. How is this mechanism supposed to silence most minicolumns while leaving only a few active for learning? Am I missing something here? I really don’t see how this is supposed to work, or at least you make it sound almost trivial, which I don’t see. I think there needs to be some sort of adaptive mechanism which leads to competitive dynamics in the minicolumn activation, ending with only the initially most active ones staying active and silencing all others.
You are correct, in the beginning the cells are competing to learn impinging patterns. From these initial untrained conditions the cells with this special spacing that are being activated at the same time will have a natural advantage to win and learn the incoming pattern.
As they weakly respond, they will attempt to recruit distant cells - the only ones that will resonate are the ones that are simultaneously excited.
I call it an inhibitory field, but that is somewhat misleading. The inhibition is the natural reaction to activation from the local cells. I see the balance regulating to a point where local activation ONLY is strongly discouraged by inhibitory inter-neurons.
Activation + one remote activation increases local activation, triggering learning. This also generates additional inhibition.
Activation + two or more remote activations is enough to drive attempts to recruit additional cells at the special distance. Learning is stronger. I see the projections to other areas as being weak and ineffective at this level - more of an area tonic.
In a fully formed hex-grid, the pattern is strongly signalling via inter-areal projections. Inhibition of competing patterns is very strong at this point.
Note that these levels of activation are discrete integer values 0 through 6, a natural outcome of this physical arrangement. I can see them as having very different defined actions. Part of these different actions can be the interaction with the local inhibitory cells. In my mind I do think of it as a variable-strength local field.
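Counting how many of a cell's six hex-grid partners are active gives exactly these integer levels. A minimal sketch, assuming the sheet is stored in axial hex coordinates, a partner spacing `d`, and wraparound edges; all of these, and the names `hex_partner_offsets`, `hex_support` and `support_level`, are hypothetical. The labels paraphrase the levels described in the posts above and are not a specification.

```python
import numpy as np

def hex_partner_offsets(d):
    # Axial-coordinate offsets (dq, dr) of the six hex-grid partners at spacing d.
    return [(d, 0), (-d, 0), (0, d), (0, -d), (d, -d), (-d, d)]

def hex_support(active, d):
    """For every cell, count how many of its six partners are active (0..6).

    `active` is a boolean 2D array indexed by axial hex coordinates (q, r).
    Edges wrap around (torus), a simplifying assumption.
    """
    active = np.asarray(active, dtype=np.int64)
    support = np.zeros_like(active)
    for dq, dr in hex_partner_offsets(d):
        # Rolling by the negative offset makes support[q, r] see active[q + dq, r + dr].
        support += np.roll(active, shift=(-dq, -dr), axis=(0, 1))
    return support

def support_level(n):
    # Hypothetical labels paraphrasing the levels described in the thread.
    if n == 0:
        return "isolated: suppressed by local inhibition"
    if n == 1:
        return "paired: learning triggered"
    if n < 6:
        return "recruiting: stronger learning"
    return "full hex-grid: strong inter-areal signal"
```

With a cell and its six partners active, the center cell reaches level 6, while each partner sees only 3 (the center plus two neighboring partners), which is the kind of asymmetry the competition can exploit.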
There we have it. What you mentioned here is what I’m interested in, and it is really not as easy as you make it sound.
Without an explicit “k-winner selection scheme” as proposed in the original HTM spec there is no such thing as a “winner”. The interaction of inhibitory inter-neurons and excitatory neurons should result in dynamics which shuts down all but the most active excitatory neurons. How to wire these neurons and how they adjust their synaptic strength to each other to get this kind of dynamics is not trivial, though.
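The kind of dynamics described here has a classic minimal example: MAXNET, a lateral-inhibition winner-take-all network (swapped in for illustration; nobody in the thread proposed it). Each unit is inhibited in proportion to the summed activity of all other units, and no unit ever sorts or compares itself explicitly. It only covers k = 1 and uses a fixed, hand-set inhibition weight `eps`, so it does not answer the harder question of how such inhibitory weights would be wired or learned. `maxnet` and its parameters are hypothetical names.

```python
import numpy as np

def maxnet(x, eps=None, max_iter=10_000):
    """MAXNET lateral-inhibition winner-take-all (k = 1).

    Each unit is inhibited by eps times the summed activity of all the
    other units; activity is clipped at zero. With 0 < eps < 1/n and a
    unique maximum, all units except the largest eventually fall silent.
    """
    a = np.clip(np.asarray(x, dtype=np.float64), 0.0, None)
    n = a.size
    if eps is None:
        eps = 0.5 / n  # safely below the 1/n bound
    for _ in range(max_iter):
        if np.count_nonzero(a) <= 1:
            break  # competition has ended
        others = a.sum() - a  # summed activity of every other unit
        a = np.maximum(a - eps * others, 0.0)
    return a
```

The gap between the strongest unit and every other unit grows by a factor of (1 + eps) per step, so close competitors take longer to resolve; that slowdown is one reason such networks are usually tuned carefully.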
Possibly more than you want to know:
I see the overall system running at 10 Hz (alpha rate), with the interaction/competition between L2/3 hex-discipline cells running at 40 Hz (gamma rate). I see each round of competition as a rolling average 8x8 window through the activated cells, looking for the max integer value. This max is passed to the local inhibition field as the start of the next pass.
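One reading of this step, assuming "looking for the max integer value" means a sliding-window maximum over the grid of integer activation levels: each cell gets the largest level found in the 8x8 window around it, and that map would seed the local inhibition field for the next pass. `window_max`, the window alignment and the zero padding at the sheet edge are all assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_max(levels, size=8):
    """Largest integer level in the size x size window around every cell.

    The grid is zero-padded (no activity outside the sheet) so the output
    has the same shape as the input. For an even window size the window
    cannot be centered, so it extends one cell further down/right.
    """
    levels = np.asarray(levels)
    before = (size - 1) // 2
    after = size - 1 - before
    padded = np.pad(levels, ((before, after), (before, after)),
                    mode="constant", constant_values=0)
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return windows.max(axis=(-2, -1))
```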
This is where the balance between activation and inhibition is performed: the local cell computation is total activation (times some scaling factor) minus local inhibition (times some scaling factor). Total activation is the direct activation plus the mutual excitation. And yes - the scaling between the direct activation and the mutual interactions will have to be tuned to get the correct behavior.
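Written out, the per-cell balance is a_i = max(0, w_d·d_i + w_m·m_i − w_I·I_i), where d_i is the direct activation, m_i the mutual excitation and I_i the local inhibition. A minimal sketch; the weights are placeholder values that would need exactly the tuning mentioned above, the rectification at zero is an assumption not stated in the post, and `cell_drive` is a hypothetical name.

```python
import numpy as np

def cell_drive(direct, mutual, inhibition, w_direct=1.0, w_mutual=0.5, w_inhib=1.0):
    """Net drive per cell: scaled total excitation minus scaled local inhibition.

    direct     -- feedforward activation of each cell
    mutual     -- long-range mutual excitation (e.g. number of active hex partners)
    inhibition -- local inhibition field (e.g. the previous pass's window max)
    The default weights are placeholders, not tuned values.
    """
    excitation = (w_direct * np.asarray(direct, dtype=float)
                  + w_mutual * np.asarray(mutual, dtype=float))
    return np.maximum(excitation - w_inhib * np.asarray(inhibition, dtype=float), 0.0)

# Same direct input; only the cell with four active partners survives inhibition.
print(cell_drive([1.0, 1.0], [0, 4], [1.5, 1.5]))
```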
Determining the winner mini-columns can be done in linear time, which means that a CPU can do it very quickly. I think you will find that the performance bottlenecks in an HTM are wherever it interacts with synapses, since they are the most numerous component in the HTM.
The Python/NumPy function which does this is numpy.argpartition.
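For reference, a minimal sketch of linear-time k-winner selection with `numpy.argpartition`; `k_winners` is a hypothetical helper and the overlap values are made up for illustration.

```python
import numpy as np

def k_winners(overlaps, k):
    """Indices of the k columns with the largest overlap, without a full sort.

    np.argpartition only guarantees that the k largest values end up in the
    last k positions; their order among themselves is unspecified, and ties
    at the boundary are broken arbitrarily.
    """
    overlaps = np.asarray(overlaps)
    return np.argpartition(overlaps, -k)[-k:]

overlaps = np.array([3, 9, 1, 7, 5, 8])
active = np.zeros(overlaps.size, dtype=bool)
active[k_winners(overlaps, 3)] = True  # columns 1, 3 and 5 become active
```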
This is an interesting article; it explains how neurons compete. It describes two stages of the competition: in the first, neurons integrate information and emit a few spikes to indicate their approximate input level. In the second stage, the winning neurons emit many spikes, which clearly indicates their winner status (as opposed to the single spikes, which don’t necessarily indicate winners), and also incur disproportionately more inhibition, which ends the competition.
These results are highly compatible with HTM theory and provide a framework for thinking about how all of the different pieces fit together.
They also discuss the shortcomings of prior models of neural competition and explain how their approach overcomes them, which lends credence to a concept that had been much discussed but lacked a strong and practical theory.
Reading The HTM Spatial Pooler: a neocortical algorithm for online sparse distributed coding, it appears that even though local inhibition is used, it still selects the k winners within the inhibition radius.
I was wondering if someone could explain the benefit of doing so instead of using a smaller inhibition radius, scaled by 1/sqrt(k) (so that the area of inhibition scales by 1/k).
Cheers.
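To make the comparison concrete, here is a 1D sketch of local k-winner inhibition in the spirit of the Spatial Pooler's local inhibition: a column wins if fewer than k columns in its neighborhood have a larger overlap. It is a simplified illustration, not NuPIC's code; `local_k_winners`, the 1D layout, the rule that zero-overlap columns never win, and the lack of wraparound are assumptions.

```python
import numpy as np

def local_k_winners(overlaps, radius, k):
    """Local inhibition in the spirit of the Spatial Pooler (1D, no wraparound).

    A column wins if fewer than k other columns within `radius` of it have a
    strictly larger overlap. Ties can let more than k columns win in one
    neighborhood.
    """
    overlaps = np.asarray(overlaps, dtype=float)
    n = overlaps.size
    active = np.zeros(n, dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        # Strict '>' means the column never counts itself as stronger.
        n_stronger = np.count_nonzero(overlaps[lo:hi] > overlaps[i])
        active[i] = overlaps[i] > 0 and n_stronger < k
    return active

# An adjacent strong pair: k = 2 over a wider neighborhood lets both win,
# while k = 1 over a narrower one keeps only the stronger column.
print(local_k_winners([0, 9, 8, 0], radius=2, k=2))
print(local_k_winners([0, 9, 8, 0], radius=1, k=1))
```

This hints at one practical difference between the two schemes: with k > 1, winners are free to cluster where the input is strong, whereas one winner per smaller neighborhood forces them apart and tends toward evenly spread encodings.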
@xortdsc, I’m not a math person, but from a higher-level perspective, what you describe would always generate evenly distributed encodings. I remember having a related discussion on this thread. That was about a somewhat different implementation than what you are describing, but I think the points @blue2 raised there still apply.
Capacity-wise this might be true, but on the other hand, as mentioned quite a lot in HTM theory, each “bit” should have semantic meaning, so each bit should encode some semantically meaningful entity (e.g. in V1 it could be a tiny fraction of a curve with some orientation and location). Also, looking at feature maps (e.g. orientation maps in V1), there is clearly a structure in which the anatomical distances between feature detectors relate to the correlation of those features in the input. So something not-so-random is going on in how these feature detectors are arranged.
I am on a trip, so I brought a stack of papers to catch up on. This one is very interesting to me, as it discusses one of the areas I am trying to work out in the model I am building.
Note that this paper does not reflect the actual inter-neuron distribution. It offers two variables, strength of inhibition and number of neurons, using a single inhibitory inter-neuron as the inhibition field.
For the local area of 250 or so mini-columns that are within the arbor connection area of a single mini-column, there is a large distributed collection of inhibitory inter-neurons. While the paper may be correct as far as it goes, it fails to address the known in-vivo model.
While I don’t think it directly addresses the question about very simple k-winner systems it does speak to the assumption that a pool of inhibition balances N number of neurons in a winner take all competition. You are correct - with the fragile system described in the paper, tuning for performance is critical.
The paper you posted does allude to the possibility of non-linear inhibition and the possible advantages that could offer. This paper I am posting goes further and discusses the observation that the inhibition is learned as the neuron is trained. You may wish to consider what nature is doing before deciding that a simple inhibition field is all there is for a winner-take-all calculation.