Are you saying that you don’t think the mini-column structures in the cortex are functionally relevant?
Interesting. Have you done some thought experiments on how transitions between different objects might be handled in the system? I haven’t gotten deep enough into it yet to be able to imagine this particular scenario playing out.
Indeed.
From my understanding of what is absolutely necessary to give rise to the desired functionality in a pooling layer or an inference layer, minicolumns are not actually necessary, per se. What is necessary, however (and what minicolumn-based software models provide), is that predicted patterns cause fewer cells to become active than unpredicted ones. This is key.
It is known that this functionality is inherent to part of the temporal memory algorithm; however, the TM achieves it in a very “unnatural” way, one which I have corrected in the code that I will be posting soon.
The interesting thing I found is that if you abandon the minicolumnar scheme and completely redo the TM and SP to simply support the aforementioned principle (that predicted patterns cause fewer cells to become active than unpredicted ones), specifically using depolarization values and competitive ion uptake inhibition, the code collapsed quite beautifully, and it also opened up the possibility of multithreading.
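To make the principle concrete, here is a minimal Python sketch of how depolarization plus competitive inhibition can yield a sparser response to predicted input without any minicolumns. It assumes that depolarized cells reach threshold first and that their firing suppresses the other feedforward-driven cells; the function name and arguments are hypothetical, and this is not the code mentioned in the post.

```python
def activate(candidates, depolarized):
    """Return the set of cells that fire this timestep (hypothetical model).

    candidates:  cells driven by the current feedforward input
    depolarized: cells put into a predictive (depolarized) state by context
    """
    # Depolarized candidates reach firing threshold first and inhibit the
    # remaining candidates, so a predicted input yields a smaller active set.
    predicted = candidates & depolarized
    if predicted:
        return predicted
    # No candidate was depolarized: nothing fires early to inhibit the others,
    # so every feedforward-driven cell fires (the analogue of bursting).
    return set(candidates)


ff = {1, 2, 3, 4, 5, 6}
activate(ff, set())       # unpredicted: all six cells fire
activate(ff, {2, 5, 9})   # predicted: only cells 2 and 5 fire
```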
I have also designed the code's architecture to support inter-layer connections, which will be invaluable for testing the hypotheses in my theory.
Ah, so you mean mini-columns are not necessary for pooling. I think that is consistent with current HTM SMI theory. The output layer does not have mini-column structures. But the input layer does have them.
I would add one more important element to this thought: I think it is also key for the collection of unpredicted active cells to encompass the smaller collections of cells that would have been active if the pattern had been predicted. This is essentially what the property of minicolumn bursting handles.
The input layer in HTM SMI, however, is what @dwrrehman refers to as an “inference layer”, so I believe he means more generally that minicolumns are not a required feature.
Well, actually, I’d go further and say that the inference layer does not even necessarily need them, which I know is quite contrary to current HTM theory.
I do have a conceptual reasoning for this, which I will try to write up and post at some point once it’s evident that my code actually works (specifically the inference layer code, which won’t be modeling columns). I wouldn’t take what I think as fact until I prove it to you, though; HTM might still be right.
Absolutely correct on both comments, @Paul_Lamb.
…For those familiar with my theory to a degree, the reason I refrain from calling what I call an “inference layer” (HTM TM + HTM SM inference layer) an “input layer” is that, according to my theory, there actually exists an “inference layer”-type thing that is anatomically an output layer. What’s more, what Numenta calls an “output layer” (or what I call a “pooling layer”) actually appears anatomically as an input layer.
This is the reasoning for breaking away from the “input layer” and “output layer” scheme and moving toward the “inference layer” and “pooling layer” nomenclature.
This concept is starting to resonate with me a little… it opens up some new opportunities for concurrency, as well as the ability to use one common learning algorithm for proximal, distal, and apical connections. Like you said, it definitely warrants further testing to verify functional equivalence, though. I’ll explore this some myself as well.
Exactly! I’m glad it’s making sense. You are completely right about there being a single algorithmic process for the proximal, distal, and apical dendritic inputs. In fact, the code that will be up soon contains very little new code (almost the same algorithm is run for the proximal, distal, and apical inputs). The same goes for the dendritic learning portion of the algorithm: there is very little unique code there as well.
Considering that this same code is used for any inference layer, or with a slight modification any pooling layer, it’s sort of stunning that, according to the theory, so little unique code can actually simulate the entire neocortex. I’m still having a hard time believing it, TBH.
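A rough sketch of what "one algorithm for all dendritic inputs" could look like: a single segment type whose overlap and learning routines are identical whether the segment is proximal, distal, or apical, with only the presynaptic source differing. The class name, the 0.5 connected threshold, and the increment/decrement values are assumptions for illustration, not taken from the code described above.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A dendritic segment; the same code serves proximal, distal and apical input."""
    kind: str  # "proximal", "distal" or "apical"; a label only, never branched on
    permanences: dict = field(default_factory=dict)  # presynaptic id -> permanence
    connected_threshold: float = 0.5                 # assumed value

    def overlap(self, active_inputs):
        """Number of connected synapses whose presynaptic source is active."""
        return sum(1 for pre, p in self.permanences.items()
                   if p >= self.connected_threshold and pre in active_inputs)

    def learn(self, active_inputs, inc=0.05, dec=0.02):
        """Hebbian update: reinforce synapses from active sources, weaken the rest."""
        for pre, p in list(self.permanences.items()):
            delta = inc if pre in active_inputs else -dec
            self.permanences[pre] = min(1.0, max(0.0, p + delta))


proximal = Segment("proximal", {0: 0.6, 1: 0.4})  # sources: input bits
distal = Segment("distal", {7: 0.55, 8: 0.7})     # sources: cells at the previous timestep
apical = Segment("apical", {20: 0.9})             # sources: cells in another layer
for seg, sources in ((proximal, {0}), (distal, {8}), (apical, {20})):
    seg.learn(sources)
```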
So does each cell have its own unique proximal connections? I view the minicolumn structure as having something in common with convolutional networks. You want to be able to re-use input features where possible, to reduce the number of parameters you need to learn and to help with generalization. In convnets, you can trust that features at any location in an image are equally likely to appear anywhere else, so you can share weights across the image. In HTM, you can trust that the same input features can appear in many temporal contexts, so you can share proximal segments across those contexts. I expect generalization issues if each temporal context needs to learn its own feedforward features.
The way I am considering implementing this is that when input cells activate, the top 2% * 32 best-connected cells are selected via winner-take-all. If fewer than that number are connected, a random sample of the cells with the fewest proximal synapses is chosen to make up that number. Then, if any of the selected cells were in a predictive state, those activate and inhibit the others (predicted active); otherwise they all activate (bursting).
I’m still going through the thought process about how best to select cells for enforcing/degrading proximal synapses in the learning phase.
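This selection rule can be sketched as follows. It is a minimal sketch, not Paul's implementation: `select_and_activate`, its arguments, and the representation of cells as integer ids are assumptions, and "best connected" is taken to mean the count of connected proximal synapses onto currently active input cells.

```python
import random


def select_and_activate(overlaps, synapse_counts, predictive, k, rng):
    """Hypothetical per-cell selection followed by predictive inhibition.

    overlaps:       cell -> connected proximal synapses onto active inputs
    synapse_counts: cell -> total proximal synapses (must cover every cell)
    predictive:     cells currently in a predictive state
    k:              number of cells to select (2% * 32 in the post)
    """
    # Winner-take-all: the k best-connected cells among those with any overlap.
    connected = sorted((c for c, o in overlaps.items() if o > 0),
                       key=lambda c: overlaps[c], reverse=True)
    selected = connected[:k]
    if len(selected) < k:
        # Fill the remaining slots with cells that have the fewest proximal synapses.
        taken = set(selected)
        rest = [c for c in synapse_counts if c not in taken]
        rng.shuffle(rest)                           # random order for tie breaking...
        rest.sort(key=lambda c: synapse_counts[c])  # ...kept by the stable sort
        selected += rest[:k - len(selected)]
    predicted = {c for c in selected if c in predictive}
    # Predicted cells inhibit the rest; otherwise every selected cell fires (bursting).
    return predicted if predicted else set(selected)
```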
Well, not quite, I’d say. Each cell, like in the SP, has a potential pool from which it can derive many of its inputs. This models the fact that when many thalamic axons come into layer 4, for instance, they do not connect to one and only one column or cell; they connect to a plethora of cells, possibly not all in the same minicolumn. If this is the case for every axon (which it probably is), then every single cell in the layer doesn’t just get one thalamic input but possibly dozens or more, and the resultant feedforward pattern coming into the layer at each timestep will simply be the set of cells that have the largest number of active axonal inputs from the thalamus.
Keeping this in mind, and regarding your point about this method’s generality, I would, oddly enough, say it is as general as or more general than the original minicolumnar method. I’ll explain.
If you imagine a set of cells (let’s say layer 4) that gets input from, say, the thalamus and creates a proximal feedforward pattern in the way described in the first paragraph, then we can be certain that there is at least a minimal number of proximal inputs to some cells; otherwise we wouldn’t even have been able to create the current proximal feedforward pattern in the first place. Keep in mind that there are no grown distal connections anywhere in the layer at this point (we are assuming the layer has just been “born” and everything it sees is unpredicted). Given this minimal number of proximal connections, let’s now imagine the cellular representation for a new, specific temporal context in which this proximal feedforward pattern might occur.
Accept the idea that @Paul_Lamb correctly noted earlier: for a given proximal FF sensory input, any predicted version (the input in a temporal or allocentric locative context in which it might occur) is always a strict subset of the original set, where the original set is the unpredicted version. Moreover, the original set of proximally active cells is actually the sensory input in every possible context, which, if you remember, is at the very least parallel to the idea of bursting minicolumns; beyond that, it is probably functionally equivalent to, or more biologically correct than, minicolumnar bursting.
Since any predicted feature is a strict subset of an unpredicted feature, you do not have to grow any new proximal dendrites to represent a specific temporal context, as long as it truly is a subset of an already proximally recognized but unpredicted feature.
I hope that makes a little more sense!
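The wiring described above can be sketched as follows, assuming each thalamic axon contacts a fixed number of randomly chosen cells (`fanout`) and that the feedforward pattern is the top-`k` cells by count of active axonal inputs. Permanences and learning are left out, and all names and numbers are hypothetical.

```python
import random


def build_potential_pools(n_axons, n_cells, fanout, rng):
    """Wire each axon to `fanout` distinct, randomly chosen cells.

    Returns a dict mapping each cell to the set of axons it receives.
    """
    pools = {cell: set() for cell in range(n_cells)}
    for axon in range(n_axons):
        for cell in rng.sample(range(n_cells), fanout):
            pools[cell].add(axon)
    return pools


def feedforward_pattern(pools, active_axons, k):
    """Return the k cells receiving the most active axonal inputs."""
    counts = {cell: len(axons & active_axons) for cell, axons in pools.items()}
    ranked = sorted(counts, key=lambda c: counts[c], reverse=True)
    return set(ranked[:k])


rng = random.Random(42)
# 200 axons x 100 contacts each / 500 cells = 40 axonal inputs per cell on average.
pools = build_potential_pools(n_axons=200, n_cells=500, fanout=100, rng=rng)
active_axons = set(rng.sample(range(200), 20))
pattern = feedforward_pattern(pools, active_axons, k=25)
```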
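A small sketch of the subset argument, assuming a new context is represented by drawing cells only from the set that is proximally active for the unpredicted input. Because every representation comes from that set, the existing feedforward wiring already drives it, and only distal (or apical) segments would need to grow. `learn_context` and the sizes are hypothetical; random sampling stands in for whatever cell-selection rule the real code uses.

```python
import random


def learn_context(unpredicted, n_cells, rng):
    """Choose the cells that represent the feature in one new context.

    Cells are drawn only from the proximally active (unpredicted) set, so the
    existing feedforward wiring already drives them; only distal or apical
    segments would have to grow to predict them.
    """
    return frozenset(rng.sample(sorted(unpredicted), n_cells))


rng = random.Random(7)
feature = frozenset(range(100, 164))  # cells proximally active for this input
contexts = {name: learn_context(feature, 8, rng)
            for name in ("after A", "after B", "at location X")}
# Every context representation is a strict subset of the unpredicted set.
assert all(rep < feature for rep in contexts.values())
```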
I just moved all the posts related to @dwrrehman’s theory not using minicolumns, because I think this is an interesting side-discussion and it is clogging up the previous topic. I hope that is ok.
Please continue to discuss how this would be implemented without minicolumns here.
I should clarify, I’m not really talking about how “general” the method is. Stop me if I’m misinterpreting, but I’m talking about generalization, which is an entirely different thing. Generalization refers to the ability of a system to transfer what it’s learned to new situations.
So the concern is, in terms of the re-use of parameters that occurs in convnets and minicolumn proximal segment sharing, a method like what you describe seems like it will require more experience to learn. If I can encounter the same feedforward input in 10 different contexts, that get represented by 10 different cells, then surely I need to see that input 10 times as often to get as much training data (for learning feedforward features) as I would if I had a single proximal segment that was trained on all of those contexts. So a given amount of experience may not generalize to new situations as well without sharing segments, because the feature detectors have each been trained on less data.
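The data-splitting concern can be illustrated with a toy count, assuming each presentation of the same feedforward input occurs in one of several equally likely contexts. With a shared proximal segment, every presentation is a training step for one detector; with per-context cells, each detector sees only its share. The function is hypothetical and counts updates only; it does not model learning quality.

```python
import random
from collections import Counter


def updates_per_detector(n_presentations, n_contexts, shared, rng):
    """Count feedforward-learning updates per detector for one repeated input.

    shared=True:  one proximal segment (e.g. per minicolumn) learns in every context.
    shared=False: each context has its own cell, which learns only in that context.
    """
    updates = Counter()
    for _ in range(n_presentations):
        context = rng.randrange(n_contexts)  # contexts assumed equally likely
        updates[0 if shared else context] += 1
    return updates


rng = random.Random(0)
print(updates_per_detector(1000, 10, shared=True, rng=rng))   # Counter({0: 1000})
print(updates_per_detector(1000, 10, shared=False, rng=rng))  # roughly 100 per detector
```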
This actually touches on the uncertainty I alluded to earlier, about which cells to apply enforcement/degradation to their proximal synapses. Do you only train the predicted active cells, or should you also train the inhibited cells (which would have become active if there had been no predictions)? I’m playing around with these ideas now in my own implementation.
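Both options in this question can be expressed with one flag. The sketch below assumes a vanilla-SP-style permanence update; `train_proximal`, the nested-dict storage, and the increment/decrement values are hypothetical, and it does not settle which option is better.

```python
def train_proximal(permanences, active_inputs, predicted_active, inhibited,
                   train_inhibited, inc=0.05, dec=0.02):
    """Apply an SP-style permanence update to the chosen cells.

    permanences:      cell -> {input bit -> permanence}, updated in place
    predicted_active: cells that fired because they were predicted
    inhibited:        cells the input selected but the predicted cells suppressed
    train_inhibited:  False trains predicted cells only; True trains both groups
    Returns the set of cells that were trained.
    """
    learners = set(predicted_active)
    if train_inhibited:
        learners |= set(inhibited)
    for cell in learners:
        syn = permanences[cell]
        for bit, p in syn.items():
            syn[bit] = min(1.0, p + inc) if bit in active_inputs else max(0.0, p - dec)
    return learners
```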
@Paul_Lamb @dwrrehman Did this end up going anywhere?
Bumping for progress on this?
The way I ended up implementing this was:
1. Randomly connect each cell to the input space with a proximal segment (in the same way as one would connect minicolumns in vanilla HTM).
2. Process inputs with the SP algorithm (except the proximal segments are attached to cells rather than to minicolumns). This step selects cells for proximal learning and subsequent operations, but does not activate them yet. The number of cells selected should be set at 2% * total cells * 32.
3. Apply the SP learning to the proximal segments (same as vanilla SP, except the proximal segments are attached to cells rather than to minicolumns).
4. If any of the cells selected in step 2 were predicted from activity in the previous timestep, activate them and select them for TM learning.
5. If none of the cells selected in step 2 were predicted, activate all of them (equivalent to bursting), but do not select any for TM learning yet.
6. If fewer than 2% * total cells have been selected for TM learning, select up to that number for TM learning from the cells selected in step 2, and activate them if not already. This step first prioritizes cells best connected to activity in the previous timestep above a configurable threshold, followed by cells with the fewest distal segments (using a random tie breaker).
7. Proceed with the normal TM algorithm.
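The steps above can be sketched roughly as follows. This is a heavily simplified illustration, not Paul's implementation: the class and parameter names, the permanence values, the distal thresholds, and the step-7 stand-in (each learner simply grows a new distal segment onto a sample of the previous activity instead of running the full TM) are all assumptions.

```python
import random


class CellLayer:
    """Hypothetical sketch of the steps above: proximal segments live on cells,
    not minicolumns. Learning rates and thresholds are illustrative guesses."""

    def __init__(self, n_inputs, n_cells, sparsity=0.02, burst_factor=32,
                 potential_pct=0.5, connected=0.5, seed=0):
        self.rng = random.Random(seed)
        self.connected = connected
        self.n_learn = max(1, round(sparsity * n_cells))           # 2% * total cells
        self.n_select = min(n_cells, self.n_learn * burst_factor)  # 2% * total cells * 32
        # Step 1: one proximal segment per cell, randomly connected to the input space.
        pool = int(potential_pct * n_inputs)
        self.proximal = [{i: self.rng.uniform(0.3, 0.7)
                          for i in self.rng.sample(range(n_inputs), pool)}
                         for _ in range(n_cells)]
        self.distal = [[] for _ in range(n_cells)]  # each distal segment: set of presynaptic cells
        self.prev_active = set()

    def _overlap(self, cell, inputs):
        return sum(1 for i, p in self.proximal[cell].items()
                   if p >= self.connected and i in inputs)

    def _best_match(self, cell):
        return max((len(seg & self.prev_active) for seg in self.distal[cell]), default=0)

    def step(self, inputs, act_threshold=3, match_threshold=1, max_new=8):
        n = len(self.proximal)
        # Step 2: SP-style selection over cells; selected cells are not active yet.
        scores = {c: self._overlap(c, inputs) for c in range(n)}
        selected = sorted(range(n), key=lambda c: scores[c], reverse=True)[:self.n_select]
        # Step 3: SP learning on the selected cells' proximal segments.
        for c in selected:
            syn = self.proximal[c]
            for i, p in syn.items():
                syn[i] = min(1.0, p + 0.05) if i in inputs else max(0.0, p - 0.02)
        # Step 4: selected cells predicted by the previous activity activate and learn.
        learners = [c for c in selected if self._best_match(c) >= act_threshold]
        # Step 5: if none were predicted, all selected cells activate (bursting).
        active = set(learners) if learners else set(selected)
        # Step 6: top up TM learners from the selected cells: best distal match at or
        # above match_threshold first, then fewest distal segments, random tie breaker.
        if len(learners) < self.n_learn:
            chosen = set(learners)

            def rank(c):
                best = self._best_match(c)
                if best >= match_threshold:
                    return (0, -best, self.rng.random())
                return (1, len(self.distal[c]), self.rng.random())

            extra = sorted((c for c in selected if c not in chosen), key=rank)
            learners += extra[:self.n_learn - len(learners)]
            active.update(learners)
        # Step 7: stand-in for the normal TM; each learner grows one distal
        # segment onto a sample of the previously active cells.
        if self.prev_active:
            k = min(max_new, len(self.prev_active))
            for c in learners:
                self.distal[c].append(set(self.rng.sample(sorted(self.prev_active), k)))
        self.prev_active = active
        return active


layer = CellLayer(n_inputs=100, n_cells=500, sparsity=0.01, burst_factor=8, seed=1)
burst = layer.step(set(range(10)))  # no prior activity, so all n_select cells fire
```

With no prior activity every selected cell fires; once a selected cell has a distal segment matching the previous activity, it wins and only about `n_learn` cells activate, which is where the 32X cost of the selection step becomes visible.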
This implementation, while an interesting exercise, does not really provide any benefit over the vanilla HTM SP algorithm, and it costs 32X the resources. It can be extended to operate as a temporal pooling algorithm, which we discussed with @dwrrehman in another thread.