@Tachion, I am not a neuroscientist, so I’m coming at this from a more practical perspective (I’ll leave your question for the other talented folks on the forum). The “repeating inputs” behavior is a problem today in the practical use of TM. I’m just exploring some possible solutions (they may be completely off the mark for how nature has addressed the problem). Hopefully something useful can be distilled from these experiments that will help advance HTM down the road (at the very least, by demonstrating what not to do).
They are similar, but not exactly the same. One could say there are essentially two classes of patterns that a system like the brain should be able to model:
- Patterns that are caused by our own interactions with the world
- Patterns that are independent of our actions
SMI relates to the first one, and TM to the second. The difference between the two is the source of the “location” signal. In classic TM theory, the location of a feature is derived from the feature which preceded it (whose location is derived from the feature before it, and the one before that, and the one before that… etc.)
The problem with this strategy is that it doesn’t allow for a sequence to ever end. This is not an issue for long sequences which don’t continuously repeat. But it becomes a problem for short sequences that continuously repeat. For example, in the repeating sequence ABABABABABAB… the 6th B is in a different location than the 3rd B which is in a different location than the 1,000th B. Each time the sequence repeats, it grows a little longer and essentially new “locations” are added.
This is different from SMI, where the locations are able to wrap around. If I move my finger around the lip of a cup, I eventually end up back at the location I started. I don’t end up in new locations each time around. This is the property of path integration, which grid cells bring to the equation.
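To make the contrast concrete, here is a minimal Python sketch. It is a toy, not TM or grid-cell code: the function names and the use of a step index as a stand-in for "context" are assumptions made purely for illustration.

```python
def tm_context_labels(sequence):
    """Toy stand-in for high-order context: each element is tagged with
    how many steps preceded it, so no tag ever repeats."""
    return [(symbol, step) for step, symbol in enumerate(sequence)]


def wrapped_location_labels(sequence, period):
    """Toy stand-in for path integration: the location wraps around
    after `period` steps, like a finger circling the lip of a cup."""
    return [(symbol, step % period) for step, symbol in enumerate(sequence)]


seq = "AB" * 4  # ABABABAB

chained = tm_context_labels(seq)
wrapped = wrapped_location_labels(seq, period=2)

# The chained labels keep growing with every repetition...
print(len(set(chained)))
# ...while the wrapped labels stay fixed at one per position in the loop.
print(len(set(wrapped)))
```

With the chained labels, every extra repetition adds new "locations"; with the wrapped labels, the count stays at the loop length however long the input runs.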
I should point out that TBT currently also relies on the “output layer” concept from the Columns paper. If you question the validity of this concept, then you must also question TBT itself (at least in the theory’s current form). From the Frameworks paper, section “Rethinking Hierarchy, the Thousand Brains Theory of Intelligence”:
The reference to “Lewis et al. 2018” is the Columns Plus paper, which goes into more technical detail on object recognition and grid cells. That paper discusses the method of voting between many sensory patches:
This “additional population of cells” is of course a reference to the output layer described in the Columns paper. Hopefully this demonstrates that what I have described above isn’t a major deviation from TBT, but rather borrows a concept from it.
I like the mechanism you introduce here and awesome presentation, thanks! I wanted to ask about learning representations in the output layer. In this example we presume to know about ‘ABCD’ and ‘XBCY’ as repeated sequences, but what if we didn’t know to look for these? Or what if another sequence appears like ‘DAYC’, is there a mechanism for forming a new representation bit in the output layer?
It seems to me that with this feature, the flexibility of this pooling mechanism could really help out the TM overall, and especially in sequence classification. Great work!
Yes, I have been working on a mechanism which utilizes hex grid formation. I’ll go into it in more depth on a separate thread (didn’t want to muddy the waters on this thread, since the focus here is to explore strategies for addressing the repeating inputs problem)
A potential solution for this is to just sample the burst, i.e. only a random subset of the non-predicted columns can burst. If the size of that subset is bigger than the sparsity, you will still connect the sequence. In my implementation, that approach seems to be working just fine.
@vpuente Could you elaborate? I’m not sure what you mean by “sample the burst” in this context.
If the column is not predicted, just flip a random coin (with a low sampling rate) to decide whether the burst should happen or not.
if (randomFloat(1.0f) < sampleRate) // 1>> SampleRate >= sparsity (~0.02)
The key is to build the sequence A->A progressively rather than all at once. If the sample rate is low, the number of cells shared between A and A’ will be pretty high. In a few steps you will be back in A (A->A’->A’’->A) and your fixed-signal issue is solved. If another symbol appears in the sequence, you will fail. The eager approach has the potential problem of wasting synapses/cells on those constant signals. With my lazy approach, you will fail at the point where the AAAA sequence ends. In general, lazy bursting seems to be beneficial in terms of synaptic load.
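A minimal Python sketch of the sampled burst, for anyone who wants to try it. This is not the implementation referred to above: `columns_to_burst`, `sample_rate`, and the list/set representation of columns are all hypothetical names chosen for illustration.

```python
import random


def columns_to_burst(active_columns, predicted_columns, sample_rate, rng):
    """Return the unpredicted active columns that burst on this step.

    Each unpredicted column bursts with probability `sample_rate`;
    the others are simply skipped this time around.
    """
    unpredicted = [c for c in active_columns if c not in predicted_columns]
    return [c for c in unpredicted if rng.random() < sample_rate]


rng = random.Random(42)       # seeded only to make runs repeatable
active = list(range(40))      # 40 active minicolumns, none predicted
bursting = columns_to_burst(active, set(), sample_rate=0.1, rng=rng)
print(len(bursting), "of", len(active), "unpredicted columns burst")
```

Setting `sample_rate=1.0` recovers the usual behavior where every unpredicted column bursts, which makes it easy to compare the two in an existing TM loop.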
So if the random flip decides that a burst should not happen, then what happens to the state of cells in that minicolumn? Presumably no cells in that minicolumn become active?
In your example, is A in a specific context (similar to A’ and A’’), or does it mean A without context (i.e. bursting minicolumns)? Sorry if this is a dumb question. I’m having trouble visualizing how this statement follows if you chose to skip some of the minicolumns during the bursting step:
Do you happen to have some code I could look at which uses this strategy?
Exactly: you just ignore the burst in that minicolumn. Nothing should happen there this time. Sooner or later the random flip should decide to burst that minicolumn.
The key is to see the sequence multiple times before learning the whole context. In a sense, you are building the connections between the cells of two values in the sequence one by one, not all at once.
My code is really convoluted and too nasty to share. I can’t even understand it myself. Hopefully that will change in the future.
In any case, you can test this in yours just by inserting that `if` in the burst of the minicolumn. It should learn the sequence (a bit slower).
BTW: Coincidentally, bursts in biological systems are also stochastically produced (although the meaning is not necessarily the same as in HTM).
Got it. I think this strategy might also lead to ambiguity between the C in ABCD vs the C in XBCY, in particular if they were both learned around the same time (i.e. not thoroughly trained on one before training on the other). However, this is a bit tricky to visualize, so I’ll have to experiment to see whether or not this intuition is correct.
You are correct. But LTD should fix that later.
Anyway, note that LTD in HTM is not really “bio” faithful. From a biological perspective, you should decrement permanence if the presynaptic cell fires and the postsynaptic cell doesn’t. HTM leaves “stale” synapses because it only punishes those synapses if the postsynaptic segment reaches the threshold. I’m using a more “bio” approach (doing that requires deeper changes in the base architecture of the code).
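The difference between the two punishment rules can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a segment is modeled as a dict from presynaptic cell id to permanence, overlap is counted over all synapses regardless of permanence, and the function names and parameters (`threshold`, `dec`) are hypothetical.

```python
def punish_htm_style(segment, prev_active, cell_active, threshold, dec):
    """Decrement only when the segment reached threshold (i.e. it
    predicted) and the postsynaptic cell then stayed silent."""
    overlap = sum(1 for pre in segment if pre in prev_active)
    if overlap >= threshold and not cell_active:
        for pre in segment:
            if pre in prev_active:
                segment[pre] = max(0.0, segment[pre] - dec)


def punish_bio_style(segment, prev_active, cell_active, dec):
    """Decrement whenever a presynaptic cell fired and the
    postsynaptic cell did not, threshold or no threshold."""
    if not cell_active:
        for pre in segment:
            if pre in prev_active:
                segment[pre] = max(0.0, segment[pre] - dec)


stale = {1: 0.5, 2: 0.5, 3: 0.5}   # presynaptic cell id -> permanence
prev_active = {1}                  # only one input fired: below threshold

htm = dict(stale)
punish_htm_style(htm, prev_active, cell_active=False, threshold=2, dec=0.1)

bio = dict(stale)
punish_bio_style(bio, prev_active, cell_active=False, dec=0.1)

print(htm)  # synapse 1 is left untouched (the "stale" case)
print(bio)  # synapse 1 has been weakened
```

The sub-threshold case is exactly where the two rules diverge: the first leaves the synapse alone indefinitely, the second keeps eroding it.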
I will definitely need to test this to get a firmer understanding. Initially, it seems to me that if the representation for C in ABCD contains some of the same cells as the C in XBCY, then as long as both of the sequences reappear often enough that the global decay process doesn’t cause one of them to be forgotten, the ambiguous connections for C in both contexts would be reinforced.
This means that there are different numbers of winnerCells for different bursted inputs right? With 40 active columns, a totally unpredicted input would normally yield 40 bursts and 40 winnerCells chosen.
As I understand the process of:
burst --> choose winner cell --> build new segment
guarantees that a column caught off guard always learns. With an ignored burst, the column doesn’t learn, right? Whether or not it’s costly long term, I’m just curious if I have that correct.
You build the new segments with synapses to all the winner cells (one in each bursting minicolumn) if there is no segment with synapses to the previous activations. If there is a “partially” good segment there, you should grow new synapses in it. It’s like creating the connections between presynaptic and postsynaptic cells one by one (not all at once).
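As a rough Python sketch of that "grow an existing segment or create a new one" step (an illustration only, not the actual code: `learn_on_winner`, `min_overlap`, and `initial_perm` are hypothetical names, and a segment is again modeled as a dict from presynaptic cell to permanence):

```python
def learn_on_winner(segments, prev_winners, min_overlap, initial_perm):
    """Grow the best partially matching segment, or create a new one.

    `segments` is the list of segments on one winner cell; it is
    modified in place and the segment that learned is returned.
    """
    best, best_overlap = None, 0
    for seg in segments:
        overlap = sum(1 for pre in prev_winners if pre in seg)
        if overlap > best_overlap:
            best, best_overlap = seg, overlap

    # No segment is "partially good" enough: start a fresh one.
    if best is None or best_overlap < min_overlap:
        best = {}
        segments.append(best)

    # Add synapses to any previous winner cells not yet connected.
    for pre in prev_winners:
        best.setdefault(pre, initial_perm)
    return best


segments = []
learn_on_winner(segments, {1, 2, 3}, min_overlap=2, initial_perm=0.21)
learn_on_winner(segments, {2, 3, 4}, min_overlap=2, initial_perm=0.21)
print(len(segments), sorted(segments[0]))  # one segment, grown to 4 synapses
```

The second call overlaps the first segment on two cells, so it extends that segment instead of allocating another one.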
So all the prior winnerCells are included as presynaptic cells on the new segments? This sounds like upping the ‘maxNewSynapseCount’ parameter (or something similar). So instead of all columns bursting and learning on a subset of prior winnerCells, it’s some columns bursting and learning on the full set of prior winnerCells, correct?
I think that is not the same. maxNewSynapseCount serves to subsample the previously active cells. Here you are also “subsampling” segment creation. I think maxNewSynapseCount models a sort of “spatial sampling”, while this is a sort of “temporal sampling”.
What is the problem with just using backtracking?
Is it just because it is not biologically inspired? Or is there any disadvantage to using backtracking?
IMO, there is nothing in particular wrong with this approach from a practical perspective. I’m mainly interested in exploring problems like this one from different angles and getting other folks’ perspectives. There is a lot of background knowledge in the community here, from neuroscience to computer science to evolution to electrical engineering.
I see =) thanks =)