I like the mechanism you introduce here and awesome presentation, thanks! I wanted to ask about learning representations in the output layer. In this example we presume to know about ‘ABCD’ and ‘XBCY’ as repeated sequences, but what if we didn’t know to look for these? Or what if another sequence appears like ‘DAYC’, is there a mechanism for forming a new representation bit in the output layer?
It seems to me that with this feature, the flexibility of this pooling mechanism could really help out the TM overall, especially in sequence classification. Great work!
Yes, I have been working on a mechanism which utilizes hex grid formation. I’ll go into it in more depth on a separate thread (didn’t want to muddy the waters on this thread, since the focus here is to explore strategies for addressing the repeating inputs problem)
A potential solution for this is to just sample the burst, i.e. only a random subset of the non-predicted columns is allowed to burst. If the size of that subset is bigger than the sparsity, you will still connect the sequence. In my implementation, that approach seems to work just fine.
The key is to build the sequence A->A not all at once but progressively. If the sample rate is low, the number of cells shared between A and A’ will be pretty high. In a few steps you will be back in A (A->A’->A’’->A) and your fixed-signal issue is solved. If another symbol appears in the sequence, you will fail. The eager approach has the potential problem of wasting synapses/cells on those constant signals. With my lazy way, you only fail at the point where the AAAA sequence ends. In general, lazy bursting seems to be beneficial in terms of synaptic load.
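Roughly, in Python (a minimal sketch; names like `BURST_SAMPLE_RATE` are made up for illustration, not from any particular codebase):

```python
import random

rng = random.Random(0)
BURST_SAMPLE_RATE = 0.3  # made-up parameter: chance an unpredicted column bursts

def sample_bursting_columns(unpredicted_columns):
    """Lazy bursting: only a random subset of the unpredicted minicolumns
    actually burst this step; the rest stay silent and get their chance
    on a later repetition of the sequence."""
    return [col for col in unpredicted_columns
            if rng.random() < BURST_SAMPLE_RATE]

# If the sampled subset is still large enough to connect a distal segment,
# repeated presentations walk the representation A -> A' -> A'' back onto A.
```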
So if the random flip decides that a burst should not happen, then what happens to the state of cells in that minicolumn? Presumably no cells in that minicolumn become active?
In your example, is A in a specific context (similar to A’ and A’’), or does it mean A without context (i.e. bursting minicolumns)? Sorry if this is a dumb question. I’m having trouble visualizing how this statement follows if you choose to skip some of the minicolumns during the bursting step:
Do you happen to have some code I could look at which uses this strategy?
Exactly: you just ignore the burst in that minicolumn. Nothing should happen there this time. Sooner or later the random flip should decide to burst that minicolumn.
The key is that you have to see the sequence multiple times before the whole context is learned. In a way, you are building the connections between the cells of two consecutive values in the sequence one by one, not all at once.
My code is really convoluted and too nasty to share (I can hardly understand it myself). Hopefully that will change in the future.
In any case, you can test it in yours just by inserting that `if` into the minicolumn burst step, as in the sketch below. It should learn the sequence (a bit more slowly).
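Something like this (a sketch only; `burst` stands in for whatever your burst routine already does):

```python
import random

def burst(column_cells):
    # Stand-in for your existing burst routine: activate all cells in the
    # minicolumn, choose a winner cell, and learn as usual.
    return list(column_cells)

def activate_unpredicted_column(column_cells, rng, sample_rate=0.3):
    """The extra `if`: an unpredicted minicolumn only bursts when the
    random flip says so; otherwise none of its cells become active."""
    if rng.random() < sample_rate:
        return burst(column_cells)
    return []  # skipped this step; it may burst on a later repetition

active = activate_unpredicted_column(range(32), random.Random(42))
```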
BTW: coincidentally, bursts in biological systems are also produced stochastically (although their meaning is not necessarily the same as in HTM).
Got it. I think this strategy might also lead to ambiguity between the C in ABCD vs the C in XBCY, in particular if they were both learned around the same time (i.e. without thoroughly training on one before training on the other). However, this is a bit tricky to visualize, so I’ll have to experiment to see whether or not this intuition is correct.
Anyway, note that in HTM, LTD is not really “bio” faithful. From a biological perspective, you should decrement the permanence whenever the presynaptic cell fires and the postsynaptic one doesn’t. HTM leaves “stale” synapses because it only punishes those synapses when the postsynaptic segment reaches the threshold. I’m using a more “bio” way (doing that requires deeper changes in the base architecture of the code).
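Roughly, the rule I mean looks like this (a simplified sketch with toy data structures, not my actual code):

```python
from dataclasses import dataclass, field

@dataclass
class Synapse:
    presynaptic_cell: int
    permanence: float

@dataclass
class Segment:
    cell: int                                    # postsynaptic cell it grows from
    synapses: list = field(default_factory=list)

def bio_ltd(segments, prev_active_cells, active_cells, decrement=0.01):
    """Weaken every synapse whose presynaptic cell fired last step while
    its postsynaptic cell did not fire this step, with no requirement
    that the segment reached its activation threshold."""
    for seg in segments:
        if seg.cell in active_cells:
            continue  # postsynaptic cell fired: no LTD on its segments
        for syn in seg.synapses:
            if syn.presynaptic_cell in prev_active_cells:
                syn.permanence = max(0.0, syn.permanence - decrement)
```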
I will definitely need to test this to get a firmer understanding. Initially, it seems to me that if the representation for C in ABCD contains some of the same cells as the C in XBCY, then as long as both sequences reappear often enough that the global decay process doesn’t cause one of them to be forgotten, the ambiguous connections for C in both contexts would be reinforced.
This means that there are different numbers of winnerCells for different bursting inputs, right? With 40 active columns, a totally unpredicted input would normally yield 40 bursts and 40 winnerCells chosen.
As I understand it, the process of burst → choose winner cell → build new segment
guarantees that a column caught off guard always learns. With an ignored burst, the column doesn’t learn, right? Whether or not it’s costly long term, I’m just curious if I have that correct.
You build the new segments with synapses to all the winner cells (one in each bursting minicolumn) if there is no segment with synapses to the previous activations. If there is a “partially” good segment there, you should grow new synapses in it. It’s like creating the connections between presynaptic and postsynaptic cells one by one (not all at once).
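In pseudo-Python, something like this (simplified structures; a segment here is just a dict from presynaptic cell to permanence):

```python
def learn_on_winner_cell(segments, prev_winner_cells,
                         match_threshold=3, initial_permanence=0.21):
    """Find the segment that best overlaps the previous winner cells; if
    none overlaps enough, start a new (empty) segment. Then grow synapses
    to every previous winner cell that isn't already connected, so the
    full connection builds up across repetitions, not in one shot."""
    prev = set(prev_winner_cells)
    best = max(segments, key=lambda seg: len(prev & seg.keys()), default=None)
    if best is None or len(prev & best.keys()) < match_threshold:
        best = {}
        segments.append(best)
    for pre in prev_winner_cells:
        best.setdefault(pre, initial_permanence)
    return best
```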
So all the prior winnerCells are included as presynaptic cells on the new segments? This sounds like upping the ‘maxNewSynapseCount’ parameter (or something similar). So instead of all columns bursting and learning on a subset of prior winnerCells, it’s some columns bursting and learning on the full set of prior winnerCells, correct?
I think that is not the same. maxNewSynapseCount serves to subsample the previously active cells. Here you are also “subsampling” segment creation. I think maxNewSynapseCount models a sort of “spatial sampling”, while this is some sort of “temporal sampling”.
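A toy contrast of the two (names and numbers are illustrative only):

```python
import random

rng = random.Random(42)
prev_winner_cells = list(range(40))    # toy: one winner per previously active column
unpredicted_columns = list(range(40))  # toy: 40 columns caught off guard
MAX_NEW_SYNAPSE_COUNT = 20             # standard TM parameter
BURST_SAMPLE_RATE = 0.3                # made-up lazy-burst parameter

# "Spatial" sampling: every learning column connects to only a subsample
# of the previous winner cells (maxNewSynapseCount caps it).
spatial = rng.sample(prev_winner_cells,
                     min(MAX_NEW_SYNAPSE_COUNT, len(prev_winner_cells)))

# "Temporal" sampling: each learning column connects to the full previous
# winner set, but only a random subset of columns learns on a given step.
temporal = [col for col in unpredicted_columns
            if rng.random() < BURST_SAMPLE_RATE]
```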
What is the problem with just using backtracking?
Is it just because it is not biologically inspired, or is there some disadvantage to using backtracking?
IMO, there is nothing in particular wrong with this approach from a practical perspective. I’m mainly interested in exploring problems like this one from different angles and getting other folks’ perspectives. There is a lot of background knowledge in the community here, from neuroscience to computer science to evolution to electrical engineering.