HTM is so much more than show tunes

We talk a lot about temporal sequences, but HTM is also good for episodic memory, procedural memory, and structural memory (I think the last one is discussed under allocentric sensing). In episodic memory we have a sequence made of bigger chunks than second-by-second input. We go to a restaurant (à la Roger Schank), we are greeted by the maître d', we are led to a table, we sit, we are offered menus… There may be time gaps and varying time periods, but it is a sequence well suited to HTMs. Likewise, a procedure to bake a cake, with sequential steps of varying length and with pauses and wait times, is well suited to HTM. I would like to see broader use of HTM. I am told HTMs were mentioned at the AI meeting in Finland this week. Finland says it plans to lead the world in AI, along with China, Saudi Arabia, etc…

I can see HTM recording transitions from one state to another.

abc

What should I expect with the dead spaces between these transitions?
If something is longer than some limited number of transitions, does HTM see these as different?
abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbc
abbbbbbbbbbbbbbbbbbbbbbbbc
What is the upper limit on sequence length before HTM shrugs and says “many”?

Anyone?

The case of a single repeating input is an interesting topic that I have explored a bit. Not being a math person, I can’t give any concrete numbers, but I can describe the behavior. For simplicity, let’s assume the parameters are set so that a transition can be learned in one timestep. There are a few different possibilities, depending on the implementation.

Implementation where a cell can be both predictive and active

The first three timesteps burst. This is followed by winners from T=2 being correctly predicted, then the winners from T=3 being correctly predicted. This is then followed by a burst.

This burst puts two cells in each minicolumn into predictive state (the winners from T=2 and T=3), which both become active in the next timestep. The winners from T=3 cause the winners chosen during that burst (T=6) to become predictive, so the winners from T=3 and the new winners become active, followed by the new winners alone, followed by a burst. This burst now puts three cells in each minicolumn into predictive state, which all become active in the next timestep. Then three, then two, then one, then a burst. Then 4, 4, 3, 2, 1, burst; 5, 5, 4, 3, 2, 1, burst; 6, 6, 5, 4, 3, 2, 1, burst.

This pattern continues until all cells in the minicolumns become both predictive and active; the number of active cells per minicolumn then decreases by one each timestep until only one remains, followed by a final burst. This burst results in a random sampling of cells growing a second distal connection to the winners from the final element in the sequence.

This is the point where things get interesting. Every column has now hooked up a random point in the sequence to the final element in the sequence. There is essentially a reshuffling of representations as each minicolumn, at a different timestep, reaches the end of its predictions and bursts, further saturating the connections. Eventually enough connections are formed that there is no more bursting, and every cell in the minicolumns predicts every other cell in the minicolumns on every timestep.
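The count pattern for this first implementation can be reproduced with a toy model. This is only a sketch under strong simplifying assumptions, not an HTM implementation and not the code behind the description above: every minicolumn is assumed to behave identically, so one column stands in for all of them; each bursting winner is a fresh, previously unused cell that grows a single segment onto the previous winners; permanences are not modeled and correctly predicted cells grow no new synapses; and a step that bursts is reported as 0. The function name `repeating_input_counts` and the `learn_before_predict` flag are hypothetical. By default, predictions for the next step are computed before this step's learning; setting the flag reverses the two, which is the variation discussed later in the thread. The model stops at the first burst that finds no unused cell, so the saturation phase is not modeled.

```python
def repeating_input_counts(cells_per_column, steps, learn_before_predict=False):
    """Active cells per minicolumn at each timestep while one input repeats.

    Hypothetical toy model, not NuPIC or any other HTM library:
    - one minicolumn stands in for all of them (all assumed to behave the same);
    - a transition is learned in one timestep; permanences are not modeled;
    - only a bursting winner grows a segment, onto the previous winners;
    - a bursting step is reported as 0.
    Stops at the first burst that finds no unused cell (saturation not modeled).
    """
    segments = {}        # cell -> frozenset of presynaptic cells on its one segment
    predicted = set()    # cells depolarized for the current timestep
    prev_winners = set()
    next_free = 0        # lowest cell index that has not been a burst winner yet
    counts = []

    def depolarize(active):
        # A segment is active when all of its presynaptic cells are active.
        return {cell for cell, pre in segments.items() if pre <= active}

    for _ in range(steps):
        bursting = not predicted
        if bursting:
            counts.append(0)
            if next_free == cells_per_column:
                break                                # no fresh cell left for a winner
            active = set(range(cells_per_column))    # every cell in the column fires
            winners = {next_free}
            next_free += 1
        else:
            active = set(predicted)                  # correctly predicted cells fire
            winners = set(predicted)
            counts.append(len(active))

        # Learning: a bursting winner grows one segment onto the previous winners.
        new_segments = ({w: frozenset(prev_winners) for w in winners}
                        if bursting and prev_winners else {})
        if learn_before_predict:
            segments.update(new_segments)
            predicted = depolarize(active)
        else:
            predicted = depolarize(active)
            segments.update(new_segments)
        prev_winners = winners
    return counts


print(repeating_input_counts(6, 100))
# [0, 0, 0, 1, 1, 0, 2, 2, 1, 0, 3, 3, 2, 1, 0, 4, 4, 3, 2, 1, 0]
print(repeating_input_counts(6, 100, learn_before_predict=True))
# [0, 0, 1, 0, 2, 1, 0, 3, 2, 1, 0, 4, 3, 2, 1, 0, 5, 4, 3, 2, 1, 0]
```

In this model the segments form a simple chain (each new winner listens to the previous one), which is why every burst re-predicts the whole chain and the active count then shrinks by one per step. With the order reversed, the first correctly predicted step arrives one burst earlier and the repeated count at the start of each run disappears.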

Implementation like the first one, but a cell can grow additional connections to previously active cells

This is an incorrect implementation, but it hints at a possible way to stabilize without saturating the minicolumns when an input repeats. The first three timesteps burst. This is followed by winners from T=2 being correctly predicted, then the winners from T=3 being correctly predicted. This is then followed by a burst.

This burst puts two cells in each minicolumn into predictive state (the winners from T=2 and T=3), which both become active in the next timestep. The winners from T=3 cause the winners chosen during that burst (T=6) to become predictive, so the winners from T=3 and the new winners become active. On these second activations, the same cells grow additional connections to the cells active in the previous timestep, so within very few timesteps the representation stabilizes on two cells per minicolumn predicting themselves every timestep. One of the two cells in each minicolumn is connected more weakly than the other, so a learning rule could potentially be applied to thin the representation down to one cell per minicolumn (an implementation that I am exploring).

Implementation where a cell cannot be both predictive and active

Every timestep bursts, growing more and more connections until completely saturated, such that every cell in the minicolumns would predict every other cell in the minicolumns. However, because the same input is repeating, and every cell is active every timestep due to bursting, none of them are ever predictive.

I would expect that, in a biologically inspired solution, habituation would occur, both short term and longer term.

Just a thought.

Are you doing the depolarization before the synapse adaptation in each iteration?

Right. I recall a suggestion that the two steps are reversed in some implementations, in which case you see two bursts instead of three (winners from T=2 become predictive at T=2 then active at T=3). It is a slight variation, but I would expect to see essentially the same pattern (3,3,2,1,burst becomes 4,3,2,1,burst, etc)

Yes, that is a variation I have considered, but haven’t gotten around to implementing yet. Since HTM uses discrete time, one simple implementation would be to enforce a configurable maximum number of timesteps a cell can be active, then a minimum number of timesteps the cell needs to “rest” before it can activate again.
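The rest rule just described could be sketched as a filter applied to whichever cells would otherwise become active on a timestep. This is a hypothetical sketch: the class `Refractory`, its `step` method, and the parameters `max_active` and `min_rest` are illustrative names, not part of NuPIC or any other HTM library, and where exactly to hook such a filter into the temporal memory's compute cycle is left open.

```python
class Refractory:
    """Hypothetical activity limit for cells in a discrete-time HTM layer (not NuPIC code).

    max_active: consecutive timesteps a cell may stay active before it must rest.
    min_rest:   timesteps the cell is then barred from activating again.
    """

    def __init__(self, max_active, min_rest):
        self.max_active = max_active
        self.min_rest = min_rest
        self.streak = {}  # cell -> consecutive active timesteps so far
        self.rest = {}    # cell -> remaining timesteps it must rest (always > 0)

    def step(self, candidates):
        """Return the cells allowed to activate this timestep and update the counters."""
        blocked = set(self.rest)
        active = set(candidates) - blocked
        # One rest timestep has now passed for every blocked cell.
        for cell in blocked:
            self.rest[cell] -= 1
            if self.rest[cell] == 0:
                del self.rest[cell]
        # A cell that is not active this timestep loses its streak.
        for cell in list(self.streak):
            if cell not in active:
                del self.streak[cell]
        # Extend each active cell's streak; once it reaches max_active, send it to rest.
        for cell in active:
            self.streak[cell] = self.streak.get(cell, 0) + 1
            if self.streak[cell] >= self.max_active:
                del self.streak[cell]
                if self.min_rest > 0:
                    self.rest[cell] = self.min_rest
        return active


# A cell that the same input would drive on every timestep:
fatigue = Refractory(max_active=3, min_rest=2)
print([7 in fatigue.step({7}) for _ in range(8)])
# [True, True, True, False, False, True, True, True]
```

With a repeating input, the cell that would otherwise fire on every step instead fires for `max_active` steps and then drops out for `min_rest` steps, which is one crude way to approximate the short-term habituation mentioned above.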

This is a visualization of what happens with repeating inputs in vanilla HTM, in case it is hard to understand verbally. I demonstrated two different repeating inputs, starting from full bursting. You can see the input at the bottom in black and white. The same thing happens if you repeat ABABABABAB or ABCABCABCABCABC. I think this is one of the reasons behind manual resets. Also, if these patterns are mixed with each other, it does not get this bad.

Thanks, I was thinking of adding a visualization myself. One thing your visualization highlighted that I hadn’t anticipated: when synapse adaptation is done before depolarization, after that last burst (once every cell has grown a distal segment and one random cell per minicolumn has grown two), all cells are predicted on every timestep. That is an interesting property that would probably reduce the memory footprint.

I posed my question in response to the OP. He describes real-world scenarios with long variable periods of the same uninteresting pattern being presented between periods of interesting events.

Hence the long streams of bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb.
I suspected the same issues that Paul described above when I first worked through HTM theory.
I thought it would learn b, then bb, then bbb, then bbbb, then bbbbb, on and on.

@Bitking Oh, it wasn’t directed at you; I did not mean that you did not understand it. Sorry if it came across like that. We have had the repeating sequence discussion on the forum multiple times before, and I wanted to visualize it for everyone. This discussion just appeared at the right time for it.

Did you mean depolarization before adaptation there? Can you reword it? I did not quite understand.

I meant in an implementation where a cell can be predicted by the synapses it just grew (as in the visualization, since it starts with two bursts instead of three). This behaves differently than my implementation, and is probably more memory efficient in the case of a repeating input.

Yes, I would agree with that too, since it requires fewer bursts, fewer iterations, and fewer new synapses if newly created segments can be used for prediction immediately. By the way, NuPIC is configured like this too (depolarization after adaptation in the same iteration).