Exploring the "Repeating Inputs" problem

I have experimented with it a bit more. The sequence AABB… runs into a slight problem: because I select winner cells at random, one of the columns picked the same cell on both occasions, which cancels the learned synapses.

Thanks for your interest! I think I’ll work on thousand brains theory in the meantime.

Also, it seems that higher-order memory only works when perm_inc < perm_dec. Otherwise the TM never learns the sub-sequence.

Interesting! I didn’t realize this was being discussed.

We encountered this problem internally when we were developing our own implementation. We had the same issues you had with repeating sequences and with the fix actually creating more problems. The challenge is to make sure the system can learn both repeating sequences and non-repeating sequences within the same memory and input time-series. If you can do this, you’ve solved the problem.

It took a long time to get my head around it and figure out exactly what the problem was. But we did come up with a solution that seems to work and I think is biologically plausible. Once we release our project, I will share it with you (company policy).

But you guys are really close to the solution!

This is a very interesting problem and I’ve had some thoughts about it as well. I have also read the associated thread, which showed some neat visualizations.

@Paul_Lamb
If I may ask, how does a TM system save and discard a context? Is it correct to say that the TM learns context-free sequences? I ask because I haven’t yet found any TM mechanism that saves context. It may look context-aware because it follows sequences, but when I analyze the algorithm it doesn’t really save any context; it saves/remembers connections. The repeating inputs problem seems to emerge because the TM is not context-aware. By context-aware I mean that some state is saved and removed at a certain level. An analogy would be function scoping in programming languages, where a stack saves the context of each scope; the TM should, IMO, have a similar (maybe simpler) but biologically plausible mechanism. I hope I didn’t go off-topic, but I think this is important to discuss: the TM is always said to use the context of this and that, yet I don’t think (maybe I missed it) there is a detailed enough explanation of how.

I think of distal and apical signals as the source of context. In the case of TM, the distal signal creates a context based on what a given feature followed. Therefore one could say activity in a TM layer indicates not only a feature, but also every other feature that came before it. What is missing is a higher-level context that “labels” a sequence (or section of a sequence).

The theory I have been working on is that TM could benefit from an “output layer” as described for SMI. Activity in this layer would be more stable and would represent a sequence (or parts of a sequence). It would be the source of the apical signal, providing the piece of context which is missing from raw TM by itself. It would also allow multiple TM layers to work together to vote on that higher-level context (which I think aligns well with the thousand brains theory).

The application of an output layer specifically to assist with the “repeating inputs” problem happens to be what I have been working on today. I’ll go into it in more detail in my next post, so stay tuned.

Thanks. Could you possibly point me to any documentation regarding this theory that you are implementing right now? Is this the Temporal Pooling?

This is what I was referring to regarding context. It has to be labelled, and labelling means something is recognized and stored. I think the TM does not have this.

Additional thoughts on the TM in relation to this problem:

  • The selection of a predictive cell is a bit wasteful in that it does not take into consideration the effort previously spent learning a sequence. For example, it simply picks a different set of active cells and discards the previous ones, just like in the first example presented in this problem. The previous cells that correctly predicted the next input were easily ignored. In the SP world, representations do not get discarded easily unless they were only seen a few times.

  • In the current implementation of the TM, the recognition of a sequence is equated explicitly with specific cell(s). IOW, it assumes it can accurately pinpoint the right cells in a particular active column, which seems biologically unrealistic. IMO, when a cell is activated it shouldn’t mean that this exact cell is the one that predicts the next input (e.g. by deriving the predictive cells from it); it should only mean that some cell(s) in this active column, close to the currently active cell(s), may predict the next input. This would increase the chance of choosing the previously active cells that predicted the correct sequence; in the problem presented, for example, it might have used the cells that have distal connections to B.

Yes, I have been referring to it as that (the name may not stick however, since this term has been used in HTM’s past to refer to older variations of the TM algorithm, so reusing it could cause confusion).

For this post, I’ll leave out important elements of the theory, such as how activity in this layer is driven by activity in the TM layer, as well as how the long-distance connections work for voting. I’ll be posting a more comprehensive explanation on a separate thread, but I want to stay on topic for this thread.

To keep things simple and on-topic, let’s assume an object representation has already been settled on through some external means that I will not describe for now. Let’s also not worry about any connections from the TM layer up to the output layer. Instead, we’ll only focus on the apical signal from the output layer down to the TM layer, and how that can be used to address the “repeating inputs” problem we are discussing on this thread.

The basic strategy is to introduce an output/object layer. Activity in this layer will remain stable throughout a sequence for as long as it repeats. The cells in this layer will provide an apical signal to cells in the TM layer. Thus, a cell in the TM layer may become predictive due to the distal signal (from other cells in the TM layer), or it might become predictive due to the apical signal (from cells in the output layer). A cell may also become predictive due to both distal and apical signals.

Each timestep, the winner cells in the TM layer will grow apical connections to the active cells in the output layer, using the exact same learning algorithm as TM (except the cells are growing apical segments rather than distal ones). One could use distal segments for this rather than apical ones (if there were some reason that it was more biologically feasible) – the only requirement is to separate which layer the input is coming from.

Any time a minicolumn is activated, any cell(s) predicted by both apical and distal signals will become the winner. If none are predicted by both signals, then any cell(s) predicted by the distal signal will become the winner. If none are predicted by the distal signal, then any cell(s) predicted by the apical signal will become the winner. And of course, if no cells are predicted, the minicolumn will burst.
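The priority order above can be written down as a small function. This is only a minimal sketch, not the implementation from this thread: the name `select_winners` and the use of plain sets of cell ids are assumptions made for illustration, and picking a winner inside a bursting minicolumn is left to the ordinary TM logic.

```python
def select_winners(cells, distal_predicted, apical_predicted):
    """Pick winner cells for one active minicolumn.

    cells: cell ids belonging to the minicolumn (hypothetical representation)
    distal_predicted / apical_predicted: sets of cell ids predicted by each signal
    Returns (winners, burst).
    """
    # 1. Cells predicted by both signals take priority.
    both = [c for c in cells if c in distal_predicted and c in apical_predicted]
    if both:
        return both, False
    # 2. Otherwise, cells predicted by the distal signal alone.
    distal = [c for c in cells if c in distal_predicted]
    if distal:
        return distal, False
    # 3. Otherwise, cells predicted by the apical signal alone.
    apical = [c for c in cells if c in apical_predicted]
    if apical:
        return apical, False
    # 4. Nothing predicted: the minicolumn bursts and the caller
    #    chooses a winner as in ordinary TM.
    return [], True


print(select_winners([0, 1, 2, 3], {1, 2}, {2, 3}))  # cell 2 is predicted by both
```

Returning the burst flag separately keeps the tie-breaking rule independent of whatever winner-selection policy is used for bursting minicolumns.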

To make things easier to visualize, I’ll be using a tiny TM layer which has 4 cells per minicolumn, and one minicolumn per input. I’ll also be using a single cell in the output layer to represent each object. Obviously in practice, there would be larger dimensions involved. This is just to describe the strategy in the simplest possible manner.

For these visualizations, I am assuming the parameters are set such that the max new synapse count in the TM layer is greater than the activation threshold (one-shot learning), and for the output layer, less than the activation threshold (such that a sequence must be seen twice for it to become connected). I don’t yet know what the best general learning rate should be, but for the below example, “two shot learning” is sufficient to explain the concept without requiring me to draw out too many iterations.
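To make the “one-shot” versus “two-shot” distinction concrete, here is a rough sketch that counts how many presentations a segment needs before it holds enough synapses to reach the activation threshold. The function name and the numbers (20, 8, 13, 40) are hypothetical, and the model is deliberately simplified: new synapses are assumed to start out connected, nothing is punished, and presynaptic cells are taken in index order rather than sampled at random.

```python
def passes_until_active(max_new_synapses, activation_threshold, num_active_inputs):
    """Count presentations needed before a segment reaches the activation threshold.

    Simplifying assumptions (not from the thread): every new synapse starts
    out connected, nothing is ever punished, and presynaptic cells are
    picked in index order rather than at random.
    """
    if max_new_synapses <= 0:
        raise ValueError("max_new_synapses must be positive")
    synapses = set()
    passes = 0
    while len(synapses) < activation_threshold:
        # Only active input cells that are not yet connected can be added.
        candidates = [c for c in range(num_active_inputs) if c not in synapses]
        if not candidates:
            raise ValueError("not enough active input cells to reach the threshold")
        synapses.update(candidates[:max_new_synapses])
        passes += 1
    return passes


# Hypothetical parameter values, chosen only to show the two regimes.
print(passes_until_active(20, 13, 40))  # max new synapses > threshold: 1 pass
print(passes_until_active(8, 13, 40))   # max new synapses < threshold: 2 passes
```

Under these assumptions, two-shot learning needs the max new synapse count to be below the threshold but at least half of it.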

A quick explanation of the symbols and colors:

[image: legend of symbols and colors]

Let’s begin with learning the repeating sequence A-B-C-D, using this strategy.

The first time through the sequence A-B-C-D, the minicolumns burst, winners are chosen, and distal connections are formed as normal. Additionally, the winner cells also grow apical connections with the active cells in the output layer representing object “ABCD”. Note that the learning rate is set low for the apical connections, so after this pass they are connected below the activation threshold.

The second time through the sequence, the first input bursts, and a second representation A’’ is chosen as winner. This one grows both a distal connection to D’, as well as an apical connection to object “ABCD”. This second time through the sequence, B’, C’, and D’ grow additional apical synapses with object “ABCD”, and are now connected above the activation threshold. Note that there are two potential representations for “A” at this point, but neither is connected to object “ABCD” above the activation threshold.

Normally, this would be the point where the “repeating inputs” problem kicks in, and the “B” minicolumns would burst this time through the sequence. However, B’ is now predictive due to the apical signal, so this bursting will not happen. Note that A’’ was predicted distally, which allowed it to become the winner and grow additional apical connections to object “ABCD”. Thus, A’ has lost the competition. You can now repeat the sequence as many times as you want, and it will cycle through the same four representations in the TM layer. Notice that TM has (distally) learned the sequence B’-C’-D’-A’’, and it is the apical connection which bridges the gap between A’’ and B’.

So what happens when we introduce a new sequence X-B-C-Y? Will this strategy lead to ambiguity like the other strategy? Let’s find out.

The first time through, you see the expected behavior of TM. The previously learned connection between B’ and C’ is activated by bursting in step 2, and a new distal connection between C’ and Y’ is formed in step 4. As in the last scenario, apical connections are formed to object “XBCY”, and they are initially below the activation threshold.

The second time through the sequence, a second representation X’’ is chosen (like we saw for A’’ in the previous example). B’ is activated again, so it grows additional apical connections with object “XBCY”, and is now above the activation threshold. Because B’ was not connected to anything in the previous iteration, this time through the sequence the C minicolumns burst, and a second representation C’’ is chosen. Because of the bursting, Y’ is predicted and becomes active, growing additional apical connections to object “XBCY”. The representation for X’’ is now predicted.

The third time through the sequence, the apical connections are reinforced like we saw in the previous example (they all now breach the activation threshold), and bursting has stopped. X’ and C’ have lost the competition to X’’ and C’’. You can now repeat the sequence as many times as you want, and it will cycle through the same four representations in the TM layer. There is no ambiguity with the four representations in sequence A-B-C-D. Interestingly, in this case TM has (distally) learned two sequences Y’-X’’ and B’-C’’, and it is the apical connection which bridges the two gaps.

Notice also that in the process of learning X-B-C-Y, a stray distal connection between C’ and Y’ was formed. Inputting A-B-C… will now distally predict both D’ and Y’. However, D’ will be predicted both distally and apically, so this could be used by any classification logic to weigh D’ more heavily than Y’ as the next likely input.
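One possible way a classifier could use this is sketched below. It is an illustration only: the function `rank_next_inputs`, the `apical_bonus` weight, and the string cell labels are assumptions, not part of the strategy as described. Distally predicted cells are the candidates, and any candidate that is also apically predicted gets a higher score.

```python
def rank_next_inputs(distal_predicted, apical_predicted, apical_bonus=1.0):
    """Rank distally predicted cells, boosting those that are also apically predicted.

    The scoring scheme (1.0 per distal prediction plus a hypothetical
    apical_bonus) is an assumption chosen for illustration.
    """
    scores = {
        cell: 1.0 + (apical_bonus if cell in apical_predicted else 0.0)
        for cell in distal_predicted
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# After A-B-C with object "ABCD" active: C' distally predicts D' and Y',
# while the object apically predicts A'', B', C' and D'.
print(rank_next_inputs({"D'", "Y'"}, {"A''", "B'", "C'", "D'"}))
# [("D'", 2.0), ("Y'", 1.0)]
```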

I’ll provide pseudo-code and a working demo in a future post (still working out some kinks), but wanted to post a basic introduction to the idea for anyone who is curious. Let me know if something isn’t clear.

Awesome, I really appreciate you providing an explanation, not to mention a very clear one. I also like to read and understand algorithms such as this.

The algorithm looks simple and effective, though it’s tricky for me to predict its behavior at larger scales. I like that it takes advantage of a more stable layer to compensate for the TM’s indecisiveness.

I have some questions, though you need not answer them, as I don’t want to disrupt your focus on this implementation.

  1. Is it correct to say that the output layer retains its predictive cells for a certain number of steps? I noticed there is more than one active apical connection at a certain step (2nd row, 3rd col). If that’s correct, when can these predictive cells (driven by the output layer) change or disappear?

  2. On sequence ABC (row 3, 2nd col), how did the cell in column B become active when at least 3 predictive cells (row 3, 1st col) already existed?

Yes, I forgot to point that out. This theory assumes that a cell can remain in a predictive or active state for extended periods of time. Note that this is also implied in the Columns paper, where representations in an output layer bias activity in an input layer. So hopefully this isn’t deviating too far into Crazy Land :slight_smile:

It became active because the minicolumn it was in became active (for example via Spatial Pooling, etc). When a minicolumn becomes active, any cells in it which are predictive become active and inhibit the others.

Can you confirm, without any doubt, from some neuroscience paper that this is actually happening?

I get that you’re referencing the Columns paper, but at this point this is second-hand knowledge that might be wrong, no matter how much I dislike that fact.

EDIT: It has come to my attention that “repeating patterns” can constitute objects in a TBT sense. Think of them as a set of notes that constitute a song, or a bunch of features like those of a coffee cup.

If A-B-C-A-B-C == foo, X-Y-Z-X-Y-Z == bar, and {foo, bar} are objects, isn’t this exactly the type of problem that TBT solved?

EDIT2: Furthermore, if there are repeating patterns of repeating objects, you can from there construct new objects as new sets, on hierarchies of cortical columns that you “connected” to them. My point is that answering the question/problem of this thread is premature.

@Tachion, I am not a neuroscientist, so I’m coming at this from a more practical perspective (I’ll leave your question for the other talented folks on the forum). The “repeating inputs” behavior is a problem today in the practical use of TM. I’m just exploring some possible solutions (they may be completely off the mark for how nature has addressed the problem). Hopefully something useful can be distilled from these experiments that will help advance HTM down the road (if at the very least to demonstrate what not to do :grin:)

They are similar, but not exactly the same. One could say there are essentially two classes of patterns that a system like the brain should be able to model:

  1. Patterns that are caused by our own interactions with the world
  2. Patterns that are independent of our actions

SMI relates to the first one, and TM to the second. The difference between the two is the source of the “location” signal. In classic TM theory, the location of a feature is derived from the feature which preceded it (whose location is derived from the feature before it, and the one before that, and the one before that… etc.)

The problem with this strategy is that it doesn’t allow for a sequence to ever end. This is not an issue for long sequences which don’t continuously repeat. But it becomes a problem for short sequences that continuously repeat. For example, in the repeating sequence ABABABABABAB… the 6th B is in a different location than the 3rd B which is in a different location than the 1,000th B. Each time the sequence repeats, it grows a little longer and essentially new “locations” are added.

This is different from SMI, where the locations are able to wrap around. If I move my finger around the lip of a cup, I eventually end up back at the location I started. I don’t end up in new locations each time around. This is the property of path integration, which grid cells bring to the equation.
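The contrast can be shown with a toy sketch. It is not a model of grid cells or of TM internals; both function names are hypothetical. The “history” context simply stands in for everything that preceded an input, and the wrapped location is a plain modulo over a known period, which is the assumption doing all the work here.

```python
def history_contexts(sequence, symbol):
    """Classic-TM-style context: each occurrence is identified by how much
    history preceded it, so no two occurrences share a context."""
    return [i for i, s in enumerate(sequence) if s == symbol]


def wrapped_locations(sequence, symbol, period):
    """Path-integration-style location: positions wrap around after
    `period` steps (the period is assumed to be known here)."""
    return [i % period for i, s in enumerate(sequence) if s == symbol]


seq = "AB" * 6  # ABABABABABAB
print(history_contexts(seq, "B"))      # [1, 3, 5, 7, 9, 11]: a new "location" every repeat
print(wrapped_locations(seq, "B", 2))  # [1, 1, 1, 1, 1, 1]: always the same location
```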

I should point out that TBT currently also relies on the “output layer” concept from the Columns paper. If you question the validity of this concept, then you must also question TBT itself (at least in the theory’s current form). From the Frameworks paper, section “Rethinking Hierarchy, the Thousand Brains Theory of Intelligence”:

The reference to “Lewis et al. 2018” is the Columns Plus paper, which goes into more technical detail on object recognition and grid cells. That paper discusses the method of voting between many sensory patches:

This “additional population of cells” is of course a reference to the output layer described in the Columns paper. Hopefully this demonstrates that what I have described above isn’t a major deviation from TBT, but rather borrows a concept from it.

I like the mechanism you introduce here and awesome presentation, thanks! I wanted to ask about learning representations in the output layer. In this example we presume to know about ‘ABCD’ and ‘XBCY’ as repeated sequences, but what if we didn’t know to look for these? Or what if another sequence appears like ‘DAYC’, is there a mechanism for forming a new representation bit in the output layer?

It seems to me that the flexibility of this pooling mechanism could really help out the TM overall, especially in sequence classification. Great work!

Yes, I have been working on a mechanism which utilizes hex grid formation. I’ll go into it in more depth on a separate thread (I didn’t want to muddy the waters on this thread, since the focus here is to explore strategies for addressing the repeating inputs problem).

A potential solution for this is to just sample the burst, i.e. only a random subset of the non-predicted columns can burst. If the size of that set is bigger than the sparsity, you will still connect the sequence. In my implementation, that approach seems to be working just fine.

@vpuente Could you elaborate? I’m not sure what you mean by “sample the burst” in this context.

@Paul_Lamb

If the column is not predicted, just flip a random coin (with a low sampling rate) to decide whether the burst happens or not.

Something like:

// sampleRate should satisfy: 1 >> sampleRate >= sparsity (~0.02)
if (randomFloat(1.0f) < sampleRate)
{
    burst();
}

The key is to build the sequence A->A progressively rather than all at once. If the sample rate is low, the number of cells shared between A and A’ will be pretty high. In a few steps you will be back at A (A->A’->A’’->A) and your fixed-signal issue is resolved. If another symbol appears in the sequence, you will fail. The eager approach has the potential problem of wasting synapses/cells on those constant signals. With my lazy approach, you fail only at the point where the AAAA sequence ends. In general, lazy bursting seems to be beneficial in terms of synaptic load.

So if the random flip decides that a burst should not happen, then what happens to the state of cells in that minicolumn? Presumably no cells in that minicolumn become active?

In your example, is A in a specific context (similar to A’ and A’’), or does it mean A without context (i.e. bursting minicolumns)? Sorry if this is a dumb question. I’m having trouble visualizing how this statement follows if you chose to skip some of the minicolumns during the bursting step:

Do you happen to have some code I could look at which uses this strategy?

Exactly: you just ignore the burst in that minicolumn. Nothing should happen there this time. Sooner or later the random flip should decide to burst that minicolumn.
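For anyone who wants to try this, the rule as described so far might look roughly like the sketch below. It is an assumption-laden illustration, not @vpuente’s code: `activate_minicolumn`, its arguments, and the seeded `random.Random` generator are all hypothetical names chosen for the example.

```python
import random


def activate_minicolumn(predicted_cells, all_cells, sample_rate, rng):
    """Return the cells that become active in one active minicolumn.

    Predicted cells activate as usual. An unpredicted minicolumn bursts
    only with probability `sample_rate`; otherwise nothing in it becomes
    active on this presentation.
    """
    if predicted_cells:
        return list(predicted_cells)
    if rng.random() < sample_rate:
        return list(all_cells)  # sampled burst: every cell becomes active
    return []                   # burst skipped this time


rng = random.Random(42)  # seeded only to make the demo repeatable
cells = [0, 1, 2, 3]
trials = 10_000
bursts = sum(bool(activate_minicolumn([], cells, 0.1, rng)) for _ in range(trials))
print(bursts / trials)  # close to the sample rate of 0.1
```

Since each presentation is an independent coin flip, the number of presentations before a given unpredicted minicolumn bursts follows a geometric distribution with mean 1 / sample_rate, which is why learning is slower but spread over several repetitions.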

The key is to wait through multiple repetitions of the sequence before learning the whole context. In a sense, you are building the connections between the cells of two values in the sequence one by one, not all at once.

My code is really convoluted and too nasty to share :frowning: I can’t even understand it myself :slight_smile: Hopefully that will change in the future.

In any case, you can test it in yours just by inserting that if statement around the burst of the minicolumn. It should still learn the sequence (a bit slower).

BTW: Coincidentally, bursts in biological systems are also produced stochastically (although the meaning is not necessarily the same as in HTM :smile: )

Got it. I think this strategy might also lead to ambiguity between the C in ABCD and the C in XBCY, in particular if they were both learned around the same time (i.e. not thoroughly trained on one before training on the other). However, this is a bit tricky to visualize, so I’ll have to experiment to see whether or not this intuition is correct.
