Exploring the "Repeating Inputs" problem

I thought I would start a thread to discuss and share ideas on how to solve the “repeating inputs” problem that the temporal memory algorithm currently experiences. For those not familiar with the problem, this thread goes into it in some detail with visuals.

Kicking off the discussion, I thought I would start with the idea that I suggested in the above thread:

Here is a demo of the above strategy in action. In this case I simply added “max new synapse count” new synapses each timestep (up to “max synapses per segment”) with winning cells from the previous timestep. You can see that with this approach, given a sufficient number of iterations, the activations in a repeating pattern stabilize and bursting stops.
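To make that concrete, here is a minimal sketch of the tweaked growth rule (the Segment type and parameter names are placeholders of my own, not from any particular implementation):

#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

struct Segment {
	std::unordered_set<uint32_t> presynapticCells; // cells this segment listens to
};

// Every timestep, grow up to maxNewSynapseCount new synapses to the previous
// winner cells, never exceeding maxSynapsesPerSegment in total.
void growEveryTimestep(Segment& seg,
                       const std::vector<uint32_t>& prevWinnerCells,
                       std::size_t maxNewSynapseCount,
                       std::size_t maxSynapsesPerSegment) {
	std::size_t grown = 0;
	for (uint32_t cell : prevWinnerCells) {
		if (grown >= maxNewSynapseCount) break;                          // per-step budget spent
		if (seg.presynapticCells.size() >= maxSynapsesPerSegment) break; // segment is full
		if (seg.presynapticCells.insert(cell).second) ++grown;           // skip cells already connected
	}
}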

One drawback with this approach is that it will lead to ambiguity for the C in ABCD vs XBCY. I mentioned a possible solution for this issue, which I’ll try next:

Any other thoughts on issues with this strategy?

Open to discussing (and testing) any other strategies that folks might think of for solving the repeating inputs problem. I have some other ideas myself as well which I’ll post here as I have a chance to implement them.

4 Likes

@Paul_Lamb Does that mean that higher-order memory doesn’t work under your strategy?
Could you describe exactly how to reproduce the problem? I have built a similar application to yours in tiny-htm; after a tiny change in my TM code (fixing what seems to be a general bug), the TM learns the sequences ABCD… and ABCBCD… with no problem.

(The slowdown in predicting is due to me pressing the keyboard more slowly, not the computer’s fault.)

Edit: My TM learned this connection to predict the sequence AABB… (Green is the source of the connection, blue is the destination)
image

1 Like

Yes, by itself, it seriously hampers higher-order memory. What happens is that on the first pass of learning the second sequence, more than one cell per minicolumn encodes the representation of C in XBCY (for example). Some of those cells are shared with the C in ABCD, which was learned earlier. So of course this is not a good fix for the problem, since it creates another problem. I am exploring whether the second problem can also be fixed by tweaking the learning rule. I also have some other ideas which I need to formalize, so this one was mainly to get the conversation started.

2 Likes

It appears your implementation does not exactly match the official TM algorithm. My own implementation behaved similarly as well. I’ll look a bit deeper into your algorithm to see where it deviates from the official one. I suspect that clues to the solution may be found in the places where other HTM implementations like yours and mine have deviated from the official algorithm.

3 Likes

I haven’t read the official TM learning algorithm. I’ve just followed what Matt describes in HTM School. Maybe that’s the cause?

This is the current TM core code.

xt::xarray<bool> active_cells = applyBurst(predictive_cells_, x); //If a column has no predictive cell, burst the column
xt::xarray<uint32_t> overlap = cells_.calcOverlap(active_cells, connected_permanence_); //Run through all connections and count how many connected presynaptic cells are on for each cell
predictive_cells_ = (overlap > connected_thr_); //Filter out badly predicted cells; all cells passing this stage are predictive
if(learn) {
	xt::xarray<bool> apply_learning = selectLearningCell(active_cells); //Select a random cell if the column is bursting
	xt::xarray<bool> last_active = active_cells_;
	cells_.learnCorrilation(last_active, apply_learning, permanence_incerment_, permanence_decerment_); //inc/dec the connection strengths
	cells_.growSynapse(last_active, apply_learning, initial_permanence_); //Make new connections to all active cells that are not yet connected
}
active_cells_ = active_cells; //Store the new active cells
return xt::sum(predictive_cells_, -1); //Since only one predictive cell is needed for a column to predict, sum the cells in each column to generate the prediction

My algorithm tries to connect a learning cell to all previously active (including bursting) cells, as Matt states in HTM School. This causes a lot of unneeded connections, but I can always remove them later on.
image
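As a rough sketch (reusing the placeholder Segment type from the sketch earlier in the thread), the rule is simply:

// Connect the learning cell's segment to every previously active cell,
// winners and bursting cells alike; the excess gets pruned later.
void growToAllActive(Segment& seg, const std::vector<uint32_t>& prevActiveCells) {
	for (uint32_t cell : prevActiveCells)
		seg.presynapticCells.insert(cell); // duplicates are ignored by the set
}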

2 Likes

The difference is probably here (this is exactly how I also first implemented the TM algorithm). The official algorithm differs in a couple of ways:

  1. Only grow connections to previously active winner cells (not to ones which were not winners)
  2. If a segment already has “max new synapse count” active synapses when its post-synaptic cell becomes predicted-active, then do not grow any additional synapses (to winners or otherwise; see the sketch below)
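As a hedged sketch of those two rules (again reusing the placeholder Segment type from the earlier sketch; numActiveSynapses would come from the segment’s overlap calculation):

// Rule 1: only previous *winner* cells are candidates.
// Rule 2: if the segment already has maxNewSynapseCount active synapses,
// grow nothing; otherwise grow just enough to make up the difference.
void officialGrowth(Segment& seg,
                    std::size_t numActiveSynapses,
                    const std::vector<uint32_t>& prevWinnerCells,
                    std::size_t maxNewSynapseCount) {
	if (numActiveSynapses >= maxNewSynapseCount) return;
	std::size_t toGrow = maxNewSynapseCount - numActiveSynapses;
	for (uint32_t cell : prevWinnerCells) {
		if (toGrow == 0) break;
		if (seg.presynapticCells.insert(cell).second) --toGrow;
	}
}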

@mrcslws explained the reasoning behind this:

3 Likes

Hmm… Maybe we should just modify the algorithm so this is not an issue anymore? The resulting connections are also a lot cleaner with the connect-to-every-cell strategy. Could this approach bring any downsides? It causes excess connections, but that is easily solvable by trimming them later on.

These are all the active synapses that the standard TM algorithm learned to predict ABCD… perfectly.
image

And here it is with the connect-to-every-cell strategy.
image

I can also confirm that it learns ABCD and XBCY without ambiguity and without forgetting either of them.

2 Likes

A bit too much of a rat’s nest for me to interpret at a glance :smiley: I’ll do some further analysis on the “connect to all active cells” strategy to see where it breaks down. My intuition tells me it will create ambiguity in some cases (double sub-sequences, for example), but I’ll put together hard proof with visuals.

1 Like

I have messed with it a bit more. The sequence AABB… runs into a slight problem. Because I select random winner cells, one of the columns ends up with the same cell on both occasions, thus cancelling the learned synapses.
image

Thanks for your interest! I think I’ll work on thousand brains theory in the meantime.

Also, it seems that higher-order memory only works when perm_inc < perm_dec. Otherwise the TM never learns the sub-sequence.

3 Likes

Interesting! I didn’t realize this was being discussed.

We encountered this problem internally when we were developing our own implementation. We had the same issues you had with repeating sequences and with the fix actually creating more problems. The challenge is to make sure the system can learn both repeating sequences and non-repeating sequences within the same memory and input time-series. If you can do this, you’ve solved the problem.

It took a long time to get my head around it and figure out exactly what the problem was. But we did come up with a solution that seems to work and I think is biologically plausible. Once we release our project, I will share it with you (company policy).

But you guys are really close to the solution!

4 Likes

This is a very interesting problem and I’ve had some thoughts about this as well. I’ve also read the associated thread that showed some neat visualizations.

@Paul_Lamb
If I may ask, how does a TM system save and discard a context? Is it correct to say that the TM learns context-free sequences? The reason I ask is that I couldn’t yet find any TM mechanism that saves context. It may look context-aware because it follows sequences, but when I analyze its algorithm it doesn’t really save any context; it only saves/remembers connections. This repeating-inputs problem seems to emerge because the TM is not context-aware. By context-aware I mean that some state is saved/removed at a certain level. An analogy would be the scoping of functions in programming languages, where there is a stack to save the context of the scope; of course the TM should IMO have a similar (maybe simpler) but biologically plausible mechanism. I hope I didn’t go off-topic; however, I think this is important to discuss, as the TM is always said to use the context of this and that, but I don’t think there is a detailed enough explanation of this (maybe I missed it).

I think of distal and apical signals as the source of context. In the case of TM, the distal signal creates a context based on what a given feature followed. Therefore one could say activity in a TM layer indicates not only a feature, but also every other feature that came before it. What is missing is a higher-level context that “labels” a sequence (or section of a sequence).

The theory I have been working on is that TM could benefit from an “output layer” as described for SMI. Activity in this layer would be more stable, representing a sequence (or parts of a sequence), and it would be the source of the apical signal providing the piece of context which is missing from raw TM by itself. It would also allow multiple TM layers to work together to vote on that higher-level context (which I think aligns well with the thousand brains theory).

The application of an output layer specifically to assist with the “repeating inputs” problem happens to be what I have been working on today. I’ll go into it in more detail in my next post, so stay tuned.

6 Likes

Thanks. Could you possibly point me to any documentation regarding this theory that you are implementing right now? Is this Temporal Pooling?

This is what I was referring to regarding context. It has to be labelled, and labelling means something is recognized and stored. I think the TM does not have this.

Additional thoughts on the TM in relation to this problem;

  • The selection of a predictive cell is a bit wasteful, in that it does not take into consideration the previous effort used to learn a sequence. For example, it simply takes a different set of active cells and discards the previous ones, just like in the first example presented in this problem. The previous cells that correctly predicted the next input were easily discarded. In the SP world, representations do not get discarded easily unless they were only seen a few times.

  • In the current implementation of the TM, the recognition of a sequence is equated explicitly to a cell or cells. In other words, it assumes it is accurate at pinpointing the right cells in a particular active column, when in reality this is biologically unrealistic. IMO when a cell is activated it shouldn’t mean that this cell is the cell that can predict the next input (e.g. deriving the predictive cells from it); it should only mean that some cell(s) in this active column close to the current active cell(s) may predict the next input. This would increase the chance of choosing the previously active cells that predicted the correct sequence; for example, in the problem presented it might have reused the cells that have distal connections to B.

Yes, I have been referring to it as that (the name may not stick however, since this term has been used in HTM’s past to refer to older variations of the TM algorithm, so reusing it could cause confusion).

For this post, I’ll leave out important elements of the theory, such as how activity in this layer is driven by activity in the TM layer, as well as how the long-distance connections work for voting. I’ll be posting a more comprehensive explanation on a separate thread, but I want to stay on topic for this thread.

To keep things simple and on-topic, let’s assume an object representation has already been settled on through some external means that I will not describe for now. Let’s also not worry about any connections from the TM layer up to the output layer. Instead, we’ll only focus on the apical signal from the output layer down to the TM layer, and how that can be used to address the “repeating inputs” problem we are discussing on this thread.

The basic strategy is to introduce an output/object layer. Activity in this layer will remain stable throughout a sequence for as long as it repeats. The cells in this layer will provide an apical signal to cells in the TM layer. Thus, a cell in the TM layer may become predictive due to the distal signal (from other cells in the TM layer), or it might become predictive due to the apical signal (from cells in the output layer). A cell may also become predictive due to both distal and apical signals.
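As a minimal sketch of that predictive-state rule (the CellState struct and threshold names here are placeholders, not from an actual implementation):

#include <cstddef>

struct CellState {
	std::size_t distalOverlap = 0; // active synapses on the best distal segment (TM layer context)
	std::size_t apicalOverlap = 0; // active synapses on the best apical segment (output layer context)
};

// A cell becomes predictive if either source of context depolarizes it.
bool isPredictive(const CellState& c, std::size_t distalThreshold, std::size_t apicalThreshold) {
	return c.distalOverlap >= distalThreshold || c.apicalOverlap >= apicalThreshold;
}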

Each timestep, the winner cells in the TM layer will grow apical connections to the active cells in the output layer, using the exact same learning algorithm as TM (except the cells are growing apical segments rather than distal ones). One could use distal segments for this rather than apical ones (if there were some reason that it was more biologically feasible) – the only requirement is to separate which layer the input is coming from.

Any time a minicolumn is activated, any cell(s) predicted by both apical and distal signals will become the winner. If none are predicted by both signals, then any cell(s) predicted by the distal signal will become the winner. If none are predicted by the distal signal, then any cell(s) predicted by the apical signal will become the winner. And of course, if no cells are predicted, the minicolumn will burst.
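In code, that priority might look something like this sketch (the Prediction enum is a placeholder of my own):

#include <cstddef>
#include <vector>

enum class Prediction { None, Distal, Apical, Both };

// Returns the indices of the cells that become winners when this minicolumn
// activates; an empty result means no cell was predictive, so the column bursts.
std::vector<std::size_t> selectWinners(const std::vector<Prediction>& cells) {
	std::vector<std::size_t> both, distal, apical;
	for (std::size_t i = 0; i < cells.size(); ++i) {
		switch (cells[i]) {
			case Prediction::Both:   both.push_back(i);   break;
			case Prediction::Distal: distal.push_back(i); break;
			case Prediction::Apical: apical.push_back(i); break;
			default: break;
		}
	}
	if (!both.empty())   return both;   // predicted by both signals wins outright
	if (!distal.empty()) return distal; // then distal-only
	if (!apical.empty()) return apical; // then apical-only
	return {};                          // nothing predicted: burst the minicolumn
}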

To make things easier to visualize, I’ll be using a tiny TM layer which has 4 cells per minicolumn, and one minicolumn per input. I’ll also be using a single cell in the output layer to represent each object. Obviously in practice, there would be larger dimensions involved. This is just to describe the strategy in the simplest possible manner.

For these visualizations, I am assuming the parameters are set such that the max new synapse count in the TM layer is greater than the activation threshold (one-shot learning), and for the output layer, less than the activation threshold (such that a sequence must be seen twice for it to become connected). I don’t yet know what the best general learning rate should be, but for the below example, “two shot learning” is sufficient to explain the concept without requiring me to draw out too many iterations.
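Expressed as a tiny illustrative config (the names and values are hypothetical; only the inequalities matter):

#include <cstddef>

struct DemoParams {
	std::size_t activationThreshold  = 3;
	std::size_t distalMaxNewSynapses = 4; // > threshold: one pass fully connects a distal segment (one-shot)
	std::size_t apicalMaxNewSynapses = 2; // < threshold: an apical segment needs two passes (2 + 2 >= 3)
};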

A quick explanation of the symbols and colors:

image

Let’s begin with learning the repeating sequence A-B-C-D, using this strategy.

The first time through the sequence A-B-C-D, the minicolumns burst, winners are chosen, and distal connections are formed as normal. Additionally, the winner cells also grow apical connections with the active cells in the output layer representing object “ABCD”. Note that the learning rate is set low for the apical connections, so after this pass they are connected below the activation threshold.

The second time through the sequence, the first input bursts, and a second representation A’’ is chosen as the winner. This one grows both a distal connection to D’ and an apical connection to object “ABCD”. This second time through the sequence, B’, C’, and D’ grow additional apical synapses with object “ABCD”, and are now connected above the activation threshold. Note that there are two potential representations for “A” at this point, but neither is connected to object “ABCD” above the activation threshold.

Normally, this would be the point where the “repeating inputs” problem kicks in, and the “B” minicolumns would burst this time through the sequence. However, B’ is now predictive due to the apical signal, so this bursting will not happen. Note that A’’ was predicted distally, which allowed it to become the winner and grow additional apical connections to object “ABCD”. Thus, A’ has lost the competition. You can now repeat the sequence as many times as you want, and it will cycle through the same four representations in the TM layer. Notice that TM has (distally) learned the sequence B’-C’-D’-A’’, and it is the apical connection which bridges the gap between A’’ and B’.

So what happens when we introduce a new sequence X-B-C-Y? Will this strategy lead to ambiguity like the other strategy? Let’s find out.

The first time through, you see the expected behavior of TM. The previously learned connection between B’ and C’ is activated by bursting in step 2, and a new distal connection between C’ and Y’ is formed in step 4. As in the last scenario, apical connections are formed to object “XBCY”, and they are initially below the activation threshold.

The second time through the sequence, a second representation X’’ is chosen (like we saw for A’’ in the previous example). B’ is activated again, so it grows additional apical connections with object “XBCY”, and is now above the activation threshold. Because B’ was not connected to anything in the previous iteration, this time through the sequence the C minicolumns burst, and a second representation C’’ is chosen. Because of the bursting, Y’ is predicted and becomes active, growing additional apical connections to object “XBCY”. The representation for X’’ is now predicted.

The third time through the sequence, the apical connections are reinforced like we saw in the previous example (they all now breach the activation threshold), and bursting has stopped. X’ and C’ have lost the competition to X’’ and C’’. You can now repeat the sequence as many times as you want, and it will cycle through the same four representations in the TM layer. There is no ambiguity with the four representations in sequence A-B-C-D. Interestingly, in this case TM has (distally) learned two sequences, Y’-X’’ and B’-C’’, and it is the apical connection which bridges the two gaps.

Notice also that in the process of learning X-B-C-Y, a stray distal connection between C’ and Y’ was formed. Inputting A-B-C… will now distally predict both D’ and Y’. However, D’ will be predicted both distally and apically, so this could be used by any classification logic to weigh D’ more heavily than Y’ as the next likely input.
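As a sketch of how that weighting could work (reusing the placeholder Prediction enum from above; the actual weights are arbitrary):

// Cells predicted both distally and apically (D' in this example) outweigh
// distal-only strays (Y'); the weights themselves are arbitrary.
double predictionScore(Prediction p) {
	switch (p) {
		case Prediction::Both:   return 2.0;
		case Prediction::Distal: return 1.0;
		case Prediction::Apical: return 1.0;
		default:                 return 0.0;
	}
}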

I’ll provide pseudo-code and a working demo in a future post (still working out some kinks), but wanted to post a basic introduction to the idea for anyone who is curious. Let me know if something isn’t clear.

10 Likes

Awesome, I really appreciate you providing an explanation, not to mention a very clear one. I also like to read and understand algorithms such as this.

The algorithm looks simple and effective, though it’s tricky for me to predict its behavior when run at larger scales. I like that it has taken advantage of a stabler layer to compensate for the TM’s indecisiveness.

I have some questions though, which you may not necessarily answer, as I don’t want to mess with your focus on this implementation.

  1. Is it correct to say that the output layer retains its predictive cells for a certain number of steps? I noticed there is more than one active apical connection at a certain step (2nd row, 3rd col). If so, when can these predictive cells (from the output layer) change or disappear?

  2. On sequence ABC (row 3, 2nd col), how did the cell in column B get active when there were at least 3 predictive cells (row 3, 1st col) that previously existed?

4 Likes

Yes, I forgot to point that out. This theory assumes that a cell can remain in a predictive or active state for extended periods of time. Note that this is also implied in the Columns paper, where representations in an output layer bias activity in an input layer. So hopefully this isn’t deviating too far into Crazy Land :slight_smile:

It became active because the minicolumn it was in became active (for example via Spatial Pooling, etc). When a minicolumn becomes active, any cells in it which are predictive become active and inhibit the others.

2 Likes

Can you confirm, without any doubt, from some neuroscience paper that this thing is actually happening?

I get it, you’re referencing the Columns paper, but at this point this is second-hand knowledge that might be wrong, no matter how much I don’t like that fact.

EDIT: It has come to my attention that “repeating patterns” can constitute objects in a TBT sense. Think of them as a set of notes that constitutes a song, or a bunch of features like a coffee cup.

Isn’t A-B-C-A-B-C == foo, X-Y-Z-X-Y-Z == bar, and {foo, bar} == objects exactly the type of problem which TBT solved?

EDIT2: Furthermore, if there are repeating patterns of repeating objects, you can from there construct new objects into new sets on hierarchies of cortical columns that you “connected” to them. My point is that answering the question/problem of this thread is premature.

@Tachion, I am not a neuroscientist, so I’m coming at this from a more practical perspective (I’ll leave your question for the other talented folks on the forum). The “repeating inputs” behavior is a problem today in the practical use of TM. I’m just exploring some possible solutions (they may be completely off the mark for how nature has addressed the problem). Hopefully something useful can be distilled from these experiments that will help advance HTM down the road (if at the very least to demonstrate what not to do :grin:)

4 Likes

They are similar, but not exactly the same. One could say there are essentially two classes of patterns that a system like the brain should be able to model:

  1. Patterns that are caused by our own interactions with the world
  2. Patterns that are independent of our actions

SMI relates to the first one, and TM to the second. The difference between the two is the source of the “location” signal. In classic TM theory, the location of a feature is derived from the feature which preceded it (whose location is derived from the feature before it, and the one before that, and the one before that… etc.)

The problem with this strategy is that it doesn’t allow for a sequence to ever end. This is not an issue for long sequences which don’t continuously repeat. But it becomes a problem for short sequences that continuously repeat. For example, in the repeating sequence ABABABABABAB… the 6th B is in a different location than the 3rd B which is in a different location than the 1,000th B. Each time the sequence repeats, it grows a little longer and essentially new “locations” are added.

This is different from SMI, where the locations are able to wrap around. If I move my finger around the lip of a cup, I eventually end up back at the location I started. I don’t end up in new locations each time around. This is the property of path integration, which grid cells bring to the equation.

1 Like

I should point out that TBT currently also relies on the “output layer” concept from the Columns paper. If you question the validity of this concept, then you must also question TBT itself (at least in the theory’s current form). From the Frameworks paper, section “Rethinking Hierarchy, the Thousand Brains Theory of Intelligence”:

The reference to “Lewis et al. 2018” is the Columns Plus paper, which goes into more technical detail on object recognition and grid cells. That paper discusses the method of voting between many sensory patches:

This “additional population of cells” is of course a reference to the output layer described in the Columns paper. Hopefully this demonstrates that what I have described above isn’t a major deviation from TBT, but rather borrows a concept from it.

3 Likes