HTM Regressor?

Hi All,

So I’m trying to apply HTM for basic regression – to predict Y1,Y2 (t+1) from X1,X2 (t).
I have an approach in mind, and I’m very curious to hear anyone’s opinion.

This setup differs from most HTM applications, which are auto-regressive (AFAIK) – predicting X1,X2,Y1,Y2 (t+1) from X1,X2,Y1,Y2 (t).

I don’t want the system to predict X1,X2 at all because they represent random motion.
This is why I think the standard auto-regressive approach is doomed to fail.

Y1,Y2 however represent a human controller’s response to X1,X2 – so it seems better to learn the X --> Y sequences, since they aren’t inherently noise-laden.

The data would thus be structured sequentially like so:

  1. X1,X2
  2. Y1,Y2 **learn
  3. X1,X2
  4. Y1,Y2 **learn
  5. X1,X2
  6. Y1,Y2 **learn

To implement this, I think 2 separate HTM regions are required, one each for X1,X2 and Y1,Y2. Here’s my idea:

The blue arrow represents cell-depolarizing connections (TM’s distal links), and the red arrows represent column-activating connections (SP’s proximal links).
So TM cells in Region_2 connect only to Active Columns in Region_1. This is how auto-regression is replaced with common X --> Y regression.

The procedure would be as follows:

T=1 (odd timesteps):

  1. Region_1 columns activated by input --> X1,X2
  2. Region_2 cells depolarized by input --> Region_1’s Active Columns

T=2 (even timesteps):

  1. Region_2 columns activated by input --> Y1,Y2
  2. Region_2 anomaly score calculated
  3. Region_2 cells’ links to Region_1 Active Columns updated (TM learn)

This 2-step process would be repeated, where every odd-numbered timestep acts like T=1 and every even timestep acts like T=2.
So ultimately you feed the system an X1,X2 input at (t) and get a Y1,Y2 predicted output for (t+1).
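To make the alternating loop concrete, here is a minimal, non-HTM sketch of it in plain Python. The `toy_sp` hash-based encoder and the `distal` lookup table are made-up stand-ins for a real SP and for the TM's distal segments (all names, sizes, and the example data are invented for illustration), but the control flow follows the T=1/T=2 steps above:

```python
import random

N_COLS = 64      # columns per region (toy size, not a real HTM parameter)
SPARSITY = 8     # active columns per input

def toy_sp(value, region_seed):
    """Stand-in for an SP: deterministically maps an input tuple
    to a fixed sparse set of active columns."""
    rng = random.Random(str((value, region_seed)))
    return frozenset(rng.sample(range(N_COLS), SPARSITY))

# Stand-in for the TM's distal links from Region_1 into Region_2:
# a lookup from Region_1 activity to the Region_2 columns seen next.
distal = {}

def step(x, y):
    """One T=1/T=2 cycle: activate Region_1 on X, depolarize Region_2,
    activate Region_2 on Y, score the anomaly, then learn the link."""
    r1_active = toy_sp(x, region_seed=1)                    # T=1, step 1
    predicted = distal.get(r1_active, frozenset())          # T=1, step 2
    r2_active = toy_sp(y, region_seed=2)                    # T=2, step 1
    anomaly = 1.0 - len(predicted & r2_active) / SPARSITY   # T=2, step 2
    distal[r1_active] = r2_active                           # T=2, step 3 (learn)
    return anomaly

pairs = [((1, 2), (3, 4)), ((5, 6), (7, 8))]   # made-up X -> Y data
first = [step(x, y) for x, y in pairs]    # unseen pairs: anomaly 1.0
second = [step(x, y) for x, y in pairs]   # learned pairs: anomaly 0.0
```

After one pass the X context fully predicts the Y columns, so the anomaly drops from 1.0 to 0.0 on the repeat pass, which is the behavior the scheme is after.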

Does this seem like a valid approach to you!?
I’m very curious for anyone’s thoughts!



I am curious to see how this turns out. It sort of seems like an application of the original core idea behind SMI, where you have a layer running the typical TM algorithm, but receiving its distal input from an external location signal rather than from other cells in the same layer. In your case, X1, X2 would be analogous to the “location signal” providing context to the cells in the Y1, Y2 layer.

That leads to an interesting question of whether you might want context encoded in the activity of the X1, X2 layer as well, and if so what should provide that context? A couple of examples that come to mind might be getting context via feedback from activity in the Y1, Y2 layer, or adding an “Object” layer with pooling for a higher level context.


That’s an interesting idea! Another similar option would be to use proximal connections between regions 1 and 2 instead. I don’t know if the version of HTM library you’re using enables that, but I believe it makes the most conceptual sense at least. I think using the TM may cause context-induced muddling.


So basically you are describing X1, X2 and Y1, Y2 all as proximal input to region 2, correct?

Normally, this would result in the active minicolumns in region 2 representing a combination of X1, X2, Y1, and Y2. Thus, you would ultimately still end up with distal connections from whatever Y1 and Y2 represent to whatever X1 and X2 represent when you then run the TM algorithm in region 2.

The difference between this and my suggestion is that region 2 would now be distally connected to whatever Y1 and Y2 represent just as much as to whatever X1 and X2 represent (versus only getting its context from what X1 and X2 represent). In other words, region 2 would be predicting X1, X2, Y1, and Y2 at time (t+1) from X1, X2, Y1, and Y2 at time (t). Since X1 and X2 are described as being random in nature, one would assume this would be roughly equivalent to simply running TM on just Y1 and Y2 (with 50% noise added to the system).


I definitely agree on this point WRT region 1. If the ordering of inputs X1 and X2 is random (i.e. no learnable sequence), then running the TM algorithm in region 1 would not likely produce any useful output.


Almost, but joined to the pooler rather than the input space of X1, X2. And without a TM at all. In my own experiments this sort of setup allows the second pooler to learn to correlate the inputs Y1, Y2 with the activity of the first pooler. By initially training on both datasets simultaneously, it learns to provide the same representation for both together, or for either one alone. So I would say it’s not really representing a mixture of X1 and Y1, because it cannot differentiate between them.

Interestingly, I believe that last point means it should not only work as an X->Y regressor but also Y->X, given two separate ML interpreters. Of course, that only works if the inverse function X(Y) passes the vertical line test.

Edit: a quick clarification. Without the TM, it no longer ‘predicts’ the corresponding Y. Instead you cut the proximal Y connection after training and only provide X. Then interpret the second pooler activity.
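A toy numeric version of this train-on-both-then-cut-Y idea can be sketched with a single competitive pooler (this is not htm.core’s SP; the sizes, thresholds, and learning rates are all invented for illustration). The pooler learns on the concatenated X+Y bits, then the Y half is zeroed out and the same winners should still activate:

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN = 40     # toy input: first half ~ pooler-1 activity, second half ~ Y bits
N_COLS = 30   # columns in the second pooler
K = 5         # winning columns per step

# Random initial proximal "permanences" for the second pooler.
perm = rng.random((N_COLS, N_IN)) * 0.3

def pool(x, learn):
    """Toy pooler: overlap is a weighted sum; top-K columns win.
    Learning nudges winners toward the current input (Hebbian-style)."""
    overlap = perm @ x
    winners = np.argsort(overlap)[-K:]
    if learn:
        perm[winners] += 0.05 * x - 0.01 * (1 - x)  # bump active, decay inactive
        np.clip(perm, 0.0, 1.0, out=perm)
    return set(winners.tolist())

x_bits = np.zeros(N_IN); x_bits[[2, 5, 9, 14, 17]] = 1.0     # "X" half
y_bits = np.zeros(N_IN); y_bits[[22, 26, 31, 35, 38]] = 1.0  # "Y" half
both = x_bits + y_bits

for _ in range(30):                  # train on X and Y simultaneously
    trained = pool(both, learn=True)

x_only = pool(x_bits, learn=False)   # cut the Y connection, present X alone
```

After training, `x_only` equals `trained`: the pooler reproduces the same representation from X alone, which is the pattern-completion behavior described above (a downstream interpreter would then decode Y from that activity).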


Thank you guys for the great ideas!! I want to verify my understanding.

Starting with @Paul_Lamb’s :

Would this updated diagram reflect what you’re suggesting? Where the blue arrow means Distal learning and red is Proximal.

Any concepts missing there?

So Y1,Y2 represents a game player’s 2D motion response to a randomly moving object (X1,X2), and the player’s goal is to keep proximity to the object.

Given this, I had this logic:

  • running TM on just Y1,Y2 would be inherently noisy, because the Y1,Y2 values are always reactive to the random X1,X2.

  • any auto-regressive TM approach would theoretically have this noise problem

  • so, the X–>Y sequence pairs would be the most viable input for the TM, since they maximize the signal/noise ratio

Do these assumptions of mine make sense?
Or anything left out?


I was originally questioning whether or not region 2 should grow any distal connections to other cells in the same region. Since the motion represented in region 1 (which the user is following) is random, their actions will also have some randomness to them. The drawback, of course, of only growing distal connections to region 1 would be that you could only have a low-order memory in region 2 (since region 1 is missing any context).

Thinking about this problem some more, I think the way you have it is actually better. A user is not going to react like a computer would, and their reactions to the random motion from region 1 will likely have some patterns to them that could be learned by growing distal connections between cells in region 2.

I’ve never applied HTM concepts to this particular type of problem myself, so I am looking forward to your results.


Do you mean no TM in either region, or just in region 1?

I assume you do not mean no TM in either region, since then you wouldn’t have any obvious mechanism for generating predictions for Y1 and Y2 at t+1.

If I understand you correctly, you propose adding no TM in region 1 (only SP) but still doing both SP and TM in region 2, with the output of region 1’s SP as input to region 2’s SP (along with Y1 and Y2). I could be mistaken, but I do not think that would be much different than having X1 and X2 as part of region 2’s input directly (though TBH I have not ever done such an experiment, since I have a model of how SP works which I have not questioned…) The job of the SP algorithm is to fix sparsity while preserving semantics. Thus, if I understand the SP algorithm properly, the output of region 1’s SP would encode the same information as X1 and X2 did originally (and would be just as random in nature as X1 and X2 were).
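The "SP preserves semantics" point can be illustrated with a toy deterministic encoder (again a stand-in, not a real SP): the output code is a pure function of the input, so a random X1,X2 stream yields a code stream that is exactly as unpredictable as the input itself, with no extra structure gained:

```python
import random

N_COLS, SPARSITY = 128, 8

def toy_sp(value):
    """Toy SP stand-in: each distinct input gets one fixed sparse code."""
    rng = random.Random(str(value))
    return frozenset(rng.sample(range(N_COLS), SPARSITY))

random.seed(42)  # reproducible toy stream
stream = [(random.randrange(5), random.randrange(5)) for _ in range(10)]
codes = [toy_sp(x) for x in stream]

# Same input always yields the same code (the information is preserved)...
code_a = toy_sp((3, 7))
code_b = toy_sp((3, 7))
# ...so the code sequence is precisely as random as the X sequence was.
```

Nothing about the pooling step makes the sequence of codes more learnable for a TM than the raw X1,X2 sequence was.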

Edit: sorry, I was too quick to reply and missed your last point (keeping my earlier reply above, since I think it is still useful information for anyone thinking about the problem)

I see. If I am imagining this correctly (I have not used proximal connections this way, so I could be misinterpreting it…) this is essentially a pattern completion algorithm. It would also behave as a low-order memory (similar to growing distal connections from region 2 to only region 1). In other words any one X1 X2 input would always predict the same set of Y1 Y2 after training, regardless of what happened in any previous timestep.


This captures the essence of my (and I think @Andrew_Stephan’s) concepts. For me, this X–>Y is the only sequence type available with potential for a fair signal/noise ratio.


In that case, I would modify your last diagram to remove the distal connections in region 2 to other cells in the same region, and only grow distal connections to region 1. Also, it may be relevant to consider that (depending on your tempo) the user may be reacting to something which happened earlier than just t-1, which may make their activity appear to be uncorrelated.


That’s a good way to describe the implementation I proposed. I think it simplifies the problem to not have a TM, since there seems to be no benefit to having time-series prediction. That is, unless the regression problem incorporates time series data somehow.


You are probably right. I was just thinking that given the particular use case where a user is attempting to follow (via some controller) something that is moving randomly, that it is possible they will repeat certain action patterns that could span multiple timesteps. This extra context might be learnable via distal connections between cells in region 2. However, it would really depend on how noisy that data is and whether the higher-order memory outweighs the extra noise that comes along with it (compared to the cleaner low-order X -> Y).


Ahh yes, that’s a good example of a regressor where TM would be useful. I couldn’t think of one.


Yes, good point! Maybe the X region could do TM learning to a 3rd region – one which encodes raw target motion from the prior time-step. This is what the error is in response to.

So this would make it a W(t-1)–>X(t)–>Y(t+1) learning sequence at each time step, where

W = target motion;
X = distance from target;
Y = player movement;

Do you think that may add more useful context??
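For what it's worth, the W–>X–>Y chain can be sketched the same toy way as the two-region setup, with a hash-based SP stand-in and two lookup tables standing in for the distal links (W context predicting X columns, and X context predicting Y columns; the example triples are invented):

```python
import random

N_COLS, SPARSITY = 64, 8

def toy_sp(value, region):
    """Stand-in for a per-region SP: fixed sparse code per input."""
    rng = random.Random(str((value, region)))
    return frozenset(rng.sample(range(N_COLS), SPARSITY))

# Stand-ins for distal links between regions:
w_to_x = {}   # W(t-1) context -> X(t) columns
x_to_y = {}   # X(t)   context -> Y(t+1) columns

def chain_step(w_prev, x, y):
    """One W(t-1) -> X(t) -> Y(t+1) learning step, with an anomaly
    score for each link in the chain."""
    w_cols = toy_sp(w_prev, "W")
    x_cols = toy_sp(x, "X")
    y_cols = toy_sp(y, "Y")
    x_anom = 1.0 - len(w_to_x.get(w_cols, frozenset()) & x_cols) / SPARSITY
    y_anom = 1.0 - len(x_to_y.get(x_cols, frozenset()) & y_cols) / SPARSITY
    w_to_x[w_cols] = x_cols   # learn both links
    x_to_y[x_cols] = y_cols
    return x_anom, y_anom

# Made-up (target motion, distance, player movement) triples:
triples = [("left", "far", "move-left"), ("right", "near", "hold")]
first = [chain_step(*t) for t in triples]    # unseen: both anomalies 1.0
second = [chain_step(*t) for t in triples]   # learned: both anomalies 0.0
```

Whether the extra W(t-1) context actually helps would depend on how consistently the target motion predicts the resulting distance, which is an empirical question for the real data.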