Feeding HTM with PCA Output! Looks Like "I'm Groot"

One of my colleagues and I argued about whether we could feed the HTM model with PCA's components.

The question came up because PCA is a transformation of correlated variables, so a single component can represent more than one variable at the same time.

My answer was simple: this is like "I'm Groot" from the Guardians of the Galaxy movie. There is no way to tell the meaning of a word in any context if every word you use can express so many meanings.

In other words, at each step the HTM would have so many possible predictions that it would get confused, and its predictions would not be accurate at all.

I just want to make sure I thought about this the correct way, since I'm new to HTM.

Can you link to more information about “PCA”?

This video explains PCA (Principal Component Analysis) well.

It seems like PCA output is an approximation, and there is potentially a huge amount of data loss, especially as you add dimensions (more input fields squeezed into the same small number of components). So I'm not sure how well HTM would do with it. I think it would largely depend on the quality of the principal component analysis, i.e. how much of the total variance the kept components actually capture. It certainly will not perform as well as feeding each dimension directly into an HTM model.
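To put a number on "data loss", here is a minimal NumPy sketch (not taken from any HTM library) that measures the fraction of total variance thrown away when only the top two principal components are kept. The helper name `variance_lost` and the synthetic data sets are hypothetical: independent fields stand in for the worst case, and fields driven by two hidden signals stand in for the case where PCA works well.

```python
import numpy as np

def variance_lost(data: np.ndarray, k: int) -> float:
    """Fraction of total variance discarded when only the top-k principal components are kept."""
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional to each component's variance.
    component_variance = np.linalg.svd(centered, compute_uv=False) ** 2
    return max(0.0, float(1.0 - component_variance[:k].sum() / component_variance.sum()))

rng = np.random.default_rng(0)

# Hypothetical worst case: independent fields, so there is no correlation for PCA to exploit.
for n_fields in (2, 5, 10, 50):
    independent = rng.normal(size=(1000, n_fields))
    print(n_fields, "independent fields:", round(variance_lost(independent, k=2), 2))

# Hypothetical best case: 10 fields that are noisy mixtures of only 2 hidden signals.
latent = rng.normal(size=(1000, 2))
correlated = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(1000, 10))
print("10 correlated fields:", round(variance_lost(correlated, k=2), 3))
```

With independent fields the lost fraction climbs toward 1 as fields are added, while the correlated set loses almost nothing, which is why the outcome depends so heavily on the data.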

I would add that it also depends on what exactly you would be using HTM to predict. If you were simply trying to classify the cell type based on the PC1 and PC2 values, that might be possible after enough training. Training would involve using the points on the graph as "on" bits for the SP (with topology enabled), and then adding a running average classifier to the winning minicolumns (skipping TM in this case). On the other hand, you could probably do that a lot more easily with some simple geometry :slight_smile:
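A rough sketch of that classification idea, assuming a plain grid encoding of the (PC1, PC2) plane. It deliberately skips the SP and feeds the encoded bits straight into a running-average classifier, so it only illustrates the encoding and averaging steps. `encode_point`, `RunningAverageClassifier`, the grid size, the bounds and the two synthetic clusters are all hypothetical; none of this is the NuPIC or htm.core API.

```python
import numpy as np

GRID = 32  # hypothetical grid resolution per axis

def encode_point(pc1, pc2, grid=GRID, radius=2, lo=-5.0, hi=5.0):
    """Turn a (PC1, PC2) point into a square patch of 'on' bits; nearby points share bits."""
    sdr = np.zeros((grid, grid), dtype=bool)
    # Scale each coordinate onto the grid, clipping values outside [lo, hi].
    col = int(round(np.clip((pc1 - lo) / (hi - lo), 0.0, 1.0) * (grid - 1)))
    row = int(round(np.clip((pc2 - lo) / (hi - lo), 0.0, 1.0) * (grid - 1)))
    sdr[max(row - radius, 0):row + radius + 1, max(col - radius, 0):col + radius + 1] = True
    return np.flatnonzero(sdr)  # indices of the active bits

class RunningAverageClassifier:
    """Per-bit running average of one-hot labels (a simplified stand-in, not an HTM library class)."""

    def __init__(self, n_bits, n_classes, rate=0.1):
        self.weights = np.zeros((n_bits, n_classes))
        self.rate = rate

    def learn(self, active_bits, label):
        target = np.zeros(self.weights.shape[1])
        target[label] = 1.0
        # Nudge each active bit's label estimate toward the observed label.
        self.weights[active_bits] += self.rate * (target - self.weights[active_bits])

    def infer(self, active_bits):
        # Sum the estimates of all active bits and pick the strongest class.
        return int(np.argmax(self.weights[active_bits].sum(axis=0)))

rng = np.random.default_rng(0)
clf = RunningAverageClassifier(n_bits=GRID * GRID, n_classes=2)
for _ in range(200):
    # Hypothetical data: two cell types forming two well-separated clusters.
    for label, center in enumerate([(-2.0, -2.0), (2.0, 2.0)]):
        pc1, pc2 = rng.normal(center, 0.5)
        clf.learn(encode_point(pc1, pc2), label)

print(clf.infer(encode_point(-2.0, -2.0)), clf.infer(encode_point(2.0, 2.0)))
```

For these two clusters it should print `0 1`. A nearest-centroid check on the raw coordinates would give the same answer with far less machinery, which is the "simple geometry" point.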

If by TM you mean Temporal Memory (or Temporal Pooler), then TM is crucial in our model, since we are trying to build an HTM model for anomaly detection.

So if that is the case, do you agree that PCA is not applicable to a model that requires TM?

Probably not, but I suppose it depends on what specific property you would be trying to predict (and thus detect anomalies in).

Is the thing being predicted the next point on the 2D map? If so, that isn't a temporal memory problem; it is more of a statistical analysis (how close a given point is to the average range of previous points). The fact that each point is, by its very nature, randomly distributed within a given range means by definition that there isn't a temporal pattern that could be learned.
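The statistical alternative can be sketched in a few lines, assuming "close to the average range of previous points" is measured as distance from the running mean in per-axis standard deviations. `RunningPointStats` and the synthetic points are hypothetical. It uses Welford's online mean/variance update, so no history has to be stored.

```python
import numpy as np

class RunningPointStats:
    """Online (Welford) mean and variance of 2D points, used to score how unusual a new point is."""

    def __init__(self, dims=2):
        self.count = 0
        self.mean = np.zeros(dims)
        self.m2 = np.zeros(dims)  # running sum of squared deviations from the mean

    def score(self, point):
        """Distance from the running mean, measured in per-axis standard deviations."""
        point = np.asarray(point, dtype=float)
        if self.count < 2:
            return 0.0  # not enough history to judge yet
        std = np.sqrt(self.m2 / (self.count - 1))
        return float(np.linalg.norm((point - self.mean) / np.maximum(std, 1e-12)))

    def update(self, point):
        point = np.asarray(point, dtype=float)
        self.count += 1
        delta = point - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (point - self.mean)

rng = np.random.default_rng(0)
stats = RunningPointStats()
# Hypothetical stream of (PC1, PC2) points scattered randomly around the origin.
for point in rng.normal(0.0, 1.0, size=(500, 2)):
    stats.update(point)

print(round(stats.score([0.1, -0.2]), 2), round(stats.score([6.0, 6.0]), 2))
```

The score ignores the order in which points arrived, which matches the argument above: with randomly scattered points there is no sequence for TM to learn, only a distribution.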

Is the thing being predicted the type of cell given its point on the map? See my previous comment on that – again not a problem for temporal memory. SP maybe could be applied, but there are obviously much more efficient ways to do that (simple geometry, for example).

Is the thing being predicted PC3, PC4, etc. values given PC1 and PC2 (to find anomalies in other PCx values)? Whether or not that is applicable would depend on the specific data (I’m not familiar enough with this field to comment on that, but my instinct says probably not).

BTW, I am assuming you are talking about the same protein/cell data that is depicted in the video. It occurred to me that you might instead be talking about using PCA for predictions/anomaly detection on some other multi-dimensional data (which perhaps might have temporal patterns). In other words, using PCA to work around the "multiple fields" limitation of HTM.

If that is what you are talking about, note that the whole point of PCA is that it compresses the fields into the two (or whatever chosen number) components that capture the most variance, and discards the rest. Thus you wouldn't really be running anomaly detection on all the information in the fields; you'd be discarding much of it and running anomaly detection on only a couple of derived values.
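One nuance worth illustrating: each principal component is a weighted combination of all the input fields, and what gets discarded is the variance along the lower-ranked components. This also ties back to the "I'm Groot" concern. In the minimal NumPy sketch below (hypothetical data, PCA computed directly with `np.linalg.svd`), two clearly different records land on exactly the same (PC1, PC2) coordinates, so a model that only sees those two values cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 5 correlated fields driven by 2 hidden signals plus noise.
latent = rng.normal(size=(1000, 2))
data = latent @ rng.normal(size=(2, 5)) + 0.3 * rng.normal(size=(1000, 5))

mean = data.mean(axis=0)
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
top2 = vt[:2]  # rows are the PC1 and PC2 directions, one weight per original field

def project(record):
    """(PC1, PC2) coordinates of a single record."""
    return top2 @ (record - mean)

print(np.round(top2, 2))  # each component mixes all five fields

record_a = data[0]
# Move record_a along a discarded direction (PC3): the fields change, (PC1, PC2) does not.
record_b = record_a + 2.0 * vt[2]
print(np.round(record_a, 2), np.round(record_b, 2))
print(np.round(project(record_a), 3), np.round(project(record_b), 3))
```

Because the rows of `vt` are orthonormal, any shift along PC3 and beyond is invisible after projecting onto the first two components.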

Whether or not this would be useful would again depend on the specific data. There are a lot of data sets where you don't need to give HTM most of the fields, and good predictions can be made on just one or two of them. There are other data sets where this is not the case. I would need to see some of the actual data, and what you would be trying to predict, to be sure whether or not HTM could be used.

That is exactly the case.

Thank you for your answer; it was very helpful.