I’m trying to use HTM to build a system for communication networks. I’m predicting the delay value at different nodes in mobile networks. For example, I have three nodes, and at each node an HTM model predicts the next-step delay in real time. I’m trying to share the learning between the three HTM models, so that when a new pattern is not recognised by one model, another model can be used to predict it. Is there a way to use the learning of one temporal memory to get predictions for another node?
One simple approach would be to have 3 models, each taking all 3 nodes’ values as input and using a different node as the predicted field. That way the predictions for each node will be based on data from all 3 nodes.
The problem with this is that it won’t scale past a handful of nodes, since no model should have more than a handful of input features. But if you do actually have 3 nodes that’d be fine.
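If it helps, here’s a minimal sketch of that setup using the htm.core Python bindings. The encoder size, TM parameters, and `RESOLUTION` value are placeholder assumptions (not tuned settings), and the per-node `Predictor` stands in for the “predicted field”:

```python
from htm.bindings.sdr import SDR
from htm.bindings.algorithms import TemporalMemory as TM, Predictor
from htm.encoders.rdse import RDSE, RDSE_Parameters

NUM_NODES  = 3
RESOLUTION = 0.1   # delay units per prediction bucket (assumption)
FIELD_BITS = 400   # encoding bits per node (assumption)

# One scalar encoder per node's delay value.
params = RDSE_Parameters()
params.size = FIELD_BITS
params.sparsity = 0.02
params.resolution = RESOLUTION
encoders = [RDSE(params) for _ in range(NUM_NODES)]

# One TM + one Predictor per node. Every TM sees the concatenation of ALL
# nodes' encodings; each Predictor decodes a different node's next delay,
# playing the role of the predicted field.
tms = [TM(columnDimensions=(FIELD_BITS * NUM_NODES,), cellsPerColumn=16,
          seed=i + 1) for i in range(NUM_NODES)]
predictors = [Predictor(steps=[1], alpha=0.1) for _ in range(NUM_NODES)]

def step(t, delays):
    """Run one timestep. delays: current delay value at each node."""
    joined = SDR(FIELD_BITS * NUM_NODES)
    joined.concatenate([enc.encode(d) for enc, d in zip(encoders, delays)])
    predictions = []
    for i, (tm, pred) in enumerate(zip(tms, predictors)):
        tm.compute(joined, learn=True)
        pdf = pred.infer(tm.getActiveCells())[1]   # 1-step-ahead distribution
        predictions.append(pdf.index(max(pdf)) * RESOLUTION if pdf else None)
        pred.learn(t, tm.getActiveCells(), int(delays[i] / RESOLUTION))
    return predictions
```

Every TM sees the same concatenated encoding, so each node’s prediction is conditioned on all 3 delay streams.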
Another approach could be to use a pooling layer, which takes as input the activity of the 3 separate TMs. The output of this layer can be fed back down into all 3 TMs, to have a depolarizing effect on each. As I understand it, this mimics apical dendrites, which make cells predictive just as basal dendrites do in the standard TM, except that the depolarizing signal comes from another layer (the pooling layer).
Thanks @sheiser1 for your feedback. Since I’m using more than 3 nodes, the first solution you suggested isn’t feasible, given the limited number of input features an HTM model can handle. Regarding the second solution, can you elaborate more on the pooling layer’s functionality?
I can’t say precisely how best to do this, but I’ll try to clarify the concept and point you toward an example of it applied.
Conceptually the idea of ‘temporal pooling’/‘union pooling’ is to recognize the sequence itself, by monitoring the behavior of other TM regions.
For instance let’s say you encounter a familiar sequence:
A,B,C…X,Y,Z
When input C arrives at timestep 3, a normal TM region (one that has seen this pattern before) just recognizes C in the context of A,B and predicts D. A pooling region, however, recognizes that this is the English alphabet.
So while the activity of the normal TM is constantly changing every timestep (in terms of which bits are active), the pooling layer should change more slowly. Theoretically, once the pooling layer recognizes the familiar sequence, it should stay constant throughout the sequence (up to letter Z in this case).
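Here’s a toy sketch just to make that “changes more slowly” idea concrete. To be clear, this is not Numenta’s research implementation, and the decay and threshold values are invented:

```python
class ToyUnionPooler:
    """Pooled representation formed as a slowly decaying union of the
    cells a TM predicted correctly."""

    def __init__(self, decay=0.9, threshold=0.5):
        self.decay = decay          # how slowly pooled activity fades
        self.threshold = threshold  # activity needed to stay in the pool
        self.activity = {}          # cell index -> persistence value

    def compute(self, correctly_predicted_cells):
        # Decay all pooled activity a little each timestep...
        self.activity = {c: v * self.decay for c, v in self.activity.items()}
        # ...but cells the TM predicted correctly are reinforced to full
        # strength, so during a familiar sequence the pool barely changes.
        for c in correctly_predicted_cells:
            self.activity[c] = 1.0
        return {c for c, v in self.activity.items() if v >= self.threshold}
```

If you feed it the TM’s correctly-predicted cells each timestep, the pooled set stays nearly constant while a familiar sequence like A,B,C…X,Y,Z unfolds, and only churns when the TM is surprised.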
I don’t know the inner workings of this as applied in current research code, but here’s a discussion thread covering it in detail, including network schematics: 2D Object Recognition Project - #104 by Zbysekz (schematic on post 100).
I’d also recommend checking out this Numenta paper, which demos 3D (virtual) object recognition. Again I don’t know exactly how it’s done, but I know that the information from separate sensors (TMs) is aggregated in some way to classify the object.
In this case the different sensors are different fingers, each feeling different parts of the objects, and thus gathering different information on them. Since each sensor has limited info, it can’t narrow down the set of possible objects nearly as fast as when info is shared among sensors.
A classic example is a coffee cup. One finger is on the round handle, so on its own it may think the object is a ball, but another finger is on the lip of the cup, so that sensor can eliminate the possibility of a ball. When the different sensors’ info is shared, the object is recognized much faster.
Anyone please correct me if I’m wrong or missing anything, but I think that’s a core finding from the paper.
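To make that narrowing-down effect concrete, here’s a toy illustration. This is not the paper’s actual algorithm, and the feature-to-object table is invented for the example:

```python
# Hypothetical mapping from a felt feature to the objects consistent with it.
CANDIDATES = {
    "curved_surface": {"ball", "coffee_cup", "mug"},
    "lip_edge":       {"coffee_cup", "mug", "glass"},
}

def recognize(felt_features):
    """Each sensor contributes the set of objects consistent with what it
    feels; intersecting the sets eliminates candidates much faster than
    any single sensor could alone."""
    possible = None
    for feature in felt_features:
        consistent = CANDIDATES[feature]
        possible = consistent if possible is None else possible & consistent
    return possible

print(recognize(["curved_surface"]))              # ball, mug, coffee_cup
print(recognize(["curved_surface", "lip_edge"]))  # mug, coffee_cup
```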
Thanks a lot, it really helps!! One more question: how do I know the learning capability of an HTM model at a specific time? In other words, I want to compare two HTM models to see which is performing better, not based on accuracy but based on the internal details of HTM.
Thanks for your swift reply. What about the learning measure? I mean, if I want to compare two HTM models, how do I tell which has the higher learning capability, which has learned more than the other, which is better?
I’d use the anomaly scores. Models with lower anomaly scores are less surprised by incoming inputs – which means they are predicting more of what’s happening.
In tandem I’d track the prediction densities – to make sure that low anomaly scores aren’t due to the model predicting so many different things at once.
I’d favor a model with both low anomaly scores and low prediction densities. These traits mean the model has: 1) learned to recognize the behavior (shown by lower anomaly scores), and 2) learned the patterns precisely, not predicting too many different things (shown by lower prediction densities).
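Here’s a sketch of how those two measures could be tracked side by side with htm.core. It assumes each model is a `TemporalMemory` instance, and the rolling window length is an arbitrary choice:

```python
from collections import deque

WINDOW = 500  # timesteps to average over (arbitrary assumption)

class ModelStats:
    """Rolling anomaly score and prediction density for one TM."""

    def __init__(self, tm):
        self.tm = tm
        self.anomalies = deque(maxlen=WINDOW)
        self.densities = deque(maxlen=WINDOW)

    def update(self):
        """Call once per timestep, right after tm.compute()."""
        self.anomalies.append(self.tm.anomaly)
        predictive = self.tm.getPredictiveCells()
        # Fraction of all cells currently predictive: a high value means
        # the model is "hedging" by predicting many things at once.
        self.densities.append(len(predictive.sparse) / predictive.size)

    def summary(self):
        mean = lambda xs: sum(xs) / max(len(xs), 1)
        return {"anomaly": mean(self.anomalies),
                "density": mean(self.densities)}
```

Given two models run on the same input stream, I’d favor the one whose `summary()` shows the lower pair of averages.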