This is a bit of a tongue-in-cheek question …
If you think about it from the point of view of the higher layers, TM should not be needed.
Let me diagram it … normally we imagine it this way:
[Encoder]=>[Spatial pooler]=>[Temporal Memory]=>[Temporal pooler]=>[Higher LVL]
but what if we just have :
[Encoder]=>[Spatial pooler]=>[Temporal pooler]=>[Higher LVL]
The only difference is that we lose the TM's predictive capability … but everything else looks fine.
Why wouldn't this scheme let us recognize objects? It is still "recording" the interaction as a single SDR at the TP output, right? And if two pooled recordings are similar, then the objects are similar, aren't they?!
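To make the "similar pooled recordings ⇒ similar objects" claim concrete, here is a minimal sketch where an SDR is just a set of active bit indices and similarity is bit overlap (the representation and the example bit indices are my assumptions, not any particular HTM library's API):

```python
# Hypothetical representation: an SDR as a set of active bit indices.
def overlap(sdr_a, sdr_b):
    """Similarity of two pooled SDRs = number of shared active bits."""
    return len(set(sdr_a) & set(sdr_b))

# Two "pooled recordings" of interactions with objects (made-up indices):
cup_grip  = {3, 17, 42, 77, 101, 350}
cup_lift  = {3, 17, 42, 80, 101, 350}
ball_grip = {5, 12, 99, 204, 311, 498}

print(overlap(cup_grip, cup_lift))   # high overlap -> likely same object
print(overlap(cup_grip, ball_grip))  # low overlap  -> different object
```

So even without TM, nearby interactions with the same object would pool to overlapping SDRs and be recognized as the same thing.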
I grokked the answer:
Perception is bottom-up; prediction is top-down!
So on the way up we only use the TM to learn the sequence, but don't use it to predict.
On the way down we use it to predict and to influence the bottom-layer TPooler.
The question is how to merge/update the lower TP (i.e. the order of actions to be executed), because you have to sync signals coming from three directions:
1. Update from below
2. Lateral sync between columns (voting)
3. Update from the prediction coming from the top
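One naive way to picture the three-way sync is to treat each direction as a candidate SDR and keep the bits that get enough votes. This is only a sketch of the problem, not an established algorithm; the function name, the vote threshold, and the equal weighting of the three sources are all assumptions:

```python
from collections import Counter

def merge_sdrs(bottom_up, lateral, top_down, min_votes=2):
    """Keep bits supported by at least `min_votes` of the three sources.
    Hypothetical merge rule: each source is an SDR (set of bit indices)."""
    votes = Counter()
    for source in (bottom_up, lateral, top_down):
        votes.update(source)
    return {bit for bit, n in votes.items() if n >= min_votes}

# Bottom-up and top-down disagree; lateral voting breaks the tie:
merged = merge_sdrs({1, 2, 3, 4}, {2, 3, 5}, {3, 4, 6})
print(sorted(merged))  # only bits with >= 2 votes survive
```

The real difficulty the question raises is timing: the three signals don't arrive in lockstep, so a simple per-step vote like this glosses over when each source should be allowed to overwrite the others.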
Any ideas?