Is HTM capable of long-term memory?

Oh, sorry for the lack of context. I was talking about learning to represent and classify MNIST digits.
The (maybe) “entirely different” problem of learning the dynamics of a ball would benefit greatly, in terms of sample efficiency and extrapolation, from being modelled with a sense of reference frames. Reference frames are indeed a key to solving the curse of dimensionality; I hadn’t even been thinking about them in that respect.

Maybe a grid of spatial poolers could marginally improve performance thanks to the increased capacity, but it would definitely suffer from the curse of dimensionality, since it too has to observe every possible trajectory in order to learn. I’m not saying that I like this solution, but weight sharing would definitely help in that respect.
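
To put rough numbers on it, here’s a back-of-the-envelope sketch (all sizes are made up for illustration, not taken from any actual HTM setup) comparing a grid of independent pooler-like units with a single shared filter bank reused at every position:

```python
# Back-of-the-envelope parameter counts: a grid of independent pooler-like
# units, each learning its own weights, versus one shared filter bank reused
# at every position (weight sharing, as in convolution). Sizes are illustrative.

GRID = 28 * 28   # number of input positions (e.g. MNIST pixels)
PATCH = 5 * 5    # receptive-field size of each unit
UNITS = 32       # features learned per position

per_location = GRID * UNITS * PATCH  # every position learns separately
shared = UNITS * PATCH               # one filter bank reused everywhere

print(per_location, "vs", shared)    # 627200 vs 800
```

Beyond the raw parameter count, the shared filter gets a training signal from every position it visits, which is exactly where the sample-efficiency win comes from.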

I wouldn’t be satisfied even with the solution of normalizing the reference frame to canonical coordinates. Maybe human vision does something analogous by fixating the eyes on a moving object, but you can recognize its movement even when you don’t. Or maybe there’s evidence for some horrendous internal mechanism in the brain that literally routes the visual signals so that they are always normalized. I’m not a neuroscientist, so I wouldn’t know.

One problem I see with your approach is that it does not allow path integration; or rather, the model has to learn every possible instance of it, which in your case would, I guess, be the velocity transitions. There is abundant evidence that the human brain solves this by using the representations of grid cells, head-direction cells, even place cells, etc., which makes it evident that the brain abstracts away the specific identity of the object whose spatial relationships it is representing, so the same machinery applies no matter which object it is. Whether it’s a ball, an obstacle, or a wall wouldn’t matter. That allows extrapolation in the truest sense and, in turn, enables very efficient learning of dynamics.
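
To make the identity-agnostic part concrete, here’s a toy path-integration sketch (1-D, with module periods and names I made up; this is not how grid-cell models actually work in detail, just the flavor):

```python
import numpy as np

# Toy path integration with grid-cell-like modular codes, in 1-D for brevity.
# Each "module" represents position as a phase on a ring with its own period,
# and the update rule never references the identity of the tracked object.

PERIODS = np.array([0.3, 0.5, 0.7])  # illustrative module scales

def encode(position):
    """Location code ('where'): phase of `position` within each module."""
    return (position % PERIODS) / PERIODS

def integrate(phases, velocity, dt=0.1):
    """Advance every module's phase by the same velocity signal ('how')."""
    return (phases + velocity * dt / PERIODS) % 1.0

# The identical rule tracks a moving ball and a stationary wall.
ball_code, wall_code = encode(0.0), encode(2.0)
for _ in range(50):
    ball_code = integrate(ball_code, velocity=1.5)
    wall_code = integrate(wall_code, velocity=0.0)

# The integrated code matches a direct encoding of the true final position,
# up to floating-point noise (compared with a circular distance).
target = encode(0.0 + 1.5 * 0.1 * 50)
circ_err = np.abs((ball_code - target + 0.5) % 1.0 - 0.5)
print(circ_err.max())
```

The same `integrate` rule moves any object’s location code; only the velocity input differs, which is the extrapolation-for-free property I’m pointing at.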

But to say it’s easier said than done would be an understatement. Maybe normalizing the reference frame is enough for practical applications and experimentation. In that case, however, I see no reason not to use weight sharing instead, especially when it would probably work just as well. But that’s just my opinion.

1 Like

That’s the thing that bogs me down.

How the heck is it possible to take raw moving data and somehow transform it into:

  • a stable object id (what)
  • a “numerical” grid-cell-like location code (where)
  • and a velocity code that can be used to predict changes in the location code (how); see the interface sketch after this list.
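
If I write down just the interface I’m asking for (a sketch only; these names are mine, and the bodies are precisely the part nobody knows how to fill in), it looks something like this:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectState:
    what: np.ndarray   # stable object id, ideally constant across frames
    where: np.ndarray  # grid-cell-like location code
    how: np.ndarray    # velocity code

def predict_where(where: np.ndarray, how: np.ndarray) -> np.ndarray:
    """Next location code from the current one plus the velocity code.
    Crucially, `what` is not an argument: the dynamics are identity-agnostic.
    How to learn this map (and the codes) is exactly the open question."""
    raise NotImplementedError

def encode_frame(pixels: np.ndarray) -> ObjectState:
    """The hard part: factor raw moving data into (what, where, how)."""
    raise NotImplementedError
```
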
2 Likes

I genuinely want to know as well. But I believe that identification and localization go hand in hand, rather than being even remotely independent, since the brain evolved intelligence by repurposing the mechanism responsible for exploration, which resembles the SLAM often used in robotics. And in SLAM, better localization means better mapping and vice versa.
My suspicion is that the effective disentangling of what from where is achieved by compositionality, or maybe I’m just too naive. :person_shrugging:
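
For what it’s worth, the mutual-improvement loop is easy to see in a toy version (a minimal histogram-filter sketch on a cyclic 1-D corridor; every size and probability here is an illustrative assumption, nothing like a real SLAM stack):

```python
import random
random.seed(0)

# Toy SLAM on a cyclic 1-D corridor: the position belief and the map estimate
# refine each other, i.e. better localization -> better mapping and vice versa.

N = 12
true_map = [random.choice([0, 1]) for _ in range(N)]  # hidden cell colors
pos = 0

belief = [0.0] * N
belief[0] = 1.0                                # start pose known
map_counts = [[1.0, 1.0] for _ in range(N)]    # per-cell color counts (prior)

for step in range(300):
    # Move right one cell, succeeding with probability 0.9 (motion noise).
    if random.random() < 0.9:
        pos = (pos + 1) % N
    belief = [0.9 * belief[i - 1] + 0.1 * belief[i] for i in range(N)]

    obs = true_map[pos]  # noiseless color observation, for brevity

    # Localization uses the map: weight each position hypothesis by how well
    # the current map estimate predicts the observation there.
    weights = [c[obs] / (c[0] + c[1]) for c in map_counts]
    belief = [b * w for b, w in zip(belief, weights)]
    total = sum(belief)
    belief = [b / total for b in belief]

    # Mapping uses the localization: credit the observation to each cell in
    # proportion to the belief that the robot is currently there.
    for i in range(N):
        map_counts[i][obs] += belief[i]

est_map = [int(c[1] > c[0]) for c in map_counts]
print("map  est/true:", est_map, true_map)
print("pos  est/true:", max(range(N), key=belief.__getitem__), pos)
```

Early on the map is uninformative, so the belief diffuses with the motion noise; as observation credit lands where the belief concentrates, the map sharpens, which in turn re-sharpens the belief. That feedback loop is the whole point.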

1 Like