Huge disclaimer: I’m very far from the most experienced person on this forum, if anyone else chimes in then I’ll likely defer to them Also, my approach is to build up the answer from basics, this is for my benefit as much as anything, so don’t think I’m being condescending!
It’s an interesting problem. @cogmission’s comment from the other topic is worth keeping in mind:
Encoders transform input data into a format which encodes the semantics of the problem domain
(I’d merge these topics but don’t have the permissions).
Definitely read/watch these if you haven’t already:
So keeping in mind that semantically similar input should result in a higher overlap, you have two different measurements at play (location and a movement vector).
In isolation, the concepts are straightforward:
- The closer two objects are in 3D space, the more overlap should result because we’d assume close proximity implies semantic similarity
- The closer the two movement vectors match, the more overlap should result because we’d assume that similar direction and speed implies semantic similarity
As others have noted, there’s an implementation here, although it’s been noted as an incomplete implementation and needs some work.
But if you don’t use this approach, as @sheiser1 said you’ll have a bunch of predictions in isolation. In your scenario, to me this pushes the overall “semantic similarity” problem downstream. Let me butcher the tutorial image from “Building HTM Systems” to try to illustrate:
You now know on each individual axis how close the objects are, but we’re still just as far from semantic similarity in the 3D space (e.g. similar position on the x axis does not imply similar overall location in 3D).
The problem of encoding the movement vector is similar and done in parallel, then I imagine stacking the spatial poolers would work to predict the next step.
As for predicting the final location, I guess this would be a subsequent post-HTM classifier step, predicting a categorical value that you’d need to provide alongside the other data. Of course the usual machine learning caveats apply, the data in the sequence would have to have predictive power on the final location, whenever it’s ambiguous in the real world then it won’t be an accurate prediction.