“Neuroscientists have long suspected that a similar mechanism drives how the brain works. (Indeed, those speculations are part of what inspired the GQN team to pursue this approach.) According to this “predictive coding” theory, at each level of a cognitive process, the brain generates models, or beliefs, about what information it should be receiving from the level below it. These beliefs get translated into predictions about what should be experienced in a given situation, providing the best explanation of what’s out there so that the experience will make sense.”
“The prediction errors that can’t be explained away get passed up through connections to higher levels (as “feedforward” signals, rather than feedback), where they’re considered newsworthy, something for the system to pay attention to and deal with accordingly. “The game is now about adjusting the internal models, the brain dynamics, so as to suppress prediction error,” said Karl Friston of University College London, a renowned neuroscientist and one of the pioneers of the predictive coding hypothesis.”
It’s exciting to see the big AI players paying attention to biology. Hopefully this snowballs into more investment in this area.
The way they’ve implemented it seems like a fairly modest modification of other recent reinforcement learning techniques like those in OpenAI Universe. With an LSTM network making the predictions, it’s difficult to imagine any sort of invariance emerging. The language in their tweet seems a little misleading:
The Generative Query Network, published today in @ScienceMagazine, learns without human supervision to (1) describe scene elements abstractly, and (2) ‘imagine’ unobserved parts of the scene by rendering from any camera angle.
DeepMind’s AI Learns To See | Two Minute Papers #263:
After looking at DeepMind’s GQN maze exercise (see the link above from @keghn_feem), I thought it might be fun to conduct a similar exercise using HTM in place of neural nets, potentially using grid cell elements as part of the implementation. Not expecting it to perform particularly well, but people seemed to go wild over the idea that an AI could predict unknown visual input, so I saw it as a good opportunity to plug HTM and improve my knowledge at the same time.
I’m not aiming for high biological fidelity, so I’m not going to saccade vision around the scene or anything. I thought I’d start with:
- a naive bitmap encoder
- repeated loop around the maze
- a simple SP and TP architecture
- enhance the bitmap encoder to include edge detection and maybe primitive object detection
- look at using the location signal to incorporate grid cells, then the head direction cells to start predicting the next frame without repeating the same loop.
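For the first step, a naive bitmap encoder could be as small as the sketch below. It is only an illustration: the function names and the threshold of 128 are assumptions of mine, not anything from NuPIC or the GQN work, and a thresholded bitmap is not guaranteed to be sparse, so the SP would still have to do that job.

```python
from typing import List, Sequence


def encode_bitmap(frame: Sequence[Sequence[int]], threshold: int = 128) -> List[int]:
    """Flatten a grayscale frame (rows of 0-255 ints) into a binary vector.

    Hypothetical helper: a pixel becomes an active bit when it is at or
    above the threshold. No edge or object detection is attempted.
    """
    return [1 if pixel >= threshold else 0 for row in frame for pixel in row]


def active_indices(bits: Sequence[int]) -> List[int]:
    """Return the positions of the active bits, a common SDR representation."""
    return [i for i, bit in enumerate(bits) if bit]


def sparsity(bits: Sequence[int]) -> float:
    """Fraction of bits that are active (0.0 for an empty vector)."""
    return sum(bits) / len(bits) if bits else 0.0


if __name__ == "__main__":
    # A tiny 2x3 "frame" standing in for one view of the maze.
    frame = [
        [0, 200, 255],
        [10, 130, 90],
    ]
    bits = encode_bitmap(frame)
    print(bits)
    print(active_indices(bits))
    print(sparsity(bits))
```

The binary vector (or its list of active indices) is what would be handed to the SP on each frame of the loop around the maze.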
Is this a crazy idea that will fail badly? Any advice from the community to maximise my chance of success?
Visual input + barebones HTM is maybe not a perfect match for predicting the next visual input.
But if you’re ready to try adding parts of a hierarchy, then @sunguralikaan’s work may be of interest to you.
Ah, one question: when you say ‘TP’, what do you mean?
I’m guessing Temporal Pooler, but besides some of Paul’s ideas around it, it’s not something we know how to implement… so, what do you have in mind? ^^’
Hi @gmirey, thanks for the comments! I actually mean temporal memory. I get a little confused with the term, as I think it’s sometimes called TP in the code (?)
As you’ve reinforced, I expect a whole lot of nothing from just plugging vision straight into HTM, but I hoped at least to play around with it and try some things. To get a result anything like the demo DeepMind showed, I’d need to cheat a fair bit on the input encoding, but hopefully not so much that it undermines the purpose of the exercise: to show that a biologically constrained approach can yield some kind of result today, and that it becomes more compelling as the gaps are filled in the future.
In any case I definitely won’t be the person to solve vision, just hoping that the sheer simplicity of the maze demo allows for a shortcut or two.
This is one area of confusion for anyone who starts digging into some of the background materials, code, and research that current HTM theory is built on. It seems that for some time in its life, the temporal memory algorithm was referred to as Temporal Pooling. Early in HTM research, there was an understanding that temporal patterns are pooled in the brain into stable patterns which “name” a sequence for it to be used in a hierarchical fashion. Since hierarchy was a key idea in HTM’s earlier days, I expect this is why the algorithm was originally called Temporal Pooling.
I should point out that as a newcomer to HTM myself, I was not around during this time frame. I can only speculate that as the code matured, it probably became apparent that high-order sequence memory and pooling of sequences into stable representations really needed to be broken out into separate functions. I imagine this was probably the motivation behind changing the name of the algorithm from Temporal Pooling to Temporal Memory. Someone at Numenta can undoubtedly explain this better though.
I personally see this as more of a bookkeeping exercise than anything else. The distinction between the two functions was known early on in the theory. One example is Jeff’s Presentation at UBC Department of Computer Science in March 2010. At around 22:27 Jeff describes these as separate functions, and at 48:57 he describes one possible implementation of it (essentially feeding the SP of next hierarchical level with activity over multiple timesteps). I personally believe that particular implementation is missing some important properties (like lower levels being able to represent long sequences and complex objects), but it does point out that TP has always been understood to be an important part of the cortical circuit when it comes to hierarchy.
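To make that idea concrete, here is a minimal sketch of “feeding the next level with activity over multiple timesteps”, done as a sliding-window union of active cells. This is my own illustration, not code from Numenta or from the talk; the class name `UnionWindow` and the window size are hypothetical.

```python
from collections import deque
from typing import Deque, FrozenSet, Iterable, Set


class UnionWindow:
    """Pool the active cells of the last `window` timesteps into one set.

    Hypothetical sketch: the returned union is what would be handed to
    the SP of the next hierarchical level as its input.
    """

    def __init__(self, window: int = 3) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) silently drops the oldest timestep when full.
        self._recent: Deque[FrozenSet[int]] = deque(maxlen=window)

    def step(self, active_cells: Iterable[int]) -> Set[int]:
        """Record this timestep's active cells and return the pooled union."""
        self._recent.append(frozenset(active_cells))
        return set().union(*self._recent)


if __name__ == "__main__":
    pool = UnionWindow(window=2)
    for cells in ([1, 2], [2, 3], [7]):
        print(sorted(pool.step(cells)))
```

The union changes more slowly than the individual timesteps but gets denser as the window grows, so the next level’s SP would have to bring it back to a sparse code. A plain union also forgets anything older than the window.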
I still think “Temporal Pooling” is a perfectly valid term to keep around in HTM vocabulary, because there is still a need to pool temporal inputs into stable outputs (the SMI “Output Layer”, for example, will require this functionality in the current round of research). Ultimately what TP should do is form a stable, sparse representation which “names” an object or sequence while preserving its semantics.