(I’m purposely trying to use generic vocabulary where I can below, as I understand not everyone here is familiar with deep learning terms… I occasionally slip though. Please forgive me.)
Just saw this video where a reinforcement learning system using deep learning seems to have been able to:
- Internally create a representation of the game world that it experienced/observed/was exposed to.
- Generate predictions about what comes next in that world (recurrent neural networks have been doing this for a few years now, though it’s improved a lot in the past year).
- Then use its internal understanding of its experience to further train itself and improve.
Where this differs from previous approaches/experiments is that this system, instead of only being fed actual game data, was allowed to be fed by its own memory/generated understanding of the game world it had previously experienced, and then to repeatedly generate prediction upon prediction over time… it predicted its own state over time, in a simulation of its own making (hence Siraj calling it a “dream environment”… the simulated world was a figment of its own predictions).
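To make the loop concrete, here’s a toy sketch of the “dream” idea (my own illustration, not the paper’s actual architecture): a transition model is seeded with one real observation, then rolled forward on its own predictions instead of real data. The random linear model here is just a stand-in for whatever learned predictor you’d actually use.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim = 8

# Stand-in for a learned transition model; here just a fixed random
# linear map with a squashing nonlinearity.
W = rng.normal(scale=0.5, size=(state_dim, state_dim))

def predict_next(state):
    return np.tanh(W @ state)

# Seed the rollout with one REAL observation...
state = rng.normal(size=state_dim)

# ...then generate prediction upon prediction, never touching real data.
dream_trajectory = []
for _ in range(20):
    state = predict_next(state)  # the model's own output becomes its input
    dream_trajectory.append(state)

print(len(dream_trajectory))  # 20 purely simulated steps
```

The key point is in the loop body: nothing from the real game enters after the first line, which is exactly the “figment of its own predictions” property described above.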
What I find interesting about the approach is that it seems like something that should be doable with HTM as well, with temporal memory and grid cells…
Encoders that could take/process screenshots?
—> Maybe incorporate some deep learning where the first few layers (which typically learn corners/edges, then higher-level features) send their output to an SDR encoder, rather than using any other scheme to convert images to SDRs?
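One simple way that dense-activations-to-SDR step could look (a hypothetical sketch, not an existing HTM API — the function name, sizes, and sparsity are all my own choices): project the dense conv-layer output into a large binary space and keep only the top-k strongest bits, k-winners-take-all style.

```python
import numpy as np

def dense_to_sdr(activations, sdr_size=2048, active_bits=40, seed=0):
    """Turn a dense feature vector into a fixed-size sparse binary SDR:
    random projection, then keep only the `active_bits` strongest units
    (k-winners-take-all). Hypothetical encoder sketch, not a real API."""
    rng = np.random.default_rng(seed)
    projection = rng.normal(size=(sdr_size, activations.size))
    scores = projection @ activations
    sdr = np.zeros(sdr_size, dtype=np.uint8)
    sdr[np.argsort(scores)[-active_bits:]] = 1  # top-k winners become 1s
    return sdr

# e.g. the flattened output of an early conv layer
features = np.random.default_rng(1).normal(size=256)
sdr = dense_to_sdr(features)
print(sdr.sum())  # 40 active bits out of 2048 (~2% sparsity)
```

Because the projection is fixed, similar activation vectors land on overlapping sets of winning bits, which is the property the downstream temporal memory would rely on.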
How to encode a “reward”?
—> Maybe have an encoder with a small bit space which receives feedback from the simulation/game about game state, allowing the HTM system to associate different input states with different game states?
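A small-bit-space reward encoder could follow the classic HTM scalar-encoder idea: a contiguous run of active bits whose position slides with the reward’s magnitude, so nearby rewards share bits and distant ones don’t. This is a sketch under my own assumed parameter names and ranges, not a specific library’s encoder.

```python
import numpy as np

def encode_reward(reward, r_min=-1.0, r_max=1.0, size=64, active=8):
    """Encode a scalar reward as a contiguous block of `active` ON bits
    whose position tracks the reward's value (scalar-encoder style).
    Parameter names and ranges are illustrative assumptions."""
    reward = min(max(reward, r_min), r_max)  # clip to the encodable range
    buckets = size - active                  # number of possible positions
    start = int(round((reward - r_min) / (r_max - r_min) * buckets))
    bits = np.zeros(size, dtype=np.uint8)
    bits[start:start + active] = 1
    return bits

low, high = encode_reward(-1.0), encode_reward(1.0)
# Opposite extremes share no bits; nearby rewards overlap heavily.
print(np.sum(low & high))  # 0
```

The overlap structure is what lets the HTM associate “input state X tends to co-occur with roughly this much reward” without needing an exact match.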
Number of passes over the initial input data before inducing “feedback” (feedback == output predictions of the HTM system fed back into itself)?
—> In the referenced paper, the researchers first allowed their convolutional neural network to self-create a dense representation/compression of the world states. I’m unsure whether HTM could do this itself, or whether this compressed representation is what would be fed into the HTM, which would then essentially replace the memory unit in the referenced experiment.
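To show what that compression step amounts to, here is a deliberately minimal stand-in: a plain linear autoencoder trained by gradient descent, squeezing each “frame” into a small latent code that a memory model (or an HTM layer) could consume instead of raw pixels. The paper itself uses a convolutional VAE; everything below (dimensions, learning rate, the linear model) is my own simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, frame_dim, latent_dim = 200, 64, 8

# Stand-in "screenshots": random vectors in place of real game frames.
frames = rng.normal(size=(n_frames, frame_dim))

W_enc = rng.normal(scale=0.1, size=(frame_dim, latent_dim))
W_dec = rng.normal(scale=0.1, size=(latent_dim, frame_dim))

lr = 0.01
for _ in range(500):                      # plain batch gradient descent
    z = frames @ W_enc                    # compress to the latent code
    recon = z @ W_dec                     # reconstruct the frame
    err = recon - frames
    W_dec -= lr * (z.T @ err) / n_frames
    W_enc -= lr * (frames.T @ (err @ W_dec.T)) / n_frames

latents = frames @ W_enc
print(latents.shape)  # (200, 8): each 64-dim frame reduced to 8 numbers
```

The 8-number `latents` rows are the analogue of the paper’s compressed world states: it’s this small code, not the raw frame, that the sequence model predicts forward in time.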
Am I mistaken in my understanding of HTM’s abilities, or should this be doable? Current deep learning models for “World Models” seem to first require a ton of data in order to learn, again via backpropagation. (It should be noted that, in order to perform well, any DL system requires a lot of data examples…)