In this thread, I will describe a small framework that can be helpful for anyone who wants to build a software agent with HTM+RL for browser-based experiments. I hope this can increase collaboration and let us incrementally build towards a more scalable integration of RL into HTM.
For the last couple of months I worked part-time on my Bachelor's thesis: a software agent that combines HTM and reinforcement learning for one-shot character recognition. Time was short and I had to learn many new concepts and deepen my understanding, which in the end left just about a month for implementation and experimentation. I therefore did not accomplish the results I had hoped for, but I gathered the obstacles I hit and combined them into a small framework, so that anyone who wants to try something similar does not have to start from scratch and we can build incrementally towards a better working solution.
The theory is mostly based on the amazing thesis work of @sunguralikaan, "Hierarchical Temporal Memory Based Autonomous Agent for Partially Observable Video Game Environments". Thank you for the help during the work! (see our long forum discussions)
There are some important differences in the details, such as the global decay and the information flow, but the general concepts apply. More information can be found in my final thesis draft.
One of my main objectives was to make it easy to reuse, modify and compare. Since I attempted one-shot character recognition to relate to the Lake et al. paper on "Ingredients for Artificial Intelligence", I needed a flexible way to define my experiments, ideally one close to what other machine learning research uses. I found OpenAI's Universe "World of Bits" environment. Unfortunately, the project lost official support during this time.
The framework consists of two main parts:
- The client (the NUPIC agent and Universe libraries)
- The remote (the gym environment, with experiments written in JS/CSS/HTML)
New experiments are defined simply as web tasks in which the agent controls the mouse and sees the screen pixels while accumulating a reward.
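To make the observe-act-reward cycle concrete, here is a minimal, self-contained sketch of the loop an agent runs through on such a web task. Note that `WebTaskEnv` and its `reset`/`step` methods are hypothetical stand-ins for illustration only, not the actual framework or Universe API:

```python
import random

class WebTaskEnv(object):
    """Toy web task: reward is 1.0 when the simulated click hits the target.

    Hypothetical stand-in for a Universe-style "World of Bits" task.
    """

    def __init__(self, width=160, height=210):
        self.width = width
        self.height = height
        self.target = (40, 50)  # target pixel the agent should click

    def reset(self):
        # Return a fake screen observation (here: a flat list of pixel values).
        return [0] * (self.width * self.height)

    def step(self, action):
        x, y, click = action
        reward = 1.0 if click and (x, y) == self.target else 0.0
        done = reward > 0
        return [0] * (self.width * self.height), reward, done


env = WebTaskEnv()
obs = env.reset()
total_reward = 0.0
for _ in range(100):
    # A real agent would encode `obs` with HTM layers and select an action;
    # here we just click random coordinates.
    action = (random.randrange(env.width), random.randrange(env.height), True)
    obs, reward, done = env.step(action)
    total_reward += reward
    if done:
        obs = env.reset()
print("accumulated reward:", total_reward)
```

The real environment follows the same contract, only the observation is an actual screenshot and the actions are real mouse events sent to the browser.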
An example experiment from the paper:
The agent implementation is based on the NUPIC Network API. This makes it very easy to add or remove layers or to change the information flow in experiments. If you have experimented with the NUPIC and HTM-Research components: the GitHub repo documents which ones are used and how the different layers are built.
This makes it easy to, for example, swap or modify the encoder, or to add your own agent logic based on HTM theory and test it on any experiment you can define as a browser task.
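As a rough illustration of what swapping an encoder can look like, here is a sketch in which the agent depends only on a small `encode(pixels)` interface, so any encoder producing a sparse list of active-bit indices can be plugged in. All class and function names here are illustrative, not the framework's API:

```python
class ThresholdPixelEncoder(object):
    """Encodes a grayscale frame into active-bit indices by thresholding."""

    def __init__(self, threshold=128):
        self.threshold = threshold

    def encode(self, pixels):
        return [i for i, p in enumerate(pixels) if p >= self.threshold]


class DownsampleEncoder(object):
    """Alternative encoder: keeps every n-th pixel above the threshold."""

    def __init__(self, stride=4, threshold=128):
        self.stride = stride
        self.threshold = threshold

    def encode(self, pixels):
        return [i for i in range(0, len(pixels), self.stride)
                if pixels[i] >= self.threshold]


def encode_observation(encoder, pixels):
    # The agent logic only ever calls encoder.encode(...), so encoders
    # are interchangeable without touching the rest of the pipeline.
    return encoder.encode(pixels)


frame = [0, 200, 130, 90, 255, 10, 140, 100]
print(encode_observation(ThresholdPixelEncoder(), frame))     # [1, 2, 4, 6]
print(encode_observation(DownsampleEncoder(stride=2), frame)) # [2, 4, 6]
```

The same pattern applies to the agent logic itself: as long as the replacement honors the layer's input/output contract, the rest of the network does not need to change.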
The overview of the architecture (each layer is implemented in NUPIC):
In addition to the source installation, the environment and the agent are packaged as Docker images, which makes it very easy to deploy them in the cloud and run your experiments remotely.
Example of observing experiments remotely via VNC:
However, many things are still missing, and parts of the code need refactoring as they are built on NUPIC components that do more than is needed here. A list of what is still missing:
- Parameterize the verbosity level of debugging print-outs (e.g. indices)
- Refactor code and documentation (simplify some components that are based on NUPIC components)
- Support/Optimize parallel training of multiple agents in the cloud.
- Finish serialization implementation (SparseMatrixConnections from NUPIC Core missing)
- Add support for player-guided exploration
- Improve visualization and debugging tools
Nevertheless, I wanted to publish it now rather than later, as I am not sure when I will find time in the coming months to continue working on it. I have tried to add sufficient documentation, but you are always welcome to contact me with questions.
The GitHub repos | Docker images:
The main GitHub repo contains all the information needed to set everything up.