Encoding OpenAI Gym's Input

I would go with the game image. The game image is derived from the data in RAM and is, in my opinion, a more abstract version of it (it might even qualify as hand-crafted compared to the raw RAM data). I would expect learning patterns from RAM to be harder because it is lower level, and I would assume a lower chance of semantic similarity between inputs.
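
For concreteness, here is a minimal sketch of the two observation formats being compared, assuming the classic `gym` API with the Atari extras installed (the Breakout environment IDs are just an example):

```python
import gym

# Image-based observation: the rendered game screen.
img_env = gym.make("Breakout-v4")        # obs is a (210, 160, 3) uint8 RGB array
img_obs = img_env.reset()                # older gym returns obs directly from reset()

# RAM-based observation: the console's raw memory.
ram_env = gym.make("Breakout-ram-v4")    # obs is a (128,) uint8 array
ram_obs = ram_env.reset()

print(img_obs.shape, ram_obs.shape)      # e.g. (210, 160, 3) (128,)
```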

The answer to this question is not really clear-cut. The amount of pre-processing done by retinal cells is huge. There are neuromorphic (retina-like) vision sensors that try to imitate the retina by doing all sorts of image transformations before the input is fed to the model. So the job is shared between the brain and the sensor: the more work the sensor does, the more capacity is left for the brain to do higher-level things. If the sensor provides less, the brain needs to decipher more.

Some rambling on hand-crafted data.

When I asked about the feasibility of an HTM game agent four years ago on the NuPIC mailing lists, someone told me that predicting 16 independent variables was too much for HTM and that I needed handcrafted features. I got disappointed and answered with the same argument you came up with above.

However, biology applies enormous amounts of “handcrafting” on top of the chemical reactions caused by the photons hitting the human retina, to make the signal more digestible for the brain. Over time, what is handcrafted and what is natural became a moot point for me. In addition, raw sensory input isn’t the most biologically plausible choice either, because it is not what the brain actually gets.

Can you get it to work with handcrafted features? If you can, start from there and see where it gets stuck. If you can’t, either your sensor should provide a more learnable format for the model, or your model should understand the sensor better.
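
If you do try handcrafted features, one quick way to prototype is to encode each feature into a small binary array with overlapping buckets and concatenate them. The sketch below is hand-rolled for illustration; the feature names and value ranges are hypothetical, and a real setup would more likely use NuPIC’s own scalar encoders:

```python
import numpy as np

def encode_scalar(value, minval, maxval, n_buckets=20, w=5):
    """Encode a scalar into a binary array with `w` active bits;
    nearby values share bits, in the spirit of a scalar encoder."""
    n_bits = n_buckets + w - 1
    # Clamp the value and map it onto a starting bucket index.
    frac = (np.clip(value, minval, maxval) - minval) / float(maxval - minval)
    start = int(round(frac * (n_buckets - 1)))
    bits = np.zeros(n_bits, dtype=np.uint8)
    bits[start:start + w] = 1
    return bits

# Hypothetical handcrafted features (names and ranges made up for illustration):
paddle_x, ball_x, ball_y = 72.0, 40.0, 115.0

sdr = np.concatenate([
    encode_scalar(paddle_x, 0, 160),
    encode_scalar(ball_x, 0, 160),
    encode_scalar(ball_y, 0, 210),
])
print(sdr.shape, sdr.sum())   # fixed width, constant number of active bits
```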

The brain may be able to understand images that it normally cannot if I find a better way to show them, or if I show them under different lighting conditions. General intelligence seems to be more than being able to act on raw input data; AlphaGo demonstrates that by only being able to play Go.

I insisted on working with raw data, so my vision encoder was an RGB array just like in the Atari example. You could feed the raw pixel data of the game at each frame and, if configured properly, the HTM might capture something. I currently work with edge detection and event-based sensors. This thread has valuable discussion on encoding vision if you haven’t read it already.
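
As a rough sketch of what “feeding raw pixel data at each frame” could look like, here is one way to turn an RGB frame into a flat binary array an HTM spatial pooler could take, plus a crude event-like variant that only marks changed pixels. The pooling size and thresholds are arbitrary choices for illustration:

```python
import numpy as np

def frame_to_bits(frame, threshold=64, pool=8):
    """Grayscale the frame, downsample by max-pooling, then threshold
    into a flat binary array."""
    gray = frame.mean(axis=2)                        # (210, 160) from an RGB Atari frame
    h, w = gray.shape
    h, w = h - h % pool, w - w % pool                # trim to a multiple of `pool`
    pooled = gray[:h, :w].reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))
    return (pooled > threshold).astype(np.uint8).ravel()

def frame_diff_bits(prev, curr, threshold=32, pool=8):
    """Event-like encoding: activate only the regions that changed between frames."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)).max(axis=2)
    h, w = diff.shape
    h, w = h - h % pool, w - w % pool
    pooled = diff[:h, :w].reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))
    return (pooled > threshold).astype(np.uint8).ravel()
```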

Below are some additional questions that might help challenge your current architecture.

Why would the agent explore or even move at all?

Is this the activity output of the agent or the game image? If it is the former, how would it help with its task other than being self-aware? Also, if it is the former, how would you merge it with the RAM or image data, since those were the actual inputs?
