Encoding OpenAI Gym's Input

encoders
openai

#1

I came up with a structure made of HTM neurons that may be able to learn from real-time data, rewards, and intrinsic rewards (partly inspired by
Proposing a Model for the Basal Ganglia and Reinforcement Learning in HTM),
and I want to code it up to test my ideas.

The environment I chose for my agent to learn and play in is OpenAI Gym’s MsPacman v0
(http://gym.openai.com/envs/MsPacman-v0/).
There are two types of input (observation) I can choose from: RAM and Game Image.
For Game Image, Gym returns a 3D array of shape (210, 160, 3) (210 and 160 are the height and width of the emulated Atari screen, and 3 is the R, G, B channels).
For RAM, Gym returns a 1D array of 128 numbers from 0 to 255, taken from the emulated Atari machine’s RAM.

From this forum I learned that encoding vision is a very hard problem that you haven’t completely solved yet, so I have to choose RAM. How should I encode the RAM properly?
The RAM looks something like this:


(the RAM at three consecutive timesteps is shown here)

The only thing I can think of is to scalar-encode the numbers in the RAM one by one and stack the results into one new SDR.
For example, a scalar encoder with resolution = 1, w = 21, minval = 0, maxval = 255 encoding the value 128:
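
A minimal sketch of the idea described here, using only the Python standard library. It is a hand-rolled stand-in, not NuPIC's `ScalarEncoder`: the function names `encode_scalar` and `encode_ram` are made up for illustration, and the output width (276 bits per byte under this sketch's bucket arithmetic) may differ from what NuPIC computes for the same parameters. The all-zero `fake_ram` list stands in for a real Gym RAM observation.

```python
def encode_scalar(value, minval=0, maxval=255, resolution=1, w=21):
    """Return a list of 0/1 bits containing one contiguous run of w active bits."""
    if not minval <= value <= maxval:
        raise ValueError("value out of range")
    n_buckets = int(round((maxval - minval) / resolution)) + 1
    n = n_buckets + w - 1                      # total bits (276 with the defaults)
    start = int(round((value - minval) / resolution))
    bits = [0] * n
    bits[start:start + w] = [1] * w            # the run starts at the value's bucket
    return bits


def encode_ram(ram):
    """Concatenate one scalar encoding per RAM byte into a single long SDR."""
    sdr = []
    for byte in ram:
        sdr.extend(encode_scalar(byte))
    return sdr


sdr_128 = encode_scalar(128)
print(len(sdr_128), sum(sdr_128), sdr_128.index(1))   # 276 21 128

fake_ram = [0] * 128                                  # stand-in for a Gym RAM observation
full_sdr = encode_ram(fake_ram)
print(len(full_sdr), sum(full_sdr))                   # 35328 2688
```

With these assumptions the stacked SDR is 35,328 bits wide with 2,688 active bits, roughly 7.6% sparsity.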

I have only gone through the HTM School videos and read some example code. I understand the meaning of the parameters, but I haven’t mastered the ideas behind encoding as well as most of you on this forum have [is this grammatically correct? I’m not a native speaker, so please don’t misunderstand this!]. So I am asking: how should I encode the RAM properly, so that no information in the RAM is lost (and also effectively)?

Thank you

—Edit—
By the way, I know MultiEncoder can be used to combine different input fields into one SDR. Are 128 fields too many?
Also, if field Alpha uses 100 bits to encode while field Beta uses 10, will Alpha become more “important” than Beta? Should I turn every bit encoding Beta into 10 bits, or will the network figure out that Alpha and Beta are of the same importance?
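
One way to make this question concrete is to look at raw overlap counts in a concatenated SDR. The sketch below assumes that "100 bits" and "10 bits" mean active bits, and the bit positions are invented for illustration. It only shows how each field contributes to a plain overlap score; it does not answer whether an HTM network would compensate for the imbalance.

```python
def overlap(a, b):
    """Number of active bits two SDRs (given as sets of indices) share."""
    return len(a & b)


# Hypothetical layout: Alpha lives in bits 0-999, Beta in bits 1000-1099.
alpha_x = set(range(0, 100))        # one Alpha value, 100 active bits
alpha_y = set(range(500, 600))      # a completely different Alpha value
beta_x = set(range(1000, 1010))     # one Beta value, 10 active bits
beta_y = set(range(1050, 1060))     # a completely different Beta value

base = alpha_x | beta_x
alpha_changed = alpha_y | beta_x    # only Alpha differs from base
beta_changed = alpha_x | beta_y     # only Beta differs from base

print(overlap(base, alpha_changed))  # 10
print(overlap(base, beta_changed))   # 100
```

Under these assumptions, a complete change in Beta still leaves 100 of 110 active bits shared, while a complete change in Alpha leaves only 10.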


#2

I’m not an expert in Atari architecture, but it safely predates ASLR so I suppose address locations may consistently represent some aspect of the game (like locations of MsPacman, enemies, dots etc).

Of course you could build a number of encoders that “understand” what different areas of RAM mean (e.g. you use a scalar encoder for the RAM that represents health level and a coordinate encoder for the RAM that represents a player position), but from memory the ground rules of OpenAI gym are that you only define reward and punishment, and no other semantic meaning of the game is coded in.

So if you’re sticking with these ground rules, it seems to me you’re left with as much of a challenge as the vision option.

But I could easily be wrong, so definitely don’t let me stop you experimenting :grin:


#3

but from memory the ground rules of OpenAI gym are that you only define reward and punishment, and no other semantic meaning of the game is coded in.

I don’t understand this sentence. Gym provides the environment, and we provide the algorithms. Gym gives us two types of observation to let our agents know what is happening. Here, RAM and Game Image are the same to our newborn agents, because both contain unlearned patterns waiting for the agents to discover. Am I wrong?

And, if locations, enemies, dots are represented by some bits in the ram, what does “no semantic meaning is coded in” mean?

Anyway, I posted this thread to ask how to encode properly, but your concerns are very valuable. I don’t want my agent to stare at something uninformative :P. Waiting for your reply!

(PS. I’m guessing that the word “meaning” means something different to me than it does to you…)


#4

@Richardn,

After reading the doco it seems my understanding of OpenAI gym is slightly off - I’d read the original DeepMind paper that inspired this, it was all about taking raw pixel input and using deep reinforcement learning rather than handcrafted features. Then when I checked out OpenAI I saw similar examples in the arcade learning environment, and assumed that all gym environments worked this way - turns out a lot of them do actually provide observation spaces that are handcrafted features (like cartpole).

So I assumed it would be considered poor form to start feature engineering on the RAM values and then building an algorithm driven by the game’s characters/props and what they are doing - turns out I’m wrong and this is actually fair game, apologies for the confusion! (and also my grammar didn’t help either :wink:)

I don’t know how much you intend to borrow from that other thread you linked, to be honest most of it is well over my head. But I’ll assume you want your HTM system to be able to learn sequences from the game so that it can predict an upcoming negative outcome and avoid it - am I close?

Normally HTM encoders are supposed to map an input value onto an SDR such that greater semantic similarity between two inputs results in more overlap in their SDRs. This is what you hope will give you reliable enough predictions when the exact sequence hasn’t been seen before. The problem I think you’ll encounter with just ScalarEncoding the RAM is that the magnitude of those values may not always correspond to semantic similarities in the game.

For example, your scalar encoding of the 128 value is perfect if that whole byte happens to represent something like health score - health of 128 is very similar to health of 127 and they overlap almost entirely. And a health score of 0 is completely different to a health score of 255 and there’s no overlap, so again this is great. (obviously this doesn’t make sense in the dead-or-alive world of pacman, but I couldn’t think of a better example)

But what if the last bit in one of the bytes actually represents a boolean that corresponds to one of the dots being eaten or not? (another contrived example). It’s going to end up overlapping heavily with an adjacent scalar value when there should be no overlap.
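
The two cases can be checked numerically. This is a rough sketch using the w = 21, resolution = 1 parameters from the first post; `active_bits` is a made-up helper that models the encoding as a run of 21 consecutive bit indices starting at the value, and the "flag" byte is as contrived as the example it illustrates.

```python
W = 21  # active bits per value, taken from the first post


def active_bits(value):
    """Indices of the active bits for a byte under a resolution-1 scalar encoding."""
    return set(range(value, value + W))


def overlap(a, b):
    """Number of active bits shared by the encodings of two byte values."""
    return len(active_bits(a) & active_bits(b))


# Byte holds a magnitude (e.g. a health score): overlap tracks similarity.
print(overlap(128, 127))            # 20
print(overlap(0, 255))              # 0

# Byte whose lowest bit is a boolean flag: the two states mean different
# things, yet they differ by 1 as numbers and so overlap almost entirely.
flag_off = 0b10000000               # 128
flag_on = 0b10000001                # 129
print(overlap(flag_off, flag_on))   # 20
```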

Therefore to get the best result, I think you’re going to need to understand what’s happening in the RAM. Or at the very least, find out where the real boundaries between variables are, as in a RAM-poor system like the 2600 they won’t generally fall into neat bytes.
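
If the variable boundaries inside a byte were known, the byte could be split into separate fields before encoding. The sketch below shows the bit arithmetic only. The layout in `LAYOUT` is entirely hypothetical; nothing in this thread establishes the actual MsPacman RAM layout, and the field names are invented.

```python
# Hypothetical layout of one RAM byte: (field name, lowest bit, width in bits).
LAYOUT = [("x_pos", 0, 4), ("lives", 4, 3), ("dot_eaten", 7, 1)]


def split_byte(byte, layout=LAYOUT):
    """Split one byte into named bit fields according to the given layout."""
    fields = {}
    for name, shift, width in layout:
        mask = (1 << width) - 1             # e.g. width 4 -> 0b1111
        fields[name] = (byte >> shift) & mask
    return fields


print(split_byte(0b10110101))   # {'x_pos': 5, 'lives': 3, 'dot_eaten': 1}
```

Each extracted field could then get an encoder that matches its meaning, for example a scalar encoder for a magnitude and a simple on/off encoding for a flag.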

Hope this is some use, if you’re lucky then someone who’s actually attempted RL in HTM will chime in with something a little more helpful :grinning:.


#5

Therefore to get the best result, I think you’re going to need to understand what’s happening in the RAM. Or at the very least, find out where the real boundaries between variables are, as in a RAM-poor system like the 2600 they won’t generally fall into neat bytes.

The question is whether making the patterns clear is a job for the sensor or a job for the “brain”. If it is the sensor’s job, then I should go and find some docs about the Atari’s structure and extract the data for my agent. It’s like having no lenses in our eyes: there would be no way for our brains to see, because the laws of physics block that. But if it is a job for the “brain”, I needn’t (shouldn’t) do anything to the data. Extracting the data manually is like building an “intelligent” chatbot by telling it “if the user says hello, you say hi; if the user says how are you, you say fine…”. Giving computers data in a human-crafted way is (I think) not the way to achieve a general AI (that’s what I am trying to get with HTM; when it comes to single-purpose AIs, backpropagation and CNNs are far ahead of us). Why can AlphaGo only play Go? Because humans only passed it data in a way that only a Go board can pass through.

But what if the last bit in one of the bytes actually represents a boolean that corresponds to one of the dots being eaten or not? (another contrived example). It’s going to end up overlapping heavily with an adjacent scalar value when there should be no overlap.

This is something I’m afraid of. When “difference” becomes “similarity”, no intelligent system can work. But at least overlapping heavily isn’t overlapping completely… My wild guess is that maybe my agent will figure it out if I stick to the original RAM… Terrified.

I don’t know how much you intend to borrow from that other thread you linked, to be honest most of it is well over my head. But I’ll assume you want your HTM system to be able to learn sequences from the game so that it can predict an upcoming negative outcome and avoid it - am I close?

I created my regions and connections based on brain structure. I have dug into the Wikipedia pages and the citations on them for half a year, trying to get a clear picture of how information flows in the brain. This is different from normal programming, because I don’t think it is possible to fully understand how this program completes tasks. I program up to the point where the agent is given enough resources that it is “possible” for it to complete tasks. I can only make guesses about why this should work.

1. Turn on learning in the regions, and the neurons will be “happy” to predict correctly. Here I define “happiness” as a tendency to keep the current state. This is where intrinsic rewards come from.
2. Send in the normal reward by releasing dopamine on D1 and D2 MSNs; the details are based on biological facts.
3. Plug the output back into the input to make the agent “self-aware”. (Literally, self-aware :slight_smile:) (Biological basis: efference copy)

All the details are settled, but the programming has only just started. I’m only experimenting; nothing is proven yet.

if you’re lucky then someone who’s actually attempted RL in HTM will chime in with something a little more helpful

@sunguralikaan did an amazing job


Maybe I can ask @sunguralikaan how to deal with vision on this specific task.


#6

Agreed, but currently HTM is mostly focused on the common learning algorithm present in the neocortex. This means that for now, the encoding of raw sensory input that occurs in other areas is emulated via this human-crafting, because exactly how they are encoded would vary across the different sensory modalities (arguably the neocortex is being fed encoded data in a nature-crafted way).

Good call!


#7

I would go with the game image. The game image is the output of the data in RAM and, in my opinion, a more abstract version of it (it might even qualify as hand-crafted compared to raw RAM data). I would think that learning patterns in RAM would be harder because it is lower level, and I would assume a lower chance of semantic similarity.

The answer to this question is not really clear-cut. The amount of preprocessing done by the retinal cells is huge. There are neuromorphic (retina-like) vision sensors that try to imitate the retina by doing all sorts of image transformations before the input is fed to the model. So the job is for both the brain and the sensor. The more work the sensor does, the more capacity is left to the brain for higher-level things. If the sensor provides less, the brain needs to decipher more.

Some rambling on hand-crafted data.

When I asked about the feasibility of an HTM game agent 4 years ago on the NuPIC mailing lists, someone told me that predicting 16 independent variables was too much for HTM and that I needed handcrafted features. I got disappointed and answered with the argument you came up with above.

However, biology has enormous amounts of “handcrafting” applied on top of the chemical reactions caused by photons in the human retina to make the input more digestible for the brain. In time, what is handcrafted and what is natural became a moot point for me. In addition, raw sensory input isn’t the most biologically plausible option either, because it is not what the brain gets.

Can you get it to work with handcrafted features? If you can, start from there and see where it gets stuck. If you can’t, either your sensor should provide a more learn-able format for the model, or your model should understand the sensor better.

The brain may be able to understand images that it normally cannot if I find a better way to show them, or if I show them under different lighting conditions. General intelligence seems to be more than being able to act on raw input data. AlphaGo proves that by only being able to play Go.

I insisted on working with raw data, so my vision encoder was an RGB array, just like in the Atari example. You could feed the raw pixel data of the game at each frame, and maybe the HTM can capture things if configured properly. I currently work with edge detection or event-based sensors. This thread has valuable discussion on encoding vision, if you haven’t read it already.
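
As a very rough illustration of the event-based idea, the sketch below marks a bit as active wherever a pixel's brightness changed between two consecutive frames. This is an assumption about what a crude stand-in could look like, not the encoder described in this post; the function names, the tiny 2x3 frames, and the threshold of 16 are all invented, and nothing guarantees that the output has the sparsity or overlap properties a good SDR needs.

```python
def to_gray(pixel):
    """Average the R, G, B channels into one brightness value (0-255)."""
    r, g, b = pixel
    return (r + g + b) // 3


def change_events(prev_frame, frame, threshold=16):
    """Return a flat 0/1 list: 1 where brightness changed by more than threshold."""
    bits = []
    for prev_row, row in zip(prev_frame, frame):
        for before, after in zip(prev_row, row):
            changed = abs(to_gray(after) - to_gray(before)) > threshold
            bits.append(1 if changed else 0)
    return bits


black, yellow = (0, 0, 0), (255, 255, 0)
frame_a = [[yellow, black, black],
           [black, black, black]]
frame_b = [[black, yellow, black],      # the yellow pixel moved one step right
           [black, black, black]]

print(change_events(frame_a, frame_b))  # [1, 1, 0, 0, 0, 0]
```

Only the two pixels involved in the movement produce events; the static background stays silent.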

Below are additional questions to challenge your current architecture that might be helpful.

Why would the agent explore or even move at all?

Is this the activity output of the agent or the game image? If it is the former, how would it help with its task other than being self-aware? Also, if it is the former how would you merge it with the RAM or image data as those were the actual inputs?


#8

I think of the retina as a specialized chunk of the brain that ended up mounted in a slightly remote location. When you consider that the eyes are mounted on a gyroscope-stabilized platform and that the gaze point is directed by a bunch of specialized lower level processing centers that work with this system and feed back into the spatial location processing - it’s really hard to look at it any other way.