Encoding OpenAI Gym's Input

Therefore, to get the best result, I think you're going to need to understand what's happening in the RAM, or at the very least find out where the real boundaries between variables are; in a RAM-poor system like the 2600 they won't generally fall on neat byte boundaries.

The question is whether making the patterns clear is a job for the sensor or a job for the "brain". If it is a sensor job, then I should go find documentation on the Atari's memory layout and extract the data for my agent. It's like eyes: if we don't have lenses, there is no way our brains can see; the laws of physics prevent it.

But if it is a job for the "brain", I needn't (shouldn't) do anything to the data. Extracting the data manually is like building an "intelligent" chatbot by telling it "if the user says hello, you say hi; if the user says how are you, you say fine…". Giving computers data in a human-crafted way is (I think) not the way to achieve general AI, which is what I am trying to get with HTM. (When it comes to single-purpose AIs, backpropagation and CNNs are far ahead of us.) Why can AlphaGo only play Go? Because humans only passed it data in a format that can represent a Go board.
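For the "sensor does nothing clever" option, here is a minimal sketch of what I mean by sticking to the raw RAM: take Gym's 128-byte RAM observation and encode every byte as a scalar. The environment name and the encoder parameters (`W`, `N`) are just illustrative choices, and this assumes the old Gym API where `reset()` returns the observation directly:

```python
import gym
import numpy as np

W, N = 21, 64  # hypothetical: 21 active bits per byte, over 64 positions

def encode_byte(value, w=W, n=N):
    """Classic scalar encoding: a run of w contiguous active bits whose
    start position is proportional to the byte's value (0-255)."""
    sdr = np.zeros(n, dtype=np.uint8)
    start = int(round(value / 255.0 * (n - w)))
    sdr[start:start + w] = 1
    return sdr

env = gym.make("MsPacman-ram-v0")   # "-ram" environments observe 128 raw bytes
ram = env.reset()                   # shape (128,), dtype uint8
full_sdr = np.concatenate([encode_byte(b) for b in ram])  # 128 * 64 = 8192 bits
```

Of course, this bakes in exactly the assumption under discussion: that each byte is one meaningful scalar.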

But what if the last bit in one of the bytes actually represents a boolean corresponding to one of the dots being eaten or not (another contrived example)? It's going to end up overlapping heavily with an adjacent scalar value when there should be no overlap.

This is something I'm afraid of. When "difference" becomes "similarity", no intelligent system can work. But at least overlapping heavily isn't overlapping completely… I have to make a wild guess that maybe my agent will figure it out if I stick to the original RAM… Terrified.
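One way to sidestep the byte-boundary guess entirely would be to encode at the bit level instead: unpack all 1024 RAM bits and treat each as its own binary feature, so a lone flag bit can never blend into a neighbouring scalar field. A minimal sketch (the `copies` parameter is my own illustrative choice, there to give the spatial pooler enough active bits per feature):

```python
import numpy as np

def encode_ram_bits(ram, copies=8):
    """Treat each of the 1024 RAM bits as an independent binary feature,
    repeated `copies` times. No byte-alignment assumption is made."""
    bits = np.unpackbits(np.asarray(ram, dtype=np.uint8))  # shape (1024,)
    return np.repeat(bits, copies)                         # shape (8192,)
```

The trade-off is the mirror image of the per-byte encoding: a flag bit stays cleanly separated, but adjacent scalar values (e.g. 127 vs. 128) now share almost no bits, so the scalar's similarity structure is lost.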

I don't know how much you intend to borrow from that other thread you linked; to be honest, most of it is well over my head. But I'll assume you want your HTM system to be able to learn sequences from the game so that it can predict an upcoming negative outcome and avoid it - am I close?

I created my regions and connections based on brain structure. I have dug into those Wikipedia pages and the citations on them for half a year, trying to get a clear picture of how information flows in the brain. This is different from normal programming, because I don't think fully understanding how this program completes tasks is possible. I program toward a state where enough resources are given to the agent that it is "possible" for it to complete tasks; I can only make guesses about why this should work.

1. Enable learning in the regions, so neurons will be "happy" to predict correctly. Here I define "happiness" as a tendency to keep the current state. This is where intrinsic reward comes from.
2. Send in the external reward by releasing dopamine onto D1 and D2 MSNs; the details are based on biological facts.
3. Plug the output back into the input to make the agent "self-aware" (literally, self-aware :slight_smile:). (Biological basis: efference copy.)
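To make the loop concrete, here is a minimal structural sketch of those three steps. Everything here (`regions_step`, `intrinsic_reward`, the SDR sizes) is a hypothetical stand-in of mine, not the actual regions; a real implementation would run spatial pooling and temporal memory inside `regions_step`:

```python
import numpy as np

def intrinsic_reward(predicted, actual):
    """Step 1: 'happiness' as prediction success - the fraction of input
    bits that were correctly predicted on the previous step."""
    if predicted is None:
        return 0.0
    return float(np.mean(predicted == actual))

def regions_step(sensor_sdr, efference_copy, dopamine):
    """Stand-in for the HTM regions, with dopamine modulating D1/D2 MSNs
    (step 2). This stub just concatenates the inputs and emits a random
    motor SDR plus no prediction."""
    combined = np.concatenate([sensor_sdr, efference_copy])
    motor = (np.random.rand(16) < 0.1).astype(np.uint8)
    return motor, None

motor_sdr = np.zeros(16, dtype=np.uint8)  # efference copy starts empty
predicted = None
for t in range(100):
    sensor_sdr = (np.random.rand(64) < 0.05).astype(np.uint8)  # toy input
    dopamine = 0.0 + intrinsic_reward(predicted, sensor_sdr)   # external + intrinsic
    # Step 3: feed the previous motor output back in as an efference copy.
    motor_sdr, predicted = regions_step(sensor_sdr, motor_sdr, dopamine)
```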

All the details are settled, but the programming has just started. I'm only experimenting; nothing is proven yet.

If you're lucky, then someone who's actually attempted RL in HTM will chime in with something a little more helpful.

@sunguralikaan did an amazing job

Maybe I can ask @sunguralikaan about how to deal with vision on this specific task.