Reward hacking in simple HTM agents (using OpenAI Gym)

Here’s a visual draft of my proposed model.

Input flow:

  1. The modulator (thalamus) receives raw input values from the sensor/motor systems and encodes them.
  2. This encoding is then concatenated with the current state representation (encoded) and with a marker of which state pools were back-activated on the previous time-step (see the first sketch after this list).
  3. The first-level HTM node processes the input (producing both SP and TM data).
  4. The SDR representation of the first-level node is fed to the state-change pools, concatenated with the current state encoding (since the thalamus connects to multiple layers in cortical columns, this seems acceptable/plausible… worth playing with this idea).
  5. An anomaly score is calculated on the state-change pools --> a high score represents “surprise”, a low score represents an accurate prediction.
  6. The inverse of the anomaly scores (concatenated across all state pools) is sent onward, so that correctly predictive state pools show up most strongly to the modulator’s HTM node.
  7. The modulator’s HTM node receives the encoding of the inverse anomaly scores, concatenated with which intended goal was being pursued (this can be a simple binary encoding of whether each goal was active for this time-step)… this allows the modulator’s HTM node to learn which state pool was correctly predictive for a given active goal (see the second sketch after this list).
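
To make steps 1–2 a bit more concrete, here’s a minimal numpy sketch of the kind of encoding and concatenation I have in mind. The names (`encode_scalar`, `build_modulator_input`) and the simple bucket encoder are placeholders I made up for illustration; a real implementation would likely use a proper HTM encoder library instead.

```python
import numpy as np

def encode_scalar(value, min_val, max_val, size=400, active_bits=21):
    """Simple bucket encoder: a contiguous run of active bits whose
    position tracks where `value` falls in [min_val, max_val]."""
    value = np.clip(value, min_val, max_val)
    span = size - active_bits
    start = int(round(span * (value - min_val) / (max_val - min_val)))
    sdr = np.zeros(size, dtype=np.uint8)
    sdr[start:start + active_bits] = 1
    return sdr

def build_modulator_input(sensor_values, state_encoding, prev_back_activated, bounds):
    """Steps 1-2: encode each raw sensor value, then concatenate with the
    current state encoding and the mask of state pools that were
    back-activated on the previous time-step."""
    sensor_sdrs = [encode_scalar(v, lo, hi) for v, (lo, hi) in zip(sensor_values, bounds)]
    return np.concatenate(sensor_sdrs + [state_encoding, prev_back_activated])

# Example: two sensor channels, a 128-bit state encoding, 4 state pools
state_encoding = np.zeros(128, dtype=np.uint8)
prev_back_activated = np.array([0, 1, 0, 0], dtype=np.uint8)
x = build_modulator_input([0.3, -1.2], state_encoding, prev_back_activated,
                          bounds=[(0.0, 1.0), (-2.0, 2.0)])
print(x.shape)  # (2*400 + 128 + 4,) = (932,)
```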

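And a sketch of steps 5–7, assuming the standard HTM anomaly score (the fraction of active columns that were not predicted on the previous time-step). `modulator_feedback`, the per-pool column lists, and the one-hot goal encoding are illustrative stand-ins rather than a fixed interface:

```python
import numpy as np

def anomaly_score(active_cols, predicted_cols):
    """Standard HTM anomaly: fraction of currently active columns that
    were NOT predicted on the previous time-step.
    0 = perfect prediction, 1 = total surprise."""
    active = set(active_cols)
    if not active:
        return 0.0
    unpredicted = active - set(predicted_cols)
    return len(unpredicted) / len(active)

def modulator_feedback(state_pool_activity, active_goal, n_goals):
    """Steps 5-7: compute an inverse anomaly score per state-change pool,
    then concatenate with a one-hot encoding of the currently active goal,
    so the modulator's HTM node can associate 'which pool predicted well'
    with 'which goal was being pursued'."""
    inverse_scores = np.array([1.0 - anomaly_score(a, p)
                               for a, p in state_pool_activity])
    goal_onehot = np.zeros(n_goals)
    goal_onehot[active_goal] = 1.0
    return np.concatenate([inverse_scores, goal_onehot])

# Example: 3 state pools, pool 1 predicted perfectly, goal 0 active
pools = [([3, 7, 9], [7]),        # pool 0: mostly unpredicted
         ([2, 5], [2, 5, 8]),     # pool 1: fully predicted
         ([1, 4, 6], [])]         # pool 2: nothing predicted
print(modulator_feedback(pools, active_goal=0, n_goals=2))
# -> [0.333..., 1.0, 0.0, 1.0, 0.0]
```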

Output flow:

  1. (Depending on the logic) for the goal that is actively being pursued, activate the associated state pools.
  2. The state pools receive their activation signal, triggering predictive columns to fire down their distal connections to the first-level node.
  3. The connected first-level columns activate their distal connections into the output space, generating an output encoding (there can optionally be some thresholding on those outputs, so that low-scoring connections don’t fire into the output encoding space; see the sketch after this list).
  4. The modulator receives the output encoding and has the option to intervene (or not) based on any desired logic.
  5. The encoded output is decoded to raw form and sent back out to the world.
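
Here’s a rough numpy sketch of that output pass (steps 1–3, with the optional thresholding folded in). The connection matrices `pool_to_l1` and `l1_to_output` stand in for learned distal connection strengths, and the goal-to-pool mapping is just a dict; the modulator-intervention and decode steps aren’t shown:

```python
import numpy as np

def output_pass(active_goal, goal_to_pools, pool_predictive_cols,
                pool_to_l1, l1_to_output, fire_threshold=0.5):
    """Output flow, steps 1-3: activate the state pools tied to the
    active goal, let their predictive columns drive first-level columns
    through distal connection weights, then project the first-level
    activity into the output encoding space, keeping only connections
    above `fire_threshold`."""
    n_l1 = pool_to_l1[0].shape[1]
    l1_drive = np.zeros(n_l1)
    for pool in goal_to_pools[active_goal]:            # step 1: activate pools
        cols = pool_predictive_cols[pool]               # step 2: predictive columns fire
        l1_drive += pool_to_l1[pool][cols].sum(axis=0)
    l1_active = l1_drive > 0                            # which first-level columns fire
    weights = l1_to_output[l1_active]                   # step 3: distal into output space
    weights = np.where(weights >= fire_threshold, weights, 0.0)
    output_encoding = (weights.sum(axis=0) > 0).astype(np.uint8)
    return output_encoding

# Tiny example: 2 pools of 6 columns, 8 first-level columns, 5 output bits
rng = np.random.default_rng(0)
pool_to_l1 = [rng.random((6, 8)) for _ in range(2)]
l1_to_output = rng.random((8, 5))
out = output_pass(active_goal=0,
                  goal_to_pools={0: [1], 1: [0, 1]},
                  pool_predictive_cols={0: [2], 1: [0, 4]},
                  pool_to_l1=pool_to_l1,
                  l1_to_output=l1_to_output)
print(out)  # a 0/1 output encoding over the 5 output bits
```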

Here’s an image of the logic flow, with black being the input pass and red being the output pass.

You can imagine that there might be other parts, such as a node for thresholding the output encodings (so that weak outputs might not make it out the gate):
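
Such a gate could be as simple as something like this (the `min_strength` / `min_active` cutoffs are made-up knobs, not values I’ve tuned):

```python
import numpy as np

def output_gate(output_scores, min_strength=0.3, min_active=2):
    """Optional gate on the output encoding: drop bits whose accumulated
    drive is below `min_strength`, and suppress the whole output if too
    few bits survive (so weak outputs don't make it out the gate)."""
    gated = np.where(output_scores >= min_strength, output_scores, 0.0)
    if np.count_nonzero(gated) < min_active:
        return np.zeros_like(output_scores)
    return gated

print(output_gate(np.array([0.9, 0.1, 0.4, 0.05])))  # -> [0.9, 0. , 0.4, 0. ]
```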

There is room for some flexibility here (such as what info to provide to which level, feedback, etc.), but I think this covers the initial idea I had for an HTM-based state machine for agent-based learning.
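
For completeness, here’s roughly how the whole thing might sit inside an OpenAI Gym loop. The agent class below is a do-nothing stub (random actions) just so the loop runs; the real agent would implement the input/output passes described above. This assumes the newer Gym API (reset returning `(obs, info)`, step returning a 5-tuple):

```python
import gym

class RandomStubAgent:
    """Placeholder standing in for the HTM state-machine agent; it just
    samples random actions so the loop below actually runs."""
    def __init__(self, action_space):
        self.action_space = action_space
    def input_pass(self, obs):        # would be steps 1-7 of the input flow
        pass
    def output_pass(self):            # would be steps 1-4 of the output flow
        return None
    def decode(self, encoding):       # step 5: encoded output -> raw action
        return self.action_space.sample()

def run_episode(env, agent, max_steps=500):
    """One episode of the agent loop: run the input pass on each
    observation, then run the output pass and decode it into an action."""
    obs, _ = env.reset()              # Gym >= 0.26 API
    total = 0.0
    for _ in range(max_steps):
        agent.input_pass(obs)
        action = agent.decode(agent.output_pass())
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total

env = gym.make("CartPole-v1")
print(run_episode(env, RandomStubAgent(env.action_space)))
```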