The one-step trap


#1

Yesterday I watched a Richard Sutton lecture:

Something piqued my curiosity: “The one-step trap”.
This problem is close to me, because I have run into it many times, even when implementing my own version of HTM.

The problem is that a system which can predict only one step ahead, like the current HTM theory, is useless in the long term: the number of possible predictions explodes combinatorially the more steps into the future you want to predict.

TD solves this problem by learning to predict from a “guess of a guess” and propagating it backwards, so pure one-step prediction becomes one-step prediction toward a “goal” (even if it is only an approximate goal, it gets better and better over time).
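To make the “guess of a guess” idea concrete, here is a minimal TD(0) sketch. Everything in it is an assumption made for illustration: a tiny deterministic chain of states that always steps to the right, a reward of 1 on reaching the goal, and the names `td0_chain`, `gamma` and `alpha`. It is not taken from the lecture or from any HTM code.

```python
def td0_chain(n_states=5, gamma=0.9, alpha=0.5, episodes=200):
    """Tabular TD(0) on a hypothetical chain 0 -> 1 -> ... -> goal."""
    # One extra slot for the terminal (goal) state, whose value stays 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for s in range(n_states):
            s_next = s + 1
            # Reward arrives only on the step that enters the goal state.
            reward = 1.0 if s_next == n_states else 0.0
            # The target is itself built from a guess: the current
            # estimate of the next state's value.
            td_target = reward + gamma * values[s_next]
            values[s] += alpha * (td_target - values[s])
    return values[:n_states]


values = td0_chain()
for s, v in enumerate(values):
    print(f"state {s}: value {v:.4f}")
```

Each update only ever looks one step ahead, yet the value of the goal leaks backwards one state per episode, so under these assumptions `values[s]` approaches `0.9 ** (4 - s)`: a long-range prediction learned without simulating the whole future.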

How do you plan to solve “The one-step trap”?

Thinking that one-step predictions are sufficient
• That is, at each step predict the state and observation one step later
• Any long-term prediction can then be made by simulation
• In theory this works, but not in practice
• Making long-term predictions by simulation is exponentially complex
• and amplifies even small errors in the one-step predictions
• Falling into this trap is very common: POMDPs, Bayesians, control theory, compression enthusiasts
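
The two practical objections in the list above can be sketched with a toy example. The numbers are assumptions chosen only for illustration: a scalar system whose true one-step gain is 1.05, a learned model whose gain is 1.06 (roughly 1% off per step), and a fixed branching factor for the number of possible next states. The helper names are hypothetical.

```python
def rollout(step_fn, x0, horizon):
    """Apply a one-step predictor repeatedly to simulate `horizon` steps."""
    x = x0
    for _ in range(horizon):
        x = step_fn(x)
    return x


def relative_error(horizon, true_gain=1.05, model_gain=1.06, x0=1.0):
    """Relative error of a simulated long-term prediction (toy linear system)."""
    truth = rollout(lambda x: true_gain * x, x0, horizon)
    guess = rollout(lambda x: model_gain * x, x0, horizon)
    return abs(guess - truth) / abs(truth)


def num_trajectories(branching, horizon):
    """Futures to enumerate if every step has `branching` possible outcomes."""
    return branching ** horizon


errors = {h: relative_error(h) for h in (1, 10, 50)}
for h, err in errors.items():
    print(f"horizon {h:2d}: relative error {err:.3f}, "
          f"trajectories with 3 branches per step: {num_trajectories(3, h)}")
```

In this sketch a one-step error of under 1% grows past 50% by step 50, and the number of futures to consider grows as `branching ** horizon`.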

There is a link to the slides in the video.


#2

In my understanding of HTM, I don’t think the answer to the “one-step trap” is in the predictive states used by TM, but rather in the active states used by pooling. In SMI, for example, the active cells in the output layer represent entire objects, while the predictive cells in the input layer only predict the next feature/location input.

While pooling hasn’t been officially applied to sequence memory yet, it isn’t difficult to imagine this same concept being applied to depict an entire sequence as an “object”. In my own implementation of RL, I use a forward-looking pooling strategy which activates representations depicting the remaining elements in a sequence (versus all elements in the sequence).
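
As a toy illustration of the forward-looking idea only, and not the actual implementation, the sketch below uses plain Python sets in place of SDRs and a hypothetical helper `remaining_pool`. At each position in a known sequence, the pooled representation is the set of elements still to come.

```python
def remaining_pool(sequence, position):
    """Pooled representation at `position`: the elements not yet seen.

    Assumption: symbols stand in for SDRs, and a frozenset stands in
    for the union of active cells in a pooling layer.
    """
    return frozenset(sequence[position + 1:])


sequence = ["A", "B", "C", "D"]
pools = [remaining_pool(sequence, i) for i in range(len(sequence))]

for symbol, pool in zip(sequence, pools):
    print(f"at {symbol}: remaining = {sorted(pool)}")
```

The representation shrinks as the sequence plays out, so a single active pattern stands for the whole remaining future rather than just the next element.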

When hierarchy is eventually tackled, I expect the “one-step trap” will go away completely – the predictions being made will be sequences of sequences, or sequences of sequences of sequences, and so on.