Yesterday I watched a Richard Sutton lecture:

Something piqued my curiosity: “The one-step trap”.

This problem is close to me, because I have run into it many times, even when implementing my own version of HTM.

The problem is that a system which can predict only one step ahead, like the current HTM theory, is useless in the long term: the number of possible predictions explodes combinatorially with the number of steps into the future you want to predict.

TD solves this problem by learning to predict from a “guess of a guess” and propagating it backwards, so pure one-step prediction becomes one-step prediction toward a “goal” (even if the goal is only approximate, it gets better and better over time).
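To make the “guess of a guess” idea concrete, here is a minimal sketch of tabular TD(0) on a small random walk. This is my own toy setup, not something from the lecture or from HTM: the function name `td0_random_walk`, the five-state chain, the reward of 1 on the right exit and the step size are all assumptions chosen for illustration.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.02, gamma=1.0, n_states=5, seed=0):
    """Tabular TD(0) on a random walk (hypothetical toy environment).

    States 0..n_states-1 lie on a line. Each step moves left or right with
    equal probability. Leaving on the right pays reward 1, leaving on the
    left pays 0. values[s] estimates the expected return from state s.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states
    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:  # fell off the left end
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:  # fell off the right end
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, values[s_next], False
            # The target is built from the current guess values[s_next],
            # so each update moves one guess toward another guess.
            values[s] += alpha * (reward + gamma * v_next - values[s])
            if done:
                break
            s = s_next
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

Each update only looks one step ahead, but because the target reuses the stored estimate of the next state, information about the final outcome leaks backwards through the chain over many episodes. For this chain the exact values are 1/6, 2/6, ..., 5/6, and the estimates should end up close to them.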

How do you plan to solve “The one-step trap”?

Thinking that one-step predictions are sufficient

• That is, at each step predict the state and observation one step later

• Any long-term prediction can then be made by simulation

• In theory this works, but not in practice

• Making long-term predictions by simulation is exponentially complex

• and amplifies even small errors in the one-step predictions

• Falling into this trap is very common: POMDPs, Bayesians, control theory, compression enthusiasts
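The two “not in practice” points can be sketched with a few lines of arithmetic. This is only an illustration under assumptions of my own: a scalar system whose true one-step rule is "multiply by `true_gain`", a learned model that is off by 1%, and a fixed number of possible next states per step. The helper names are hypothetical.

```python
def rollout(step_gain, x0, horizon):
    """Chain `horizon` one-step predictions x <- step_gain * x."""
    x = x0
    for _ in range(horizon):
        x = step_gain * x
    return x


def relative_error(true_gain, model_gain, horizon, x0=1.0):
    """Relative error of a simulated long-term prediction vs. the truth."""
    truth = rollout(true_gain, x0, horizon)
    guess = rollout(model_gain, x0, horizon)
    return abs(guess - truth) / abs(truth)


def num_simulated_paths(branching, horizon):
    """Number of distinct futures if each step has `branching` outcomes."""
    return branching ** horizon


if __name__ == "__main__":
    for k in (1, 10, 100):
        print(k, relative_error(1.0, 1.01, k), num_simulated_paths(3, k))
```

A one-step error of about 1% grows to more than 150% after 100 chained steps, and with only three possible outcomes per step there are already 59049 paths to simulate at horizon 10.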

There is a link to the slides in the video.