I didn’t add a TM to the agent because the world state is static and nothing is hidden from the agent, so there’s no need for memory. Hm… how could HTM perform reward tracking? I’ve been trying, but it seems very difficult with the standard SP/TM and local learning rules. (Tracking rewards as a real number is also tricky.)
That’s a very good point!