Right way to get output from an HTM system

TBT (the Thousand Brains Theory) does not have RL (reinforcement learning) yet.

My current idea is:
A TM-like structure.
Dendrite input: State
FF (feedforward input): ??
Prediction: use the Action SDR (predicted SDR bits == actual Action SDR bits) and a REWARD to figure out which dendrites from t-1 to boost
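
To make the idea above concrete, here is a rough sketch in plain Python. Everything in it is my own assumption for illustration, not an existing HTM API: the Segment class, the overlap threshold, the permanence values, and the rule "move permanences by lr * reward when the predicted cell matches an actual Action SDR bit". The FF input is left out, since it is still an open question.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    # Hypothetical dendrite segment: belongs to one cell (= one Action SDR bit)
    cell: int
    # state bit -> permanence in [0, 1]
    synapses: dict = field(default_factory=dict)

def active_segments(segments, state_sdr, threshold=2, connected=0.5):
    """Indices of segments whose connected synapses overlap the State SDR enough."""
    active = []
    for i, seg in enumerate(segments):
        overlap = sum(1 for bit, perm in seg.synapses.items()
                      if perm >= connected and bit in state_sdr)
        if overlap >= threshold:
            active.append(i)
    return active

def predicted_cells(segments, active):
    """Cells (Action SDR bits) predicted by the active segments."""
    return {segments[i].cell for i in active}

def reinforce(segments, prev_active, prev_state, action_sdr, reward, lr=0.1):
    """Boost (reward > 0) or weaken (reward < 0) the t-1 segments whose
    predicted cell is also a bit of the actual Action SDR."""
    for i in prev_active:
        seg = segments[i]
        if seg.cell not in action_sdr:
            continue  # prediction did not match an actual action bit
        for bit in seg.synapses:
            if bit in prev_state:
                new_perm = seg.synapses[bit] + lr * reward
                seg.synapses[bit] = min(1.0, max(0.0, new_perm))

# Tiny usage example with made-up SDRs
segments = [
    Segment(cell=0, synapses={1: 0.6, 2: 0.6, 3: 0.6}),
    Segment(cell=1, synapses={4: 0.6, 5: 0.6, 6: 0.6}),
]
state = {1, 2, 9}                      # State SDR at t-1
active = active_segments(segments, state)
prediction = predicted_cells(segments, active)
reinforce(segments, active, state, action_sdr={0}, reward=1.0)
```

Only the synapses that took part in the matching prediction get moved, so a segment that predicted a bit outside the Action SDR is simply left alone here. Whether it should be punished instead is a design choice I have not settled.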

The update rule has to keep the last step's active neurons and dendrites /in a buffer/, so that you can do the Q-value calculation the way you do in RL.
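
A minimal sketch of what such a buffer could look like, again in plain Python and under my own assumptions: each dendrite segment carries one scalar value, the value of a step is the mean over its active segments, and the update is a SARSA-style temporal-difference step (it uses the segments that actually became active at t rather than a max over all actions, so it is not full Q-learning). StepBuffer, q_estimate, td_update, GAMMA and ALPHA are hypothetical names and values.

```python
from collections import deque

GAMMA = 0.9   # discount factor (assumed value)
ALPHA = 0.5   # learning rate (assumed value)

class StepBuffer:
    """Keeps the active cells and segments of step t-1 until step t arrives."""
    def __init__(self):
        self.steps = deque(maxlen=2)   # holds (t-1, t)

    def push(self, active_cells, active_segments):
        self.steps.append((frozenset(active_cells), tuple(active_segments)))

    def previous(self):
        # The t-1 entry exists only once two steps have been pushed
        return self.steps[0] if len(self.steps) == 2 else None

def q_estimate(seg_values, segment_ids):
    """Value of a step: mean value of its active segments (0.0 if none)."""
    if not segment_ids:
        return 0.0
    return sum(seg_values[i] for i in segment_ids) / len(segment_ids)

def td_update(seg_values, buffer, reward):
    """Move the t-1 segment values toward reward + GAMMA * value(t)."""
    prev = buffer.previous()
    if prev is None:
        return None                    # nothing to update on the first step
    _, prev_segments = prev
    _, curr_segments = buffer.steps[-1]
    target = reward + GAMMA * q_estimate(seg_values, curr_segments)
    error = target - q_estimate(seg_values, prev_segments)
    for i in prev_segments:
        seg_values[i] += ALPHA * error
    return error

# Usage: segment 0 was active at t-1, segment 1 at t
seg_values = {0: 0.0, 1: 1.0}
buf = StepBuffer()
buf.push(active_cells={0}, active_segments=[0])
buf.push(active_cells={1}, active_segments=[1])
td_error = td_update(seg_values, buf, reward=1.0)
```

The buffer only needs depth two for a one-step update. A longer deque would be needed for anything like eligibility traces.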

Sort of like an ensemble RL that predicts the active bits of an SDR, which should match the Action SDR.

Then pass this Action to the CC (cortical column). The Sense input is the State.

So what is left is how to represent and apply a GOAL.
