The description of LS memory in the meeting can be related to how all predictive systems function: we predict some future state and then perceive the ground truth of the actual future. The error function is the difference between the prediction and the perception - the surprise!
By delaying perception to some point in the future with a memory, as is done with LS memory, we are doing much the same thing with a different mechanism.
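A minimal sketch of that loop, assuming a trivial "persistence" predictor (tomorrow looks like today) since the original does not name one: a prediction is held in a memory buffer and compared against the input stream only when its ground truth arrives, and the difference is the surprise.

```python
from collections import deque

def surprise_stream(inputs, predictor, delay=1):
    """Yield prediction errors ("surprise") for a stream of inputs.

    Each prediction is held in a memory buffer until its ground
    truth arrives `delay` steps later, then compared against it.
    """
    memory = deque()  # predictions awaiting their ground truth
    for x in inputs:
        if len(memory) == delay:
            predicted = memory.popleft()
            yield x - predicted  # error signal: perception minus prediction
        memory.append(predictor(x))

# Persistence predictor: predict that the next value equals the current one.
errors = list(surprise_stream([1, 2, 4, 7, 11], lambda x: x, delay=1))
# errors == [1, 2, 3, 4]: each input compared to the prediction made one step earlier
```

Growing the `delay` parameter is the "delaying perception" move: the same comparison, pushed further into the future by holding the prediction in memory longer.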
As with most recurrent networks, the temporal feedback is distributed over one or more stages, which establishes the time horizon for the feedback of the error signal. These models all describe a pipeline, and the key difference is where the comparison with the input stream happens. With HTM it is just one step. With more stages we can distribute the error over different time frames instead of a single fixed interval; each stage expands the span of time under consideration.
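The multi-stage idea can be sketched as measuring the same persistence-style error at several horizons at once - the horizon values here are illustrative, not from the original:

```python
def staged_errors(series, horizons=(1, 2, 4)):
    """Compute prediction error at several time horizons.

    A persistence predictor guesses that the value h steps ahead
    equals the current value; each stage (horizon) widens the span
    of time over which the error signal is measured.
    """
    errors = {}
    for h in horizons:
        errors[h] = [series[t + h] - series[t] for t in range(len(series) - h)]
    return errors

errs = staged_errors([0, 1, 3, 6, 10, 15], horizons=(1, 2, 4))
```

A single-stage setup like HTM corresponds to `horizons=(1,)`; adding stages spreads the error over progressively longer time frames.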
By adding in the H of HTM (the hierarchy) and level-skipping connections, you should start to see the same multi-stage features normally associated with multi-stage RNNs.
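One way to sketch the hierarchical stacking (without the skip connections, which would additionally feed the raw input to each level): each level computes a one-step prediction error and passes its residual stream up, so higher levels end up modeling structure over wider effective time spans. This is my illustrative reading, not a mechanism the original specifies.

```python
def hierarchical_errors(series, depth=3):
    """Stack prediction stages: each level makes a one-step
    persistence prediction over its input stream, and its residual
    error stream becomes the input to the level above.
    """
    signal = series
    residuals = []
    for _ in range(depth):
        # one-step persistence error at this level
        err = [signal[t + 1] - signal[t] for t in range(len(signal) - 1)]
        residuals.append(err)
        signal = err  # the level above sees this level's error stream
    return residuals

levels = hierarchical_errors([0, 1, 4, 9, 16], depth=3)
# a quadratic input is fully explained two levels up: [[1, 3, 5, 7], [2, 2, 2], [0, 0]]
```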
These models are all related in a very deep way: they all use the sequence of operations to mix perception, memory, and time so as to internally generate an error signal.
In that same deep sense they are all also forms of auto-encoders.
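The auto-encoder connection can be made concrete by noting that both cases minimize the same form of objective, error = target - model(input); they differ only in what the target is. A toy sketch of that framing (the identity model here is just for illustration):

```python
def reconstruction_error(model, x, target):
    """Shared objective behind auto-encoders and sequence predictors.

    Auto-encoder: target is x itself (reconstruct the present).
    Predictor:    target is the next input (reconstruct the future).
    """
    return target - model(x)

identity = lambda x: x
seq = [2.0, 4.0, 8.0]
ae_err = reconstruction_error(identity, seq[0], seq[0])    # 0.0: perfect reconstruction
pred_err = reconstruction_error(identity, seq[0], seq[1])  # 2.0: surprise about the future
```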
I know this is a bit of rambling that does not seem to come to a point, but I can see that there is some very large-scale descriptive umbrella that covers both RNNs and HTM. I am struggling to formalize that relationship into an elegant reductive form. Not there yet.