I have been going through the HTM basics lecture by Rahul Agarwal. Although that lecture cleared up a lot of my concepts about HTM, I am still finding it difficult to understand how the HTM model learns temporal sequences. Please help.
Maybe this paper can help you: https://numenta.com/papersvideosandmore/resources/hierarchicaltemporalmemorywhitepaper/
There are some others here: https://numenta.com/papers/
Maybe it would be good to add links to these documents to the NuPIC docs?
Is there anything in particular that you are confused about? You should definitely also check out HTM School if you haven't.
I find it easiest to think about the temporal memory (a.k.a. sequence memory) as representing the sequence so far. Except for the first input of the sequence, you form the representation by putting the new input in the context of the sequence so far. This creates a chaining effect, where each input changes which cells are on for future inputs of the same sequence.
Here's an HTM algorithm. It leaves out some required components and details for simplicity. It's also in an unusual order, again for simplicity. Usually, HTM determines ahead of time which cells would be used in each column if that column turns on in the next cycle. This is called the predictive state. The algorithm below instead waits until the input being predicted arrives before working out the predictive state.
For each time step/next input:
 Using the spatial pooler, determine which temporal memory columns are on, and let the spatial pooler learn. The goal of this step is mostly just to create a sparse distributed representation.
 Inactive columns always have inactive cells. For each active column, determine which cells are on. Each cell has a bunch of dendritic segments, basically one for each known sequence context for which the cell is used. Each cell gets used for a bunch of different sequences, so the sequence is ambiguous if you look at one cell. But the combination of cells unambiguously indicates the sequence so far.
To decide which cells in a column are on:

For each segment which belongs to a cell in the column, calculate its total based on the previously active cells and its synapses. To find the total, just count how many synapses have permanence values over a certain threshold value and have a previously active cell as the synapse's input.

If any segment's total is over a fixed threshold, turn on the cell to which the segment belongs. Those segments learn, to help deal with noise.

If there still aren't any cells on in the column, add a segment to a random cell to handle the sequence context. Its synapses have inputs from a random selection of x previously active cells. The permanence values could start above the threshold, but in a normal algorithm they would wait to see if the sequence can repeat.
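
In case it helps to see the cell-selection steps above as code, here is a rough Python sketch. It is only an illustration of the simplified algorithm in this post, not NuPIC's implementation. The spatial pooler is skipped, so the active columns are passed in by hand. All names and parameter values (`SequenceMemory`, `CONNECTED_PERMANENCE`, `ACTIVATION_THRESHOLD`, and so on) are made up for this example. Three things are my own assumptions: with no previous context, cell 0 of each column stands for "start of sequence"; a new segment goes to a random cell among those with the fewest segments, so two contexts never share a cell by chance; and new permanences start above the threshold.

```python
import random

# Hypothetical parameters, chosen only for illustration.
CONNECTED_PERMANENCE = 0.5   # a synapse counts only if its permanence is over this
ACTIVATION_THRESHOLD = 1     # a segment matches if its total is over this
NEW_SYNAPSE_COUNT = 3        # the "x" previously active cells a new segment samples
CELLS_PER_COLUMN = 4
INITIAL_PERMANENCE = 0.6     # assumption: new synapses start already connected
PERMANENCE_INCREMENT = 0.05


class SequenceMemory:
    def __init__(self, seed=0):
        # (column, cell index) -> list of segments.
        # A segment is a dict mapping an input cell to a permanence value.
        self.segments = {}
        self.rng = random.Random(seed)

    def segment_total(self, segment, prev_active):
        """Count connected synapses whose input cell was previously active."""
        return sum(1 for cell, perm in segment.items()
                   if perm > CONNECTED_PERMANENCE and cell in prev_active)

    def step(self, active_columns, prev_active):
        """Return the active cells for this input, given the previous active cells."""
        active = set()
        for col in sorted(active_columns):
            if not prev_active:
                # Assumption: the first input has no context, so use cell 0.
                active.add((col, 0))
                continue
            cells = [(col, i) for i in range(CELLS_PER_COLUMN)]
            for cell in cells:
                for segment in self.segments.get(cell, []):
                    if self.segment_total(segment, prev_active) > ACTIVATION_THRESHOLD:
                        active.add(cell)
                        # The matching segment learns: reinforce synapses from
                        # cells that were actually active.
                        for pre in segment:
                            if pre in prev_active:
                                segment[pre] = min(1.0, segment[pre] + PERMANENCE_INCREMENT)
            if not any(cell in active for cell in cells):
                # Unknown context: grow a segment on a random least-used cell.
                fewest = min(len(self.segments.get(c, [])) for c in cells)
                candidates = [c for c in cells if len(self.segments.get(c, [])) == fewest]
                cell = self.rng.choice(candidates)
                inputs = self.rng.sample(sorted(prev_active),
                                         min(NEW_SYNAPSE_COUNT, len(prev_active)))
                self.segments.setdefault(cell, []).append(
                    {pre: INITIAL_PERMANENCE for pre in inputs})
                active.add(cell)
        return active

    def run(self, sequence):
        """Feed a sequence of column sets from a fresh start; return cells per step."""
        prev, history = set(), []
        for columns in sequence:
            prev = self.step(columns, prev)
            history.append(prev)
        return history


# Three inputs, each already reduced to a set of active columns.
A, B, C = {0, 1, 2}, {6, 7, 8}, {3, 4, 5}
memory = SequenceMemory()
b_after_a = memory.run([A, B])[-1]
b_after_c = memory.run([C, B])[-1]
replay = memory.run([A, B])[-1]
print(b_after_a == replay)               # same context, same cells
print(b_after_a.isdisjoint(b_after_c))   # different context, different cells
```

Both lines print `True`: B is represented by the same columns each time, but which cell is on inside each column depends on whether A or C came before it. That is the chaining effect I described earlier.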
Thanks a lot for your explanation. It helped a lot.