True, but even then, with a million columns (enough for input from a typical computer monitor), 4% activation gives 40,000 active columns. A temporal pooler that is just beginning to learn might activate all 8 cells in many of the active columns, and each cell could strengthen or weaken about 4 connections (a value I'm guessing from this post), so that means performing operations on up to 40,000 × 8 × 4 = 1,280,000 segments independently. If I choose the method of adding up activation input values for each column and have 100 local input locations per column, that's 100 million floating point additions per time step.
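To double-check those counts, here is a quick back-of-envelope sketch in Python. Every constant is one of the guesses above rather than a measured value, and the function names are made up just for this illustration.

```python
# Rough per-time-step operation counts, using the guessed values from the post.
COLUMNS = 1_000_000       # columns in one layer
ACTIVE_PERCENT = 4        # percent of columns active per time step
CELLS_PER_COLUMN = 8
SEGMENTS_PER_CELL = 4     # guessed value
INPUTS_PER_COLUMN = 100   # local input locations per column


def active_columns(columns=COLUMNS, percent=ACTIVE_PERCENT):
    """Number of columns active in one time step."""
    return columns * percent // 100


def segment_updates(columns=COLUMNS, percent=ACTIVE_PERCENT,
                    cells=CELLS_PER_COLUMN, segments=SEGMENTS_PER_CELL):
    """Worst case: every cell in every active column updates all its segments."""
    return active_columns(columns, percent) * cells * segments


def overlap_additions(columns=COLUMNS, inputs=INPUTS_PER_COLUMN):
    """Additions needed to sum the inputs of every column once."""
    return columns * inputs


if __name__ == "__main__":
    print("active columns:   ", active_columns())
    print("segment updates:  ", segment_updates())
    print("overlap additions:", overlap_additions())
```

The segment count is a worst case, since it assumes every cell in every active column bursts and touches all of its segments.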
For storage, the temporal memory would need a connection strength stored for each connection. That means 8 million cells times about 4 segments per cell, giving 32 million strength values, or 128 MBytes of storage if 32-bit floats are used. Meanwhile, the spatial pooler has 1 million columns with, depending on input radius, around 100 connections per column, which requires about 400 MBytes of storage. If all that's right, it comes to a bit over 500 MBytes of storage (about 528 MBytes).
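The same storage estimate as a small Python sketch, assuming one 32-bit float per connection strength and counting a MByte as 10^6 bytes. The function names are hypothetical and the sizes are the guesses from above.

```python
BYTES_PER_FLOAT32 = 4
MBYTE = 10**6  # decimal megabyte, matching the round numbers in the post


def temporal_memory_bytes(columns, cells_per_column, segments_per_cell):
    """One float32 strength value per segment of every cell."""
    return columns * cells_per_column * segments_per_cell * BYTES_PER_FLOAT32


def spatial_pooler_bytes(columns, inputs_per_column):
    """One float32 strength value per input connection of every column."""
    return columns * inputs_per_column * BYTES_PER_FLOAT32


def layer_bytes(columns, cells_per_column, segments_per_cell, inputs_per_column):
    """Total storage for one layer: temporal memory plus spatial pooler."""
    return (temporal_memory_bytes(columns, cells_per_column, segments_per_cell)
            + spatial_pooler_bytes(columns, inputs_per_column))


if __name__ == "__main__":
    tm = temporal_memory_bytes(1_000_000, 8, 4)
    sp = spatial_pooler_bytes(1_000_000, 100)
    print("temporal memory:", tm // MBYTE, "MBytes")
    print("spatial pooler: ", sp // MBYTE, "MBytes")
    print("one layer:      ", (tm + sp) // MBYTE, "MBytes")
```

This only counts strength values. Any indices, activation state, or other bookkeeping would add to the total.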
That means my laptop's GeForce 940MX should be able to store about four of those million-column networks in GPU memory. It should also be able to handle 100 million floating point additions per time step: even at 100 time steps per second that's 10 billion additions per second, which is within the tens of Gigaflops most GPUs are capable of performing. (I think that means it should be able to run the HTM layer at 100 FPS, even using an un-optimized column activation method.)
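A rough sketch of that feasibility check in Python. The 2 GiB of GPU memory in the usage example is an assumption (the post doesn't give the card's memory size), the 528 MByte layer size is the storage estimate from above, and the function names are made up for illustration.

```python
def required_flops(additions_per_step, steps_per_second):
    """Additions per second needed to sustain a given frame rate."""
    return additions_per_step * steps_per_second


def layers_that_fit(gpu_bytes, layer_bytes):
    """Whole layers that fit in GPU memory, ignoring all other memory use."""
    return gpu_bytes // layer_bytes


if __name__ == "__main__":
    ASSUMED_GPU_BYTES = 2 * 1024**3   # assumption: a 2 GiB card
    LAYER_BYTES = 528_000_000         # storage estimate for one layer

    flops = required_flops(100_000_000, 100)
    print("needed at 100 FPS:", flops / 1e9, "GFLOPS")
    print("layers that fit:  ", layers_that_fit(ASSUMED_GPU_BYTES, LAYER_BYTES))
```

Under these assumptions four layers fit with almost no headroom, so anything else living in GPU memory would bring the count down. The throughput figure also only counts the column-sum additions, not the segment updates.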
Whew, that was fun! Now I need to eat breakfast too.
Edit: I think this means I could theoretically run a 4-layer, 1 million column per layer, 8 cell per column, HTM network on my laptop.