Have you guys talked to the folks at Berkeley about their IRAM chip? IRAM is an emerging architecture that looks promising for sparse-representation implementations because it places shared DRAM on the same die as the processing elements, for very low latency. Sparse boolean vectors stored on-chip, processed by many parallel sparse-boolean-vector units, could radically speed up Numenta's implementations.
The real advantage for HTM with processing-in-memory (PIM) such as IRAM, I think, is not the sparsity but the simplicity of the operations. PIM can also be combined with 3D stacking (e.g. HBM2-PIM). The "compute" layer would be far less power-hungry than in conventional DL (integer comparisons and adds instead of float multiply-adds). With 3D stacking, the biggest issue is heat dissipation. This could be a game changer in the long term.
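To make the "simplicity of the operations" point concrete, here is a minimal sketch (not Numenta's actual code) of HTM's core primitive: the overlap of two sparse boolean vectors, stored compactly as sorted lists of active-bit indices. The whole kernel is integer comparisons and adds, with no floating-point multiply-accumulate, which is exactly the kind of logic a lean PIM compute layer could implement cheaply:

```python
def overlap(a, b):
    """Overlap (shared active bits) of two sparse boolean vectors,
    each given as a sorted list of active-bit indices.
    Uses only integer comparisons and adds -- no FP multiply-adds."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:        # same bit active in both vectors
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count

# A 2048-bit vector at 2% sparsity stores only ~40 small integers,
# so many such vectors fit next to the DRAM banks that hold them.
x = [3, 17, 120, 512, 1999]
y = [17, 120, 600, 1999]
print(overlap(x, y))  # -> 3 (bits 17, 120, 1999)
```

The same loop over a dense float representation would be a dot product of multiply-adds; here it degenerates to a merge of two sorted index streams, which is both simpler logic and far less data movement.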
If you replaced the FP units in HBM2-PIM with INT units, the number of OPs per second might be much more than 1.2 × 10^12. You would still need a "fast" on-chip network, though (communication, even on-chip, is a bit of a pain in HTM).
Y.-C. Kwon et al., "25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications," in 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, vol. 64, pp. 350–352.