I am not familiar with the internals of the current NuPIC implementation (I am just starting to poke around after a long pause), but speculating at a high level based on limited exploration, it appears that the underlying data model uses arrays and matrices, much like traditional neural networks. This data model is certainly very amenable to GPU acceleration, especially when layered on top of TensorFlow, PyTorch, and similar frameworks.
But are there advantages to using an underlying graph model for the implementation instead? Some advantages come to mind: sparsity falls out naturally, algorithms are easier to implement and visualize, and the biological model of neurons and synapses maps naturally onto a graph. Details such as dendrite processing and reference frames could be modeled as “template subgraphs”. An obvious downside is the divergence from traditional neural network implementations, and all the goodies that come with them.
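To make the graph idea concrete, here is a minimal sketch in plain Python (no graph database) of neurons and synapses as nodes and edges. The names `Neuron`, `Synapse`, and the `permanence` field are my own illustrative choices, loosely following HTM terminology, not anything from an existing codebase:

```python
from dataclasses import dataclass, field

@dataclass
class Neuron:
    nid: int
    active: bool = False
    # outgoing synapses, keyed by target neuron id
    synapses: dict = field(default_factory=dict)

@dataclass
class Synapse:
    permanence: float  # HTM-style scalar in [0, 1]

def connect(pre: Neuron, post: Neuron, permanence: float = 0.2) -> None:
    """Add an edge; sparsity is implicit -- absent edges cost nothing."""
    pre.synapses[post.nid] = Synapse(permanence)

def overlap(pre_layer: list, post: Neuron, threshold: float = 0.5) -> int:
    """Count active presynaptic neurons with a connected synapse onto `post`."""
    return sum(
        1
        for n in pre_layer
        if n.active
        and post.nid in n.synapses
        and n.synapses[post.nid].permanence >= threshold
    )

# Tiny example: two active inputs, only one connected strongly enough to count.
a, b, c = Neuron(0, active=True), Neuron(1, active=True), Neuron(2)
connect(a, c, 0.6)
connect(b, c, 0.3)  # below threshold: an edge exists, but it is not "connected"
print(overlap([a, b], c))  # -> 1
```

Note how sparsity is structural here: a missing synapse is simply a missing edge, with no storage cost, which is the property the graph-model argument rests on.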
Graph databases and frameworks such as Apache TinkerPop and Neo4j have matured significantly and been deployed at very large scale over the last decade-plus; using these frameworks would make experimenting with such an idea a bit easier. I am tempted to explore in this direction, but I am wondering whether this is a dead end based on prior experience in this community.
I played a bit with PyTorch and implemented some parts of HTM in it. It was not a full HTM clone; I just wanted to understand some parts of the theory and learn PyTorch. It felt very natural to write the code using tensors: only a few sum, threshold, and top-k operations are needed. This allowed me to write most of the code in roughly 20 lines. In that regard it is very efficient to write and understand, and it can be easily modified and tweaked. And thanks to tensors, it scales easily to many instances running in parallel.
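As a rough illustration of that tensor style (my own sketch, not the poster's code), a spatial-pooler-like step really is just a sum over connected active inputs, a threshold, and a top-k. NumPy is used here for portability; `torch.topk` and boolean tensor ops are the direct PyTorch equivalents. All sizes and thresholds are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_cols, k = 1024, 2048, 40          # k/n_cols ~ 2% of columns stay active
W = rng.random((n_cols, n_in)) < 0.05     # boolean connectivity matrix
x = np.zeros(n_in, dtype=bool)
x[rng.choice(n_in, 20, replace=False)] = True  # sparse binary input SDR

overlap = (W & x).sum(axis=1)             # sum: overlap score per column
overlap[overlap < 2] = 0                  # threshold: ignore weak overlaps
active = np.argsort(overlap)[-k:]         # top-k winners (torch.topk in PyTorch)

print(active.shape)  # -> (40,)
```

This matches the poster's point: the whole forward step fits in a handful of lines, and batching an extra leading dimension onto `x` runs many instances in parallel for free.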
However, there is a big drawback: memory and compute performance! It is hard to give concrete benchmarks because they depend heavily on the SDR parameters such as sparsity and vector width. With a sparsity of 2%, however, memory usage can be improved by roughly a factor of 50 by implementing it directly in C or another systems language. For speed the picture is roughly the same, but it also depends heavily on the optimizations applied (which might affect the algorithm).
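A back-of-envelope calculation (my numbers, not the poster's benchmark) shows where a factor of that order can come from: at 2% sparsity, storing only the indices of active bits is far cheaper than a dense float tensor. The vector width below is an illustrative choice:

```python
width = 2048                    # SDR vector width (illustrative)
sparsity = 0.02
active = int(width * sparsity)  # ~40 active bits

dense_f32 = width * 4           # dense float32 tensor: 8192 bytes
sparse_idx = active * 2         # one uint16 index per active bit: 80 bytes

print(dense_f32 // sparse_idx)  # -> 102
```

That is roughly 100x for raw storage; accounting for bookkeeping and alignment in a real sparse implementation, a net factor around 50 is plausible.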
In my experiments, RAM was the bottleneck for PyTorch. Even on my laptop I frequently ran out of RAM, so there was no way to even think about moving the algorithm to the GPU, where memory is even more constrained. For that reason I would not recommend it for production use. Sparsity is a nice and important concept, and when exploited it allows a massive speedup! PyTorch does not offer anything that allows an HTM implementation comparable to a bare-metal one.
A side note: my HTM-like implementation ran as the forward pass through a network. Learning was done in the backward pass, but instead of gradients it used the values obtained from the forward pass. My initial idea was to combine HTM and neural networks into a system that is differentiable, but that turned out to be impossible. HTM is mainly based on binary values, whereas neural networks need real-valued variables… and even when the HTM math is extended to real values, the system does not really learn any better… so this seems pointless.
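One way to read "the backward pass used forward values instead of gradients" is a Hebbian-style permanence update driven purely by co-activity. The sketch below is my reconstruction under that assumption, not the poster's actual code; the function name and the `inc`/`dec` rates are made up for illustration:

```python
import numpy as np

def hebbian_update(perm, pre_active, post_active, inc=0.05, dec=0.01):
    """Update permanences from forward-pass activity, not gradients.

    perm: (n_post, n_pre) permanence matrix
    pre_active, post_active: binary activity vectors from the forward pass
    """
    # outer product selects synapses between co-active cells: reinforce them
    coactive = np.outer(post_active, pre_active).astype(bool)
    perm = np.where(coactive, perm + inc, perm)
    # depress synapses onto active cells whose presynaptic cell was silent
    silent_pre = np.outer(post_active, 1 - pre_active).astype(bool)
    perm = np.where(silent_pre, perm - dec, perm)
    return np.clip(perm, 0.0, 1.0)

perm = np.full((2, 3), 0.5)
pre = np.array([1, 0, 1])
post = np.array([1, 0])
print(hebbian_update(perm, pre, post))
```

Nothing here is differentiable in the usual sense, which illustrates the mismatch the poster ran into: the binary activity vectors have no useful gradient to propagate.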
Thanks for writing a detailed response and sharing your implementation experiences. My own guess was that PyTorch and TensorFlow are very useful for “standard” neural-net implementations aimed at minimizing error over a training dataset. Your response adds to my intuition that HTM, and especially the newer concepts in TBT (the Thousand Brains Theory), cannot easily fit into those frameworks.
Ideally, a TBT implementation would be crafted from scratch using very simple primitives supported by modern graphs, layered with a type system for nodes, edges, and subgraphs that allows arbitrary linkages, including hierarchies. This will not be easy to implement in, say, 20 lines of code, and one is better off starting from an existing codebase. It may also be slow at runtime, at least in the beginning, but the graph model itself can be ‘bootstrapped’ if properly architected using reflexive/recursive elements. I am not sure about memory efficiency; that is worth exploring — I will update here, though it will take some time. I like playing around with reflexive and recursive graphs, and this seems like a nice puzzle.
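As a starting point for the typed nodes/edges/subgraphs idea, here is a sketch of a property-graph core where a type tag on each node and edge supports arbitrary linkage, and hierarchy falls out reflexively by letting a node of type "subgraph" stand for a whole template. Everything here (class names, type strings) is hypothetical, purely my own illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    ntype: str                   # e.g. "neuron", "column", "subgraph"
    props: dict = field(default_factory=dict)

@dataclass
class Edge:
    etype: str                   # e.g. "synapse", "child-of"
    src: int
    dst: int
    props: dict = field(default_factory=dict)

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # id -> Node
    edges: list = field(default_factory=list)
    _next: int = 0

    def add_node(self, ntype: str, **props) -> int:
        nid, self._next = self._next, self._next + 1
        self.nodes[nid] = Node(ntype, props)
        return nid

    def link(self, etype: str, src: int, dst: int, **props) -> None:
        self.edges.append(Edge(etype, src, dst, props))

# Hierarchy via reflexivity: a "subgraph" node stands for a template,
# so a region contains columns, and columns contain neurons.
g = Graph()
region = g.add_node("subgraph", name="region-0")
col = g.add_node("column")
cell = g.add_node("neuron")
g.link("child-of", col, region)
g.link("child-of", cell, col)
print(len(g.nodes), len(g.edges))  # -> 3 2
```

Since both nodes and edges are typed and carry open-ended properties, the same core could express synapses, reference frames, or "template subgraph" instantiation without changing the data model — which is the bootstrapping property mentioned above.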