I had a thought recently about a possible strategy for adding concurrency to the classic HTM algorithms for use in multi-core processing. It could also be worked into a networked solution for horizontal scaling without breaking the bank (large-scale HTM on a Raspberry Pi cluster like this, for example). Thought I would see if anyone has gone down this path, and maybe identify some potential gotchas.
The basic idea would be to have a cluster of shards each process a subset of the cells or minicolumns in parallel, and assemble their outputs into a complete SDR. This would be done by leveraging a module which I’ll call the Sparse Representation Controller (SRC), which takes chunks of a representation and reassembles them:
An SRC would act as a message bus with receivers and transmitters. Shards that need to know complete activation SDRs would register as receivers, and related shards would register as transmitters to report their output. Once an SRC has received the outputs from all transmitting shards, it would construct an activation SDR and transmit it to all registered receivers. Because only the resulting activation SDR is transmitted, traffic within the cluster stays small, and most of the processing within a shard can happen in parallel, with a relatively small window required for synchronization.
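To make this concrete, here is a minimal single-process sketch of what an SRC could look like. The class and method names (`SparseRepresentationController`, `register_transmitter`, `register_receiver`, `submit`) are just my own illustration; in a real cluster the `submit` call and the final broadcast would be network messages rather than direct function calls:

```python
class SparseRepresentationController:
    """Collects partial SDRs from transmitter shards and, once every shard
    has reported, broadcasts the assembled SDR to all registered receivers."""

    def __init__(self):
        self.transmitters = set()   # ids of shards expected to report
        self.receivers = []         # callbacks invoked with the complete SDR
        self.pending = {}           # shard id -> active indices it reported

    def register_transmitter(self, shard_id):
        self.transmitters.add(shard_id)

    def register_receiver(self, callback):
        self.receivers.append(callback)

    def submit(self, shard_id, active_indices):
        """A transmitter shard reports the active indices it computed."""
        self.pending[shard_id] = active_indices
        if set(self.pending) == self.transmitters:
            # All shards have reported: assemble the complete activation SDR
            # (a sorted list of active indices) and fan it out.
            sdr = sorted(i for chunk in self.pending.values() for i in chunk)
            self.pending.clear()
            for receive in self.receivers:
                receive(sdr)
```

The synchronization window mentioned above is just the gap between the first and last `submit` for a given timestep; everything before that runs independently on each shard.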
A typical setup for sequence memory would look something like this (where each box could execute in parallel, and clusters could be broken into any number of shards):
The encoder would transmit its output to an SRC which is responsible for the SDR representing active input cells. The spatial pooling process would be sharded (in this example each shard is responsible for half the minicolumns). They score just the minicolumns they are responsible for, then report their winners to a second SRC responsible for assembling the “active minicolumns” SDR.
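As a rough sketch, an SP shard might look something like the following, assuming each shard holds the permanences for just its slice of minicolumns and runs k-winner inhibition locally over that slice (which approximates global inhibition only when the shards are similar in size); all of the parameters and names here are illustrative:

```python
import numpy as np

class SpatialPoolerShard:
    """Scores and inhibits only the minicolumns in [col_start, col_end)."""

    def __init__(self, col_start, col_end, input_size,
                 sparsity=0.02, conn_threshold=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.col_start = col_start
        n_cols = col_end - col_start
        # One row of proximal synapse permanences per owned minicolumn.
        self.permanences = rng.random((n_cols, input_size))
        self.connected = self.permanences >= conn_threshold
        # Number of winners this shard contributes to the full SDR.
        self.k = max(1, int(sparsity * n_cols))

    def compute(self, active_input):
        """Score this shard's minicolumns against the 'active input cells'
        SDR and return the winners as global minicolumn indices."""
        overlaps = self.connected[:, active_input].sum(axis=1)
        local_winners = np.argsort(overlaps)[-self.k:]
        return [self.col_start + int(c) for c in local_winners]
```

Each shard would call `compute` on the same input SDR received from the first SRC, then `submit` its winner list to the second SRC, which merges the pieces into the complete “active minicolumns” SDR.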
The TM process would also be sharded (in this example, each shard responsible for 25% of the minicolumns). The TM shards would be responsible for all of their minicolumns’ cells and dendrites. They would receive the “active minicolumns” SDR from the SRC above, and perform the TM algorithm on only the minicolumns they are responsible for. They would report the activated cells to a third SRC responsible for assembling the “active layer cells” SDR. This would then be fed back to the TM shards for use in the TM algorithm (i.e. as the previously active cells).
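Here is a similarly simplified sketch of one TM shard's activation step (segment matching, the predict phase, and learning are all elided; `CELLS_PER_COLUMN` and the method names are my own):

```python
CELLS_PER_COLUMN = 32  # illustrative; classic TM implementations often use 32

class TemporalMemoryShard:
    """Runs cell activation for only the minicolumns in [col_start, col_end)."""

    def __init__(self, col_start, col_end):
        self.col_start = col_start
        self.col_end = col_end
        self.predictive_cells = set()  # filled by the (elided) predict phase

    def activate(self, active_columns, prev_active_cells):
        """active_columns:    the 'active minicolumns' SDR from the SP's SRC.
        prev_active_cells: the previous 'active layer cells' SDR, fed back
        from the third SRC (used for segment learning, elided here)."""
        active_cells = []
        for col in active_columns:
            if not (self.col_start <= col < self.col_end):
                continue  # another shard owns this minicolumn
            cells = range(col * CELLS_PER_COLUMN, (col + 1) * CELLS_PER_COLUMN)
            predicted = [c for c in cells if c in self.predictive_cells]
            # Correctly predicted cells activate; otherwise the column bursts.
            active_cells.extend(predicted if predicted else cells)
        return active_cells  # submitted to the 'active layer cells' SRC
```

One thing worth noting: distal segments can synapse onto cells owned by other shards, which is exactly why the complete “active layer cells” SDR has to be fed back to every shard rather than each shard keeping only its own slice.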