Strategy for Concurrent HTM Implementation

Well my original idea was that learning would be localized within each shard. This does imply that the output will always be evenly distributed across the shards, though, which has implications for topology. I haven’t personally had a need for topology, but you are right that solving that problem would require feedback from the SRC in order to choose winning columns that may be weighted more heavily in one shard compared to the others.

So you want local inhibition instead of global. That’s a concern that’s
separate from “topology”. I think there is a cost to local inhibition
but it may work well enough.

Topology aside, I don’t think functionally there would be a difference. The initial potential connections established in SP are random. If you are wanting 2% sparsity, for example, does it make a difference that the minicolumns chosen are distributed more evenly across the layer? Is there a capacity concern, or am I missing something important in my understanding of the SP process?

Well with your “evenly distributed” approach you’re limiting yourself to
a subset of all possible SDRs. I think there will be a quality penalty
but that still has to be quantified. I’m not saying the impact is
significant, especially if the number of shards is low.
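To put a rough number on the "subset of all possible SDRs" point, here is a small Go sketch. It assumes a hypothetical layer of 2048 minicolumns with 40 winners (about 2%), and it assumes that shard-local learning forces every shard to contribute exactly the same number of winners. The function names are made up for illustration. Under those assumptions the unrestricted count is C(n, k), while the sharded count is C(n/s, k/s) raised to the power s.

```go
package main

import (
	"fmt"
	"math/big"
)

// sdrCount returns the number of distinct SDRs with exactly k active
// minicolumns out of n when winners may fall anywhere in the layer.
func sdrCount(n, k int64) *big.Int {
	return new(big.Int).Binomial(n, k)
}

// shardedSDRCount returns the number of distinct SDRs when the layer is
// split into equal shards and each shard must contribute exactly k/shards
// winners. Assumes n and k both divide evenly by shards.
func shardedSDRCount(n, k, shards int64) *big.Int {
	perShard := new(big.Int).Binomial(n/shards, k/shards)
	return new(big.Int).Exp(perShard, big.NewInt(shards), nil)
}

func main() {
	const n, k = 2048, 40 // hypothetical layer size and winner count
	global := sdrCount(n, k)
	// BitLen()-1 is floor(log2(x)), used here as a rough order of magnitude.
	fmt.Printf("unrestricted: about 2^%d SDRs\n", global.BitLen()-1)
	for _, s := range []int64{2, 4, 8} {
		local := shardedSDRCount(n, k, s)
		fmt.Printf("%d shards: about 2^%d SDRs\n", s, local.BitLen()-1)
	}
}
```

This only counts representable patterns. It says nothing about how much the loss matters for learning quality, which would still have to be measured.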

Ah, yes of course. I had a feeling this would impact capacity, and it looks like it does.

I’m thinking that fixing both this problem and topology would involve a specialized “SRC” for the SP process. The shards would do scoring and report their scores to the SRC, which would select the winners and report them back to the shards to perform learning. I’ll need to think of a better name for this module so it isn’t confused with the other SRCs…

Expanding on this solution a bit more, the amount of traffic transmitted within a spatial pooling cluster could be reduced by having the SP shards only report their top (sparsity * shard count) scores to this Spatial Pooling Controller (SPC). The SPC would join those and clip to just the top (sparsity) minicolumns. It would then report back only those winners which are relevant to each shard for learning.
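A minimal Go sketch of that SPC step, to make the flow concrete. The `Score` message format, the function names, and the tie-breaking rule are all assumptions for illustration, not part of any existing HTM code. Each shard trims its report to its own top k, where k is the global winner count (which is the same as the top sparsity * shard count fraction of one shard's columns). The SPC then merges the reports, clips to the global top k, and groups the winners by shard.

```go
package main

import (
	"fmt"
	"sort"
)

// Score is one minicolumn's overlap score as reported by a shard.
// Column is a global minicolumn index (hypothetical message format).
type Score struct {
	Column  int
	Shard   int
	Overlap int
}

// topK returns the k highest-overlap scores without modifying the input.
// Ties are broken by lower column index (an arbitrary choice for this sketch).
func topK(scores []Score, k int) []Score {
	sorted := append([]Score(nil), scores...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Overlap != sorted[j].Overlap {
			return sorted[i].Overlap > sorted[j].Overlap
		}
		return sorted[i].Column < sorted[j].Column
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

// selectWinners plays the SPC role: it merges the shard reports, clips to
// the global top k, and returns the winning columns grouped by shard so
// each shard only hears about the winners it must learn on.
func selectWinners(reports [][]Score, k int) map[int][]int {
	var merged []Score
	for _, report := range reports {
		// In a real cluster the shard would do this trimming before sending.
		merged = append(merged, topK(report, k)...)
	}
	winners := make(map[int][]int)
	for _, s := range topK(merged, k) {
		winners[s.Shard] = append(winners[s.Shard], s.Column)
	}
	return winners
}

func main() {
	reports := [][]Score{
		{{Column: 0, Shard: 0, Overlap: 9}, {Column: 1, Shard: 0, Overlap: 2}, {Column: 2, Shard: 0, Overlap: 7}},
		{{Column: 3, Shard: 1, Overlap: 8}, {Column: 4, Shard: 1, Overlap: 1}, {Column: 5, Shard: 1, Overlap: 3}},
	}
	fmt.Println(selectWinners(reports, 2)) // prints map[0:[0] 1:[3]]
}
```

Because every shard sends at most k scores, the SPC sees at most k * shard count entries per step, regardless of how large the shards are.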

Btw regarding your pic. There is a convention here that data moves
“upwards” for feed forward and “downwards” for feedback. It’s like having
north up on a map. Your pic is the wrong way round.



Best use of emoji in a sentence :trophy: I laughed out loud. blue2 is right about the orientation though. I don’t see a reason your strategy couldn’t be made to work but the timing overhead might have to be quite disciplined to keep it from eating the advantages gained.

Yes, that has been my concern with adding concurrency to HTM for some time. I have other strategies, but they tend to deviate from the classic algorithms. I think what makes this one different is that only the activation SDRs are transmitted, as a compressed array of indexes to the 1 bits (granted, the SP cluster must transmit more information due to the problem @blue2 described). Limited information transfer means the window for synchronization between processes can be kept very small. Timing is essentially taken care of by the SRC, which transmits the completed output only once it has been fully assembled.
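For illustration, one possible wire format for such an index array is sketched below in Go. The scheme (sorted indexes, stored as gaps, each gap written as a variable-length integer) is an assumption chosen for the example, not something the thread settled on. `encodeSDR` and `decodeSDR` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeSDR packs the indexes of the 1 bits into bytes. The indexes must be
// sorted ascending; each one is stored as the gap from the previous index,
// written as an unsigned varint, so small gaps take a single byte.
func encodeSDR(active []int) []byte {
	buf := make([]byte, 0, len(active)*2)
	tmp := make([]byte, binary.MaxVarintLen64)
	prev := 0
	for _, idx := range active {
		n := binary.PutUvarint(tmp, uint64(idx-prev))
		buf = append(buf, tmp[:n]...)
		prev = idx
	}
	return buf
}

// decodeSDR reverses encodeSDR and returns the sorted active indexes.
func decodeSDR(buf []byte) ([]int, error) {
	var active []int
	prev := 0
	for len(buf) > 0 {
		gap, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, errors.New("malformed SDR payload")
		}
		prev += int(gap)
		active = append(active, prev)
		buf = buf[n:]
	}
	return active, nil
}

func main() {
	active := []int{3, 17, 300, 2047} // hypothetical active minicolumns
	payload := encodeSDR(active)
	decoded, err := decodeSDR(payload)
	fmt.Println(len(payload), decoded, err)
}
```

The payload size depends only on the number of active bits and the gaps between them, which is what keeps the synchronization window small for sparse activations.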

I’ve started writing out a proof of concept in Golang to test the idea out. Will report my findings, however it turns out.

Here is an updated diagram capturing the changes discussed so far.



I’ve been going deep into computer vision over the past couple weeks, using the OpenCV library. I don’t know what others here have experience with, but it strikes me that we should be able to store layers as either binary (black/white, if you will) or greyscale images that could easily be passed back and forth, with optimized bitwise operations already existing for photo manipulation. Layers can then be passed as lossless PNG or compressed JPG.

Also, I haven’t done much with it before, but I do see that TensorFlow (which has good distributed graph support) does have bitwise operations as well.

Has anybody already tried this?

( I first got into openCV using this tutorial series, if anyone is interested: )

Excellent idea. In this case, it could be a way to further compress the traffic within the spatial pooling cluster. It would also be interesting to see if a sparse (say 2%) array encoded as a PNG is more compressed than a dense array of indexes to the 1 bits. My suspicion is no, but I’ll have to check into this…
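One way to check this is a quick size comparison using Go's standard `image/png` encoder. This is only a sketch: the 64x32 layer shape, the 2 bytes per index, the fixed random seed, and the helper names are all assumptions, and other PNG encoders or compression settings may give different numbers.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/png"
	"math/rand"
)

// pngSize draws the active bits as white pixels on a black two-color
// paletted image and returns the size of the PNG encoding in bytes.
func pngSize(active []int, width, height int) (int, error) {
	palette := color.Palette{color.Black, color.White}
	img := image.NewPaletted(image.Rect(0, 0, width, height), palette)
	for _, idx := range active {
		img.SetColorIndex(idx%width, idx/width, 1)
	}
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// indexSize is the size of a plain index array, assuming 2 bytes per
// index (enough for layers of up to 65536 minicolumns).
func indexSize(active []int) int {
	return 2 * len(active)
}

func main() {
	const width, height, k = 64, 32, 40 // hypothetical 2048-bit layer, ~2% active
	// Fixed seed so repeated runs compare the same pattern.
	active := rand.New(rand.NewSource(1)).Perm(width * height)[:k]
	n, err := pngSize(active, width, height)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Printf("PNG: %d bytes, index array: %d bytes\n", n, indexSize(active))
}
```

Note that a PNG file carries a fixed signature plus header, palette, and end chunks on top of the compressed pixel data, so small layers start at a disadvantage.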

Such partitioning looks like a perfect fit for the Actor Model. Each independent activity (SDR/pooler) could be modeled as an actor, even within one machine. The communication overhead on one machine would be negligible. Scaling out to other machines would be natural as well as messaging is transparent in the actor model.

The concurrency would not need complex synchronization given a good actor model implementation. The scheduler would automatically utilize available cores. (E.g. see Pony and Erlang scheduler details).
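As a rough illustration of the shape this could take, here is a Go sketch that approximates actors with goroutines and channels. Go is not an actor runtime like Pony or Erlang, so treat this as an analogy: each shard goroutine owns its state exclusively and handles one mailbox message at a time. All type and function names, and the tiny connection matrices, are hypothetical.

```go
package main

import "fmt"

// scoreRequest is the message a shard receives: the active input bits
// plus a channel on which to send the reply.
type scoreRequest struct {
	input []int
	reply chan<- shardReport
}

// shardReport carries one shard's overlap score for each of its columns.
type shardReport struct {
	shard    int
	overlaps []int
}

// runShard is the actor loop. The connected matrix (column x input bit) is
// touched only by this goroutine, so no locks are needed.
func runShard(id int, connected [][]bool, mailbox <-chan scoreRequest) {
	for req := range mailbox {
		overlaps := make([]int, len(connected))
		for col, row := range connected {
			for _, bit := range req.input {
				if row[bit] {
					overlaps[col]++
				}
			}
		}
		req.reply <- shardReport{shard: id, overlaps: overlaps}
	}
}

// startShards launches one goroutine per shard and returns their mailboxes.
func startShards(shards [][][]bool) []chan scoreRequest {
	mailboxes := make([]chan scoreRequest, len(shards))
	for id, connected := range shards {
		mailboxes[id] = make(chan scoreRequest)
		go runShard(id, connected, mailboxes[id])
	}
	return mailboxes
}

// scoreAll sends the input to every shard and gathers the replies,
// indexed by shard id.
func scoreAll(mailboxes []chan scoreRequest, input []int) [][]int {
	reply := make(chan shardReport, len(mailboxes))
	for _, mailbox := range mailboxes {
		mailbox <- scoreRequest{input: input, reply: reply}
	}
	results := make([][]int, len(mailboxes))
	for range mailboxes {
		r := <-reply
		results[r.shard] = r.overlaps
	}
	return results
}

func main() {
	shards := [][][]bool{
		{{true, true, false, false}, {true, false, true, false}},
		{{false, false, false, true}, {false, true, true, true}},
	}
	mailboxes := startShards(shards)
	fmt.Println(scoreAll(mailboxes, []int{0, 2})) // prints [[1 2] [0 1]]
	for _, mailbox := range mailboxes {
		close(mailbox) // lets each shard goroutine exit
	}
}
```

Swapping the in-process channels for a network transport would change `scoreAll` and the mailbox plumbing but not the shard loop itself.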

To communicate between the nodes, low latency zeromq could be used, serializing with msgpack, thus allowing for heterogeneous implementations.

Sorry for jumping in out of the blue. Will try to catch up with the discussion.


I’ve set out to try this in Pony. The current strategy is to get the synchronous basics off the Go implementation, and then see how actors could fit in.
If anyone is willing to learn Pony via Go + HTM, contributors are welcome.


I’ve read through some of the Pony tutorial, and it sounds like a great match for the idea. What have you completed so far on the htm.pony project, and what more needs to be done? I looked for a board on GitHub but I didn’t see any features outlined or completed, so I don’t know what has been done and what still needs doing.

Ditto that. I’d be happy to contribute if somebody writes up an outline with some milestones.

@MaxLee @jordan.kay thanks for the feedback! I’ve started very slowly, translating the Go version, as the readme says. Now that there’s some interest, I’ll write a bit. So far I’ve only managed to port the tests & implementation for the dense and sparse matrices, and was going to continue with encoders. Have a look there, everything is up for grabs. PRs are welcome. Let’s continue on GitHub.

@MaxLee @jordan.kay I’ve added a couple of issues to grab, a rough plan, and how to contribute. Feel free to join

Hi. The open source Apache Spark project is all about managing distributed tasks across a cluster and collecting the results. You may get some inspiration from their code, and maybe see how the core task scheduling works. It is written in Scala.