Such partitioning looks like a perfect fit for the actor model. Each independent activity (SDR/pooler) could be modeled as an actor, even within one machine. The communication overhead on one machine should be negligible. Scaling out to other machines would be natural as well, since messaging is location-transparent in the actor model.
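Given a good actor model implementation, the concurrency would not need complex synchronization, because each actor handles its messages one at a time and shares no state with the others. The scheduler would automatically utilize the available cores (see, e.g., the Pony and Erlang scheduler details).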
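To make the idea a bit more concrete, here is a rough sketch of the mailbox discipline in plain Python. Everything in it is made up for illustration: the `Actor` class, `toy_encode`, `toy_pool` and `run_demo` are hypothetical names, and the two handlers are placeholders, not real SDR encoding or pooling. It also uses one OS thread per actor, so under CPython it will not spread work across cores the way the Pony or Erlang schedulers do. It only shows that no locks are needed when each actor owns its state and talks to others through messages.

```python
import queue
import threading

_STOP = object()  # sentinel message that tells an actor to shut down


class Actor:
    """Hypothetical minimal actor: private state, a mailbox, one worker thread."""

    def __init__(self, handler, downstream=None):
        self._handler = handler        # function applied to every message
        self._downstream = downstream  # optional actor that receives the results
        self.results = []              # private state, written only by this actor's thread
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Ask the actor to finish the messages already queued, then wait for it."""
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        # Messages are handled strictly one at a time, in arrival order,
        # so the handler never needs a lock around the actor's state.
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            result = self._handler(message)
            self.results.append(result)
            if self._downstream is not None:
                self._downstream.send(result)


def toy_encode(value):
    """Placeholder encoder: maps a number to a sorted list of 'active bits'."""
    return sorted({value % 8, (value + 3) % 8})


def toy_pool(active_bits):
    """Placeholder pooler: just counts the active bits it was sent."""
    return len(active_bits)


def run_demo():
    pooler = Actor(toy_pool)
    encoder = Actor(toy_encode, downstream=pooler)
    for value in (1, 2, 3):
        encoder.send(value)
    encoder.stop()  # encoder drains its mailbox and forwards everything first
    pooler.stop()   # then the pooler drains what it received
    return encoder.results, pooler.results


if __name__ == "__main__":
    print(run_demo())
```

The only shared objects are the mailboxes, and the sender never touches the receiver's state directly. Because of that, swapping the in-process queue for a network transport would not change the actor code itself, which is the location transparency mentioned above.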
To communicate between nodes, low-latency ZeroMQ could be used, with messages serialized as MessagePack (msgpack), which would allow heterogeneous implementations in different languages.
Sorry for jumping in out of the blue. Will try to catch up with the discussion.