Async v agency?

I noticed some work to introduce JSON data packets instead of using URLs for passing things around.

While async helps with concurrency, I am curious whether there is any appeal in using agency to allow distributed HTM in a potentially more natural way?

Something like thespian?

I ask this question again because the previous thread went straight into Erlang and its BEAM, Elixir, and FPGA implementations, all of which are sound alternative options. The more useful discussion, though distracted by low-level implementation details of the non-Python options, was around how to partition the data and processing to take advantage of distributed agency while managing the constraints of the HTM architecture.

So, how about a brainstorm around Python, HTM, and thespian? We can leave Elixir, Erlang, and FPGA to the anarchists :grin:

Note also that there may be both shallow and deep integrations of agency into the HTM model, since the current architecture has not leveraged agency in the first instance. A re-look at the deeper option might be apt after an initial shallow pass. At the moment there might be higher-level abstractions that could benefit?

As someone who lives in the Python world and has tried JSON message passing and asynchronous execution for HTM applications, I will make the following observations:

On JSON

JSON is great for passing configurations and small data sets. However, for high-volume data and for extracting HTM states such as permanences, activations, and dendrites, JSON really starts to bog down. It is slowed both by the encoding and decoding process and by the large I/O transfers between processes that its text-only representation requires.

JSON is great in that it is human-readable, but to fix the above limitations, you would need to use something more binary like BSON, flatbuffers, or capnproto.
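To make the size difference concrete, here is a minimal sketch comparing a JSON text encoding of an SDR's active-column indices against a raw binary packing. The 2,048-column size and ~2% sparsity are illustrative assumptions; flatbuffers or capnproto would add schemas and zero-copy reads on top of the same basic idea:

```python
import json
import random
import struct

# Illustrative assumption: 2048 columns at ~2% sparsity.
active = sorted(random.sample(range(2048), 40))

# Text encoding: every index becomes ASCII digits plus delimiters.
as_json = json.dumps({"t": 42, "active": active}).encode("utf-8")

# Binary encoding: 4-byte timestep, 2-byte count, one uint16 per index.
as_bin = struct.pack(f"<IH{len(active)}H", 42, len(active), *active)

print(len(as_json), len(as_bin))  # roughly ~250 bytes vs exactly 86 bytes
```

The gap widens further once permanence floats are involved, since a binary format can ship them at 4 bytes each while JSON spells out every digit.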

On Asynchronicity

I/O for message passing is the bottleneck for asynchronous methods. Any approach is going to require a lean data format, as well as a sparse representation.

An asynchronous approach is also going to have to worry about synchronization of computation. For a strictly ordered set of layers or a hierarchical architecture, synchronization is straightforward since data and execution cascade neatly. However, for networks with recurrent connections from distal dendrites, effort is going to be required to do the bookkeeping of time and to make sure no process gets ahead of the others.
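As a sketch of what that bookkeeping might look like (the region names and message fields here are hypothetical, not from any existing HTM codebase), each message can carry a timestep, and a region with recurrent distal input buffers messages until it has everything it needs for the step it is about to compute:

```python
from collections import defaultdict

class RegionClock:
    """Sketch: buffer timestep-tagged inputs until a step is complete.

    `expected` names the source regions (hypothetical labels) whose
    output this region needs before it may compute its own step.
    """

    def __init__(self, expected):
        self.expected = set(expected)
        self.step = 0
        self.inbox = defaultdict(dict)  # step -> {source: payload}

    def deliver(self, source, step, payload):
        """Buffer one message; return the full input set once the
        current step has heard from every expected source, else None."""
        self.inbox[step][source] = payload
        if set(self.inbox[self.step]) == self.expected:
            inputs = self.inbox.pop(self.step)
            self.step += 1
            return inputs  # caller computes, then emits its own step output
        return None

clock = RegionClock(expected=["L4", "L2/3-distal"])
clock.deliver("L4", 0, [3, 17, 99])                # None: step 0 incomplete
inputs = clock.deliver("L2/3-distal", 0, [5, 42])  # step 0 complete
```

A message that arrives early for a future step simply waits in the buffer, so no process can get ahead of the slowest of its inputs.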

I have used the Python built-in asyncio to do some asynchronous programming, which covers some of the same ground as your referenced “thespian” library. Asyncio is pretty new and still in flux, and it is tricky to get the hang of since it uses completely new Python syntax (async/await). It also only works within a single Python process and is not something you would use in a multiprocessor or multi-host environment, though it does have hooks for extending in that direction.
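For reference, the new syntax looks like this: a minimal single-process sketch in which two coroutines pass timestep-tagged SDRs through a queue (the coroutine names and fake SDRs are made up for illustration):

```python
import asyncio

async def encoder(queue):
    # Produce a few fake timestep-tagged SDRs (active-index lists).
    for t in range(3):
        await queue.put((t, [t, t + 10, t + 100]))
    await queue.put(None)  # sentinel: no more input

async def spatial_pooler(queue):
    # Consume SDRs as they arrive; each `await` yields to the event loop.
    while (item := await queue.get()) is not None:
        t, sdr = item
        print(f"step {t}: pooling {sdr}")

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(encoder(queue), spatial_pooler(queue))

asyncio.run(main())
```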

Even though I’m a Python guy, I still think Erlang/Elixir is the way to go for this approach, because the BEAM is built from the ground up for this type of architecture and can scale to hundreds of thousands of lightweight processes, one for each HTM neuron.


I’m wondering whether it would be of value to embrace the parallel nature of the physical brain. If I understand the concern above correctly, what you describe is an attempt to avoid unfair scheduling of individual sub-systems running asynchronously. (Correct me if I’m missing the point.)

If one part of the prediction context sends its messages faster than the rest, the concern, even if the difference would be insignificant otherwise, is whether the system still represents the biological one truthfully. Correct?

If one considers a single point in time, this might be a problem; but if predictions are allowed to be updated over time (the way I experience my own brain, at least), this effect would eventually be compensated for, providing a kind of “eventual consistency”.

Fair scheduling (either pre-emptive, as in BEAM/Erlang/Elixir, or via a high degree of parallelism) might also mitigate the problem.

Perhaps, if one embraces concurrency, one wouldn’t need a lot of synchronization after all.

Ah, we fell off topic: the question is how the current architectural approach might meaningfully be carved up to work in a distributed fashion. The JSON story is probably a red herring.

The assumption of one process per neuron need not be realised in Erlang alone; it is closer to the topic of how to partition the problem. Whether that process is an actor in thespian or an actor in Erlang is valid or not depending on a whole range of tensions and design decisions.

So the question is then whether one neuron per distributed unit is the right starting position.

Is this just a problem of using CSP-like channels or actors and their native synchronisation? Is the bookkeeping-of-time problem raised above caused by not using a coordination-oriented approach?

So: Erlang actors, or Python actors using something like thespian. Have we now agreed that an actor-based mechanism might solve some aspect of this? Then we are back to how to carve up the current Python-based implementation to assess actor-based distribution. Perhaps the Erlang camp can come in after the principles of the architecture are established through the Python experiments?

So is that first experiment wrapping a neuron in an actor?
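A minimal sketch of that first experiment, using thespian’s default in-process system base, might look like the following. The dict message format and the toy overlap rule are illustrative assumptions, not real HTM neuron logic:

```python
from thespian.actors import Actor, ActorSystem

class NeuronActor(Actor):
    """Sketch: one HTM neuron wrapped in a thespian actor."""

    def receiveMessage(self, message, sender):
        # Lazy state init, since thespian constructs the actor itself.
        if not hasattr(self, "permanences"):
            self.permanences = {}  # presynaptic address -> permanence
            self.threshold = 3     # illustrative activation threshold

        if isinstance(message, dict) and message.get("kind") == "spikes":
            # Toy overlap rule: count connected synapses among active inputs.
            overlap = sum(1 for src in message["active"]
                          if self.permanences.get(src, 0.0) >= 0.5)
            self.send(sender, {"kind": "state",
                               "active": overlap >= self.threshold})

if __name__ == "__main__":
    asys = ActorSystem()  # defaults to the single-process simpleSystemBase
    neuron = asys.createActor(NeuronActor)
    reply = asys.ask(neuron, {"kind": "spikes", "active": ["a", "b"]}, 1)
    print(reply)  # {'kind': 'state', 'active': False} while permanences are empty
    asys.shutdown()
```

If I read the thespian docs right, swapping the system base for one of its multiprocess or TCP-based bases would distribute the same actors across cores or hosts without changing the actor code, which is exactly the experiment this thread is after.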