Ways to distribute TM cell training?

I’m probably exposing a lot of my ignorance with this post, but here goes!

So, I’m working to add parallel computing to my PyHTM implementation and I’m looking for advice. I examined the runtimes and by far the biggest time-suck (~80% of overall runtime) is calculating which cells in a TM become predictive after the activity for this iteration has been decided. That process involves looping through a list of Cell objects and doing array multiplications to determine each one’s overlap score with the current cell-activity array.
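To make the structure of that loop concrete, here is a minimal sketch of the kind of per-cell overlap calculation described above. All names (`cell_connections`, `activity`, the sizes) are made up for illustration; the real PyHTM code is organized around Cell objects, not a flat array. It also shows that if the per-cell state can be stacked into one 2-D array, the whole loop collapses to a single matrix-vector product, which is often faster than either looping or multiprocessing:

```python
import numpy as np

# Hypothetical sketch of the per-cell overlap loop described above: each
# cell holds a connection vector, and its overlap with the current
# activity array is an elementwise multiply summed up (a dot product).
rng = np.random.default_rng(0)
n_cells, n_inputs = 2000, 128
cell_connections = rng.random((n_cells, n_inputs))     # one row per cell
activity = (rng.random(n_inputs) > 0.9).astype(float)  # sparse activity

# Loop form (what the post describes): one dot product per Cell object.
loop_overlaps = np.array(
    [cell_connections[i] @ activity for i in range(n_cells)]
)

# Same computation as a single matrix-vector product; NumPy does the
# whole thing in optimized C, with no Python-level loop at all.
vector_overlaps = cell_connections @ activity
```

Whether this vectorization is applicable depends on whether the per-cell state can be kept in one array rather than in separate objects.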

I first tried to parallelize just this aspect of the library with Multiprocessing, using map(array_multiplication_function, cell_index_list). However, this resulted in a dramatic slowdown, and I’m not sure exactly why. On top of that, my current IDE, Spyder, seems to have issues with the multiprocessing library specifically.

I found out about Ray from a Google search, and after messing around for a bit I don’t think it’s the right way to go either: it lets me dedicate ‘remote’ workers to individual objects, but that means I end up with, say, 20,000 workers (one for each cell). I haven’t dug into Ray’s documentation much, so I may be able to consolidate the cells into just a few workers, but I thought I’d throw a hail-mary out here first. Has anyone worked on parallelizing HTM processes, in particular the training of cells? What worked best?

2 Likes

Hey Andrew!
Is my assumption correct that you are building a Python 3-based HTM implementation and are now having trouble making it more parallel?

Regarding a parallelization approach:

  1. First, get a better overview of how your algorithm works and where it might be possible to introduce parallelization.
    That means checking which specific function is slow and takes the most time.
  2. Check whether the function can do its work independently for a fairly long stretch, i.e. whether multiple cores can work on it without coordinating. If you have such a case, then you can start to parallelize it.
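For step 1, even a tiny timing wrapper is enough to confirm which function dominates before committing to a parallel design. This helper (`timed`) is hypothetical, shown here with a throwaway workload:

```python
import time

def timed(fn, *args):
    # Minimal timing helper (hypothetical): wrap one call and report its
    # cost, so you can see which function actually dominates before
    # deciding what to parallelize.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Throwaway workload standing in for one training step:
value, cost = timed(sum, range(1_000_000))
```

For anything more detailed, Python’s built-in `cProfile` module gives a per-function breakdown of a whole run.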

The slowdown you saw after making the program more parallel can have multiple explanations, but usually the reason is that the piece of work handed to each core is too short. Be aware that spawning processes and destroying them takes a lot of time.

If the independence criterion is not met, then you can introduce inter-process communication between the cores to exchange data. But architecture-wise that is usually really complex!

Could you provide a link to the code? Then I can have a look and might suggest a specific approach.

2 Likes

Hi adam, thanks for replying!!

That is correct!

I believe the best candidate is during the Cell object processing. There are thousands of cells and each one needs to perform the same calculation, comparing the same input against its unique state variable, once per iteration. This is also the single biggest time-eater.

Ahh, that makes sense. I’m not sure how the multiprocessing library does things under the hood, but what about starting a pool and not closing it until all the training iterations are completed? (As opposed to starting a new pool for each iteration.)

Here’s a link to a branch of my repo with just the relevant files. I have a test script that runs through the HotGym data. On line 113 it calls the tm.process_input() method and passes in a pool object. This gets used on line 1181 of the main library to call the tm.find_predictive_cells_multiprocessing() method, which (line 1143) calls pool.map() with tm.find_predictive_cells_single_column() and a list of indices.

I believe the independence criterion should be met; the manipulations in tm.find_predictive_cells_single_column() don’t require cross-communication. They just need copies of a) the TM object stored in the main process and b) the input argument. They don’t overwrite any values.

Again, thanks so much for offering to help!

1 Like

Hello Andrew,

I tried to run your code, but I ran into a Python 3 error message that seems pretty cryptic to me. Do you have any idea how to fix it?

The first time I ran your script, I got an error from Ray saying that ray.init() was missing, so I uncommented the line:
ray.init(ignore_reinit_error=True)

After that modification I ran into this long error message.

2020-07-31 14:18:05,644	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
2020-07-31 14:18:05,645	INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-07-31_23-18-05_645093_8863/logs.
2020-07-31 14:18:05,749	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:27007 to respond...
2020-07-31 14:18:05,860	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:49171 to respond...
2020-07-31 14:18:05,862	INFO services.py:806 -- Starting Redis shard with 1.63 GB max memory.
2020-07-31 14:18:05,877	INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-07-31_23-18-05_645093_8863/logs.
2020-07-31 14:18:05,878	INFO services.py:1442 -- Starting the Plasma object store with 2.44 GB memory using /dev/shm.
Processed SP input 100 out of 4390...
That took 1.8991518020629883 seconds.
2020-07-31 14:18:10,594	ERROR worker.py:1668 -- WARNING: 12 workers have been started. This could be a result of using a large number of actors, or it could be a consequence of using nested tasks (see https://github.com/ray-project/ray/issues/3644) for some a discussion of workarounds.
(pid=8889) 2020-07-31 14:18:11,204	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8889) 2020-07-31 14:18:11,204	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8888) 2020-07-31 14:18:11,329	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8888) 2020-07-31 14:18:11,344	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8890) 2020-07-31 14:18:11,385	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8890) 2020-07-31 14:18:11,385	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8887) 2020-07-31 14:18:11,522	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8887) 2020-07-31 14:18:11,523	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
2020-07-31 14:18:12,139	ERROR worker.py:1668 -- WARNING: 16 workers have been started. This could be a result of using a large number of actors, or it could be a consequence of using nested tasks (see https://github.com/ray-project/ray/issues/3644) for some a discussion of workarounds.
(pid=8937) 2020-07-31 14:18:12,391	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8937) 2020-07-31 14:18:12,394	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8940) 2020-07-31 14:18:12,847	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8940) 2020-07-31 14:18:12,852	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8939) 2020-07-31 14:18:12,904	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8939) 2020-07-31 14:18:12,904	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8938) 2020-07-31 14:18:13,042	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8938) 2020-07-31 14:18:13,046	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
2020-07-31 14:18:13,877	ERROR worker.py:1668 -- WARNING: 20 workers have been started. This could be a result of using a large number of actors, or it could be a consequence of using nested tasks (see https://github.com/ray-project/ray/issues/3644) for some a discussion of workarounds.
(pid=8964) 2020-07-31 14:18:14,071	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8964) 2020-07-31 14:18:14,071	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8961) 2020-07-31 14:18:14,291	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8961) 2020-07-31 14:18:14,292	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8958) 2020-07-31 14:18:14,357	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8958) 2020-07-31 14:18:14,364	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8967) 2020-07-31 14:18:14,804	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8967) 2020-07-31 14:18:14,805	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8986) 2020-07-31 14:18:15,554	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8986) 2020-07-31 14:18:15,554	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
2020-07-31 14:18:15,679	ERROR worker.py:1668 -- WARNING: 24 workers have been started. This could be a result of using a large number of actors, or it could be a consequence of using nested tasks (see https://github.com/ray-project/ray/issues/3644) for some a discussion of workarounds.
Traceback (most recent call last):
  File "HotGym Example Scripts__Multiprocessing.py", line 155, in <module>
    tm, at = HotGym_TM_Example(sp,enc,dates,power)
  File "HotGym Example Scripts__Multiprocessing.py", line 113, in HotGym_TM_Example
    act, pred = tm.process_input(enc.encode([dates[index],power[index]]))
  File "/home/adam/Documents/htm/PyHTM/PyHTM_Multiprocessing_Branch.py", line 1169, in process_input
    self.cells[i].update_perms.remote(self.last_active_cells,correct_prediction = True)
  File "/home/adam/.local/lib/python3.6/site-packages/ray/actor.py", line 149, in remote
    return self._remote(args, kwargs)
  File "/home/adam/.local/lib/python3.6/site-packages/ray/actor.py", line 170, in _remote
    return invocation(args, kwargs)
  File "/home/adam/.local/lib/python3.6/site-packages/ray/actor.py", line 164, in invocation
    num_return_vals=num_return_vals)
  File "/home/adam/.local/lib/python3.6/site-packages/ray/actor.py", line 554, in _actor_method_call
    driver_id=self._ray_actor_driver_id,
  File "/home/adam/.local/lib/python3.6/site-packages/ray/worker.py", line 636, in submit_task
    args_for_raylet.append(put(arg))
  File "/home/adam/.local/lib/python3.6/site-packages/ray/worker.py", line 2219, in put
    worker.put_object(object_id, value)
  File "/home/adam/.local/lib/python3.6/site-packages/ray/worker.py", line 383, in put_object
    self.store_and_register(object_id, value)
  File "/home/adam/.local/lib/python3.6/site-packages/ray/worker.py", line 317, in store_and_register
    self.task_driver_id))
  File "/home/adam/.local/lib/python3.6/site-packages/ray/worker.py", line 246, in get_serialization_context
    _initialize_serialization(driver_id)
  File "/home/adam/.local/lib/python3.6/site-packages/ray/worker.py", line 1141, in _initialize_serialization
    serialization_context = pyarrow.default_serialization_context()
  File "/home/adam/.local/lib/python3.6/site-packages/ray/pyarrow_files/pyarrow/serialization.py", line 350, in default_serialization_context
    register_default_serialization_handlers(context)
  File "/home/adam/.local/lib/python3.6/site-packages/ray/pyarrow_files/pyarrow/serialization.py", line 345, in register_default_serialization_handlers
    _register_custom_pandas_handlers(serialization_context)
  File "/home/adam/.local/lib/python3.6/site-packages/ray/pyarrow_files/pyarrow/serialization.py", line 148, in _register_custom_pandas_handlers
    import pandas as pd
  File "/usr/local/lib/python3.6/dist-packages/pandas/__init__.py", line 57, in <module>
    from pandas.io.api import *
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/api.py", line 19, in <module>
    from pandas.io.packers import read_msgpack, to_msgpack
  File "/usr/local/lib/python3.6/dist-packages/pandas/io/packers.py", line 69, in <module>
    from pandas.util._move import (
ValueError: module functions cannot set METH_CLASS or METH_STATIC
(pid=8992) 2020-07-31 14:18:16,144	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8992) 2020-07-31 14:18:16,144	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8989) 2020-07-31 14:18:16,282	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8989) 2020-07-31 14:18:16,285	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
(pid=8995) 2020-07-31 14:18:16,397	WARNING worker.py:1331 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
(pid=8995) 2020-07-31 14:18:16,397	ERROR worker.py:1337 -- Calling ray.init() again after it has already been called.
1 Like

That is the mother of all lengthy errors! Unfortunately I’m not sufficiently familiar with Ray to diagnose it, so all I can suggest is that you try running ‘pip install setproctitle’ as the error message suggests.

Hey, I just installed the ‘setproctitle’ package, but I am still getting the same error message.

Another possibility is that your system can’t create as many worker processes as the code asked for. How many cores does your computer have?
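You can check this from Python itself; `os.cpu_count()` reports the number of logical cores the OS exposes, which is a reasonable upper bound on useful pool size:

```python
import os

# Number of logical cores visible to the OS; a sensible upper bound on
# how many pool workers can run simultaneously.
print(os.cpu_count())
```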