Can you briefly outline or list what methods / tasks you are applying concurrency to (for the TemporalMemory, since you haven’t applied it yet to the SpatialPooler)? I did some thinking about this a long time ago, and I would be interested to see how you organized the parallelism.
The SP is in fact parallelized too, as a side effect of parallelizing the TM. But the SP is so fast by itself that I didn’t put it on the forum.
The design of tiny-htm is that there is a central class, Cells, which stores all the critical values and connections and handles most of the learning logic, while layer state (input, output, predictive/active cells) is passed in as parameters. Layers like TM and SP simply wrap around this class; calling the methods of Cells in a different order leads to different layer behaviour.
As I described in the first post of this thread, the TM is expressed in a few reusable functions, and so is the SP. So by parallelizing and optimizing these shared functions, both algorithms are accelerated.
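To make the structure concrete, here is a minimal sketch of that design, assuming the method names listed below and guessing at the signatures; it is not the actual tiny-htm source.

```cpp
#include <cstdint>
#include <vector>

// Central class: owns the connections/permanences and the learning logic.
// The member layout and signatures here are assumptions for illustration.
struct Cells {
    std::vector<std::vector<uint32_t>> connections;  // per-cell presynaptic cell indices
    std::vector<std::vector<float>>    permanences;  // per-cell synapse strengths

    void growSynapse(const std::vector<uint32_t>& from, const std::vector<uint32_t>& to);
    void sortSynapse();                  // sort each cell's synapses for cache locality
    void decaySynapse(float threshold);  // drop synapses that are too weak
};

// Layer state (active/predictive cells) is passed in and returned, not stored in Cells.
struct TemporalMemory {
    Cells cells;
    std::vector<uint32_t> compute(const std::vector<uint32_t>& activeColumns, bool learn);
};

struct SpatialPooler {
    Cells cells;  // same class; calling its methods in a different order gives SP behaviour
    std::vector<uint32_t> compute(const std::vector<uint32_t>& input, bool learn);
};
```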
The parallelized functions are (basically every computation-heavy method):
Cells::growSynapse // create new connections from the specified cells to cells
Cells::sortSynapse // sort the connections in each cell in access order to increase the cache hit rate
Cells::decaySynapse // remove synapses that are too weak
globalInhibition // select the top N cells
applyBurst // burst columns in which no cell is on
selectLearningCell // the reverse of applyBurst
The current parallelizing strategy is simple (since HTM requires a sequence of steps that depend on each other, there isn’t much I can do here): I just parallelize the large loop inside those functions. OpenMP itself maintains a thread pool, so the overhead is minimal (but it still causes slowdowns at small work sizes).
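As a hedged sketch of what that pattern could look like for decaySynapse: the outer per-cell loop gets a plain `#pragma omp parallel for`, and since each iteration only touches its own cell’s synapses, no locking is needed. The data layout and signature are assumptions, not the actual tiny-htm code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-cell synapse storage (layout assumed for this sketch).
struct CellSynapses {
    std::vector<uint32_t> targets;      // postsynaptic cell indices
    std::vector<float>    permanences;  // one permanence per synapse
};

// Remove synapses whose permanence has fallen below the threshold.
void decaySynapse(std::vector<CellSynapses>& cells, float threshold)
{
    // Each iteration works on one cell's private data, so the loop is
    // embarrassingly parallel and OpenMP can split it across its thread pool.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < (long long)cells.size(); ++i) {
        auto& c = cells[i];
        std::size_t keep = 0;
        for (std::size_t j = 0; j < c.permanences.size(); ++j) {
            if (c.permanences[j] >= threshold) {  // keep only strong synapses
                c.permanences[keep] = c.permanences[j];
                c.targets[keep]     = c.targets[j];
                ++keep;
            }
        }
        c.permanences.resize(keep);
        c.targets.resize(keep);
    }
}
```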
Edit: loop scheduling turned out to be an important aspect. Simply splitting the loop into N parts and running one part on each thread causes some threads to wait for the others, yet letting each thread pick up one iteration at a time introduces too much overhead.
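As a generic illustration of that trade-off (not tiny-htm code; the work function is hypothetical), OpenMP’s schedule clause covers the spectrum:

```cpp
#include <cstddef>

void processCell(std::size_t i);  // hypothetical per-cell work; cost varies per cell

void runAll(std::size_t n)
{
    // schedule(static): one contiguous chunk per thread. Minimal overhead,
    // but threads with light iterations end up waiting for the heaviest one.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < (long long)n; ++i) processCell(i);

    // schedule(dynamic, 1): each thread grabs one iteration at a time.
    // Balances perfectly, but the per-iteration scheduling overhead adds up.
    #pragma omp parallel for schedule(dynamic, 1)
    for (long long i = 0; i < (long long)n; ++i) processCell(i);

    // schedule(dynamic, 64): medium-sized chunks handed out on demand,
    // a common middle ground between the two extremes.
    #pragma omp parallel for schedule(dynamic, 64)
    for (long long i = 0; i < (long long)n; ++i) processCell(i);
}
```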
@marty1885 can you please link your post about parallelisation results? I remember seeing the graphs, but I cannot find it now.
Since this is a hackers’ subforum, I’d like to discuss some implementation concerns we’ve come up with.
We separate parallelization options into 3 levels:
low level (implicit): the C++17 parallel algorithms (Parallelism TS) can run some select routines in parallel (see the sketch after this list of options)
manual: what @marty1885 did here; + most benefit for a single-core task, − complicates the code, …
high level (NetworkAPI): it will be relatively trivial to run a whole region as a separate thread; given a network, this also achieves the best utilization (no independent tasks)
A fourth option is a fully asynchronous HTM, where each cell computes autonomously. That would be the closest biological implementation, and it is actually quite easy to implement programmatically. Unfortunately, current PC architectures would not perform well under such heavy thread switching. But new hardware computation concepts are coming, so this implementation might prove feasible in the future.
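For reference, here is a minimal sketch of what the "low level (implicit)" option could look like with the C++17 parallel algorithms (the merged Parallelism TS): the routine passes an execution policy and lets the standard library parallelize. The permanence-decay loop is only an illustration, not code from either project.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

void decayPermanences(std::vector<std::vector<float>>& perms, float amount)
{
    // Each inner vector belongs to one cell, so the per-cell work is independent
    // and the library is free to spread it across multiple threads.
    std::for_each(std::execution::par, perms.begin(), perms.end(),
                  [amount](std::vector<float>& p) {
                      for (float& x : p) x -= amount;
                  });
}
```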
I’ve optimized my code further, so the actual numbers should be a lot lower. But I’m on vacation now and I’ve shut down my workstation; it will be a few days until I can show the latest numbers.
Regarding the levels of parallelism: the ideas are great! Maybe high-level parallelism and async HTM would end up with the same issue? In both cases the CPU is trying to access many different locations in DRAM and flushing the cache all the time. It might be a good idea once we finally get GPU support or hardware accelerators for HTM (anyone interested?) with their dedicated RAM.