The SP is in fact parallelized too, as a side effect of parallelizing the TM. But the SP is so fast by itself that I didn't put its numbers on the forum.
The design of tiny-htm is that there is a central class, Cells, that stores all the critical values and connections and handles most of the learning logic, while the layer states (input, output, predictive/active cells) are passed in as parameters. Layers like TM and SP simply wrap around this class, and calling the methods of Cells in a different order leads to different layer behaviour.
As I described in the first post of this thread, TM is expressed in a few reusable functions, and so is SP. So by parallelizing and optimizing these shared functions, both algorithms are accelerated.
The parallelized functions are (basically every computation-heavy method):
- Cells::calcOverlap // calculate each cell's overlap score
- Cells::learnCorrilation // increment/decrement permanences
- Cells::growSynapse // create new connections from the specified cells
- Cells::sortSynapse // sort the connections in each cell into access order to increase the cache hit rate
- Cells::decaySynapse // remove synapses that are too weak
- globalInhibition // select the top N cells
- applyBurst // burst a column if no cell in it is on
- selectLearningCell // the reverse of applyBurst
The current parallelization strategy is simple (since HTM requires a sequence of steps that depend on each other, there's not much else I can do here). I just parallelize the large loop inside those functions. OpenMP maintains a thread pool itself, so there's minimal overhead (but it still causes slowdowns at small work sizes).
Edit: loop scheduling turned out to be an important aspect. Simply splitting the loop into N contiguous parts and running one part on each thread causes some threads to wait for others, yet letting each thread pick one iteration at a time introduces too much overhead.