A flexable framework for HTM algorithms. (And another HTM implementation no one asked for)

Optimization and parallel computing

I have been optimizing and adding parallel processing to the system. Now it is literately 100x faster than I initially posted. And another few more times faster with parallel computing (although the performance does not scale ideally/linearly with the number of cores)

This is the testing environment that I’m using.

Hardware Info
Processor AMD Ryzen 1700X (8 cores, 16 theads) @ 3.4GHz (Tubro OFF, locked at 3.4GHz for testing)
RAM DDR4 2400MHz x2
Operation System Arch Linux x64 (kernel 4.20)
Compiler GCC 8.2.1
Parallel API OpenMP

Not boring everyone with details of optimization. I’ll just show the results.

Spatial Pooler

To test the performance of my HTM implementation. I decided to measure how long does TM/SP need to perform a specific task. For the SP, I measure how long it takes to generate and learn a 256 bit representation of the input SDR of different lengths with a potential pool ratio of 0.75.
image

The Spatial Pooler seems don’t scale well with the number of cores are there to solve the problem. There are only generally a 2.4x speedup for 1 core vs 4 cores. While Interestingly Hyper Thread / SMT threads seems not be doing and help. Using both 8 or 16 threads yields basically the same performance on my system (As I have 8 physical cores). This might indicate that some resource shared by the two cores has been used-up totally. I also found this behavior is also present on Intel processors.

I have also tested the performance of tiny-htm vs NuPIC.cpp… I’m ~10x slower in single thread inference speed. I totally don’t know why. Seems that NuPIC.cpp is unreasonably fast. Maybe I have set something wrong for NuPIC somewhere.

Temporal Memory

I did the same measurement for Temporal Memory. I measure how long a TM needs to infer and learn a random sequence of SDRs of different size. Sadly, I’m a few orders of magnitude slower than NuPIC.cpp in this case. I truly don’t know why.However, this time the HT/SMT threads does seem to help though. Running with 16 threads are noticeably (tho not much) faster than with 8 threads.

%E5%9C%96%E7%89%87

A 4x acceleration using 8 threads is good enough for me.

tiny-htm is very under-performing compared to NuPIC.cpp. But I hope this can serve as an reference to how HTM algorithm may behave under many threads.

5 Likes