Optimization and parallel computing
I have been optimizing the system and adding parallel processing. It is now literally 100x faster than when I initially posted, and another few times faster again with parallel computing (although the performance does not scale ideally/linearly with the number of cores).
This is the testing environment that I’m using.
| Hardware | Info |
|---|---|
| Processor | AMD Ryzen 1700X (8 cores, 16 threads) @ 3.4GHz (Turbo OFF, locked at 3.4GHz for testing) |
| RAM | 2× DDR4 2400MHz |
| Operating System | Arch Linux x64 (kernel 4.20) |
| Compiler | GCC 8.2.1 |
| Parallel API | OpenMP |
I won't bore everyone with the details of the optimization; I'll just show the results.
Spatial Pooler
To test the performance of my HTM implementation, I decided to measure how long the SP/TM need to perform a specific task. For the SP, I measure how long it takes to generate and learn a 256-bit representation of input SDRs of different lengths, with a potential pool ratio of 0.75. A sketch of the measurement approach is below.
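For anyone who wants to reproduce the general setup, the measurement is roughly shaped like this. This is a minimal, self-contained sketch, not the actual benchmark code: the tiny-htm SP call is replaced by a placeholder workload since I'm not reproducing its API here, and the input lengths and sparsity are just example values.

```cpp
// Sketch of the measurement approach: generate random input SDRs of various
// lengths and time how long a workload takes per call.  The SP compute call
// is a placeholder (the real benchmark would generate and learn a 256-bit
// SDR from the input instead).
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Generate a random binary SDR of the given length and sparsity.
std::vector<bool> randomSDR(size_t length, float sparsity, std::mt19937& rng)
{
    std::vector<bool> sdr(length, false);
    std::bernoulli_distribution bit(sparsity);
    for (size_t i = 0; i < length; ++i)
        sdr[i] = bit(rng);
    return sdr;
}

int main()
{
    std::mt19937 rng(42);
    const int iterations = 1000;

    for (size_t inputLen : {512, 1024, 2048, 4096}) {
        auto input = randomSDR(inputLen, 0.1f, rng);

        auto t0 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < iterations; ++i) {
            // Placeholder workload: count the active bits in the input.
            size_t activeBits = 0;
            for (bool b : input)
                activeBits += b;
            // Prevent the compiler from optimizing the placeholder away.
            volatile size_t sink = activeBits;
            (void)sink;
        }
        auto t1 = std::chrono::high_resolution_clock::now();

        std::chrono::duration<double, std::milli> ms = t1 - t0;
        std::printf("input length %zu: %.4f ms per call\n",
                    inputLen, ms.count() / iterations);
    }
}
```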
The Spatial Pooler doesn't seem to scale well with the number of cores available: there is generally only a ~2.4x speedup going from 1 core to 4 cores. Interestingly, Hyper-Threading / SMT threads don't seem to help at all; using 8 or 16 threads yields basically the same performance on my system (I have 8 physical cores). This might indicate that some resource shared by the two logical cores on each physical core (e.g. memory bandwidth or cache) is already saturated. I found the same behavior on Intel processors as well.
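For context, the parallelism in the SP is essentially a parallel loop over the output columns, since each column's overlap can be computed independently. Below is a simplified sketch of that pattern (not the actual tiny-htm code; the "potential pool" here is a dummy rule and the sizes are just examples) and of how the thread count can be varied with OpenMP. Compile with `g++ -O3 -fopenmp`.

```cpp
// Simplified sketch of the parallelization pattern: overlaps of output
// columns are independent, so the column loop is split across OpenMP threads.
#include <omp.h>
#include <cstdio>
#include <vector>

int main()
{
    const size_t numColumns = 256;
    const size_t inputLen   = 2048;

    std::vector<bool> input(inputLen, false);
    for (size_t i = 0; i < inputLen; i += 10)
        input[i] = true;                       // ~10% input sparsity

    std::vector<int> overlaps(numColumns, 0);

    for (int threads : {1, 2, 4, 8, 16}) {
        omp_set_num_threads(threads);
        double t0 = omp_get_wtime();

        // Each thread handles a chunk of the columns.
        #pragma omp parallel for schedule(static)
        for (long c = 0; c < (long)numColumns; ++c) {
            int overlap = 0;
            for (size_t i = 0; i < inputLen; ++i) {
                // Dummy potential pool: column c "connects" to input bit i
                // when (i + c) % 4 == 0.  A real SP uses learned permanences.
                if (input[i] && (i + c) % 4 == 0)
                    ++overlap;
            }
            overlaps[c] = overlap;
        }

        double t1 = omp_get_wtime();
        std::printf("%2d threads: %.6f s\n", threads, t1 - t0);
    }
}
```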
I have also tested the performance of tiny-htm vs NuPIC.cpp… I'm ~10x slower in single-threaded inference speed. I honestly don't know why; NuPIC.cpp seems unreasonably fast. Maybe I have configured something wrong in NuPIC somewhere.
Temporal Memory
I did the same measurement for Temporal Memory: I measure how long a TM needs to infer and learn a random sequence of SDRs of different sizes. Sadly, I'm a few orders of magnitude slower than NuPIC.cpp in this case, and I truly don't know why. However, this time the HT/SMT threads do seem to help: running with 16 threads is noticeably (though not by much) faster than with 8 threads.
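For reference, the input to the TM benchmark is just a random sequence of SDRs. Here is a small sketch of one way to build such a sequence, assuming each SDR has a fixed number of active bits stored in sparse (index) form; the sequence length, SDR size, and active-bit count are example values, and the TM infer/learn call itself is only described in a comment.

```cpp
// Sketch of the TM benchmark input: a random sequence of SDRs, each with a
// fixed number of active bits, stored as lists of active-bit indices.
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

std::vector<std::vector<size_t>> randomSequence(size_t seqLen, size_t sdrLen,
                                                size_t numActive, std::mt19937& rng)
{
    std::vector<std::vector<size_t>> seq;
    std::vector<size_t> indices(sdrLen);
    std::iota(indices.begin(), indices.end(), 0);
    for (size_t t = 0; t < seqLen; ++t) {
        std::shuffle(indices.begin(), indices.end(), rng);
        // Keep only the first `numActive` indices as the on-bits of this SDR.
        seq.emplace_back(indices.begin(), indices.begin() + numActive);
    }
    return seq;
}

int main()
{
    std::mt19937 rng(42);
    auto sequence = randomSequence(/*seqLen=*/100, /*sdrLen=*/2048,
                                   /*numActive=*/40, rng);
    // In the benchmark, each SDR in `sequence` is fed to the TM once per time
    // step (infer + learn), and the total wall-clock time is recorded.
    (void)sequence;
}
```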
A 4x acceleration using 8 threads is good enough for me.
tiny-htm under-performs badly compared to NuPIC.cpp, but I hope this can serve as a reference for how the HTM algorithms may behave across many threads.