Hello all.
I’m very excited to finally release a part of my graduation project - Etaler, a very flexible high-performance HTM framework.
Why build Etaler
NuPIC is research oriented, and thus it is not that fast. I think that's one of the reasons why we are still stuck at small-scale experiments. Inference and learning on NuPIC with an 8192x16 TM can take up to 3 seconds per time step. That's a big problem for me, as I work on realtime SMI (sensorimotor inference)/RL (Reinforcement Learning). Therefore I set out to make my own framework, and to make sure it is fast.
Features
Etaler is designed around these concepts.
- Integrated Tensors
- Separate frontend/API and backend
- Data-Oriented Design as a first-class citizen
- Attempts to support research
Tightly Integrated Tensors
Instead of relying on libraries like numpy or xtensor to handle multi-dimensional arrays (neither of which supports the GPU), Etaler implements its own Tensors, which are tightly integrated into the core framework and can be easily extended, allowing easy future development.
And… these Tensors support broadcasting, GPU acceleration, and basic indexing.
Separate frontend/API and backend
Etaler provides different backends that run on different devices, while all backends are connected to the same frontend API. This enables simple optimization strategies and the ability to run on the GPU with one line of code changed, like how most DL frameworks work.
Data Orientated Design
Like most modern Deep Learning frameworks, Etaler uses a DOD approach in its design. For example, instead of having a Synapse class which stores a connection target and a permanence, synapses are described as two Tensors: one storing the connections to other neurons and one storing the permanences. This reduces the amount of memory access and increases efficiency.
A data-oriented approach also results in a highly reusable API. Now writing new layers is like writing tensor operators in DL frameworks - just write some code that chains the operations together. No more inheritance hell when developing new layers!
Supporting research and innovation
This is a very ambitious goal. I designed Etaler to support future research and communication between researchers and developers/hobbyists. By supplying a clean, high-performance interface, researchers need not invent hacks to get the system running at speed, while devs/hobbyists can share their ideas with clear, expressive code. Fewer loops, fewer conditionals, just function calls.
To support the previous two claims: a TemporalMemory with apical synapses can be implemented in just a few lines.
```cpp
std::pair<Tensor, Tensor> compute(const Tensor& x, const Tensor& apical, const Tensor& last_state)
{
    et_assert(x.dimentions() == 1); // This is a 1D implementation

    // Feed-forward TM predictions
    auto [pred, active] = TemporalMemory::compute(x, last_state);

    // Apical feedback
    Tensor feedbacks = cellActivity(apical, apical_connections, apical_permance, 0.21, 3);

    // A cell only predicts from its feedback when more than one cell is active in its column
    auto s = pred.sum(1);
    pred = pred && (s > 1 && feedbacks) || (s == 1 && pred);
    return {pred, active};
}

void learn(const Tensor& active_cells, const Tensor& apical, const Tensor& last_active)
{
    // Let the distal synapses grow and learn
    TemporalMemory::learn(active_cells, last_active);

    // Let the apical synapses learn
    learnCorrilation(apical, last_active, apical_connections, apical_permance, 0.1, 0.1);
    growSynapses(apical, last_active, apical_connections, apical_permance, 0.21);
}
```
Performance
On the CPU, Etaler is as fast as, if not slightly faster than, my previous framework, tiny-htm. On an adequate GPU, Etaler outperforms any other framework by a huge margin.
Performance of SpatialPooler:
(9000 input bits, 9000 output bits, 10% target density, no boosting, no topology, 75% potential pool, potential radius = 4500, learning enabled, random scalar-encoded input)
| | NuPIC.core | tiny-htm | Etaler |
|---|---|---|---|
| Processor | R7 1700X - 1 core | R7 1700X - 16 cores | RTX 2080Ti |
| Time | 181ms | 25ms | 6.9ms |
Performance of Temporal Memory:
(8192 columns, 16 cells per column, max 1024 synapses per column; tiny-htm and Etaler use the connect-to-all method described here)
| | NuPIC.core | tiny-htm | Etaler |
|---|---|---|---|
| Processor | R7 1700X - 1 core | R7 1700X - 16 cores | RTX 2080Ti |
| Time | 3074ms | 36.8ms | 4.91ms |
- NuPIC.core compiled with clang 8.0.0 (ran into linking issues with GCC)
- tiny-htm compiled with GCC 8.3.0
- Etaler compiled with GCC 8.3.0
- RTX 2080Ti benchmarked using Nvidia’s official OpenCL SDK.
OS/Device support
Etaler has only been tested and proven to work on the following systems. Based on these results, there is no reason it should not work on other systems.
| | system1 | system2 | system3 | system4 | MacBook Air |
|---|---|---|---|---|---|
| OS | Arch Linux | Arch Linux | Arch Linux | Manjaro Linux | OS X |
| CPU | R7 1700X | i7 8700 | i5 8250U | i7 6700 | i5 5250U |
| GPU | GTX 780Ti | RTX 2080Ti | HD 520 | GTX 970 | HD 6100 |
- For Intel iGPUs, tested on the new Neo OpenCL SDK
- Using the built-in OpenCL SDK on OS X
- For NVIDIA GPUs, both were tested with the official OpenCL SDK
- GTX 780TI is also tested with POCL w/ CUDA backend
- OpenCL on CPU is not tested.
- I want to test Etaler on an AMD card, but I don't have the budget to get one.
- A Radeon VII (7nm Vega + HBM) should theoretically be faster than an RTX 2080, since the main limitation is memory bandwidth.
- Etaler is buildable under Windows using MSYS2, but it crashes immediately due to an MSYS2-specific compiler bug (the resulting DLL is not loadable). Other build environments have not been tested.
- I see no reason it wouldn't work on ARM.
- Built with GCC and stdc++ on OS X.
Future plans
I'm planning to support and develop Etaler long-term after my graduation project, if there is enough interest. There are some features I want to add to Etaler but haven't had the time for yet.
- port htmresearch layers
- More optimization (2x performance on GPU may be feasible)
- Python wrapper
- More numpy-style array operators
- Graph mode/lazy evaluation
- Better documentation
- Support ongoing research
- Batch execution (optimization for Thousand Brains Theory)
- Windows support
- etc…
I believe Etaler can be a great tool for pushing HTM theory forward, accelerating experiments, and promoting innovation. But it is in its early stages and I'm not going to make it just by myself. If you think Etaler is a project you are interested in, by all means join in.
The easiest way to help is by using the library. If you find a bug or a missing feature, please open an issue and let us know how we can improve. If you want to develop the library itself, you are very welcome! We are excited to see new PRs pop up.
Contribution
HTM theory is still young and growing, and we'd love contributions from you to accelerate the development of Etaler! Just fork, make your changes, and open a PR!
Special Thanks
A very huge thanks to @LiorA for testing the framework, making a Dockerfile, and now working on layer visualizations. Thank you! Your work is amazing.
Also, thank you to all forum/Discord members. I wouldn't have been able to get this far without you. I have learned a lot from the community, and I love you awesome geeks!
Where is the source!
(I'm still working on the logo for the project. Hang on!)
And extra examples.