Performance optimizations

optimization
speed

#1

A highly desired topic for many of us.
I’ve started the work in https://github.com/htm-community/nupic.cpp/issues/3

Related discussions:

Steps:

  • [ ] set baseline benchmarking tests, the more, the better
    • micro benchmarks
    • IDE profiling
  • [ ] refactor the code to use a shared, encapsulated class for passing data around: an “SDR type”
    • for now it could be a typedef for UInt*,
    • later, wrap a vector and add some methods,
    • even later, wrap the optimized matrix type required by the chosen library, …
  • [ ] identify bottlenecks
  • [ ] compare math library toolkits
    • each library has its own data type (Eigen matrix, etc.)
    • converting to/from it would kill any gained performance -> hence the “SDR type”
  • [ ] iterative optimizations
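
The staged “SDR type” refactor sketched above could look roughly like this. It is only a sketch: the names SDR and SDRvector are illustrative placeholders, not the project’s actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;

// Stage 1: a plain alias, so call sites stop hard-coding raw pointers.
using SDRvector = std::vector<UInt>;

// Stage 2: a thin wrapper that encapsulates the buffer; later it could
// hide an optimized matrix backend without touching call sites again.
class SDR {
public:
    explicit SDR(std::size_t size) : dense_(size, 0) {}

    void setActive(const std::vector<std::size_t>& indices) {
        for (std::size_t i : indices) dense_[i] = 1;
    }

    std::size_t size() const { return dense_.size(); }

    std::size_t countActive() const {
        std::size_t n = 0;
        for (UInt v : dense_) n += (v != 0);
        return n;
    }

    // Interop with existing UInt* APIs during the transition.
    const UInt* data() const { return dense_.data(); }

private:
    SDRvector dense_; // Stage 3 could swap this for a sparse/opt-matrix backend
};
```

The point of the wrapper is that only this one class would need to change when a math library is eventually chosen.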

Requirements:

  • what do we want from the library?
  • max speed
  • multi-platform
  • sparse (memory efficient)
  • big user-base, popular
  • low code “intrusiveness”
  • CPU backend (SSE, OpenMP)
  • NVIDIA GPU backend (CUDA)
  • AMD GPU backend (OpenCL)
  • generic GPU backend (both AMD and NVIDIA; likely via OpenCL?)
  • open source
  • clean & lean API (ease of use, high level)
  • bindings/support for other languages (python,…)
  • used in other AI/ML frameworks (TensorFlow, scikit-learn, Torch, …)
  • I don’t need no optimizations


Considered toolkits:

Links:


Memory reduction/optimization
#2

As an alternative approach to optimization, I came up with
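
For context, a common memory optimization for SDRs is to store only the indices of the active bits instead of a full dense vector. The sketch below is a generic illustration of that standard technique, not necessarily the specific proposal meant here:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Dense SDR: one byte (or word) per bit, mostly zeros.
// Sparse SDR: just the indices of the 1-bits; at typical ~2% sparsity
// this is a large memory reduction.
struct SparseSDR {
    std::uint32_t size = 0;             // total number of bits
    std::vector<std::uint32_t> active;  // sorted indices of 1-bits

    static SparseSDR fromDense(const std::vector<std::uint8_t>& dense) {
        SparseSDR s;
        s.size = static_cast<std::uint32_t>(dense.size());
        for (std::uint32_t i = 0; i < s.size; ++i)
            if (dense[i]) s.active.push_back(i);
        return s;
    }

    std::vector<std::uint8_t> toDense() const {
        std::vector<std::uint8_t> dense(size, 0);
        for (std::uint32_t i : active) dense[i] = 1;
        return dense;
    }

    // Payload bytes needed by each representation.
    std::size_t sparseBytes() const { return active.size() * sizeof(std::uint32_t); }
    std::size_t denseBytes()  const { return size * sizeof(std::uint8_t); }
};
```

For a 2048-bit SDR with 40 active bits, the sparse form needs 160 bytes of payload versus 2048 for the dense byte vector.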


#3

I think the vote is missing the number one priority and that is “easy to understand”.


#4

That one is “low code intrusiveness” ~ the minimum number of changes needed in the current code to implement the optimizations. Or it could also be “I don’t need no optimizations” :)


#5

@chhenning do you have an argument for Eigen (vs. Blaze), given that pybind11 has support for Eigen?

Also, I’m going to do some refactoring in the encoders, SP, and TM (maybe touching regions) to introduce a common typedef for an “SDRvector”. I should probably coordinate this work with your removal of some classes (once you have the PR).


#6

@breznak I’m very open to discussing the selection of the right matrix lib. My thinking is that we need a proper benchmark to make the right choice. I’m willing to work on that.


#7

“a proper benchmark to make the right choice. I’m willing to work on that.”

There’s:

  • bin/connections_performance_tests, which shows some numbers
  • I think we could do micro benchmarks just for specific methods as we try to change them
  • a good overall bench is the “Hotgym anomaly example”; I don’t know if it’s Py-only, or if there’s a C++ variant.
    • we could either port it, or run it from Py with the C++ bindings.
    • I think it reflects the complex use of HTM very well (it stresses almost all parts)

I’m ready with unit tests, so we have free hands to start experimenting!
I laid out some plans here: https://github.com/htm-community/nupic.cpp/issues/3

  • I’d like to start with the “common type for regions I/O” and vectorization… Your pybind changes don’t touch the individual files in src/nupic/algorithms/, do they?

#8

Do we know of more people interested in this change who would be willing to contribute? I can prepare/refactor the stuff above, but I don’t have much actual experience with the graphics/math libraries, so learning without guidance would take me longer…


#9

Can you provide a link?

I’m fairly certain that’s Python-only. But for now Python is the main user of nupic.core, and maybe the benchmark should take that into account.

Anything Python is limited to the /nupic/python folder. There are some Python remnants in the engine code, but those will hopefully disappear soon.

So, no, no pybind stuff in the algorithms.


#10

Good point,
we should have the releases now


And the bindings
https://ci.appveyor.com/project/breznak/nupic-cpp/build/0.3.0.26/artifacts
So we could use the Py code that itself calls the C++ bindings - the most real-world test.

I also have some private code for running the Hotgym in C++, so I’ll clean it up and we can set up pure C++ benchmarks (imho an easier and more foolproof game).

That’s something I’d like to understand about the PR: do you remove the Regions and NAPI “framework”/functionality? Sorry for being dense about your changeset, I just need to understand the direction.

Cool, so other tasks can go in parallel with the Py3 work!


#11

Here are some more details:

  1. To be exact, I have not removed “Regions”. The only region I have reimplemented is PyRegion, using pybind11 functionality. I have named that class PyBindRegion…

  2. What I have further done is remove nupic’s py_support folder.

  3. Right now the engine is an integral part of nupic.core and supports the inclusion of python regions, like “anomaly_region.py” for instance. For that I need to use embedded python via pybind11.

  4. Until we refactor the “engine” we’ll have to build nupic.core with pybind11 and therefore python c api headers.

I hope this makes sense.


#12

Thank you. That sounds very sensible and well designed. I was worried you had just scrapped the Region* classes :+1:


#13



#14

Funny, the description for dlib in awesome-cpp is wrong. It should be:

“A toolkit for making real world machine learning and data analysis applications in C++”


#15

Hi all,

I think I know how to make topology in the spatial pooler run a lot faster. I call it poor man’s topology: use many small spatial poolers arranged over a topological area, each with global inhibition. It’s not a perfect solution, but it should run as fast as the no-topology case.

I have experimented with this method in my own HTM implementation, with success on the MNIST dataset. I use a (10 x 10) grid of spatial poolers, each with 100 mini-columns and an input radius of about 2.8 pixels, for a total of 106 potential synapses to each mini-column. It takes 4 minutes to run through the MNIST dataset and it scores ~95%.
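
A toy sketch of the tiling idea, with a top-k selection standing in for each small pooler’s global inhibition. The grid layout, the way overlaps are computed, and all parameters here are placeholders, not the implementation described above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One tiny pooler: "global inhibition" within its tile means simply
// keeping the top-k columns by overlap score.
static std::vector<std::size_t> topK(const std::vector<int>& overlaps,
                                     std::size_t k) {
    std::vector<std::size_t> idx(overlaps.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          return overlaps[a] > overlaps[b];
                      });
    idx.resize(k);
    return idx;
}

// "Poor man's topology": split the input into tiles and run an independent
// small pooler per tile; winners compete only within their own tile, so
// locality falls out of the tiling, not from per-column inhibition radii.
std::vector<std::vector<std::size_t>>
poolGrid(const std::vector<std::vector<int>>& tileOverlaps,
         std::size_t winnersPerTile) {
    std::vector<std::vector<std::size_t>> winners;
    for (const auto& overlaps : tileOverlaps)
        winners.push_back(topK(overlaps, winnersPerTile));
    return winners;
}
```

Since each tile’s pooler is independent, the per-tile work is also trivially parallelizable.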

Any thoughts? Is this something you’d all want in the community fork?


#16

A score of 95% is comparable to other HTM experiments…