I’m currently writing a highly optimized HTM implementation in Swift, and wanted to make sure I benchmark it against the fastest. Would the C++ nupic.core be the current fastest implementation, or are there community implementations which are faster? Also, any caveats in how to build/invoke nupic.core to ensure it is optimal?
is your go-to C++ implementation
As far as I know it would be the fastest full-featured, API-compatible, ready-to-use, and maintained HTM implementation.
See meta issue:
and, in more detail,
for optimizations we’ve identified and/or done.
The performance difference between nupic.core and nupic.cpp is very significant!
We could probably join forces on
it’d be nice if we could compile some sort of overall benchmark covering:
- numenta nupic.core
- community nupic.cpp
- specialized community forks (swift, java, torch, …)
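An overall benchmark like that mostly comes down to timing each implementation’s compute step on the same input stream. A minimal timing harness sketch in C++ (hypothetical, not tied to any one project’s API; `timeMs` is a made-up helper name):

```cpp
// Sketch of a cross-implementation timing harness. Each HTM implementation
// is wrapped in a callable that performs one compute step; feeding every
// wrapper the same recorded input stream gives an apples-to-apples number.
#include <chrono>
#include <functional>

// Returns the average wall-clock milliseconds per call of `step`,
// measured over `iterations` runs.
double timeMs(const std::function<void()> &step, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) step();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count()
           / iterations;
}
```

Wrapping, say, a spatial pooler call as `[&] { sp.compute(input, true, output); }` (whatever the implementation’s actual signature is) and comparing `timeMs` results would at least give a consistent single-thread comparison.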
I’m not sure that benchmark is valid; you are better off using our real-life benchmarks, which we run in CI as well.
You want to build in Release, but most common builds do so by default.
You might consider setting some compiler flags (`-O3 -march=native`), but other than that I think you should stick with the common setup.
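For reference, nupic.cpp builds with CMake, so a Release build with those flags might look something like this (the exact directory layout and options are assumptions; check the project’s README):

```shell
# Hypothetical out-of-source CMake build of nupic.cpp; adjust paths to the
# project's actual layout.
mkdir -p build && cd build
# Release enables optimizations; -march=native tunes for the local CPU
# (note: the resulting binary may not run on older machines).
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DCMAKE_CXX_FLAGS="-O3 -march=native"
cmake --build . -- -j"$(nproc)"
```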
There’s also a bunch of specialized HTM implementations that would be much faster than any C++ on a SPECIFIC task or under specific conditions.
maybe there’s a thread, but off the top of my head:
No… Yes, but both are currently private work and not publicly available. @jacobeverist has one, and mine still needs more work. I’m now working on adding OpenCL<->CPU inter-op and making space for an FPGA backend. I’ll release and open-source mine in the next few months.
And to reply to the topic: I think my tiny-htm is the fastest now. It is a minimal HTM implementation designed to be fast. It is 14x faster than NuPIC.cpp (single thread). But tiny-htm is not fully compliant with the standard TM algorithm and has limited features. It is also potentially buggy.
How far along are you with OpenCL and FPGA? I think it’s been a couple of months since our last post.
We have some people looking at the OpenCL->FPGA compilation problem right now. I believe we’re starting with the FPGA dev environment provided by Amazon Web Services.
Long time!
We have two different codebases for OpenCL and FPGA, as we are now working towards building a Verilog HTM core for embedded systems. Though an OpenCL-based FPGA design is still on the roadmap for high-performance situations.
Yeah, I agree that OpenCL on FPGA is kinda difficult to work with. I’m sure you feel the pain.
Try not to synthesize the bitstream every time your code changes. Use the simulator; it produces accurate performance numbers in a relatively short time. (I’m not sure if this is doable on AWS, though.)
Damn, 2-3 µs for the spatial pooler, and I thought shrinking my time down from 1 second to 200-300 µs for my Hex Grid pooler through some janky parallelisation was good lol
Can’t wait to see the Python wrapper when it’s ready.