I’m currently writing a highly optimized HTM implementation in Swift, and wanted to make sure I benchmark it against the fastest. Would the C++ nupic.core be the current fastest implementation, or are there community implementations which are faster? Also, any caveats in how to build/invoke nupic.core to ensure it is optimal?
No… Yes, but both are currently private work and not publicly available. @jacobeverist has one, and mine still needs more work. I’m now working on adding OpenCL<->CPU interop and making room for an FPGA backend. I’ll release and open-source mine in the next few months.
And to reply to the topic: I think my tiny-htm is the fastest now. It is a minimal HTM implementation designed to be fast, and it is 14x faster than NuPIC.cpp (single thread). But tiny-htm is not fully compliant with the standard TM algorithm, has limited features, and is potentially buggy.
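For anyone who wants to reproduce a single-thread comparison like this, here is a rough sketch of a timing harness. It is only an illustration: `median`, `medianMillis`, and `speedup` are hypothetical helper names, not part of NuPIC or tiny-htm, and the `step` callable stands in for whatever "run one compute step" looks like in the implementation under test.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Median of a non-empty sample set; less sensitive to scheduler noise than the mean.
inline double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
        ? samples[mid]
        : 0.5 * (samples[mid - 1] + samples[mid]);
}

// Calls `step` `warmup` times untimed (to warm caches and allocators),
// then `runs` times timed, and returns the median wall-clock time
// per call in milliseconds. `step` is a placeholder for one HTM compute step.
template <typename Step>
double medianMillis(Step&& step, std::size_t warmup, std::size_t runs) {
    assert(runs > 0);
    for (std::size_t i = 0; i < warmup; ++i) step();

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        step();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return median(std::move(samples));
}

// Speedup of a candidate over a baseline, e.g. 28 ms vs 2 ms gives 14x.
inline double speedup(double baselineMillis, double candidateMillis) {
    assert(candidateMillis > 0.0);
    return baselineMillis / candidateMillis;
}
```

To compare two implementations, pass each one's compute step to `medianMillis` with the same input data, the same warm-up and run counts, and an optimized build on the same machine, then feed the two medians to `speedup`.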
We have two different codebases for OpenCL and FPGA, as we are now working towards building a Verilog HTM core for embedded systems. Though an OpenCL-based FPGA design is still on the roadmap for high-performance situations.
Yeah, I agree that OpenCL on FPGA is kinda difficult to work with. I’m sure you feel the pain.
Try not to synthesize the bitstream every time your code changes. Use the simulator; it produces accurate performance numbers in a relatively short time. (I’m not sure if this is doable on AWS, though.)