I am finishing up my core (SP+TM) + decoders implementation in Apple’s Swift 3.0 language.
I would like to benchmark my implementation against the C++ core implementation using some kind of standardised benchmarking process. What do you guys use for benchmarking?
I need to benchmark initial performance and then continue benchmarking while I do some customisation work on my core + decoders. Basically I need to see performance improve over time, but I also need to make sure that my implementation is faster and more memory-efficient than the C++ core implementation. In my current implementation I see very low CPU usage and acceptable memory usage after rewriting the SP a few times, so I guess I am more interested in memory optimisation than CPU optimisation, even though that might change once I scale things up.
I have a plan to move certain things over to CUDA in late fall, so I need to be tracking what I do performance-wise, but I am not quite sure how to go about doing this.
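To give a concrete idea of the kind of tracking I mean, here is a minimal Swift sketch that wraps a run in wall-clock timing plus a resident-memory reading via the Mach `task_info` call. The `measureRun` helper and whatever you pass into it are just placeholders, not part of my actual implementation:

```swift
import Foundation

// Rough resident-memory reading via the Mach task_info API (macOS/iOS only).
// Returns bytes, or nil if the call fails.
func residentMemoryBytes() -> UInt64? {
    var info = mach_task_basic_info()
    var count = mach_msg_type_number_t(MemoryLayout<mach_task_basic_info>.size
                                       / MemoryLayout<natural_t>.size)
    let kr = withUnsafeMutablePointer(to: &info) {
        $0.withMemoryRebound(to: integer_t.self, capacity: Int(count)) {
            task_info(mach_task_self_, task_flavor_t(MACH_TASK_BASIC_INFO), $0, &count)
        }
    }
    return kr == KERN_SUCCESS ? info.resident_size : nil
}

// Wall-clock timing around one run; the body is a placeholder for feeding a
// dataset through the SP + TM + decoders.
func measureRun(_ body: () -> Void) {
    let start = DispatchTime.now()
    body()
    let seconds = Double(DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000
    let memory = residentMemoryBytes().map(String.init) ?? "n/a"
    print("elapsed \(seconds)s, resident \(memory) bytes")
}

// measureRun { /* feed the encoded dataset through SP + TM + decoders here */ }
```

Logging those numbers per run (with a date and a commit hash) would at least give me the over-time trend I am after.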
I thank you on behalf of all who have contributed!
Me too, but to do it now would be premature. Also, I am scrambling at the moment to sync up the two versions, but after that I will be making an “optimization pass” over the code, because I did briefly profile it some months ago and it could benefit from some “detailed” attention…
I moved this from #htm-hackers into #nupic because it’s specifically asking for a nupic.core benchmark. @scott or @mrcslws, do you know if we have any perf benchmarks at all?
Ah yes I forgot about the Python implementation. I read through most implementations a while back.
No, that is fair enough, I totally understand that. Also it is not really a comparison between languages, as the implementations are not identical anyhow.
If no one has performed any benchmarking, maybe I can devise a universal benchmark after I reach beta.
No worries. @rhylolight did call on some Numenta engineers to report back on whether any benchmarks have been developed for NuPIC, so I would wait and see what replies you get.
Yes, it will be quite some time before I reach beta, as I have yet to implement asynchronous operations and event/data streams.
But it is pretty interesting to think about how to create a benchmark in general terms. Memory/CPU usage is one thing, but it would be cool to have a functional benchmark as well.
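One possible functional metric, purely as a sketch: average how much of each step’s active columns the TM had predicted on the previous step. The `predicted`/`active` column sets below are hypothetical names for illustration, not anything in the existing APIs:

```swift
// Sketch of a simple functional metric: the mean fraction of each step's active
// columns that were predicted on the previous step. predicted[i] holds the
// columns predicted before step i; active[i] holds the columns that actually
// became active at step i.
func meanPredictionOverlap(predicted: [Set<Int>], active: [Set<Int>]) -> Double {
    precondition(predicted.count == active.count)
    var total = 0.0
    var steps = 0
    for (p, a) in zip(predicted, active) where !a.isEmpty {
        total += Double(p.intersection(a).count) / Double(a.count)
        steps += 1
    }
    return steps > 0 ? total / Double(steps) : 0
}
```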
My favorite benchmark is simply to run real data through it, e.g. hotgym. I record how long it takes, and I profile it to see which parts of the code are taking the most time.
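In Swift terms, a minimal per-stage version of that kind of run might look like the sketch below; the `computeSP`/`computeTM` closures and the boolean input encoding are placeholders for whatever the real SP/TM API looks like, not an actual NuPIC interface:

```swift
import Foundation

// Per-stage wall-clock timing over a recorded dataset. The closures stand in
// for the real SP/TM calls; `inputs` would be the encoded rows of something
// like hotgym.
func timeStages(inputs: [[Bool]],
                computeSP: ([Bool]) -> [Int],
                computeTM: ([Int]) -> Void) {
    var spSeconds = 0.0
    var tmSeconds = 0.0
    for input in inputs {
        var start = Date()
        let activeColumns = computeSP(input)
        spSeconds += Date().timeIntervalSince(start)

        start = Date()
        computeTM(activeColumns)
        tmSeconds += Date().timeIntervalSince(start)
    }
    print("SP: \(spSeconds)s  TM: \(tmSeconds)s over \(inputs.count) records")
}
```

Splitting the timing per stage gives roughly the same relative breakdown you would otherwise read off a profiler.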
Not really, because benchmark times are specific to the computer they were run on. Individual times aren’t super useful on their own; they are mainly useful for comparing against other times on the same computer. When I need baseline benchmark data, I gather it via the Instruments “Time Profiler” and then throw it away at the end of the day.
I suppose it could be interesting to post the current Profiler results somewhere, since it would show relative time spent in the SP, TM, classifier, Python overhead, etc., but I haven’t saved that anywhere.