I am finishing up my core (SP+TM) + decoders implementation in Apple’s Swift 3.0 language.
I would like to benchmark my implementation against the C++ core implementation using some kind of standardised benchmarking process. What do you guys use for benchmarking?
I need to benchmark initial performance and then continue benchmarking while I do some customisation work on my core + decoders. Basically, I need to see performance improve over time, but I also need to make sure that my implementation is faster and more memory efficient than the C++ core implementation. In my current implementation I see very low CPU usage and acceptable memory usage after rewriting the SP a few times, so I guess I am more interested in memory optimisation than CPU optimisation, even though that might change once I scale things.
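For tracking improvements over time, one lightweight option is a small harness that records wall time and peak heap usage for a fixed workload on each run, so successive runs can be compared. A minimal sketch in Python; the `benchmark` helper and the toy workload are hypothetical illustrations, not part of nupic or any HTM codebase:

```python
import time
import tracemalloc

def benchmark(workload, label="run"):
    """Run a zero-argument callable once, recording wall time and
    peak Python heap usage.  Hypothetical harness for trend-tracking:
    log the returned dict after each change and compare over time."""
    tracemalloc.start()
    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"label": label, "seconds": elapsed, "peak_bytes": peak}

# Toy workload standing in for an SP/TM compute pass:
result = benchmark(lambda: sum(i * i for i in range(100000)), label="toy")
```

Peak heap here only covers Python allocations; for a Swift or C++ core you would use the platform's own tooling (e.g. Instruments) for the memory side.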
I have a plan to move certain things over to CUDA late fall, so I need to be tracking what I do performance wise but am not quite sure how to go about doing this.
Any thoughts on this?
The HTM.Java implementation is still in its “formative” stages and has not received an optimization pass over the code. But stay tuned!
I looked over your implementation and it is really well written.
It would be interesting to see some hard numbers on memory/CPU usage.
I thank you on behalf of all who have contributed!
Me too, but to do it now would be premature. Also I am scrambling at the moment to sync up the two versions - but after that I will be making an “optimization pass” over the code because I did briefly profile it some months ago and it could benefit from some “detailed” attention…
I moved this from #htm-hackers into #nupic because it’s specifically asking for a nupic.core benchmark. @scott or @mrcslws do you know if we have any perf benchmarks at all?
When you say ‘The two versions’ you mean C++ / Java?
Also, regarding the performance profile, would you agree that you see a higher demand on memory resources than on CPU cycles?
I meant the Python and the Java…
To date, yes… but as I said, I am loathe to report any concrete performance findings until the code is more mature…
Ah yes I forgot about the Python implementation. I read through most implementations a while back.
No, that is fair enough, I totally understand that. Also it is not really a comparison between languages, as the implementations are not identical anyhow.
If no one has performed any benchmarking, maybe I can devise a universal benchmark after I reach beta.
No worries. @rhylolight did call on some Numenta engineers to report back on whether any benchmarks have been developed for NuPIC, so I would wait and see what kind of replies you get back.
Yes, it will be quite some time before I reach beta, as I have yet to implement asynchronous operations and event/data streams.
But it is pretty interesting to think about how to create a benchmark in general terms. Mem/CPU usage is one thing, but it would be cool to have a functional benchmark too.
My favorite benchmark is simply to run real data through it, e.g. hotgym. I record how long it takes, and I profile it to see which parts of the code are taking the most time.
I also use this script for the Temporal Memory: https://github.com/numenta/nupic/blob/8e40e7ad16fd3a04cc2a7d3d12174ccf3fa44daa/scripts/temporal_memory_performance_benchmark.py
But real data is better.
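For what it’s worth, the “run real data through it and time it” approach can be sketched as a per-record timing loop. In this hypothetical Python sketch, `compute` is a stand-in for an SP+TM step, and the toy records merely imitate a stream like hotgym:

```python
import time

def time_model_on_records(compute, records):
    """Feed records one at a time through `compute` and return the
    total wall time plus per-record timings.  `compute` is a
    hypothetical hook; substitute your implementation's step function."""
    per_record = []
    for rec in records:
        t0 = time.perf_counter()
        compute(rec)
        per_record.append(time.perf_counter() - t0)
    return sum(per_record), per_record

# Toy stand-in for a real data stream such as hotgym:
records = [{"consumption": float(i % 50)} for i in range(1000)]
total, times = time_model_on_records(lambda r: r["consumption"] ** 2, records)
```

The per-record list is handy for spotting warm-up effects or occasional slow steps, which a single total time would hide.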
Thanks, will look into it.
@mrcslws do you have any benchmark data that you could share at this point?
Not really, because any benchmark times are specific to the computer. Individual times aren’t super useful on their own; they are more useful for comparing with other times on the same computer. When I need baseline benchmark data, I gather it via the Instruments “Time Profiler”, then quickly throw it away at the end of the day.
I suppose it could be interesting to post the current Profiler results somewhere, since it would show relative time spent in the SP, TM, classifier, Python overhead, etc., but I haven’t saved that anywhere.
Yes, of course, that makes sense. So we have various hardware, various implementations and various programming languages.
I will think about this and see if I can come up with some clever way of benchmarking the solution regardless of hardware, implementation or language.