I found that TemporalMemory runs slower and slower while training, eventually becoming so slow that it is impractical to use: for example, taking 10 seconds to run 1000 inferences after extensive training, at least 100 times slower than initially.
This is something you’re going to need to experiment with, but you could trim segments occasionally: just randomly remove some percentage of segments and see how it affects prediction performance vs. CPU performance.
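The trimming idea could be sketched roughly like this (a toy list of segments, not the actual Nupic data structures):

```python
import random

def trim_segments(segments, fraction, rng=None):
    """Randomly drop `fraction` of the segments and return the survivors.

    `segments` is any list of segment objects; this is a toy stand-in
    for experimenting with the idea, not the real TM internals.
    """
    rng = rng or random.Random()
    return [s for s in segments if rng.random() >= fraction]

# Example: trim roughly 20% of 1000 dummy segments.
segments = list(range(1000))
survivors = trim_segments(segments, 0.20, rng=random.Random(42))
print(len(survivors))  # roughly 800, varies with the seed
```

One would call this every N training steps and plot prediction accuracy against wall-clock time per step to find an acceptable trade-off.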
In my experiments I don’t have any problem with runtime even over very long learning periods, but it does degrade if the maximum number of synapses per segment is too big. I observed this problem in my grid cell modules after increasing the number of active synapses during learning: the frame rate dropped from 1000 Hz to 2 Hz for 6 GC modules doing object classification.
I’d be interested in hearing what the bottleneck was. Perhaps, at the code level, there’ll be a target for reasonable parallelisation, if it makes sense. I’m very slowly moving towards a Pony implementation; if anyone wants to join, you’re welcome.
I found that I can replicate the problem faster by sending random SDRs into a TM, since random SDRs cause a lot of boosting. My guess is that there are too many useless distal connections in the TM (weights around 0, etc.).
Segment trimming seems to be a possible solution. And parallelizing the computation will definitely help!
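A minimal way to generate such a random-SDR stream in plain Python (the TM call itself is omitted; any `tm.compute(...)` name mentioned in the comment is an assumption about the Nupic API, not verified here):

```python
import random

def random_sdr(n=2048, w=40, rng=None):
    """Return a random SDR: `w` active bits out of `n`, as a sorted list.

    Feeding a stream of these into a TM -- e.g. something like
    tm.compute(sdr, learn=True) in Nupic (name assumed) -- forces it to
    grow distal segments for patterns it will never see again, which
    reproduces the slowdown much faster than structured input.
    """
    rng = rng or random.Random()
    return sorted(rng.sample(range(n), w))

stream = [random_sdr(rng=random.Random(i)) for i in range(5)]
print(len(stream[0]))  # 40 active columns per SDR
```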
@Gary_Gaulin very glad to hear! Visual Studio Code has very good syntax support, and the resulting binaries are C++ compatible. On Windows one can simply debug Pony binaries with Visual Studio, provided the Visual C++ SDK is installed. On other platforms it’s even easier.
For an experiment we now only need easy sample code like your encoder example, to copy from a reply you post in this forum into the Pony Playground.
One of the biggest problems in open source communities like this one is the difficulty of getting started, caused by requiring multiple downloads of programming-related packages that, after enough updates, often no longer work together properly. What is needed is an implementation of HTM that is as easy as possible to paste into a playground like that one.
Yes, the problem is likely that there are too many distal segments / synapses. There is no trimming going on as new patterns are learned.
There is a way to counter this, but it is deep inside the Nupic code. What you would do is adjust logic within the TM so that anytime new distal synapses are created, the same number of synapses are randomly deleted from the global pool of distal connections (above some maximum amount of synapses).
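A sketch of that bookkeeping on a toy flat pool (the real Cells4 data structures are more involved; all names here are made up for illustration, not the Nupic API):

```python
import random

class DistalPool:
    """Toy global pool of distal synapses with a hard budget.

    Mimics the proposed fix: once the pool exceeds `max_synapses`,
    every batch of newly created synapses is paid for by randomly
    deleting the same number of existing ones, so the pool size
    (and thus per-step cost) stays bounded.
    """

    def __init__(self, max_synapses, rng=None):
        self.max_synapses = max_synapses
        self.synapses = []  # flat list standing in for all segments' synapses
        self.rng = rng or random.Random()

    def add_synapses(self, new_synapses):
        if len(self.synapses) + len(new_synapses) > self.max_synapses:
            # Randomly delete as many existing synapses as we are adding.
            n_delete = min(len(new_synapses), len(self.synapses))
            for _ in range(n_delete):
                victim = self.rng.randrange(len(self.synapses))
                self.synapses[victim] = self.synapses[-1]  # swap-and-pop
                self.synapses.pop()
        self.synapses.extend(new_synapses)

pool = DistalPool(max_synapses=100, rng=random.Random(1))
for batch in range(50):
    pool.add_synapses([(batch, i) for i in range(10)])
print(len(pool.synapses))  # → 100: never grows past the budget
```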
This sounds easy, but once you go looking in Cells4.cpp for how to actually do it, you see that it will take some serious testing to ensure you don’t break functionality.
Just for clarification on “TM…and the Segments & Cells4”: by TM, do you mean Temporal Memory (using Connections), or the older Temporal Pooler (TP), which uses Cells4, Segments, etc.? If it’s in TM, we’ll definitely look into it!
The code that needs changing is not in the TM (and we are not talking about the TP or “backtracking TM” at all here). The code that needs changing is in Cells4.cpp.
We think the right place to make this change is within Cells4.cpp::adaptSegment, which is called by the TM to add more synapses. New functionality would include finding out how many new synapses are being created with the segment update and randomly deleting the same number of synapses across the entire pool of segments. The challenge here is finding out how segments and synapses are stored in Cells4 data structures and updating those data structures appropriately. Unfortunately there is no Cells4 manual, and all of the original authors of this code are gone.
That’s why I’ve asked about TM (Connections) vs. TP (Cells4, Segments).
Cells4 is considered obsoleted by TM and unused (or has something changed?).
So your answer is mixing things up: Temporal Memory (TM) relies on Connections as its backend, and Connections defines typedef UInt32 Segment; it does not call any code from Cells4.
So the question is, can you replicate this if TemporalMemory.hpp is used, instead?
I’ll make this into a performance test (we run some performance regression checks as part of the unit tests; just assert on the time taken for 1000 iterations).
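A minimal shape for such a regression check might look like this (the `tm` / `next_sdr` names in the comment are placeholders, not a real API):

```python
import time

def time_iterations(step, n_iters=1000):
    """Time `n_iters` calls of `step` and return seconds elapsed.

    In the real regression test `step` would be one TM compute call;
    here it is any zero-argument callable.
    """
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    return time.perf_counter() - start

# In a unit test one would assert an upper bound, e.g.:
#   assert time_iterations(lambda: tm.compute(next_sdr(), learn=True)) < 10.0
# (tm and next_sdr are placeholders for the real objects.)
elapsed = time_iterations(lambda: sum(range(100)))
print(elapsed > 0.0)  # True
```

Hard-coded time bounds make such tests machine-dependent, so in practice the threshold has to be generous or relative to a baseline run.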
To summarize the problem:
- TM (this also happened in TP, as observed before) takes increasingly (linearly) longer to execute with noisy input.
- Does this happen only with noisy/random inputs? I.e., if I change only 5% of the input vector, is it still an issue? (That is the more realistic scenario.)
- Does it stop (reach a plateau) at some point? (My experiments with TP behaved that way; TP plateaued at about 33 Hz.)
- Memory allocation is likely the low-level culprit.
- And most importantly: is this a problem at all, or natural behavior? Combining my questions about 100%-random vs. 5%-random inputs and about the plateau, my conclusion is that the memory (TM) is actually learning and storing the new patterns, so it reaches its stable runtime behavior once the TM’s capacity is full (storing a new pattern then means forgetting some other pattern, or a generalization), and from then on the execution time (per 1000 iterations) is constant.
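The capacity argument can be illustrated with a deliberately simplified fixed-size pattern store (an analogy only, not the TM algorithm):

```python
from collections import OrderedDict

class FixedCapacityMemory:
    """Toy illustration of the plateau argument.

    Once the store holds `capacity` patterns, learning a new one evicts
    the oldest, so the data structure (and hence the per-step cost)
    stops growing -- the analogue of the TM reaching full capacity.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.patterns = OrderedDict()

    def learn(self, pattern):
        key = tuple(pattern)
        self.patterns[key] = True
        self.patterns.move_to_end(key)       # mark as most recent
        if len(self.patterns) > self.capacity:
            self.patterns.popitem(last=False)  # forget the oldest pattern

mem = FixedCapacityMemory(capacity=1000)
for i in range(5000):  # stream of 5000 novel patterns, capacity 1000
    mem.learn([i, i + 1, i + 2])
print(len(mem.patterns))  # → 1000: size, and thus runtime, has plateaued
```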