TemporalMemory running very slow after long training

sequence-memory

#1

I found that TemporalMemory runs slower and slower while training, eventually becoming so slow that it is impractical to use. For example, after extensive training it takes 10 seconds to run inference 1000 times, at least 100 times slower than at the start.

Is there a solution?


#2

This is something you’re going to need to experiment with, but you could trim segments occasionally. Just randomly remove some percentage of segments and see how it affects prediction performance vs. CPU performance.
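As a rough illustration of the idea, here is a minimal Python sketch of trimming a random percentage of segments. It is a toy model, not the NuPIC API: the segments are assumed to live in a plain list, and `trim_segments` and its `fraction` parameter are hypothetical names made up for this example.

```python
import random


def trim_segments(segments, fraction, rng):
    """Return a copy of `segments` with about `fraction` of them removed at random.

    `segments` is a plain list standing in for the TM's distal segments
    (an assumption for this sketch, not how NuPIC stores them).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    drop_count = int(len(segments) * fraction)
    # Pick the indices to drop, then keep the survivors in their original order.
    drop = set(rng.sample(range(len(segments)), drop_count))
    return [seg for i, seg in enumerate(segments) if i not in drop]


segments = list(range(100))
trimmed = trim_segments(segments, 0.2, random.Random(0))
```

You would then compare prediction quality and time per step before and after trimming, and adjust `fraction` from there.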


#3

@rhyolight I have gone through the code of both NuPIC and NuPIC.core. Trimming seems to be available only for SpatialPooler, not for TemporalMemory. Have I overlooked anything?


#4

In my experiments I don’t have any problem with runtime, even over very long learning times, but performance does drop if the maximum number of synapses per segment is too big. I observed this problem in my grid cell modules when increasing the number of active synapses while learning. The frame rate decreased from 1000 Hz to 2 Hz for 6 GC modules during thing classification.


#5

have gone through both NuPIC and NuPIC.core's code. Trimming seems to be only available for SpatialPooler but not for TemporalMemory. Have I overlooked anything?

Check out the community-developed nupic.cpp (which is nupic.core under active development, with some improvements).

Coincidentally, we’re just now working on a PR that makes the spatial pooler use Connections for speed, and part of it is moving trim to Connections.


#6

I’d be interested in hearing what the bottleneck was. Perhaps, at the code level, there’ll be a target for reasonable parallelisation if it makes sense. I’m very slowly moving towards a Pony implementation. If anyone wants to join, you’re welcome.


#7

I found that I can replicate the problem faster by sending random SDRs into a TM, since sending random SDRs causes a lot of boosting. My guess is that there are too many useless distal connections in the TM (weight around 0, etc.).
Segment trimming seems to be a possible solution. And parallelizing the computation will definitely help!
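To make the "weight around 0" idea concrete, here is a small Python sketch that drops synapses whose permanence is close to zero and removes segments that end up empty. The data layout (a dict of segment id to a dict of presynaptic cell to permanence), the function name `prune_weak_synapses`, and the `min_permanence` threshold are all assumptions for illustration, not the real TemporalMemory or Connections API.

```python
def prune_weak_synapses(segments, min_permanence=0.05):
    """Drop synapses with permanence below `min_permanence`.

    `segments` maps segment id -> {presynaptic cell: permanence}.
    Segments left with no synapses are removed entirely.
    """
    pruned = {}
    for seg_id, synapses in segments.items():
        strong = {cell: perm for cell, perm in synapses.items()
                  if perm >= min_permanence}
        if strong:  # keep only segments that still have synapses
            pruned[seg_id] = strong
    return pruned


segments = {
    0: {10: 0.00, 11: 0.30},  # one dead synapse, one useful one
    1: {12: 0.01},            # only a dead synapse: segment goes away
}
cleaned = prune_weak_synapses(segments)
```

Whether such pruning actually restores speed without hurting predictions would still need to be measured.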


#8

Behavior based asynchronous actor-model language?

https://tutorial.ponylang.io/types/actors.html

Dmitry, I’m in! The language sounds as good or better than Visual Basic, which I still program in due to its fast asynchronous object based structure.

Can you recommend an editor (and whatever else is required) and the easiest way to get your GitHub code running in Pony from there?


#9

@Gary_Gaulin very glad to hear! Visual Studio Code has very good syntax support. The resulting binaries are C++ compatible. On Windows, the Visual C++ SDK must be installed, and one can simply debug Pony binaries with Visual Studio. On other platforms it’s even easier.


#10

I have been studying. With this link, everyone is ready to start coding right away!

https://playground.ponylang.io/

For an experiment we now only need easy sample code like your encoder example, to copy from a reply you post in this forum into the Pony Playground.

One of the biggest problems in open source communities like this one is the difficulty of getting started, caused by requiring multiple downloads of programming-related packages that, after enough updates, often no longer work together properly. What is needed is an HTM implementation that is as easy as possible to paste into a playground like that one.


#11

Let’s continue the Pony discussion in a separate thread: Concurrent HTM experiment in Pony: htm.pony


#12

Yes, the problem is likely that there are too many distal segments / synapses. There is no trimming going on as new patterns are learned.

There is a way to counter this, but it is deep inside the NuPIC code. What you would do is adjust the logic within the TM so that, once the global pool of distal connections exceeds some maximum number of synapses, any time new distal synapses are created the same number of synapses are randomly deleted from that pool.

This sounds easy, but once you go looking in Cells4.cpp for how to actually do it, you see that it will take some serious testing to ensure you don’t break functionality.
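The bookkeeping itself can be sketched in a few lines of Python, independent of Cells4. This is only a toy model under stated assumptions: `CappedSynapsePool`, `max_synapses`, and the flat list of `(segment_id, presynaptic_cell, permanence)` tuples are hypothetical and do not reflect how NuPIC stores synapses.

```python
import random


class CappedSynapsePool:
    """Toy global pool of distal synapses with random eviction above a cap."""

    def __init__(self, max_synapses, seed=None):
        self.max_synapses = max_synapses
        self.synapses = []  # (segment_id, presynaptic_cell, permanence)
        self._rng = random.Random(seed)

    def add(self, new_synapses):
        new_synapses = list(new_synapses)
        # How far the pool would exceed the cap after this update.
        overflow = len(self.synapses) + len(new_synapses) - self.max_synapses
        # Evict only pre-existing synapses, never the ones being added.
        for _ in range(min(max(overflow, 0), len(self.synapses))):
            idx = self._rng.randrange(len(self.synapses))
            # Swap-remove: O(1) deletion, order of the pool is not preserved.
            self.synapses[idx] = self.synapses[-1]
            self.synapses.pop()
        self.synapses.extend(new_synapses)


pool = CappedSynapsePool(max_synapses=10, seed=1)
for step in range(50):
    pool.add([(step, cell, 0.21) for cell in range(3)])
```

Once the pool is full, each update evicts exactly as many old synapses as it adds, so the pool size (and, presumably, the work per step) stays bounded. The hard part in the real code is keeping per-segment and per-cell indexes consistent after each deletion.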


#13

@marty1885 Would you like to give us a PR with a reproducible test case? Ideally in the form of a unit test, submitted to https://github.com/htm-community/nupic.cpp/pulls
We’d like to reproduce the problem.

Just for clarification on “TM…and the Segments & Cells4”: by TM, do you mean Temporal Memory (using Connections), or the older Temporal Pooler (TP), which uses Cells4, Segments, etc.? If it’s in TM, we’ll definitely look into it!


#14

The code that needs changing is not in the TM (and we are not talking about the TP or “backtracking TM” at all here). The code that needs changing is in Cells4.cpp.

We think the right place to make this change is within Cells4.cpp::adaptSegment, which is called by the TM to add more synapses. New functionality would include finding out how many new synapses are being created with the segment update and randomly deleting the same number of synapses across the entire pool of segments. The challenge here is finding out how segments and synapses are stored in Cells4 data structures and updating those data structures appropriately. Unfortunately there is no Cells4 manual, and all of the original authors of this code are gone.


#15

That’s why I’ve asked about TM (Connections) vs. TP (Cells4, Segments).

Cells4 is considered obsoleted by TM and unused (or has something changed?).
So your answer is mixing things up: Temporal Memory (TM) relies on Connections for its backend, and Connections defines typedef UInt32 Segment; it does not call any code from Cells4.

So the question is, can you replicate this if TemporalMemory.hpp is used, instead?


#16

Cells4 may only be used by the Backtracking TM. I might be wrong about that. Sorry!


#17

Yes, the problem is reproducible on TemporalMemory. I’ll post some code later.


#18

This should reproduce the problem. But I don’t think I can make it a unit test, as this is a performance problem, not a correctness one.


#19

Thanks a bunch!

I’ll make this into a performance test (we run some performance regression checks as part of the unit tests; they just assert on the time taken for 1000 iterations).
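For readers unfamiliar with that kind of check, a minimal Python sketch could look like the following. The helper names `time_iterations` and `slowdown_ratio` are hypothetical, and `step` stands in for whatever calls the TM once; nothing here is taken from the nupic.cpp test suite.

```python
import time


def time_iterations(step, iterations=1000):
    """Return the wall-clock seconds needed to call `step(i)` `iterations` times."""
    start = time.perf_counter()
    for i in range(iterations):
        step(i)
    return time.perf_counter() - start


def slowdown_ratio(batch_times):
    """Ratio of the last batch's time to the first batch's time."""
    return batch_times[-1] / batch_times[0]


# Stand-in workload; a real test would feed one input to the TM per call.
calls = []
elapsed = time_iterations(calls.append, iterations=1000)
```

A regression test would time several consecutive batches of 1000 iterations and assert that `slowdown_ratio` stays below some tolerance. The tolerance is a judgment call, since wall-clock timings vary between machines.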

To summarize the problem:

  • TM (this also happened in TP, as observed before) takes increasingly (linearly) longer to execute (with noisy input)
  • does this happen only with noisy/random inputs? I.e., if I change only 5% of the input vector, is it still an issue? (that is a more realistic scenario)
  • does it stop (reach a plateau) at some point? (my experiments with TP behaved that way; TP plateaued at about 33 Hz)
  • memory allocation is likely the low-level culprit
  • and most importantly: is this a problem at all, or natural behavior? Combining my questions about 100% random vs. 5% random inputs and about whether it plateaus, my conclusion is that the TM is actually learning and storing the new patterns, so it reaches its stable runtime behavior when the TM’s capacity is full (storing a new pattern then means forgetting some other, or a generalization), and from then on the execution time (per 1000 iterations) is constant
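To compare the fully random case with the 5% case, the inputs could be generated along these lines. This is a sketch under assumptions: SDRs are modeled as Python sets of active bit indices, "5% of the input vector" is interpreted as moving 5% of the active bits, and `random_sdr` and `perturb_sdr` are hypothetical helper names.

```python
import random


def random_sdr(size, active_bits, rng):
    """A fully random SDR: `active_bits` distinct indices out of `size`."""
    return set(rng.sample(range(size), active_bits))


def perturb_sdr(sdr, size, noise, rng):
    """Move a fraction `noise` of the active bits to inactive positions."""
    n_flip = int(round(len(sdr) * noise))
    removed = set(rng.sample(sorted(sdr), n_flip))
    inactive = [i for i in range(size) if i not in sdr]
    added = set(rng.sample(inactive, n_flip))
    # Sparsity is preserved: as many bits are added as removed.
    return (sdr - removed) | added


rng = random.Random(42)
base = random_sdr(2048, 40, rng)
noisy = perturb_sdr(base, 2048, 0.05, rng)
```

Feeding a repeating sequence of `perturb_sdr` outputs, versus a fresh `random_sdr` on every step, and timing each batch of 1000 iterations would show whether the slowdown and the plateau depend on how random the input is.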

#20

Performance tests are usually part of nightly test suites, or at least integration suites. I would not put them in with unit tests. Just some advice.