TemporalMemory running very slow after long training



True, it’s a bit of an abuse :slight_smile: We try to keep them quick, but we might eventually separate the test-cases as you had in numenta, and run the performance ones nightly.


Thanks a lot!


5% of noisy bits is enough to trigger this behavior. Although I find that long sequences of inputs that the TM has trouble learning do the same thing (e.g. text, audio).

Seems to. The TM takes at most 9 ms per execution on my system in my test.

Yes and no. The connections are there to learn the pattern, but trimming out the connections made by noisy inputs should make the algorithm more robust, and we need some method to forget anyway. I posted this to the forums because when I did NLP with HTM, HTM couldn’t process an entire article. The first few paragraphs are fast… but the rest are slow as a paramecium.
From a practical POV, I’d like to see the capability to trim connections on the fly to reduce training/inference time.


The TP had a synapseDecay option that would eventually destroy a synapse if it went unused for a long period of time; that is actually biologically correct behavior.
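As a rough sketch of what a synapseDecay-style pass could look like (names, data layout, and the decay constant here are illustrative assumptions, not the actual TP parameters):

```python
# Hypothetical sketch of a global synapse-decay pass: every permanence
# is lowered by a small constant each call, and synapses whose permanence
# would reach zero are destroyed. Not NuPIC's actual API.
def apply_global_decay(synapses, decay=0.001):
    """synapses: dict mapping synapse id -> permanence in [0, 1]."""
    dead = [sid for sid, perm in synapses.items() if perm - decay <= 0.0]
    for sid in dead:
        del synapses[sid]           # unused synapse is destroyed, freeing memory
    for sid in synapses:
        synapses[sid] -= decay      # survivors are weakened slightly
    return len(dead)                # how many synapses were pruned this pass

synapses = {0: 0.30, 1: 0.0005, 2: 0.21}
removed = apply_global_decay(synapses)
```

In a real implementation the decrement would be skipped for synapses that were active this step, so only genuinely unused connections fade away.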

Does TM have it? If not, we can add that and see how it helps.

I’d like to see the capability to trim connections on the fly to reduce training/inference time.

  • Definitely can add a manual trim() method.
  • The SP could have a param to trim every N runs.
  • The SP (probably the SP region, rather) could have an option to trim if EXECUTION_TIME_PER_BATCH (=1000) exceeds M.
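A manual trim() along those lines might look like this, as a sketch only: the threshold, the flat list-of-lists layout, and the function name are assumptions, not the Connections API.

```python
def trim(segments, min_permanence=0.05):
    """Remove synapses below a permanence threshold; drop segments left empty.

    segments: list of segments, each a list of synapse permanences
    (a hypothetical flat layout for illustration).
    """
    for seg in segments:
        # Keep only synapses strong enough to matter for prediction.
        seg[:] = [p for p in seg if p >= min_permanence]
    # A segment with no synapses left can never activate; reclaim it too.
    segments[:] = [seg for seg in segments if seg]
    return segments
```

Calling this every N runs (or when batch execution time exceeds M) would implement the options listed above.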


I’m not aware that one exists in NuPIC.core’s TemporalMemory.

That sounds like a good idea. Thanks!


I know it existed in the numenta/nupic (py) repository in the old temporal pooler (the TP class). TP is not there anymore; was it renamed, or completely removed? Its C++ counterpart Cells4 (it was the same) still exists and does have applyGlobalDecay() (but I’m not sure if it would work, or ever did).


I have not run your example yet, but from the description of the problem it sounds like your TM is going to have 100% anomaly, between the random input data and the fact that there isn’t enough activity to trigger a dendritic prediction.

(64 inputs X 10% sparsity = 6 active mini-columns)

Bursting mini-columns is not the fastest code path. I read through the source code for bursting and it notes several known performance issues, and I’m sure that with a profiler I could find a few more. For example, I see several calls to vector.erase(), which are slow and can often be avoided.

  • Method destroyMinPermanenceSynapses has a note saying:
  // Find cells one at a time. This is slow, but this code rarely runs, and it
  // needs to work around floating point differences between environments.
  • Method growSynapses has a note saying:
  // It's possible to optimize this, swapping candidates to the end as
  // they're used. But this is awkward to mimic in other
  // implementations, especially because it requires iterating over
  // the existing synapses in a particular order.


If I remember correctly, Cells4 is the non-biologically-plausible TemporalPooler, while TemporalMemory is the biologically plausible one.

[marty@zack algorithms]$ ls TemporalMemory.hpp 
[marty@zack algorithms]$ cat TemporalMemory.hpp | grep decay #nothing
[marty@zack algorithms]$ cat TemporalMemory.hpp | grep Decay #nothing


:neutral_face: Sorry about the misdirection about Cells4.cpp, that was advice I gave someone who was trying to do the same thing with the old “Backtracking TM”. For the new TM you guys are on the right track.


No prob.
Thinking about it, synapse/segment decay would be nice to have in the TM.
Although I was a bit unclear too: decay unlearns a synapse when it is not used, in addition to the current unlearning when it is mismatched.

The new thing would be the pruning/reusing of old synapses and segments.


The way I have implemented this is by tagging each synapse with last used timestep, and then I only need to apply the decay when a synapse is next used (and can remove it at that time if it reaches zero permanence as well). Saves you some processing steps compared to iterating over every synapse to apply the decay.
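That lazy scheme could be sketched like this (class and parameter names are illustrative, not from any particular implementation): the decay is deferred until the synapse is next touched, and the caller removes it if it has faded to zero.

```python
class LazyDecaySynapse:
    """Sketch of timestamp-based lazy decay: the permanence loss for the
    idle period is settled only when the synapse is used again."""

    def __init__(self, permanence, timestep, decay_per_step=0.001):
        self.permanence = permanence
        self.last_used = timestep
        self.decay_per_step = decay_per_step

    def use(self, now):
        """Apply the deferred decay; return False if the synapse has died
        (the caller should then remove it from its segment)."""
        elapsed = now - self.last_used
        self.permanence -= elapsed * self.decay_per_step
        self.last_used = now
        return self.permanence > 0.0
```

The advantage is exactly the one described above: no pass over every synapse is ever needed; cost is paid only on the synapses that actually fire.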


I only need to apply the decay when a synapse is next used (and can remove it at that time

Yes; on the other hand, you keep synapses that were created once and never used again, which are likely the culprit observed here.

I’m wondering whether decay could be a separate thread that visits synapses and lowers them, going by the timestamp as you say.


I’m not sure about NuPIC specifically, but my initial interpretation of extraneous synapses causing TM to run slower over time (versus just consuming more memory) is that the algorithm is iterating over all synapses somewhere.

HTM can be implemented without doing this. Instead of sampling from the receiving segments, you can update via pointers on the transmitting axons. Due to sparsity, this is a significant optimization. This strategy can be applied to both SP and TM.
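A minimal illustration of that push-based idea (a sketch, not NuPIC’s actual data structures): each active cell pushes activity along its outgoing connections, so the work per step scales with the sparse activity rather than with the total synapse count.

```python
def compute_segment_overlaps(active_cells, outgoing):
    """Push-based overlap computation.

    active_cells: set of cells that fired this step.
    outgoing: dict mapping a presynaptic cell -> list of segment ids it
              synapses onto (the "pointers on the transmitting axon").

    Only active cells are visited, so total work is proportional to the
    sparse activity, not to the full synapse population.
    """
    overlaps = {}
    for cell in active_cells:
        for segment in outgoing.get(cell, ()):
            overlaps[segment] = overlaps.get(segment, 0) + 1
    return overlaps
```

With 2% sparsity, this visits roughly 1/50th of the cells a pull-based scan over all segments would touch.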


… Then you can do LTD as biology does: if a cell fires but a connected postsynaptic cell doesn’t, you apply a tiny decrement to the permanence of the synapse. That way some unused synapses will go away progressively. LTD, at least from a biological perspective (i.e. activation of calcium-dependent phosphatases), is not, I think, being considered in NuPIC.
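A sketch of that LTD decrement (names and the record layout are hypothetical): every synapse whose presynaptic cell fired while its postsynaptic cell stayed silent loses a little permanence.

```python
def apply_ltd(synapses, active_pre, active_post, decrement=0.002):
    """Weaken synapses whose source fired but whose target did not.

    synapses: list of dicts with 'pre', 'post', 'perm' keys (a sketch
    layout, not NuPIC's Connections structure).
    """
    for syn in synapses:
        if syn["pre"] in active_pre and syn["post"] not in active_post:
            # Pre fired, post stayed silent: long-term depression.
            syn["perm"] = max(0.0, syn["perm"] - decrement)
    return synapses
```

Over many steps, synapses that never contribute to a postsynaptic activation drift toward zero and can then be destroyed.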

The “time-stamp” based approach can be similar to biological synaptic pruning (which is a different thing: it involves microglia and is tied to “early” stages of life). Pruning is a “key” process across the whole nervous system (including central). Tagging every synapse for pruning seems like a memory “hog”; if you instead tag and prune unused distal segments (i.e. those that haven’t produced a valid prediction in a very long time), I think it could be easier.
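Segment-level tagging could look like this (a sketch; names and the idle threshold are assumptions): one timestamp per segment instead of one per synapse, which is far less bookkeeping.

```python
def prune_stale_segments(segments, now, max_idle=10000):
    """Drop whole distal segments that have been idle too long.

    segments: dict mapping segment id -> timestep of that segment's last
    correct prediction (hypothetical layout). Storing one timestamp per
    segment is much cheaper than tagging every synapse individually.
    """
    stale = [sid for sid, last in segments.items() if now - last > max_idle]
    for sid in stale:
        del segments[sid]   # all of the segment's synapses go with it
    return stale
```

Removing a stale segment reclaims every synapse on it at once, which is the “easier” granularity suggested above.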


Or you can take this a step further and have a nightly maintenance phase where you shrink all synapses!


So what does it mean to reduce all memories at a constant rate?
If memories fall away at a constant rate, you have a sliding window that essentially goes to zero after some passage of time. There is a notable exception in your teen years, where you imprint on your culture; the last of the plastic learning phases.

This concept is explored here:

Understanding the reminiscence bump: A systematic review


And a “fluffier” exploration of the same concept:


The problem is that at “boot” time, the input flow modifies how the temporal memory perceives the input data (via changes in proximal synapses). Therefore, you will never again see many “early” learned sequences; they are there using resources for nothing. I guess biology faces the same problem and solves it via synaptic pruning after the “initial” learning is done.

IMHO, all those psychophysical papers seem a bit “dangerous”. They sit above the “consciousness” level in the hierarchy (and many unknown mechanisms can be involved). At the lowest levels of the hierarchy, the core algorithm should be the same.


Agreed. Measuring the “total” system performance does not tell you what is happening at the lower levels. It does provide a “black box” that gives you limits that must be met overall.

For example: timing tests of performance and response, from the presentation of a cue to the taking of some action, give a hard limit on how fast processing must be happening. Knowing neuron firing rates (these have been measured many times) leads to the “100 step” rule: whatever the brain is doing must take fewer than 100 steps from input to output. We don’t know what those steps might be, but we know that it can’t be some long iterative process. This rules out whole classes of possible processing algorithms.
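The arithmetic behind the “100 step” argument is simple. The figures below are the commonly cited order-of-magnitude values, not measurements from this thread:

```python
# Order-of-magnitude sketch of the "100 step" rule:
# a reaction takes ~0.5 s, and one neural processing step
# (spike, propagate, integrate) takes on the order of 5 ms.
reaction_time_s = 0.5    # cue presentation to motor action, roughly
neuron_step_s = 0.005    # ~5 ms per serial neural processing step

# Upper bound on serial steps available between input and output.
max_serial_steps = reaction_time_s / neuron_step_s
```

Any algorithm requiring thousands of strictly serial iterations per recognition is therefore ruled out, whatever the steps turn out to be.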

The same thing applies to these memory tests: we know that whatever mechanism the brain uses for memory, there are some definite memory time frames and forgetting rates. Theories that do not match up with this will have to explain why they are different.


This reminded me of a talk. It is not very thorough, but I believe it is related, and it shows that perhaps this is not exactly a problem.
What is a Thought? How the Brain Creates New Ideas | Henning Beck | TEDxHHL


Ok, a good example (for my reasoning :grinning:). The assumption that “processing” time is an equivalent metric of computational power might be misleading. What if the brain is using forward prediction extensively to do that? You can cross many layers of the hierarchy really fast, but then you still don’t understand how the cortex is able to perform “accurate” forward predictions.

In my opinion, the key is in the learning process (i.e. how it is done). That learning will progressively build the layout for your “total” system performance, in time (across lifespan experience) and space (across the hierarchy). Black-box observations tell you little about how the details are done. I think it is much better to start from below.


I can’t argue that working from “the bottom up” is a bad approach; my own work starts with the biology and builds from that point. This is one of the reasons I am invested in the basic HTM model: much of what it does is in close alignment with how I think the biology works.

I will offer that blind adherence to a strict top-down or bottom-up stance limits your navigation of the problem search-space. As you move from the known to the unknown, each step adds a degree of uncertainty. At some point, the uncertainty builds up to where you really don’t know anything. Having some “goal” helps constrain the search space for faster convergence on a solution.

Each method should inform the other to aid in faster understanding.

BTW: the “Deep Predictive Learning: A Comprehensive Model of Three Visual Streams” paper postulates that this is exactly what is going on in the cortex/thalamus streams. The “forward” stream from the senses going up the hierarchy interacts with the guidance from the “reverse” pathways, including the hypothalamus/thalamus/forebrain/cortex, as a feedback or training signal. This is NOT classic ANN back-propagation but is instead a more plausible local error processing. I highly recommend this paper to anyone thinking about system-level on-line learning. Not an easy read, but well worth the effort.