Time Complexity of SP and TP

Why is the time complexity of these algorithms so non-deterministic? I found that the runtime can increase by as much as 2x just by changing the values in the input data.

To be more precise, I ran a loop over 5000 CSV lines with a simple scalar encoder (a rough sketch of the loop follows the timings below):

  1. Input is a sine-wave
    a) with SP.compute() only in the loop it takes ~ 2.7 seconds
    b) with SP.compute() and TP.compute() it takes ~ 10.7 seconds (so TP alone must be taking around 8s)
  2. Input is a constant = 0
    a) with SP.compute() only in the loop it takes ~ 7.6 seconds
    b) with SP.compute() and TP.compute() it takes ~ 22 seconds (so TP alone must be taking around 14.4s)
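
Roughly, the loop looks like this (simplified sketch; the encoder below is just a stand-in for my scalar encoder, and the compute() signatures are the ones in my nupic.core checkout, so they may differ slightly in other versions):

// Rough shape of the benchmark loop; timings are measured around the compute() calls.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
#include <nupic/algorithms/SpatialPooler.hpp>
#include <nupic/algorithms/Cells4.hpp>

using nupic::UInt;
using nupic::Real;

void runBenchmark(nupic::algorithms::spatial_pooler::SpatialPooler& SP,
                  nupic::algorithms::Cells4::Cells4& TP,
                  const std::vector<double>& samples)   // the 5000 values from the CSV
{
    const UInt nBits = 100;                     // must match the SP input dimensions
    const UInt nCols = SP.getNumColumns();
    const UInt w     = 21;                      // active bits per encoding

    std::vector<UInt> input(nBits, 0);
    std::vector<UInt> activeColumns(nCols, 0);
    std::vector<Real> tpInput(nCols, 0);
    std::vector<Real> tpOutput(nCols * 32, 0);  // one value per cell (columns x cellsPerColumn)

    auto t0 = std::chrono::steady_clock::now();
    for (double v : samples) {                  // v assumed normalized to [0, 1]
        // stand-in scalar encoder: a block of w active bits whose position tracks v
        std::fill(input.begin(), input.end(), 0);
        UInt start = static_cast<UInt>(v * (nBits - w));
        for (UInt i = start; i < start + w; i++) input[i] = 1;

        SP.compute(input.data(), /*learn=*/true, activeColumns.data());

        // Cells4 takes a dense Real array of column activations
        for (UInt c = 0; c < nCols; c++) tpInput[c] = static_cast<Real>(activeColumns[c]);
        TP.compute(tpInput.data(), tpOutput.data(), /*doInference=*/true, /*doLearning=*/true);
    }
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "SP+TP: " << std::chrono::duration<double>(t1 - t0).count() << " s\n";
}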

There is no swarming involved, so the SP and TP parameters are the same in all cases. I could understand a small variation for different parameters, or even for different data, but not such a huge difference in both the SP and the TP. The outcome seems replicable for any constant (I tried 0.5 and it’s roughly the same).
Can anyone provide at least an intuitive explanation?

Hi @dorinclisu, and Welcome!

I find this to be a very, very interesting line of inquiry, and I’m interested to see what answers you get as well. This is in no way meant to be a “canonical” answer to your question, but consider that the algorithms’ purpose is to form a sensorimotor model of the world (though the sensorimotor part is still being worked on). This means that the model is being formed from a stream of changing data. I’ve even heard it said that these algorithms sometimes perform better if you throw a little noise at them, since they’re meant to emulate biological processes!

That’s just a 10,000 ft. observational point of view to mull over while you wait for an answer from someone more knowledgeable than me.

For clarity’s sake, can I ask how you ran the code? Did you use the OPF or the NetworkAPI? Did you do two separate runs, with no serialization in between to store and reuse the models?

I’m just asking all of this to help the next person that comes along to better answer your questions.

Nice line of inquiry though!

cheers,
David

Interesting idea to consider noise, because a constant is exactly the opposite of that. It’s a boring input and we know that time passes slowly when we’re bored :smiley:

Now, I just tried completely random input data (bounded within the encoder limits, of course) and got ~ 23 seconds (SP+TP), so it still doesn’t make much sense. In fact, I can’t seem to find any other dataset that runs faster than the sine wave.

I used plain C++ (no serialization, so the runs are indeed independent) with no other bells and whistles. The two compute() routines take 99% of the time, which is why I believe this is tied to the algorithms themselves and not to the implementation.

Are you using local or global inhibition on your SP? Can you provide the configuration parameters you’re using for the SP and TM? Which TM algorithm are you using? (These are just questions your eventual “expert” may ask, so I’m asking them up front.) :stuck_out_tongue_winking_eye:

SpatialPooler SP {
    /*vector<UInt> inputDimensions*/          std::vector<UInt> {100},
    /*vector<UInt> columnDimensions*/         std::vector<UInt> {2048},
    /*UInt potentialRadius = 16*/             16,
    /*Real potentialPct = 0.5*/               0.8,
    /*bool globalInhibition = true*/          true,
    /*Real localAreaDensity = -1.0*/          -1.0,
    /*UInt numActiveColumnsPerInhArea = 10*/  40,
    /*UInt stimulusThreshold = 0*/            0,
    /*Real synPermInactiveDec = 0.008*/       0.1,
    /*Real synPermActiveInc = 0.05*/          0.05,
    /*Real synPermConnected = 0.1*/           0.1,
    /*Real minPctOverlapDutyCycles = 0.001*/  0.001,
    /*Real minPctActiveDutyCycles = 0.001*/   0.001,
    /*UInt dutyCyclePeriod = 1000*/           1000,
    /*Real maxBoost = 10.0*/                  1.0,
    /*Int seed = 1*/                          7777777,
    /*UInt spVerbosity = 0*/                  0,
    /*bool wrapAround = true*/                true
};

Cells4 TP {
    /*UInt nColumns = 0*/                     SP.getNumColumns(),
    /*UInt nCellsPerCol = 0*/                 32,
    /*UInt activationThreshold = 1*/          13,
    /*UInt minThreshold = 1*/                 9,
    /*UInt newSynapseCount = 1*/              20,
    /*UInt segUpdateValidDuration = 1*/       5,
    /*Real permInitial = 0.5*/                0.21,
    /*Real permConnected = 0.8*/              0.8,
    /*Real permMax = 1.0*/                    1.0,
    /*Real permDec = 0.1*/                    0.1,
    /*Real permInc = 0.1*/                    0.1,
    /*Real globalDecay = 0*/                  0.0,
    /*bool doPooling = false*/                false,
    /*int seed = -1*/                         7777777,
    /*bool initFromCpp = false*/              true,
    /*bool checkSynapseConsistency = false*/  false
};

@dorinclisu very good question.

Here’s my short explanation.

HTM models the temporal and spatial relations inside the input. This means it has to scale itself (forming thousands of synapses) so that it can capture meaningful relations while eliminating noise. This process naturally depends on the actual combinations present in the input field, the level of correlation between the input bits, how much semantic similarity the encoding captures, sparsity, etc. Sometimes, because of the inputs (in your case random inputs), the system runs very slowly, constantly trying to reorganize itself without converging on a stable set of representations. Every new entry causes changes to the structure. Think of it as a person who is perpetually confused. One of the main strengths of HTM is being able to scale itself according to the attributes and combinations of the input. The downside is exactly what you are asking about.

Another outcome of this:

The performance of my implementation also varies by an order of magnitude within a short 5-minute lifetime in real time. At the start, the organization is random, and any new input causes a lot of changes to the synapse connections inside the spatial pooler. As time goes on, inputs get classified into buckets and you tend to see familiar activations. This lets the spatial pooler do less work because there is less to correct. Same with the temporal memory: the better it models, the fewer changes it needs. Also, a change in the spatial pooler usually necessitates a lot of change in the temporal memory as well.

In short, the better it models, the faster it runs.
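
If you want to watch this happening, it helps to time each compute() call separately rather than the whole run; a rough sketch using plain std::chrono, assuming the same loop variables as in the earlier sketch:

#include <chrono>
#include <iostream>

// Times a single call so the per-iteration cost can be logged and compared
// against how well the model has converged at that point in its lifetime.
template <typename F>
double millisecondsFor(F&& f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Inside the main loop (SP/TP calls as in the earlier sketch):
//   double spMs = millisecondsFor([&] { SP.compute(input.data(), true, activeColumns.data()); });
//   double tpMs = millisecondsFor([&] { TP.compute(tpInput.data(), tpOutput.data(), true, true); });
//   if (iteration % 100 == 0)
//       std::cout << iteration << "\t" << spMs << "\t" << tpMs << "\n";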

Well, this is what would intuitively make sense, but it doesn’t explain why a constant input also runs slowly.
An input that doesn’t change should be the easiest to predict with minimal effort, yet something about it trips up the HTM.

Oh, I think that may be because of the global inhibition, synapse bumping, and boosting mechanisms. Every column would be trying to learn the same thing, which in turn leads to a lot of pre-active columns before the inhibition phase. When the input is constant, all the columns compete to become active in every single iteration, because that is how HTM is designed.
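
One quick way to test this is to count how many active columns change from one step to the next; if the active set keeps churning even for a constant input, the spatial pooler is doing real reorganization work every iteration. A minimal sketch, assuming dense 0/1 column vectors as in the loop earlier in the thread:

#include <cstddef>
#include <vector>

// Number of positions that differ between two dense 0/1 column activation
// vectors from consecutive time steps (nupic::UInt is an unsigned int).
// Zero means the SP representation has stabilized; a large value means the
// active set is still churning.
std::size_t columnsChanged(const std::vector<unsigned int>& prev,
                           const std::vector<unsigned int>& curr)
{
    std::size_t changed = 0;
    for (std::size_t c = 0; c < prev.size() && c < curr.size(); c++)
        if (prev[c] != curr[c]) changed++;
    return changed;
}

// Inside the loop: keep a copy of the previous activeColumns vector and log
// columnsChanged(prevActiveColumns, activeColumns) each iteration.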

I moved this from #nupic:developers into #nupic (the developers’ forum is for NuPIC contributor discussions).

@dorinclisu Just to clear this up… you are running NuPIC, right? And you said you’re using C++, so you’re using the C++ Network API directly? So this is purely nupic.core with no Python?

I use the nupic.core algorithms directly, without the network engine, because I find it easier to handle the regions on my own (at least for now) than to understand the API.

@dorinclisu it seems as though you are using an old implementation of Temporal Memory (Cells4 TP). You should try using the newer and heavily optimized TemporalMemory.cpp (https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.cpp). This should provide a substantial speed up to the examples you are working with.
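
For reference, a rough sketch of what the switch looks like. The parameter names and the compute() signature below are as I recall them from the linked header, so double-check TemporalMemory.hpp in your checkout; the important difference from Cells4 is that compute() takes a sorted list of active column indices rather than a dense Real array:

#include <cstddef>
#include <vector>
#include <nupic/algorithms/TemporalMemory.hpp>

using nupic::UInt;
using nupic::algorithms::temporal_memory::TemporalMemory;

// Construction, mirroring the Cells4 parameters above where they have a
// counterpart (the remaining arguments keep their defaults).
TemporalMemory TM(
    /*columnDimensions*/          std::vector<UInt>{2048},
    /*cellsPerColumn*/            32,
    /*activationThreshold*/       13,
    /*initialPermanence*/         0.21,
    /*connectedPermanence*/       0.5,
    /*minThreshold*/              9,
    /*maxNewSynapseCount*/        20,
    /*permanenceIncrement*/       0.1,
    /*permanenceDecrement*/       0.1,
    /*predictedSegmentDecrement*/ 0.0,
    /*seed*/                      7777777);

// Per step: convert the SP's dense output into sorted active column indices,
// then hand them to the TM.
void stepTM(TemporalMemory& TM, const std::vector<UInt>& denseActiveColumns)
{
    std::vector<UInt> activeIdx;
    for (std::size_t c = 0; c < denseActiveColumns.size(); c++)
        if (denseActiveColumns[c]) activeIdx.push_back(static_cast<UInt>(c));

    TM.compute(activeIdx.size(), activeIdx.data(), /*learn=*/true);
}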

Thanks for the tip. I’m aware of the new temporal memory and will try it soon; I’m still in the learning process.

Glad to hear it; just a bit of a warning. When @mrcslws optimized the TM he cleaned up a lot of the things that Cells4 was doing that it wasn’t supposed to be doing. For that reason, I warn against using Cells4 to try to understand the TM. It uses backtracking and some other techniques that are completely gone in the new implementation.

The TM is even wilder when it comes to runtime. In ideal cases it is faster than the TP (and gives much better predictions), but it can also be much slower.

Running similar tests, I got:

  1. Input is a sine-wave: SP+TM ~ 6 seconds
  2. Input is a constant: SP+TM ~ 15 seconds, SP only ~ 13 seconds
  3. Input is a sine-wave for the first 1/10 of the samples and constant afterwards: SP+TM ~ 6.1 seconds
  4. Input is a sine-wave with 5% noise: SP+TM ~ 21 seconds
  5. Input is 100% noise: SP+TM ~ 53 seconds
  6. Constant sparse vector (2% density) as direct input to TM: TM only ~ 9 seconds

I realize this is by no means a consistent test suite; it’s just what I intuitively came up with while trying to make some sense of the numbers.
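
For reference, roughly how I generate these inputs (a simplified sketch; the amplitude, period, and noise distribution here are stand-ins for what I actually used, and case 6 bypasses the encoder entirely by feeding a fixed sparse vector straight to the TM):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Test inputs 1-5 above, as scalar values normalized to [0, 1] before encoding.
std::vector<double> makeInput(const std::string& kind, std::size_t n = 5000)
{
    const double pi = 3.14159265358979323846;
    std::vector<double> v(n);
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::normal_distribution<double> noise(0.0, 0.05);               // "5% noise"

    for (std::size_t i = 0; i < n; i++) {
        double sine = 0.5 + 0.5 * std::sin(2.0 * pi * i / 100.0);    // period is a placeholder
        if (kind == "sine")                    v[i] = sine;
        else if (kind == "constant")           v[i] = 0.0;
        else if (kind == "sine_then_constant") v[i] = (i < n / 10) ? sine : 0.0;
        else if (kind == "noisy_sine")         v[i] = std::min(1.0, std::max(0.0, sine + noise(rng)));
        else /* "random" */                    v[i] = uni(rng);
    }
    return v;
}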

So far it seems pointless to benchmark the speed of the algorithms without assigning an objective metric to the ratio valuable_insight_of_analysis / computational_cost, because the sensitivity to the input data is high and it is difficult to verify (outside of HTM) which, or how many, hidden patterns the data contains. Does anyone have an idea for such a metric?