Time Complexity of SP and TP

Why is the time complexity of these algorithms so non-deterministic? I found that the runtime can increase by as much as 2x just by changing the values in the input data.

To be more precise, I ran a loop over 5000 CSV lines with a simple scalar encoder (a rough sketch of the loop follows the timings below):

  1. Input is a sine-wave
    a) with SP.compute() only in the loop it takes ~ 2.7 seconds
    b) with SP.compute() and TP.compute() it takes ~ 10.7 seconds (so TP alone must be taking around 8s)
  2. Input is a constant = 0
    a) with SP.compute() only in the loop it takes ~ 7.6 seconds
    b) with SP.compute() and TP.compute() it takes ~ 22 seconds (so TP alone must be taking around 14.4s)
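
Roughly, the loop looks like this (simplified sketch; the encoder below is just a stand-in for my scalar encoder, and the compute() signatures are the ones in my nupic.core checkout, so they may differ slightly in other versions):

// Rough shape of the benchmark loop; timings are measured around the compute() calls.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
#include <nupic/algorithms/SpatialPooler.hpp>
#include <nupic/algorithms/Cells4.hpp>

using nupic::UInt;
using nupic::Real;

void runBenchmark(nupic::algorithms::spatial_pooler::SpatialPooler& SP,
                  nupic::algorithms::Cells4::Cells4& TP,
                  const std::vector<double>& samples)   // the 5000 values from the CSV
{
    const UInt nBits = 100;                     // must match the SP input dimensions
    const UInt nCols = SP.getNumColumns();
    const UInt w     = 21;                      // active bits per encoding

    std::vector<UInt> input(nBits, 0);
    std::vector<UInt> activeColumns(nCols, 0);
    std::vector<Real> tpInput(nCols, 0);
    std::vector<Real> tpOutput(nCols * 32, 0);  // one value per cell (columns x cellsPerColumn)

    auto t0 = std::chrono::steady_clock::now();
    for (double v : samples) {                  // v assumed normalized to [0, 1]
        // stand-in scalar encoder: a block of w active bits whose position tracks v
        std::fill(input.begin(), input.end(), 0);
        UInt start = static_cast<UInt>(v * (nBits - w));
        for (UInt i = start; i < start + w; i++) input[i] = 1;

        SP.compute(input.data(), /*learn=*/true, activeColumns.data());

        // Cells4 takes a dense Real array of column activations
        for (UInt c = 0; c < nCols; c++) tpInput[c] = static_cast<Real>(activeColumns[c]);
        TP.compute(tpInput.data(), tpOutput.data(), /*doInference=*/true, /*doLearning=*/true);
    }
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "SP+TP: " << std::chrono::duration<double>(t1 - t0).count() << " s\n";
}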

There is no swarming involved, so the SP and TP parameters are the same in all cases. I could understand a small variation for different parameters, or even for different data, but not such a huge difference in both the SP and the TP. The outcome seems replicable for any constant (I tried 0.5 and it’s roughly the same).
Can anyone provide at least an intuitive explanation?

Hi @dorinclisu, and Welcome!

I find this to be a very, very interesting line of inquiry, and I’m interested to see what answers you get as well. This is in no way meant to be a “canonical” answer to your question, but consider that the algorithms’ purpose is to form a sensorimotor model of the world (though the sensorimotor part is still being worked on). This means that the model is being formed from a stream of changing data. I’ve even heard it said that these algorithms sometimes perform better if you throw a little noise at them, since they’re meant to emulate biological processes!

That’s just a 10,000 ft. observational point of view to mull over while you wait for an answer from someone more knowledgeable than me.

For clarity’s sake, can I ask how you ran the code? Did you use the OPF or the NetworkAPI? Did you do two separate runs, with no serialization in between to store and reuse the models?

I’m just asking all of this to help the next person that comes along to better answer your questions.

Nice line of inquiry though!

cheers,
David

Interesting idea to consider noise, because a constant is exactly the opposite of that. It’s a boring input and we know that time passes slowly when we’re bored :smiley:

Now, I just tried completely random input data (bounded within the encoder limits, of course) and got ~ 23 seconds (SP+TP), so it still doesn’t make much sense. In fact, I can’t seem to find any other dataset that runs faster than the sine wave.

I used plain C++ (no serialization, so the runs are indeed independent) with no other bells and whistles. The two compute() routines take 99% of the time, which is why I believe this is tied to the algorithms themselves and not to the implementation.

Are you using local or global inhibition on your SP? Can you provide the configuration parameters you’re using for the SP and TM? Which TM algorithm are you using? (These are just questions your eventual “expert” may ask, so I’m asking them up front.) :stuck_out_tongue_winking_eye:

SpatialPooler SP {
    /*vector<UInt> inputDimensions*/          std::vector<UInt> {100},
    /*vector<UInt> columnDimensions*/         std::vector<UInt> {2048},
    /*UInt potentialRadius = 16*/             16,
    /*Real potentialPct = 0.5*/               0.8,
    /*bool globalInhibition = true*/          true,
    /*Real localAreaDensity = -1.0*/          -1.0,
    /*UInt numActiveColumnsPerInhArea = 10*/  40,
    /*UInt stimulusThreshold = 0*/            0,
    /*Real synPermInactiveDec = 0.008*/       0.1,
    /*Real synPermActiveInc = 0.05*/          0.05,
    /*Real synPermConnected = 0.1*/           0.1,
    /*Real minPctOverlapDutyCycles = 0.001*/  0.001,
    /*Real minPctActiveDutyCycles = 0.001*/   0.001,
    /*UInt dutyCyclePeriod = 1000*/           1000,
    /*Real maxBoost = 10.0*/                  1.0,
    /*Int seed = 1*/                          7777777,
    /*UInt spVerbosity = 0*/                  0,
    /*bool wrapAround = true*/                true
};

Cells4 TP {
    /*UInt nColumns = 0*/                     SP.getNumColumns(),
    /*UInt nCellsPerCol = 0*/                 32,
    /*UInt activationThreshold = 1*/          13,
    /*UInt minThreshold = 1*/                 9,
    /*UInt newSynapseCount = 1*/              20,
    /*UInt segUpdateValidDuration = 1*/       5,
    /*Real permInitial = 0.5*/                0.21,
    /*Real permConnected = 0.8*/              0.8,
    /*Real permMax = 1.0*/                    1.0,
    /*Real permDec = 0.1*/                    0.1,
    /*Real permInc = 0.1*/                    0.1,
    /*Real globalDecay = 0*/                  0.0,
    /*bool doPooling = false*/                false,
    /*int seed = -1*/                         7777777,
    /*bool initFromCpp = false*/              true,
    /*bool checkSynapseConsistency = false*/  false
};

@dorinclisu very good question.

Here’s my short explanation.

HTM models the temporal and spatial relations inside the input. This means it has to scale itself (forming thousands of synapses) so that it can capture meaningful relations while eliminating noise. This process naturally depends on the actual combinations present in the input field, the level of correlation between the input bits, how much semantic similarity the encoding captures, sparsity, etc. Sometimes, because of the inputs (in your case random inputs), the system runs very slowly, constantly trying to reorganize itself without converging on a stable set of representations. Every new entry causes changes to the structure. Think of it as a person who is perpetually confused. One of the main strengths of HTM is being able to scale itself according to the attributes and combinations of the input. The downside is exactly what you are asking about.

Another outcome of this:

The performance of my implementation also varies by an order of magnitude within a short 5-minute lifetime in real time. At the start, the organization is random, and any new input causes a lot of changes to the synapse connections inside the spatial pooler. As time goes on, inputs get classified into buckets and you tend to see familiar activations. This lets the spatial pooler do less work because there is less to correct. Same with the temporal memory: the better it models, the fewer changes it needs. Also, a change in the spatial pooler usually necessitates a lot of change in the temporal memory as well.

In short, the better it models, the faster it runs.
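
If you want to watch this happening, it helps to time each compute() call separately rather than the whole run; a rough sketch using plain std::chrono, assuming the same loop variables as in the earlier sketch:

#include <chrono>
#include <iostream>

// Times a single call so the per-iteration cost can be logged and compared
// against how well the model has converged at that point in its lifetime.
template <typename F>
double millisecondsFor(F&& f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Inside the main loop (SP/TP calls as in the earlier sketch):
//   double spMs = millisecondsFor([&] { SP.compute(input.data(), true, activeColumns.data()); });
//   double tpMs = millisecondsFor([&] { TP.compute(tpInput.data(), tpOutput.data(), true, true); });
//   if (iteration % 100 == 0)
//       std::cout << iteration << "\t" << spMs << "\t" << tpMs << "\n";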

Well, this is what would intuitively make sense, but it doesn’t explain why a constant input also runs slowly.
An input that doesn’t change should be the easiest to predict with minimal effort, yet something about it trips up the HTM.

Oh, I think that may be because of the global inhibition, synapse bumping, and boosting mechanisms. Every column would be trying to learn the same thing, which in turn leads to a lot of pre-active columns before the inhibition phase. When the input is constant, all the columns compete to become active in every single iteration, because that is how HTM is designed.
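
One quick way to test this is to count how many active columns change from one step to the next; if the active set keeps churning even for a constant input, the spatial pooler is doing real reorganization work every iteration. A minimal sketch, assuming dense 0/1 column vectors as in the loop earlier in the thread:

#include <cstddef>
#include <vector>

// Number of positions that differ between two dense 0/1 column activation
// vectors from consecutive time steps (nupic::UInt is an unsigned int).
// Zero means the SP representation has stabilized; a large value means the
// active set is still churning.
std::size_t columnsChanged(const std::vector<unsigned int>& prev,
                           const std::vector<unsigned int>& curr)
{
    std::size_t changed = 0;
    for (std::size_t c = 0; c < prev.size() && c < curr.size(); c++)
        if (prev[c] != curr[c]) changed++;
    return changed;
}

// Inside the loop: keep a copy of the previous activeColumns vector and log
// columnsChanged(prevActiveColumns, activeColumns) each iteration.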

I moved this from #nupic:developers into #nupic (the developers’ forum is for NuPIC contributor discussions).

@dorinclisu Just to clear this up… you are running NuPIC, right? And you said you’re using C++, so you’re using the C++ Network API directly? So this is purely nupic.core with no Python?

I use the nupic.core algorithms directly, without the network engine, because I find it easier to handle the regions on my own (at least for now) than to understand the API.

@dorinclisu it seems as though you are using an old implementation of Temporal Memory (Cells4 TP). You should try using the newer and heavily optimized TemporalMemory.cpp (https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.cpp). This should provide a substantial speed up to the examples you are working with.
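
For reference, a rough sketch of what the switch looks like. The parameter names and the compute() signature below are as I recall them from the linked header, so double-check TemporalMemory.hpp in your checkout; the important difference from Cells4 is that compute() takes a sorted list of active column indices rather than a dense Real array:

#include <cstddef>
#include <vector>
#include <nupic/algorithms/TemporalMemory.hpp>

using nupic::UInt;
using nupic::algorithms::temporal_memory::TemporalMemory;

// Construction, mirroring the Cells4 parameters above where they have a
// counterpart (the remaining arguments keep their defaults).
TemporalMemory TM(
    /*columnDimensions*/          std::vector<UInt>{2048},
    /*cellsPerColumn*/            32,
    /*activationThreshold*/       13,
    /*initialPermanence*/         0.21,
    /*connectedPermanence*/       0.5,
    /*minThreshold*/              9,
    /*maxNewSynapseCount*/        20,
    /*permanenceIncrement*/       0.1,
    /*permanenceDecrement*/       0.1,
    /*predictedSegmentDecrement*/ 0.0,
    /*seed*/                      7777777);

// Per step: convert the SP's dense output into sorted active column indices,
// then hand them to the TM.
void stepTM(TemporalMemory& TM, const std::vector<UInt>& denseActiveColumns)
{
    std::vector<UInt> activeIdx;
    for (std::size_t c = 0; c < denseActiveColumns.size(); c++)
        if (denseActiveColumns[c]) activeIdx.push_back(static_cast<UInt>(c));

    TM.compute(activeIdx.size(), activeIdx.data(), /*learn=*/true);
}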

Thanks for the tip. I’m aware of the new temporal memory and will try it soon; I’m still in the learning process.

Glad to hear it; just a bit of a warning. When @mrcslws optimized the TM he cleaned up a lot of the things that Cells4 was doing that it wasn’t supposed to be doing. For that reason, I warn against using Cells4 to try to understand the TM. It uses backtracking and some other techniques that are completely gone in the new implementation.

The TM is even wilder when it comes to runtime. In ideal cases it is faster than the TP (and gives much better predictions), but it can also be much slower.

Running similar tests, I got:

  1. Input is a sine-wave: SP+TM ~ 6 seconds
  2. Input is a constant: SP+TM ~ 15 seconds, SP only ~ 13 seconds
  3. Input is a sine-wave for the first 1/10 of the samples and constant afterwards: SP+TM ~ 6.1 seconds
  4. Input is a sine-wave with 5% noise: SP+TM ~ 21 seconds
  5. Input is 100% noise: SP+TM ~ 53 seconds
  6. Constant sparse vector (2% density) as direct input to TM: TM only ~ 9 seconds

I realize this is by no means a consistent test suite; it’s just what I intuitively came up with while trying to make some sense of the numbers.
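
For reference, roughly how I generate these inputs (a simplified sketch; the amplitude, period, and noise distribution here are stand-ins for what I actually used, and case 6 bypasses the encoder entirely by feeding a fixed sparse vector straight to the TM):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Test inputs 1-5 above, as scalar values normalized to [0, 1] before encoding.
std::vector<double> makeInput(const std::string& kind, std::size_t n = 5000)
{
    const double pi = 3.14159265358979323846;
    std::vector<double> v(n);
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::normal_distribution<double> noise(0.0, 0.05);               // "5% noise"

    for (std::size_t i = 0; i < n; i++) {
        double sine = 0.5 + 0.5 * std::sin(2.0 * pi * i / 100.0);    // period is a placeholder
        if (kind == "sine")                    v[i] = sine;
        else if (kind == "constant")           v[i] = 0.0;
        else if (kind == "sine_then_constant") v[i] = (i < n / 10) ? sine : 0.0;
        else if (kind == "noisy_sine")         v[i] = std::min(1.0, std::max(0.0, sine + noise(rng)));
        else /* "random" */                    v[i] = uni(rng);
    }
    return v;
}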

So far it seems pointless to benchmark the speed of the algorithms without assigning an objective metric to the ratio valuable_insight_of_analysis / computational_cost, because the sensitivity to the input data is high and it is difficult to verify (outside of HTM) which, or how many, hidden patterns the data contains. Does anyone have an idea for such a metric?