Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex


Ok thanks @subutai.


The caption of Figure 6 of the paper mentioned in the title says:

The input stream used for this figure contained high-order sequences mixed with random elements. The maximum possible average prediction accuracy of this data stream is 50%.

@jhawkins, @subutai, I don’t really get why the average prediction accuracy would only be 50%, even in the presence of noise. Btw, the amount of noise added to the sequence isn’t even specified.


If the max possible prediction accuracy is 50%, doesn’t that mean the stream is 50% noise?


Even in the presence of a lot of noise, I don’t get why the prediction accuracy couldn’t still be 100%, given that HTM is claimed in the paper to be robust to noise, i.e. to handle noisy sequences as if the noise weren’t there.


If the input stream is 50% noise, and you are comparing your predictions to the input stream as ground truth, no algorithm will ever get much better than 50% accuracy.


The paper doesn’t say that the accuracy is calculated in the way you’re suggesting.

Also, in the paper, it’s written

For this simulation we designed the input data stream such that the maximum possible average prediction accuracy is 50%

It should have been stated what was actually done in practice to achieve this.

Also, I really don’t get why one would set up this experiment so that the maximum average prediction accuracy is only 50%.


To clarify, HTM will still keep sequences in context in the presence of noise, and continue making predictions given those sequences. It can hold onto those sequences in the noise and continue recognizing features of them over time, even when noise is persistent. Of course, nothing can predict random noise 100%, but HTM can pattern match temporal sequences even with lots of noise.


So the input is not trivial to predict. I’m sure the sequences are very simple patterns. Given a noisy signal, here is our prediction accuracy (near 50%). As we remove cells, accuracy decreases. The point is to show how accuracy is affected by cell death, which is fault tolerance.


This is not demonstrated in the paper.


This experiment to demonstrate the ability of HTM sequence memory to make predictions is, at the very least, a little suspicious, also because simple artificial sequences were used. Anyway, it does show to some extent that the system can cope with cell death.


I think if you understand the HTM algorithm, activity->prediction is no different in a simple sequence compared to a complex one. The connections formed during learning represent a transition from one element to another, regardless of the overall length or complexity of the sequence. The test shows how that transition is impacted by noise in the system.


As far as I understood, the prediction accuracy of the HTM sequence memory particularly depends on the number of cells per column and on the complexity of the input sequences.


Number of cells in the column, yes, but not the complexity of the sequence. The number of cells impacts capacity (i.e. how many contexts a particular input can appear in).


Actually, it also depends on the complexity of the sequence. More specifically, it depends on the contextual Markovian order of the elements in the sequence. I suppose you can cope with this by changing the number of cells per minicolumn. Maybe you meant some type of complexity other than the dependencies between the elements in the sequence?


Upping the order of the sequences (e.g. going from ‘a,b,c’ to ‘a,b,c,d,e,f’) increases the required capacity, though the standard 32 cells per column can potentially handle quite a lot. I think it will inevitably take more iterations to learn long sequences regardless of the cells per column, since longer sequences mean more context.


Sorry, don’t really understand this point, but…

The capacity of a typically sized layer (2048 minicolumns with 32 cells each) is ridiculously huge. With the learning params set properly, even a sequence of ungodly length can be learned in a single shot.
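As a rough back-of-the-envelope on why that capacity is so large (a sketch with assumed typical parameters — 2048 minicolumns, about 40 active per input, 32 cells per column; these numbers are mine, not from this thread):

```python
# Rough capacity sketch (assumed typical HTM parameters, not from the thread):
# each active minicolumn can represent its input with any one of its 32 cells,
# so a single input over ~40 active columns can in principle appear in up to
# 32**40 distinct contexts.
cells_per_column = 32
active_columns = 40

contexts_per_input = cells_per_column ** active_columns
print(len(str(contexts_per_input)))  # number of decimal digits: 61
```

That is on the order of 10^60 distinct contexts for a single input pattern, which is why context capacity is effectively never the bottleneck at these sizes.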


To further hammer home my point: with the learning params set low, if it took 3 iterations to learn a very short sequence, it would likewise take 3 iterations to learn an enormously long one. The learning rule operates locally, remembering the transition between two elements in context, so it works the same way regardless of how long the sequence is.
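A toy illustration of that locality (plain Python, not the actual HTM/NuPIC code): if transitions are stored per (context, element) pair, then one pass over a sequence stores every transition, whether the sequence is short or long.

```python
# Toy analogy for local transition learning (NOT the real HTM learning rule):
# each transition is stored on its own, keyed by the context so far, so the
# number of passes needed does not depend on sequence length.
def learn(sequence, passes=1):
    """Store context -> next-element transitions in a dict."""
    transitions = {}
    for _ in range(passes):
        context = ()
        for prev, nxt in zip(sequence, sequence[1:]):
            context = context + (prev,)   # extend the context by one element
            transitions[context] = nxt    # store this transition locally
    return transitions

t_short = learn(list("abc"))              # 2 transitions, learned in 1 pass
t_long = learn(list("abcdefghijklmnop"))  # 15 transitions, still 1 pass
print(len(t_short), len(t_long))  # 2 15
```

The point of the toy is only that each stored transition is independent of the others, so a long sequence needs no more passes than a short one — just more storage.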

I can draw up some visualizations to further explain if this point sounds wrong to you.


In this simulation, the stream contained sequences that were each 6 elements long, followed by 4 noise elements. e.g. “XABCDENNNNYABCFGNNNNXABCDENNNN…” (where N is a completely random input).

In this setup the four noise elements are unpredictable as is the first element of each sequence, so the best you can do is predict 50% of the elements.
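A quick arithmetic check of that 50% figure (a sketch, using the block structure just described: a 6-element sequence followed by 4 random elements):

```python
# Sketch: why the best achievable average accuracy is 50% for a stream of
# repeating blocks like "XABCDENNNN" (6-element sequence + 4 random elements).
sequence_len = 6
noise_len = 4
block_len = sequence_len + noise_len  # 10 elements per block

# Unpredictable elements per block: the 4 random N's, plus the first element
# of each sequence (nothing tells you which sequence starts after the noise).
unpredictable = noise_len + 1
predictable = block_len - unpredictable  # the remaining 5 sequence elements

max_accuracy = predictable / block_len
print(max_accuracy)  # 0.5
```

So even a perfect learner tops out at 5 correct predictions out of every 10 elements, which is where the 50% ceiling comes from.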

The full code for the experiments is here:


I’m still not sure what mathematical property lets you say that the best you can do is 50%. The noise is inserted at regular intervals between the sequences (the noise segments all have the same length, as do the “normal” sequences), so after a while the algorithm should have learned all the sequences.

Btw, how exactly did you measure the prediction accuracy?


You know where to look to find this, Subutai just gave you the link.