How are the cells per columns and length of sequence related?

The videos consistently show 4 cells per column, while the hotgym example actually appears to use 13. What effect does that have?

Specifically, does the number of cells per column correlate with the length of sequence that can be recognised? Or with the number of different sequences recognised?

One of the videos shows a tune “The Saints”, and then takes it no further. The tune is 50 notes or so. Can a TM 4 or 13 cells wide recognise a sequence of that length? If not, how many cells are needed? Why?


One of the PDFs you can find around here covers all kinds of maths related to SDRs. My answer here will be less precise than those materials.
But simply think about it. If, say, 80 active minicolumns are used to encode a single note, and a “note in context” is a choice of 1 among 4 cells per minicolumn, then the total number of “this note with a distinct index in a distinct sequence” is 4^80.
If instead it is a choice of 1 among 13, that same number is 13^80.
And, let’s see… 1 among 5 would (obviously at this point) result in 5^80. But a choice of 2 among 5 would be 10^80, since there are C(5,2) = 10 ways to pick 2 cells out of 5.
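The counts above follow from the fact that each active minicolumn picks its winner cell(s) independently, so the per-minicolumn choices multiply. A small Python sketch of just that combinatorics, assuming one note always maps to the same fixed set of 80 active minicolumns (the helper `context_count` is hypothetical):

```python
from math import comb


def context_count(cells_per_column: int, active_columns: int, winners_per_column: int = 1) -> int:
    """Number of distinct cell-level representations of one fixed set of
    active minicolumns: each minicolumn independently picks
    `winners_per_column` cells out of `cells_per_column`."""
    return comb(cells_per_column, winners_per_column) ** active_columns


print(context_count(4, 80) == 4 ** 80)                           # True
print(context_count(13, 80) == 13 ** 80)                         # True
print(context_count(5, 80, winners_per_column=2) == 10 ** 80)    # True
```

This counts possible codes only; how many of them a TM can actually learn depends on its segments and synapses, as noted further down.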

So yes, it is related, but perhaps more intricately tied to other considerations than your question seems to imply. If you have fewer sequences, you can arguably remember longer ones. With fewer active minicolumns in your SDR, that number goes down (but as you can see, it is still a huge number in most cases). And if you allow more than one winner per minicolumn, it may go up again.

Also, there’s the synaptic view: all the basic maths above shows that you can “encode” that many possibilities individually, but you’ll need more dendritic segments to remember more sequences, or longer sequences, in a TM (although I believe each segment may benefit from semantic overlaps: you wouldn’t need 4^80 segments for the 4^80 case at full capacity… maybe @Paul_Lamb would know more about this).

Then there are considerations such as robustness to dying neurons, robustness to noisy input, and efficiency (learning speed) to weigh before fully deciding on those.

At some point the complexity becomes a challenge for numerical analysis, so my best advice would be to experiment with different parameters and see how it goes (and report on those experiments: they’d be as much “research” as anything else).


It seems to me this must be a question someone has considered, thought about and hopefully answered. I wouldn’t have thought the first thing to try is a bit of original research.

The videos show 4 cells per column and a 4 note sequence. The hotgym example has a weekly cycle and 13 cells per column. That kind of suggests it only works for short sequences. We know that humans can memorise incredibly long sequences, certainly in the thousands, and for the most part animals cannot. You are quoting numbers around the ‘atoms in the universe’ level. That’s a pretty wide span of possibilities.

I’m just interested in getting a rough idea of what the number might be in practice, and then what practical benefit is gained by having more cells (if it’s not longer sequences). You would have thought someone must know.


There is a lot more to capacity than the number of cells per minicolumn. It also directly depends on the number of minicolumns per input, the amount of overlap between input representations, the number of times each input is repeated in the sequence, the activation threshold and the max synapses per segment.

I know you are asking for specific numbers, so this isn’t a direct answer to your question, but hopefully it gives you an idea that there isn’t a simple correlation between cells per minicolumn and sequence length. A lot more variables must be accounted for, as @gmirey pointed out.


The videos use 4 just to keep it simple. We usually use 16 or more. The hotgym examples usually have 32 (not sure where you are getting 13). The effect is that the TM can remember more sequences.

Somewhat, but you can remember extremely long sequences even with just 2 cells per minicolumn as long as you have a lot of minicolumns.


The hotgym example prints out these parameters:

Parameters:
{   'anomaly': {   'likelihood': {   'probationaryPct': 0.1,
                                     'reestimationPeriod': 100}},
    'enc': {   'time': {'timeOfDay': (30, 1), 'weekend': 21},
               'value': {'resolution': 0.88, 'size': 700, 'sparsity': 0.02}},
    'predictor': {'sdrc_alpha': 0.1},
    'sp': {   'boostStrength': 3.0,
              'columnCount': 1638,
              'localAreaDensity': 0.04395604395604396,
              'potentialPct': 0.85,
              'synPermActiveInc': 0.04,
              'synPermConnected': 0.13999999999999999,
              'synPermInactiveDec': 0.006},
    'tm': {   'activationThreshold': 17,
              'cellsPerColumn': 13,
              'initialPerm': 0.21,
              'maxSegmentsPerCell': 128,
              'maxSynapsesPerSegment': 64,
              'minThreshold': 10,
              'newSynapseCount': 32,
              'permanenceDec': 0.1,
              'permanenceInc': 0.1}}

Can you put order-of-magnitude numbers on ‘long’ and ‘a lot’? Sequences of 10/1000/1000000? Is 1000 ‘a lot’ of minicolumns? Isn’t that just the size of the SDR?

I will refer you again to a very important paper: Why Neurons Have Thousands Of Synapses, A Theory Of Sequence Memory In Neocortex. Specifically, that paper states:

This can be calculated as the product of the expected duty cycle of an individual neuron (cells per column/column sparsity) times the number of patterns each neuron can recognize on its basal dendrites. For example, a network where 2% of the columns are active, each column has 32 cells, and each cell recognizes 200 patterns on its basal dendrites, can store approximately 320,000 transitions ((32/0.02)*200). The capacity scales linearly with the number of cells per column and the number of patterns recognized by the basal synapses of each neuron.
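The quoted formula is simple enough to write down directly. A minimal Python sketch of it (the function name is hypothetical; the inputs are the paper's worked example), which also shows the linear scaling with cells per column:

```python
def tm_transition_capacity(cells_per_column, column_sparsity, patterns_per_cell):
    """Approximate number of stored transitions, per the quoted formula:
    (cells per column / column sparsity) * patterns recognised per cell."""
    # The factor the paper calls the expected duty cycle, e.g. 32 / 0.02 = 1600.
    cells_over_sparsity = cells_per_column / column_sparsity
    return cells_over_sparsity * patterns_per_cell


print(round(tm_transition_capacity(32, 0.02, 200)))  # 320000
# Doubling cells per column doubles the estimate.
print(round(tm_transition_capacity(64, 0.02, 200)))  # 640000
```

Note that `patterns_per_cell` is an input here, not something the formula derives; it is the term the rest of the thread argues about.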


To relate this back to my post above, the number of patterns that each cell can recognize is dependent on a few configurable properties, like number of minicolumns per input (or worded differently, total number of minicolumns * sparsity), max synapses per segment, activation threshold, etc. If you’d like, I can describe in a little more detail how each of these impact the number of patterns a cell can recognize.


Thank you and yes, that’s exactly the kind of calculation I was looking for. Obviously I need to read that paper again and/or more carefully.

BTW if that 2% means 20 on-bits, that implies 1000 columns and a total of 32000 cells, so about 10 transitions per cell. A simple melody might be 100 notes, but a classical piano sonata is more than 10,000. A full piano repertoire is going to take quite a few columns and cells to memorise.
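The per-cell figure in that reply is plain arithmetic on the paper's example. Spelled out in Python, assuming (as the reply does) that 2% sparsity means 20 active minicolumns:

```python
active_bits = 20          # assumption from the reply: 2% sparsity = 20 on-bits
sparsity = 0.02
cells_per_column = 32
transitions = 320_000     # the paper's worked example

columns = round(active_bits / sparsity)      # 1000 minicolumns
total_cells = columns * cells_per_column     # 32000 cells
print(columns, total_cells, transitions / total_cells)  # 1000 32000 10.0
```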

There may be a good deal of nesting between the layers resulting in a dramatic reduction in storage requirements.

The layer of parsing/nesting is part of why the brain has such a vast storage capacity.

I don’t see how one can possibly deduce the number of transitions per cell without knowing some other parameters besides sparsity, total minicolumns, and cells per minicolumn. The range of possibilities is enormous.

I’ll draw up some visualizations so someone can correct me if there is a flaw in my understanding of the theory (I readily admit that I am not a mathy person).

But before I do that, let’s just take the name of that paper at face value – neurons have “thousands of synapses”. Let's pick 2,000 (that being the smallest number which can be called “thousands”).

Let's take the layer dimensions that you described, where there are 1,000 minicolumns, 20 minicolumns per input, and 32 cells per minicolumn. If each cell has a max of 2,000 synapses, and if the activation threshold is set at 1 (in reality this would obviously be higher – I’m picking the lowest value because it results in the lowest capacity) then the cell will become predictive when any of some 2,000 cells become active. If we divide that by 20 (the number of minicolumns per input), then each cell can learn up to 100 transitions. This of course is already 10 times more than what you calculated, and it becomes astronomically higher as you increase the activation threshold.
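The back-of-the-envelope estimate in that post reduces to one division: with an activation threshold of 1, each learned input pattern takes up synapses to the 20 cells active in it, so a 2,000-synapse budget covers about 100 patterns. A Python sketch of that reasoning (the helper name is hypothetical; later in the thread these are corrected to "recognized patterns" rather than transitions):

```python
def patterns_per_cell_threshold_one(synapses_per_cell, columns_per_input):
    # Each pattern the cell learns connects to one active cell per active
    # minicolumn of the input, so the synapse budget divided by the number of
    # minicolumns per input gives a rough lower bound on learnable patterns.
    return synapses_per_cell // columns_per_input


print(patterns_per_cell_threshold_one(2000, 20))  # 100
```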

4 Likes

I don’t see how one can possibly deduce the number of transitions per cell without knowing some other parameters besides sparsity, total minicolumns, and cells per minicolumn. The range of possibilities is enormous.

I just did that, quoting directly from the paper. Sparsity is 2%, cells per minicolumn is 32, total minicolumns is therefore 32/2%=1600. Just reading the paper. No magic.

I’ll draw up some visualizations so someone can correct me if there is a flaw in my understanding of the theory (I readily admit that I am not a mathy person).

I look forward to it.

But before I do that, let’s just take the name of that paper at face value – neurons have “thousands of synapses”. Let's pick 2,000 (that being the smallest number which can be called “thousands”).

Let's take the layer dimensions that you described, where there are 1,000 minicolumns, 20 minicolumns per input, and 32 cells per minicolumn. If each cell has a max of 2,000 synapses, and if the activation threshold is set at 1 (in reality this would obviously be higher – I’m picking the lowest value because it results in the lowest capacity) then the cell will become predictive when any of some 2,000 cells become active. If we divide that by 20 (the number of minicolumns per input), then each cell can learn up to 100 transitions. This of course is already 10 times more than what you calculated, and it becomes astronomically higher as you increase the activation threshold.

That’s wrong. No single cell learns a transition, it takes 20 cells per pattern to yield 10 active synapses and a 2:1 noise immunity. Your estimate is 10x what the paper says.


You’re correct, I was using the word “transition” incorrectly. What I calculated is recognized patterns. That said, 200 patterns is incredibly low compared to what is possible (and even normal) with the algorithm. I’ll post some visualizations and hopefully get to the bottom of where I am going wrong.

Visualizations may take a while to draw, so let me start with a quick analysis of how one input A,A,A,A,… repeating can be represented. This may be enough to demonstrate if and where I am going wrong in my understanding of the algorithm with respect to capacity.

In the TM algorithm, when an input is completely unpredicted, all the cells in that input’s minicolumns activate (which the algorithm calls bursting), and one winning cell per minicolumn is chosen to represent the input in that context. The winner in each minicolumn is picked from the cell(s) with the fewest distal segments, using a random tiebreaker.
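A simplified Python sketch of the winner-selection rule as described above. `choose_winner_cell` is a hypothetical helper, not NuPIC's API; as far as I know, real TM implementations first look for a best-matching segment in a bursting minicolumn and only fall back to the fewest-segments rule when there is none.

```python
import random


def choose_winner_cell(segment_counts, rng):
    """Among the cells with the fewest distal segments, pick one uniformly at random."""
    fewest = min(segment_counts)
    candidates = [i for i, n in enumerate(segment_counts) if n == fewest]
    return rng.choice(candidates)


rng = random.Random(0)
segments = [0, 0, 0]  # one minicolumn with 3 cells, no segments yet
for _ in range(3):    # three bursts, e.g. for A(1), A(2), A(3)
    winner = choose_winner_cell(segments, rng)
    segments[winner] += 1  # the winner grows a segment for this context
print(segments)  # [1, 1, 1]
```

Because the rule always prefers the least-used cells, three bursts in a 3-cell minicolumn give every cell exactly one segment, which is the starting point of the example that follows.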

Suppose representations have been chosen for the first three inputs of A, and all cells now have one distal segment each. Let’s call these representations A(1), A(2), and A(3). If your layer dimensions were 4 minicolumns per input and 3 cells per minicolumn (and assuming SP boosting is not enabled), the representations might look something like this (we do not need to consider the other 196 inactive minicolumns, since they will never have active cells in them in this scenario):

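[Figure: example representations A(1), A(2), and A(3), each using one active cell in each of the 4 active minicolumns (3 cells per minicolumn)]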

If “A” is input a fourth time and a representation is chosen for A(4), the chance that the four random tiebreakers reproduce exactly one of these three representations is 3 / 3^4 = 3 / 81 (“number of representations already used” / “number of possible representations”). Then for A(5) it would be 4 / 81, for A(6) 5 / 81, and so on. The numerator increases by one for each new set of tiebreakers (i.e. the longer the sequence, the higher the likelihood of randomly selecting a representation that has already been used in the sequence).
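The fractions above can be reproduced with a short Python sketch. It follows the post's simplified model, in which every cell in each minicolumn is an equally likely winner; `repeat_probability` is a hypothetical helper.

```python
from fractions import Fraction


def repeat_probability(previous_reps, cells_per_column, columns_per_input):
    """Chance that independent, uniform tiebreakers reproduce one of the
    representations already used (simplified model from the post)."""
    possible = cells_per_column ** columns_per_input  # 3**4 = 81 in the example
    return Fraction(previous_reps, possible)


for previous in (3, 4, 5):  # choosing A(4), A(5), A(6)
    p = repeat_probability(previous, 3, 4)
    print(f"A({previous + 1}): {p.numerator}/{p.denominator}")
```

`Fraction` reduces automatically, so the first line prints 1/27, which is the same as 3/81.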

What are the chances of this happening with the layer dimensions you mentioned earlier? The number of possible representations for 20 minicolumns containing 32 cells each is 32 ^ 20. Thus, in a layer of this size, it is vanishingly unlikely that any element in a sequence of repeating inputs would, by random chance, be exactly the same as a previous representation until that sequence has become astronomically long.
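To put a number on “astronomically long”: under the same uniform-tiebreaker model, summing the per-step chances k / N over n steps gives roughly n^2 / (2N), the birthday-problem approximation, so a repeat becomes about 50% likely around n = sqrt(2N ln 2). A quick Python check with N = 32^20, under those simplifying assumptions:

```python
import math

cells_per_column = 32
columns_per_input = 20
n_reps = cells_per_column ** columns_per_input  # 32**20 == 2**100

print(n_reps == 2 ** 100)   # True
print(f"{n_reps:.3e}")      # 1.268e+30

# Birthday-style estimate of the sequence length at which a repeated
# representation becomes roughly 50% likely.
half_chance_length = math.sqrt(2 * n_reps * math.log(2))
print(f"{half_chance_length:.2e}")  # roughly 1.3e+15
```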

With this in mind, lets consider an HTM system with the dimensions that you mentioned earlier, trained on a repeating input A, A, A, A, A…

Let’s train it like this: start with some other input Z after a reset, then increase the number of A’s one at a time. This ensures that we always start with the A(1) representation (versus a burst in the A minicolumns, which requires multiple iterations to disambiguate). Training the system this way will speed up the training process in case you want to repeat this experiment with NuPIC. The different representations involved will be learned like so:

(reset) Z -> A(1)
(reset) Z -> A(1) -> A(2)
(reset) Z -> A(1) -> A(2) -> A(3)
(reset) Z -> A(1) -> A(2) -> A(3) -> A(4)

(reset) Z -> A(1) -> A(2) -> (…) -> A(31) -> A(32)
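The input side of this curriculum can be generated with a small Python helper. Every input after Z is the same symbol A; the A(1), A(2), etc. labels are the representations the TM is expected to learn, not distinct inputs. `training_schedule` is hypothetical, and feeding the sequences to NuPIC (with a reset between them) is not shown.

```python
def training_schedule(max_repeats, prefix="Z", symbol="A"):
    """Yield one sequence per reset: the prefix followed by 1, 2, ... copies of the symbol."""
    for n in range(1, max_repeats + 1):
        yield [prefix] + [symbol] * n


for seq in training_schedule(3):
    print("(reset)", " -> ".join(seq))
# (reset) Z -> A
# (reset) Z -> A -> A
# (reset) Z -> A -> A -> A
```

For the experiment described here, `training_schedule(32)` would cover everything up to A(32).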

Now, let’s assume we set the activation threshold to 1 (lowest capacity). When we input the next sequence in the pattern:

(reset) Z -> A(1) -> A(2) -> (…) -> A(32) -> A(33)

The representation for A(33) will consist of randomly chosen cells, each of which was already part of one of the previous 32 representations for A (since all cells in the A minicolumns will have been used exactly once by this point). Thus, when A(33) becomes active, some combination of the previous A representations will become predictive, and the A minicolumns will never burst again (so no new representations for A will be chosen).

Thus the capacity of a system of this size and configuration (for a single repeating input) is 32 transitions. Hopefully it is intuitive that if the sequence has more diversity than just one repeating input (or is using SP boosting), the number of transitions would be larger than that. So 32 (i.e. the number of cells per minicolumn) represents a lower bound of capacity for this size of system, when the activation threshold is equal to one.
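The claim that the A minicolumns run out of unused cells after exactly 32 bursts can be checked with a small simulation of the fewest-segments rule. This is a sketch under the post's assumptions (one repeating input, no SP boosting, each new A bursts until every cell has been used); it models winner selection only, not prediction, and `burst_until_all_used` is a hypothetical helper.

```python
import random


def burst_until_all_used(cells_per_column, columns_per_input, seed=0):
    """Count bursts of the A minicolumns until every cell has won once,
    choosing winners by fewest segments with a random tiebreaker."""
    rng = random.Random(seed)
    segments = [[0] * cells_per_column for _ in range(columns_per_input)]
    bursts = 0
    while any(0 in column for column in segments):
        for column in segments:
            fewest = min(column)
            candidates = [i for i, n in enumerate(column) if n == fewest]
            column[rng.choice(candidates)] += 1  # winner grows a segment
        bursts += 1
    return bursts


print(burst_until_all_used(32, 20))  # 32
print(burst_until_all_used(3, 4))    # 3, the small example above
```

Since the rule never reuses a cell while an unused one remains in the same minicolumn, the count equals cells per minicolumn regardless of the random seed.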

Now consider the other extreme: assume the activation threshold is 20 (i.e. the number of minicolumns per input) and max synapses per segment is at least 20. Now each segment connects fully to the representation it learned and can uniquely distinguish it, so we are bounded only by the number of distal segments that a cell is allowed to grow. This can be set to any arbitrary value (if biological plausibility is not a concern).

Thus the capacity of a system of this size and configuration (for a single repeating input) with activation threshold 20 is 32 ^ 20 transitions (actually it will be something lower than this, because the unlikely random event I mentioned earlier of a previous representation being chosen will happen some time before then). This astronomical capacity of course comes at the cost of zero noise tolerance.
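To see why the threshold-20 extreme has no noise tolerance, here is a minimal Python sketch of the segment activation test, modelling a segment as the set of presynaptic cells it is connected to (permanences are ignored, and `segment_active` is a hypothetical helper):

```python
def segment_active(segment_synapses, active_cells, activation_threshold):
    """A distal segment is active when at least `activation_threshold`
    of its connected presynaptic cells are currently active."""
    return len(segment_synapses & active_cells) >= activation_threshold


pattern = set(range(20))   # 20 cells, one per active minicolumn
noisy = pattern - {0}      # the same input with one cell missing

print(segment_active(pattern, pattern, 20))  # True
print(segment_active(pattern, noisy, 20))    # False: one missing cell breaks recognition
print(segment_active(pattern, noisy, 10))    # True with a lower threshold
```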

Adding in biological constraints, or configuring your system for various properties like noise tolerance, one-shot learning, etc. could place the capacity of a given HTM system with these same dimensions anywhere within this vast range of possibilities.

Does anyone see if and where I am going astray in my understanding of capacity?


I’m a bit confused by this graphic. Are the columns in each representation supposed to be minicolumns and the rows cells in them? If the spatial input A is seen in different context, the same minicolumns should be activated, but different cells within them active. Am I missing something? Or maybe you are just showing the active minicolumns and excluding non-active ones… in which case this makes sense, but is a bit confusing.


Yes, showing the active minicolumns and excluding the inactive ones. In this case, there are 4 minicolumns per input, and three cells per minicolumn. Since I am repeating the same input (and SP boosting is not enabled), there is no need to draw the other minicolumns since they would never be active.

I’ll see if I can draw that a little better and will update my post.

Updated


To summarize my argument, what the Neuron Paper identifies as “the number of patterns recognized by the basal synapses of each neuron” is highly impacted (over many orders of magnitude) by several configurable parameters. These include the activation threshold, max synapses per segment, max segments per cell, SP boosting, and the diversity of the sequences being learned. Thus I do not think the original question, as stated, has a useful answer without also considering these other factors.


Do you accept the calculation in the ‘thousands of synapses’ paper? Here it is (again):

This can be calculated as the product of the expected duty cycle of an individual neuron (cells per column/column sparsity) times the number of patterns each neuron can recognize on its basal dendrites. For example, a network where 2% of the columns are active, each column has 32 cells, and each cell recognizes 200 patterns on its basal dendrites, can store approximately 320,000 transitions ((32/0.02)*200). The capacity scales linearly with the number of cells per column and the number of patterns recognized by the basal synapses of each neuron.

Seems pretty clear to me, and it directly answers the question I asked, which was:

Specifically, does the number of cells per column correlate with the length of sequence that can be recognised? Or with the number of different sequences recognised?

So what’s the issue, exactly?


I suppose nothing really, except that without knowing how “the number of patterns recognized by the basal synapses of each neuron” is determined, does the answer really help you to understand how cells per column and length of sequence are related? The original question seems to imply that these two factors are closely linked. Just pointing out that there are additional factors to consider.


Just to toss in a non-Numenta confounding factor …
There is an issue that gets tossed around from time to time here: repeating sequences.

You can search for it and see some of the discussions.
I proposed habituation as a possibility in the solution set.