How are cells per column and sequence length related?

I’m a bit confused by this graphic. Are the columns in each representation supposed to be minicolumns, and the rows cells within them? If the spatial input A is seen in different contexts, the same minicolumns should be activated, but different cells within them should be active. Am I missing something? Or maybe you are just showing the active minicolumns and excluding non-active ones… in which case this makes sense, but is a bit confusing.

1 Like

Yes, showing the active minicolumns and excluding the inactive ones. In this case, there are 4 minicolumns per input, and three cells per minicolumn. Since I am repeating the same input (and SP boosting is not enabled), there is no need to draw the other minicolumns since they would never be active.

I’ll see if I can draw that a little better and will update my post.

Updated

3 Likes

To summarize my argument, what the Neuron Paper identifies as “the number of patterns recognized by the basal synapses of each neuron” is highly impacted (over many orders of magnitude) by several configurable parameters. These include the activation threshold, max synapses per segment, max segments per cell, SP boosting, and the diversity of the sequences being learned. Thus I do not think the original question, as stated, has a useful answer without also considering these other factors.

3 Likes

Do you accept the calculation in the ‘1000 synapses’ paper? Here it is (again):

This can be calculated as the product of the expected duty cycle of an individual neuron (cells per column/column sparsity) times the number of patterns each neuron can recognize on its basal dendrites. For example, a network where 2% of the columns are active, each column has 32 cells, and each cell recognizes 200 patterns on its basal dendrites, can store approximately 320,000 transitions ((32/0.02)*200). The capacity scales linearly with the number of cells per column and the number of patterns recognized by the basal synapses of each neuron.
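As a quick sanity check, the arithmetic in that passage can be reproduced directly (a trivial sketch; the parameter values are just the ones quoted above):

```python
# Capacity formula quoted above:
#   capacity ≈ (cells_per_column / column_sparsity) * patterns_per_neuron
cells_per_column = 32       # cells in each minicolumn
column_sparsity = 0.02      # 2% of the columns are active
patterns_per_neuron = 200   # patterns recognized on each cell's basal dendrites

capacity = (cells_per_column / column_sparsity) * patterns_per_neuron
print(round(capacity))  # 320000
```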

Seems pretty clear to me, and directly answers the question I asked, which was:

Specifically, does the number of cells per column correlate with the length of sequence that can be recognised? Or with the number of different sequences recognised?

So what’s the issue, exactly?

1 Like

I suppose nothing really, except that without knowing how “the number of patterns recognized by the basal synapses of each neuron” is determined, does the answer really help you to understand how cells per column and length of sequence are related? The original question seems to imply that these two factors are closely linked. Just pointing out that there are additional factors to consider.

3 Likes

Just to toss in a non-Numenta confounding factor …
There is an issue that gets tossed around from time to time here: repeating sequences.

You can search for it and see some of the discussions.
I proposed habituation as a possibility in the solution set.

I only mentioned them here because I have been working with them for a while and it was easy to use them to demonstrate my point (it is also an extreme example of how diversity, or the lack thereof, in the sequences being learned has an impact on actual capacity).

The real point here is that if one were to find themselves in a situation where their HTM configuration did not have enough capacity for the problem at hand, the best answer might not be to add more cells per minicolumn. Adjusting another property, such as the activation threshold, may be a better option, depending on the use case.

2 Likes

Can you elaborate on the “nesting” behaviour of TM? How does it happen and how does it work?

I’m asking because I was thinking from another angle and came up with the same requirement for TM.
As far as I understand the TM algorithm, I can’t see how this will happen; it works for detecting variable-order sequences, but not for “nesting”.

See: temporal pooling.

By accident, I just read about temporal pooling earlier today … but this does not seem like nesting, more just like “labeling” sequences.

The output layer learns representations corresponding to objects. When the network first encounters a new object, a sparse set of cells in the output layer is selected to represent the new object. These cells remain active while the system senses the object at different locations. Feed forward connections between the changing active cells in the input layer and unchanging active cells in the output layer are continuously reinforced. Thus, each output cell pools over multiple feature/location representations in the input layer. Dendritic segments on cells in the output layer learn by forming lateral modulatory connections to active cells within their own column, and to active cells in nearby columns. During training, we reset the output layer when switching to a new object.

1 Like

That is an initial, naive implementation of TP (if you can even call it that yet), IMO. When we reach a level of sophistication where those pooled representations encode proper semantics and they are used themselves as components of other objects, then it really won’t be simple labeling anymore.

2 Likes

When considering temporal pooling, I typically come down in one of two places.

The first way of looking at it is that the pooling cells are acting like low pass filters. They perform a sort of temporal averaging in order to maintain a persistent representation of features in the domain that might be composed of smaller, more transient features on the input sensors.

The second way I’ve thought about them is to imagine the TP representations as the closed loops. That is to say that a persistent representation is formed by establishing a sequence of SDRs that repeat in a loop. I’m working on an implementation of this form of TP now.

For each of these, nesting occurs by associating transient inputs with the more stable representations. If one can assume that some of the more stable attributes of the sensed object/feature are encoded by the TP representation, then all the lower layers need to worry about is tracking the perturbations of the input from the mean expected behavior.

2 Likes

Sorry, I was talking about nesting in sequences, not nesting of “labeled” objects.
So in a sense you should have nesting on both levels.

E.g. if you have the following sequences:

 1. ABCDEF
 2. GBCHBCDXY

virtually, the common parts are “compressed” on the fly (or maybe when we sleep):

R1: B,C
R2: R1,D

so they become :

1. A,R2,E,F
2. G,R1,H,R2,X,Y

TM VOSeq algo does not do that … it always “records” the full sequence
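To make the “compressed on the fly” idea concrete, here is a minimal offline sketch in Python, in the style of Re-Pair grammar compression (repeatedly replace the most frequent adjacent pair of symbols with a new rule; online grammar algorithms such as Sequitur do something similar incrementally). This is only an illustration of the idea above, not anything the TM algorithm currently does:

```python
from collections import Counter

def compress(seqs):
    """Repeatedly replace the most frequent adjacent pair of symbols
    (appearing at least twice across all sequences) with a new rule Rn."""
    rules = {}
    n = 0
    while True:
        # count every adjacent pair across all sequences
        pairs = Counter()
        for s in seqs:
            for pair in zip(s, s[1:]):
                pairs[pair] += 1
        if not pairs or pairs.most_common(1)[0][1] < 2:
            break  # no pair repeats any more
        (a, b) = pairs.most_common(1)[0][0]
        n += 1
        name = f"R{n}"
        rules[name] = [a, b]
        # rewrite every sequence, replacing occurrences of (a, b) with Rn
        new_seqs = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == (a, b):
                    out.append(name)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return seqs, rules

seqs, rules = compress([list("ABCDEF"), list("GBCHBCDXY")])
print(seqs)   # [['A', 'R2', 'E', 'F'], ['G', 'R1', 'H', 'R2', 'X', 'Y']]
print(rules)  # {'R1': ['B', 'C'], 'R2': ['R1', 'D']}
```

This reproduces exactly the R1/R2 factoring in the example above.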


Why does TM have to be able to do that?

The first, minor benefit is that the capacity of the TM will grow. Repetitive sequences will take almost no space.

The major benefit is that it will simultaneously encode all encountered sub-sequences too.
Partial matches of interactions can happen automatically.

The drawback is that the branch (burst) logic will be more complex, OR we need a sleep-consolidation process.

BTW: There are online algos to do this type of compression.

1 Like

From my perspective, sequences are the same thing as objects (the only difference is where the distal signal is coming from).

I do not think that TP is part of TM (in my current understanding, these two processes must run in different populations of cells, because they require a temporal differential – this separation also matches TBT currently as well)

In any case, there is a lot of evidence that the brain does this sort of chunking, and HTM is ultimately intended to faithfully model biology. And really, just from observing myself how I replay music in my head, I know that I construct the “object” of a song in components (especially sections that repeat themselves – I don’t think of them as different, but as semantically identical other than their position).

This is very different than the way the TM algorithm alone currently functions, where for a given sequence, each iteration through its sub-sequences involves a completely different (semantically dissimilar) set of representations with virtually no overlap.

BTW, I posted on this thread a while back how I see this sort of thing working in a hierarchy, where abstractions work their way down the hierarchy the more frequently their components are encountered. I still see this as one of the requirements for a “good” TP algorithm.

1 Like

Alright then. From your description, I would tend to think of it more like a recursive representation. That’s all well and good, but you’ve introduced the additional complications of how to encode that representation in the context of an HTM network, as well as how to expand it back out temporally when/if you want to play it back.

1 Like

A recursive representation is actually the route that I am currently exploring – it can work when the number of times the recursion must happen is stored in a sequence at a higher level of abstraction which is providing (apical) feedback down to the lower level of abstraction.

I walked through some visualizations of how this would wire up in another thread here. From that setup, imagine those representations in the output layer themselves becoming inputs to a sequence (such as (ABCD)` (ABCD)`` (ABCD)``` unfolding into the same sequence of representations A’’ B’ C’ D’ repeated three times)

There is of course a big implementation problem with that, which is the question of timing – that will need to be tackled at some point (today we are good at learning the order of elements, but not how long they each should last before moving to the next one).

Of course I may have misinterpreted what @mraptor meant by recursion (for example, simply wiring up a recursive sequence in TM without an output layer providing feedback would introduce some difficult complications to overcome, like you said).

1 Like

Yes, that’s what I did :wink: … I’m not that familiar with biology, so I’m probably totally wrong … the reason I assume this is algorithmic, i.e. I also assume that TM stores transitions with structure, and if that is true, then storing nested sequences of transitions simplifies other algorithms.

To name a few, planning and RL come almost for free.
The model (which in computer lingo is a probability distribution over all transitions) has to be stored somewhere.

I imagine TM, being nested sequences, as a sort of hierarchical transition “table”, which for RL/planning becomes a hierarchical policy “table” … so what’s left for the outside circuitry is to implement search, action selection and execution.

Otherwise the model complication still has to be solved by outside modules.
E.g. computational algorithms for planning normally require a queue to store visited states … it is much more biologically plausible to use a nested TM rather than “queues”.
Nested sequences are a ready-made solution for a plan: you just have to play the sequence.

Agree with that… nested TM does not contradict TP

Once again I would like to mention the Semantic DB, previously mentioned here, with a github README here.

Sequences are a natural data-type in the SDB, so I took your example of sequence compression as a challenge. I wanted to see whether, and how easily, I could implement it as an operator in the SDB. It took some work, but it now seems to be working correctly. So, on to a quick demonstration. (Noting that the SDB shell has "sa: " as its prompt.)

We start by learning our starting sequences using these two learn rules:

sa: seq |one> => ssplit |ABCDEF>
sa: seq |two> => ssplit |GBCHBCDXY>

where ssplit is an operator that splits a string into a sequence. Noting however that scompress[] works with arbitrary sequences, and we are using simple sequences for clarity.

Now we use our new sequence compression operator, scompress[]:

sa: scompress[seq, cseq]

where “seq” is the source operator, and “cseq” is our destination operator. You can change them as required/desired.

Then to tidy things a little, we delete the original sequences using this short operator sequence:

sa: unlearn[seq] rel-kets[seq]

where rel-kets[] provides a list (in SDB terms, a superposition) of kets for which “seq” is defined.
and where unlearn[] then unlearns “seq” learn rules for those kets.

Then finally, we display the knowledge currently loaded into memory:
sa: dump

|context> => |Global context>

cseq |one> => |A> . |scompress: 0> . |E> . |F>

cseq |two> => |G> . |scompress: 1> . |H> . |scompress: 0> . |X> . |Y>

cseq |scompress: 0> => |scompress: 1> . |D>

cseq |scompress: 1> => |B> . |C>

cseq |*> #=> |_self>

And of course, we can go the other way and reconstruct the original sequences. In fact, this is rather easy in the SDB: we simply use cseq^k, i.e., cseq applied k times. Let’s demonstrate that:

sa: cseq |one>
|A> . |scompress: 0> . |E> . |F>

sa: cseq^2 |one>
|A> . |scompress: 1> . |D> . |E> . |F>

sa: cseq^3 |one>
|A> . |B> . |C> . |D> . |E> . |F>


sa: cseq |two>
|G> . |scompress: 1> . |H> . |scompress: 0> . |X> . |Y>

sa: cseq^2 |two>
|G> . |B> . |C> . |H> . |scompress: 1> . |D> . |X> . |Y>

sa: cseq^3 |two>
|G> . |B> . |C> . |H> . |B> . |C> . |D> . |X> . |Y>

Noting that since we require k == 3 to reconstruct the original sequences, we can say this system has a hierarchical depth of 3.
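For readers without the SDB, the cseq^k expansion and the depth count can be mimicked with a small Python sketch (purely illustrative: the dict stands in for the cseq learn rules, and "S0"/"S1" stand in for the |scompress: 0> and |scompress: 1> kets):

```python
def system_depth(key, cseq):
    """Return the fully expanded sequence and the number of cseq applications
    (counting the initial retrieval, as in the post) needed to flatten it."""
    seq = cseq[key]  # first application: look up the stored sequence
    k = 1
    while any(sym in cseq for sym in seq):
        # next application: substitute each rule symbol with its definition
        seq = [x for sym in seq for x in cseq.get(sym, [sym])]
        k += 1
    return seq, k

cseq = {
    "one": ["A", "S0", "E", "F"],
    "two": ["G", "S1", "H", "S0", "X", "Y"],
    "S0": ["S1", "D"],
    "S1": ["B", "C"],
}
print(system_depth("one", cseq))  # (['A', 'B', 'C', 'D', 'E', 'F'], 3)
print(system_depth("two", cseq))  # (['G', 'B', 'C', 'H', 'B', 'C', 'D', 'X', 'Y'], 3)
```

Both sequences flatten after three applications, matching the hierarchical depth of 3 above.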

Let’s do one more quick example:

sa: seq |one> => ssplit |ABCDEUVWXY>
sa: seq |two> => ssplit |BCD>
sa: seq |three> => ssplit |UVWZ>

sa: scompress[seq, cseq]
sa: unlearn[seq] rel-kets[seq]

sa: dump

|context> => |Global context>

cseq |one> => |A> . |scompress: 0> . |E> . |scompress: 1> . |X> . |Y>

cseq |two> => |scompress: 0>

cseq |three> => |scompress: 1> . |Z>

cseq |scompress: 0> => |B> . |C> . |D>

cseq |scompress: 1> => |U> . |V> . |W>

cseq |*> #=> |_self>

And in this case, k == 2 is sufficient to reconstruct the original sequences, so has a depth of 2.

I hope this is of at least a little interest, and provokes some interest in the SDB.
Feel free to contact me at garry -at- semantic-db.org

3 Likes

So I decided to make a follow-up post, given that the last post was purely an abstract demonstration of the scompress[] operator. It has since occurred to me that we might be able to apply scompress[] to raw text and hopefully extract out repeated phrases. I was only 50/50 on whether it would work in practice, or if we would simply get noise, but for the most part I think it has worked. Let me walk you through my SDB code.

I decided to use the Simple English Wikipedia page about dogs as my source text, on the assumption that simpler English will work better than standard English. In particular, since scompress[] is trying to extract repeated substrings, simpler text will presumably contain more repeated substrings than standard English. For standard English, we would need a much larger data-set to get comparable results.

So, let’s define our set of sentences, and in the process split them into sequences of single letters using the ssplit operator. One warning: the current code breaks if the text contains any digits, so we have deleted them from our sentences.

learn-page |dog> #=>
    seq |0> => ssplit |Dogs (Canis lupus familiaris) are domesticated mammals, not natural wild animals.>
    seq |1> => ssplit |They were originally bred from wolves.>
    seq |2> => ssplit |They have been bred by humans for a long time, and were the first animals ever to be domesticated.>
    seq |3> => ssplit |There are different studies that suggest that this happened between and years before our time.>
    seq |4> => ssplit |The dingo is also a dog, but many dingos have become wild animals again and live independently of humans in the range where they occur (parts of Australia).>
    seq |5> => ssplit |Today, some dogs are used as pets, others are used to help humans do their work.>
    seq |6> => ssplit |They are a popular pet because they are usually playful, friendly, loyal and listen to humans.>
...
    seq |54> => ssplit |Some of the most popular breeds are sheepdogs, collies, poodles and retrievers.>
    seq |55> => ssplit |It is becoming popular to breed together two different breeds of dogs and call the new dog's breed a name that is a mixture of the parents' breeds' two names.>
    seq |56> => ssplit |A puppy with a poodle and a pomeranian as parents might be called a Pomapoo.>
    seq |57> => ssplit |These kinds of dogs, instead of being called mutts, are known as designer dog breeds.>
    seq |58> => ssplit |These dogs are normally used for prize shows and designer shows.>
    seq |59> => ssplit |They can be guide dogs.>
    |>

To learn all of those sentences/sequences, we simply invoke the operator:
learn-page |dog>

It turns out that scompress[] processing of raw text works better if we convert everything to lowercase before proceeding. The reasoning is that if we preserved case, then “Dogs” and “dogs” would yield “ogs” as the repeating pattern instead of “dogs”. So, here is a quick wrapper operator to do the work for us, which makes use of the to-lower operator that converts text to all lower case:

convert-to-lower-case |*> #=>
    lower-seq |__self> => to-lower seq |__self>
    |>

Now we apply the convert to lower case operator to all of our sequences that have been defined with respect to “seq” (using the relevant kets operator):
convert-to-lower-case rel-kets[seq]

Next, let’s run our scompress operator on our “lower-seq” sequences, storing them with respect to the “cseq” operator, and using "W: " as the scompress sequence prefix:
scompress[lower-seq, cseq, "W: ", 6, 40]
where 6 is the minimum ngram length, and 40 is the maximum ngram length used by scompress[]. By specifying them, it speeds things up a bit.
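As an aside, the substring-mining part of this can be sketched in plain Python for readers without the SDB (an illustration of the idea only, not the scompress[] implementation; it counts overlapping occurrences and builds no hierarchy):

```python
from collections import Counter

def repeated_ngrams(texts, min_len=6, max_len=40):
    """Collect substrings with length in [min_len, max_len] that occur
    more than once across the lowercased texts, longest first."""
    counts = Counter()
    for t in texts:
        t = t.lower()
        for n in range(min_len, min(max_len, len(t)) + 1):
            for i in range(len(t) - n + 1):
                counts[t[i:i + n]] += 1
    return sorted((s for s, c in counts.items() if c > 1),
                  key=len, reverse=True)

phrases = repeated_ngrams(["Dogs are friendly.", "Dogs are loyal."])
print(repr(phrases[0]))  # 'dogs are '
```

A brute-force O(n² · max_len) scan is fine at this scale; a suffix-tree approach would be needed for much larger corpora.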

Now that the hard work is done, let’s take a look at what we have. There are two obvious things we can do next, one is to print out the repeated substrings detected by scompress[], and the other is to measure the system depth of our sequences. See my previous post for a definition of system depth.

Here is the relevant code to print out the repeated substrings, sorted by longest strings first:

filter-W |W: *> #=> |_self>
expand-W |W: *> #=> smerge cseq^20 |_self>
find |repeat patterns> #=> seq2sp expand-W cseq rel-kets[lower-seq] |>

print-coeff |*> #=>
    print (extract-value push-float |__self> _ |:> __ |__self>)
    |>

print-minimalist |*> #=>
    print |__self>
    |>

-- print-coeff reverse sort-by[ket-length] find |repeat patterns>
print-minimalist reverse sort-by[ket-length] find |repeat patterns>

Here are the repeating substrings, with length in range [6, 40]:

 dogs, hunting dogs, herding dogs,
 "man's best friend" because they
they have been bred by humans
man of about years of age,
 dog is called a pup
 between and years
 because they are
 different breeds
s of domestication
 showed that the
 the alpha male.
 are sometimes
these dogs are
 called mutts,
 to that of a
 suggest that
 domestication
e domesticated
ed from wolves
 dog is called
s have lived
 loyal and li
 wild animals
e great dane
 than humans
domesticated
 police dogs
ed together
e different
there are a
dogs often
 dogs with
 dog breeds
 sometimes
s are used
dogs can se
 domestic
dogs have
 years ago
dogs can s
dogs with
modern dog
 sometimes
with human
 dogs are
there are
guide dogs
the first
 and the
 the dog
n average
different
a dog in
 because
they can
 popular
e of the
 to human
 such as
dogs are
 other ar
 designer
sometimes
s in the
 parents
usually
e dogs,
they are
 people
 have be
or blind
e other
 a dog,
there ar
 before
 of the
e breed
parents
t least
 dogs, h
s closer
this is
human bo
trained
en years
 longer
 called
 police
lifespan
er dogs
 of dogs
s and ca
 and li
 where
ed that
e that
called
human b
e dogs.
 a few
 dogs,
 called
 people
 friend
 often
 humans
 their
as pets
parents
animals
 better
ll and
 breed
 dogs w
breeds
 breeds
 poodle
 wolves
before
 dogs t
e pette
usually
 and ar
 known
 a dog
long t
 and a
 were
 other
dogs,
dogs w
 dingo
 human
of dog
red by
 group
 have
they a
breed
 the a
breeds
s can
s and
 years
t see
 shows
, but
 being
r this
it is
 or pu
dog is
wolves
 for d
the re
d not
s are
e dogs
 been

So, it did a moderate job of extracting out repeat phrases and words, given the starting point was sequences of individual letters. Though it would presumably do an even better job if we had a much larger data-set, with more repeat phrases and words.

Finally, let’s look at the system depth, as defined in my previous post. Here is the relevant code in the SDB language (making use of recursion):

find-depth (*) #=>
    depth |system> => plus[1] depth |system>
    if( is-equal(|__self>, the |input>), |op: display-depth>, |op: find-depth>) cseq |__self>

display-depth (*) #=>
    |system depth:> __ depth |system>

find-system-depth |*> #=>
    depth |system> => |0>
    the |input> => lower-seq |__self>
    find-depth cseq |__self>

coeff-sort find-system-depth rel-kets[lower-seq]

And here is the result:
27|system depth: 4> + 15|system depth: 5> + 12|system depth: 3> + 6|system depth: 2>

While making the observation that not all of the input sequences share the same system depth.

And that is about it. I don’t think we have done anything useful with scompress[] yet, but it is a start.
I should also mention, the above data-set took about 10 seconds to process. And without passing in a min and a max ngram length to scompress[] the code took about 18 seconds to process. Given all the processing going on inside scompress[], repeatedly breaking the input sequences into smaller and smaller ngrams, I don’t think that is all that terrible. Do people have any other suggestions for where scompress[] might be more interesting? It should work for almost any collection of sequences.

Here is the Semantic DB GitHub page.
Here is the code for this post.
Feel free to contact me at garry -at- semantic-db.org

1 Like