I’m a bit confused by this graphic. Are the columns in each representation supposed to be minicolumns, and the rows cells in them? If the spatial input A is seen in different contexts, the same minicolumns should be activated, but different cells within them should be active. Am I missing something? Or maybe you are just showing the active minicolumns and excluding non-active ones… in which case this makes sense, but is a bit confusing.
Yes, showing the active minicolumns and excluding the inactive ones. In this case, there are 4 minicolumns per input, and three cells per minicolumn. Since I am repeating the same input (and SP boosting is not enabled), there is no need to draw the other minicolumns since they would never be active.
I’ll see if I can draw that a little better and will update my post.
To summarize my argument, what the Neuron Paper identifies as “the number of patterns recognized by the basal synapses of each neuron” is highly impacted (over many orders of magnitude) by several configurable parameters. These include the activation threshold, max synapses per segment, max segments per cell, SP boosting, and the diversity of the sequences being learned. Thus I do not think the original question, as stated, has a useful answer without also considering these other factors.
Do you accept the calculation in the ‘1000 synapses’ paper? Here it is (again):
This can be calculated as the product of the expected duty cycle of an individual neuron (cells per column/column sparsity) times the number of patterns each neuron can recognize on its basal dendrites. For example, a network where 2% of the columns are active, each column has 32 cells, and each cell recognizes 200 patterns on its basal dendrites, can store approximately 320,000 transitions ((32/0.02)*200). The capacity scales linearly with the number of cells per column and the number of patterns recognized by the basal synapses of each neuron.
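For concreteness, the quoted arithmetic can be checked in a few lines of Python (the variable names are mine, chosen to mirror the terms of the formula):

```python
# Capacity estimate from the quoted formula:
# transitions ~= (cells_per_column / column_sparsity) * patterns_per_cell

cells_per_column = 32      # cells in each minicolumn
column_sparsity = 0.02     # fraction of columns active at any time
patterns_per_cell = 200    # patterns recognized on each cell's basal dendrites

capacity = (cells_per_column / column_sparsity) * patterns_per_cell
print(int(capacity))  # 320000
```

Note the linear scaling: doubling either `cells_per_column` or `patterns_per_cell` doubles the estimate.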
Seems pretty clear to me, and directly answers the question I asked, which was:
Specifically, does the number of cells per column correlate with the length of sequence that can be recognised? Or with the number of different sequences recognised?
So what’s the issue, exactly?
I suppose nothing really, except that without knowing how “the number of patterns recognized by the basal synapses of each neuron” is determined, does the answer really help you to understand how cells per column and length of sequence are related? The original question seems to imply that these two factors are closely linked. Just pointing out that there are additional factors to consider.
Just to toss in a non-Numenta confounding factor …
There is an issue that gets tossed around from time to time here: repeating sequences.
You can search for it and see some of the discussions.
I proposed habituation as a possibility in the solution set.
I only mentioned them here because I have been working with them for a while and it was easy to use them to demonstrate my point (it also is an extreme example of where diversity, or lack thereof, in the sequences being learned has an impact on actual capacity).
The real point here is that if one were to find themselves in a situation where their HTM configuration did not have enough capacity for the problem at hand, the best answer might not be to add more cells per minicolumn. Adjusting another property, such as the activation threshold, may be a better option, depending on the use case.
Can you elaborate on the “nesting” behaviour of TM? How does it happen, and how does it work?
I’m asking because I was thinking from another angle and came up with the same requirement for TM.
As far as I understand the TM algorithm, I can’t see how this will happen; yes for detecting variable-order sequences, but not “nesting”.
See: temporal pooling.
By accident I just read about temporal pooling earlier today … but this does not seem like nesting, more just like “labeling” sequences.
The output layer learns representations corresponding to objects. When the network first encounters a new object, a sparse set of cells in the output layer is selected to represent the new object. These cells remain active while the system senses the object at different locations. Feed forward connections between the changing active cells in the input layer and unchanging active cells in the output layer are continuously reinforced. Thus, each output cell pools over multiple feature/location representations in the input layer. Dendritic segments on cells in the output layer learn by forming lateral modulatory connections to active cells within their own column, and to active cells in nearby columns. During training, we reset the output layer when switching to a new object.
That is an initial, naive implementation of TP (if you can even call it that yet), IMO. When we reach a level of sophistication where those pooled representations encode proper semantics and they are used themselves as components of other objects, then it really won’t be simple labeling anymore.
When considering temporal pooling, I typically come down in one of two places.
The first way of looking at it is that the pooling cells are acting like low pass filters. They perform a sort of temporal averaging in order to maintain a persistent representation of features in the domain that might be composed of smaller, more transient features on the input sensors.
The second way I’ve thought about them is to imagine the TP representations as closed loops. That is to say that a persistent representation is formed by establishing a sequence of SDRs that repeat in a loop. I’m working on an implementation of this form of TP now.
For each of these, nesting occurs by associating transient inputs with the more stable representations. If one can assume that some of the more stable attributes of the sensed object/feature are encoded by the TP representation, then all the lower layers need to worry about is tracking the perturbations of the input from the mean expected behavior.
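To make the first (low-pass filter) view concrete, here is a toy sketch of my own (none of these names or parameters come from an actual HTM codebase): pooled cells carry an exponentially decaying trace of recent input SDRs, so a stable pooled set persists while the instantaneous inputs change.

```python
# Toy "low-pass" temporal pooler: pooled activity is an exponentially
# decaying trace over recent input SDRs (sets of active cell indices).
# All names and parameters here are illustrative, not from any HTM codebase.

def pool_step(trace, active_input, decay=0.5, increment=1.0):
    """Decay the existing trace, then reinforce cells active on this step."""
    new_trace = {cell: value * decay for cell, value in trace.items()}
    for cell in active_input:
        new_trace[cell] = new_trace.get(cell, 0.0) + increment
    return new_trace

def pooled_sdr(trace, threshold=0.6):
    """Cells whose trace exceeds the threshold form the pooled representation."""
    return {cell for cell, value in trace.items() if value >= threshold}

# Feed a repeating loop of input SDRs: the pooled set stabilizes over
# the union of the loop's cells, even though each input is transient.
trace = {}
for sdr in [{1, 2}, {3, 4}, {1, 2}, {3, 4}]:
    trace = pool_step(trace, sdr)
print(sorted(pooled_sdr(trace)))  # [1, 2, 3, 4]
```

The point of the sketch is only the filtering behaviour: individual inputs come and go, but the pooled representation changes slowly.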
Sorry, I was talking about nesting in sequences, not nesting of “labeled” objects.
So in a sense you should have nesting on both levels.
e.g. if you have the following sequences:

1. ABCDEF
2. GBCHBCDXY

virtually the common parts are “compressed” on the fly (or maybe while we sleep):

R1: B,C
R2: R1,D

so they become:

1. A,R2,E,F
2. G,R1,H,R2,X,Y
The TM variable-order sequence algorithm does not do that … it always “records” the full sequence.
Why does TM have to be able to do that?
The first minor benefit is that the capacity of the TM will grow. Repetitive sequences will take almost no space.
The major benefit is that it will simultaneously encode all encountered sub-sequences too.
Partial matches of interactions can happen automatically … ++++
The drawback is that the branch(burst) logic will be more complex OR we need sleep-consolidation process.
BTW: There are online algos to do this type of compression.
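One family of such algorithms is grammar-based compression (e.g. Sequitur, Re-Pair). Below is a simplified offline sketch of the digram-substitution idea applied to the example above; a true online algorithm maintains the rules incrementally as each symbol arrives, but the resulting grammar is the same in spirit.

```python
# Simplified grammar-based compression: repeatedly replace the most
# frequent repeated pair of adjacent symbols with a new rule symbol.
# An offline sketch of the idea behind online algorithms like Sequitur;
# the rule names (R1, R2, ...) follow the example above.

from collections import Counter

def compress(sequences):
    rules = {}
    counter = 0
    while True:
        pairs = Counter()
        for seq in sequences:
            for i in range(len(seq) - 1):
                pairs[(seq[i], seq[i + 1])] += 1
        # Find the most frequent pair that occurs more than once
        candidates = [(n, p) for p, n in pairs.items() if n > 1]
        if not candidates:
            return sequences, rules
        _, best = max(candidates)
        counter += 1
        name = f"R{counter}"
        rules[name] = list(best)
        # Replace every occurrence of the pair with the new rule symbol
        new_sequences = []
        for seq in sequences:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                    out.append(name)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_sequences.append(out)
        sequences = new_sequences

seqs, rules = compress([list("ABCDEF"), list("GBCHBCDXY")])
print(seqs)   # [['A', 'R2', 'E', 'F'], ['G', 'R1', 'H', 'R2', 'X', 'Y']]
print(rules)  # {'R1': ['B', 'C'], 'R2': ['R1', 'D']}
```

Running it on the two sequences from the example reproduces exactly the R1/R2 grammar described above.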
From my perspective, sequences are the same thing as objects (the only difference is where the distal signal is coming from).
I do not think that TP is part of TM (in my current understanding, these two processes must run in different populations of cells, because they require a temporal differential – this separation also matches TBT currently as well)
In any case, there is a lot of evidence that the brain does this sort of chunking, and HTM is ultimately intended to faithfully model biology. And really, just from observing myself how I replay music in my head, I know that I construct the “object” of a song in components (especially sections that repeat themselves – I don’t think of them as different, but as semantically identical other than their position).
This is very different than the way the TM algorithm alone currently functions, where for a given sequence, each iteration through its sub-sequences involves a completely different (semantically dissimilar) set of representations with virtually no overlap.
BTW, I posted on this thread a while back how I see this sort of thing working in a hierarchy, where abstractions work their way down the hierarchy the more frequently their components are encountered. I still see this as one of the requirements for a “good” TP algorithm.
Alright then. From your description, I would tend to think of it more like a recursive representation. That’s all well and good, but you’ve introduced the additional complications of how to encode that representation in the context of an HTM network, as well as how to expand it back out temporally when/if you want to play it back.
A recursive representation is actually the route that I am currently exploring – it can work when the number of times the recursion must happen is stored in a sequence at a higher level of abstraction which is providing (apical) feedback down to the lower level of abstraction.
I walked through some visualizations of how this would wire up in another thread here. From that setup, imagine those representations in the output layer themselves becoming inputs to a sequence (such as (ABCD)` (ABCD)`` (ABCD)``` unfolding into the same sequence of representations A’’ B’ C’ D’ repeated three times)
There is of course a big implementation problem with that, which is the question of timing – that will need to be tackled at some point (today we are good at learning the order of elements, but not how long they each should last before moving to the next one).
Of course I may have misinterpreted what @mraptor meant by recursion (for example, simply wiring up a recursive sequence in TM without an output layer providing feedback would introduce some difficult complications to overcome, like you said).
Yes, that’s what I did … I’m not that familiar with biology, so I’m probably totally wrong. The reason I assume this is algorithmic: I also assume that TM stores transitions with structure, and if that is true then storing nested sequences-of-transitions simplifies other algorithms.
To name a few, planning and RL come almost for free.
The Model (which in computer lingo is a probability distribution over all transitions) has to be stored somewhere.
I imagine TM, being nested sequences, as a sort of hierarchical transition “table”, where for RL/planning it is a hierarchical policy “table” … so what’s left for the outside circuitry is to implement search, action selection and execution.
Otherwise the Model complication still has to be solved by outside modules.
For example, computational algorithms for planning normally require a queue to store visited states … it is much more biologically plausible to use a nested TM rather than “queues”.
A nested sequence is a ready-made solution for a plan; you just have to play the sequence.
Agree with that… a nested TM does not contradict TP.
Sequences are a natural data-type in the SDB, so I took your example of sequence compression as a challenge. I wanted to see how hard, and if, I could implement it as an operator in the SDB. It took some work, but it now seems to be working correctly. So on to a quick demonstration. (Noting that the SDB shell has "sa: " as a prompt.)
We start by learning our starting sequences using these two learn rules:
sa: seq |one> => ssplit |ABCDEF>
sa: seq |two> => ssplit |GBCHBCDXY>
where ssplit is an operator that splits a string into a sequence. Noting however that scompress works with arbitrary sequences, and we are using simple sequences for clarity.
Now we use our new sequence compression operator, scompress:
sa: scompress[seq, cseq]
where “seq” is the source operator, and “cseq” is our destination operator. You can change them as required/desired.
Then to tidy things a little, we delete the original sequences using this short operator sequence:
sa: unlearn[seq] rel-kets[seq]
where rel-kets provides a list (in SDB terms, a superposition) of kets for which “seq” is defined.
and where unlearn then unlearns “seq” learn rules for those kets.
Then finally, we display the knowledge currently loaded into memory:
|context> => |Global context>

cseq |one> => |A> . |scompress: 0> . |E> . |F>
cseq |two> => |G> . |scompress: 1> . |H> . |scompress: 0> . |X> . |Y>
cseq |scompress: 0> => |scompress: 1> . |D>
cseq |scompress: 1> => |B> . |C>
cseq |*> #=> |_self>
And of course, we can go the other way, and reconstruct the original sequences. In fact, this is rather easy in the SDB. We simply use cseq^k. Ie, cseq applied k times. Let’s demonstrate that:
sa: cseq |one>
|A> . |scompress: 0> . |E> . |F>

sa: cseq^2 |one>
|A> . |scompress: 1> . |D> . |E> . |F>

sa: cseq^3 |one>
|A> . |B> . |C> . |D> . |E> . |F>

sa: cseq |two>
|G> . |scompress: 1> . |H> . |scompress: 0> . |X> . |Y>

sa: cseq^2 |two>
|G> . |B> . |C> . |H> . |scompress: 1> . |D> . |X> . |Y>

sa: cseq^3 |two>
|G> . |B> . |C> . |H> . |B> . |C> . |D> . |X> . |Y>
Noting that since we require k == 3 to reconstruct the original sequences, we can say this system has a hierarchical depth of 3.
Let’s do one more quick example:
sa: seq |one> => ssplit |ABCDEUVWXY>
sa: seq |two> => ssplit |BCD>
sa: seq |three> => ssplit |UVWZ>
sa: scompress[seq, cseq]
sa: unlearn[seq] rel-kets[seq]
sa: dump
|context> => |Global context>

cseq |one> => |A> . |scompress: 0> . |E> . |scompress: 1> . |X> . |Y>
cseq |two> => |scompress: 0>
cseq |three> => |scompress: 1> . |Z>
cseq |scompress: 0> => |B> . |C> . |D>
cseq |scompress: 1> => |U> . |V> . |W>
cseq |*> #=> |_self>
And in this case, k == 2 is sufficient to reconstruct the original sequences, so has a depth of 2.
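For readers without the SDB, the reconstruction (cseq applied k times) and the depth measurement can be sketched in ordinary Python. The rule names below mirror the scompress output from the first example; the depth returned is the number of substitution passes needed to reach the fully expanded fixed point, matching the k values above.

```python
# Reconstruct compressed sequences by repeatedly substituting rule
# symbols (the Python analogue of applying cseq k times), and count
# how many passes full expansion takes -- the "system depth".
# Rule names mirror the scompress output above.

def expand_once(seq, rules):
    out = []
    for symbol in seq:
        out.extend(rules.get(symbol, [symbol]))  # non-rule symbols pass through
    return out

def expand_fully(seq, rules):
    depth = 0
    while True:
        expanded = expand_once(seq, rules)
        if expanded == seq:          # fixed point: fully expanded
            return seq, depth
        seq = expanded
        depth += 1

rules = {
    "one": ["A", "scompress: 0", "E", "F"],
    "two": ["G", "scompress: 1", "H", "scompress: 0", "X", "Y"],
    "scompress: 0": ["scompress: 1", "D"],
    "scompress: 1": ["B", "C"],
}
print(expand_fully(["one"], rules))  # (['A', 'B', 'C', 'D', 'E', 'F'], 3)
```

Running it on |two> likewise yields GBCHBCDXY at depth 3, in agreement with the cseq^3 transcript.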
I hope this is of at least a little interest, and provokes some interest in the SDB.
Feel free to contact me at garry -at- semantic-db.org
So I decided to make a follow up post, given that the last post was purely an abstract demonstration of the scompress operator. It has since occurred to me that we might be able to apply scompress to raw text and hopefully extract out repeated phrases. I was only 50/50 on whether it would work in practice, or if we would simply get noise, but for the most part I think it has worked. Let me walk you through my SDB code.
I decided to use the Simple English Wikipedia page about dogs as my source text, on the assumption that simpler English will work better than standard English. Especially since scompress is trying to extract out repeated substrings, and simpler text will presumably contain more repeat substrings than standard English. For standard English, we would need a much larger data-set to get comparable results.
So, let’s define our set of sentences, and in the process split them into sequences of single letters, using the ssplit operator. With the warning that the current code breaks if the text contains any digits, so we have deleted them from our sentences.
learn-page |dog> #=>
    seq |0> => ssplit |Dogs (Canis lupus familiaris) are domesticated mammals, not natural wild animals.>
    seq |1> => ssplit |They were originally bred from wolves.>
    seq |2> => ssplit |They have been bred by humans for a long time, and were the first animals ever to be domesticated.>
    seq |3> => ssplit |There are different studies that suggest that this happened between and years before our time.>
    seq |4> => ssplit |The dingo is also a dog, but many dingos have become wild animals again and live independently of humans in the range where they occur (parts of Australia).>
    seq |5> => ssplit |Today, some dogs are used as pets, others are used to help humans do their work.>
    seq |6> => ssplit |They are a popular pet because they are usually playful, friendly, loyal and listen to humans.>
    ...
    seq |54> => ssplit |Some of the most popular breeds are sheepdogs, collies, poodles and retrievers.>
    seq |55> => ssplit |It is becoming popular to breed together two different breeds of dogs and call the new dog's breed a name that is a mixture of the parents' breeds' two names.>
    seq |56> => ssplit |A puppy with a poodle and a pomeranian as parents might be called a Pomapoo.>
    seq |57> => ssplit |These kinds of dogs, instead of being called mutts, are known as designer dog breeds.>
    seq |58> => ssplit |These dogs are normally used for prize shows and designer shows.>
    seq |59> => ssplit |They can be guide dogs.>
    |>
To learn all of those sentences/sequences, we simply invoke the operator:

sa: learn-page |dog>
It turns out that scompress processing of raw text works better if we convert everything to lowercase before proceeding. The reasoning is that if we preserved case, then “Dogs” and “dogs” would return “ogs” as the repeating pattern instead of “dogs”. So, here is a quick wrapper operator to do the work for us, which makes use of the to-lower operator that converts text to all lower case:
convert-to-lower-case |*> #=> lower-seq |__self> => to-lower seq |__self> |>
Now we apply the convert to lower case operator to all of our sequences that have been defined with respect to “seq” (using the relevant kets operator):

sa: convert-to-lower-case rel-kets[seq]
Next, let’s run our scompress operator on our “lower-seq” sequences, storing them with respect to the “cseq” operator, and using "W: " as the scompress sequence prefix:
sa: scompress[lower-seq, cseq, "W: ", 6, 40]
where 6 is the minimum ngram length, and 40 is the maximum ngram length used by scompress. By specifying them, it speeds things up a bit.
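As an aside, for readers without the SDB, the core repeated-ngram search idea can be sketched in plain Python. This is my own illustration of the idea only, not the actual scompress implementation; `repeated_ngrams` and the sample texts are made up for the demonstration.

```python
# Collect substrings of length min_len..max_len that occur more than
# once across a collection of texts, longest first. A plain-Python
# illustration of the repeated-ngram search idea; NOT the actual SDB
# scompress implementation.

from collections import Counter

def repeated_ngrams(texts, min_len=6, max_len=40):
    counts = Counter()
    for text in texts:
        for n in range(min_len, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    repeats = [s for s, c in counts.items() if c > 1]
    return sorted(repeats, key=len, reverse=True)

texts = ["they were bred from wolves",
         "dogs were bred by humans from wolves"]
result = repeated_ngrams(texts, 6, 12)
print(result[:3])  # longest repeated ngrams first
```

Note the output is noisy: every substring of a long repeat also repeats, so a serious implementation would prune substrings of longer repeats (and, as scompress does, then rewrite the sequences in terms of the discovered patterns).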
Now that the hard work is done, let’s take a look at what we have. There are two obvious things we can do next, one is to print out the repeated substrings detected by scompress, and the other is to measure the system depth of our sequences. See my previous post for a definition of system depth.
Here is the relevant code to print out the repeated substrings, sorted by longest strings first:
filter-W |W: *> #=> |_self>
expand-W |W: *> #=> smerge cseq^20 |_self>
find |repeat patterns> #=> seq2sp expand-W cseq rel-kets[lower-seq] |>
print-coeff |*> #=> print (extract-value push-float |__self> _ |:> __ |__self>) |>
print-minimalist |*> #=> print |__self> |>

-- print-coeff reverse sort-by[ket-length] find |repeat patterns>
print-minimalist reverse sort-by[ket-length] find |repeat patterns>
Here are the repeating substrings, with length in range [6, 40]:
dogs, hunting dogs, herding dogs, "man's best friend" because they they have been bred by humans man of about years of age, dog is called a pup between and years because they are different breeds s of domestication showed that the the alpha male. are sometimes these dogs are called mutts, to that of a suggest that domestication e domesticated ed from wolves dog is called s have lived loyal and li wild animals e great dane than humans domesticated police dogs ed together e different there are a dogs often dogs with dog breeds sometimes s are used dogs can se domestic dogs have years ago dogs can s dogs with modern dog sometimes with human dogs are there are guide dogs the first and the the dog n average different a dog in because they can popular e of the to human such as dogs are other ar designer sometimes s in the parents usually e dogs, they are people have be or blind e other a dog, there ar before of the e breed parents t least dogs, h s closer this is human bo trained en years longer called police lifespan er dogs of dogs s and ca and li where ed that e that called human b e dogs. a few dogs, called people friend often humans their as pets parents animals better ll and breed dogs w breeds breeds poodle wolves before dogs t e pette usually and ar known a dog long t and a were other dogs, dogs w dingo human of dog red by group have they a breed the a breeds s can s and years t see shows , but being r this it is or pu dog is wolves for d the re d not s are e dogs been
So, it did a moderate job of extracting out repeat phrases and words, given the starting point was sequences of individual letters. Though it would presumably do an even better job if we had a much larger data-set, with more repeat phrases and words.
Finally, let’s look at the system depth, as defined in my previous post. Here is the relevant code in the SDB language (making use of recursion):
find-depth (*) #=> depth |system> => plus depth |system> if( is-equal(|__self>, the |input>), |op: display-depth>, |op: find-depth>) cseq |__self>
display-depth (*) #=> |system depth:> __ depth |system>
find-system-depth |*> #=> depth |system> => |0> the |input> => lower-seq |__self> find-depth cseq |__self>

coeff-sort find-system-depth rel-kets[lower-seq]
And here is the result:
27|system depth: 4> + 15|system depth: 5> + 12|system depth: 3> + 6|system depth: 2>
While making the observation that not all of the input sequences share the same system depth.
And that is about it. I don’t think we have done anything useful with scompress yet, but it is a start.
I should also mention, the above data-set took about 10 seconds to process. And without passing in a min and a max ngram length to scompress the code took about 18 seconds to process. Given all the processing going on inside scompress, repeatedly breaking the input sequences into smaller and smaller ngrams, I don’t think that is all that terrible. Do people have any other suggestions for where scompress might be more interesting? It should work for almost any collection of sequences.