Visualizing high order sequences

Visualizing the system after it has learnt the given sequences.

For our first example we use these three sequences:
“count one two three four five six seven”,
“Fibonacci one one two three five eight thirteen”,
“factorial one two six twenty-four one-hundred-twenty”

Here is the raw code, in my language:
http://semantic-db.org/sw-examples/improved-high-order-sequences--small.sw

Here is what that looks like after processing, and some manual tidying:
http://semantic-db.org/sw-examples/improved-high-order-sequences--small--tidy.sw

And here is the resulting pretty picture, generated using my sw2dot code and graphviz:

Next, using the alphabet as a single sequence:
“a b c d e f g h i j k l m n o p q r s t u v w x y z”

Next, using lowercase and uppercase alphabet, with a meeting point “phi” in the middle:
“a b c d e f g h i j k l phi m n o p q r s t u v w x y z”
“A B C D E F G H I J K L phi M N O P Q R S T U V W X Y Z”

Next, the alphabets plus a joining bridge:
“a b c d e f g h i j k l phi-0 m n o p q r s t u v w x y z”
“A B C D E F G H I J K L phi-5 M N O P Q R S T U V W X Y Z”
“phi-0 phi-1 phi-2 phi-3 phi-4 phi-5”
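
(As an aside, if you want to reproduce the general look of these pictures without my tooling: the following is a toy Python sketch, not my sw2dot code, that just links consecutive tokens and lets graphviz lay the result out. Sequences that share a token, such as “phi” above, come out as joined chains. The file names here are made up.)

sequences = [
    "a b c d e f g h i j k l phi m n o p q r s t u v w x y z".split(),
    "A B C D E F G H I J K L phi M N O P Q R S T U V W X Y Z".split(),
]

lines = ["digraph sequences {", "  node [shape=circle];"]
for seq in sequences:
    for src, dst in zip(seq, seq[1:]):
        lines.append('  "%s" -> "%s";' % (src, dst))
lines.append("}")

with open("sequences.dot", "w") as f:
    f.write("\n".join(lines) + "\n")

# then render with, for example: dot -Tpng sequences.dot -o sequences.png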

Hey Garry, I’m confused about what I’m seeing here. Is there an HTM system involved in this at all? Or is this all generated by your Semantic DB project?

Sure, there are HTM ideas in there (SDR’s, mini-columns, high order sequence predictions, robustness to noise).
The individual circles are SDR’s, the joins are where the SDR’s have overlap, and the long chains of circles are sequences of SDR’s. I just thought this was an interesting visualization.

A brief explanation of my code:
– define a dense array, with 65536 bits on:
full |range> => range(|1>,|65536>)

– define SDR’s with only 40 random bits on:
encode |count> => pick[40] full |range>
encode |one> => pick[40] full |range>
encode |two> => pick[40] full |range>
encode |three> => pick[40] full |range>

– map the above SDR’s to SDR’s with mini-columns, 10 cells per column, with only 1 cell (picked randomly) on:
pattern |node 0: 0> => random-column[10] encode |count>

– define the SDR that follows this:
then |node 0: 0> => random-column[10] encode |one>

– and so on:
pattern |node 0: 1> => then |node 0: 0>
then |node 0: 1> => random-column[10] encode |two>

pattern |node 0: 2> => then |node 0: 1>
then |node 0: 2> => random-column[10] encode |three>

pattern |node 0: 3> => then |node 0: 2>
then |node 0: 3> => random-column[10] encode |four>

My language might look strange, but I’m using some HTM ideas.
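
If it helps, here is a rough Python sketch of the same idea (the gist only, not my implementation, and the helper names are made up):

import random

TOTAL_BITS = 65536   # size of the full bit space
BITS_ON = 40         # on bits per encoded symbol
COLUMN_SIZE = 10     # cells per mini-column

_encode_cache = {}

def encode(symbol):
    # a random, but fixed, SDR per symbol: 40 on-bit indices out of 65536
    if symbol not in _encode_cache:
        _encode_cache[symbol] = frozenset(random.sample(range(TOTAL_BITS), BITS_ON))
    return _encode_cache[symbol]

def random_column(sdr, k=COLUMN_SIZE):
    # expand each on bit into a mini-column of k cells with exactly 1 cell on,
    # chosen at random, so repeated calls give different cells in the same columns
    return frozenset((bit, random.randrange(k)) for bit in sdr)

# the learn rules above then amount to storing these expanded SDR's:
pattern, then = {}, {}
pattern["node 0: 0"] = random_column(encode("count"))
then["node 0: 0"] = random_column(encode("one"))
pattern["node 0: 1"] = then["node 0: 0"]
then["node 0: 1"] = random_column(encode("two"))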

At first I thought someone was sending packets of death to my Chrome browser for looking at this topic. However, it is probably those massive png files causing the crashes. Don’t do that!
Anyway, I will check out what you are doing.

I didn’t immediately see a presentation of the basic ideas etc at the given website.

Yeah, I suck at explaining my ideas! Let me try again.
The idea is to try and represent knowledge using learn rules that generally look like this:
operator |some ket> => some-superposition

where “operator” is a string, with one interpretation being a labeled link between nodes in a network,
|some ket> is called a ket, with one interpretation being a node in a network,
and a superposition is a linear combination of kets, with one interpretation being the current network state.

For example, a couple of simple learn rules:

learn Fred’s friends are Sam, Max, and Rob:
the-friends-of |Fred> => |Sam> + |Max> + |Rob>

learn a shopping list:
the-shopping |list> => 5|apples> + 4|oranges> + |coffee> + |steak> + |chocolate> + |2 litres of milk>

Once the learn rules are loaded into the console, they can be recalled:
eg:
sa: the-friends-of |Fred>
|Sam> + |Max> + |Rob>

sa: the-shopping |list>
5|apples> + 4|oranges> + |coffee> + |steak> + |chocolate> + |2 litres of milk>
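
(If it is easier to think about in conventional programming terms: a superposition is roughly a dict from ket labels to coefficients, and a learn rule stores one of those dicts under an (operator, ket) key. A throwaway Python sketch of that reading, not how my code is actually structured:)

context = {}

def learn(operator, ket, superposition):
    # operator |ket> => superposition
    context[(operator, ket)] = superposition

def recall(operator, ket):
    return context.get((operator, ket), {})

learn("the-friends-of", "Fred", {"Sam": 1, "Max": 1, "Rob": 1})
learn("the-shopping", "list", {"apples": 5, "oranges": 4, "coffee": 1,
                               "steak": 1, "chocolate": 1, "2 litres of milk": 1})

print(recall("the-friends-of", "Fred"))
# {'Sam': 1, 'Max': 1, 'Rob': 1}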

Anyway, that is the starting point. The rest of the project is designed around operators that map superpositions to superpositions. The central motivation is that I think superpositions and operators are a natural representation for what is going on in brains, and in networks in general (a claim that is also made for SDR’s). Indeed, part of the relevance of all this to HTM is that superpositions are conceptually very similar to SDR’s. For example:

  • In practice most superpositions are sparse, since any ket with coefficient 0 can be dropped without changing the meaning of the superposition.
  • In some use cases the superpositions have semantics distributed among a collection of kets, and we can randomly drop, or add, other kets and the meaning only changes a bit.
  • We have a function that returns the similarity of two superpositions. 1 for exactly the same, 0 for disjoint, values in between otherwise. The HTM analog being the overlap score, used to score the similarity of SDR’s.

A brief demonstration of those properties:
Let’s use an operator that maps strings to letter-ngrams, so we can measure string similarity:

sa: letter-ngrams[1,2,3] |management>
2|m> + 2|a> + 2|n> + |g> + 2|e> + |t> + |ma> + |an> + |na> + |ag> + |ge> + |em> + |me> + |en> + |nt> + |man> + |ana> + |nag> + |age> + |gem> + |eme> + |men> + |ent>

sa: letter-ngrams[1,2,3] |measurement>
2|m> + 3|e> + |a> + |s> + |u> + |r> + |n> + |t> + 2|me> + |ea> + |as> + |su> + |ur> + |re> + |em> + |en> + |nt> + |mea> + |eas> + |asu> + |sur> + |ure> + |rem> + |eme> + |men> + |ent>

  1. these are sparse, since we omit all ngrams that are not in the seed strings.
  2. we can measure their similarity using the ket-simm function:

sa: ket-simm(letter-ngrams[1,2,3] |measurement>, letter-ngrams[1,2,3] |management>)
0.478|simm>

ie, 47.8% similarity.

  3. the spelling similarity is distributed among the kets. We can add or subtract some of these kets, but some similarity will remain.

To test this, we first need to find out how many kets are in each superposition (ie, how many on bits in the SDR):
sa: how-many letter-ngrams[1,2,3] |measurement>
|number: 26>

sa: how-many letter-ngrams[1,2,3] |management>
|number: 23>

Now, pick a number less than 23, say 18 on bits, and remeasure similarity:
(where the pick[k] operator returns k random kets from a given superposition)

sa: ket-simm(pick[18] letter-ngrams[1,2,3] |measurement>, pick[18] letter-ngrams[1,2,3] |management>)
0.327|simm>

ie, they are still 32.7% similar.

If we drop back to only 10 on bits, and remeasure similarity:
sa: ket-simm(pick[10] letter-ngrams[1,2,3] |measurement>, pick[10] letter-ngrams[1,2,3] |management>)
0.167|simm>

ie, they are still 16.7% similar.
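
Roughly the same demonstration in Python, for anyone who prefers that. Note the simm definition below (sum of the minimum of the normalized coefficients) is just one plausible choice, and is not necessarily exactly what my ket-simm does, so the numbers will be close to, but not identical to, the console output above:

import random
from collections import Counter

def letter_ngrams(s, sizes=(1, 2, 3)):
    # superposition of letter ngrams, as a Counter of ngram -> coefficient
    return Counter(s[i:i+n] for n in sizes for i in range(len(s) - n + 1))

def simm(a, b):
    # one plausible similarity measure: normalize each superposition to sum 1,
    # then add up the minimum coefficient of each shared ket.
    # 1 means identical, 0 means disjoint.
    ta, tb = sum(a.values()), sum(b.values())
    if ta == 0 or tb == 0:
        return 0.0
    return sum(min(a[k] / ta, b[k] / tb) for k in a.keys() & b.keys())

def pick(k, sp):
    # keep k randomly chosen kets, roughly what pick[k] does
    keys = random.sample(list(sp), min(k, len(sp)))
    return Counter({key: sp[key] for key in keys})

m1 = letter_ngrams("measurement")
m2 = letter_ngrams("management")
print(simm(m1, m2))                      # roughly 0.47
print(simm(pick(18, m1), pick(18, m2)))  # typically lower, but still well above 0
print(simm(pick(10, m1), pick(10, m2)))  # typically lower again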

Now with the background out of the way, we need to tie this all back to high order sequences.

Let’s learn these three sequences:
– count one two three four five six seven
– Fibonacci one one two three five eight thirteen
– factorial one two six twenty-four one-hundred-twenty

– define a superposition/SDR with all 64k bits on:
full |range> => range(|1>,|65536>)

– encode each of our objects into SDR’s with each having only 40 random bits on out of the full 64k:
encode |count> => pick[40] full |range>
encode |one> => pick[40] full |range>
encode |two> => pick[40] full |range>
encode |three> => pick[40] full |range>
encode |four> => pick[40] full |range>
encode |five> => pick[40] full |range>
encode |six> => pick[40] full |range>
encode |seven> => pick[40] full |range>
encode |Fibonacci> => pick[40] full |range>
encode |eight> => pick[40] full |range>
encode |thirteen> => pick[40] full |range>
encode |factorial> => pick[40] full |range>
encode |twenty-four> => pick[40] full |range>
encode |one-hundred-twenty> => pick[40] full |range>

– Here is what these guys look like under the hood, which is clearly an SDR:
(in this case a list of the indices of the on bits)
sa: encode |count>
|48476> + |46200> + |16172> + |43017> + |25301> + |41925> + |14697> + |20684> + |23309> + |45815> + |6938> + |7894> + |37263> + |34500> + |47554> + |41565> + |21080> + |29408> + |63024> + |59305> + |29824> + |11414> + |30627> + |59143> + |61059> + |52893> + |12487> + |7664> + |63961> + |37138> + |6132> + |57766> + |5397> + |41515> + |2481> + |51026> + |33491> + |64623> + |2565> + |50773>

– Next, learn sequences of these objects, using the idea of minicolumns,
– where append-column[k] appends a column to the encoded SDR with all k cells on
– and random-column[k] appends a column with only 1 out of k cells on.

– give the sequence a name:
sequence-number |node 0: *> => |sequence-0>

– start learning the sequence:
pattern |node 0: 0> => random-column[50] encode |count>
then |node 0: 0> => random-column[50] encode |one>

pattern |node 0: 1> => then |node 0: 0>
then |node 0: 1> => random-column[50] encode |two>

pattern |node 0: 2> => then |node 0: 1>
then |node 0: 2> => random-column[50] encode |three>

pattern |node 0: 3> => then |node 0: 2>
then |node 0: 3> => random-column[50] encode |four>

pattern |node 0: 4> => then |node 0: 3>
then |node 0: 4> => random-column[50] encode |five>

pattern |node 0: 5> => then |node 0: 4>
then |node 0: 5> => random-column[50] encode |six>

pattern |node 0: 6> => then |node 0: 5>
then |node 0: 6> => random-column[50] encode |seven>

– start learning the next sequence:
– Fibonacci one one two three five eight thirteen
sequence-number |node 1: *> => |sequence-1>
pattern |node 1: 0> => random-column[50] encode |Fibonacci>
then |node 1: 0> => random-column[50] encode |one>

pattern |node 1: 1> => then |node 1: 0>
then |node 1: 1> => random-column[50] encode |one>

pattern |node 1: 2> => then |node 1: 1>
then |node 1: 2> => random-column[50] encode |two>

pattern |node 1: 3> => then |node 1: 2>
then |node 1: 3> => random-column[50] encode |three>

pattern |node 1: 4> => then |node 1: 3>
then |node 1: 4> => random-column[50] encode |five>

pattern |node 1: 5> => then |node 1: 4>
then |node 1: 5> => random-column[50] encode |eight>

pattern |node 1: 6> => then |node 1: 5>
then |node 1: 6> => random-column[50] encode |thirteen>

– start learning the next sequence:
– factorial one two six twenty-four one-hundred-twenty
sequence-number |node 2: *> => |sequence-2>
pattern |node 2: 0> => random-column[50] encode |factorial>
then |node 2: 0> => random-column[50] encode |one>

pattern |node 2: 1> => then |node 2: 0>
then |node 2: 1> => random-column[50] encode |two>

pattern |node 2: 2> => then |node 2: 1>
then |node 2: 2> => random-column[50] encode |six>

pattern |node 2: 3> => then |node 2: 2>
then |node 2: 3> => random-column[50] encode |twenty-four>

pattern |node 2: 4> => then |node 2: 3>
then |node 2: 4> => random-column[50] encode |one-hundred-twenty>

Once this is all processed and loaded into the console we can ask questions.
(see: http://semantic-db.org/sw-examples/improved-high-order-sequences--saved.sw )
(where “step-k” is an operator that gives a prediction k steps from the input)

sa: step-1 |count>
1.0|one>

sa: step-2 |count>
0.976|two>

sa: step-3 |count>
0.932|three>

sa: step-1 |factorial>
1.0|one>

sa: step-2 |factorial>
1.0|two>

sa: step-3 |factorial>
0.93|six> + 0.07|three>

sa: step-4 |factorial>
0.906|twenty-four>

– without context, we don’t know which sequence “one” is in.
– “one” occurs four times across the three sequences: three of those occurrences are followed by “two” and one (the first “one” in the Fibonacci sequence) is followed by “one”, hence a 75% chance step-1 is “two” and a 25% chance it is “one”.
sa: step-1 |one>
0.75|two> + 0.25|one>

sa: step-2 |one>
0.506|three> + 0.253|six> + 0.241|two>

sa: step-3 |one>
0.259|four> + 0.259|five> + 0.247|twenty-four> + 0.235|three>

And so on. In my opinion, a pretty demonstration of HTM-style symbolic high order sequence learning.
Heh, not that I’m saying it is useful yet, but it is a starting point.
I hope some of this post made sense! Sorry about the length, and hope it is not too off-topic.

For completeness, this is what the step-k operators look like:
input-encode |*> #=> append-column[50] encode |_self>

step-1 |*> #=> drop-below[0.05] similar-input[encode] extract-category then drop-below[0.01] similar-input[pattern] input-encode |_self>
step-2 |*> #=> drop-below[0.05] similar-input[encode] extract-category then drop-below[0.01] similar-input[pattern] then drop-below[0.01] similar-input[pattern] input-encode |_self>
step-3 |*> #=> drop-below[0.05] similar-input[encode] extract-category then drop-below[0.01] similar-input[pattern] then drop-below[0.01] similar-input[pattern] then drop-below[0.01] similar-input[pattern] input-encode |_self>
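
And if the notation is getting in the way, here is a rough Python analogue of the whole learn-then-predict loop, so you can see there is nothing exotic going on. It is a sketch of the idea only, not my implementation: the similarity and decoding steps are simplified to plain overlap counting, and it only follows the single best match rather than returning a weighted superposition of predictions.

import random

TOTAL_BITS, BITS_ON, COLUMN_SIZE = 65536, 40, 50

_encode_cache = {}

def encode(symbol):
    # fixed random SDR per symbol: 40 on bits out of 65536
    if symbol not in _encode_cache:
        _encode_cache[symbol] = frozenset(random.sample(range(TOTAL_BITS), BITS_ON))
    return _encode_cache[symbol]

def random_column(sdr):
    # one random cell on per mini-column (one column per on bit)
    return frozenset((bit, random.randrange(COLUMN_SIZE)) for bit in sdr)

def append_column(sdr):
    # all cells on per mini-column: the "context not known yet" version
    return frozenset((bit, cell) for bit in sdr for cell in range(COLUMN_SIZE))

pattern, then = {}, {}

def learn_sequence(name, symbols):
    prev = None
    for i, symbol in enumerate(symbols):
        node = "%s: %s" % (name, i)
        pattern[node] = prev if prev is not None else random_column(encode(symbol))
        if i + 1 < len(symbols):
            then[node] = random_column(encode(symbols[i + 1]))
            prev = then[node]

learn_sequence("node 0", "count one two three four five six seven".split())
learn_sequence("node 1", "Fibonacci one one two three five eight thirteen".split())
learn_sequence("node 2", "factorial one two six twenty-four one-hundred-twenty".split())

def step(symbol, k=1):
    # start from the context-free encoding of the input symbol, follow the
    # best-matching pattern's "then" link k times, then decode back to a symbol
    state = append_column(encode(symbol))
    for _ in range(k):
        scores = {node: len(state & sdr) for node, sdr in pattern.items() if node in then}
        best = max(scores, key=scores.get)
        state = then[best]
    on_bits = {bit for bit, cell in state}
    overlaps = {sym: len(on_bits & sdr) for sym, sdr in _encode_cache.items()}
    return max(overlaps, key=overlaps.get)

print(step("count", 1))      # one
print(step("count", 3))      # three
print(step("factorial", 4))  # twenty-four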

links:
http://semantic-db.org/sw-examples/improved-high-order-sequences.sw
http://semantic-db.org/sw-examples/improved-high-order-sequences--saved.sw

http://write-up.semantic-db.org/201-new-operators-append-column-and-random-column.html
http://write-up.semantic-db.org/202-learning-sequences-htm-style-using-if-then-machines.html
http://write-up.semantic-db.org/203-more-htm-sequence-learning.html
http://write-up.semantic-db.org/204-random-encode-similarity-matrices.html
http://write-up.semantic-db.org/206-naming-htm-sequences.html
http://write-up.semantic-db.org/207-visualizing-htm-sequences.html

That was more comprehensible. I guess you should explore the ideas in your own way and if it works out then recast the terminology in more standard computer science terms. Or the smarter option if it works out would be just to go and make millions off the stock market.
I suppose you could read “concept” for “superposition” and then you are not too far off what I intend to try next. Start with the notion that people only harbor a few thousand concepts, and some new data arrives: which concept do you link the new data to? If everything is fast enough then why not try all the options/concepts and see which has the best explanatory power.
The worst outcome would be that you end up with an O(n^2) algorithm, but if you used something like the union-find algorithm maybe you can get that down to O(n log n). Anyway, the constant factors could be so low that O(n^2) would still be a lot cheaper than the current deep neural networks. https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf

I mean unsupervised learning of concepts.

Thanks for the feedback. Yeah, it is a work in progress, and the blog is probably more for me, to explore and write down my ideas, than for an outside audience, though with the hope to eventually stitch it all together into something more interesting.

As for the meaning of superposition, well there are lots of alternatives, depending on what you are trying to do.
The simplest maths version is “sparse labeled vector”.
As for “concept” I think that is closer to kets than superpositions, eg: these kets: |2 litres of milk> or |person: Max>
In which case a superposition could be called a “thought vector”.

The term “superposition” is something I co-opted from quantum mechanics.
Don’t get me wrong, there are a lot of differences between my use of superpositions (and operators) and their use in QM.
The most obvious is that the coefficients of kets in QM are complex numbers, while in my work they are positive floats.

We can still see a hint of QM in my work though.
For example, this is how my language represents Schrodinger’s cat:

is-alive? |Schrodingers cat> !=> normalize weighted-pick-elt (0.5|yes> + 0.5|no>)

If we load that into the console, and then dump the current context we have:
sa: dump
|context> => |context: Schrodingers cat>
is-alive? |Schrodingers cat> !=> normalize weighted-pick-elt (0.5|yes> + 0.5|no>)

Now, we can ask if the cat is alive, with a 50/50 chance of yes or no, but after we ask, it will be in a definite state.
Here that is in the console:

sa: is-alive? |Schrodingers cat>
|yes>

sa: dump
|context> => |context: Schrodingers cat>
is-alive? |Schrodingers cat> => |yes>

This corresponds to a more general pattern where we can encode the probabilities of a quantum system:

some-measurement |some object/system> !=> normalize weighted-pick-elt (P1|state 1> + P2|state 2> + … + Pn|state n>)

where the P_k are the probabilities that the system is in state k.
And after measurement, the system will be in a single definite state, chosen with probability P_k (thanks to the weighted-pick-elt operator).
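
(In conventional code, the weighted-pick-elt step is just a weighted random choice, and the “collapse” is the fact that the rule rewrites itself to the chosen outcome after the first measurement. A small Python sketch of that behaviour, with made-up names, not my implementation:)

import random

def weighted_pick_elt(superposition):
    # pick a single ket, with probability proportional to its coefficient
    kets = list(superposition)
    weights = [superposition[k] for k in kets]
    return random.choices(kets, weights=weights, k=1)[0]

class MeasuredProperty:
    def __init__(self, probabilities):
        self.probabilities = probabilities  # e.g. {"yes": 0.5, "no": 0.5}
        self.outcome = None                 # not measured yet

    def measure(self):
        # the first call collapses to a definite state; later calls return the same state
        if self.outcome is None:
            self.outcome = weighted_pick_elt(self.probabilities)
        return self.outcome

is_alive = MeasuredProperty({"yes": 0.5, "no": 0.5})
print(is_alive.measure())  # "yes" or "no", 50/50
print(is_alive.measure())  # the same answer as the first call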

Anyway, that is why I prefer to call these things superpositions.

Also, my work is a knowledge representation language.
In a strong way, QM is also about knowledge representation.

In the classical world, in theory, we could know everything to infinite accuracy.
So knowledge representation was not an issue.
But in QM we can’t even know the position and momentum of a particle simultaneously to full accuracy (thanks to the Heisenberg uncertainty principle), or the exact position of an electron orbiting a nucleus. So QM is restricted to working with what we can know, ie knowledge representation.

If you have only a small amount of information to base a decision on, then once you have mined it completely you may still be left with a number of choices: a superposition.
If you have a large amount of information to start with, you can keep mining until you get complete disambiguation. That is somewhat non-robust to noise, but you avoid a combinatorial explosion of possible outcomes. At the moment I want to look at the second option and see what happens.

Regarding the union-find algorithm, this is the best paper I found:
http://www.ii.uib.no/~fredrikm/fredrik/papers/sm_disjoint.pdf
They recommend Rem’s algorithm. A different paper indicates that it is a good choice with random inputs, though not necessarily with ordered inputs. I’ll be using it with a hash table, so that should be fine.
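
For concreteness, the sort of thing I have in mind is just a dict-backed union-find with path compression and union by size (the textbook version, not Rem's algorithm), something like:

# Dict-backed union-find; keys can be any hashable object (here, made-up concept labels).
parent = {}
size = {}

def find(x):
    parent.setdefault(x, x)
    size.setdefault(x, 1)
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:          # path compression
        parent[x], x = root, parent[x]
    return root

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return
    if size[ra] < size[rb]:
        ra, rb = rb, ra
    parent[rb] = ra                   # attach the smaller tree under the larger
    size[ra] += size[rb]

union("cat picture", "animal concept")
union("dog picture", "animal concept")
print(find("cat picture") == find("dog picture"))  # True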