Encoder with multiple similarities?

By definition, the data you pass to an Encoder must have its SIMILARITY encoded; the data then has to go through the Spatial Pooler (SP), which has to preserve this similarity.

The question is what you can do when the SIMILARITY/overlap depends on the interaction, i.e. it is only known after the fact.

For example, in chess you may have States S1 and S2 (encoded as SDRs) whose only board/physical difference is one piece at a different position, i.e. they are spatially similar, while another State S3 is semantically closer to S1.

In this case action A will be more appropriate for S1 and S3, but not for S2.

If you can figure out a way to encode S1 and S3 with higher overlap, the Agent will learn better.

Another example is Tic-Tac-Toe. The states:

 (x,1,1),(o,2,1),(x,1,2),(o,2,2)
 (o,1,1),(x,3,1),(o,1,2),(x,3,2)

are semantically equivalent: the next Action (?,1,3) wins the game. Physically, the States are not that similar.
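To make the "physically not similar" point concrete, here is a minimal sketch of a naive positional encoder. The bit layout (one bit per player/column/row triple) and the reading of each tuple as (player, column, row) are assumptions made for illustration, not something specified in the thread.

```python
# Naive "physical" encoder: one bit per (player, column, row) triple.
# The layout and the tuple interpretation are assumptions for illustration.
PLAYERS = ("x", "o")

def encode(state):
    """Return the set of active bit indices for a list of (player, col, row) moves."""
    bits = set()
    for player, col, row in state:
        bits.add(PLAYERS.index(player) * 9 + (col - 1) * 3 + (row - 1))
    return bits

def overlap(a, b):
    """Number of active bits two encodings share."""
    return len(a & b)

s_a = encode([("x", 1, 1), ("o", 2, 1), ("x", 1, 2), ("o", 2, 2)])
s_b = encode([("o", 1, 1), ("x", 3, 1), ("o", 1, 2), ("x", 3, 2)])

print(overlap(s_a, s_b))
```

With an encoding like this the two states share no active bits, even though the same winning move applies to both, which is exactly the gap between physical and semantic similarity.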

So I’m looking for an Encoder that generates similar SDRs using feedback from outside. How would you approach a problem like this?

I would define this capability as Temporal Pooling (TP). The SP algorithm associates related bits that are separated spatially, but as you have observed, it does not do so for bits that are separated temporally. The difference between this and an encoder is that a TP algorithm would incorporate online learning (like SP does), rather than being static in nature.

That said, a somewhat similar encoding strategy that I can think of, which is fairly well documented and thus relatively easy to reverse-engineer, is Cortical.io’s semantic folding algorithm. The basic concept could be modified to work for other use cases besides NLP.

The more I think about it, the more I would expect the encoding of the Data/States to drift and/or cluster towards the “points” they are measured against for similarity, in this case Actions.

Or maybe both drift towards clusters of State ==> Action.

Now if only we could make semantic folding an online algorithm :wink:

Semantic folding: how do you encode the initial word-SDRs to bootstrap the process?
In their paper they say they filter/AND sensory experiences to get the initial word-SDRs (for use in the snippets later), which makes sense, but how can you create a WORD-SENSOR that, given a symbol-word, generates a fuzzy word-SDR?

I don’t think it will work with random SDRs.

The way I implemented it (may not match what they did) is I gave every snippet a position on a huge map. If a word was in that snippet, then it got a 1 in that position. So the encoding for any given word was just 1 bits in all the positions of snippets that contain that word, and 0 bits everywhere else. Finally I scaled down the huge resulting word encodings to more usable dimensions (this works as long as you arrange the positions on the map for the snippets such that two snippets which share a lot of words are closer to each other than two which share fewer words).
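A minimal sketch of that idea, assuming a tiny made-up corpus and a one-dimensional "map" where a snippet's position is simply its list index (the real map would be far larger and topologically arranged; the final scaling-down step is left out):

```python
# Hypothetical toy corpus; each snippet's list index is its position on the map.
snippets = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks fell on the news",
    "markets fell on the report",
]

def word_sdr(word, snippets):
    """Set of snippet positions (active bits) whose snippet contains the word."""
    return {pos for pos, text in enumerate(snippets) if word in text.split()}

def overlap(a, b):
    """Number of active bits two word encodings share."""
    return len(a & b)

cat = word_sdr("cat", snippets)
sat = word_sdr("sat", snippets)
fell = word_sdr("fell", snippets)
the = word_sdr("the", snippets)

print(overlap(cat, sat), overlap(sat, fell), len(the))
```

Words that appear in the same snippets share bits, words from unrelated snippets do not, and a word that occurs in every snippet ends up with every bit set.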

Note that this strategy does not result in fixed-sparsity encodings, though: words that are used frequently (like “the”) end up with very dense encodings, while words that are used less frequently are far more sparse. Of course you can always pass them through the SP algorithm to fix sparsity like you normally would.

Anyway, encoders like this are a fairly static way of solving the original problem. A temporal pooling algorithm might be a better solution (though all the TP algorithms that I have experimented with so far have their own quirks – there isn’t really a silver bullet, unfortunately).

This won’t work for big spaces, will it? E.g. Go, chess, or a GPT-3 scale of snippets …
For those it has to be dynamic?

It is a chicken-and-egg problem: snippet map positions need word-SDRs and word-SDRs need snippet map positions, if I understand the paper correctly?

True, it only works when the spaces are a manageable size. Even the case of Wikipedia snippets is quite huge, and there are probably more efficient ways to do it with less brute force.

As far as being dynamic, one might be able to use overlapping hex grids for this. If you gave each word a random hex grid (varying by scale, angle, and phase), you could select a position (or set of positions) for a given snippet based on the area(s) where those grids best overlap. I would expect this to result in snippets which share a lot of the same words ending up close to each other.

I don’t think so (although I certainly may have read it wrong). As I understood, the word-SDRs need snippet-map-position, not the other way around.

How do you ground the word-SDRs?
Don’t you need to union the word-SDRs to represent the snippet-SDRs, or am I missing something?
The snippet-SDRs have to be clustered somehow.

No, I don’t think a snippet should be represented as an SDR, but as a position on a semantic map (a snippet is a single bit). The word SDRs are formed by placing a 1 bit in each position on the semantic map where that word was part of a snippet.

Like I said, though, I may have misinterpreted what Cortical.io was saying and implemented it differently than they did. For me, the tricky part wasn’t forming the word SDRs with proper semantics, it was doing it in a way where I could scale the SDRs down to more manageable dimensions. That requires organizing the snippet positions on the semantic map properly, not just distributing them randomly (though you could probably just use the SP algorithm to do the “scaling” instead, with no need for topologically relevant snippet positions).

How do you compare snippets (so you can cluster them)? Don’t you need to convert them to some sort of vector?

Yes, there are a lot of ML algorithms out there that you could use for that, though. IMO, the bit positioning is only important insofar as it allows you to easily scale the dimensions of your resulting SDRs. At the end of the day, the semantics of the bits are more important than their positions on the map (randomly distributed representations can be compared as easily as clustered representations – you’re just comparing percentages of overlapping bits)
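As a small illustration of the last point, the sketch below compares two SDRs before and after shuffling every bit position with the same random permutation. The `overlap_fraction` definition (shared bits divided by the smaller active count) is one common choice and is an assumption here, as are the example bit sets.

```python
import random

def overlap_fraction(a, b):
    """Shared active bits divided by the smaller number of active bits."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def permute(sdr, mapping):
    """Move every active bit to its new position under the given permutation."""
    return {mapping[i] for i in sdr}

size = 64
rng = random.Random(0)
mapping = list(range(size))
rng.shuffle(mapping)  # one fixed, random rearrangement of all bit positions

a = {1, 2, 3, 4, 5, 6}  # "clustered" representations with neighbouring bits
b = {4, 5, 6, 7, 8, 9}

before = overlap_fraction(a, b)
after = overlap_fraction(permute(a, mapping), permute(b, mapping))

print(before, after)
```

Because the same permutation is applied to both SDRs, the overlap is unchanged; the positions only start to matter once you want to downscale the map.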

You mean algorithms to convert snippets to vectors?

Check this paper: SIMILARITY PRESERVING COMPRESSIONS OF HIGH DIMENSIONAL SPARSE DATA

Here is an idea!

  • A limit on the number of sentences is chosen (5 in this case) to limit the size of the semantic map. In general we make it bigger than we need and compress the word vectors to our needs at the end.
  • New sentences are merged with the closest sentence by overlap.
    Then the sentence indexes are rejigged, i.e. clustered.

So we keep the semantic map the same size.

Words are numbered 1 … 10, and the nth bit of a sentence vector stands for the nth word.

s1 = w1:w2:w3 => 1110000000
s2 = w4:w3:w7 => 0011001000
s3 = w6:w9:w2 => 0100010010
s4 = w5:w8:w7 => 0000101100
s5 = w6:w2:w5 => 0100110000

A new sentence is merged into the closest existing one:

s6 = w1:w5:w7 => s4 = w1:w5:w8:w7

s4
from == 0000101100
to   => 1000101100
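The merge step can be sketched roughly as follows, using the five sentences listed here. The helper names `to_bits` and `merge_into_closest` are made up for illustration, and ties are broken by taking the first best match, which is an arbitrary choice.

```python
N_WORDS = 10

def to_bits(words):
    """Bit string whose nth character (1-based) is 1 if word n is present."""
    return "".join("1" if i in words else "0" for i in range(1, N_WORDS + 1))

# Each sentence slot is stored as the set of word numbers it contains.
sentences = {
    "s1": {1, 2, 3},
    "s2": {4, 3, 7},
    "s3": {6, 9, 2},
    "s4": {5, 8, 7},
    "s5": {6, 2, 5},
}

def merge_into_closest(sentences, new_words):
    """Union new_words into the slot with the largest overlap; return its key."""
    best = max(sentences, key=lambda k: len(sentences[k] & new_words))
    sentences[best] |= new_words
    return best

target = merge_into_closest(sentences, {1, 5, 7})  # s6 = w1:w5:w7
print(target, to_bits(sentences[target]))
```

Since every merge is a union, the slots can only gain bits, so the number of slots stays fixed while each slot gets denser over time.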

The word vector is generated from the indices of the sentences in which the word occurs with the highest frequency.

The problem is that after a lot of sentences the sentence vector will tend to become 1111111111,
so there needs to be a way to remove bits too?

Maybe keep a frequency of occurrence when merging and purge the low counts from time to time?
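One way this could look, as a rough sketch: each sentence slot keeps a count per word instead of a plain bit, and a periodic purge drops words below a threshold. The threshold `min_count=2`, the sample sentences, and the function names are all arbitrary assumptions.

```python
from collections import Counter

def merge_with_counts(counts, words):
    """Record one more sighting for every word in the incoming sentence."""
    counts.update(words)

def purge(counts, min_count):
    """Drop words seen fewer than min_count times; return the surviving words."""
    for word in [w for w, c in counts.items() if c < min_count]:
        del counts[word]
    return set(counts)

# One sentence slot, fed four made-up sentences (lists of word numbers).
slot = Counter()
for sentence in ([5, 8, 7], [1, 5, 7], [5, 7, 9], [7, 2, 5]):
    merge_with_counts(slot, sentence)

active = purge(slot, min_count=2)
print(sorted(active))
```

Only the words that keep recurring in the slot survive the purge, which stops the sentence vector from saturating.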

And the clustering should happen at the end, unless we figure out an online algorithm.

What do you think?

Consider habituation: getting tired.

If you get an unchanging input, you do something different, like stop adding connections.
