Words to SDR?

Paul, yes, that is perfect. Thanks.

Is it safe to say that the cortical.io encoder works on data that’s exclusively in the form of words? So, all semantic information that is encoded pertains to the resulting space of words?

Yes. In their implementation of semantic folding, semantics are distilled by first defining all the source text snippets (from Wikipedia) and positioning them on the semantic map such that snippets that share a lot of the same words are closer to each other than snippets which do not. In other words, the focus is purely NLP.

That said, the word SDRs themselves can be combined and sparsified to create SDRs for other non-word concepts, since pretty much any concept can be defined by a set of keywords. So they are still quite useful for non-NLP problems.
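For example, combining word SDRs into a concept SDR could look something like this. This is a minimal sketch, not cortical.io's actual API; the sizes and the voting heuristic are my own assumptions:

```python
SDR_SIZE = 16384       # assumed retina size (cortical.io uses a 128x128 grid)
TARGET_ACTIVE = 328    # assumed target of roughly 2% active bits

def union_and_sparsify(word_sdrs, target_active=TARGET_ACTIVE):
    """Combine several word SDRs (sets of active bit indices) into one
    SDR for a composite concept. Keeps the bits shared by the most
    words -- a simple voting heuristic, not cortical.io's method."""
    votes = {}
    for sdr in word_sdrs:
        for bit in sdr:
            votes[bit] = votes.get(bit, 0) + 1
    ranked = sorted(votes, key=lambda b: votes[b], reverse=True)
    return set(ranked[:target_active])

# e.g. an SDR for "pet" sketched from its keywords, where sdr() stands
# for a hypothetical word-SDR lookup:
# pet_sdr = union_and_sparsify([sdr("dog"), sdr("cat"), sdr("animal")])
```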


Thanks.
Are there any known encoders that use other techniques to convert words to SDRs?

In case you haven’t seen it:


There is also GloVe from Stanford University.
https://nlp.stanford.edu/projects/glove/
"GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. "

They can make large vectors but I see no sparsity control.
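One could bolt sparsity on after the fact, though. Here is a rough sketch of one ad hoc approach (my own, not anything GloVe provides; all names and sizes are illustrative) that binarizes a dense GloVe vector by keeping only the top-k components of a fixed random projection:

```python
import numpy as np

GLOVE_DIM = 300     # e.g. glove.6B.300d vectors
N_BITS = 2048       # illustrative SDR width
N_ACTIVE = 40       # illustrative ~2% sparsity

# One fixed random projection shared by all words, so similar dense
# vectors land on overlapping sets of active bits.
_rng = np.random.default_rng(42)
_proj = _rng.standard_normal((N_BITS, GLOVE_DIM))

def glove_to_sdr(vec):
    """Binarize a dense GloVe vector by keeping the top N_ACTIVE
    projected components as active bits."""
    scores = _proj @ vec
    sdr = np.zeros(N_BITS, dtype=np.uint8)
    sdr[np.argsort(scores)[-N_ACTIVE:]] = 1
    return sdr
```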


Interesting. They also use semantic information obtained from an archive of words. Although the essential aspect is probabilities, the footprint of word co-occurrences is again the central theme.

Out of curiosity - how else does a human learn semantics if not from word proximity?
Meaning from context is still word proximity.
Explanation in teaching is at its core also word proximity.

I see the basic mechanism as coincidence detection.

Coincidence detection seems to be a universal theme in learning. As for non-verbal teaching, I still see this as (sometimes guided) coincidence detection.


If you mean language learning mechanisms other than word occurrences, then I would say associative learning, wherein symbols are constantly associated with a growing list of patterns on the go. Word occurrences can help build relationships among learned patterns and hence relate symbols that way.

In this thread, I assert that a significant part of what we associate with intelligence comes with language learning. If you are satisfied with making an AGI with the capabilities of a cat, then you don’t need language, but you will also have significantly less function in your AGI.

I do struggle with the difference between:

chunk … chunk
and
chunk … relationship … chunk.

where … is a transition.

I used to think that the tuple is a basic unit but this proximity question keeps haunting me.

Does simple proximity count as a relationship?

Am I looking at this wrong; should I see the tuple as two separate things?

chunk … relationship
and
relationship … chunk

Another possibility is proximity in the WHAT stream vs relationship in the WHERE stream.


I think much of the logical and systemic association we are able to carry out in the brain is because of manipulation of symbolic meanings/groupings using certain rules. It’s easy to relate symbols using rules. Plus, the processing space required for abstractions and associations is reduced by a lot if one uses symbols. Maybe we could make an AGI work without high-level language, given substantially enormous processing space and power. I think.

How do you feel about a symbol being a certain grid pattern and the association being a transition to a different grid pattern?


I’d rather think of the symbol as a high order semantic SDR and the association as something pertaining to the grid pattern or the location stream. Perhaps, linking representations of symbols with particular locations on grid that pertain to particular relationships or associations.

See Yoshua Bengio’s paper “The Consciousness Prior”; he agrees with you that symbols/pointers are high-order semantic SDRs. I would say HTMs are the associative memory. We just drop the H and turn T (temporal) into sequence order. HTMs are the perfect memory for just about everything.

Guilt by association, heck yes. Most people do not think deeply; simple association is heavily used.

I’ve been trying to decide where to put my two cents in, and this seems as good a place as any. There have been several threads dealing with language lately, and for good reason.
One asked if language was required for intelligence (no). Another asked whether symbolic semantics might be an accessory rather than a necessity. Yes, symbolic semantics are an accessory, like the neocortex is an accessory. Both flora and fauna have operated for billions of years without a neocortex, but it seems to me that what we are discussing here is the neocortex. In other words, there are all kinds of intelligence, but what we are after is what happens in the last few layers of the brain.

So much for the obvious. The reason I’m butting in is that in my world words are several levels up from the first level of the neocortex, three by my system. The first is the letter, the second the syllable, and then the word. After that there is the idiomatic expression (made up of words) and then the complete sentence, which is made up of words and idiomatic expressions.

This is a long way to go to say that the word is more complex than a single SDR. While I’m at it, I might posit that the “meaning” is in the complete sentence, and that is where we might approach “intelligence” if we are going to use language as the vehicle, and I can’t think of another way.

For instance: the letter a. It is the first character of the Roman alphabet. It is a syllable in the word Paladin. And it stands alone as a word. As far as I can think, it is the only letter that is also a syllable and a word. By coincidence (or maybe not), the first stroke in the Chinese language, 一 (pronounced yi), is the only stroke (character) that I can think of that is also a radical (syllable) and a word; it means one. The point is that words are composite, and turning them into plurals is done on the second level, not the third.

At this point I suppose I must lay my cards on the table, and that would be my own HTM, which I conceived in 1977 to write Chinese. Of course my dumb terminal wouldn’t do it, and I had to wait ’til the Mac came out to actually do it. I did a proof of concept for Apple in ’86. Unfortunately I blew the negotiation, and the VP Jean-Louis Gassée felt that my faux pas was more important than the Chinese market, so it lived in a drawer ’til Jeff’s book came out. Since then I’ve tried to reconcile the original concept with NuPIC.

Here is the original, compiled on a Mac 512, printed on a LaserWriter I with a bit of 72 dpi ImageWriter tacked on at the bottom.


I’m coming at this question from the perspective of having a deep learning background.

In deep learning, word2vec (GloVe) generally works by taking all the words that exist in a corpus, sorting them, then creating a potentially large representation, depending on the corpus.

If I had a corpus of a couple of sentences, “Hello, how are you? I’m fine, thank you, and you?”, this would break down into a vector of length 8, one binary value for each possible word (how you choose to deal with punctuation varies… I’m ignoring it here, while of course it really does matter for context).

Then there is a sliding window that goes over the line, encoding the vectors. The width of this window can vary. If we order our vocabulary alphabetically, and have a three-word window, we could have our first vector encoded as:
0,1,0,1,1,0,0,0
for
“are, hello, how”

That vector is then fed into a neural network. The goal in deep learning then becomes to train a network to predict, based on a given input vector, the sentiment, possible responses, next word in the sequence, etc., based on massive amounts of data, calculating the error value and backpropagating to adjust the weights for the “neurons” in the system.
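A quick sketch of that encoding step, reproducing the toy example above (tokenization and names here are just illustrative):

```python
import re

corpus = "Hello, how are you? I'm fine, thank you, and you?"

# Vocabulary: unique lowercased words, sorted alphabetically (punctuation ignored).
words = re.findall(r"[a-z']+", corpus.lower())
vocab = sorted(set(words))   # ['and', 'are', 'fine', 'hello', 'how', "i'm", 'thank', 'you']
index = {w: i for i, w in enumerate(vocab)}

def window_vectors(tokens, width=3):
    """Yield one binary bag-of-words vector per sliding window."""
    for start in range(len(tokens) - width + 1):
        vec = [0] * len(vocab)
        for w in tokens[start:start + width]:
            vec[index[w]] = 1
        yield vec

print(next(window_vectors(words)))   # [0, 1, 0, 1, 1, 0, 0, 0] for "hello how are"
```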

Depending on your implementation, architecture choices, and training data, some decent systems are built out of this sort of approach. But it takes a ton of time and training to get to that level… lots of electricity is involved, and you frequently don’t know for certain whether your tweaks will actually make the system better or worse. Deep learning, while powerful, produces idiot-savant systems that struggle to move beyond the area on which they were trained. They don’t really learn in real time, and the methods of training them to get good results are far removed from how our brain seems to work. But there are ideas we can learn from it too.

Sometimes, since certain word combinations occur frequently in any language, for a large corpus these designs will incorporate hashmaps/dictionaries to store calculated values (as a lookup can be cheaper than doing the same math operations repeatedly). These chunks have meaning and influence context.

Perhaps in HTM we could employ hashmaps/dictionaries in an attempt to speed up calculations? Or take a hash of a layer state rather than read through an entire binary array, so that when we see a previously computed hash pop up, we can recall the cached results rather than doing everything from scratch.
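As a rough sketch of that idea (the names and the “expensive step” are hypothetical placeholders, not anything from NuPIC):

```python
_cache = {}

def sdr_key(active_bits):
    """Hashable fingerprint of a layer state (its active bit indices)."""
    return tuple(sorted(active_bits))

def compute_with_memo(active_bits, expensive_step):
    """Reuse a previously computed result for an already-seen layer
    state instead of recomputing it. `expensive_step` stands in for
    whatever costly operation would otherwise be rerun (overlap
    scores, column activations, ...)."""
    key = sdr_key(active_bits)
    if key not in _cache:
        _cache[key] = expensive_step(active_bits)
    return _cache[key]
```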


In case you haven’t seen it, this cortical.io video gives a good and brief summary of their system. It uses a corpus as well to form the semantic map, and takes advantage of the power of SDRs to capture a lot of meaning in a small space, even whole sentences at once. I’m no expert at all on NLP, just a big fan of the HTM-based approach, because our brains have to capture a lot of meaning and they have to do it efficiently without a lot of math and energy use - a constraint that DNNs don’t seem to generally adhere to.


I hate to rock the boat here, but the formation of the Cortical IO retina involves the use of those nasty old traditional techniques that @maxlee was just outlining.

In this case, they use an SOM (self-organizing map) technique.

http://www.ai-junkie.com/ann/som/som1.html

http://users.ics.aalto.fi/jhollmen/dippa/node9.html
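For the curious, here is a minimal generic SOM in the spirit of those tutorials. It is an illustrative toy, not cortical.io's actual pipeline; the grid size, decay schedules, and parameter names are all my own:

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, radius0=5.0):
    """Minimal self-organizing map: each input pulls its best-matching
    unit (and that unit's grid neighbors) toward itself.

    data: (n_samples, dim) array, e.g. word co-occurrence vectors.
    Returns the (rows, cols, dim) weight grid."""
    rng = np.random.default_rng(0)
    rows, cols = grid
    w = rng.random((rows, cols, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            t = step / n_steps
            lr = lr0 * (1 - t)                # decaying learning rate
            radius = radius0 * (1 - t) + 1    # shrinking neighborhood
            # Best-matching unit: the grid cell whose weights are closest to x.
            d = np.linalg.norm(w - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighborhood on the 2-D grid around the BMU.
            g = np.exp(-np.sum((coords - bmu) ** 2, axis=-1) / (2 * radius ** 2))
            w += lr * g[..., None] * (x - w)
            step += 1
    return w
```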
