In case you haven’t seen it:
There is also GloVe from Stanford University.
"GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space."
They can make large vectors but I see no sparsity control.
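GloVe's training starts from a word-word co-occurrence count table. Below is a minimal sketch of that counting step, plus one hypothetical form of sparsity control: keeping only the top-k entries of a word's row. This is not GloVe's actual objective, which fits dense vectors to the logarithm of these counts. `cooccurrence_counts`, `sparsify_top_k`, and the toy sentence are illustrative assumptions.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count how often each (word, context word) pair appears within `window` positions.

    Unweighted for simplicity; GloVe itself down-weights more distant context words.
    """
    counts = Counter()
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(center, tokens[j])] += 1
    return counts


def sparsify_top_k(row, k):
    """Hypothetical sparsity control: keep only the k largest counts in a row.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


tokens = "the cat sat on the mat the cat ran".split()
counts = cooccurrence_counts(tokens, window=1)
cat_row = {ctx: n for (word, ctx), n in counts.items() if word == "cat"}
print(cat_row)                     # {'the': 2, 'sat': 1, 'ran': 1}
print(sparsify_top_k(cat_row, 2))  # {'the': 2, 'ran': 1}
```

Top-k is only one possible knob. It fixes the number of active entries per word, which is the kind of control an SDR-minded reader would look for.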
Interesting. They also use semantic information obtained from an archive of words. Although the essential ingredient is probability, the footprint of word occurrences is again the central theme.
Out of curiosity - how else does a human learn semantics if not from word proximity?
Meaning from context is still word proximity.
Explanation in teaching is at its core also word proximity.
I see the basic mechanism as coincidence detection.
Coincidence detection seems to be a universal theme in learning. On non-verbal teaching, I still see this as (sometimes guided) coincidence detection.
If you mean language learning mechanisms other than word occurrences, then I would say associative learning, wherein symbols are constantly associated with a growing list of patterns on the go. Word occurrences can help build relationships among learned patterns and hence relate symbols that way.
In this thread, I assert that a significant part of what we associate with intelligence comes with language learning. If you are satisfied with making an AGI with the capabilities of a cat, then you don’t need language, but your AGI will also have significantly less function.
I do struggle with the difference between:
chunk … chunk
chunk … relationship … chunk.
where … is a transition.
I used to think that the tuple is a basic unit but this proximity question keeps haunting me.
Does simple proximity count as a relationship?
Am I looking at this wrong; should I see the tuple as two separate things?
chunk … relationship
relationship … chunk
Another possibility is proximity in the WHAT stream vs relationship in the WHERE stream.
I think much of the logical and systemic association we are able to carry out in the brain comes from manipulating symbolic meanings/groupings using certain rules. It’s easy to relate symbols using rules. Plus, the processing space required for abstractions and associations is reduced a lot if one uses symbols. Maybe we could make our AGI work without high-level language, given an enormous amount of processing space and power. I think.
How do you feel about a symbol being a certain grid pattern and the association being a transition to a different grid pattern?
I’d rather think of the symbol as a high-order semantic SDR and the association as something pertaining to the grid pattern or the location stream. Perhaps by linking representations of symbols with particular locations on the grid that pertain to particular relationships or associations.
Symbolic semantics might be an accessory rather than a necessity
See Yoshua Bengio’s paper “The Consciousness Prior”; he agrees with you that symbols/pointers are high-order semantic SDRs. I would say HTMs are the association memory. We just drop the H and turn the T (temporal) into sequence order. HTMs are the perfect memory for just about everything.
Guilt by association, heck yes. Most people do not think deeply, simple association is heavily used.
I’ve been trying to decide where to put in my two cents, and this seems as good a place as any. There have been several threads dealing with language lately, and for good reason.
One asked if language was required for intelligence (no). Another asked whether symbolic semantics might be an accessory rather than a necessity. Yes, symbolic semantics are an accessory, just as the neocortex is an accessory. Both flora and fauna have operated for billions of years without a neocortex, but it seems to me that what we are discussing here is the neocortex. In other words, there are all kinds of intelligence, but what we are after is what happens in the last few layers of the brain.
So much for the obvious. The reason I’m butting in is that in my world, words are several levels up from the first level of the neocortex, three by my system. The first is the letter, the second the syllable, and then the word. After that there is the idiomatic expression (made up of words), and then the complete sentence, which is made up of words and idiomatic expressions.
This is a long way to go to say that the word is more complex than a single SDR. While I’m at it I might posit that the “meaning” is in the complete sentence and that is where we might approach “intelligence” if we are going to use language as the vehicle, and I can’t think of another way.
For instance, take the letter a. It is the first character of the Roman alphabet. It is a syllable in the word Paladin. And it stands alone as a word. As far as I can think, it is the only letter that is also a syllable and a word. By coincidence (or maybe not), the first stroke in written Chinese, 一 (pronounced YI), is the only stroke (character) I can think of that is also a radical (syllable) and a word; it means one. The point is that words are composite, and turning them into plurals is done on the second level, not the third.
At this point I suppose I must lay my cards on the table, and that would be my own HTM, which I conceived in 1977 to write Chinese. Of course my dumb terminal couldn’t do it, and I had to wait ’til the Mac came out to actually do it. I did a proof of concept for Apple in ’86. Unfortunately I blew the negotiation, and the VP, Jean-Louis Gassée, felt that my faux pas was more important than the Chinese market, so it lived in a drawer ’til Jeff’s book came out. Since then I’ve tried to reconcile the original concept with NuPIC.
Here is the original, compiled on a Mac 512, printed on a LaserWriter I with a bit of 72 dpi ImageWriter output tacked on at the bottom.
I’m coming at this question from the perspective of having a deep learning background.
In deep learning, word2vec (or GloVe) generally works by taking all the words that exist in a corpus, sorting them, and then creating a potentially large representation, depending on the corpus.
If I had a corpus of a couple of sentences, “Hello, how are you? I’m fine, thank you, and you?”, this would break down into a vector of 8 binary values, one for each distinct word (how you choose to deal with punctuation varies; I’m ignoring it here, though of course it really does matter for context).
Then a sliding window moves over the line, encoding the vectors. The width of this window can vary. If we order our possible words alphabetically and use a three-word window, our first vector would encode:
“are, hello, how”
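The vocabulary-and-window encoding described above can be sketched in a few lines of Python. This only illustrates the post's description, under some assumptions. Punctuation is dropped by a simple regular expression. An ASCII apostrophe stands in for the curly one. Each three-word window is collapsed into one multi-hot vector, so word order and repeats within a window are lost. Production word2vec implementations instead feed word indices into a learned embedding table. `tokenize`, `encode_windows`, and `active_words` are hypothetical helper names.

```python
import re


def tokenize(text):
    """Lowercase and keep only letters/apostrophes, ignoring punctuation as the post does."""
    return re.findall(r"[a-z']+", text.lower())


def encode_windows(tokens, vocab, width=3):
    """One multi-hot vector per sliding window: bit i is set if vocab[i] is in the window."""
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for start in range(len(tokens) - width + 1):
        vec = [0] * len(vocab)
        for word in tokens[start:start + width]:
            vec[index[word]] = 1
        vectors.append(vec)
    return vectors


def active_words(vec, vocab):
    """Decode a vector back into the (alphabetically ordered) words it contains."""
    return [word for word, bit in zip(vocab, vec) if bit]


text = "Hello, how are you? I'm fine, thank you, and you?"
tokens = tokenize(text)
vocab = sorted(set(tokens))  # alphabetical order, 8 distinct words
vectors = encode_windows(tokens, vocab)
print(len(vocab), vectors[0])          # 8 [0, 1, 0, 1, 1, 0, 0, 0]
print(active_words(vectors[0], vocab))  # ['are', 'hello', 'how']
```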
The vector for each window is then fed into a neural network. The goal in deep learning then becomes training a network to predict, from a given input vector, the sentiment, possible responses, next word in the sequence, etc., using massive amounts of data: calculating the error value and using backpropagation to adjust the weights of the “neurons” in the system.
Depending on your implementation, architecture choices, and training data, some decent systems are built out of this sort of approach. But it takes a ton of time and training to get to that level: lots of electricity is involved, and frequently you don’t know for certain whether your tweaks will make the system better or worse. Deep learning, while powerful, produces idiot-savant systems that struggle to move beyond the area on which they were trained. They don’t really learn in real time, and the methods of training them to get good results are far removed from how our brain seems to work. But there are ideas we can learn from it, too.
Sometimes, since certain word combinations occur frequently in any language, designs for a large corpus will incorporate hashmaps/dictionaries to store calculated values (a lookup can be cheaper than doing the same math operations repeatedly). These chunks have meaning and influence context.
Perhaps in HTM we could employ hashmaps/dictionaries in an attempt to speed up calculations? Or take a hash of a layer state rather than reading through an entire binary array, so that when we see two previously calculated hashes pop up, we can recall the results rather than do everything from scratch.
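Here is a minimal sketch of that memoization idea, assuming the layer state is a set of active bit indices and the expensive step is a pure function of that state. `LayerStateCache` and `expensive_step` are hypothetical names, not part of NuPIC or any HTM library.

```python
class LayerStateCache:
    """Memoize an expensive computation keyed on a sparse layer state."""

    def __init__(self, compute):
        self.compute = compute
        self.table = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, active_bits):
        # A frozenset of active indices is hashable and ignores ordering, so the
        # same state always maps to the same dictionary slot.
        key = frozenset(active_bits)
        if key in self.table:
            self.hits += 1
        else:
            self.misses += 1
            self.table[key] = self.compute(key)
        return self.table[key]

    def clear(self):
        """Drop all cached results, e.g. after learning changes the synapses."""
        self.table.clear()


def expensive_step(active):
    """Hypothetical stand-in for a costly function of the layer state."""
    return sorted(2 * i for i in active)


cache = LayerStateCache(expensive_step)
first = cache([3, 1, 7])
second = cache([7, 3, 1])  # same state, different order: served from the table
print(first, cache.hits, cache.misses)  # [2, 6, 14] 1 1
```

Using the whole frozenset as the key, rather than storing only a hash digest, means a hash collision can never return a wrong result, because a Python dict also checks equality. The cache is only valid while the computation is a pure function of the state. In HTM, learning changes permanences, so the table would need clearing whenever learning alters the result.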
In case you haven’t seen it, this Cortical.io video gives a good, brief summary of their system. It also uses a corpus to form the semantic map, and takes advantage of the power of SDRs to capture a lot of meaning in a small space, even whole sentences at once. I’m no expert at all on NLP, just a big fan of the HTM-based approach, because our brains have to capture a lot of meaning and have to do it efficiently without a lot of math and energy use, a constraint that DNNs don’t generally seem to adhere to.
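The “meaning in a small space” idea rests on overlap: two SDRs are semantically similar to the extent that they share active bits. Below is a toy sketch that uses made-up bit indices and an assumed 2048-bit width. These are not Cortical.io fingerprints, and the functions are not their API.

```python
SDR_WIDTH = 2048  # assumed width for this toy example


def sparsity(sdr):
    """Fraction of bits that are active."""
    return len(sdr) / SDR_WIDTH


def overlap(a, b):
    """Number of active bits shared by two SDRs (sets of active indices)."""
    return len(a & b)


def overlap_score(a, b):
    """Overlap normalized by the smaller SDR, in [0, 1]."""
    return overlap(a, b) / min(len(a), len(b))


# Made-up SDRs of 40 active bits each (about 2% sparsity).
dog = set(range(0, 40))
wolf = set(range(20, 60))     # shares 20 bits with dog
car = set(range(1000, 1040))  # shares none

print(overlap(dog, wolf), overlap(dog, car))  # 20 0
print(overlap_score(dog, wolf))               # 0.5
print(round(sparsity(dog), 4))                # 0.0195
```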
I hate to rock the boat here, but the formation of the Cortical.io retina involves the use of those nasty old traditional techniques that @maxlee was just outlining.
In this case, they use a SOM (self-organizing map) technique.
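For readers unfamiliar with SOMs, a generic Kohonen self-organizing map fits in a short sketch. Each grid node holds a weight vector, and each input finds its best-matching unit (BMU). The BMU and its grid neighbors then move toward the input, while the learning rate and neighborhood radius shrink over training. This is a textbook SOM on toy 2-D points, not Cortical.io's retina pipeline. The grid size, decay schedules, and data are assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2  # initial neighborhood radius, in grid units
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3  # shrinking neighborhood radius
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull node toward the input
            step += 1
    return weights


cluster_a = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05)]
cluster_b = [(1.0, 1.0), (0.95, 1.0), (1.0, 0.95)]
som = train_som(cluster_a + cluster_b, rows=3, cols=3)
print({best_matching_unit(som, x) for x in cluster_a})
print({best_matching_unit(som, x) for x in cluster_b})
```

This is a batch-of-epochs, decaying-schedule procedure. That is part of why it is hard to turn into the kind of continual online learning discussed in the next few posts.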
Yes, they certainly depart from neuroscience in forming the semantic map, as you’d know much better than I. Ultimately this map should be formed in an online fashion, the way the rest of HTM theory works; how that happens is a fascinating question. In the absence of that understanding, they’ve used some non-biological machinery to fill the gaps.
I know SOM isn’t the only such machinery they use, though it seems more similar to HTM than any other ANN mechanism I know of (a little like learning in the SP, at least). Still, if the SDRs they produce are viable in their semantic overlap, their system seems as good as any for the time being (?).
They use HTMs to read out the data after it is organized.
I am working through how to form the maps with online learning, and it seems to be a very hard problem.
I am working on the same problem. I’ve used eligibility traces effectively for the online part, but still working out the topology element. Current theory is that grids could be used for this, but haven’t gotten into the weeds with that idea yet.
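Eligibility traces come in several forms. A common one is an accumulating trace that decays exponentially each step and is bumped whenever a unit is active. The sketch below shows how such a trace could drive online word-to-word association learning with no stored corpus: each newly active word strengthens its link from every word still “eligible” in the trace. The class name, decay, and learning rate are hypothetical. This is not the poster's implementation, and it covers only the online part, not the topology.

```python
class TraceAssociator:
    """Online word-to-word association learning driven by eligibility traces."""

    def __init__(self, n_words, decay=0.5, lr=1.0):
        self.decay = decay
        self.lr = lr
        self.trace = [0.0] * n_words
        # assoc[i][j]: learned strength of "word i recently preceded word j"
        self.assoc = [[0.0] * n_words for _ in range(n_words)]

    def observe(self, active):
        # 1. Credit every recently active word, weighted by its remaining trace.
        for j in active:
            for i, t in enumerate(self.trace):
                if t > 0.0 and i != j:
                    self.assoc[i][j] += self.lr * t
        # 2. Decay all traces, then bump the traces of the currently active words.
        self.trace = [t * self.decay for t in self.trace]
        for j in active:
            self.trace[j] += 1.0


model = TraceAssociator(n_words=3)
for step in ([0], [1], [2]):  # word 0, then word 1, then word 2
    model.observe(step)
print(model.assoc)  # [[0.0, 1.0, 0.5], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
print(model.trace)  # [0.25, 0.5, 1.0]
```

Because the trace decays geometrically, a word seen k steps earlier contributes decay**(k-1) per co-occurrence. The associations therefore approximate distance-weighted co-occurrence counts, accumulated one step at a time.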
There are a large number of people who have no clue how to write but are perfectly fluent in a language.
I think that you can say that letters/writing are an artifact to capture either syllables (most alphabet-based languages and Hangul) or word sounds (pictograph-based languages).
By the same token, the stream of syllables to create the elaborate alert, mating, and dominance calls (the basis of language) that animals use for signaling is an artifact of our biological sound production hardware.
If you are looking to create biologically based systems and are trying to decide on the correct level to match up with the human neural hardware, I would offer that our human semantic grounding in the speaking-hearing axis along the arcuate fasciculus pathway may be the best place to start (syllables).
Hangul: possibly the best match between spoken syllables and writing?
Esperanto NLP using HTM and my findings