Words to SDR?


#10

Out of curiosity - how else does a human learn semantics if not from word proximity?
Meaning from context is still word proximity.
Explanation in teaching is at its core also word proximity.

I see the basic mechanism as coincidence detection.

Coincidence detection seems to be a universal theme in learning. On non-verbal teaching, I still see this as (sometimes guided) coincidence detection.


#11

If you mean language learning mechanisms other than word occurrences, then I would say associative learning, wherein symbols are constantly associated with a growing list of patterns on the go. Word occurrences can help build relationships among learned patterns and hence relate symbols that way.


#12

In this thread, I assert that the significant part of what we associate with intelligence comes with language learning. If you are satisfied with making an AGI with the capabilities of a cat then you don’t need language but you will also have significantly less function in your AGI.


#13

I do struggle with the difference between:

chunk … chunk
and
chunk … relationship … chunk.

where … is a transition.

I used to think that the tuple is a basic unit but this proximity question keeps haunting me.

Does simple proximity count as a relationship?

Am I looking at this wrong; should I see the tuple as two separate things?

chunk … relationship
and
relationship … chunk

Another possibility is proximity in the WHAT stream vs relationship in the WHERE stream.


#14

I think much of the logical and systemic association we are able to carry out in the brain is because of manipulation of symbolic meanings/groupings using certain rules. It’s easy to relate symbols using rules. Plus the processing space required for abstractions and associations is reduced by a lot if one uses symbols. Maybe we can work our AGI without high level language, but only with enormous processing space and power. I think.


#15

How do you feel about a symbol being a certain grid pattern and the association being a transition to a different grid pattern?


#16

I’d rather think of the symbol as a high order semantic SDR and the association as something pertaining to the grid pattern or the location stream. Perhaps linking representations of symbols with particular locations on the grid that pertain to particular relationships or associations.


Symbolic semantics might be an accessory rather than a necessity
#17

See Yoshua Bengio’s paper “The Consciousness Prior”; he agrees with you that symbols/pointers are high order semantic SDRs. I would say HTMs are the associative memory. We just drop the H and turn T (temporal) into sequence order. HTMs are the perfect memory for just about everything.


#18

Guilt by association, heck yes. Most people do not think deeply, simple association is heavily used.


#19

I’ve been trying to decide where to put my two cents in and this seems as good a place as any. There have been several threads dealing with language lately and for good reason.
One asked if language was required for intelligence (no). Another asked whether symbolic semantics might be an accessory rather than a necessity. Yes, symbolic semantics are an accessory like the neocortex is an accessory. Both flora and fauna have operated for billions of years without a neocortex but it seems to me that what we are discussing here is the neocortex. In other words, there are all kinds of intelligence but what we are after is what happens in the last few layers of the brain.

So much for the obvious. The reason I’m butting in is that in my world words are several levels up from the first level of the neocortex, three by my system. The first is the letter, the second the syllable, and then the word. After that there is the idiomatic expression (made up of words) and then the complete sentence, which is made up of words and idiomatic expressions.

This is a long way to go to say that the word is more complex than a single SDR. While I’m at it I might posit that the “meaning” is in the complete sentence and that is where we might approach “intelligence” if we are going to use language as the vehicle, and I can’t think of another way.

For instance: the letter a. It is the first character of the Roman alphabet. It is a syllable in the word Paladin. And it stands alone as a word. As far as I can think it is the only letter that is also a syllable and a word. By coincidence (or maybe not) the first stroke in the Chinese language, 一 (pronounced YI), is the only stroke (character) that I can think of that is also a radical (syllable) and a word; it means one. The point is that words are composite and turning them into plurals is done on the second level, not the third.

At this point I suppose I must lay my cards on the table and that would be my own HTM, which I conceived in 1977 to write Chinese. Of course my dumb terminal wouldn’t do it and I had to wait ’til the Mac came out to actually do it. I did a proof of concept for Apple in ’86. Unfortunately I blew the negotiation and the VP Jean-Louis Gassée felt that my faux pas was more important than the Chinese market, so it lived in a drawer ’til Jeff’s book came out. Since then I’ve tried to reconcile the original concept with NuPIC.

Here is the original, compiled on a Mac 512, printed on a LaserWriter I with a bit of 72 dpi ImageWriter tacked on at the bottom.


#20

I’m coming at this question from the perspective of having a deep learning background.

In deep learning, word2vec (or GloVe) generally works by taking all the words that exist in a corpus, sorting them, then creating a potentially large representation, depending on the corpus.

If I had a corpus of a couple sentences, “Hello, how are you? I’m fine, thank you, and you?”, this would break down into a potential vector of 8 binary values, one for each possible word (how you choose to deal with punctuation varies… I’m ignoring it here, while of course it really does matter for context).

Then there is a sliding window that goes over the line, encoding the vectors. The width of this window can vary. If we order our possible vectors alphabetically, and have a three word window, we could have our first vector encoding as:
0,1,0,1,1,0,0,0
for
“are, hello, how”
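The sliding-window encoding described above can be sketched in a few lines of Python. This is only a toy illustration of the multi-hot window vectors, not the learned embeddings that word2vec or GloVe ultimately produce. The function names `tokenize` and `window_vectors` are made up for this example, and the corpus uses a plain ASCII apostrophe.

```python
import re

def tokenize(text):
    # Lowercase and keep letters and apostrophes only; punctuation is dropped.
    return re.findall(r"[a-z']+", text.lower())

def window_vectors(tokens, width=3):
    # Vocabulary is the alphabetically sorted set of unique words.
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    # Slide a fixed-width window over the token stream.
    for start in range(len(tokens) - width + 1):
        vec = [0] * len(vocab)
        for word in tokens[start:start + width]:
            vec[index[word]] = 1  # mark every word present in the window
        vectors.append(vec)
    return vocab, vectors

corpus = "Hello, how are you? I'm fine, thank you, and you?"
vocab, vectors = window_vectors(tokenize(corpus))
print(vocab)       # ['and', 'are', 'fine', 'hello', 'how', "i'm", 'thank', 'you']
print(vectors[0])  # [0, 1, 0, 1, 1, 0, 0, 0]  -> "are, hello, how"
```

Note that the vector records which words are in the window but not their order, which is why the first window reads “are, hello, how” when listed alphabetically.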

That vector is then fed into a neural network. The goal in deep learning then becomes to train a network to predict, based on a given input vector, the sentiment, possible responses, next word in the sequence, etc., based on massive amounts of data, calculating the error value, and back propagation to adjust the weights for the “neurons” in the system.

Depending on your implementation, architecture choices, and training data, some decent systems are built out of this sort of approach. But it takes a ton of time and training to get to that level… lots of electricity is involved, frequently not knowing for certain if your tweaks will actually make the system better or worse. Deep Learning, while being powerful, produces idiot savant systems that struggle to move beyond the area on which they were trained. They don’t really learn in real time, and the methods of training them to get good results are far removed from how our brain seems to work. But there are ideas we can also learn from it too.

Sometimes, since certain word combinations occur frequently in any language, for a large corpus these designs will incorporate hashmaps/dictionaries to store calculated values (as a lookup can be cheaper than doing the same maths operations repeatedly). These chunks have meaning and influence context.

Perhaps in HTM we could employ hashmaps/dictionaries in an attempt to speed up calculations? Or take a hash of a layer state rather than read through an entire binary array, so that when we see two previously calculated hashes pop up, we can recall the results rather than do everything from scratch.
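To make the hashing idea concrete, here is a minimal sketch, assuming a layer state can be flattened to a list of 0/1 values. `StateCache` and `expensive_step` are hypothetical names, not part of NuPIC or any HTM library, and `expensive_step` is only a stand-in for whatever computation would be cached.

```python
import hashlib

class StateCache:
    """Memoize a computation keyed by a hash of a binary layer state."""

    def __init__(self, compute):
        self._compute = compute
        self._results = {}
        self.hits = 0

    @staticmethod
    def key(bits):
        # One byte per bit keeps the sketch simple; a real system would pack bits.
        return hashlib.blake2b(bytes(bits), digest_size=16).hexdigest()

    def __call__(self, bits):
        k = self.key(bits)
        if k in self._results:
            self.hits += 1  # seen this exact state before: reuse the result
            return self._results[k]
        result = self._compute(bits)
        self._results[k] = result
        return result

def expensive_step(bits):
    # Hypothetical placeholder: return the indices of the active bits.
    return [i for i, b in enumerate(bits) if b]

cache = StateCache(expensive_step)
state = [0, 1, 0, 1, 1, 0, 0, 0]
first = cache(state)   # computed
second = cache(state)  # served from the cache
print(first, second, cache.hits)
```

One caveat with this design: a cached result is only valid while the underlying computation is unchanged, so in a system that learns continuously the cache would need to be invalidated whenever the relevant synapses are updated.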


#21

In case you haven’t seen it, this Cortical.io video gives a good and brief summary of their system. It uses a corpus as well to form the semantic map, and takes advantage of the power of SDRs to capture a lot of meaning in a small space, even whole sentences at once. I’m no expert at all on NLP, just a big fan of the HTM-based approach because our brains have to capture a lot of meaning and they have to do it efficiently without a lot of math and energy use - a constraint that DNNs don’t seem to generally adhere to.


#22

I hate to rock the boat here but the formation of the Cortical.io retina involves the use of those nasty old traditional techniques that @maxlee was just outlining.

In this case, they use a SOM technique.

http://www.ai-junkie.com/ann/som/som1.html

http://users.ics.aalto.fi/jhollmen/dippa/node9.html


#23

Yes, they certainly depart from neuroscience in forming the semantic map, as you’d know much better than I. Ultimately this map should be formed in an online fashion the way the rest of HTM theory works; how that happens is a fascinating question. In the absence of that understanding they’ve used some non-biological machinery to fill the gaps.

I know SOM isn’t the only such machinery they use, though it seems more similar to HTM than any other ANN mechanics I know of (a little like learning in SP at least). Though if the SDRs they produce are viable in their semantic overlap, their system seems as good as any for the time being (?).
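For anyone unsure what “semantic overlap” means here, a tiny sketch: if an SDR is stored as the set of its active bit indices, overlap is just the size of the intersection. The bit indices below are invented for illustration and are not taken from Cortical.io’s retina.

```python
def overlap(a, b):
    # Number of active bits two SDRs share.
    return len(a & b)

# Made-up SDRs, each a set of active bit indices.
cat = {3, 17, 42, 101, 256, 511}
dog = {3, 17, 42, 99, 300, 511}
car = {5, 64, 128, 200, 256, 700}

print(overlap(cat, dog))  # 4: many shared bits, so related meanings
print(overlap(cat, car))  # 1: few shared bits, so mostly unrelated
```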


#24

They use HTMs to read out the data after it is organized.
I am working through how to form the maps with online learning and it seems to be a very hard problem.


#25

I am working on the same problem. I’ve used eligibility traces effectively for the online part, but still working out the topology element. Current theory is that grids could be used for this, but haven’t gotten into the weeds with that idea yet.


#26

There are a large number of people that have no clue how to write but are perfectly fluent in a language.

I think that you can say that letters/writing is an artifact to capture either syllables (most alphabet based languages and Hangul) or word sounds (pictograph based languages).

By the same token, the stream of syllables to create the elaborate alert, mating, and dominance calls (the basis of language) that animals use for signaling is an artifact of our biological sound production hardware.

If you are looking to create biologically based systems and are trying to decide on the correct level to match up with the human neural hardware I would offer that our human semantic grounding in the speaking-hearing axis along the arcuate fasciculus pathway may be the best place to start. (Syllables)

Hangul: possibly the best match between spoken syllables and writing?


Esperanto NLP using HTM and my findings
#27

Hi Mark, sorry it took a while but I had to figure out what I thought. You’re likely correct that Hangul is the best text to speech representation out there, but it is only efficient in Korean and, more importantly, it is not particularly relevant to this problem. This problem being thinking. If I look at what I wrote and your response, I think my use of the word syllable sent us down a blind fork. What I might have said instead is phoneme, and still that suggests speech. So I might have said thought chunk, but what does that mean and how does it relate to a Chinese character generator, or Hangul for that matter? This is where I had to stop and think about an explanation.

First I should say that in ’77 I was not trying to create or even approximate a biologically based system. I was trying to learn Chinese and wanted a database that contained all of the information required. That required a stroke by stroke visual representation of the characters along with the sound, including tone, the meaning, the part of speech and more. The sound was just a part. The thing I was most pleased by initially, and what I presented to Apple, was a hexadecimal way of writing the character. The way it works is just a reimagined hierarchical stack of plastic ASCII. The first level is stroke, the second is radical (of which there are only 214) and the third level is character. It was not until I considered the fourth level of idiomatic expression or compound word that I began to think of this as a thought processor rather than just a word processor.

It turns out that this system can draw Korean characters as well as Chinese but it also can write any other language in the form letter, syllable, word. This is why it doesn’t matter that Hangul is more efficient than English at describing the phonetics of a language. It is not the sound of the language that matters but the meaning. In this respect it is important to see the syllable as a thought chunk, or a Latinate, Indo-European or older root. One of the interesting aspects of Chinese is that each of the 214 radicals has several thousand years of evolution and carries rich pictorial and metaphorical information in addition to the aural.

At any rate all of this is to say that language is not thought. As much as we might perceive our thought in terms of language, it is a consequence and not a cause. For this reason any method of describing the sound is equally good because none of them are what is going on at the fundamental level of thinking. One thing we seem to have discovered is that thinking sounds a lot like a rather binary static, if we overlook the meaning of the amplitude of the spikes, that is. And as we know, that amplitude is anything but trivial. But the spoken or written or danced language those spikes and intervals ultimately get expressed in is trivial.


#28

Thank you for your thoughtful reply.

I agree - syllables are a stand-in for phonemes. For the bystander watching this exchange:

When I was working with the AVOS company (Assistive technology for the visually impaired) I became very interested in speech IO and the production of speech sounds. I discovered that the sounds of speech are an artifact of the kinds of sounds that the human speech production hardware was capable of making.
Please examine the charts on page 15 of this lecture:
http://research.cs.tamu.edu/prism/lectures/sp/l3.pdf
You can see that there are maps of the various sounds that a human is capable of making and the sounds are “fixed points” on these maps that are distinct enough that they can be reliably produced and recognized.

I totally agree - speech sounds are not thinking. I suspect that this is a large part of the reason that AI approaches that are text/speech based have not been very successful. That said - speaking really does engage and expand the mental hardware. Humans without speech really are not what most of us think of as fully human. I have spent a fair amount of time thinking about this and have commented on this before:

While it sounds difficult to extract and represent the underlying symbols of the communication of thought, it seems that Google is mucking around with that very thing with their translate project:

Again - for the interested bystander - Want to learn more?
This is one of the first books I read on these topics, a classic top-to-bottom text on the entire chain of speech from speaker to listener:


#29

I travel extensively for my job and have been doing so since the early 1990s. This included frequent visits to China and it became useful to learn Mandarin Chinese. I used the Pimsleur course and eventually was able to function on my own in day-to-day interactions. Immersion in the language and culture really drives this home. I can read and write a bit but my tones are terrible. I understand what you are saying about Chinese stroke order and character formation. I am struck that in Chinese many of the pictograms are actually reasonably good renditions of the thing that they are communicating.

Some examples:
火 - fire; I love this one = stick-figure person running around FIRE!
水 - water; hard to see the original picture - see chart below.
山 - mountain
口 - mouth/opening
品 - goods/commodities; a pile of boxes
門 - door; western movie bar doors anyone?

In some cases, the original picture has evolved to be difficult to recognize - for example - water (shuǐ), fourth line in this chart.


(evolution/versions of Sun, moon, mountain, water, rain, wood)

This is an ongoing process. For example, the door has evolved from the swinging bar doors 門 of the traditional Chinese to the more generic 门 in simplified Chinese.

The combination of these symbols is also somewhat based on more than just strokes or radicals; they often tell a short story.

日 - Sun (rì) - also used to say “day” things, with 月 - moon (yuè) to say “month” things.
間 - Time; you look to the door to see the sun position, learning the time.
Putting things in a door is used for other constructions
心 - heart
悶 - Stifling - you are inside but your heart is out the door.
But marking an actual door?
出口 - exit; mouth/opening to the mountains/outdoors.
Chinese is filled with these short stories; I love this language. There can be layers of meanings in a line.

What is not found in stroke order (or simple letter sequences or syllables for a western script) is any sort of useful semantic or grammar information. This information is loaded at the word/pictograph and symbol grouping level. Any sort of generation algorithm will have to consider this level if the output is to look like it is making any grammatical sense. Languages that use conjugation have to consider word grouping to influence construction at the word level. There is no local information at the word level that tells me if the word (tener in Spanish - “to have” or “to be”) I am building should be tener or tengo or tienes or tiene or tenemos or tenéis or tienen or (there are a bunch more). This strongly suggests that the higher levels will feed back to the lower levels in word construction.
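A toy lookup shows why the word level alone is not enough: the lemma “tener” stays the same, and only the subject, which lives at the sentence level, selects the surface form. The table covers just the present indicative, and the names `TENER_PRESENT`, `SUBJECTS` and `conjugate_tener` are invented for this sketch.

```python
# Present indicative of "tener", keyed by (person, number).
TENER_PRESENT = {
    (1, "sg"): "tengo",
    (2, "sg"): "tienes",
    (3, "sg"): "tiene",
    (1, "pl"): "tenemos",
    (2, "pl"): "tenéis",
    (3, "pl"): "tienen",
}

# Sentence-level context: the subject determines person and number.
SUBJECTS = {
    "yo": (1, "sg"),
    "tú": (2, "sg"),
    "ella": (3, "sg"),
    "nosotros": (1, "pl"),
    "vosotros": (2, "pl"),
    "ellos": (3, "pl"),
}

def conjugate_tener(subject):
    # The higher level (subject) feeds back to pick the word form.
    return TENER_PRESENT[SUBJECTS[subject]]

for subject in ("yo", "ella", "ellos"):
    print(subject, conjugate_tener(subject))
```

This only handles agreement; it says nothing about whether the resulting sentence means anything.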

But this will still be gibberish. Without any semantic guidance, the generation from a system that follows reasonable grammar rules will make something that looks like the human affliction Wernicke’s aphasia.


Note the huge range of signs and symptoms listed. This gives some indication of the range of factors that go into speech production. The clusters of defects suggest to me that the production of words is the product of several maps working in harmony, each contributing to some aspect of the ongoing production stream.

This tells us that if the map or connections to it are failing you suffer the related defect. From a modeling perspective this suggests your functional building blocks.

A little more on structuring these building blocks; a key difference between biological hardware and computers is that computers have variables, brains have connections.

  • An important programming task is to learn the producer(s) and consumer(s) of a chunk of information and WHEN and WHERE it is produced and consumed. If a chunk of information in a computer is needed in several places it is tucked into a storage space and accessed wherever it is needed.
  • In the brain information exists in SOME FORM in SOME PLACE in the brain. If there is a producer and a consumer there has to be a connection. If there is some order or stages to this process there has to be a physical pipeline. Parts of these pipelines may be selectively enabled to gate the flow of information but the connections are always there. If you think about it, there is no other way that neural hardware can work.

Whenever I see someone describing some AI proposal I keep this distinction firmly in mind to test if it is biologically plausible. Once I started thinking this way it shaped how I view papers describing neural research and proposed architectures.

