Google DeepMind claims they're close to achieving human-level AI

Then, I can’t see how DL will solve the problem. This video is an example of how the context determines the outcome of the detection.

I said this elsewhere: an intelligent system requires an internal state that combined with inputs and its timing determines the outcome (and next internal state). I don’t understand how that internal state or input timing can be mapped in a DL system.

For current LLMs, attention (e.g. transformers) combined with feedback to supply context provides this. In other DL systems it can be achieved with LSTMs instead of point neuron models.

This is only one recently published variant, using an associative memory of recent past tokens. https://www.reddit.com/r/MachineLearning/comments/vavffv/r_memorizing_transformers_google_2022/

Probably I didn’t explain myself correctly. My understanding is that transformers “embed” the context in the model during the learning process. If such a context appears during inference, it’s taken into account. Inference is a forward-only process. This is why the number of parameters is so ridiculously large (they want to embed as many contexts as possible).

I think intelligence requires a “dynamical” context, i.e. context is not explicitly embedded in the model but actually dynamically taken into account (possibly altering the model itself, i.e. there is no such thing as inference or training). Using a naïve analogy, you can mimic a sequential circuit with a massive combinatorial circuit. My understanding is that DL systems are just like “combinatorial” circuits. Biological systems are “sequential” circuits. The question is how the flip-flops are made :slight_smile:
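
To make the circuit analogy concrete, here is a minimal sketch in Python (my own illustration, not anything from a DL library). It computes running parity two ways: a stateless function that has to be handed the whole input history each time, and a one-bit state machine where the stored bit plays the role of the flip-flop. All names are made up for the example.

```python
def combinational_parity(bits):
    """Stateless: the output is a pure function of the whole input window."""
    return sum(bits) % 2


class SequentialParity:
    """Stateful: a single stored bit acts as the flip-flop."""

    def __init__(self):
        self.state = 0

    def step(self, bit):
        # Next state depends on the current state and the new input only.
        self.state ^= bit
        return self.state


def run_sequential(bits):
    machine = SequentialParity()
    return [machine.step(b) for b in bits]


def run_combinational(bits):
    # The stateless version needs the entire history re-presented at each step,
    # so its input grows with the length of the sequence.
    return [combinational_parity(bits[: i + 1]) for i in range(len(bits))]


stream = [1, 0, 1, 1]
print(run_sequential(stream))
print(run_combinational(stream))
```

Both give the same outputs, but the stateless version pays for it with an input that keeps growing, which is roughly the "massive combinatorial circuit" point above.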

The learning process in DL was actually intended to be a backward pass with new information, and then a forward one to produce inferences which can themselves be backpropped - basically weak self-supervised learning.

The reason LLMs use static weights is that the Turing completeness of ANNs still holds, so the static weights can model dynamic processes. Hence the “few-shot” learning paradigm - you can teach a model to do a new task better without changing the weights at all, because the example prompt for the task literally “prompts” the static weights into following a different process. That’s simply probabilistic conditioning, but it changes the very way the model approaches the task.

So if you provide a new task, such as making up a new language and few-shotting it with an LLM, it will perfectly understand it and converse naturally because its static weights somehow accommodate dynamic processes, which endow LLMs with flexibility.
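
A toy analogy for "static weights, different process" (an assumption-laden sketch, not how an LLM works internally): the function below is completely fixed, yet what it computes for a query depends on the examples placed in its "prompt". Here the fixed procedure is an ordinary least-squares line fit over the examples; `frozen_model` and the two example tasks are hypothetical names chosen for illustration.

```python
def frozen_model(prompt_examples, query):
    """A fixed procedure: nothing inside is updated between calls.

    prompt_examples is a list of (x, y) pairs playing the role of few-shot
    examples; query is the new x to answer for.
    """
    n = len(prompt_examples)
    mean_x = sum(x for x, _ in prompt_examples) / n
    mean_y = sum(y for _, y in prompt_examples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in prompt_examples)
    var = sum((x - mean_x) ** 2 for x, _ in prompt_examples)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope * query + intercept


# Two different "tasks", communicated only through the context.
task_one = [(0.0, 1.0), (1.0, 4.0), (2.0, 7.0)]    # y = 3x + 1
task_two = [(0.0, 0.0), (1.0, -1.0), (2.0, -2.0)]  # y = -x

print(frozen_model(task_one, 10.0))
print(frozen_model(task_two, 10.0))
```

The same code answers the same query differently depending on the examples it is conditioned on, which is the sense in which the prompt selects a process rather than changing any weights.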

If you still find it troublesome to understand, a mathematical perspective might be helpful. Think about how DDPMs work - the process of diffusion links two distinct distributions: one is your target, which you approximate by learning on data, and the other is the Gaussian.

Presenting that transformation between distributions as a Markovian model is enlightening because the key part - the neural network - is conditioned on text by enforcing attention. But that’s simply conditioning the actual process - modelling a differential equation between two distributions. That’s why a static set of parameters can model changes within a mathematical space - the space that theoretical papers conjecture the models transform data into (commonly called “latent space”).

Hope this clears it up :slight_smile: I probably explained badly because I’m trying not to go too technical and nitty-gritty but lmk which part you find confusing and I can elaborate on that.

How do you avoid catastrophic forgetting if you start to backprop on inference input? This sounds like, with the proper set of inputs, you could obliterate the model.
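
The worry can be shown with a deliberately tiny example (a sketch under my own assumptions: a one-parameter linear model, plain SGD, and two made-up tasks that contradict each other). Training on the second task overwrites what was learned for the first.

```python
def sgd_fit(w, data, lr=0.1, epochs=50):
    """Fit y = w * x with plain SGD on squared error, starting from w."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x
            w -= lr * grad
    return w


def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


inputs = (-1.0, 0.5, 1.0)
task_a = [(x, 2.0 * x) for x in inputs]   # hypothetical task A: y = 2x
task_b = [(x, -2.0 * x) for x in inputs]  # hypothetical task B: y = -2x

w_after_a = sgd_fit(0.0, task_a)
w_after_b = sgd_fit(w_after_a, task_b)    # keep training, now only on task B

print("error on A after learning A:", mse(w_after_a, task_a))
print("error on A after learning B:", mse(w_after_b, task_a))
```

With a single shared parameter there is nowhere for both tasks to live, so the error on task A jumps once task B has been learned. Real networks have far more capacity, so this only illustrates the mechanism, not how severe it is in practice.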

If weights are static, I can’t understand how you teach a new task. Technically, you are teaching nothing; it just seems like the new task was already somehow embedded during the learning of the prior task (e.g., the initial task was so huge that it inadvertently included the new one?).

I can’t understand how you can accommodate a dynamic process into a statically defined system. Definitely out of my grasp… the mathematical lingo is too much for my knowledge :slight_smile:

Yep, people avoided it for reasons like that. However, in modern usage, the larger the model, the less forgetting it experiences. GATO is an example of this, and I think I remember another paper which explicitly tests ViTs against standard ResNets to demonstrate how scale leads to “negligible” losses, at least on the few tasks they tested.

In some form? Maybe. But the initial task wasn’t a task at all - it’s simply reconstructing corrupted sequences. I’d love to wax on here with hypotheses, but the cold truth is that there aren’t many rigorous papers studying this area extensively (at least that I know of).

There have been some explorations, such as by the GOPHER authors, who test whether aspects are memorized/in-dataset or not, but the models were able to complete them regardless. It’s still a mysterious domain.

That comes down to semantics, interpretations and whatnot, but it’s not exactly “teach” - it’s kinda simply showing how the function is supposed to go, and the model appears to extrapolate it well. The result is that… well, it “learns” things it’s never seen, such as teaching GPT-3 to add numbers in an algorithmic way.

I’m not going to quibble over whether it’s learning per se, but since it appears to do so I prefer calling the process learning. Since you aren’t going to find this exact prompt with the exact words in the dataset, it’s safe to say that yes, it learns somewhat.

So this is still a guess, because my understanding is far from complete, but what I guess is that they model differential equations - consider it graphically:

[image: the diffusion process]

This is diffusion - a process whereby you gradually start from pure noise and recover an image in the training distribution, i.e. your dataset (approximating with an ANN).

It simply takes in an image at T and spits out one at T-1 - so it doesn’t denoise it fully, but denoises a little bit at each step. This happens until some pre-set chain length is reached and the process stops.

This is a simplification of DDPMs, but the above is a dynamic process. It isn’t simply learning a way to convert pure Gaussian noise to a target data distribution in one go - it’s learning a slow method that does the same iteratively, timestep by timestep.
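
The "a little bit at each step" loop can be sketched in a few lines of Python. This is a toy under loud assumptions: the data is one-dimensional, the whole "dataset" is a single value `target`, and `predict_noise` is an analytic stand-in for the trained network (a real DDPM learns it from data). The schedule values are illustrative. The point is only that one fixed function, applied repeatedly with the timestep as input, walks noise towards the data.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; returns betas, alphas and cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def predict_noise(x_t, t, alpha_bars, target):
    # ASSUMPTION: stand-in for the neural network. For a dataset that is a
    # single point, the noise in x_t can be recovered in closed form.
    return (x_t - math.sqrt(alpha_bars[t]) * target) / math.sqrt(1.0 - alpha_bars[t])


def reverse_step(x_t, t, betas, alphas, alpha_bars, target, rng):
    """One denoising step: takes the sample at step t, returns the one at t-1."""
    eps = predict_noise(x_t, t, alpha_bars, target)
    mean = (x_t - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
    noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no fresh noise on the last step
    return mean + math.sqrt(betas[t]) * noise


def sample(target=3.0, num_steps=50, seed=0):
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.gauss(0.0, 1.0)  # start from pure Gaussian noise
    trajectory = [x]
    for t in reversed(range(num_steps)):
        x = reverse_step(x, t, betas, alphas, alpha_bars, target, rng)
        trajectory.append(x)
    return trajectory


path = sample()
print("start:", path[0], "end:", path[-1])
```

The parameters never change during sampling; only the sample and the timestep do, which is the sense in which a static function carries out a dynamic process.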

My guess is that it’s somehow doing the same in LLMs - when you provide a piece of text and few-shot it, the differential equation simply has a better, closer approximation to the target distribution (say, the correct answer) and thus provides more accurate predictions, leading to more realistic/accurate results.
Because the function is differential by nature, it would learn to learn - conveniently called meta-learning - because that’s what differential equations kinda model: a complex function that approximates another, simpler function to which you give noise and out comes an image.

That’s still a guess, but if someone has some insight into this that’d be interesting to discuss :slight_smile:

An associative memory of past tokens to provide context is

an internal state that combined with inputs and its timing determines the outcome (and next internal state).

They tested it past 1000 pages (250k tokens), but there is no technical limit on the size of its non-neural-network (non-ANN) memory.
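
To show what "an associative memory of past tokens as internal state" could look like, here is a heavily simplified sketch. It is not the paper's implementation: `TokenMemory`, the two-dimensional keys and the softmax over the top-k matches are all assumptions made for illustration. The idea is just that stored key/value pairs are looked up by similarity, so the same query gives a different result depending on what has been written before.

```python
import math


class TokenMemory:
    """Toy external memory: stores (key, value) vectors, reads by similarity."""

    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, key, value):
        # Writing changes the internal state; no network weights are involved.
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query, k=2):
        if not self.keys:
            return None
        scores = [sum(q * c for q, c in zip(query, key)) for key in self.keys]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        # Softmax over the k best matches, then a weighted average of their values.
        best = max(scores[i] for i in top)
        weights = [math.exp(scores[i] - best) for i in top]
        total = sum(weights)
        dim = len(self.values[0])
        return [sum(w * self.values[i][d] for w, i in zip(weights, top)) / total
                for d in range(dim)]


memory = TokenMemory()
query = [5.0, 0.0]
print(memory.read(query))          # nothing stored yet
memory.write([1.0, 0.0], [10.0])
memory.write([0.0, 1.0], [20.0])
print(memory.read(query, k=1))     # now the answer depends on the stored past
```

Because the store is just a growing list, its size is limited by storage rather than by the network, which matches the "no technical limit" remark.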

When someone says something like this, I always wonder if there are people that really don’t feel. I mean, there is color blindness; there is congenital insensitivity to pain; lots of people can’t feel empathy. These are known and tested conditions. So, the idea of a philosophical zombie is not totally irrational.

Is that what you mean, @david.pfx? Don’t you feel anything? Would such a statement be nonsensical to you?

I don’t know if they are scientific enough, but there are a few tests one can do. Here’s a quote from Alex Garland, writer and director of Ex Machina, that shows you a possible proof. Warning! This is actually a big spoiler, in case you haven’t seen the movie. Here’s the clip from his interview by Lex Fridman.

Personally I’ve had another interesting experience. When I was a kid, I remember waking up, running to my parents and telling them what I had seen in my sleep. My dad told me that I had had a dream. (Now, you have to know that I almost never dream, which might explain why I could walk and talk by the time I experienced my first dream). But I was forced to use words that made sense to me, like watching television while sleeping. Dreaming, both literally and empirically, did not exist for me until then.

So, this is a possible test to present to an agent. No doubt, it’s easy to fake if you tell the agent in advance. But if you really need a test, you could carefully train its language model while omitting any vocabulary about imagination, dreaming or impressions, then use whatever invention you have to generate a feeling within the agent, and record how it describes it.

Just, don’t make it suffer, please.

Some people don’t hear their thoughts or think in sentences. I imagine that’d make it seem less like consciousness exists.

@EEProf, would those people have the same kind of consciousness? I assume they can construct an internal narrative, do mental time travel, and form concepts.

Yes, but you still feel in-the-moment. You realize you exist.

Are there people out there that don’t know what that feels like?

When @david.pfx says “There is no such thing as consciousness. It does not exist. It is an illusion, a hoax, a scam” is that because he’s never felt conscious? Does he not know what it is to be conscious, just like a two year old can’t explain what it is to be in love?

I think it must be possible, if very unlikely. But I’d love to hear David’s testimony on this.

For some the ‘Inner Voice’ is louder than for others, but everyone who has attained consciousness has one. The very best description of this is by Vygotsky & Luria, who studied the phenomenon extensively in children. Piaget did too, but in my opinion not to the depth of V&L. Unfortunately, V&L were under an extreme Soviet regime and much of their work was not seen in the West until long after it was published in the USSR. Also, if you were doing psychology research you had to work Pavlov into everything you wrote. Lastly, it was all in Russian. Once you get past the ‘issues’ you discover that V&L were geniuses who were able to cleverly work around things to get their ideas out, and they were consummate scientists.

I think it’s safe to say that most people here are on the autistic side. It’s a time-honored tradition, going back to Turing, Pitts, and probably Von Neumann. On the deepest cognitive level, ASD is local hyperconnectivity, most likely at the expense of long-range connectivity. Psychiatry is a generally low-quality field; I think the best perspective on this is Henry Markram’s “intense world theory”, which translates into an overactive task-positive network and an underactive default mode network. The latter means that introspection, resulting in colloquially understood “consciousness”, is somewhat suppressed.

Everything you see, feel, hear, touch is an illusion. You see colour but there is no colour, only light waves and frequencies. You hear sound but there is no sound, only waves in the air. You feel rough or smooth, hot or cold, wet or dry but they do not exist, it is all an illusion.

My guess is that we all share similar experiences because we are all the same species, but as a scientist and an engineer I cannot prove it is so.

So my challenge to you all is to prove the existence of consciousness before wishing it on our software creations. My claim is that you cannot, and never will, because it is just an illusion.

is consciousness a property, a trait, or an endowment upon a certain complexity of species?

or is it emergent, a polyphony of neural firings not reducible to a specific quantity, a linguistic signifier without meaningful signified?

the word brings in too many assumptions and is fundamentally unscientific and non-useful.

ps: we have all been philosophical zombies the whole time, who in our hubris denied it. thanks for nothing descartes

I guess you cannot reconcile the idea that it is an illusion with the fact that, in order to have an illusion, the illusion must exist. There has to be an illusion-making mechanism, which is neither irrelevant nor nonexistent as you claim: the illusion-maker is the key to the whole charade.

All models are illusions, not the real things which they model.
That doesn’t mean the model itself does not exist or can not be a reliable model.

Another possibility is you like to make pretentious claims just because they sound cool.


When someone pulls a rabbit out of an empty hat you cannot just dismiss it with “Impossible this cannot happen, therefore it is a trick, therefore it does not exist!”. You have to get what the trick is.
By dismissing its existence you’ll never get it.

In order to prove a trick is a trick you have to understand and reveal it to everybody else: “Look this is how it’s done, pull these levers, open these gates, abracadabra, there’s the rabbit!”.
Otherwise the audience will not believe you but the trickster, because “I have to see it to believe it” has always been a much more convincing argument than making claims you cannot prove.

Excellent. What you describe there is Plato’s Cave. You are forced to perceive a projection of reality. These are the contents of consciousness.

But to be able to perceive a projection, even if it is an illusion, you need the ability to perceive. Whatever it is, and however wrong it turns out to be, does not matter. If you perceive anything at all, you do perceive. This is the process of consciousness. And this makes your own consciousness the only thing you can be certain of.

This is what Descartes meant with “Cogito ergo sum” (I think, therefore I am).

David, you contradict yourself… if all you have access to are illusions, from whence did you come to know of light and wavelengths? Surely you pieced together illusions to realise this. How can you presuppose reality apart from the illusions you have experienced? If you mention anything, it is only a function of those illusions. If it exists, it is because of the illusions, not the other way around. Consciousness is the only real thing there is. I am a realist, but it is based on faith more than on rationality. I don’t like my faith to be interrogated either, but it’s not useful to believe all there is is consciousness vis-a-vis the faith.

If all we have access to are qualia, then those things we are arranging for the AGI to detect are equally qualia. If I ask you what they are, you will either mention a quality or something that is a function of a quality. If it’s a function of that quality, then since AGIs approximate functions, they can approximate “green” and “hot”.

What you say is in a certain sense correct.
In order to account for an objective external reality I would rephrase

To
