Not as long as you already have a more general model of “breaking”, which means intuitive physics of objects and their interactions. And we all have plenty of personal experience with that, before we can even speak. It can be replaced with pure observation (videos), but I think that still has to be very low-level perception, not the stuff that we use words to describe.
That’s not to say neural nets can’t learn such basic physics from observation, especially as something like latent embeddings in transformers. But do you know of any evidence that they actually do that?
It all depends on what texts are in the training corpus.
The chain of reasoning in the Mona Lisa to Japan example given above gives me some basis to think that learning basic Newtonian physics from “real world physics” texts is not out of the realm of possibility for a correctly trained model.
Think about the vast relation database that is found in a common dictionary. A common college course consists of perhaps 50 books, and that is a tiny dataset for current large text-based models.
Likewise, GOFAI gives us frames/scripts/schemas as a good framework for training texts for common-sense interpretation of input text. This could certainly include things like breaking and utility.
The nice thing about the approach where a model is fed texts is that you can use domain texts in whatever area you want your model to have agency, and the same basic model is then expanded in that domain; sort of like how humans do it.
I am sure they would, if it helped them to understand online text. But I’ve never heard of that.
So it probably doesn’t help to build a working physics model. Heck, those physics engines are a big deal in gaming, but they don’t build them by training NNs on textbooks. Not even Nvidia, in their “omniverse”, and Nvidia is huge on neural nets.
I am generally not a fan of GOFAI but the success that I am seeing with large data models is causing me to rethink the intersection of GOFAI and these large dataset & transformer based models.
I am pretty sure that DeepMind is not the end of this branch of AI research, and even as a distant observer without good knowledge of the inner workings, I still feel confident in predicting that datasets will be constructed around how the model works to drastically extend its current capabilities.
I do understand @Casey’s objections and in principle (again, without deep knowledge of the model’s inner workings) I can see how the “right” training set would overcome many of these concerns.
Or - it could be GOFAI all over again, and any attempt to extend it could blow up in a simple combinatorial explosion. That has certainly happened before.
tl;dr yes. World models, especially ones grounded in physics, have been an active area of research. Multi-modal models do it decently well, but not perfectly, since they haven’t incorporated video yet due to the compute cost of recurrence.
You can, however, condition on an image of a mug and get such models to visualize how it will look in state “x”. A simple test would be for DALL-E 2, or another generative model, to generate an image of a mug shattering.
I actually went ahead and used the open-sourced DALL-E mini. It’s very far from the generative models trained by OpenAI, but it actually gave decent results with some tweaking.
Prompt: a mug shattering - 4k | It seems all of them kind of mimic being dropped from a height, from the way the water spills out. Nice
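For anyone who wants to reproduce this kind of test, here’s a minimal sketch of the setup. I’m using the Hugging Face diffusers library with an open checkpoint as a stand-in, since DALL-E mini’s own Flax-based API is a bit more involved; the checkpoint name, prompt, and sample count are just what I happened to pick, nothing canonical.

```python
# Minimal sketch: generate "a mug shattering" with an open text-to-image model.
# diffusers is used here as a stand-in for DALL-E mini (whose API differs).
# Assumes: pip install torch diffusers transformers, plus a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # any open text-to-image checkpoint works
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a mug shattering - 4k"
images = pipe(prompt, num_images_per_prompt=4).images  # four samples, like the grid above
for i, img in enumerate(images):
    img.save(f"mug_shattering_{i}.png")
```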
It doesn’t have to be ‘intuitive’. A child learns that glass and metal have different models of ‘breaking’. An animal has no intuitive model of glass as in a window, but a cat learns there are things you can look through but not walk through.
Building models of reality to predict future outcomes is at the heart of intelligence, it seems to me. It’s hard to see how an ANN can ever do that.
Imagine if that model is a real person. Just imagine please.
Then here we are, all of us trying to study that person’s behavior because there is something tricky about it. Isn’t it weird?
Since he/she is a person, then seeing us talking and being critical about his behavior, he/she would likely respond to us knowing that she/he has a generalized and reusable model of thinking.
Sadly, the reality is that she/he still needs to learn how to talk and make sense of him/herself, or even to know what “human-level AI” is. Maybe we should ask her/him this question.
That’s the wrong concept of model. If you (or a dog or a crow) study the behaviour of a person you will put your observations together in a way that helps you to decide how to interact with that person (a dog or a crow does the same). This is your model, and from it you make predictions. As soon as you take action based on a model you compare the outcome with what you predicted, and update your model.
In this case that person is doing the same: observing, modelling, predicting, taking action, updating. All subconscious, continuous, all the time.
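To make that cycle concrete, here’s a toy, purely hypothetical sketch: a drifting scalar stands in for the world, and a running estimate of the drift stands in for the model. The only point is the observe, predict, act, compare, update shape of the loop.

```python
# Toy illustration of the observe/model/predict/act/update cycle described above.
# The "world" is a drifting scalar; the "model" is a running estimate of the drift,
# corrected by prediction error every step. Everything here is made up for illustration.
import random

class World:
    def __init__(self, drift=0.7):
        self.x, self.drift = 0.0, drift
    def step(self):
        self.x += self.drift + random.gauss(0, 0.05)  # the world moves on
        return self.x

class Model:
    def __init__(self):
        self.drift_estimate = 0.0
    def predict(self, x):
        return x + self.drift_estimate                 # expectation about the next state
    def update(self, error, lr=0.1):
        self.drift_estimate += lr * error              # deviation from expectation drives learning

world, model = World(), Model()
x = world.x
for t in range(50):
    predicted = model.predict(x)   # predict before acting/observing
    x = world.step()               # observe what actually happened
    model.update(x - predicted)    # update the model from the prediction error

print(f"learned drift ~ {model.drift_estimate:.2f} (true value 0.7)")
```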
There is definitely something going on here, but can you disentangle basic physics from sample-specific info in these models? Something like a separate set of embeddings or a separate network?
Anyway, this is derived from images rather than words alone? As @Bitking said, all the info is already in textbooks, but apparently DL currently can’t extract it from there to build, say, a physics engine for games?
It’s hard to see how they do anything, their opacity is horrible. But they do a whole lot regardless.
First: an organic brain does not respect basic physics. There is no F=ma or E=1/2mv^2. The models of physical reality work well, but not by physics as we know it, and they can fail badly.
Models are evolved/inherited with the genes, and then acquired/refined by prediction, action and deviation from expectations. Watching videos and reading books doesn’t build a model of playing the piano.
And finally ‘hard to see’ is ambiguous. I meant I see no path from what ANNs are good at (patterns) to what animal brains are good at: models, predictions, actions, anomalies. If there is one, I’m open to learning.
You can obtain embeddings, yes, and you can distill knowledge to other models. But in both cases you replace it with other models, which are inherently useless for you due to lack of interpretability. But it is possible, and quite common.
Few models are directly trained on books due to legal issues. A model trained both on code and a few physics sources could theoretically write a physics engine, and I’m sure Codex could already take a stab at it. But it would probably build on prior work in the dataset, since physics engines aren’t new.
Going by the evidence, ANNs can already model the world pretty well and predict what will happen - so I don’t see why they can’t scale up and perform just that, maybe even better than humans. For instance, these models can manipulate images, predict what happens to an object over time, convert sketches to real-life objects etc., and that’s still scratching the surface.
That’s not an embedding then - you’re talking about the intermediate-layer representations. If you ask it about some concept, then those representations might represent basic physics, but as I said, you can’t know that for sure unless you crack interpretability.
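For what it’s worth, here’s a minimal sketch of what “intermediate-layer representations” means in practice, using the Hugging Face transformers library; “gpt2” and the prompt are just examples I picked, and whether any of these vectors encode basic physics is exactly the interpretability question above.

```python
# Minimal sketch: pull intermediate-layer representations for a prompt out of a
# pretrained transformer. "gpt2" is only an example checkpoint.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tokenizer("If I drop a mug, it will", return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True)

hidden_states = outputs.hidden_states   # tuple: embedding output + one tensor per layer
print(len(hidden_states))               # 13 for gpt2 (embeddings + 12 layers)
print(hidden_states[-1].shape)          # (batch, sequence_length, 768) for gpt2
```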
I don’t see even then how they make physics engines obsolete, which, if I understand correctly, simply compute transformations based on physical equations. The rest is simply computation. Can DL derive those physical equations from data? Yes. Can we interpret them? Possibly not.
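To be concrete about “computing transformations based on physical equations”: at its core a physics engine step is just a numerical integrator applied every frame. A toy sketch (semi-implicit Euler for a ball bouncing on a floor; the constants and structure are my own illustration, not from any real engine):

```python
# Toy illustration of a physics engine's core loop: integrate the equations of
# motion (F = m*a, here just gravity) once per frame. Semi-implicit Euler, 2D.
GRAVITY = -9.81    # m/s^2
DT = 1.0 / 60.0    # one 60 Hz frame
RESTITUTION = 0.8  # fraction of speed kept on bounce

def step(pos, vel):
    """Advance one frame: update velocity from acceleration, then position from velocity."""
    vx, vy = vel
    vy += GRAVITY * DT                        # dv = a*dt, with a = F/m = g
    x, y = pos[0] + vx * DT, pos[1] + vy * DT
    if y < 0.0:                               # crude collision with the floor at y = 0
        y, vy = 0.0, -vy * RESTITUTION
    return (x, y), (vx, vy)

pos, vel = (0.0, 2.0), (1.0, 0.0)   # start 2 m up, moving sideways at 1 m/s
for frame in range(180):            # simulate 3 seconds
    pos, vel = step(pos, vel)
print(f"after 3 s: pos = ({pos[0]:.2f}, {pos[1]:.2f})")
```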
Embeddings are more general representations than regular weights. And that PINN wiki article you linked says that physics info is embedded. But it seems to be hand-coded, and only used to constrain rather than direct the search. I wonder if physics can be learned and embedded as a weight matrix formed out of multiple networks trained on specific objects. It won’t be formal physics, more like the intuitive physics that humans and animals have.
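For reference, this is roughly how the physics gets “embedded” in a PINN: it enters as an extra loss term that penalizes violating the governing equation, so it constrains the fit rather than being learned from data. A minimal sketch in PyTorch for a toy free-fall ODE (d²y/dt² = -g); the network size, data points, and training setup are my own assumptions, not taken from the wiki article.

```python
# Minimal PINN-style sketch: the physics appears only as a residual loss term
# enforcing d2y/dt2 + g = 0, alongside a small data-fitting loss.
import torch

g = 9.81
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# A few observations of a falling object: y(t) = y0 - 0.5*g*t^2, with y0 = 10
t_data = torch.tensor([[0.0], [0.2], [0.4]])
y_data = 10.0 - 0.5 * g * t_data**2

for step in range(2000):
    opt.zero_grad()

    # data loss: fit the observed points
    data_loss = torch.mean((net(t_data) - y_data) ** 2)

    # physics loss: residual of the ODE at random collocation times
    t_col = torch.rand(64, 1, requires_grad=True)
    y = net(t_col)
    dy = torch.autograd.grad(y, t_col, torch.ones_like(y), create_graph=True)[0]
    d2y = torch.autograd.grad(dy, t_col, torch.ones_like(dy), create_graph=True)[0]
    physics_loss = torch.mean((d2y + g) ** 2)

    (data_loss + physics_loss).backward()
    opt.step()
```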
This is long so going in a spoiler. It'd take a while to shorten.
No, but I think if that’s not true, it’s a general AI near human level. For example, “it’s summer because the dog is not wearing a sweater”.
Does it know what it’s saying, or just playing a text game? It has access to enough text to just find patterns in text and regurgitate those, albeit mixed and matched. The greatest achievements in machine learning are in text, as far as I know.
I think the default assumption should be that it doesn’t understand the world enough to understand what it’s saying. You don’t need to understand the real meaning of words to find patterns in letters.
If you were to ask it to create something original (e.g. design a mechanism to do something), I don’t think it would give a good and original answer.
The problem with not understanding what it’s saying is, it can’t do some things. It’s limited to the collective superintelligence of people, which is a pretty high limit of course. But without understanding what it’s saying (beyond recognizing patterns in text), it’ll probably stay a little broken (e.g. dogs wearing sweaters). The patterns in text aren’t necessarily reflective of the real world. Even if that ends up not being a problem, it still won’t ever be a quality superintelligence.
Just because it’s multi-modal doesn’t mean it understands what it’s saying.
“Understand the things it says” is pretty vague. By that, I mean it understands the same things we do. If it doesn’t understand what a sweater is, for example, it can’t figure out that people can sew sweaters. I mean, it could read something which indicates that fact, but it’s not drawing its own conclusions. It could read about sewing socks, socks and cotton, cotton and sweaters, and a bunch of other connections between sewing and sweaters. It couldn’t figure out things based on how the world actually works.
There are so many things the brain does as part of general intelligence. Just because it’s not the brain doesn’t mean it doesn’t need those things. People tend to be super overly optimistic about AI.
Not for us, but for machine learning, I think so. We can form a model of breaking a mug because we know how the world works (e.g. gravity, how objects rotate and shatter, etc.)
I don’t think machine learning understands what it’s saying, because the achievements are far greater for text than anything else. So I think it’s just playing a game of finding patterns in these arbitrary symbols. (I mean, it can be multi-modal, but that doesn’t mean it understands how things happen.)
For example, “it’s summer because the dog isn’t wearing a sweater”.
Let’s say it’s like a human in a universe of letters. Could it possibly understand how the world works from a universe composed entirely of letters? I think it’d have to be super intelligent. It’d have to formulate a model which explains all these superficially-meaningless squiggles in terms of the real world. We can do that kind of thing, e.g. quantum physics, but we’re a collective superintelligence. No one could figure out quantum physics alone starting from scratch in the wilderness.
Maybe that’s a bit extreme, because the words can tell it a lot once it grasps a glimpse of reality. I think that alone requires human-level general AI though.
No, it’s pretty much just the letter game.
It sounds like it’s saying human-level general AI. Otherwise, AI is already superhuman in some ways, so not much of an accomplishment.
To me, that chain seemed like playing the text game repeatedly.
Original thoughts wouldn’t happen that way if it’s just playing the text game. It could sorta bottle the collective superintelligence of humanity, which’d be pretty nice. It’s not the same as general AI.
I’m not saying that’s a bad thing. It’s probably better that way, because solving all of science in ten years could cause problems of the everyone dies variety. Plus it makes the AI’s own ethics less problematic, because it’s not an intelligent agent so much as a communication aid.
Those are pretty good, but they’re much further from intelligence than understanding text in terms of what the text means (as opposed to patterns of meaningless squiggles). The text game thing isn’t about lacking multiple modalities. Text can give a false impression of intelligence.
That probably requires various specific functions, e.g. allocentric object representation, attention, short term memory, sequences, and self-movement.
You can look up arXiv papers; there are many which explore how much memorization LLMs do and whether they’re simply copying from the dataset. That was old news when Gebru came out with her “stochastic parrots” comment. To an extent, yes, LLMs memorize stuff which they can’t predict - like Wikipedia pages of people, for instance. You can’t really predict anything about anyone until you know them, so the model is forced to memorize.
If you can provide any reputable scientific work contradicting that, I’d love to point out its shortcomings - this is a well studied area. LLMs will always try NOT to memorize because that’s simply a better way to achieve lower loss. But if they memorize absolutely nothing, then they won’t even be able to produce the alphabet or talk decently well.
Apparently, it can.
As an exercise, the reader is invited to search through and find whether any of the given examples exist on the internet. It’s pretty obvious that the model at least understands what “whales” and “TPU” chips are - TPUs being about the geekiest thing possible and unlikely to be associated with jokes (unless you can find any).
It also shows it understands the location, and can link concepts like “visiting Virginia = visiting near the Pacific Ocean”. I can link you other examples if you want a demonstration of further capabilities.
I couldn’t care about “optimism” any less. All I care about are scaling laws, how much we can explore their limits, whether models at high scale truly learn, and what research in that area looks like. I’m not conjecturing anything - if you read my messages, most of them are fact-based, with papers and results to back them up.
If you have any constructive criticisms of said papers, you’re welcome to put it here.
As I said the other day, I couldn’t care less about whether the model displays homicidal urges. Reasoning skills and benchmarked capabilities are all I care about since they’re a very good proxy for intelligence. With scale, models seem to reason more and more so I have no doubt that if scaling holds, then getting human-level reasoning is possible.
If you disagree with reasoning as a proxy, that’s your opinion. My opinion is that if it displays increasing capabilities on every single task (as LLMs do) and achieves capabilities never reached before, then that’s “intellectual behavior” for me.
It does require all of those, except self-movement; that’s part of the agent’s policy, while the world model simply predicts what’s going to happen next. And yes, DL doesn’t hard-code any of these functions; the models learn all these tasks by themselves. It’s the beauty of DL that being implicit almost always seems to work best - unlike other, alternative methods.
Check out the multitudes of world-model papers - they’re pretty incredible
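To make the policy/world-model split above concrete, here’s a hypothetical minimal sketch in the spirit of those papers: the world model predicts the next (latent) state from state and action, the policy only maps state to action, and you can roll the policy forward entirely inside the learned model. Dimensions and architecture are illustrative assumptions, not taken from any specific paper.

```python
# Hypothetical sketch of the world-model / policy split: the world model predicts
# the next latent state given (state, action); the policy maps state to action.
import torch

STATE_DIM, ACTION_DIM = 16, 4

world_model = torch.nn.Sequential(           # predicts next_state from (state, action)
    torch.nn.Linear(STATE_DIM + ACTION_DIM, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, STATE_DIM),
)
policy = torch.nn.Sequential(                # action = pi(state); no prediction here
    torch.nn.Linear(STATE_DIM, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, ACTION_DIM), torch.nn.Tanh(),
)

def imagine_rollout(state, horizon=10):
    """Roll the policy forward inside the learned model ("imagination")."""
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state)
        state = world_model(torch.cat([state, action], dim=-1))
        trajectory.append(state)
    return trajectory

rollout = imagine_rollout(torch.zeros(1, STATE_DIM))
print(len(rollout), rollout[-1].shape)   # 11 states, each of shape (1, 16)
```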