There is a lot of money at stake. They have to make bolder and bolder claims. This reminds me of the whole $TSLA nonsense. Sooner or later, the house of cards will fall. We all see how it’s starting to crumble…
We can define intelligence by the statistical route, but, as in the trading markets, we fail to fully take into account the tails of the probability distribution. For an intelligence it is those tails that are critical, not the 90%+ of the time it gets the answer right. To me, it is how the answer turns out wrong that clearly creates the split between a model without the ability to reason and real internal reasoning.
If the model/system/AGI can explain why it thought a wrong answer was correct, and you can iteratively explore that answer with the model to the point where the model “realises” why it was wrong, that may be closer to the real test. This iterative interaction is what should be in the Chinese Room test.
When the model changes its answer without an explicit example (no n-shot learning), through continuous learning, that may well be true reasoned understanding. Perhaps it is this bridging/triangulation of hidden concepts, done iteratively without an explicit example, that is the real definition of intelligence: unlimited depth of recursive thought.
The second part of this would be that the model (with an associated memory better than a human’s) may well end up retrospectively re-analysing previous answers to verify its own perception and then self-correcting for historical mistakes. Asking the same question (to which you received a wrong answer) at a later point in time should then derive a correct response, if enough implied learning occurred between questions (not in the style of n-shot example patterns).
That said Richard Feynman’s “why” is an interesting situation for an AGI which is told it’s wrong.
Firstly, I don’t know much about ML, so if I say anything blatantly wrong, that’s not meant to be opinion. I know a lot about some aspects of neuroscience and about HTM.
I don’t know what reasoning is, but a prerequisite is a representation of the reason(s). Otherwise it’s not using those reasons, except indirectly. The actual reasons are probably just patterns in words. That’s just pattern recognition, like recognizing a line, not really reasoning.
If superhuman AI is gonna be like an eye (with a very basic visual cortex or something, I dunno) which sees massive amounts of data and directly perceives patterns, that’s fine, but it’s counterproductive to pretend it’s something else. It doesn’t think, yet it’s superhuman, because we can do the thinking (generate data e.g. words on the internet) and there are billions of us. That seems more like a way to enhance our communication (merge reasoning from billions of people) than an intelligence.
Does it have a model of the world, like what a painting is? Otherwise, all it can work with are words, attached to nothing besides other words. At best, it can make logical deductions about words, but it has to do so in terms of words, and those words only have meaning to it in terms of words. It’s just recognizing patterns in words. It can’t possibly conceive of an external cause of those words. It knows how words are associated with other words, but it doesn’t have a clue what it’s even saying. Ask it something (I dunno, how can we make laundry machines faster?) and it’ll sometimes say something ingenious or utter nonsense, because it’s not so ingenious or nonsensical in its world of words.
It’s already superhuman in some regards, so it doesn’t need to understand things to perceive patterns which we can’t.
I don’t know what reasoning is exactly. My question is whether it’s just seeing patterns in words. It has superhuman pattern recognition and a lot of data. It doesn’t know what a coffee mug is, so it can’t reason about mugs, only the word “mug”. I don’t see how reasoning is possible for it, because its world is composed of words. It can’t figure out that dropping a mug could break it. I mean, it could say that based on patterns in words, but it’s not possible for it to figure out something far more complex and original.
Learning to parrot billions of people. Heck, just 1000 could probably seem very smart. It’s not superintelligence, more like putting together the traces of a bunch of human-level intelligences (words written by humans, and whatnot).
Now we have something to discuss.
What I am getting from your argument is that somehow, reading that a dropped mug breaks is not as good as dropping a mug and seeing it break. Somehow, a few years of experience (being a 3 year old) teaches more world facts than reading the vast base of human experience set into words. The rote use of tools teaches utility more than reading about the utility of objects like mugs.
You describe the poverty of “just words” as an insufficient base experience for common sense and reasoning. I would offer Helen Keller with no sight or sound, “just touch,” as the base for her common sense and reasoning. From what I have read HK was very good with common sense and reasoning.
A good “word” training set has to include the facts that most people experience to learn “common sense.” It would be helpful if these were tagged with their personal relevance, like a broken mug being a loss of utility. The very large training sets that current models use are likely to include some of this now. This is not “just” a statistical framework inside the data but actual facts that can be the basis for reasoning. The WIKI data set is not as likely to include these “common sense” items as - say - the corpus of romance novels.
I would point out that a brain also has the same problem; it sits alone in a bone box in the dark. The inputs are just pin-points of excitation. The missing parts you describe are the experience of grounding some of the incoming signals and correlating coincidences of those signals.
The “non-word” grounding we humans get is changes in our homeostatic values correlated with those external signals - cold/hot, hunger/satiation, pain. I can see how you might load some of this grounding with a carefully engineered training set; you could create a locus of Amygdala and Hypothalamus in the training set. I would further say that the statistical underpinning for large word databases may end up doing that by default now.
It may be worth looking at ‘Chain of Thought Prompting’ already in use with several language models:
They’re definitely on the right track; but so far from the numerous reports of mismanagement within the company I don’t think things are looking too rosy for them until they stabilize things.
Currently LLMs can do that, yes. You can ask them and they almost always supply a reasoning chain, even if it’s incorrect. However, large models like PaLM with chain-of-thought prompting are able to consistently deduce things correctly while explaining their process - see my linked examples above.
Which is the alternative direction DL wants to go because it leads to intellectual leaps and behaviors. The parallel is that evolution has had millions of years and tons of energy to get to intelligence. However, it doesn’t make sense to start from scratch - so by simply replicating current intellectual processes our models learn approximations of those processes.
But really, what’s driving the entire area of research is results. DL simply has unparalleled results with no signs of stopping, and no opposition except arguments and opinions. So basically until things drastically stop working and no solutions arise for a long time, I doubt DL is going away anytime soon.
Yep, check out Flamingo and the hundreds of other multi-modal models. They connect all modalities and you can ask it to explain something in the picture. relinking the previous example
That’s because its loss (the objective) penalizes it for producing no tokens - so it would waffle rather than stay quiet. Or you can prompt it in a standard Q/A fashion, in which case it would reply with variations of “I don’t know”.
That’s a good point if you discount an entire field of contrastive learning, famous models like Imagen and DALL-E-2, and countless multi-modal models. I don’t get why people find it so hard to accept that you can teach representations of almost every modality - so if it can answer questions about a mug as well as a human, I don’t see why it doesn’t understand what a “mug” is or what it looks like.
If you find a parrot that can do half of LLMs capabilities, I’ll give up
And it turns out, apparently you don’t always need to finetune - this was done on vanilla GPT3 simply by appending a simple statement. https://twitter.com/arankomatsuzaki/status/1529278580189908993?t=AYdg2mu5yVlbj7GfzmX3JA&s=19
Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3.
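The mechanics of that trick are almost trivially simple: you append the trigger phrase to the prompt and let the model generate its reasoning before the answer. A minimal sketch of the prompt construction (the model call in the comment is a placeholder, not a real API; the example question is just for illustration):

```python
def build_zero_shot_cot_prompt(question: str) -> str:
    """Turn a plain question into a zero-shot chain-of-thought prompt.

    The whole technique is appending a fixed trigger phrase; the model
    then emits intermediate reasoning steps before its final answer.
    """
    return f"Q: {question}\nA: Let's think step by step."

prompt = build_zero_shot_cot_prompt(
    "A juggler has 16 balls. Half are golf balls, and half of the golf "
    "balls are blue. How many blue golf balls are there?"
)
print(prompt)
# The prompt would then go to whatever completion endpoint you use, e.g.
#   answer = call_model(prompt)   # call_model is a placeholder, not a real API
```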
This effectively starts DL history at the Social Layer (or even the unsocial layer, considering they ingested all of Twitter). You could hypothesize that this will create a different type of intelligence, but what would we expect?
I heavily disagree with the article, and some takes are just blatantly ill-informed and highly out of date
You can’t have a general multi-purpose unit outperform specialized functions. A big “do everything” mind can’t do everything as well as those things done by specialized agents.
lmao I’m not sure whether the author’s simply high, or they’ve completely forgotten all of the LM research for the past 4 years.
I highly doubt that hypothesis. There’s no different type of intelligence. It’s simply going to be a highly capable reasoning machine with a superhuman memory and human-like learning abilities.
58% of the time is “consistently”? And with a need for a prompt (example pattern - no reasoning)?
There is a very big difference between knowing a pattern and how it works, and actually understanding what the pattern represents.
In the sense of an extended tool kit? Still requiring a user?
If you’re talking about why n-shot at all, I recommend the seminal scaling paper of Kaplan et al. Essentially, scaling reduces the required n further and further. We’re already seeing plenty of superhuman performances in few-shot benchmarks.
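For context, the Kaplan et al. law for loss as a function of (non-embedding) parameter count N has the form L(N) = (N_c / N)^α_N. A toy sketch using the paper’s approximate fitted constants - treat the exact numbers as illustrative, not as a prediction for any particular model:

```python
# Toy illustration of the Kaplan et al. (2020) parameter scaling law:
#     L(N) = (N_c / N) ** alpha_N
# Constants are the paper's approximate fitted values; illustrative only.
N_C = 8.8e13      # fitted constant (non-embedding parameters)
ALPHA_N = 0.076   # fitted exponent

def predicted_loss(n_params: float) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

# Loss falls smoothly (as a power law) as the parameter count grows.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")
```

The same paper fits analogous power laws in dataset size and compute, which is why few-shot performance keeps improving with scale.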
I can’t predict the future, but most likely it will be a multi-modal model deployed as robot in the real world.
As cliché as it sounds, the real world is complex enough to act as a regularizer for these large internet-bootstrapped models and promote more human-like imitation. So it won’t be just an “extended tool kit” if it can physically and mentally do what a human can. Check out socraticmodels.github.io for an example…
Helen Keller learned to read and write, so touch is enough connection to the real world to learn about the real world. In a world composed entirely of words, the words can’t mean anything besides stuff about other words. If you were to suddenly provide it with a sense, but filtered out anything about words, it would need to learn everything we thought it knew. For example, it’d need to learn that dropping a mug makes it break.
It’s meaning, not just grounding. We don’t think mugs break when dropped because of a pattern in words we’ve learned, it’s because of a model of mugs.
In neuroscience, if an animal said “It looks like a summer day because the grass is green and the dog is not wearing a sweater,” that animal would be about as smart as a human. Animals need to work from sensory stimuli, so getting to that level of understanding is hard. Whereas AI can find patterns in words and fake that level of understanding. In that way, it’s not really a model of the world, at least that’s not how it’s concluding these things.
Because it’s much easier to find patterns in words than it is to internally model what’s happening in the real world. It’s not hard to recognize the visual pattern of a mug, but that doesn’t mean it could produce anything near the level of reasoning which words use. Which means the things it’s capable of doing are different than a human in some ways.
This is the important bit everyone overlooks. Brains build models of reality (time and space), project those models forwards and backwards in time, and compare actual sensory input with predicted. They choose actions based on predicted outcomes, and modify those choices based on actual outcomes.
Parts of those models and actions are built in (evolved), parts are learned during childhood, parts are continually updated. Language contains multiple hooks into those models.
An AI that does not do this model building and prediction can never pass any reasonable test of intelligence and can never perform as we do on language. Pattern matching is just not enough.
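That predict-compare-update loop can be sketched in a few lines. This is a deliberately crude scalar toy (the learning rate and the constant “world” are made up for illustration), not a claim about how cortex implements it:

```python
# Minimal predict-compare-update loop: an agent tracks a scalar "world
# state" by correcting its internal model with each prediction error.
def run_agent(observations, learning_rate=0.5):
    estimate = 0.0   # the agent's internal model of the world
    errors = []
    for obs in observations:
        prediction = estimate              # project the model forward
        error = obs - prediction           # compare with actual sensory input
        estimate += learning_rate * error  # update the model from the error
        errors.append(abs(error))
    return estimate, errors

obs = [1.0] * 20  # a constant world; the agent should converge to it
final, errors = run_agent(obs)
print(f"final estimate: {final:.6f}, first error: {errors[0]}, last error: {errors[-1]:.2e}")
```

The point of the toy is the loop structure (predict, compare, update), which pure next-token pattern matching does not have in this explicit form.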
…and who was it that first described all of this in exquisite detail? Hint: he called it Consciousness.
A chain of thought is not useful simply because it solves a problem. The solution has to adjust the policy of the agent. E.g. if I have the thought “I have to be more truthful from now on”, that resolution should affect all or most of my state-action mappings from then on. Some thoughts result in disproportionate adjustments to the policy and affect the learning rate. It’s not just the policy that must be trained; the thought trajectory will also change and must be trained. The key is, in the brain there will be some instant changes caused by such a thought, while others will follow as the new thought trajectory reinforces itself.
Our thought processes are actually a genetic algorithm at work, looking for lower minima in the policy loss function… we think in order to act better, and we act in order to get a better model of the loss landscape to be optimised.
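Taken literally, that metaphor is easy to sketch: a population of candidate “policies” (here just parameter vectors) mutated and selected against a loss function. A toy illustration, not a model of brains or of DL training - the target vector and all the hyperparameters are arbitrary:

```python
import random

def policy_loss(policy):
    """Toy policy loss: squared distance from an arbitrary 'optimal' policy."""
    target = [0.3, -1.2, 0.8]
    return sum((p - t) ** 2 for p, t in zip(policy, target))

def evolve(generations=200, pop_size=30, sigma=0.1, seed=0):
    rng = random.Random(seed)
    # Start from a random population of 3-parameter policies.
    pop = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=policy_loss)          # selection: rank by loss
        parents = pop[: pop_size // 2]     # keep the fittest half
        children = [
            [p + rng.gauss(0, sigma) for p in rng.choice(parents)]  # mutation
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return min(pop, key=policy_loss)

best = evolve()
print(best, policy_loss(best))
```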
My point is, if they approach the creation of thought trajectories as training material, I don’t see why it couldn’t become an AGI.
No, this has nothing to do with consciousness. This is basic brain behaviour easily demonstrated by animal experiments. It’s the essence of juvenile acquisition of skills, of learning/training in adults. In humans it works fine at the subconscious level, and in conditioned reflexes. It’s pervasive, and strongly represented in language.
HTM only scratches the surface of model building, continuous learning and anomaly detection, but the pattern matching people are nowhere.
Do you have any solid, incontrovertible evidence proving that? Opinions are a dime a dozen.
Sure, which is why the model I linked above is multi-modal, as are several others. Eventually, the hope is they can model the real world as well - which is already looking rosy given current breakthroughs like GATO controlling a real-world robot arm, and Socratic Models.
It’s not; it’s simply a proxy for giving the model extra memory and space to work with. In the end, it’s just a way to improve interpretability and let the model keep track of complex things without requiring more compute.
Again - not to be rude, but “so you think”. Treating scientific problems scientifically is the best policy. My recommendation is to actually research current tests of intelligence and compare where different techniques lie. After that, you can make your own inferences as to viability, but you still won’t be in a position to conjecture anything with 100% certainty and accuracy.
Is there some inherent difference in forming this model by actually breaking a mug vs. reading about breaking a mug?
I teach a variety of technical subjects such as electronics, and task performance such as how to weigh aircraft. Technical education is heavy on abstract concepts that have little connection to personal experience. I communicate many of these very abstract concepts via text and lecture, and practical hands-on grounding when possible. I am very familiar with these instructional techniques and what each is good for and the limits of each method. In fact, I have had to take classes on teaching to receive my teaching license.
I teach many aspects of safety, yet I have not actually electrocuted any of my students nor dropped an aircraft on any of them. Somehow, words and text are able to build and ground these models in my students.
I am very familiar with the demarcation line between building and linking concepts, and the linking of these concepts to personal experience, which I am referring to as grounding. This vital link connects the system of abstract models to a person’s internal models based on homeostatic maintenance. Some teachers are better at doing this than others.
If reading is sufficient to build models in my students why is it not sufficient for a program to use words to build models of the world?
Does your concern center on the task of linking these models to some sense of “I” in the models?
I think some people might be reading the headline wrong?
It says “human-level” not “human-like”.
Human-level is a statement about its capabilities.
Human-like is a statement about how it works internally.
For example, we’ve had human-level chess AIs for decades,
but we still don’t have human-like AIs that play chess.
DeepMind might achieve human-level AIs in the near future,
but they won’t achieve human-like AIs without first studying how humans work.