Google DeepMind claims they're close to achieving human-level AI

I agree. Alexa can turn on the TV and set a reminder, but it really is pretty bad.

The only AGI models we have are ‘embodied’ animals. Boston Dynamics makes great videos, but I don’t see one rounding up sheep in the paddock.

I’m hopeful we’ll crack some of the simpler puzzles and be able to build a self-driving taxi within 20-30 years, but I won’t be putting money on it.

1 Like

In what context? Codex can write code and functions well outside of its training dataset (which is a good chunk of GitHub) - in that sense, for complex prompts it can create novel functions to aid it in generating other parts of the code. Would that count? Does VPT, pretty much the best Minecraft model, which can craft and utilize virtual tools, count? Like using an iron sword?

Putting aside the fact that it's a pretty bad statement, if DL models come up with intrinsic algorithms to aid them in their tasks (for which we have only recently started finding evidence), does that count as "inventing" their own tools?

Yes, I am.

Even traditional (and old - there are plenty of newer, more modern examples of this) DL simulations show agents learning complex new behaviour to "cheat" their environment: novel strategies, exploiting physics engines, etc. That raven is just another example - it doesn't understand how a car works (you can close the door and that's it) but knows how to locate food and consume it. If you think that's the peak of the abilities required for AGI, then I don't suppose any of our conversations are going to be productive.

That's a huge misconception, so let me put it in bold. Benchmarks are a way to judge whether a model has gained some new ability without putting it through a complex environment and a battery of challenges. If PaLM suddenly shows spikes in understanding nuanced humour and complex conversations, then you can bet it may be able to understand jokes.

It's simply an exhaustive way to quickly assess the capabilities of a model and see what to expect. The goal isn't to grok all of them (there are plenty of baselines) but to quickly and accurately judge a model's capabilities.

Grokking all benchmarks wouldn't give you AGI - but if you have a model which can pass any number of benchmarks thrown at it, well, then it may give you a tiny hint that you have a general model on your hands :wink:
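To make that concrete, here's a minimal sketch of the kind of loop an evaluation harness runs - the `model.predict` call and the task format are hypothetical placeholders, not any real benchmark suite's API:

```python
# Toy evaluation harness: run one model over many benchmark task sets and
# report per-task accuracy. `model.predict` and the task format here are
# hypothetical placeholders, not any real benchmark suite's API.

def evaluate(model, benchmarks):
    """benchmarks: dict mapping task name -> list of (prompt, answer) pairs."""
    scores = {}
    for name, examples in benchmarks.items():
        correct = sum(model.predict(prompt) == answer for prompt, answer in examples)
        scores[name] = correct / len(examples)
    return scores

# A broad profile of scores across very different tasks - rather than a
# single number - is what hints that the model might be general.
```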

Without going into embodiment, I would just offer a friendly reminder here: Boston Dynamics relies on conventional optimization theory, which is quite an old and mature discipline of building mathematical models and optimizing them to perform certain maneuvers.

To put it simply, all Boston Dynamics robots are as hardcoded as symbolic systems. You specify what actions you want performed, input the details of the environment, and their mathematical frameworks calculate a trajectory to perform them. That's all. You can watch much older videos (10-15 years old; the TED demos are great - look at this 10 year old video)

Hopefully, you'll notice some similarities between the capabilities of Boston Dynamics robots and the quadrupeds in the above demo. The only difference is that BD uses perception algorithms (think SLAM) instead of relying on cameras and shiny balls.

That's the technique farthest from AGI - but I agree: if a multi-modal DL model actually bootstrapped on the entire internet is built, then the best test would definitely be the real world.
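To make the "hardcoded optimization" point concrete, here's a toy trajectory-optimization sketch - just the general recipe (specify endpoints and a cost, let a solver compute the motion), nothing to do with Boston Dynamics' actual software:

```python
# Toy trajectory optimization: find a smooth 2-D path from start to goal
# that avoids an obstacle, by minimizing squared accelerations plus a
# proximity penalty. Just the general recipe (model + cost + solver),
# not Boston Dynamics' actual stack.
import numpy as np
from scipy.optimize import minimize

T = 20
start, goal = np.array([0.0, 0.0]), np.array([5.0, 3.0])
obstacle = np.array([2.5, 1.5])

def cost(flat):
    path = np.vstack([start, flat.reshape(-1, 2), goal])   # endpoints stay fixed
    smoothness = np.sum(np.diff(path, n=2, axis=0) ** 2)   # penalize acceleration
    dists = np.linalg.norm(path - obstacle, axis=1)
    avoidance = np.sum(np.exp(-4.0 * dists ** 2))          # soft obstacle penalty
    return smoothness + 5.0 * avoidance

x0 = np.linspace(start, goal, T)[1:-1].ravel()             # straight-line initial guess
trajectory = np.vstack([start, minimize(cost, x0).x.reshape(-1, 2), goal])
print(trajectory.round(2))                                  # waypoints bending around the obstacle
```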

2 Likes

Code is just NLP; it's statistical in nature.

I'm talking about picking up a stick and figuring out you can carve a sharp hook out of it to reach tasty worms hidden inside tree bark.

and maybe also showing it to your offspring and having them do the same.

1 Like

and how exactly is that different from the link posted above, where agents manipulate complex objects in interesting ways (such as using the walls to block off the seekers, which is faster and more effective)?

The difference is simply between environments. The principle is still the same. Figuring out how objects interact with the rest of the world is a simple ability to master - so I'm not really sure why you're holding it up as the cornerstone ability of simple organisms.

oh, and btw -

Those agents are fundamentally statistical too - they model the environment from a purely statistical point of view. Animals figuring out that certain places carry a higher chance of food are being probabilistic too (maybe even Bayesian). There is little difference beyond the surface.

The only difference is between what the model can do - Codex can generate and comprehend code, while OAI’s agents can survive in a difficult virtual world. They are different applications built upon the same fundamental theory - but now the paradigm is to simply unite all of them like GATO.
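On the "maybe even Bayesian" aside above - a toy sketch of that kind of foraging update, with made-up numbers:

```python
# Toy Bayesian foraging: maintain a belief over which patch hides food and
# update it after each visit. All numbers are made up for illustration.

def update(belief, visited, found, p_detect=0.8, p_false=0.05):
    """Bayes rule: P(food in patch | observation) is proportional to
    P(observation | food in patch) * prior."""
    posterior = {}
    for patch, prior in belief.items():
        p_find = p_detect if patch == visited else p_false
        likelihood = p_find if found else 1 - p_find
        posterior[patch] = likelihood * prior
    total = sum(posterior.values())
    return {patch: p / total for patch, p in posterior.items()}

belief = {"oak": 0.5, "pine": 0.3, "rock": 0.2}
belief = update(belief, visited="oak", found=False)   # no food under the oak today
print(belief)                                          # belief shifts toward the other patches
```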

Exactly what I was talking about in my first reply in this thread. Great example. This would have been a bad example if we were talking about non-human level intelligence.

1 Like

Models of reality, novel problem solving, learning by doing or watching, all the things our AI right now just doesn’t do.

1 Like

Yea, that's a cool result indeed.
But those agents took millions of attempts to randomly stumble on the answer; in the animal world, you sometimes only have one chance to figure out a novel solution immediately, or you may die.

I believe a model of reality is the most important; I can imagine ways to derive all the others from it.

1 Like

But the models have to include time and space. The model of a cricket ball flying through the air projects backward to where it was hit, and forward to where it can be caught. To me, that’s a prime application of AGI, and many animals have it. Our best AI does not.

1 Like

Evolution itself took millions of years - and how much energy, and how many trials and organisms, did it take?

These models are trained from scratch - so they don't transfer any knowledge at all. As you might've heard, transfer learning is a huge part of NLP, simply because if you transfer-learn you obtain better representations and much better accuracy, and you reach that accuracy nearly an order of magnitude faster, with less data of course.

That's just how transferring knowledge works. You build a model, you update it over time, and you later leverage it in a sample-efficient manner.
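Here's a minimal PyTorch sketch of what that reuse looks like in practice - the checkpoint path and layer sizes are hypothetical placeholders; the point is only "reuse a pretrained backbone, fine-tune a small head" versus training everything from scratch:

```python
# Transfer-learning sketch in PyTorch: reuse a pretrained backbone and only
# fine-tune a small new head, instead of training everything from scratch.
# The checkpoint path and layer sizes are hypothetical placeholders.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))
# backbone.load_state_dict(torch.load("pretrained_backbone.pt"))  # hypothetical checkpoint

for p in backbone.parameters():          # freeze the transferred representations
    p.requires_grad = False

head = nn.Linear(256, 2)                 # small task-specific head, trained from scratch
model = nn.Sequential(backbone, head)

opt = torch.optim.Adam(head.parameters(), lr=1e-3)   # only the head gets updated
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 512), torch.randint(0, 2, (32,))   # stand-in batch
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```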

And of course, everything I say is backed by a citation :wink: Here's a recent work by DeepMind which is much like the OpenAI example, except it's slightly newer, exhibits 0-shot generalization (no examples, the agent solves the task on its first try) and shows more generalist policies (such as experimentation with the environment). They scale the tasks and the environment vastly, which leads to agents learning more about the problem-solving approach itself. I'm sure there's a parallel to nature here.

World models do that explicitly. You can watch what the model predicts in the future.

I feel a really important point people don't get here is that these Large Language Models don't just do random stuff. All they do is predict which token comes next. A chunk of tokens may represent an image (ViTs), which gets you that video-predicting model; using those tokens as text gets you GPT-3/PaLM/LaMDA etc. Applying the same tokens differently can get you a model that does multiple tasks (GATO) or acts as a speech recognition system (OpenAI's new Whisper). Think of a token as a discrete unit of information which can represent anything, as long as it can be transmitted via bits and bytes.

The model doesn't know what those tokens are. All it's asked to do is somehow predict the next one (predictive coding) - very much like the cortical columns from TBT. Thus, almost every model is a "world model" in which a tiny portion of the world is sampled and modelled. Larger models can allow for even bigger portions, and hence multi-modal work like DALL-E 2 et al.
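A tiny sketch of the "tokens can represent anything" point - treat raw bytes as the vocabulary and train the usual next-token objective on them (toy model, obviously nothing like production scale):

```python
# Toy next-token model over raw bytes: the model never knows whether the
# bytes encode text, pixels or audio - it only learns P(next token | context).
import torch
import torch.nn as nn

data = "any stream of bits and bytes works here".encode()   # bytes as tokens, vocab size 256
tokens = torch.tensor(list(data), dtype=torch.long)

class TinyLM(nn.Module):
    def __init__(self, vocab=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.out(h)               # logits over the next byte at every position

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = tokens[:-1].unsqueeze(0), tokens[1:].unsqueeze(0)     # predict byte t+1 from bytes up to t
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x).transpose(1, 2), y)
    loss.backward()
    opt.step()
```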

2 Likes

the human DNA is just 700 MB in size; it's even smaller in other creatures, and most of it is made of inactive genes. There's not enough space for an entire trained model to be passed down the generations, but it's enough space for an algorithm.

too bad nature is the best code obfuscator we’ve ever seen.

2 Likes

That's not the point - I don't want to 'watch' anything. I want an AI that constructs a model (or applies an existing template) to a set of sensory inputs and, by projecting this model through time and space and drawing on previous experience, makes predictions and chooses a strategy to achieve a goal. Animals do this now; our best AI does not.

too bad nature is the best code obfuscator we’ve ever seen.

Not really. What you see is the final result of a mix of emergent and chaotic behaviour, which means you can't discover the mechanism from its outputs, and you can't predict the outputs other than by running the mechanism. But if simply replicating columns leads to more brain power, there might be something that could be reverse engineered.

1 Like

if sensory inputs can qualify as bits and bytes, sure. We have plenty of those.

Absolutely - except for the important point that you won't have access to those predictions (well, you do, you just can't interpret them); only the model itself - or other models - would be able to utilize those representations.

That's the learning algorithm - large language models devote most of their capacity to memory. So effectively, it makes more sense to compare the size of the brain to an LLM (brains are orders of magnitude larger). As retrieval models demonstrate, LLMs commit a lot of stuff to memory first, because it's way more efficient at modelling the world if you just memorize things - and the model tries to find shortcuts.

Eventually, the model reaches a stage where its capacity is exhausted and it can't predict more complex sequences - so it starts meta-learning, minimizing disruption to its current weights while maximizing accuracy.
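A rough sketch of the retrieval idea - a nearest-neighbour memory consulted at prediction time, purely illustrative and not any specific retrieval model's architecture:

```python
# Toy retrieval memory: store (key, value) pairs and look up the nearest
# stored keys at prediction time, so the system can "just memorize stuff"
# instead of packing everything into its weights. Illustrative only.
import numpy as np

class VectorMemory:
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key, value):
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(value)

    def read(self, query, k=2):
        dists = np.linalg.norm(np.stack(self.keys) - np.asarray(query, dtype=float), axis=1)
        return [self.values[i] for i in np.argsort(dists)[:k]]   # retrieved context for the model

mem = VectorMemory()
mem.write([0.1, 0.9], "fact A")
mem.write([0.8, 0.2], "fact B")
mem.write([0.2, 0.8], "fact C")
print(mem.read([0.12, 0.88]))   # -> ['fact A', 'fact C']
```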

1 Like

For me it makes more sense to compare the source code written by the engineers who made those language models with the source code in the DNA.

The point is that whatever algorithm is encoded in the DNA, it's so much better than the ones we have figured out so far.

1 Like

fair enough

I wholeheartedly agree with that. DL is simply another approach to AGI, where the point is to discover that algorithm itself from the data we produce. The reason there have been a lot of recent gains is that the fundamental idea of recovering an approximation of the underlying function is useful in myriad ways - from decoding visual stimuli in the visual cortex to generating art.

However, I doubt that other methods will catch up anytime soon. The problem with explicitly finding mechanisms is that you could easily miss the ones that are important, and there could be plenty which are still undiscovered. Moreover, you'd also need an underlying theory to wrap it all up neatly, which has proved elusive.

At this point, it might be advantageous to start applying DL to accelerate neuroscience and find interesting correlations to ablate and understand. Some work has started on that journey (like Jean-Rémi King pointing out the similarity between DL and neocortex activations).

1 Like

I've been trying to get my head around why DL works, and the more I learn about it, the more I notice that the process of building a DL model is less like engineering and more like putting layers together and praying to the derivative spirits to minimize your loss function.

It's cool to be able to quantitatively prove that a loss function can be minimized by mathmagic, but I simply don't like this approach. I don't like the idea of "differentiating" cognition into existence; it might be part of it, but it's not the whole story.

I prefer to actually understand why the weights do what they do, and I prefer to look at an activation and be able to tell what it means.

My intuition is that an over-engineered spaghetti of recurrent perceptrons ought to be enough for intelligence.

2 Likes

The whole idea of minimizing loss is perverse. It's a double negative with diminishing returns and a limit of 0. Intrinsically, it only makes sense for supervised learning. That's why they had to replace "unsupervised" with "self-supervised".
Open-ended GI should be doing the opposite: maximizing projected match, or the predictive value of an indefinitely and actively extended model. The feedback should be searching for new input, rather than fitting the old input to some template. Then the limit is infinity.

1 Like

if sensory inputs can qualify as bits and bytes, sure. We have plenty of those.

Absolutely - except for the important point that you won't have access to those predictions (well, you do, you just can't interpret them); only the model itself - or other models - would be able to utilize those representations.

No problem. The idea is for the algorithm to do exactly as per HTM: use the model to make a prediction (e.g. where the ball will be next), compare that against sensory input, update, learn.

Note that we don’t care where the ball really is, we care about the sensory input, where we see the ball to be.
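A minimal sketch of that loop - a crude constant-velocity predictor, not HTM itself, just the predict → compare → update skeleton:

```python
# Predict-compare-update loop for a flying ball: predict from the internal
# model, compare with the noisy sensory reading, nudge the state and the
# model toward the error. Not HTM itself - just that skeleton.
import numpy as np

rng = np.random.default_rng(0)
true_pos, true_vel = 0.0, 1.5      # the "world" (unknown to the model)
est_pos, est_vel = 0.0, 0.0        # internal model: estimated position and velocity
gain = 0.3                         # how strongly sensory error corrects the model

for t in range(30):
    true_pos += true_vel                              # the world moves on
    sensed = true_pos + rng.normal(scale=0.2)         # noisy sensory input
    predicted = est_pos + est_vel                     # 1. predict from the model
    error = sensed - predicted                        # 2. compare against the senses
    est_pos = predicted + gain * error                # 3. update the estimate...
    est_vel += gain * error                           # ...and learn a better model

print(round(est_vel, 2))   # drifts toward the true velocity of 1.5
```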

the human DNA is just 700 MB in size

Coding or junk? I have a theory that junk DNA is data storage of another kind, possibly available algorithms.

1 Like

I don’t get your math at all.

The problem is that you can definitely do that for smaller networks - to an extent. However, for larger networks it becomes a problem, because you'd need a more complex set of hypotheses to analyze the large network in the first place and work out why certain activations occur on which pieces of data - which is effectively optimizing another network to do the job, creating an endless cycle.
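For the smaller-network case, the usual trick looks something like this - capture activations with a hook and fit a simple probe to see what they encode (a toy example, not a full interpretability pipeline):

```python
# Toy activation probing in PyTorch: capture a hidden layer with a forward
# hook, then fit a linear probe to test whether those activations encode a
# given property. Illustrative only - real interpretability needs far more care.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
captured = {}
net[1].register_forward_hook(lambda m, i, o: captured.update(h=o.detach()))

x = torch.randn(256, 10)
labels = (x[:, 0] > 0).long()      # the property we suspect the layer encodes
net(x)                              # forward pass fills `captured`

probe = nn.Linear(32, 2)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(captured["h"]), labels)
    loss.backward()
    opt.step()

acc = (probe(captured["h"]).argmax(1) == labels).float().mean()
print(f"probe accuracy: {acc:.2f}")   # high accuracy suggests the layer carries that information
```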

By that definition, yep, I can guarantee DL models definitely do all three - which apparently HTM does too (I know little about it, only a tiny bit about TBT).

1 Like

Honestly, I think deep learning probably works, but it's probably only one of several ways to approach the problem.

I can guess that to build an actually smart machine, we need the following components, which could be built with DL but not necessarily (a rough interface sketch follows the list).

  • A humongous and fast memory store.
  • A world model that feeds on the memory store and sensory inputs to generate latent world states and is able to:
    • given a world state at time T, predict the next state at T + X
    • given two world states at different times, predict an intermediate state
    • given a world state, generate a similar state that would lead to more reward
  • An action generator that feeds on world states and can:
    • given two world states, predict an action that would cause one state to change into the other.
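Here's the rough interface sketch I mentioned - deliberately framework-agnostic, since each piece could be backed by a DL model or by something else entirely:

```python
# Rough interfaces for the components listed above. Each could be backed by
# a DL model or by something else entirely - this only pins down the
# contracts: a memory, a world model, and an action generator.
from typing import List, Protocol, Sequence

State = Sequence[float]     # a latent world state (placeholder representation)
Action = int                # placeholder action type

class Memory(Protocol):
    def write(self, state: State) -> None: ...
    def read(self, query: State, k: int) -> List[State]: ...

class WorldModel(Protocol):
    def predict(self, state: State, dt: float) -> State: ...     # state at T -> state at T + X
    def interpolate(self, a: State, b: State) -> State: ...      # intermediate state between two times
    def improve(self, state: State) -> State: ...                # similar state with higher expected reward

class ActionGenerator(Protocol):
    def act(self, current: State, target: State) -> Action: ...  # action that turns one state into the other
```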

Memory transformers

Funny we're talking about world models right now - Meta just released their latest text-to-video model a few hours ago: https://makeavideo.studio/
Quite a few drawbacks, but well, it's a new direction.

Yep, you can condition on the reward - throwback to decision transformers.
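Roughly, that conditioning trick is just prepending the return you want to the sequence the model sees - a schematic sketch of the input construction in the Decision Transformer style, not the full model:

```python
# Sketch of return-conditioned sequences in the Decision Transformer style:
# each timestep becomes (return-to-go, state, action), and at test time you
# simply ask for a high return-to-go. Schematic only, not the full model.
import torch

rewards = torch.tensor([0.0, 1.0, 0.0, 2.0])
states  = torch.randn(4, 3)                  # 4 timesteps of a toy 3-d state
actions = torch.tensor([1, 0, 2, 1])

# return-to-go at step t = sum of rewards from t onward
returns_to_go = torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])
print(returns_to_go)                          # tensor([3., 3., 2., 2.])

# Interleave (R, s, a) per timestep; a causal transformer trained on such
# sequences then predicts the action given the desired return and the state.
sequence = [(returns_to_go[t].item(), states[t], actions[t].item()) for t in range(4)]
```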

That could be quite tricky indeed. It's pretty much getting video question answering and generation all in one. I've cited Flamingo by DeepMind here multiple times - it would give you an idea of what a model that can do that would look like.

Effectively, all these tasks can be done by a single multi-modal model (GATO can answer questions, predict future states for a given action, and interact between modalities, e.g. text and images). Which is why multi-modal models are on the rise right now (with the above Make-A-Video example :wink: and several others you might've already seen - there is no dearth of interesting generative models).