In today’s meeting, Jeff Hawkins gives some brief comments on the book, “Human Compatible” by Stuart Russell. Then Michaelangelo sparks a discussion on why dimensionality is important in The Thousand Brains Theory.
License: Creative Commons Attribution license (reuse allowed)
(Hi everyone, I’ve been a lurker here, but I’ve been blogging elsewhere about the implications of “What if Numenta or similar groups succeed?” for AGI safety and existential risk—see here.)
Jeff: “He says that, if you give a machine an objective, it will always figure out how to prevent you from stopping it from pursuing that objective …”
I don’t remember exactly what Stuart Russell says, but that statement is definitely too strong:
- Certain types of objectives, like “do whatever the human overseer wants you to do”, would plausibly involve respecting off-switches.
- To unexpectedly seize control of an off-switch, the machine needs a certain level of foresight and long-term planning
- …and needs to be aware of itself and its environment
- …and needs to be doing cross-domain intelligence and reasoning…
Each of these ingredients leads to a possible way to avoid the problem. And the AGI safety/alignment people are indeed working on all of those directions! These research programs map roughly to “corrigibility”, “myopia”, “self-unawareness”, and “task-limited AGI”, respectively. Each of those research directions has some promise but also lots of open questions and issues—basically, no one has yet pinned down exactly what the specification needs to be, or how to design an AGI such that it will actually have that property.
Incidentally, humans do not have a single, simple, all-consuming goal; instead they have context-dependent goals, and contradictory goals, not to mention a ton of behaviors that are not really goal-directed at all. But humans do at least sometimes do long-term planning towards strongly-held goals, like Jeff’s goal of understanding the brain in order to design intelligent machines. And that’s sufficient for Stuart Russell’s argument to more-or-less go through. That said, I think it’s reasonably likely that we’ll eventually get brain-like AGIs with a single, simple, all-consuming goal, even if human brains aren’t like that. I think this for various reasons, none of which I am confident about.
Jeff, cont’d: “Humans don’t do this … I just find that hard to believe … If I give it an objective, and then give it another objective, I say ‘stop that please’, why would it ignore the new objective and just keep going with the first objective? You know, it’s like, y’know, I might pursue something aggressively too, but someone might come along later and say ‘please don’t do that’ and I’ll say OK.”
If a system has a goal, “the programmers are going to replace my goal with a different goal” is in the same category as “the programmers are going to turn me off”.
“I can’t fetch the coffee if I get reprogrammed to stop wanting to fetch coffee!” says the robot.
So if the programmers actually do change the objective, then the robot will follow the new objective. The concern is that the robot will thwart the programmers’ attempt to change the objective before they can carry it out.
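To make the logic concrete, here’s a toy expected-utility sketch of why a pure goal-maximizer prefers to disable its off-switch. All the numbers and names are my own illustrative assumptions, not anything from Russell’s book:

```python
# Toy model: an agent with one fixed goal, facing an overseer who may
# press the off-switch. Probabilities and payoffs are made up.

def expected_goal_progress(disable_switch: bool,
                           p_shutdown: float = 0.5,
                           progress_if_running: float = 1.0) -> float:
    """Expected progress toward the agent's goal under each action."""
    if disable_switch:
        # The agent runs no matter what the overseer does.
        return progress_if_running
    # Otherwise it only makes progress if the overseer happens
    # not to press the switch.
    return (1.0 - p_shutdown) * progress_if_running

# A pure maximizer picks whichever action scores higher:
best_action = max([True, False], key=expected_goal_progress)
print(best_action)  # → True: disabling the switch weakly dominates
```

For any nonzero `p_shutdown`, disabling the switch strictly wins, which is the whole problem: the incentive shows up without anyone programming it in.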
Imagine you’re in the middle of writing your masterpiece, the Great American Novel. You may feel strongly that you don’t want to die before you get a chance to finish writing the book. If so, you would presumably feel equally strongly that you don’t want to get brain surgery that removes your desire to write fiction, before you finish writing the book.
I once saw a more lighthearted example, which I will try to paraphrase because I can’t find the original: The aliens appear and say "Hi everyone. We’ve been hiding for the past million years, secretly doing a science experiment in controlled breeding and genetic engineering to design the human brain! But we just noticed, we messed up our experiment! You’re supposed to want to eat your children, but apparently Xuuwwerk mixed up the nucleotides, so instead you love your children. Pretty ridiculous, I know! Imagine loving your children! Ha! Well, we’re terribly sorry for the mix-up. Anyway, can you please step into this machine so that we can rewire your brain to correct that error?"
Subutai: “That actually seems very dangerous to me. I would not want a machine that listens to any random humans who asked them to do anything, that seems super-dangerous to me.”
Consider the 4 categories of (1) accidents (like Chernobyl), (2) bad actors (like terrorists), (3) coordination problems (like arms races, pollution, etc.), and (4) avoiding all of those but still getting a bad or sub-optimal post-AGI future (like the benefits of AGI not being widely shared, or somehow not winding up in some crazy awesome post-work utopia after AGIs can do literally every job better and cheaper than humans).
I think there’s a stereotype that the people concerned about AGI are obsessively focused on accidents (1), and don’t care about (2,3,4). This is not the case! I think they are thinking harder about all four of these categories than anyone else is. Probably (1) is the lion’s share of the literature (including this Stuart Russell book) because there is comparatively more work to do that can be done today, and especially work that needs to be done way far out in advance of developing AGI (as Stuart Russell argues in his book; I have also made that argument, see here).
For bad actors (2), it’s a hard problem and I don’t think anyone has great solutions. It’s obviously not feasible to keep computer chips and GitHub repositories out of the hands of terrorists! There’s a general hope that good actors’ AGIs will protect us against bad actors’ AGIs, but it’s hard to be sure about the offense-defense balance. (There’s been a little bit of theorizing, e.g. here.) There’s also been an effort (especially by OpenAI) to broach the subject of not necessarily immediately open-sourcing everything, for areas where you think there are bad-actor risks, or where you’re not sure yet. I don’t want to argue about “openness” vs “responsible disclosure” here—I can feel everyone here glaring at me for even bringing it up…—but I think that’s a nice demonstration that the very same people worried about AGI accidents are also the people trying to start a discussion about how to keep AGI out of the hands of bad actors. We don’t have to agree that less-open publication norms is part of the solution, but I’m just saying that they’re trying. Again, it’s a hard problem.
To be clear, you won’t find complete and airtight solutions to issues (1), (2), (3), or (4) in Human Compatible, because neither Stuart Russell nor anyone else knows complete and airtight solutions! Human Compatible is mainly a call-to-action that (1) is important, plus somewhat-vague ideas about possible approaches to doing it. The book talks about bad post-AGI futures (4) a little bit in the discussion of “enfeeblement”. I don’t recall any substantial discussion of bad actors (2), but Stuart Russell is actually a leading activist in that area … I guess he just left it out of the book for reasons like length and scope. Or maybe it’s in there and I don’t remember.
Michaelangelo: “Put a kind of paradigm of, it will have to propose what it’s going to do, and then you have to approve it…”
I’m all for it! If we could make that happen, it would be an amazing leap forward for reducing AGI accident risk. However, it’s hard to implement, or at least I don’t know how you would do that, and I haven’t seen anyone else propose a way to do it either.
Think about a brain-like AGI, Numenta-style. You have millions of virtual cortical mini-columns, connected to a virtual HC/EC and virtual thalamus. You feed in sense data and give it motor outputs, which are connected to either a virtual world or a robot body with sensors. I would argue that you also need a virtual basal ganglia, and some other virtual subcortical systems (maybe a virtual midbrain?) to direct attention and effort towards the specific things that you want the AGI to understand. After all, an intelligence wandering around in a forest can spend its time constructing better and better predictive models of insects, or models of dirt, or models of light scattering, or models of clouds, or maybe it’s ignoring its surroundings and trying to prove the Riemann Hypothesis! So again, I would expect that we need a virtual midbrain that can direct attention and especially reward towards correctly modeling the things we humans want it to model.
I could be wrong about the details, but the point is, you the programmer will be in control of things like HTM hyperparameters, input signals (to some extent), what the output actuators (or whatever) are, and possibly signals related to reward and attention, and I don’t know what else.
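Just to make “these types of control parameters” concrete, here’s a hypothetical sketch of what that programmer-facing control surface might look like. Every class and field name is my own invention for illustration, not Numenta’s actual API:

```python
# Hypothetical sketch of the control knobs described above. All names
# and numbers are invented for illustration; this is not a real API.
from dataclasses import dataclass, field

@dataclass
class BrainLikeAGIConfig:
    # HTM-style hyperparameters
    num_minicolumns: int = 1_000_000
    cells_per_column: int = 32
    # What the system senses and actuates
    sensor_streams: list = field(default_factory=lambda: ["camera", "microphone"])
    actuators: list = field(default_factory=lambda: ["arm", "speech"])
    # The "virtual midbrain" knobs: which modeling targets get extra
    # reward and attention (speech prioritized over background noise)
    attention_priorities: dict = field(
        default_factory=lambda: {"human_speech": 2.0, "background_noise": 0.1})

def reward_signal(prediction_error_drop: float, target: str,
                  cfg: BrainLikeAGIConfig) -> float:
    """Reward improvements in the world-model, weighted by what we
    want the system to model."""
    return cfg.attention_priorities.get(target, 1.0) * prediction_error_drop
```

The point of the sketch is just that the programmer’s influence is indirect: you set weights and wiring, and hope the resulting attention and reward dynamics do what you intended.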
So the question is: Given these types of control parameters in your source code, how do you actually write the source code and set up the system such that it definitely will “propose what it’s going to do”, and you can be sure that its proposals will never be false, or manipulative, or confused, or omitted entirely?
I don’t think there is an obvious answer. But maybe it’s possible! I’m very happy to brainstorm.
Hello Steve, welcome to the forum.
I think the issue @jhawkins has with this is whether a system is truly intelligent without a sufficiently complete model of the world. (Can you truly know how to drive if you don’t understand how rain makes the road slippery, and why a cyclist that disappears behind a truck isn’t really gone? Or in the words of Douglas Adams: can you truly make tea without knowing about sun-dried tea leaves, putting in the milk before the tea so it wouldn’t get scalded, and a (brief) history of the East India Company?)
And so, an AGI with a sufficiently complete model of the world would never have a single goal, but rather lots of temporary parallel goals with lots of sub-goals each.
A coffee fetcher that stumbles over the baby is not an AGI. And so the real question is: do we make baby killers (we’re very good at that already) or do we make AGI?
This is really another version of the off-switch avoider. (And it’s also kind of the plot of William Gibson’s Neuromancer).
I agree that AIs without broad cross-domain world-models are less powerful and thus create less risk of catastrophic accidents. But it’s presumably possible to both have a single all-consuming goal and a broad and ever-improving world model, because broadly understanding the world is instrumentally useful for better achieving the goal. For example, if someone had an all-consuming goal of “understand the brain and use that knowledge to make intelligent machines”—to pick a totally random example—they still might develop sub-goals of understanding venture capital and management and HR and professional networking and nutrition etc.
Maybe the only difference between us is that you call them “temporary parallel goals with lots of sub-goals each” and I call them “temporary parallel sub-goals with lots of sub-sub-goals each”?
I would also add that I don’t think it’s possible to train a brain-like AGI to have a single all-consuming goal from the moment it turns on. I think when it’s still learning the very most basic aspects of the world, it needs more specific guidance than that—for example, I think it probably needs to be “told” (by a hardcoded circuit in the virtual midbrain) that it’s very important to make a good model of human speech sounds, moreso than other sounds. An “adult” brain-like AGI can wind up with a single simple all-consuming goal, I would argue, but via a more complicated path.
Again, this is a bit of a side-track because I would argue that Stuart Russell’s AGI catastrophic risk argument basically goes through even if we never see an AGI with a single simple all-consuming goal. But still interesting and worth discussing.
This is a problem which does not exist yet, and is therefore nigh impossible to act upon. Preventative measures for problems are implemented reactively, after the problem manifests. You cannot plan for things you’ve never experienced.
The dangers of AGI are not comparable to that of cloning humans, because one of those technologies actually exists. The prohibition on human genetic experimentation came about after we had demonstrated the ability to clone several species of mammals. Then the scientists made an informed decision based on their experiences in the laboratory and in life.
It’s not that an AGI would be better with a broad model of the world. It’s that the model is the intelligence. If your machine is not based on a broad model, it is not really intelligent. And the broader the model, the more intelligent it is. (The same goes for humans).
Could you give an example? I bet it is possible to prevent that with a broader understanding (by the system) of underlying factors.
Well, that’s entirely possible! I think it’s worth trying anyway.
My theory is: We know a fair amount about what kind of AGI we’ll get if the HTM research program succeeds, and can therefore think pretty concretely about how that would play out. And that kind of brainstorming can affect research prioritization decisions that we can act on today!
The future will have surprises, and therefore we won’t always make the right decisions. But the question is not: “If we make research prioritization decisions by trying to think through the downstream impacts on safe & beneficial AGI, will we make the right choice in every case?” We certainly won’t!! But that’s not the question! The question is: “If we make research prioritization decisions by trying to think through the downstream impacts on safe & beneficial AGI, will we make better decisions than not even trying to think through the downstream impacts on safe & beneficial AGI?” That’s a much lower bar! I think we can pass it!
I’ll jump out on a limb and give an example. There’s a research program where we try to build better computational models of the midbrain and limbic system, what calculations they do, and how they interface with the cortex, and for example how (again computationally) that interaction gives rise to social instincts like sympathy and guilt. My vague thoughts about how these computations and interactions work are in this blog post; I’m very interested in feedback on that btw.
My claim is this: we should try to make a lot more progress on this research program sooner than we finish writing code for a fast, fully-functioning virtual neocortex. Why? Because, as in my blog post linked above (and also @Bitking’s motto of “dumb boss, smart advisor”), the subcortical structures are “steering” the neocortex to do evolutionarily adaptive things. When we have our own virtual neocortex, we will likewise need to know how to properly steer it, such that it stays under our control and does the things we want it to do. To me, it seems prudent to develop the science of steering a virtual neocortex before we actually finish building a virtual neocortex. (Just like how Fermi figured out the science of controlling nuclear reactors (temperature coefficients, control rods, etc.) before he finished building the first nuclear reactor.)
So that’s my out-on-a-limb example, my theory that we’ll (more likely than not) get a better post-AGI future by prioritizing and accelerating research into computational models of how subcortical structures steer the neocortex. But it’s just a theory; I offer it with low confidence, especially because I haven’t discussed it much with neuroscience experts like y’all.
(Reminder yet again that I don’t think this is an essential component of the Stuart Russell argument; I think the argument is valid even for systems with complex human-like goals and motivations. But it’s still worth discussing.)
I’m thinking that there are two main categories of scenarios where we might wind up with a brain-like AGI that has one simple all-consuming goal (albeit lots of different sub-goals that help achieve it). I don’t think it’s inevitable, but I do think it could happen.
The first category would be that the programmer purposely tries to give it a single simple all-consuming goal.
Why would they do that? First, it’s an easy and well-known way to measure how well your system is working, and thus do automatic hyperparameter search, publish high-impact papers in NeurIPS, beat benchmarks, get tenure and high-paying jobs, etc. Second, the programmer may have a specific goal in mind for the system—the goal that motivated them to build it in the first place, possibly handed down from management and investors. “Earn as much money as possible.” “Find a more efficient solar cell.” “Find a cure for Alzheimer’s.” “Hack into my adversary’s weapons systems.” You get the idea. In this case, the programmer would find it natural and intuitive to make the system have exactly that goal.
How would they do that? I imagine that the programmer has a way to detect progress towards the goal, and associates a reward for that in the virtual basal ganglia, and gradually ramps it up higher and higher until that goal (and its derivative sub-goals) dominates everything else.
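As a hedged sketch of that reward-ramping scheme (the function names and numbers here are purely illustrative, not a real training recipe):

```python
# Sketch: the programmer's goal-progress reward is scaled up over
# training until it swamps every other drive. All values are made up.

def ramped_reward(goal_progress: float, other_drives: float,
                  step: int, ramp_rate: float = 0.01) -> float:
    """Total reward: a fixed pool of innate drives plus an
    ever-growing weight on the programmer's chosen goal."""
    goal_weight = 1.0 + ramp_rate * step
    return goal_weight * goal_progress + other_drives

# Early in training, the innate drives still matter; late in training,
# the single goal dominates the reward signal almost entirely.
early = ramped_reward(goal_progress=1.0, other_drives=1.0, step=10)
late = ramped_reward(goal_progress=1.0, other_drives=1.0, step=10_000)
```

At `step=10` the goal accounts for about half the reward; at `step=10_000` the innate drives are down to about 1% of it, which is the “ramps it up higher and higher” dynamic in miniature.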
Now, we responsible, foresighted people can go tell this hypothetical programmer that whatever their goal is, they’re better off not building an AGI that single-mindedly pursues it, as the system may get out of control. Unfortunately, the programmer might not be listening to this advice, or might think that’s not true, for whatever dumb reasons you can imagine. (Just as it seems hard to keep powerful AGIs out of the hands of bad actors, it seems equally hard to keep powerful AGIs out of the hands of careless actors, or actors with confused ideas about how their systems will behave.) The problem is actually worse than mere carelessness and confusion. Up to a point, an AGI single-mindedly pursuing Goal X probably is, in fact, the best way to pursue Goal X! It suddenly stops being the best way when the system becomes intelligent and self-aware enough to seize control of its off-switch, etc. So we would be asking programmers of early-stage systems not only to “not be careless”, but actually to slow down their progress, like making less effective systems, making less money, getting less impressive results on benchmarks, etc. I can easily imagine a well-intentioned person trying to have their AGI build better solar cells, deciding that the real urgent risks of climate change outweigh the speculative risk that their AGI is more capable than they realize and may get out of control. Or imagine the person at a struggling startup desperate to avoid laying off their employees, etc.
So that’s the first category, where the programmer purposely tries and succeeds in giving their brain-like AGI a single simple all-consuming goal.
The second category is when the programmer gives the AGI write access to its own neocortical connections. There’s an obvious appeal and excitement and opportunity in trying to allow an AGI to edit itself to improve its own operation. However, there has never been a human or other animal with fine-grained (deliberate and conscious) editing power over their own neocortex. We have no experience of what would happen! And I think one possibility is that they would modify themselves—not entirely on purpose—to single-mindedly pursue a single simple goal.
For example, whenever I read a description of gestation crates, I feel a very strong resolve not to eat pork. But then the resolve fades over time, and sooner or later I’m eating pork again…right until the next time I read about gestation crates, and then the process repeats. I imagine that, if I could edit my own neocortex, then after reading about gestation crates I would not want that resolve to fade, and I would edit my neocortical models so that the memory and the feeling come back at full strength whenever I am about to eat pork, forever, without the resolve ever fading.
This is a general pattern: Sometimes I feel a strong pull towards some goal, but it doesn’t dominate my life because other times I have different goals that pull me in different directions. If I could edit my own brain connections, then during those times when I am feeling a pull towards a particular goal, I may go in and delete drives that conflict with that goal. This would make me pursue the goal more confidently and consistently in the future … and then maybe later on I would be even more aggressive at deleting drives that conflict with that goal, until the process spirals out of control and I wind up as a single-minded extremist.
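Here’s a toy simulation of that spiral, where an agent repeatedly deletes whichever drive conflicts with its currently dominant one. All the drives and strengths are made up for illustration:

```python
# Toy "drive-deletion spiral": an agent that can prune whichever drives
# conflict with its currently strongest drive ends up with exactly one.
drives = {"write_novel": 9, "socialize": 6, "rest": 5, "eat_pork": 4}

while len(drives) > 1:
    dominant = max(drives, key=drives.get)
    # While the dominant drive is active, delete its weakest competitor...
    weakest = min((d for d in drives if d != dominant), key=drives.get)
    del drives[weakest]
    # ...which makes the dominant drive even more dominant next round.
    drives[dominant] += 1

print(drives)  # → {'write_novel': 12}
```

Nothing forces the spiral to run to completion, of course; the point is just that each deletion makes the next one more likely, which is the positive feedback I’m worried about.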
(Or maybe I would foresee this whole process playing out, and not do any of that, but rather throw my neocortical-connection-editing machine in the garbage. Hard to say for sure what would happen!)
We’re used to having contradictory goals and context-dependent goals—it feels like an inevitable part of cognition—but it seems to me like that would at least potentially be systematically driven away in an AGI with deliberate control over its neocortical connections.
So, as before, we can go take out advertisements and give speeches to warn all the AI programmers that it’s a bad idea to give brain-like AGIs direct write access to their own neocortical connections. But just like a few paragraphs above, those programmers might not be listening to us, or they might not believe us, or they might not care.
But is this really an AGI then?
A real AGI needs to be able to attempt everything a human can, and probably more. You should just as easily be able to ask it to install Winblows on your PC as to fix the leaking tap in the bathroom, without telling it where the PC is or which tap is leaking. And if the AGI needs some particular tools or material you don’t have handy, it needs to be able to go out and fetch them.
Anything short of that is not an AGI.
And having the required knowledge (i.e. the world model) to carry out such diverse operations will allow it to reconsider certain potentially harmful sub-goals. Because it might need to know what walking down stairs is to install your PC, or what a credit card is to fix your leaky tap. And what queueing in line means. And how to be polite to someone when asking directions to the hardware store, or what size washers are needed for a certain model of water tap.
Think about it: if you had that power, but you also knew it could result in a potentially lethal outcome, wouldn’t you do some tests first? Wouldn’t you simulate first? Just like we do with testing prescription drugs before commercializing them? Same with cars and home appliances and so on. Simple people wouldn’t, but you are not a simple person. And an AGI shouldn’t be either.
Testing can only be done by a system with a correct model of the world. That’s the rational way to think about intuition. “Am I going to wear a mask today? What is the chance of getting infected in the store at this time of the day? Do I want to risk getting sick if I plan to visit my elderly parents next week?”
A simple AI with a limited world model cannot see that far into the future. But we want to make an AGI.
8 posts were split to a new topic: AGI and Wireheading