Very open to discussion and feedback on any and all aspects of this!!!
Does machine intelligence pose any risk for humanity?
Fear, Uncertainty, Doubt.
You say: “motivation is in general an unsolved problem” and then go on to explain in great detail how an ill-motivated robot could do bad things. This hardly seems like a well-informed opinion…
If this stuff really concerns you that much, then perhaps you should spend some time studying the basal ganglia? The BG is responsible for making decisions; it is, as you believe, the “Judge”. However, it is not the true source of motivation. The BG performs a reinforcement learning algorithm which requires reward & punishment signals as inputs. These input signals are the actual sources of motivation. Specifically, serotonin is released when you experience good things, and the basal ganglia seeks to maximize the release of serotonin.
For example, let’s say I set up a machine intelligence to be the CEO of a company. This being America […] OK, maybe you’ll say “anyone could have seen that coming, obviously maximizing stock price is a dumb and dangerous goal”.
This being 'murica you can be sure that the DOD and FBI will also have AGI at their disposal, and they won’t take kindly to a rogue agent “engineering a new pandemic virus” or “hacking into military systems”. In your hypothetical scenario there is a severe power imbalance between the rogue agent (god-like) and everyone else (merely-mortal) but that’s just not realistic. Every other AGI is going to see the rogue CEO as an existential threat and react accordingly.
It’s easy to think “if only I had this superpower, then I could get away with anything” but, in fact, if you have superpowers then probably someone else has superpowers as well.
Oh I’m absolutely planning to study the basal ganglia more! My current mental model of the basal ganglia is (1) it stores a database of what patterns of neocortical activity seem to be good vs bad, (2) it updates the database based on a reward signal that comes from elsewhere in the brain, (3) it uses the database to put its thumb on the scale of neocortical activity, so that when there are simultaneous competing patterns of neocortical activity, the BG suppresses the one that predicts lower reward and/or amplifies the one that predicts higher reward. This is highly provisional, but I don’t think it’s so different from what you believe, is it?
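To make my (1)–(3) concrete, here’s a toy Python sketch of that mental model. It’s purely illustrative: the dictionary, learning rate, and pattern names are all my own inventions, not claims about anatomy.

```python
# Toy model of the BG story above: a value table updated by a reward
# signal, used to bias competition between simultaneous "neocortical"
# patterns. All names and numbers are made up for illustration.

class ToyBasalGanglia:
    def __init__(self, learning_rate=0.1):
        self.values = {}      # (1) database: pattern -> predicted reward
        self.lr = learning_rate

    def update(self, pattern, reward):
        # (2) nudge the stored value toward the observed reward signal
        v = self.values.get(pattern, 0.0)
        self.values[pattern] = v + self.lr * (reward - v)

    def arbitrate(self, competing_patterns):
        # (3) thumb on the scale: favor the pattern predicting higher reward
        return max(competing_patterns,
                   key=lambda p: self.values.get(p, 0.0))

bg = ToyBasalGanglia()
for _ in range(20):
    bg.update("reach-for-coffee", reward=1.0)
    bg.update("reach-for-stapler", reward=-0.5)

print(bg.arbitrate(["reach-for-coffee", "reach-for-stapler"]))
# prints "reach-for-coffee"
```

Obviously the real BG isn’t a lookup table, but this is the level of abstraction I have in mind when I say “database”.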
I wouldn’t say that’s the “Judge” in the brain, because the Judge also needs to include everything that goes into the reward signal, like practically the whole brainstem, and the amygdala, etc. etc. Actually normally I lump the BG with the neocortex, thalamus, hippocampus, etc. and call that whole thing “the neocortex subsystem”, and then that subsystem does both RL and self-supervised learning. And then instead of the word “Judge” I would just call it an RL reward signal. It’s just that in this particular post I wanted to stick closely to what Jeff wrote in the book, without introducing any extra complexity or baggage.
By the way, I thought phasic dopamine was supposed to be the reward-prediction-error signal. What’s the relation between serotonin and phasic dopamine, in your picture? Sorry if that’s a stupid question—like I said, this stuff is still on my to-do list.
This being 'murica you can be sure that the DOD and FBI will also have AGI at their disposal, and they won’t take kindly to a rogue agent “engineering a new pandemic virus” or “hacking into military systems”.
OK. It’s the future. You just got a new job as a programmer at the FBI. They’ve just set up a new huge powerful supercomputer. Should be helpful for hunting down and erasing those annoying rogue machine intelligences that are self-replicating around the internet!
In this new supercomputer, 90% of the processing power is a bank of ASICs designed to run HTM version 72, a neocortex-like algorithm which has the potential to learn and grow over time into an unusually knowledgeable, fast, and savvy machine intelligence.
The other 10% of the processing power is reserved for the motivational system / reward signal calculator / “Judge” box / old-brain-equivalent / whatever you want to call it. You want the resulting system to be motivated to follow properly-issued FBI commands—following them in both letter and spirit. How do you build that motivational system ( / reward signal calculator / “Judge” box / old-brain-equivalent / whatever)? What code do you write?
This isn’t a “gotcha” or rhetorical question, I really want to know the answer. What are your thoughts?
Yes, that’s correct.
My understanding is that dopamine is the “reward-prediction-error” signal, and serotonin is the “reward” signal which is being predicted. The reward (serotonin) is signaled when you receive the reward, in response to hard-wired stimuli such as food or sex.
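In standard temporal-difference terms, that division of labor would look something like the following sketch (the mapping of variables to neurotransmitters is my reading, not established anatomy):

```python
# Sketch of the serotonin/dopamine picture above in TD-learning terms.
# serotonin_reward plays the role of the raw reward r; the function's
# return value plays the role of the phasic dopamine signal.

def phasic_dopamine(serotonin_reward, predicted_value,
                    next_predicted_value, discount=0.9):
    """Reward-prediction error: delta = r + gamma * V(s') - V(s)."""
    return serotonin_reward + discount * next_predicted_value - predicted_value

# A fully expected reward produces no prediction error:
print(phasic_dopamine(serotonin_reward=1.0, predicted_value=1.0,
                      next_predicted_value=0.0))   # 0.0

# An unexpected reward produces a positive dopamine burst:
print(phasic_dopamine(serotonin_reward=1.0, predicted_value=0.0,
                      next_predicted_value=0.0))   # 1.0

# An expected reward that never arrives produces a dopamine dip:
print(phasic_dopamine(serotonin_reward=0.0, predicted_value=1.0,
                      next_predicted_value=0.0))   # -1.0
```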
I am still reading Jeff’s new book but I will offer my current view of subcortical structures and edit if I see any differences after reading the book.
I see that the subcortical structures are like “training wheels” to kickstart the learning programs in the cortex. In the beginning the subcortex does the heavy lifting of running the body. As the cortex becomes more capable it slowly takes over the old brain and overrides it on most things. As has been pointed out - you can only force yourself to hold your breath so long; eventually the old brain fail-safe mechanisms reassert control and take over.
You speak of reinforcement learning. I don’t think it is as simple as “dopamine receptors drive reinforcement learning”.
Take the amygdala - it comes preprogrammed with visual primitives, and perhaps other sensory primitives. It is heavily interconnected with the hippocampus, which holds the digested version of the world at the top of the WHAT/WHERE streams. By that point, the information has received as much analysis as the cortex is capable of performing on your experience.
I’d say that a key function of the hippocampus is to buffer episodic experience. You can’t know whether an experience is good or bad until it is over. The amygdala matches up the stored primitives and sends back good/bad judgments to mix in with the current episode.
In sleep, this mix of episode and good/bad judgment is pushed back onto the cortex by the process we call dreaming.
This process happens to every experience throughout your life and attaches judgments of good or bad to the object representations in the cortical stores. All of them.
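The buffer-then-tag-then-replay story I’m describing could be sketched like this (a pure toy: every structure and signal here is a stand-in, not anatomy):

```python
# Toy sketch: hippocampus buffers the ongoing episode, the amygdala
# tags it good/bad after the fact, and "sleep" replays the tagged
# episode back onto the cortex. All names are illustrative stand-ins.

from collections import deque

episode_buffer = deque()   # "hippocampus": holds the ongoing episode
cortical_store = []        # "cortex": long-term tagged memories

def amygdala_judgment(event):
    # stand-in for matching preprogrammed primitives against experience
    return "good" if "food" in event else "bad"

def experience(event):
    episode_buffer.append(event)

def sleep():
    # "dreaming": push each buffered event, plus its judgment, to cortex
    while episode_buffer:
        event = episode_buffer.popleft()
        cortical_store.append((event, amygdala_judgment(event)))

experience("found food")
experience("loud noise")
sleep()
print(cortical_store)
# prints [('found food', 'good'), ('loud noise', 'bad')]
```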
You generally don’t reason out every situation from first principles - you use the building blocks of the “best flavored” sensed and remembered objects to formulate a plan of action in the ventromedial prefrontal cortex.
The good/bad features in the amygdala are somehow updated as you grow older; this is one of the areas I have been studying.
As for how that fits with a cortex divorced from the fearful old brain - humans with damaged amygdalas have very poor judgment. I have trouble envisioning a useful intelligence that does not have the same basic drives as a human ever forming empathy or a value system that we humans would be comfortable with.
Oh, I have a post about the role of the amygdala in setting goals & motivations:
I’ve also been working my way through A systems-neuroscience model of phasic dopamine which has a lot about the amygdala, BG, habenula, and much more. I really like it but there’s a lot to digest … I plan to write an “explainer” post when I’m done.
I don’t disagree with your point in principle (basically you are describing a wireheading problem – just like humans have invented things like pornography to “trick” our subcortical networks into issuing rewards). But I would like to point out that this particular example is rather weak IMO, as it seems like a pretty naive way to design the “judge”. A proper judge would need to learn from the world in parallel with the simulated neocortex. As such, it would learn from past episodes that if it is bull-headed about ignoring the robot owner, that would result in a “bad” experience which it wouldn’t want to repeat (I assume the owner has some reward/punishment mechanism, or that the robot is built to crave social acceptance, etc). Likewise, it would learn that being attentive to the robot owner when he changes his mind results in a “good” experience.
It would require careful parenting, here…
Thanks!! Yeah, I was trying to make a pedagogical / illustrative point, i.e. that the robot may aggressively prevent its goals from changing, even if the programmer did not try to make the robot aggressively prevent its goals from changing. If you’re already on board with that (and it sounds like you are), then the implication is that the programmer needs to try to make the robot not aggressively prevent its goals from changing. And the question is, how does the programmer do that?
The “owner has some reward/punishment mechanism” idea is not crazy. How about, let’s give the programmer a remote control, with “good” and “bad” buttons, and these are reward signals for the robot’s RL system? Here’s one problem: the robot can wind up in any of several “states of mind”:
- The robot comes to “like” when the programmer approves of its behavior and presses the button
- The robot comes to “like” when the programmer presses the button
- The robot comes to “like” when the button is pressed
The third might lead to the robot killing the person and pressing the button (wireheading again!), the second might lead to the robot enslaving the human and forcing them to press the button, and the first might lead to the robot deceiving the human when it knows it won’t get caught, although that’s probably the least bad. The best of all would be if the human could reward the robot for doing the right thing for the right reason—then there wouldn’t even be a deception problem—but that would require some great advances in transparency / interpretability beyond what anyone knows how to do today (a promising research direction btw!). Failing that, is there a way to figure out which one of these three bullet points will actually happen? I don’t know. Or maybe all three would happen, and when the robot considers stealing the button it “feels conflicted”? How will it resolve its internal conflict? I dunno … I’m inclined to think that the button is a promising approach if and only if we have that breakthrough in transparency / interpretability I mentioned above.
You also mention building a robot “to crave social acceptance”. That’s also an interesting and potentially-promising idea. In particular, if the robot has the whole suite of human social instincts implemented with sufficiently high fidelity, then presumably it will have behaviors in the normal range of human behaviors, which at least isn’t terrible, and is potentially quite good if we go in and turn off jealousy and crank up conservatism, sympathy, etc. Then the question is: how do you write code for a robot to have human-like social instincts (or at least, to crave social acceptance)? What’s the algorithm for that? I’ve thought about this too, although I haven’t gotten very far and I have a long reading list to try to learn more…
Naively I’d say: maximize the number of rogue AIs that it’s caught. It’s a very simple goal to understand and to program, as simple as “maximize the share price”. Unfortunately it suffers from similar hypothetical problems as maximizing the share price.
The true answer is that the AI’s motivational system should be a lot more complex, similar to real animals’. It should have multiple goals which at times may conflict with each other, and it should use many heuristics to determine when to give rewards/penalties. For example, the CEO’s goal should be to perform well at its job, and share price is one heuristic for job performance. You could use the number of lawsuits against the company as another heuristic to penalize the CEO for bad behavior.
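A toy version of what I mean by combining heuristics (the weights and the heuristics themselves are invented for illustration only):

```python
# Toy multi-heuristic reward for a hypothetical AI CEO, per the idea
# above: several noisy proxies combined so that no single one can be
# gamed in isolation. Weights and signals are invented for illustration.

def ceo_reward(share_price_change, lawsuits_filed, employee_turnover,
               w_price=1.0, w_lawsuits=5.0, w_turnover=2.0):
    """Weighted combination of proxies for 'performing well at the job'."""
    return (w_price * share_price_change
            - w_lawsuits * lawsuits_filed
            - w_turnover * employee_turnover)

# A quarter with a big share-price jump but lots of lawsuits still
# scores badly, so pumping the price alone doesn't pay:
print(ceo_reward(share_price_change=10.0, lawsuits_filed=3,
                 employee_turnover=0.5))
# prints -6.0
```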
Real animals use many reward heuristics, and they are based on multiple sensory modalities. Together these make the reward system more robust, and they make it harder for animals to trick their old brain into giving them rewards. For example, pornography isn’t as good as the real thing because there is no sense of touch or taste or smell. Also, there is no interactivity, which I think is necessary to trigger some of the reward heuristics, because the old-brain heuristics can tell the difference between watching and participating. Furthermore, there are often happy/rewarding emotions between partners which are missing when watching pornography (unless of course it’s your partner in the image/video).
But the goals did not change! The robot gets no reward for making coffee; it did not even want to make coffee until I asked it to. It gets rewarded for pleasing me by doing what I tell it to do. If I decide I don’t want the coffee and the robot makes it anyway, then the robot is not going to get a reward. The robot might think it’s going to get a reward and might throw a hissy fit when it doesn’t. But regardless, I won’t be pleased (and so it won’t be rewarded) until it completes the new objective.
I think that the worst possible outcome of this scenario is that the robot learns that I’m never going to be pleased, no matter which beverage it brings me, and so it becomes unmotivated to do anything at all.
I’m following this discussion with considerable interest, but I would make one suggestion. We’ll be doing really well if the first AGI operates at the level of some small-brained animal. The algorithms are usually thought to be a property of the mammalian cortex, but in all probability the same algorithms operate in birds, and therefore most likely in common ancestors. Cortex or no, this stuff has been around a while.
Also, there is not just one algorithm. There are (at least) 20 or more sensory modalities, plus place and physics, plus more algorithms on the motor side. That goes throughout the animal kingdom. Focussing on HTM alone and human-like behaviour seems to me blinkered.
High schooler here, I have been doing ML for quite some time and got really interested in HTM’s and how we could possibly build an actual (even if somewhat limited) AGI using it.
My opinion is worth jack shit, but still, I would like to put my 5 cents here:
In the blog post, I saw that the author had a pretty vague idea of how an actual AGI might play out. The basics resembled old-school RL stuff, just at a much greater scale of complexity.
I wanted to order the book but it would be here in a month with pre-orders, so I decided to join this community anyway and learn whatever I can. BUT - if the blog post indeed is a reflection of how an actual AGI can be created, I find the reasoning very vague. I do see that the focus is more on replicating the biology (Hawkins saying that even if we simulate only some of the stuff, we might still be able to achieve a powerful AI). While emulating Mother Nature is not bad, I do not believe that this is the most efficient way.
If the aim is to create just a small intelligence within 20 years that can’t think/do much except awe the humanities people, that seems appropriate.
What I would expect is rather an insight on how exactly the random mutations lead to us being intelligent and therefore increasing our chance of survival.
We human beings are needlessly complicated, our intelligence/consciousness allows us to think in different ways and philosophize upon our existence - none of which adds much survival value.
Rather, if we had a much simpler brain (one that can’t self-contemplate, philosophize, or appreciate art), just a brain that can form long-term strategies, our survival would already be assured (we would still be able to build basic housing structures, cook food passably, etc.).
So, my POV is that we should put the effort into actually replicating the process by which our brain went through millennia of evolution to get where it is - simulating and accelerating that process seems like a good roadmap to work through. Because our current brains haven’t evolved just to ensure our survival - there is something else, some missing part we have entirely overlooked. That leaves two possibilities:
→ Either nature does not work simply to guarantee survival,
→ or there is some other function of intelligence (that may be connected to our reason for existence) for which we evolved to be actually conscious.
Sorry if I said something wrong; I wouldn’t mind if you all corrected some of my assumptions.
Yes, this cuts to the core of the problem. And this isn’t theoretical; it is a problem even today with weak AI (if there is a simpler way to “cheat” to reach the solution than the one the developer intended, the AI will tend to find it).
For this, it probably is worth studying the behavior of social animals. Nature has had a long time to work on this problem, so there are likely some useful strategies to be learned.
While a perfectly reasonable theory, just thought I’d mention that this contradicts one of the main points in the book. Unless of course you are referring to sub-cortical networks.
I’m not willing to invest either the time or the money to read the book, so if you care to summarise some key points that would be helpful.
My interpretation of what I’ve read here is that the theory provides for one universal data representation (SDR), one computational unit (cortical column), and one known algorithm (HTM/SM). I did not see anything to say there are no other algorithms. To me it seems obvious that one algorithm is not enough - think visual imagery vs temporal structure (language/music) vs 3D/physics vs motor/kinetics vs theory of mind.
I see no reason why the same CC ‘hardware’ should not run different ‘software’, we just haven’t found it yet. Time will tell.
I’m still reading it myself (still in Part 1 “A New Understanding of the Brain”), but so far I would summarize it with a quote from page 24 “all the things we associate with intelligence which on the surface appear to be different, are, in reality, manifestations of the same underlying cortical algorithm”. When I finish the book, I’ll write up a more detailed summary.
I think the thing you have in mind is a robot whose goal is “Do whatever my owner wants me to do”. Right? If so, I agree that this is a good goal for a robot to have—it does not have any of the obvious problems that you get from more direct goals like “fetch the coffee”. (It might have very non-obvious problems. Or maybe not, maybe that goal is just totally safe. This is an ongoing controversy, not worth getting into here.)
Then the question is: how do we make a robot whose goal is “Do whatever my owner wants me to do”? If we had a good answer to that question, it would be a giant leap forward, in my book.
Agreed, this is something of a holy grail. I’m thinking the area of sentiment analysis might be a good starting point. If one could train up some weak AI networks to recognize emotional states/reactions of humans (across multiple modalities, to dmac’s point), that might be a good foundation for such a goal.
I think Steve has a point: safe AGI could be a hard problem. And if we imagine for a moment that an AGI is a potentially dangerous tool, then we’d better think about it some more.
However, I’ve just thought about a possible reason why we could see it as a non-problem, as Jeff probably does:
You assume a coffee-fetching humanoid AGI, with drives to be of some use. So its positronic brain runs a cortical HTM simulation on top of a dumb “drive-and-judgement-dispenser”. And yes, maybe there’s no good-and-safe way to make an expert system in charge of a “drive-and-judgement-dispenser”.
Jeff assumes, on the other hand, a positronic HTM cortical slice on a dish, not necessarily with that kind of man-in-the-middle “weakness”: it’s just some really efficient and evolved pattern-recognizer with an ability to predict the future better than we could.
i’ll translate for you:
- “Will it rain next month, Al ?”
- “Sure, boss. Rain it will”
No judgement here. No drive. You’re the basal ganglia, asking. And maybe you don’t even bother rewarding the poor slice after that.
If you can envision such a use for an AI, then the questions could become:
- Is it realistic to hope to reach AGI-level prediction performance without an automated Reward System at runtime? (The existence of performant deep nets would tend to answer “yes”, but then they don’t even require cracking the cortical column as Jeff hopes to.)
- Is it realistic to hope to use the same kind of setup, when what we ask of it is to autonomously drive safe, explore Mars, or fetch coffee ?
I honestly don’t know at this point.
I see where you are coming from, but from one perspective, this interaction itself may require some “drive-and-judgement-dispenser”. The question involves sensory input, and the answer involves motor output. I’ve always found it difficult to imagine motor output without some sort of drive – if there is nothing to drive the system to choose one action over another, how does a system choose what action to take, or to take any action at all for that matter.