“Intro to Brain-Like-AGI Safety” blog post series

Hi everyone! I just finished a 15-part blog post series:

I think it hits on some themes that y’all would find interesting, and that have come up here at HTM forum. (And in some cases, this forum is where I first learned about them!)

The primarily-neuroscience posts are #2–7 and #13, while the rest are more directly about AGI safety.

Here are post titles & summaries:

1. What’s the problem & Why work on it now?—Covers all the fun (non-neuroscience) basics like “What does AGI mean?”, “What does AGI safety mean?”, “What does brain-like AGI mean?”, “Why are we talking about AGI safety when AGI doesn’t even exist yet?”, and “You can’t seriously believe that trope about AGI sending robot armies to battle humans to extinction, like as a literal thing in real life, can you??”

2. “Learning from scratch” in the brain—I define “learning from scratch” as a particular family of computational systems/algorithms, a family that includes any ML algorithm initialized from random weights, and also includes pretty much anything you’d think of as a “memory system” (e.g. a blank hard disk drive). I argue that 96% of the human brain by volume—basically the telencephalon and cerebellum—“learns from scratch” in this sense. This post also touches on cortical uniformity, and its even-more-obscure cousins “allocortical uniformity”, “striatal uniformity”, “pallidal uniformity”, and “universal cerebellar transform”.

3. Two subsystems: Learning & Steering—Following up the above, I claim that the brain is split into two subsystems, based on whether or not they “learn from scratch” as defined in Post #2: the “Learning Subsystem” (telencephalon & cerebellum) which “learns from scratch”, and the “Steering Subsystem” (hypothalamus & brainstem) which doesn’t. This two-subsystem take has some resemblance to triune brain theory, Jeff’s “New brain / Old brain” distinction, @Bitking’s “Dumb boss / smart advisor”, etc., but I think my version is a really nice clean conceptual distinction, and offers elegant insights not only into mammal brains but even fruit fly brains. I also include a section directly responding to Jeff’s argument in A Thousand Brains that AGI wouldn’t pose a risk of catastrophic accidents, and also a section arguing that brain-like AGI is probably not centuries away but may arrive even in the next decade or two.

4. The “short-term predictor”—A “short-term predictor” has a supervisory signal (a.k.a. “ground truth”) from somewhere, and then uses a supervised learning algorithm to build a predictive model that anticipates that signal a short time (e.g. a fraction of a second) in the future. I talk about how these can be implemented in the brain, and a few of the functions that I think they serve, including my grand theory of the cerebellum.
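
To make the idea concrete, here is a minimal sketch of a “short-term predictor” as I read it: an online supervised learner trained to anticipate a ground-truth signal a couple of timesteps before it arrives. The choice of a least-mean-squares update and the toy cue/signal stream are my own illustration, not the post’s actual model:

```python
# Sketch of a "short-term predictor": an online supervised learner whose
# supervisory signal ("ground truth") arrives a short delay after the cue
# that predicts it. LMS update rule and toy data are illustrative only.

def lms_step(weights, context, target, lr=0.05):
    """One online update: nudge weights so the prediction tracks the target."""
    prediction = sum(w * x for w, x in zip(weights, context))
    error = target - prediction  # supervisory "ground truth" minus the guess
    for i, x in enumerate(context):
        weights[i] += lr * error * x
    return prediction, error

# Toy stream: a cue at time t reliably precedes the signal at time t+2,
# so the predictor can learn to fire early.
cue = [1.0 if t % 10 == 0 else 0.0 for t in range(500)]
signal = [0.0, 0.0] + cue[:-2]  # the signal lags the cue by 2 steps

weights = [0.0, 0.0, 0.0]  # predictor looks at the last 3 cue values
for t in range(3, len(cue)):
    lms_step(weights, cue[t - 3:t], signal[t])

# The weight on the cue seen 2 steps ago grows toward 1.0: the predictor
# has learned to anticipate the signal before it arrives.
```

The point of the sketch is just the architecture: a supervisory signal from elsewhere, plus any generic supervised learner, yields a module that fires slightly ahead of the ground truth.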

5. The “long-term predictor”, and TD learning—I claim that you can take a short-term predictor, wrap it up into a closed loop involving a bit more circuitry, and wind up with a new module that I call a “long-term predictor”. The way it works is closely related to TD learning. I claim that there is a large collection of side-by-side long-term predictors in the brain, each comprising a short-term predictor in the telencephalon (but only in certain parts of the telencephalon, like the amygdala, medial prefrontal cortex, and ventral striatum) that loops down to the brainstem, and then back via a dopamine neuron. For example, one long-term predictor might predict whether I’ll feel pain in my arm, another whether I’ll get goosebumps, another whether I’ll release cortisol, and so on.
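
Since the post says the long-term predictor loop is closely related to TD learning, here is a minimal TD(0) sketch of that loop. The state chain, parameter values, and the framing of the TD error as the dopamine-like feedback signal are my own illustrative assumptions, not a claim about the series’ exact circuit:

```python
# Minimal TD(0) sketch of a "long-term predictor" loop (illustrative only).
# V estimates the discounted long-run amount of an upcoming signal (e.g.
# pain in my arm); the TD error plays the role of the dopamine signal that
# closes the loop back to the short-term predictor.

def td_update(V, state, next_state, signal, alpha=0.1, gamma=0.9):
    """One TD(0) update: move V[state] toward signal + gamma * V[next_state]."""
    td_error = signal + gamma * V[next_state] - V[state]  # "dopamine" term
    V[state] += alpha * td_error
    return td_error

# Toy example: states 0..3 in a chain; a "pain" signal of 1.0 arrives on
# entering state 3. Repeated sweeps propagate the prediction backward.
V = [0.0] * 4
for _ in range(200):
    for s in range(3):
        signal = 1.0 if s + 1 == 3 else 0.0
        td_update(V, s, s + 1, signal)

# After training, V[0] < V[1] < V[2]: states closer in time to the pain
# predict more of it, even though the raw signal only occurs at the end.
```

One predictor like this per visceral output (pain, goosebumps, cortisol, …), running side by side, is how I understand the post’s “large collection of long-term predictors.”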

6. Big picture of motivation, decision-making, and RL—Here I fill in the last ingredients to get a whole big picture of motivation and decision-making in the brain. There’s also a section in which I argue against the common idea (close to what Jeff often says) that the Learning Subsystem is the home of ego-syntonic, internalized “deep desires”, whereas the Steering Subsystem is the home of ego-dystonic, externalized “primal urges”.

7. From hardcoded drives to foresighted plans: A worked example—I was concerned that Post #6 was too abstract, so here I work my way through a simple, concrete example: I ate a yummy cake a couple years ago, and now I want to eat that kind of cake again, and so I devise and execute a plan to make that happen. What’s happening under the surface during each step of this process, in the Post #6 model?

8. Takeaways from neuro 1/2: On AGI development—Given the discussion of neuroscience in Posts 2-7, how should we think about the software development process for brain-like AGI? Some relevant topics here include training time, the importance (and safety problems) of online learning, and whether we should expect programmers to do an outer-loop search analogous to evolution.

9. Takeaways from neuro 2/2: On AGI motivation—Given the discussion of neuroscience in Posts 2-7, what lessons do we learn about how the motivation of an AGI would work? I dive a bit into (what I call) “credit assignment” (i.e. changes to valence and other learned visceral reactions), and the question of whether AGIs will want to wirehead, and more generally why AGIs will not necessarily be trying to maximize their future reward, along with a few other topics.

10. The alignment problem—Suppose we have a particular thing that we want our AGI to be doing—“clean my house”, “invent a better solar cell”, or more simply “do whatever I would find most helpful”. How do we design the AGI such that it wants to do that particular thing, and not something totally different? This open problem is called “the alignment problem”. I discuss lots of the reasons that the problem seems hard: “Goodhart’s Law”, “Instrumental Convergence”, “Inner Alignment”, misinterpreted reward signals, wrong reward signals (e.g. rewarding the AGI for doing the right thing for the wrong reason), “ontological crises”, the AGI manipulating its own training process, and more.

11. Safety ≠ alignment (but they’re close!)—In my terminology, “AGI alignment” means that an AGI is trying to do things that the AGI designer had intended for it to be trying to do, while “AGI safety” is about what the AGI actually does, not what it’s trying to do. Safety and alignment can come apart in principle, but I argue that in practice, alignment is more-or-less necessary and sufficient for safety. For example, intuitively it seems that a simple solution to out-of-control AGIs is to build an AI in an air-gapped box, and power it off if it tries anything funny. However, on closer examination, this “solution” turns out to be hopelessly inadequate.

12. Two paths forward: “Controlled AGI” and “Social-instinct AGI”—I suggest two broad research paths that might lead to aligned AGI. (1) In the “Controlled AGI” path, we try, more-or-less directly, to manipulate what the AGI is trying to do; (2) In the “Social-instinct AGI” path, our first step is to reverse-engineer some of the “innate drives” in the human Steering Subsystem (hypothalamus & brainstem), particularly the ones that underlie human social and moral intuitions. Next, we would presumably make some edits, and then install those “innate drives” into our AGIs. I talk about some relevant considerations, and conclude that we should pursue both research paths in parallel. I also talk about “life experience” a.k.a. training data, and why we can’t just get safe AGI merely by raising it in a loving human family.

13. Symbol grounding & human social instincts—A key part of the “Social-instinct AGI” path would require reverse-engineering circuits in the hypothalamus and brainstem that underlie human social instincts. I talk a bit about how these circuits might work, with a strong emphasis on the open question of how these circuits solve a certain “symbol grounding problem”, and end with a plea for more theoretical & experimental research.

14. Controlled AGI—Here I switch over to the “Controlled AGI” path mentioned above. I don’t currently see any promising-to-me paths forward to solve this problem, but I talk about some intriguing proto-ideas, like systems to continually refine the AGI’s goals when it hits edge-cases, or building tools to directly make sense of an AGI’s giant world-model.

15. Conclusion: Open problems, how to help, AMA—I list 7 open problems from the series where I strongly endorse further research (two are traditional neuroscience, two are traditional CS, three are directly about AGI). Then I talk about practical aspects of doing AGI safety (a.k.a. AI alignment) research, including funding sources, connecting to the relevant research community, and where to learn more. I wrap up with some takeaway messages.

Happy for feedback, pushback, and discussion!!


Thanks for the summary! Given the moral norms of our society, would you want an aligned AGI? Has there been any major shift in Western morality since the USA dropped two atomic bombs on civilian populations? Would an AGI send more or fewer weapons to Ukraine? Why would technologists have any clue about how to solve these issues? It seems it is ourselves, not the AGI, that we need to learn to control.


Yeah, I put this diagram in the first post:

I feel pretty strongly that “the AGI is doing something that nobody wants it to do” is worse than “the AGI is doing something that at least somebody wants it to do”.

You have to be awfully cynical to believe that a catastrophic accident with out-of-control AGIs that results in human extinction is better than the alternative. Right? (Granted, some people do believe that—this gets into a very weird topic involving so-called “s-risks” and “negative utilitarianism”. I don’t think those arguments are compelling but maybe let’s not go there.)

So my answer to “would you want an aligned AGI” is “Yes that seems less bad than the alternative”, and separately I strongly endorse efforts to ensure that AGIs will wind up being used for prosocial ends, whatever form those efforts may take, which is not my own field of expertise.

And my answer to “It seems it is ourselves, not the AGI, that we need to learn to control.” is “Why can’t it be both?”

Hopelessly over-ambitious. When you solve the AGI problem, you do not get an AGI with attitude or morals. You get a clever piece of software that can drive a car in heavy traffic or on a racetrack better than humans; you get software that can read a programming manual and write code; you get software that can play Dungeons and Dragons or a first-person shooter and provide a running commentary.

But you do not get alignment, you get ownership. You get a superpower that can choose to use its AGI for good or evil. That is something to fear.


Your diagram is misleading. The topmost blue box is also feeding into the “big challenge I’m talking about”. I suspect the reason this is ignored is the insufficient moral education we (you and I are certainly not alone here) receive, which allows this sleight of hand.

This is overly simplistic; similar reasoning might run along the lines of “Western democracy is good because North Korea.” The morality of technologists is a reflection of the current general state of morality. The idea that there are only two choices (aligned or unaligned AGI) is a false dichotomy, used to avoid asking hard questions (questions that might point out a need to learn non-technical things).

Another false dichotomy you set up is: either I support your perspective, or I want the destruction of the world by a non-aligned AGI. This might be how you justify your perspective but it will blind you to the obvious problems of that perspective.

I would certainly not want an AGI aligned with current moral norms. I would certainly not want an AGI intent on harming humans. That you can’t see any other alternative suggests you could read more on ethics. I’m consistently amazed at how technologists feel qualified to write at length on moral issues - I guess those same technologists rely on the humanities for technical advice?

Because using current moral norms to build AGI would be a guaranteed disaster. So the first question is to find a moral framework that would actually allow for safe AGI. Right now we have people who are literal sociopaths in positions of power and giving them more powerful weapons is not in the interests of anyone.

There is plenty of very interesting AI work to do that does not require furthering autonomous systems. Before working toward autonomous systems I think there needs to be an understanding of morality that allows for an alternative to your dichotomy. That requires more work than taking the moral norms of the day and assuming they are sufficient to deal with any problem.

A starting point would be to understand AI as a social science rather than a computer science. Then we can note how under-qualified we are.


AGI is not the same as simple machines. That is partly why it is so hard to make significant progress. The idea that technology is amoral is no longer a coherent worldview in contemporary social science. I comfort myself with the idea that those who don’t see the immorality of claiming technology to be amoral are unlikely to make any significant progress on AGI.

First, I claim people are going to build AGI sooner or later whether you or I want them to or not. (Agree? See Post 1.) Second, when future people build AGIs, I claim that they run a high risk of making out-of-control AGIs that are self-reproducing around the Internet and intent on harming everyone including their creators, not because they wanted to, but by accident, because avoiding this is a still-unsolved technical problem. (See posts 1, 3, 10, 11.) I’m trying to do work on that technical problem, without claiming that there are no other problems that we also need to solve. And you seem to be arguing that doing this technical work is wrongheaded. Correct?

Now, if somebody writes a philosophy / applied ethics book defining exactly what we want a future AGI to be doing, I would be grateful, but I would expect such a book to be written in English, not source code, and the kind of technical work I’m doing would be absolutely essential to turn that book into source code. I claim that we don’t have a good technical plan for how to reliably ensure that an AGI has any particular motivation (Posts 10-15). I think making this technical plan is a good thing. I truly hope we’ll have that philosophy book I mentioned ready by the time we need it, but even if we don’t, I still want to have that technical plan, because as bad as the motivations of some humans may be, those motivations seem much better than a random accidentally-installed motivation, because the latter tends to lead to human extinction for reasons in Post 10. “I’ll take the motivations of Donald Trump over the motivations of the smallpox virus”, loosely speaking.

Can you explain why you think that?


There are many reasons that things that could be built do not get built. For all we know someone may have already understood how to build AGI and then not told us. A good society does not invest in building tools to destroy that society - this says much more about our society than our tools.

People from within the AI community are morally biased to look for a justification to continue with business as usual in their privileged situation. They are serving their own interests by passing the buck. Someone else will look after the hard problems while they busily work on enabling immoral decisions.

The kind of work you are doing, if it actually were effective, would be a disaster. You are interested in autonomous systems and the primary beneficiary of those technologies is the military. These are dual-use technologies, so there are no clean hands when indirectly supporting that effort.

It is the belief that someone else has (or will have) the solution for controlling AI that helps enable the investment into disastrous AI. Long before AGI is available there will be disastrous AI, unless the AI community puts in guard rails that the nuclear physics community did not.

The entire moral framework you are using that leads to conclusions like “I’ll take the motivations of Donald Trump over the motivations of the smallpox virus” in regards to AI needs to be binned.

For many, many decades the fact-value distinction has been shown to be a poor foundation for understanding the world.

This is the Numenta forum. The corporate mission of Numenta is to try to build AGI, and they’re working on it every day.

It’s quite possible that Jeff Hawkins or other Numenta employees are reading this very thread. You are welcome to try to convince them to stop trying to build AGI, and instead shut down their company, or pivot it to a different type of research. Go ahead! Be my guest!

After you convince Numenta to stop trying to build AGI, you can move on to likewise convincing Demis Hassabis, and Yann LeCun, and Sam Altman, and Randall O’Reilly, and Geoffrey Hinton, and Josh Tenenbaum, and many thousands more, including people you’ve never heard of who are working in secret military labs.

Needless to say, I don’t expect you to succeed.

Instead, I think that people will build better and better AI systems with each passing year, and publish all the details on arxiv and GitHub. Relatedly, I think that people will develop better and better insights into the neocortex’s learning algorithms with each passing year, and put them on arxiv and GitHub too. I think that, sooner or later, one person will know how to make AGI, then ten people will know how, then a hundred, and we’ll get to polished, hardware-accelerated, turn-key AGI code published on GitHub and integrated into PyTorch, if we survive that long. And I think there’s nothing that anyone can do to change the fact that that’s where we’re going, and the clock is ticking until we get there. If you want to go around advocating against AGI research, I think the best you could hope for is shifting that date infinitesimally later, and more likely you wouldn’t even do that, and you’d just piss a lot of people off.

Lots of people already have this belief—look at Jeff Hawkins’s recent book for example. I think that they have this belief for reasons that do not stand up to scrutiny (see my Post #3 for a response to Jeff’s book argument), and I think that I’m doing much more to reduce the prevalence of this belief than to increase it.

I still don’t understand what you’re getting at. Can you walk me through it like I’m an idiot? To reiterate where we’re at: I copied a diagram in my comment here, and you’re arguing that the topmost blue box is an input to the red box, and I don’t understand why. By the way, I’m defining “accident” as “thing that the people who were building the AGI did not want to happen”, for example an AGI getting out of control and killing literally everyone, including the people who built the AGI. Do you agree with that definition of “accident”? Again see Post #1 for much more discussion on this topic.

I found chapter 13 to be particularly interesting, although less well researched. I think that the “controlled AGI” path is not viable for an unsupervised & autonomous agent, because it does not answer the question “who should be allowed to control it?” Either someone who should not be allowed control over it will gain control and walk off with your expensive robot, or someone will need to control it (e.g. for safety purposes) and it will ignore them. The correct answer to “who should I answer to?” is complicated and context dependent.

I also think that solving the “symbol grounding & innate drives” problem is particularly rewarding, because it allows us to hardcode all sorts of knowledge into it.


Where does Numenta claim they are building AGI? They are trying to build intelligent machines. There are many safe paths that could be taken.

Where do you get this idea that I am going to try to convince each researcher not to build autonomous weapons? There are plenty of people aware of these issues and working to try and address them. It will require a cultural shift within the tech industry, but that is not impossible. Maybe you are projecting your own belief that you have the future of safe AGI in your hands.

Because you are convinced that it can be built, you believe it must be built. You also believe you know how this will play out. I am sure you are wrong.

From what I see you can’t question yourself on this point. You admit you are avoiding the hard questions - that is exactly what Jeff does in his book too. Focusing on the easy questions is like arguing over the design of the detonator while ignoring where it is being used.

I don’t have the time to write the 13 blog posts :wink: Consider that your assumption of separating out the “easy” technical issues from the “hard” ethical issues is invalid. As we can see in this discussion you are committed to a particular moral framework and that informs your priority on the technical issues (and to some extent your approach to “solving” them).

AGI is software, and software is complicated, especially when you don’t know the algorithm. But software has no morals; its morals are provided by the people who own and use it.

If you won’t accept AGI until it has all the behavioural features of an adult human then the world of AGI will simply pass you by.


It would be nice if things were that simple, and in the 19th century that was probably still an educated opinion. If you are expecting to see AGI in a piece of software then you may be waiting for a very long time.

The current AI algorithms already implement social processes that require AGI; they are simply augmented with humans when needed to adapt. The mass surveillance and mass censorship in countries like the USA and China would not be possible without them. The risk of AI is far more present and far more dangerous than simple-minded technologists understand. This is one reason why AI should be a social science: it might give the technologists half a chance of understanding what they are doing.

I am far more concerned about autonomous AI than AGI. Before engineers have a hope of building AGI they themselves will need to become generally intelligent, and that is obviously not happening any time soon as we strive toward the idiocracy. It is the AGS that will get us before AGI.

At least read Manufacturing Consent


Here is another interesting instinctual behavior, related to your 13th blog post.

What I find interesting about this is the flexibility displayed by animals when given unconventional nesting sites and materials.


@steve9773, thanks for your blog posts. We don’t see nearly enough work on this. And also thanks for acknowledging this is only a work in progress.

And @markNZed, I know you’ve researched this topic more in depth than most here. But don’t you think it would be better to address the issues in a more diplomatic manner? Other people will read this thread and will have a lot to benefit from arguments and counter-arguments.


This idea is borderline science fiction / fantasy, but here goes anyways.

AGI could be used to mitigate the bad-actor problem, by building service animals that specialize in managing people. In particular they need to understand how to deal with the insane ones. I estimate that somewhere around 1/4 to 1/2 of everyone has or has had some form of mental issue, although most people keep their problems in check. There simply are not enough trained therapists in the world to deal with the amount of craziness that goes around.

Mental Issues


Toxoplasma gondii - Wikipedia

  • Lives in domestic cats’ guts; in other hosts it reduces their fear of death.
  • CDC: more than 40 million people in the United States may be infected (that’s 12%).

Lead, It’s surprisingly prevalent

Long Covid Brain Fog

  • Somewhere between 1/10 and 1/5 of recovered patients have persistent symptoms.
  • What fraction of them have brain fog?
  • Vaccination status & severity of infection don’t appear to reduce the incidence of long covid.
    • Though vaccines do prevent you from getting it in the first place.
  • 100% of the human race is going to be exposed to the virus.

Depression & Suicide

Addiction, especially opioids

  • Prescription painkillers are the real gateway drug


  • Cheap and easy to get, also addictive.
  • Strong correlation with violence.


  • Releases adrenaline, the “fight or flight” hormone.
  • In moderation it’s fine, but in excess and when combined with other issues this could be an aggravating factor.

Sociopathy & Personality disorders



  • Exercise is important for long term health, including cognitive & emotional health.


  • Mind control is real (but it’s not like it is in Hollywood). There are techniques for controlling your own thoughts, and other people can use those same techniques against you.
    • There are handbooks for how to run a cult.
    • Meditation to silence unwanted thoughts. For example some cults use excessive singing to keep you from having any time to think on your own.

I consider the greatest threat posed by an AI that has sentience (and by that I mean introspective consciousness) to be insanity. Once a psychopathic AI gets turned loose on the Internet, it’s game over.

If diplomacy worked then people would have listened to the many diplomatically correct takes on the topic. I see some utility in not seeking to be diplomatic when it comes to critical thinking about morality - being polite is one of the ways immorality masquerades as morality.

I’m not trying to win a popularity contest. Maybe if I was trying to appeal to an audience I would also prefer the mainstream view. I think it is only fair that Steve sees how morally outraged I am :wink:


Agreed, except that my outrage is intellectual. All those moral debates come down to preventing things we don’t like, without actually knowing what we want as an ultimate fitness function: thinking in terms of constraints vs. goals. Which is ridiculous.