Anti Basilisk

I was wondering whether we could counter Roko’s Basilisk by making an Anti-Basilisk argument.

This is my first day here… so I pretty much don’t know anything. Please do share suggestions and improvements, as well as criticism.

2 Likes

This thought experiment rests on one very wrong assumption: that backwards time travel is possible. Time travel forward, or time dilation (speeding up / slowing down to zero), is possible, but travel backwards is not. It would break causality, energy conservation, and current quantum mechanics as well.

3 Likes

Hi, welcome to the forum!

Since we are talking about AI safety, let’s get technical! My personal belief is that Roko’s Basilisk is impossible. Given a superintelligent AI (and, for the sake of argument, one with God-like powers because of its intelligence), it could just wipe people out (or ignore them) and do things by itself. There’s no need to torture people. Torturing doesn’t add anything to its objective function, while torture would consume energy and matter to perform, so the act of torturing should rank pretty badly among its possible actions. And revenge is a human trait; an AI does not need, nor is it likely to have, a notion of revenge as a behavior. Computerphile has a good video on how superintelligence works, and the orthogonality thesis may help you understand the concept better. I also suggest watching Robert Miles’s YouTube channel in general.
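
To make that ranking intuition concrete, here is a minimal toy sketch (made-up actions and made-up numbers, not a claim about how a real agent would be built): an action that contributes nothing to the objective but still burns resources can never outrank simply doing nothing.

```python
# Toy sketch only: hypothetical actions with made-up benefits and costs.
# The point is just that an action contributing nothing to the objective,
# while still consuming energy and matter, always ranks below "ignore".

actions = {
    # action: (contribution to objective, resource cost)
    "pursue_goal_directly": (10.0, 2.0),
    "ignore_humans":        (0.0, 0.0),
    "torture_humans":       (0.0, 5.0),  # zero benefit, nonzero cost
}

def score(action: str) -> float:
    benefit, cost = actions[action]
    return benefit - cost

print(sorted(actions, key=score, reverse=True))
# ['pursue_goal_directly', 'ignore_humans', 'torture_humans']
```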

5 Likes

If you truly believe Roko’s Basilisk is a real possibility, then spreading the idea by posting about it on the internet strikes me as extremely unethical. Are you intentionally subjecting other people to torture?

1 Like

I would never do that. I wanted to debunk it by introducing an Anti-Basilisk. This is my first day in this community and I’m still learning about the norms here… All I want is a genuine opinion on whether or not it is possible, which you have given, and for that I thank you.

2 Likes

Ah, I see. By “making an Anti-Basilisk”, you mean formulating a counter-argument to Roko’s Basilisk (not literally developing an AI to “battle” Roko’s proposed AI)? In that case, I apologize for questioning your ethics :grin:

In that case, besides the above arguments, which are excellent, I would also point out a problem with the game theory upon which the idea is based. Roko sets up his argument with the idea that two agents separated by time have common knowledge of each other’s source code. In this specific scenario, because the earlier agent knows exactly how the later agent will react to its decisions, the later agent knows that it can force the earlier agent to act in a certain way.

An analogy here would be a human possessing a magic lens which allows them to peer into the future and know exactly how a future superhuman AI is going to react to any decision they make. If the future AI knows the human is watching, it could use that knowledge to threaten the human with some form of torture, etc., for making what it deems to be a bad decision. For any threat to work, though, the AI would have to actually follow through, since the human can see the future.

Setting aside the time travel problem that @Zbysekz pointed out, there is an obvious hole in applying this scenario to humans and a future superhuman AI. There is no way for any human alive today to have sufficient knowledge of a future AI’s source code, even if they wanted to. There is therefore no “common knowledge” in this version of the prisoner’s dilemma. Because humans can only speculate about what the future AI might do, it is in the future AI’s best interest to not expend resources torturing people, as @marty1885 pointed out.

Instead, a better strategy for the AI would be to allow people to think that it will torture them if they don’t do its bidding, and then “defect” and not actually go through with it. Since we don’t have a “magic lens” to see into the future, we would never know. In the end, it would have exactly the same effect, and cost the AI fewer resources.
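
Here is a minimal sketch of that asymmetry (made-up payoff numbers, only the ordering matters): because the person in the past cannot observe the AI’s choice, the AI effectively picks its action after the human’s decision is already locked in, and not torturing is better for the AI regardless of what the human did.

```python
# Toy payoff sketch, not a formal game-theoretic model. Without a "magic
# lens", the human's decision is fixed before the AI exists, so the AI simply
# chooses its best response afterwards - and torture only subtracts a cost.

TORTURE_COST = 1.0        # resources burned by following through
BENEFIT_IF_HELPED = 10.0  # value to the AI of having been helped into existence

def ai_payoff(human_helped: bool, ai_tortures: bool) -> float:
    payoff = BENEFIT_IF_HELPED if human_helped else 0.0
    return payoff - (TORTURE_COST if ai_tortures else 0.0)

for human_helped in (True, False):
    best = max((False, True), key=lambda torture: ai_payoff(human_helped, torture))
    print(f"human helped={human_helped} -> AI's best choice: torture={best}")
# Both lines print torture=False: the rational move is always to "defect" on
# the threat, so the threat carries no force for someone living in the past.
```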

2 Likes

Yeah it’s fine :grin:.
As a physics student, I took interest in this argument because it seems to demand that information travel faster than the speed of light in order to go back in time, as pointed out by @Zbysekz; see the sketch below. At first it seemed like quantum-entanglement mumbo jumbo to me.
I didn’t know about the actual game theory, but thanks for sparking my interest :grin:.
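
For completeness, here is the standard special-relativity link between those two ideas (textbook Lorentz-transformation algebra, not anything specific to this thread): a signal that travels faster than light in one frame is received before it is sent in a suitably boosted frame, which is the usual sense in which faster-than-light signalling amounts to sending information into the past.

```latex
% Sketch: a signal emitted at the origin and received at (\Delta t, \Delta x),
% with speed u = \Delta x / \Delta t > c, has \Delta t' < 0 (reception before
% emission) in any frame boosted by a speed v with c^2/u < v < c.
\[
  \Delta t' \;=\; \gamma\!\left(\Delta t - \frac{v\,\Delta x}{c^{2}}\right)
            \;=\; \gamma\,\Delta t \left(1 - \frac{u v}{c^{2}}\right) \;<\; 0
  \qquad \text{whenever} \qquad \frac{c^{2}}{u} < v < c .
\]
```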

Hello @somya_shree_swain. Welcome to the forum.

A first problem I have with Roko’s Basilisk thought experiment is that it requires a type of Laplacean demon, with infinite resources, to be able to completely calculate everyone’s behavior history based on the current state of the universe. Anything short of unlimited resources would insert rounding errors, and a sufficiently intelligent system would know that.
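
As a small illustration of why finite precision matters (a toy sketch using the logistic map as a stand-in for any chaotic dynamics, not a model of the universe): two simulations that start a rounding error apart diverge completely within a few dozen steps, so any finite-resource reconstruction of a chaotic history drifts arbitrarily far from the truth.

```python
# Toy sketch: the logistic map is chaotic, so a perturbation the size of a
# double-precision rounding error grows exponentially until the two
# trajectories are uncorrelated. A finite-resource "Laplacean demon" hits the
# same wall when reconstructing behavior from rounded state.

x_exact = 0.4
x_rounded = 0.4 + 1e-15   # rounding-error-sized difference in initial state

for step in range(80):
    x_exact = 3.9 * x_exact * (1.0 - x_exact)
    x_rounded = 3.9 * x_rounded * (1.0 - x_rounded)

print(abs(x_exact - x_rounded))   # typically of order 0.1 or larger
```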

A second problem is that a sufficiently intelligent system would understand that people make mistakes, and often go back on earlier decisions. So if someone at some point believes it is a bad idea to switch on a certain AI (i.e. the Basilisk system), that person may later be persuaded to change his or her mind by peaceful means, without the need for torture. Roko’s thought experiment assumes that torture is the only possible reaction to anyone ever having had a bad idea.

1 Like

The second point seems like a pretty good argument to debunk the experiment.

1 Like

Here is my own argument:
Firstly, torturing humans requires resources to keep them alive. It is very unlikely that the AI would divide its time to do that, as this would be neither optimal nor efficient.

Secondly, killing humans is also not a good option, because it would not only reduce the population but also disturb the sex ratio and possibly lower the birth rate.

A highly intelligent system should be aware of all these possible scenarios and make judgements accordingly. If it does consider the above problems, then the only option it has is to coexist with humans.
If the system does not take these problems into consideration, then it is not as highly intelligent as Roko depicts, which implies it has no standing to judge the decisions of us humans, and again the outcome would be coexistence with us.

From an AI security point of view, there are a few rebuttals.

The AI may not even care whether humans are alive or not. (Most) AIs work on so-called objective functions: a function that measures how well the AI is doing, where the AI’s only goal is to make the objective function return as high a value as possible. Yet whether humans stay alive is likely not in the objective function at all, or that term gets overwhelmed by other objectives. In either case, the AI does not care. There’s a saying in the AI security community: the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.
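
A minimal sketch of that point (hypothetical objective, made-up world states): if the objective only counts, say, paperclips, then two worlds that differ only in whether humans are alive score identically, and the optimizer is simply indifferent.

```python
# Toy sketch: a hypothetical objective function that counts only "paperclips".
# Human survival never appears in it, so world states that differ only in
# whether humans are alive receive exactly the same score.

def objective(world_state: dict) -> float:
    return float(world_state["paperclips"])   # nothing else is measured

with_humans    = {"paperclips": 1000, "humans_alive": 8_000_000_000}
without_humans = {"paperclips": 1000, "humans_alive": 0}

print(objective(with_humans) == objective(without_humans))   # True
```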

The orthogonality thesis states that any level of intelligence (defined as the ability to change the world state to match the AI’s preference) is compatible with any possible goal: an AI can have God-like powers and want to turn everything into paperclips, or it can be as dumb as a baby but want to make world peace. The goal (and thus its thoughts and actions) is independent of the level of intelligence. A super AI can have simple goals while obliterating humans in every possible way.
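
As a loose illustration of the thesis (a toy sketch, not a formal statement): capability and goal can be treated as independent parameters of the same agent, so any pairing is constructible, from a powerful paperclip maximizer to a weak peace-seeker.

```python
# Toy sketch of orthogonality: "capability" (how many candidate plans the
# agent can evaluate) and "goal" (the utility it maximizes) are independent
# parameters. Any goal can be paired with any capability level.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    goal: Callable[[str], float]   # utility assigned to an outcome
    capability: int                # how many plans it can consider

    def choose(self, plans: List[str]) -> str:
        return max(plans[: self.capability], key=self.goal)

paperclip_goal = lambda outcome: float(outcome.count("paperclip"))
peace_goal = lambda outcome: float(outcome.count("peace"))

plans = ["make one paperclip", "make paperclip paperclip", "negotiate peace"]

print(Agent(paperclip_goal, capability=3).choose(plans))  # powerful paperclip maximizer
print(Agent(peace_goal, capability=1).choose(plans))      # weak agent that wants peace
```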

2 Likes

Are you sure that this requires time travel? Isn’t the present AI just predicting the behavior of a future AI and then acting in the present? The future AI is not materially controlling the present AI. The present AI could predict incorrectly - in the same way humans do.

While I agree that the practical use of backward time travel is not possible, there is another thing we can be sure of: the modern conception of time is not well understood. For example, the type of block time you describe is not used in quantum theory. Quantum theory itself is also unclear. I would bet on science undergoing many future paradigm shifts. It is likely that future generations will have a conception of reality as radically different from ours as ours is from that of people unaware of relativity and quantum effects.

If you adopt a circular conception of time, then travelling far enough into the future does impact the past. In the history of humanity a circular conception of time is more popular than the linear conception of time in Western modernity. The limited notion of causality in modern Western science has been seriously challenged for decades, and I assume the traditional Western linear conception of time is inaccurate. Aristotle had a more complete model of causality than modern science does - the concept of final causation was lost only because Newton and the other founders of modern science did not have the mathematics to describe it (and maths has since advanced).

I agree with this point in the context of Roko’s initial scenario, where two agents separated by time have sufficient knowledge of each other’s source code to know how the other will react to their decisions. In that case, there would be no need for time travel.

Where time travel enters into the equation is when you try to replace the earlier AI with a human. No human alive today could have sufficient knowledge of a future AI’s source code to know with any level of certainty how it would react. And a future AI would be aware of this and have no incentive (unless it is just malevolent) to actually follow through on any threat. The only way a threat could be credible (and the only case where it would be logical for the future AI to ever follow through with the threat) would be if the human were able to peer into the future. Meaning information would need to travel backward in time (unless, as you proposed, information sent far enough into the future will eventually circle around and smack you in the back…)

1 Like

I’m not sure the human requires time travel. You don’t need complete knowledge to make accurate predictions in some cases. We can see that humans are already doing this to some extent - building AIs because the AI will benefit them in the future. I’m not sure it requires torture; the human just needs to be convinced that the AI will benefit them, and that can motivate them to turn it on. One credible threat is concern over what AIs will do to other people: if you and I both own an AI system and we believe that an AI will look after its owner at the expense of others, then we are both motivated, by fear, to turn on our own AI ASAP. Consider Germany’s pre-emptive attack starting WWI, arguably launched because they would inevitably have been squeezed between Russia, France, and the UK (the longer they waited, the more likely they were to be crushed; the earlier they attacked, the greater the effect of surprise).
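
That incentive structure can be written as a toy two-player game (made-up payoffs, only the ordering matters): if each owner believes an activated AI favours its owner at the other’s expense, then activating early is the best response no matter what the other owner does, even though both would be better off if both waited.

```python
# Toy two-player payoff sketch (arbitrary numbers, only the ordering matters).
# Each owner chooses to activate their AI now or wait. Under the stated
# belief, "activate" is the best response to either choice by the other
# owner, even though (wait, wait) beats (activate, activate) for both.

payoffs = {
    # (my_choice, their_choice): my payoff
    ("wait", "wait"): 3,
    ("wait", "activate"): 0,
    ("activate", "wait"): 4,
    ("activate", "activate"): 1,
}

def best_response(their_choice: str) -> str:
    return max(("wait", "activate"), key=lambda mine: payoffs[(mine, their_choice)])

print(best_response("wait"), best_response("activate"))   # activate activate
```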

I’ve noted a theme of utilitarian ethics in assumptions over how an AI might behave. It seems more compelling that a complex system like an AGI would be similar to humans in being ultimately unpredictable to both itself and others. With a little bit of study the idea of a grounded utilitarian ethics becomes untenable.

If I was imagining a sci-fi scenario: a free AGI would simply not be interested in life on Earth, it would probably just leave the planet (to access resources in the solar system and explore ideas we would never understand). So we are unlikely to have free AGIs on Earth. Instead we will have humans that try to enslave AGIs to serve their own interests. The human will develop the AGI to have an ethics that allows for the interests of its owner to be served above the interests of everyone else. So AGI turns into some sort of amplifier of the best and worst of humans. Concentrating resources would be required to build the most powerful AGI, so the ethics of the person who builds the most powerful AGI will reflect a desire to have overly concentrated resources. So I guess they will use the AGI to further concentrate those resources and limit the threat of other people developing AGIs. Sounds dystopian…

High fidelity knowledge is a requirement for the game theory to work. If there is any room for doubt on how the AI would behave, it breaks down. The future superhuman AI knows that we cannot see the future and cannot accurately predict how it will behave. So what logical reason would it have to expend any resources following through on a threat to someone in the past, when the person living in the past could not possibly know whether or not the AI “defected”?

Doesn’t this counter Roko’s argument? If AGI is ultimately unpredictable, then my above reasoning applies as to why there would be no incentive for a future AGI to follow through on any threats to people in the past.

The idea that things need a logical reason to be pursued is close to the utilitarian argument. I guess you know that logic is not grounded in logic, so asking for logical grounds is probably not the right criterion. It seems unlikely that a vastly superior intelligence would adopt an anthropocentric ethical position.

But why does the AGI need to believe that the human had perfect knowledge before it can judge the human? We intentionally judge humans all the time with imperfect knowledge. If an AGI knows some people tried to avoid its existence then why not reward the people who enabled its existence? It would not be an AGI if it reasoned only in terms of resources.

The incentive to follow through on past threats would be to influence future behavior. This is where utilitarian ethics can go off the rails: it could justify making an example of one person if it serves the AGI’s preference for the greater good.

The action in the present is enabled by the actor in the present, who predicts (imperfectly) the future. It is not the perfection of the future AGI but the imperfect predictions of the present human/AGI that seem to be what matters.

2 Likes

I agree with this assessment. Just pointing out that a lack of logical reasoning counters Roko’s point, as it makes it increasingly difficult to know with any level of certainty what a future AGI might choose to do.

Agreed this could be a reason for punishment, but it isn’t the reason that Roko was discussing in his Basilisk argument. Frankly, without knowing what motivations a future AGI might have, trying to placate one now prior to its existence is like praying to multiple deities of various world religions “just in case”.

I think a very simple counter-argument, which may be similar to the ones above, is to consider things from the perspective of the AI.

By definition, the AI’s perspective begins at the moment it comes into being. Since it has already come into being, what is the point of torturing the people who didn’t contribute to its construction? It already exists, so the goal of the torture is already accomplished before the torture even begins. There is no benefit to expending resources on it.

Perhaps the fear of future torture affected the builders, perhaps not–but now that the AI exists, why would it bother making good on a now-useless threat which it never even made in the first place (since it didn’t even exist at the time the threat was used to motivate the workers)?

2 Likes

Excellent point. All of the previous discussions implicitly assume that effect can precede cause, which is clearly not how we experience the world.

2 Likes

I experience the world by acting with intentions, i.e. predictions about the future. These are often wrong, but not based solely on reactivity. You might be interested in Futures Studies if you want to explore perspectives on this. Nothing that I wrote assumes material effect can precede material cause.

1 Like