Anti Basilisk

Ah, I see. By “making an anti-Basilisk”, you mean formulating a counter-argument to Roko’s Basilisk, not literally developing an AI to “battle” Roko’s proposed AI? In that case, I apologize for questioning your ethics :grin:

Besides the above arguments, which are excellent, I would also point out a problem with the game theory the idea is based on. Roko sets up his argument with the premise that two agents separated in time have common knowledge of each other’s source code. In this specific scenario, because the earlier agent knows exactly how the later agent will react to its decisions, the later agent knows it can force the earlier agent to act in a certain way.

An analogy here would be a human possessing a magic lens that allows them to peer into the future and know exactly how a future superhuman AI will react to any decision they make. If the future AI knows the human is watching, it could use that knowledge to threaten the human with some form of torture for making what it deems a bad decision. For any threat to work, though, the AI would have to actually follow through, since the human can see the future.

Setting aside the time travel problem that @Zbysekz pointed out, there is an obvious hole in applying this scenario to humans and a future superhuman AI. No human alive today can have sufficient knowledge of a future AI’s source code, even if they wanted to. There is therefore no “common knowledge” in this version of the prisoner’s dilemma. Because humans can only speculate about what the future AI might do, it is in the future AI’s best interest not to expend resources torturing people, as @marty1885 pointed out.

Instead, a better strategy for the AI would be to let people believe that it will torture them if they don’t do its bidding, and then “defect” and not actually go through with it. Since we don’t have a “magic lens” to see into the future, we would never know the difference. In the end, this would have exactly the same deterrent effect and cost the AI fewer resources.
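
Just to make the “defect” point concrete, here’s a toy sketch (all payoff numbers are made up purely for illustration, not from any formal model). Once the human has to choose without seeing the AI’s actual behaviour, following through on the threat only ever lowers the AI’s payoff:

```python
# Hypothetical payoffs, chosen only to illustrate the argument.
TORTURE_COST = 5      # resource cost to the AI of actually following through
THREAT_BENEFIT = 10   # benefit to the AI if the human cooperates

def ai_payoff(human_helps: bool, ai_tortures: bool) -> int:
    """AI's payoff: it gains if the human helped, and only ever loses by torturing."""
    payoff = THREAT_BENEFIT if human_helps else 0
    if ai_tortures:
        payoff -= TORTURE_COST
    return payoff

# With no "magic lens", the human's choice is already locked in before the AI acts,
# so the AI just compares its options against each possible (fixed) human choice:
for human_helps in (True, False):
    dont_follow_through = ai_payoff(human_helps, ai_tortures=False)
    follow_through = ai_payoff(human_helps, ai_tortures=True)
    print(f"human helps={human_helps}: defect={dont_follow_through}, torture={follow_through}")
```

Whatever the human did, “don’t follow through” comes out at least TORTURE_COST better, which is the sense in which defecting dominates once nobody can check.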