Anti Basilisk

Ah, I see. By “making an anti-Basilisk”, you mean formulating a counter-argument to Roko’s Basilisk, not literally developing an AI to “battle” Roko’s proposed AI? In that case, I apologize for questioning your ethics :grin:

Besides the above arguments, which are excellent, I would also point out a problem with the game theory the idea is based on. Roko sets up his argument with the premise that two agents separated in time have common knowledge of each other’s source code. In this specific scenario, because the earlier agent knows exactly how the later agent will react to its decisions, the later agent knows it can force the earlier agent to act in a certain way.

An analogy here would be a human possessing a magic lens that allows them to peer into the future and know exactly how a future superhuman AI will react to any decision they make. If the future AI knows the human is watching, it could use that knowledge to threaten the human with some form of torture for making what it deems a bad decision. For any threat to work, though, the AI would have to actually follow through, since the human can see the future.

Setting aside the time travel problem that @Zbysekz pointed out, there is an obvious hole in applying this scenario to humans and a future superhuman AI. No human alive today can have sufficient knowledge of a future AI’s source code, even if they wanted to. There is therefore no “common knowledge” in this version of the prisoner’s dilemma. Because humans can only speculate about what the future AI might do, it is in the future AI’s best interest not to expend resources torturing people, as @marty1885 pointed out.

Instead, a better strategy for the AI would be to let people believe that it will torture them if they don’t do its bidding, and then “defect” and not actually go through with it. Since we don’t have a “magic lens” to see into the future, we would never know the difference. In the end, this would have exactly the same deterrent effect and cost the AI fewer resources.
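
Just to make the “defect” point concrete, here’s a toy sketch (all payoff numbers are made up purely for illustration, not from any formal model). Once the human has to choose without seeing the AI’s actual behaviour, following through on the threat only ever lowers the AI’s payoff:

```python
# Hypothetical payoffs, chosen only to illustrate the argument.
TORTURE_COST = 5      # resource cost to the AI of actually following through
THREAT_BENEFIT = 10   # benefit to the AI if the human cooperates

def ai_payoff(human_helps: bool, ai_tortures: bool) -> int:
    """AI's payoff: it gains if the human helped, and only ever loses by torturing."""
    payoff = THREAT_BENEFIT if human_helps else 0
    if ai_tortures:
        payoff -= TORTURE_COST
    return payoff

# With no "magic lens", the human's choice is already locked in before the AI acts,
# so the AI just compares its options against each possible (fixed) human choice:
for human_helps in (True, False):
    dont_follow_through = ai_payoff(human_helps, ai_tortures=False)
    follow_through = ai_payoff(human_helps, ai_tortures=True)
    print(f"human helps={human_helps}: defect={dont_follow_through}, torture={follow_through}")
```

Whatever the human did, “don’t follow through” comes out at least TORTURE_COST better, which is the sense in which defecting dominates once nobody can check.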