Tradeoff between generality and optimality with regard to AI alignment issues


The Huge Rulebook AGI would have more appeal to me if it were remotely reasonable to implement with our current technology, which it is not. I don’t have anything against it philosophically. The brain solution is just more feasible. I can see it happening in my lifetime.


Of course the huge table is not a remotely practical solution and will never be. But it’s useful conceptually and philosophically.
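To make the impracticality concrete, here is a back-of-the-envelope sketch (my own illustration, not from anyone in this thread): even for a toy agent with small binary sensors and a short memory, the number of entries a lookup-table "rulebook" would need explodes combinatorially. The sensor counts below are arbitrary assumptions chosen just to show the scale.

```python
# Back-of-the-envelope size of a "huge rulebook" agent: a table mapping
# every possible observation history to an action.
# All numbers here are illustrative assumptions, not from the discussion.

def rulebook_entries(states_per_sensor: int, num_sensors: int, history_len: int) -> int:
    """Number of distinct observation histories the table must cover."""
    observations = states_per_sensor ** num_sensors  # possible single observations
    return observations ** history_len               # possible histories

# A toy agent: binary sensors on a 10x10 grid, remembering 10 time steps.
entries = rulebook_entries(2, 100, 10)
print(entries > 10**300)  # vastly more entries than atoms in the observable universe (~10^80)
```

Even this tiny configuration needs on the order of 10^301 table entries, which is why the huge table stays a conceptual tool rather than an implementation strategy.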

I agree that studying the brain is currently the best approach we have to reach AGI, and it will probably give us (humans) the most insight into what makes a system intelligent. I look forward to a time when we crack the intelligence problem completely and understand exactly what makes a system intelligent. Similarly to how Turing machines are a universal model of computation, I want a universal model of intelligence.

(Also, AI alignment is purely a theoretical mathematical subject and has nothing to do with actual implementation of AGI, see this great lecture for more)


But I’m curious to know what you guys think about the tradeoff I mentioned. Do you think it’s correct that there’s this inherent tradeoff? And does it solve AI alignment?


I agree, there is definitely a tradeoff there. I don’t know whether it solves AI alignment, because I still don’t quite get that concept. My default answer would be “no”? Or better yet, does it even matter?


Coming back to the alignment problem.

This was thrashed out in excruciating detail in the great AI Foom debate.

You alluded to the brain working to “maximizes their survival, pleasure, well-being (what have you…)” as the driving force motivating action.

If you are creating an AGI I assume that part of the task is to select suitable motivations, preferably ones that don’t entail the destruction or enslavement of humans.


I’m curious if there’s some mathematical formulation/proof of this intuition.
Re AI Alignment, I highly highly recommend the lecture I mentioned above if you want to get a basic understanding of the subject.
No is probably right, haha…
I always try to make some meaningful progress on AI Alignment and I always fail miserably (not a researcher or anything).
“Does it matter?” - definitely. I think we should be terrified at the possibility of losing control over our AI. This is a real and very dangerous possibility in my opinion: we don’t have the slightest clue how to solve it, we don’t put enough attention/resources into it, and we only get one chance. If we mess up, we are completely doomed with no recourse whatsoever.


Sure I am, but I just don’t see the possibility yet. We are so very far away from that.


I don’t think so… :slight_smile:
Especially when you guys are trying your best to make it happen.
(Also, given that this subject seems to be very hard, if not impossible, we should start researching it as soon as possible.)


Yes, that’s exactly what AI alignment is about: choosing an objective function that is safe for humans, aligning the AI’s values with our human values.
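A toy illustration of why objective choice matters (my own sketch, not from the thread): a literal maximizer of a naively specified objective will happily pick a degenerate action, because the objective omits everything else humans care about. The actions and scores below are made-up placeholders.

```python
# Toy value-misspecification example: a maximizer of a naive objective
# picks the degenerate action. Outcomes are hypothetical placeholders.
actions = {
    "run a normal factory":        {"paperclips": 1_000,  "well_being": 1.0},
    "convert everything to clips": {"paperclips": 10**9,  "well_being": 0.0},
}

def naive_objective(outcome):
    # Counts only paperclips; ignores everything else humans value.
    return outcome["paperclips"]

best = max(actions, key=lambda a: naive_objective(actions[a]))
print(best)  # the maximizer chooses the action that zeroes out well-being
```

The point of the toy: nothing in the maximization step is "malicious"; the failure is entirely in the objective we handed it, which is why choosing that objective is the core of the alignment problem.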


That makes sense. Yes I approve of this effort. :slight_smile:


The biggest problem I personally have with the whole concept of “aligning AI values with our human values” is that humans are not aligned among themselves. If we are not careful, this could lead not to the machines taking over, but to an elitist ideology using machines to take over…


I think pretty much all humans agree that turning the whole universe into paperclips is not desirable. :sweat_smile:


I don’t know. Most humans continue to take action every day that contributes to our eventual demise (like creating garbage, eating unsustainable food, etc). It is hard to take action for long term change in our short-term political climate. I think @Paul_Lamb has a point.


This is why I have been a proponent of keeping this technology in the public domain in any way possible. Thus all of Numenta’s code is open source.


Probably getting off topic, sorry, but there was a similar discussion a while back where a member was proposing the idea of using machines to build a better society.


I agree that humans are not perfectly rational; they do not have a well-defined objective function.
But there are commonalities, and we should solve aligning on those first; in fact, we have no idea how to even go about that.
Choosing between different human ideologies/world views/value systems is a much smaller problem by comparison.


Probably the best way to tackle the problem is to make sure the machines have to “eat their own dog food,” so to speak. If they have to play well with humans in order to survive in a complex society, then they should naturally align on similar values, just as humans do. I don’t personally believe we are born with an innate value system beyond instincts for basic empathy; complex value systems and beliefs are learned.


That’s not a serious solution. The problem is much more difficult.


Why does it need to be? Unless you are building a fully-trained “i-robot” that can function at a human level out of the box (which we are nowhere near with current technology), then the AI will have to learn about reality from its experiences, and form its own value system. As long as we make sure it starts with innate instincts to be a “social animal” (build a dog, not a cat…), and as long as the proper legal framework is in place to make sure it is responsible for its own actions, then why would we assume that an AI will naturally come to the conclusion that humans need to be destroyed?


Again I defer to this lecture to explain exactly what’s difficult about it. :slight_smile: