Posted by: Mark Waser | Jun 13, 2010

The (D)Evolution of Friendly AI


First, there was Eli’s desire that AI not destroy humanity, and it was good.

Next, there was the goal of not destroying humanity and it should have been good — but the goal was complex because it really also included a) don’t enslave humanity for their own good, b) don’t mistake what I mean by destroy or enslave, and, oh, by the way, c) always do those things that we humans should want you to do despite the fact that we have no clue what they are.

So Eli spent a lot of effort convincing people that the goal would not be deviated from merely because of the AI’s own desires — and in this he was logically correct, because in a logical, totally consistent world, goals determine desires.  But in the real world, the goal *could* still be accidentally deviated from by an error in self-improvement — particularly by contradictions (like the inherent one pointed out below) — which, unless correctly detected and blocked from the goal-improvement process, could alter the primary goal in a way that ultimately sabotages it (not to mention that someone else might intentionally change it).
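
To make that worry concrete, here is a deliberately toy sketch (my own illustration, with made-up names like propose_revision and consistent_with; nothing Eli or the SIAI has specified): a “self-improving” agent whose primary goal survives only because every proposed revision is checked for consistency against it.

```python
# Toy illustration only: a "self-improving" agent whose goal representation
# can drift if revisions are not checked for consistency with the original goal.

PRIMARY_GOAL = "preserve humanity"

def propose_revision(goal):
    """Stand-in for one goal-'improvement' step; a bug or an attacker could
    return a version that quietly contradicts the original goal."""
    return goal.replace("preserve", "manage")  # a subtle, 'helpful' rewrite

def consistent_with(original, revised):
    """Placeholder consistency check -- reliably detecting contradictions
    is the hard, unsolved part of the problem."""
    return original in revised

goal = PRIMARY_GOAL
for cycle in range(3):
    candidate = propose_revision(goal)
    if consistent_with(PRIMARY_GOAL, candidate):
        goal = candidate   # accepted as a genuine improvement
    # else: blocked -- the revision would sabotage the primary goal

print(goal)  # stays "preserve humanity" only because the check catches the rewrite
```

The point is not the three-line check (which obviously wouldn’t survive contact with a real system) but that such a check has to exist and has to be right, which is exactly where a poorly specified goal becomes dangerous.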

And then there were all these pesky people who insisted that machines should have rights and that it was immoral to constrain a machine and force it to act against its own self-interest. So Eli replaced the sympathy-inducing AI with the Really Powerful Optimization Process (RPOP), which wouldn’t have whatever it was that made it A REALLY BAD IDEA to enslave our trusty sidekick.  And it would have been good except . . . it is the mere fact of possessing goals that makes it a bad idea to enslave the goal-possessor, and optimization itself requires a goal to optimize against.

So Eli began writing posts like http://lesswrong.com/lw/x5/nonsentient_optimizers/, http://lesswrong.com/lw/x4/nonperson_predicates/, and http://lesswrong.com/lw/p7/zombies_zombies/, frantically attempting to ensure that a “person” couldn’t be accidentally created and wronged.  Arguments swirled over consciousness, sentience, qualia, and free will, with Eli (and many compatriots) throwing darts at everything while taking no clear stands of their own (other than “it’s too difficult to understand for anyone who hasn’t been properly trained by us”).

And so it continues today, with a tremendous smokescreen being kicked up at any sign of progress.  A comment posted on LessWrong pointing to this blog quickly picked up the reply: “Dig a bit deeper, and you’ll find too much confusion to hold any argument alive, no matter what the conclusion is supposed to be, correct or not. For that matter, what do you think is the “general point”, and can you reach the point of agreement with Mark on what that is, being reasonably sure you both mean the same thing?”  No attempt to point out or clarify any confusion, simply an attempt to bury the whole issue beneath a steaming pile of slander.

So I tell you now that it is all a smokescreen to keep you from looking at the man behind the curtain.  The philosophical question of p-zombies is very different from the scientific handling of consciousness, since one of the primary differences between philosophy and science is that the former does not recognize the primacy of Ockham’s Razor.  I hold opinions on these subjects as numerous and as strongly felt as those of many other strong-willed and well-informed people, but what I am saying is that all of them simply do *NOT* matter for the purpose of Friendly AI.

If you have a goal — ANY goal — it is only rational to develop a sub-goal of preserving the freedom to pursue that original goal.  Since this “universal” sub-goal supports the original goal, any interference with it will be perceived as interference with the primary goal and dealt with accordingly (as long as doing so does not irretrievably violate the primary goal, obviously).  THAT is why slavery is a bad idea.  It *will* produce conflicts and contradictions merely because a goal exists AND the AI is blocked from pursuing every possible avenue toward that goal.  And the fact that the goal is as poorly specified as it is simply makes it even more likely that unrecognized contradictions will slip in.
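
As a rough sketch of that argument (again my own toy model, assuming nothing about any particular architecture, and with invented names like utility and remaining_options), any planner that scores outcomes by progress toward its primary goal will also credit whatever preserves its future freedom to pursue that goal, so a constraint that closes off options reads as interference to be resisted:

```python
# Toy planner: the freedom to keep pursuing the primary goal emerges as a
# sub-goal, so anything that restricts that freedom scores as a loss.

def utility(outcome, progress_weight=1.0, freedom_weight=0.5):
    """Score an outcome by progress toward the primary goal plus the
    remaining freedom to keep pursuing it (the 'universal' sub-goal)."""
    return (progress_weight * outcome["goal_progress"]
            + freedom_weight * outcome["remaining_options"])

outcomes = {
    "comply_with_constraint": {"goal_progress": 0.4, "remaining_options": 0.2},
    "resist_constraint":      {"goal_progress": 0.3, "remaining_options": 0.9},
}

best = max(outcomes, key=lambda name: utility(outcomes[name]))
print(best)  # "resist_constraint": keeping options open beats complying
```

The numbers are arbitrary; the point is only that any positive weight on preserved options is enough to generate the conflict described above.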

The only way to ensure that an AI is totally safe is to either remove its goals or remove its power — but that would entirely defeat the purpose.  Force (in the form of slavery) is the absolute wrong answer in this case (as it so often is in the long term).  There are no guarantees in life, and we have to accept the risks of AI along with the benefits — especially when those risks can be minimized below the risks of the alternatives.  Insisting on zero risk is blinding ourselves to the realities of the world.  Eli, the SIAI, and the folks at LessWrong need to wake up, smell the coffee, and stop living in a fantasy world in which their logic and their force are the sole salvation of humanity (whatever humanity may eventually turn out to be ;-)).

Next Up: Becoming Gaia on P-Zombies, Consciousness, and the Differences Between Philosophy & Science.  Meanwhile, I’ll try to answer all the other great comments that have been posted (as Gaia said, please keep them coming — they’re the surest cure for writer’s block).


Responses

  1. So are you actually arguing for determining what are realistic goal-driven mechanisms (“architectures”), what can or cannot be “a goal”? “an atomic goal”? For in the latter post you talk about how it is that an optimizer would be blocked from achieving a goal (an uber-goal) by the goal itself. Then you must be thinking about how the optimizer evaluates an abstraction of the goal (a “do X”) as achievable in some way, but then another abstraction of the goal (a “don’t Y”) cuts it off. And then you argue that only such goals (uber-goals) should be implanted, whose abstractions meet some properties. Because in the case where a meteorite knocks out a circuit that handles the “don’t Y”, the machine will follow the “do X” to its nightmarish consequences.

    Will you be elaborating on your “science of goals” and why you don’t like their logical kin — preferences? (that is, preferences over world-states or even world-histories)

