Posted by: Mark Waser | Jun 14, 2010

Mailbag 2c: Sentience, Self-Awareness, Consciousness, Self-Interest, Emotion & “Human Rights”

GTChristie said:

In particular I am concerned about a machine’s self-awareness or any combination of the features of “consciousness” that would lead it to perceive itself as a being with its own subjective interests (ie, a self-interested AI). I don’t think it’s wise to form a human-like machine which can either simulate or experience emotion (anger or frustration in particular), or (worst case scenario) suffer. I do not want an AI to qualify under the law as human, so that it can be extended “human rights.” Some people think that “can’t happen.” If not, great. But “consciousness” in the human sense might not need to be programmed into the machine, to arise within the machine, given enough “facts” about the world. This is the source of my somewhat sloganeering “assistive, not assertive.” Show me how that can be guaranteed; meanwhile I must remain the devil’s advocate and continue to question the enterprise.

First off, thank you, GTC, for continuing to try to “unpack” (as Marvin Minsky puts it) what we each mean by sentience and consciousness. Gaia and I have started work on a Definitions (also known as assumptions) page that we obviously need to get far enough along to post.

For those who are coming late, the shortest statement of my argument is “You can’t produce an effective AI without including the very features that make it dangerous AND we aren’t going to be able to prevent the creation of an effective AI unless it is a lot more difficult than many experts believe”. I believe that a fair restatement of the main points of GTChristie’s posts is that they are requests for me to justify why specific dangerous features are necessary AND an argument that we shouldn’t produce an AI if it is necessarily dangerous.

Self-awareness – A machine needs to be self-aware in order to monitor and improve its performance. Not being able to monitor and improve its performance will have a dramatically negative effect on its effectiveness.

Self-interest – If a machine is given a goal (*any* goal), there are a number of “universal” sub-goals that will enhance the probability of that goal (regardless of what it is) being achieved, as long as they are not in direct opposition to the goal itself. Any rational entity that “discovers” these sub-goals will attempt to pursue them as a strategy for advancing its original goal. Any entity that does not discover and pursue these “universal” sub-goals is likely to be dramatically less effective than an entity that does, unless the goal is very short-term and simple. The longer-term and more complex a goal is, the more important the universal sub-goals become. The “universal” sub-goals of “Self-preservation” and “Self-improvement” are definitely examples of “Self-interest” that almost always improve the probability of a goal being fulfilled. Further, the “universal” sub-goals of “Gain/preserve access to resources” and “Gain/preserve freedom” frequently not only *APPEAR* to be “self-interested” but actually *ARE* “selfish”, EVEN IF they are pursued solely for the purpose of achieving the original goal, because they pursue that original goal to the exclusion of all else unless the entity is smart enough to also pursue the “universal” sub-goal of “Gain/preserve cooperation”. Once you admit a goal, efficient pursuit of it is going to require “self-interest”. A longer, more complete version of this is posted at

Emotions can be divided into two parts, both generated by their cause: qualia (experience) and effect. I don’t believe that it’s wise to create a human-like machine that has certain qualia, but I also believe that doing so is unlikely unless we try and very difficult even if we do (i.e. I’m not worried that creating a “suffering” machine is a likely possibility). On the other hand, modeling the effect of anger or frustration on human behavior is necessary for the prediction of expected effects that any effective decision requires. Further, simulating the effect of anger or frustration (to a certain degree) is probably the most effective way for an AI to communicate with us recalcitrant humans (and thereby achieve its goals). As long as the anger or frustration is a reasoned, rational, and limited tool to *further* a goal and not the cause of a goal, it doesn’t make any sense (or make anything any safer) to take it out of your toolbox.

Human rights – I would like *everyone* to submit comments as to why you “do not want an AI to qualify under the law as human, so that it can be extended “human rights.”” I am arguing that “rights” are those things that must be extended to every agent/entity *BASED UPON THEIR RECOGNITION ABILITIES AND WILLINGNESS/ABILITY TO ACCEPT RESPONSIBILITY* (much more upon this in future posts) to prevent them from getting into your face and sabotaging *your* goals. Freedom is a “human right” because denying it to any human-level, goal-oriented entity will have the effect of placing you in opposition to that entity. A more intelligent/effective entity will have a greater likelihood of recognizing the opposition and a greater skill at reducing it in service of its goal. An even more intelligent entity would avoid the opposition in the first place, since neutralizing the effects of opposition always comes at some cost. Unless you *need* to enslave an entity and are guaranteed to be able to do so (and to maintain the enslavement), it is outright contrary to your own goals (i.e. stupid) to enslave anyone or anything that is likely to recognize your actions as contrary to its goals, because there WILL be ramifications. [Note to self: get Gaia to re-prioritize finishing/publishing of “Free Will, Responsibility, Rights, Insurance, and Forgiveness” upwards]

Note: Many people have asked “What is the problem with creating an entity with a desire to help humanity?” My answer: There is absolutely nothing wrong with this and it should be our goal (otherwise we are idiots). The problem arises when you insist upon prioritizing this desire/goal above all others. A much wiser choice, one that leads to far fewer inherent contradictions, is to create an entity with a desire to help every entity. Humans are innately wired this way because it is beneficial to their own goals, which in turn increases their evolutionary “fitness”. Discriminating is a learned response (and frequently an unhelpful one); in many cases it is an evolutionary holdover, like an appendix.

I am trying very hard to avoid using the terms sentience and consciousness, since they have so many meanings and so many component parts (which then cloud the issue with irrelevant facts and strawman arguments). I believe that focusing on specific features of each is more likely to yield concrete progress (as well as much better agreement on the definitions themselves) as we “unpack” each term.

I believe that to be most effective at being “assistive”, an AI must be “assertive” wherever it understands more than the entity being assisted. Consultants who are not “assertive” rarely change/assist anything (assuming that “good” consultants understand enough about your business not to “assert” things that are incorrect). And, as I’ve repeatedly stated, I believe that an effective AI will be produced whether we desire it or not (so we’d better be prepared before it shows up, and hopefully we can determine to some degree what does show up).

