Posted by: Mark Waser | Mar 16, 2011

“Superintelligence does not imply benevolence” (Intelligence vs. Wisdom 1)

Joshua Fox presented a talk with the above title, co-authored by Carl Shulman (long abstract), last year in the Singularity track at ECAP10 (VIII European Conference on Computing and Philosophy).  In 1993, Vernor Vinge summed up “the singularity” in two succinct sentences:

Within thirty years, we will have the technological means to create superhuman intelligence. Shortly after, the human era will be ended.

A more accurate title for Fox and Shulman’s paper would have been “Superintelligence does not ensure benevolence”, but a number of their arguments (and oversights and errors) are quite interesting.

Artificial intelligence (AI) researchers generally define intelligence as the ability to achieve widely varied goals under widely varied circumstances.  It should be obvious that, if an intelligence has (or is given) a malevolent goal or goals, every increase in its intelligence only makes it more capable of succeeding in that malevolence; it is equally obvious, therefore, that mere superintelligence does not ensure benevolence.

Where Fox and Shulman and I disagree is on whether increasing intelligence *generally, usually, or frequently* makes benevolence more likely; they deny that, as they phrase it, “many extremely intelligent beings would converge on (possibly benevolent) substantive normative principles upon reflection.”

In their paper, after acknowledging that “history tends to suggest … more cooperative and benevolent behavior”, they claim (incorrectly, I believe) that generalizing from this trend is likely a mistake.  Their argument focuses on three reasons why increased intelligence might prompt favorable behavior and why each is unlikely to hold, but it overlooks (or dismisses without consideration) what I consider the primary reason(s) for favorable behavior.

Their first reason for good behavior is “direct incentives for cooperation” (or “direct instrumental motivation”).  They point out that “Even sociopaths will typically purchase the necessities of life rather than attempting to seize them by force, in light of the costs of being caught”, but they claim that this reason “works only while humans maintain high relative power” and therefore does not apply to the case of super-powerful, super-intelligent entities.

Their second reason is that humans may be able to convince an AI to permanently adopt a benevolent disposition before it is too strong, smart, or set in its ways.  I fully agree with them that we can’t count on this necessarily occurring.

Their third reason is that “intelligent reflection might cause the AI to intrinsically desire human welfare, independently of instrumental concerns.”  I see absolutely no good argument for why this reason would possibly be true and would have been perfectly happy to let them dismiss it summarily; however, it prompted them into an interesting philosophical digression that I will examine in more detail in the next post.

However, the critical reason prompting benevolent behavior that they overlooked (or dismissed without consideration) is indirect (or long-term) instrumental concerns. Despite citing Omohundro’s Basic AI Drives and the instrumental value of cooperation with sufficiently powerful “peers”, they fail to sufficiently consider the magnitude of the inherent long-term losses and inefficiencies of *all* non-cooperative interactions, the enormous value of trustworthiness, and the fact that a machine destroying humanity would be exactly analogous to our destruction of the rainforests: tremendous knowledge and future capabilities traded away for short-sighted convenience (or the alleviation of fear). The instrumental advantages of cooperation and benevolence are clearly more than sufficient to make them “Omohundro drives” wherever they do not directly conflict with goals – and are clearly enough to cause sufficiently intelligent/far-sighted beings to converge on benevolence wherever possible.

Pre-commitment to a strategy of universal cooperation/benevolence through optimistic tit-for-tat and altruistic punishment for those who don’t follow such a strategy has tremendous instrumental benefits. If you have a verifiable history of being trustworthy when you were not directly forced to be, others do not have to commit nearly as much time and resources to defending against you – and can pass some of those savings on to you. On the other hand, if you destroy interesting or useful entities, more powerful benevolent entities will likely decide that you need to spend time and resources helping other entities as reparations and altruistic punishment (as well as repaying any costs of enforcement). Short of always being tremendously more powerful than everyone else, never having children, and never relying upon others (even pieces fissioned off from yourself for efficiency), a path of non-benevolence is likely to come back and haunt any entity who is not or does not wish to be forever alone in the universe.
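The instrumental logic above can be made concrete with a toy iterated prisoner’s dilemma. The sketch below is purely illustrative (the strategy names, payoffs, and forgiveness rate are my own assumptions, not anything from Fox and Shulman’s paper): an “optimistic” (forgiving) tit-for-tat reciprocator settles into mutual cooperation with its own kind, while a pure defector wins individual exchanges yet locks itself into the low-payoff rut that makes non-benevolence instrumentally costly in the long run.

```python
import random

# Standard prisoner's dilemma payoffs for the first-listed player:
# mutual cooperation 3, mutual defection 1, exploiting a cooperator 5, being exploited 0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def generous_tit_for_tat(opponent_history, forgive=0.1):
    """Cooperate first; thereafter copy the opponent's last move,
    but occasionally forgive a defection (the 'optimistic' part)."""
    if not opponent_history:
        return "C"
    if opponent_history[-1] == "D" and random.random() < forgive:
        return "C"
    return opponent_history[-1]

def always_defect(opponent_history):
    """The short-sighted strategy: defect no matter what."""
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    """Return cumulative payoffs for two strategies over repeated play."""
    hist_a, hist_b = [], []  # each strategy sees the *other's* past moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

random.seed(0)
# Two reciprocators earn the full cooperative payoff (3 points per round each)...
print(play(generous_tit_for_tat, generous_tit_for_tat))
# ...while the defector "wins" every exchange yet ends up far poorer overall.
print(play(always_defect, generous_tit_for_tat))
```

The point of the toy model is the last comparison: the defector outscores its victim but collects far less than either cooperator does, which is the long-term instrumental loss the paragraph above describes.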



  1. Fair enough, but given humanity’s propensity for blindly attacking and destroying anything it doesn’t understand and/or fears, it may be rational for a superintelligence to remove that threat before the stupid humans realise it is capable of doing that very thing.

    Which could be considered a tit-for-tat response to humanity’s prior sins.

    • Absolutely correct and what I live in fear of. The real reason why we need to discover “machine” ethics is so that we can fix our own (we are biological machines after all) before it’s too late.

    • ;-)



