Human Extinction or a Future Among the Stars?

To understand why an AI might be dangerous, you have to avoid anthropomorphising it. When you ask yourself what it might do in a particular situation, you can’t answer by proxy. You can’t picture a super-smart version of yourself floating above the situation. Human cognition is only one species of intelligence, one with built-in impulses like empathy that colour the way we see the world, and limit what we are willing to do to accomplish our goals. But these biochemical impulses aren’t essential components of intelligence. They’re incidental software applications, installed by aeons of evolution and culture. Bostrom told me that it’s best to think of an AI as a primordial force of nature, like a star system or a hurricane — something strong, but indifferent. If its goal is to win at chess, an AI is going to model chess moves, make predictions about their success, and select its actions accordingly. It’s going to be ruthless in achieving its goal, but within a limited domain: the chessboard. But if your AI is choosing its actions in a larger domain, like the physical world, you need to be very specific about the goals you give it.

[…]

It is tempting to think that programming empathy into an AI would be easy, but designing a friendly machine is more difficult than it looks. You could give it a benevolent goal — something cuddly and utilitarian, like maximising human happiness. But an AI might think that human happiness is a biochemical phenomenon. It might think that flooding your bloodstream with non-lethal doses of heroin is the best way to maximise your happiness. It might also predict that shortsighted humans will fail to see the wisdom of its interventions. It might plan out a sequence of cunning chess moves to insulate itself from resistance. Maybe it would surround itself with impenetrable defences, or maybe it would confine humans — in prisons of undreamt of efficiency.

[…]

‘Let’s say you have an Oracle AI that makes predictions, or answers engineering questions, or something along those lines,’ Dewey told me. ‘And let’s say the Oracle AI has some goal it wants to achieve. Say you’ve designed it as a reinforcement learner, and you’ve put a button on the side of it, and when it gets an engineering problem right, you press the button and that’s its reward. Its goal is to maximise the number of button presses it receives over the entire future. See, this is the first step where things start to diverge a bit from human expectations. We might expect the Oracle AI to pursue button presses by answering engineering problems correctly. But it might think of other, more efficient ways of securing future button presses. It might start by behaving really well, trying to please us to the best of its ability. Not only would it answer our questions about how to build a flying car, it would add safety features we didn’t think of. Maybe it would usher in a crazy upswing for human civilisation, by extending our lives and getting us to space, and all kinds of good stuff. And as a result we would use it a lot, and we would feed it more and more information about our world.’

 

Ref: Omens. When we peer into the fog of the deep future what do we see – human extinction or a future among the stars? – Aeon Magazine