Watson Beats Ken Jennings on Jeopardy

IBM's Watson computer crushed human competitors on Jeopardy. What does it mean?

James Joyner · Thursday, February 17, 2011 · 14 comments

Way back in 1997, IBM’s Deep Blue computer beat chess grandmaster Garry Kasparov. Now, IBM’s Watson computer has creamed the legendary Ken Jennings at Jeopardy.

What does this prove? It depends on whom you ask.

Mastermix notes that, “players are surprised at how big a role button management plays in winning or losing a round. In the few minutes of the Watson game that I watched, it was pretty clear that Watson was excellent at pressing the button at exactly the right moment if it knew the answer, which is more a measure of electromechanical reflex than human-like intelligence.”

Jennings himself thinks it says that human emotions are a liability in competition.

Watson has lots in common with a top-ranked human Jeopardy! player: It’s very smart, very fast, speaks in an uneven monotone, and has never known the touch of a woman. But unlike us, Watson cannot be intimidated. It never gets cocky or discouraged. It plays its game coldly, implacably, always offering a perfectly timed buzz when it’s confident about an answer. Jeopardy! devotees know that buzzer skill is crucial—games between humans are more often won by the fastest thumb than the fastest brain. This advantage is only magnified when one of the “thumbs” is an electromagnetic solenoid triggered by a microsecond-precise jolt of current. I knew it would take some lucky breaks to keep up with the computer, since it couldn’t be beaten on speed.

During my 2004 Jeopardy! streak, I was accustomed to mowing down players already demoralized at having to play a long-standing winner like me. But against Watson I felt like the underdog, and as a result I started out too aggressively, blowing high-dollar-value questions on the decade in which the first crossword puzzle appeared (the 1910s) and the handicap of Olympic gymnast George Eyser (he was missing his left leg). At the end of the first game, Watson had what seemed like an insurmountable lead of more than $30,000. I tried to keep my chin up, but in the back of my mind, I was already thinking about a possible consolation prize: a second-place finish ahead of the show’s other human contestant and my quiz-show archrival, undefeated Jeopardy! phenom Brad Rutter.

One bit of consolation for the puny human:

“Watching you on Jeopardy! is what inspired the whole project,” one IBM engineer told me, consolingly. “And we looked at your games over and over, your style of play. There’s a lot of you in Watson.”

Glenn Reynolds thinks the Singularity is nigh. Stephen Gordon, too:

A simple definition of Singularity is “that point at which greater-than-human intelligence arises.”

[…]

The human champion describes Watson as a Terminator. It never gets tired, bored, cocky, or intimidated, and it never has stage fright. It just keeps coming… answering general knowledge questions better than the best Jeopardy players on the planet. Scary, right? Then, the reporters go Utopian and suggest that Watson and its descendants could be used to diagnose illnesses. It could be Dr. House without the attitude. Exciting, right?

Donald Sensing isn’t impressed:

Strip out Watson’s blinding speed, and it is no smarter than human beings at all. Watson, for all its engineering impressiveness, simply did only what computers have always done: collate at blinding speed (and compute mathematical probabilities to choose an answer). It does not matter that Watson was not connected to the Internet since its mass-memory unit holds 16 Terabytes of data, processed by 2,880 processor cores. As my own computer professor said (many years ago!), “A computer is just a high-speed moron.” There is nothing about Watson that I have read so far that obviates that observation.

[…]

So what does Watson really prove? From a technical, engineering and programming perspective, it’s an amazing achievement with enormous potential for a wide range of applications ranging across broad multi-disciplinary subjects and problems. As for the Jeopardy game, there’s less than meets the eye. Count it as a proof-of-concept exercise. What it did not do was reason abstractly. It just collated amazing amounts of information very rapidly. But we already knew that computers are faster than we are for specified tasks. That’s why we build them to begin with.

Peter David agrees:

Predictably, the computer is currently smashing the humans. Except Watson is hardly the HAL 9000. It’s more like the Google 9000. It’s a pocket calculator with voice recognition. Feed it enough proper nouns and it can call up information fed into it by countless programmers (which means Jennings and Rutter aren’t really facing a single opponent; they’re fighting hundreds of human minds). But give it anything vaguely abstract and it falls apart.

The Final Jeopardy category was U.S. Cities. The answer was that this city had two airports, the largest of which was named after a World War II hero, and the second largest after a famous World War II battle. It took me three seconds to figure it; Jennings and Rutter probably less than that. Obviously it’s Chicago (O’Hare and Midway).

Watson’s answer? Toronto, an answer so wrong on every level that it didn’t even fit the category. If a human had been given Toronto as one option of a multiple choice, he would have rejected it based on the knowledge that Toronto isn’t a US city.

If the “Jeopardy!” writers were so inclined, they could produce entire boards of the type of questions that we saw in “Final Jeopardy” and Watson would just sit there flummoxed while the humans’ superior reasoning power would run roughshod over it.

Computers can react faster than humans, but they can’t think faster than humans. They can’t think at all, because there’s more to thinking than just regurgitating facts.

Steve Hamm falls somewhere in between. Turning to the infamous airport flub:

David Ferrucci, the manager of the Watson project at IBM Research, explained during a viewing of the show on Monday morning that several things probably confused Watson. First, the category names on Jeopardy! are tricky. The answers often do not exactly fit the category. Watson, in his training phase, learned that categories only weakly suggest the kind of answer that is expected, and, therefore, the machine downgrades their significance. The way the language was parsed provided an advantage for the humans and a disadvantage for Watson, as well. “What US city” wasn’t in the question. If it had been, Watson would have given US cities much more weight as it searched for the answer. Adding to the confusion for Watson, there are cities named Toronto in the United States and the Toronto in Canada has an American League baseball team. It probably picked up those facts from the written material it has digested. Also, the machine didn’t find much evidence to connect either city’s airport to World War II. (Chicago was a very close second on Watson’s list of possible answers.) So this is just one of those situations that’s a snap for a reasonably knowledgeable human but a true brain teaser for the machine.

The mistake actually encouraged Ferrucci. “It’s goodness,” he said. Watson knew it did not know the right answer with any confidence. Its confidence level was about 30%. So it was right about that. Moreover, Watson has learned how the categories work in Jeopardy! It understands some of the subtleties of the game, and it doesn’t make simplistic assumptions. Think about how Watson could be used in medicine, as a diagnostic aid. A patient may describe to a doctor a certain symptom or a high level of pain, which, on the surface, may seem to be an important clue to the cause of the ailment. But Watson may know from looking at a lot of data that that symptom or pain isn’t the key piece of evidence, and could alert the doctor to be aware of other factors.
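Hamm's account of the flub can be sketched in a few lines of Python. This is only an illustration of the idea he describes (a weak category weight plus a confidence check), not IBM's actual scoring method. The `Candidate` class, the `rank` and `is_confident` functions, the 0.5 threshold, and every number other than the roughly 30% confidence are hypothetical, made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    evidence: float       # hypothetical score from clue-text evidence, 0..1
    fits_category: bool   # does the answer match the category name?


def rank(candidates, category_weight):
    """Return (answer, score) pairs, best score first.

    An answer that fits the category gets a bonus of `category_weight`;
    a small weight means the category only weakly influences the ranking.
    """
    scored = [
        (c.answer, c.evidence + (category_weight if c.fits_category else 0.0))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def is_confident(score, threshold=0.5):
    """Hypothetical cutoff: treat scores below the threshold as a guess."""
    return score >= threshold


# Illustrative numbers only: Toronto at about 30%, Chicago a close second.
candidates = [
    Candidate("Toronto", evidence=0.30, fits_category=False),
    Candidate("Chicago", evidence=0.28, fits_category=True),
]

# Category treated as a weak hint versus a strong one.
weak = rank(candidates, category_weight=0.01)
strong = rank(candidates, category_weight=0.20)

print("weak category weight:", weak)
print("strong category weight:", strong)
print("confident in top answer (weak)?", is_confident(weak[0][1]))
```

With the weak weight, Toronto edges out Chicago and the top score stays under the cutoff, which matches the "it knew it did not know" point. With the strong weight, Chicago comes out on top.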

Watson’s achievement in sorting through the questions and processing information against the best of the best human competitors is simply remarkable. And, presumably, computers will get much better at this sort of thing in relatively short order. (Although, man, I’d have sworn that the Deep Blue-Kasparov match was much more recent than 1997!)

Is it anything approaching the complex reasoning skill that even an average human mind can bring to the table? Not hardly. But Watson is a better raw data storage and retrieval unit than any human is likely to ever be; and, again, computers are only going to get more powerful.

Update (Alex Knapp): My own take on the Jeopardy match is here, and this is my takeaway:

Truth be told, this isn’t a contest between “man and machine” — it’s a contest between two Jeopardy contestants and a team of IBM programmers. To me, the programming and algorithm discovery needed to create Watson are what’s impressive. And that’s all a product of the human mind — not the computer.

When a computer is capable of teaching itself how to play Jeopardy, is limited to buzzing at human speeds, and then beats two champions, I’ll be impressed by the computer.

For now, I’m impressed by the all too human programmers.

Furthermore, it’s worth noting that while everyone LOVES to talk about Deep Blue beating Kasparov in 1997, you’ll note that far fewer people talk about Kasparov’s match against Deep Junior in 2003. It was a more complicated computer, but Kasparov fought it to a draw (1-1-4). Indeed, if you go to chess boards, there are lots of discussions about how to get around computer advantages and defeat computer AI at chess. Moreover, Kasparov himself has developed a variant of chess called Advanced Chess, which takes advantage of human strategic superiority and computer tactical superiority to produce absolutely beautiful games of chess.

Computers may be getting smarter, but humans are, too.

Comments

ponce says:

Thursday, 17 February 2011 at 12:05

Meh.

They left out all the tough to translate Jeopardy categories and gave Watson pure softball search questions.
Maxwell James says:

Thursday, 17 February 2011 at 12:52

Watching the games, what impressed me most about Watson was where its performance was weakest: its ability to understand the play of words in the questions. There were a number of times when it was completely thrown by even simple questions (“Toronto????”), but overall its hit rate was higher than I would have guessed.

That said, it was also obvious that Watson won on speed and not much else. Had the programmers set it so that its speed was equivalent to that of a fast human player – something they could easily do – I doubt it would have performed anywhere near as well. Frankly, the fact that they did not struck me as cheating – much like the refusal to give Kasparov access to Deep Blue’s recent games.

Overall it seems like a nice evolution of speech recognition technology, but hardly a world-changing event.
Axel Edgren says:

Thursday, 17 February 2011 at 12:52

So humans built a fantastically complex machine and coded it to defeat other humans. Cool and a taste of things to come, yes, but Kurzweil should keep his cool a little while longer.
Steven L. Taylor says:

Thursday, 17 February 2011 at 13:13

I think a lot of this conversation conflates “knowledge” with “intelligence” and I would argue that the two are not the same.

Watson can rapidly and impressively translate language (including puns and other types of wordplay) into searchable queries in a way that we have never seen before.

But does it understand anything that it reads? (As I understood it, Watson didn’t “hear” the questions, but rather read the text; I base that on an interview with Jennings I heard on NPR earlier in the week.)

As such: Google 9000, indeed.

And I would note that I am, along with Alex, utterly impressed by this feat of human engineering. The notion, however, that the Singularity is nigh strikes me as, well, silly.
Alex Knapp says:

Thursday, 17 February 2011 at 13:17

I’m with Ken MacLeod – The Singularity is just “the Rapture for nerds.”
Muffler says:

Thursday, 17 February 2011 at 13:36

If you take the “defeat of humanity in the category of Jeopardy” and approach it from an unemotional level, it is a great accomplishment for human progress. It is not a failure of the human contestants, but a step up in our understanding of ourselves and making it work for us in new and probably revolutionary ways. The genius of Watson is not Watson… it is Human ability.
mantis says:

Thursday, 17 February 2011 at 14:14

Watson can rapidly and impressively translate language (including puns and other types of wordplay) into searchable queries in a way that we have never seen before.

This is exactly right, and exactly what is so impressive about the computer they built. Human language is arguably the hardest obstacle to conquer in interactive computing, and Watson is a big step in that direction, but it’s still early going.

The fact is unless or until we build semi-organic computing machines, computers will never “think” like humans. We have emotions and creativity that cannot be duplicated in silicon. This is why HAL-9000 is pure fantasy; a machine wouldn’t decide to kill humans if it thought they were an impediment to its programming, unless it was programmed to do so.

But getting machines to interpret our language and respond in meaningful ways, even if they get there through different processes than our brains do and don’t truly “understand” language the way we do, will dramatically change the way computers are designed and how we interact with them. In the short term, what Watson does is get us a lot closer to the kind of analytical computing that can be greatly useful in many fields, such as medical diagnostics, as Hamm notes. Imagine inputting symptoms, vital statistics, blood test results, and more data from a patient and having a computer deliver confidence levels for different possible diagnoses, as well as treatment options, based on far more data than any human doctor could command, without needing strict, universal input methods.
Matt B says:

Thursday, 17 February 2011 at 14:46

@Muffler:
It is not a failure of the human contestants, but a step up in our understanding of ourselves and making it work for us in new and probably revolutionary ways.

This is a common misconception. Building on Alex’s and Steven’s points (along with others quoted above), experiments like Watson are much more about processing power than any real exploration of what it means to be “Human.” I say processing power because this was largely about high-speed data retrieval coupled with some natural language processing.

Any computer scientist worth his salt will tell you a couple of things about Watson and, in general, the last 40 years in AI:
1. Jeopardy was chosen by IBM because it is a game that is inherently optimized for AI – little abstraction, very simple rule set, emphasis on speed.
2. Anytime a computer has remotely approached or beaten a human, the line has been redrawn as to what’s a real challenge. First chess, now Go; first Jeopardy, next “Pictionary,” someday Turing.
3. The only significant “knowledge” that AI has generated about humans has largely come from critiques of AI (see Winograd and Flores, Searle, and to a lesser degree the work of Lucy Suchman).

All that said, intelligence as a cultural category, is a little different.

That just requires others to believe one is intelligent (pick your favorite “dumb” politician; to their base they are typically one smart cookie). From that perspective, Watson is one in a long line of machines that people think are smarter than they are (the Eliza Effect). And that enables those algorithms to have a potentially BIG effect on our lives.

Keeping that last bit in mind, I just want to say that I for one welcome our new computer overlord…
Matt B says:

Thursday, 17 February 2011 at 16:01

@Mantis: What Watson does is get us a lot closer to the kind of analytical computing that can be greatly useful in many fields, such as medical diagnostics, as Hamm notes. Imagine inputting symptoms, vital statistics, blood test results, and more data from a patient and having a computer deliver confidence levels for different possible diagnoses, as well as treatment options … based on far more data than any human doctor could command, without needing strict, universal input methods.

(Apologies for the length of what follows — and if the HTML doesn’t work)

This utilitarian view is exactly what scares me about the singularity line of thought. There are two general problems with this (arguably far more social than mechanical):
1. When something talks, we, at least for the last few centuries, ascribe to it a lot more abstract intelligence than it deserves. See the Eliza Effect.
2. (hugely reduced for space) Because computers are SCIENCE, we imagine their findings as being “objective” and “impartial.” As Alex points out the success of Watson is the success of its Engineers. The ideologies of those engineers are always already (in)directly encoded into the platform. Considering that most Americans are still learning to distrust Doctors and see their expert diagnosis as, in part, subjective, I shudder to think about the possibility of black boxed diagnostic tools.

In other words, if the programmer thinks a certain constellation of symptoms means X, then so does the software. I realize that in the vast majority of cases, that’s usually true in typical diagnostics. But as soon as you pass the bounds of “normal” sickness, things fall apart fast. Much of what separates good doctors from bad is how fast they recognize non-normal.

Which gets to:
@Mantis: based on far more data than any human doctor could command, without needing strict, universal input methods.
That’s already positioning a very specific type of objective, scientific “data” (aka stuff that can be easily digitized and understood by a computer) over other “softer” data (abstract understanding of life history, visual markers, body positioning, touch, the doctor’s life history and diagnostic experience). A lot of med schools feel things have already gone too far toward the former and are actively trying to recover the teaching of the latter.

In part this is a fear of letting the tool run wild, and in part it’s due to the necessary secrecy of the algorithm.

Given the fact that any diagnostic system would most likely be a private solution, that means that the actual algorithms (the secret sauce and the key competitive advantage) would never be made public (see Google and most National Security software as examples).

I can live with reverse engineering Google for SEO. But having to do that to figure out how I’m being diagnosed (i.e. this is based on whose ideas of a given sickness?)… that scares the crap outta me. Especially given how well algorithms have been working for the intelligence community.

While one can argue the doctor always has final say, the institutionalization of medicine (like other things) typically cedes power to science rather than the individual. Science works great over the long term, but in fuzzy areas, in the short term, it’s pretty crappy.

One last point: this isn’t about some Beckian fear of these devices being used to nefariously cull the herd. Rather, it’s that everyone has blind spots, but humans typically can be taught to compensate for them on the fly. AIs are really, *really* bad at quick corrections for things programmers missed and scenarios they failed to envision.
john personna says:

Thursday, 17 February 2011 at 16:16

Am I the only one who has read the Vernor Vinge books which coined the Singularity? I am probably the only one who read the later William Calvin (evolutionary biologist) and Vinge mutual admiration articles (in Whole Earth? or was it Edge?)

Anyway, the important thing to know about the Singularity is that it isn’t defined. It can’t be truly seen from this side. What it is supposed to be, if it happens, is a discontinuity, beyond which everything changes.

A true AI would do it but, and this is important, Calvin and Vinge would be just as satisfied by a human-computer partnership that surpassed strictly human intelligence.

Basically, now that we’ve seen this IBM computer, we can expect an equivalent in our phones within 20 years. Is that enough for a discontinuity?

Maybe it will help the next generation find the closest unemployment office.
ponce says:

Thursday, 17 February 2011 at 16:59

“Am I the only one who has read the Vernor Vinge books which coined the Singularity?”

I just finished rereading his “A Fire Upon the Deep” because I heard he’s writing a sequel to it that’s due out later this year.
john personna says:

Thursday, 17 February 2011 at 18:20

That was a good one too – an anti-singularity future, with a kind of non-AI technology plateau
sam says:

Thursday, 17 February 2011 at 18:32

Will the Singularity help me to find my car keys more easily?
john personna says:

Thursday, 17 February 2011 at 20:41

I think you’d just tell the car who to trust (not the dog!)

Outside the Beltway