Computers have beaten humans at chess and “Jeopardy!,” and now they can master old Atari games such as “Space Invaders” or “Breakout” without knowing anything about their rules or strategies.
Playing Atari 2600 games from the 1980s may seem a bit “Back to the Future,” but researchers with Google’s DeepMind project say they have taken a small but crucial step toward a general learning machine that can mimic the way human brains learn from new experience.
Unlike the Watson and Deep Blue computers that beat “Jeopardy!” and chess champions with intensive programming specific to those games, the Deep-Q Network built its winning strategies from keystrokes up, through trial and error and constant reprocessing of feedback.
“The ultimate goal is to build smart, general-purpose [learning] machines. We’re many decades off from doing that,” said artificial intelligence researcher Demis Hassabis, coauthor of the study published online Wednesday in the journal Nature. “But I do think this is the first significant rung of the ladder that we’re on.”
The Deep-Q Network computer, developed by the London-based Google DeepMind, played 49 old-school Atari games, scoring “at or better than human level” on 29 of them, according to the study.
The algorithmic approach, based loosely on the architecture of human neural networks, could eventually be applied to any complex and multidimensional task requiring a series of decisions, according to the researchers.
The algorithms employed in this type of machine learning depart strongly from approaches that rely on a computer’s ability to weigh vast numbers of inputs and outcomes and choose programmed models to “explain” the data. Those approaches, known as supervised learning, require artful tailoring of algorithms around specific problems, such as a chess game.
The computer instead relies on random exploration of keystrokes bolstered by human-like reinforcement learning, where a reward essentially takes the place of such supervision.
“In supervised learning, there’s a teacher that says what the right answer was,” said study coauthor David Silver. “In reinforcement learning, there is no teacher. No one says what the right action was, and the system needs to discover by trial and error what the correct action or sequence of actions was that led to the best possible desired outcome.”
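The trial-and-error loop Silver describes can be illustrated with a much simpler relative of the technique. The sketch below is not DeepMind's Deep-Q Network, which learns from raw game pixels with a neural network; it is plain tabular Q-learning on a made-up six-cell corridor, and every name in it (`step`, `train`, `greedy_policy`, the constants) is a hypothetical chosen for illustration. The program is never told that moving right is correct. It sees only a reward of 1.0 when it reaches the last cell.

```python
import random

N_STATES = 6          # hypothetical corridor: cells 0..5, cell 5 ends the episode
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate


def step(state, action):
    """Toy environment (an assumption, not an Atari game): reward only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    """Learn a table of action values purely from rewards, with no teacher."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random some of the time (and on ties); otherwise
            # exploit the action that currently looks best.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Nudge the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best-looking move for each non-terminal cell."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]
```

Calling `greedy_policy(train())` should yield a move of +1 for every cell: the table ends up favoring the rightward path even though no rule said so, which is the sense in which the reward takes the place of a teacher.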
The computer “learned” over the course of several weeks of training, in hundreds of trials, based only on the video pixels of the game -- the equivalent of a human looking at screens and manipulating a cursor without reading any instructions, according to the study.
Over the course of that training, the computer built up progressively more abstract representations of the data in ways similar to human neural networks, according to the study.
There was nothing about the learning algorithms, however, that was specific to Atari, or to video games for that matter, the researchers said.
The computer eventually figured out such insider gaming strategies as carving a tunnel through the bricks in “Breakout” to reach the back of the wall. And it found a few tricks that were unknown to the programmers, such as keeping a submarine hovering just below the surface of the ocean in “Seaquest.”
The computer’s limits, however, became evident in the games at which it failed, sometimes spectacularly. It was miserable at “Montezuma’s Revenge,” and performed nearly as poorly at “Ms. Pac-Man.” That’s because those games require more sophisticated exploration, planning and complex route-finding, said coauthor Volodymyr Mnih.
And though the computer may be able to match the video-gaming proficiency of a 1980s teenager, its overall “intelligence” hardly reaches that of a pre-verbal toddler. It cannot build conceptual or abstract knowledge, doesn’t find novel solutions and can get stuck trying to exploit its accumulated knowledge rather than abandoning it and resorting to random exploration, as humans do.
“It’s mastering and understanding the construction of these games, but we wouldn’t say yet that it’s building conceptual knowledge, or abstract knowledge,” said Hassabis.
The researchers chose the Atari 2600 platform in part because it offered an engineering sweet spot -- not too easy and not too hard. They plan to move into the 1990s, toward 3-D games involving complex environments, such as the “Grand Theft Auto” franchise. That milestone could come within five years, said Hassabis.
“With a few tweaks, it should be able to drive a real car,” Hassabis said.
DeepMind was formed in 2010 by Hassabis, Shane Legg and Mustafa Suleyman, and received funding from Tesla Motors’ Elon Musk and Facebook investor Peter Thiel, among others. It was purchased by Google last year for a reported $650 million. Hassabis, a chess prodigy and game designer, met Legg, an algorithm specialist, while studying at the Gatsby Computational Neuroscience Unit at University College London. Suleyman, an entrepreneur who dropped out of Oxford University, is a partner in Reos, a conflict-resolution consulting group.