Mastering Atari, Go, chess and shogi by planning with a learned model

Schrittwieser J; Antonoglou I; Hubert T; Simonyan K; Sifre L; Schmitt S; Guez A; Lockhart E; Hassabis D; Graepel T; Lillicrap T; Silver D

Nature, Vol.588, No.7839, 604-+, 2020

DOI: 10.1038/s41586-020-03051-4

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess(1) and Go(2), where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games(3), the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled(4), the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi, canonical environments for high-performance planning, the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm(5) that was supplied with the rules of the game.
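As a rough illustration of what an "iterable model" producing a policy, a value and a reward might look like, the following Python sketch unrolls a toy model along a sequence of actions. Everything in it is an assumption made for illustration: the names ToyModel, represent, step, predict and unroll are hypothetical, the model uses fixed arithmetic rules instead of trained networks, and the tree-based search that the abstract combines with the model is omitted.

```python
import math
from typing import List, Sequence, Tuple

State = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert arbitrary scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyModel:
    """Hypothetical stand-in for a learned model (fixed rules, nothing is trained)."""

    def __init__(self, num_actions: int) -> None:
        self.num_actions = num_actions

    def represent(self, observation: Sequence[float]) -> State:
        # Encode the observation into an internal state.
        return [math.tanh(x) for x in observation]

    def step(self, state: State, action: int) -> Tuple[State, float]:
        # Advance the internal state under an action and predict a reward.
        shift = (action + 1) / self.num_actions
        next_state = [math.tanh(s + shift) for s in state]
        reward = sum(next_state) / len(next_state)
        return next_state, reward

    def predict(self, state: State) -> Tuple[List[float], float]:
        # Predict an action-selection policy and a value for the state.
        total = sum(state)
        logits = [total * (a + 1) / self.num_actions for a in range(self.num_actions)]
        return softmax(logits), math.tanh(total)


def unroll(
    model: ToyModel, observation: Sequence[float], actions: Sequence[int]
) -> List[Tuple[List[float], float, float]]:
    """Apply the model repeatedly, collecting (policy, value, reward) per step."""
    state = model.represent(observation)
    outputs = []
    for action in actions:
        policy, value = model.predict(state)
        state, reward = model.step(state, action)  # no environment simulator is called
        outputs.append((policy, value, reward))
    return outputs


if __name__ == "__main__":
    for policy, value, reward in unroll(ToyModel(num_actions=3), [0.1, -0.2, 0.3], [0, 2, 1]):
        print([round(p, 3) for p in policy], round(value, 3), round(reward, 3))
```

The point of the sketch is only that the loop never queries the real environment after the initial observation: each step feeds the model's own internal state back into itself, which is one way to read "iterable" here.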