What has AlphaZero taught us about chess?

AlphaZero taught itself chess (as well as go and shogi) starting with no knowledge about the game beyond the basic rules. AlphaZero’s reinforcement learning has given it a distinctive and instantly recognisable style, and it implements its ideas in a direct, efficient way, without undue regard for the material balance.

How is AlphaZero trained?

AlphaZero was trained solely via “self-play” using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. The trained algorithm played on a single machine with four TPUs.

What are the parameters of the AlphaZero game?

AlphaZero was given nine parameters that altered key actions of the game, and then was left on its own to learn the game and to devise new strategies to win.

What did AlphaZero learn from the chess game?

At the end of its first game, AlphaZero had learned that the losing side had done stuff that wasn’t all that smart, and that the winning side had played better. AlphaZero had taught itself its first chess lesson. The quality of chess in game two was just a tiny bit better than in the first.
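As a rough illustration of how a finished game can become a lesson, the sketch below labels every position with the final result, seen from the side that was to move. The function name `label_positions` and the data layout are hypothetical and chosen only for this example; the +1 / 0 / -1 outcome convention is the one commonly described for AlphaZero's value target.

```python
def label_positions(positions, winner):
    """Attach the final game result to every position of one self-play game.

    positions: list of (state, player_to_move) pairs (hypothetical layout).
    winner: "white", "black", or None for a draw.
    Returns a list of (state, z) pairs, where z is +1.0 if the player to move
    went on to win, -1.0 if that player lost, and 0.0 for a draw.
    """
    examples = []
    for state, player in positions:
        if winner is None:
            z = 0.0          # drawn game: neither side is rewarded
        elif player == winner:
            z = 1.0          # this side's play led to a win
        else:
            z = -1.0         # this side's play led to a loss
        examples.append((state, z))
    return examples


# Usage: a three-move toy game won by white.
game = [("s0", "white"), ("s1", "black"), ("s2", "white")]
print(label_positions(game, "white"))
```

Positions played by the eventual winner get a positive target and positions played by the loser a negative one, which is one simple way to express "the winning side had played better" as training data.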

Why do you need to read the AlphaZero book?

The book focuses on how AlphaZero’s drive for piece mobility, open lines, and disruption of the opponent’s castled position, often while down in material, departs from the way players currently evaluate positions and build their opening repertoires. Others are better qualified to discuss the accuracy of the analysis.

How does AlphaZero augment a pure MCTS strategy?

AlphaZero runs a number of playouts on each move (800 during its training). It also augments pure Monte Carlo tree search (MCTS) by preferring moves that it has not tried (much) already, that seem probable, and that seem to lead to “good” positions, where “good” means that the evaluation function (more on this in the next article) gives them a high value.
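The three preferences above (rarely tried, probable, leading to good positions) are usually combined into a single score per move, often called PUCT. The following is a minimal sketch, not AlphaZero's actual code: the `Node` class, the function names, and the constant `c_puct = 1.5` are assumptions made for this example, and values are assumed to be stored from the point of view of the player choosing the move.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                  # how probable the move looks before any search
    visit_count: int = 0          # how often the search has tried this move
    value_sum: float = 0.0        # sum of evaluations backed up through this move
    children: dict = field(default_factory=dict)  # move -> Node

    def mean_value(self) -> float:
        # Average evaluation so far; 0.0 for a move that was never tried.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Exploration bonus: large for probable moves with few visits,
    # shrinking as the move is tried more often.
    bonus = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    # Exploitation term: moves that led to "good" positions score higher.
    return child.mean_value() + bonus


def select_move(parent: Node, c_puct: float = 1.5):
    # Pick the move with the highest combined score.
    return max(parent.children, key=lambda m: puct_score(parent, parent.children[m], c_puct))


# Usage: move "b" is untried but fairly probable, so it is selected next.
root = Node(prior=1.0, visit_count=10)
root.children = {
    "a": Node(prior=0.6, visit_count=9, value_sum=4.5),
    "b": Node(prior=0.3),
    "c": Node(prior=0.1, visit_count=1, value_sum=0.9),
}
print(select_move(root))
```

Setting `c_puct` to 0 in this sketch removes the exploration bonus, so selection falls back to the move with the best average evaluation so far.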