Knowledge Mapping and Management

The very slow learning rate of AlphaZero

We have all recently been puzzled by the AlphaZero algorithm, which learns to master strategic board games in a matter of days and reaches a level far above the world's top players and programs. How much of this is due to computing power, and how much to the efficiency of learning?

AlphaZero (the latest version of AlphaGo) learned the game of go by itself by simulating 21 million games, with 700,000 learning batches. The progression is depicted in the following graphs:

Using Google’s TPUs, AlphaZero could reach human world champion level in chess in 4 hours.

A human player who played 2 go games a day (that’s not bad) would play ~700 games a year.

So in this experiment, AlphaZero sums up 21,000,000 / 700 = 30,000 years of human experience in go gameplay, which is an acceleration factor of about 1 million due to the technology (but that does not come as a surprise, since we know the usual comparison between our ability to calculate with numbers and that of the computer).

We can consider these to be very useful years: if we look carefully at the learning curve, we can read that AlphaZero reaches the level of the human world champion in 150K training steps. So AlphaZero spends most of its time (550,000/700,000 ~ 80%) playing at the best level to improve its knowledge of the game.

If we switch back to human-years for go, the world champion level (red line) is reached in about 20% * 30,000 = 6,000 human-years, while Lee Sedol mastered the game within approximately 15 years.

So the machine appears to learn very slowly compared to the best humans, who are about 400 times faster to reach the same level (6,000 / 15), while also playing far fewer games.
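The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the back-of-the-envelope estimate: the constants are the figures quoted in the text, the variable and function names are ours, and the script assumes (as the text implicitly does) that self-play games are spread evenly over the training steps.

```python
# Figures quoted in the text above; none are independently verified here.
SELF_PLAY_GAMES = 21_000_000     # go games simulated by AlphaZero
TRAINING_STEPS = 700_000         # learning batches over the whole run
STEPS_TO_CHAMPION = 150_000      # step where the curve crosses champion level
HUMAN_GAMES_PER_YEAR = 700       # roughly 2 games a day
HUMAN_YEARS_TO_CHAMPION = 15     # Lee Sedol, approximate


def human_years(games: float, games_per_year: float = HUMAN_GAMES_PER_YEAR) -> float:
    """Convert a number of games into years of human play."""
    return games / games_per_year


# Assumption: self-play games are spread evenly over the training steps.
total_years = human_years(SELF_PLAY_GAMES)
fraction_to_champion = STEPS_TO_CHAMPION / TRAINING_STEPS
years_to_champion = fraction_to_champion * total_years
human_speedup = years_to_champion / HUMAN_YEARS_TO_CHAMPION

print(f"total experience:         {total_years:,.0f} human-years")
print(f"share of run to champion: {fraction_to_champion:.0%}")
print(f"experience to champion:   {years_to_champion:,.0f} human-years")
print(f"human learns faster by:   x{human_speedup:.0f}")
```

Without rounding 150K/700K down to 20%, the figures come out slightly higher (about 6,400 human-years and a factor of about 430), which does not change the conclusion.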

If we look at chess, learning appears faster than for go, with world-class level reached at about 120K training steps (Elo ~2800). 120K steps of self-play also amount to thousands of years of human play, and the conclusion is similar. If AlphaZero learned chess as fast as humans do, it would reach world-class level in about 35 seconds!
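The text does not spell out how the 35-second figure is obtained. One plausible reading, which is our assumption, is to take the 4 hours AlphaZero needed to reach world champion level in chess and divide by the factor of about 400 found for go:

```python
# Assumption: the go speed-up factor (~400) is reused as-is for chess.
CHESS_HOURS_TO_CHAMPION = 4      # wall-clock time quoted in the text
HUMAN_SPEEDUP = 400              # humans vs. AlphaZero, from the go estimate


def seconds_at_human_rate(hours: float, speedup: float) -> float:
    """Wall-clock seconds needed if learning were `speedup` times faster."""
    return hours * 3600 / speedup


chess_seconds = seconds_at_human_rate(CHESS_HOURS_TO_CHAMPION, HUMAN_SPEEDUP)
print(f"{chess_seconds:.0f} seconds")
```

Under these assumptions the result is 36 seconds, in line with the rough figure quoted above.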

Discussion

The final level reached by AlphaZero, far above the human level, shows that these games can be played at a much higher level than the best humans achieve. But this raises new questions:

  • Why can’t humans play beyond their current level, given their high learning rate?
  • Have we reached a biological limit, or is our brain incapable of generating higher-level concepts?
  • Are all players stuck in a line of play that is a ‘culture’ and that has not visited all the nodes of the game (even though millions of humans play go, chess, …)?

Note

The Elo rating is a number quantifying a player's level of play in some games.

chess: grandmasters are above 2500, the world champion is ~2800, and computer programs are ~3400

go: the best players rate ~3600, while AlphaZero is ~4600
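To give a feel for what these rating gaps mean, the standard Elo model predicts the expected score of player A against player B as 1 / (1 + 10^((R_B - R_A) / 400)). The sketch below applies that formula to the approximate ratings listed above. The function name is ours, and the comparison only makes sense within one game, since chess and go ratings are on separate scales.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Approximate ratings taken from the note above.
chess = elo_expected_score(3400, 2800)   # computer program vs. world champion
go = elo_expected_score(4600, 3600)      # AlphaZero vs. best human players

print(f"chess program vs. champion:  {chess:.1%}")
print(f"AlphaZero vs. best go human: {go:.1%}")
```

With gaps of 600 and 1000 points, the model gives the stronger side an expected score of roughly 97% and above 99% respectively.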