
Limits to AlphaZero: the loop-game

There is no doubt that the AlphaZero algorithm [1] is a huge step forward for solving board games (like chess and Go), and that numerous non-game optimization applications can be derived from this idea and its implementation. As with any algorithm, it is interesting to find its limits and discuss performance issues. In an earlier post [3], we discussed the very slow learning rate of AlphaZero compared to humans.

We discuss here the loop-game, a board game which is both simple and intractable for AlphaZero, while a simple mathematical analysis provides the winning strategy.

Introduction

It is complicated to summarize all the features of the AlphaZero algorithm [1] in a few sentences, but in a simplified manner:

  • AlphaZero is dedicated to two-player board games, where each player can place or move pieces on a board, and where the final situation can be clearly measured (win/loss).
  • AlphaZero assigns approximate win/loss values to intermediate positions by propagating feedback through a state space implicitly approximated by a deep neural network.
  • Using these winning probabilities, AlphaZero climbs monotonically in the state space toward a winning position (a schematic sketch of this loop follows).
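
To make the last point concrete, here is a schematic sketch in Python (ours; none of these names come from DeepMind's code) of play guided by a learned value estimate. A plain greedy choice stands in for the full MCTS look-around, and `value` stands in for the deep network:

```python
def play_episode(state, legal_moves, apply_move, value):
    """Play one game by always moving to the successor position with
    the highest estimated winning probability.

    legal_moves(state) -> iterable of moves,
    apply_move(state, move) -> new state,
    value(state) -> estimated winning probability in [0, 1].
    A greedy choice replaces the MCTS look-around of the real
    algorithm, and the alternation of perspectives between the two
    players is deliberately omitted.
    """
    trajectory = [state]
    while True:
        moves = list(legal_moves(state))
        if not moves:
            break  # terminal position: the win/loss can be measured
        state = max((apply_move(state, m) for m in moves), key=value)
        trajectory.append(state)
    return trajectory
```

In the real algorithm, trajectories produced by such self-play games are fed back to train the network, which is what the second bullet calls feedback propagation.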

Board games like chess and Go share a common feature: a winning game can be patiently built, but any wrong move can change the final result of the game.

A physical image clarifies the concept: AlphaZero identifies crests of mountains and walks along them toward the summit. While on a crest, going up is easy, but it is also easy to step aside and lose the game. If not on a crest, a look around (via Monte Carlo Tree Search, MCTS) can give enough information to find the nearest crest:

In doing so, we can easily understand that AlphaZero does not need to explore and map the whole territory: an efficient encoding of the crests is sufficient to reach winning positions.

So the neural network built by AlphaZero identifies the crests from winning and losing positions. We should not be surprised by this, since recognizing patterns is what neural networks do best: to a first approximation, recognizing a win/loss position is computationally the same as reading a pixel matrix to recognize letters.

Here the ‘magic’ of AlphaZero is its ability to identify those crests efficiently. We should not, however, overestimate the learning speed of AlphaZero, which appears to be about 400 times slower than that of humans [3].

The Loop-game

The loop-game is a two-player game played in sequence (player 1 plays first, player 2 follows). The game starts with a simple, empty board of size 2N×2N:

On each turn, a player may place, anywhere on the board, one of the 3 types of tiles a, b, and c, each carrying two black segments:

Once placed, tiles are never moved or displaced, and tiles cannot be stacked.
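
These placement rules are easy to state in code; the encoding below (None for an empty cell, letters for tiles) is ours, for illustration only:

```python
def place_tile(board, tile, i, j):
    """Apply a move on a 2N x 2N board given as a list of lists,
    with None marking empty cells."""
    assert tile in ('a', 'b', 'c'), "only the three tile types exist"
    assert board[i][j] is None, "tiles cannot be stacked or replaced"
    board[i][j] = tile

# Example: an empty board for N = 2, then one move by player 1.
N = 2
board = [[None] * (2 * N) for _ in range(2 * N)]
place_tile(board, 'b', 0, 0)
```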

For example, an intermediate board position could be:

At the end of the game, all tiles are placed, and the black segments connect to form loops; each loop has been closed either by player 1 or by player 2 when setting a tile on the board:

We consider two variants of the game, defined by the following winning goals (a direct transcription in code follows the list):

  1. The winner is the player who closes more loops than the other player.
  2. If the total number of loops is even, player 1 wins; if it is odd, player 2 wins.
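
The two winning conditions translate directly into code (function names are ours):

```python
def winner_variant_1(loops_p1, loops_p2):
    """Variant 1: whoever closed more loops wins; a tie is a draw."""
    if loops_p1 == loops_p2:
        return None  # draw
    return 1 if loops_p1 > loops_p2 else 2

def winner_variant_2(total_loops):
    """Variant 2: even total -> player 1 wins, odd -> player 2."""
    return 1 if total_loops % 2 == 0 else 2
```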

Analysis and winning strategy for the loop-game

The analysis is very simple: player 2 always wins or draws:

  1. In variant 1, player 2 always draws or wins by playing symmetrically (with a left/right symmetry) to player 1: if player 1 plays tile x at (i, j), player 2 plays tile y at (2N+1−i, j), with (x, y) in {(a, a), (b, c), (c, b)}. Imitating player 1 in this way keeps the game even at all times. Player 2 closes at least the same number of loops as player 1, and possibly closes one more at the end, if the path of this loop crosses the vertical axis of symmetry (a sketch of this mirror reply follows the list).
  2. In variant 2, player 2 always wins: the last tile can be chosen so that it closes the unfinished loops into either 1 or 2 loops, as shown in the diagram below:

    So the last player chooses the parity, and wins.
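
As an illustration, the mirror strategy of variant 1 fits in a few lines, using the post's 1-based coordinates (the tile and move encodings are ours):

```python
# Left/right mirror: tile a is its own mirror image, b and c swap,
# exactly as in the pairs {(a, a), (b, c), (c, b)} above.
MIRROR_TILE = {'a': 'a', 'b': 'c', 'c': 'b'}

def mirror_reply(move, n):
    """Player 2's reply to player 1's move (tile, i, j) on a
    2N x 2N board, reflected across the vertical axis."""
    tile, i, j = move
    return (MIRROR_TILE[tile], 2 * n + 1 - i, j)

# Example on a 4 x 4 board (N = 2): player 1 plays ('b', 1, 3),
# player 2 answers ('c', 4, 3).
assert mirror_reply(('b', 1, 3), 2) == ('c', 4, 3)
```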

As a consequence, in the loop-game no intermediate position gives a hint about the future winner: only the last player wins or draws. In our earlier image of mountains, the loop-game is a totally flat world, with singular peaks and abysses at the end of the game and no crests.

This means that AlphaZero will not find its way toward a winning position.

No oracle either…

Furthermore, if we look at final positions, we conclude that neural networks will have a very hard time identifying whether a final position is a win or a loss. Counting the number of loops is very easy for a program (an oracle), which can even take N as a parameter (such an oracle is sketched after the list below). However, if we want to build an oracle for the loop-game with a learning method, just by showing final positions together with their win/loss classification to neural methods, this is going to be very difficult for several reasons [4]:

  1. AlphaZero can learn only with a fixed N;
  2. the longest possible path has length in O(N²), which means that a feedforward network would require at least this depth to trace it;
  3. since a win configuration can be turned into a loss by changing a single tile, there is no clear separation between the win and loss subspaces.
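
For reference, the oracle itself fits in a few lines. The tile geometry below is our assumption, since the original tile figure is not reproduced here: tile a joins North–South and West–East, b joins North–East and South–West, and c joins North–West and South–East, a pairing consistent with the mirror mapping above; segments reaching the border count as open paths, not loops.

```python
def count_loops(board):
    """Count the closed loops on a completed 2N x 2N loop-game board
    given as a list of lists over {'a', 'b', 'c'}."""
    n = len(board)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    # Assumed segment pairings for each tile type.
    pairs = {'a': (('N', 'S'), ('W', 'E')),
             'b': (('N', 'E'), ('S', 'W')),
             'c': (('N', 'W'), ('S', 'E'))}

    for r in range(n):
        for c in range(n):
            # Edge midpoints of cell (r, c); interior midpoints are
            # shared with the neighbouring cell, which joins them up.
            mid = {'N': ('h', r, c), 'S': ('h', r + 1, c),
                   'W': ('v', r, c), 'E': ('v', r, c + 1)}
            for s1, s2 in pairs[board[r][c]]:
                union(mid[s1], mid[s2])

    # Components touching the border are open paths, not loops.
    open_roots = {find(node) for node in list(parent)
                  if (node[0] == 'h' and node[1] in (0, n))
                  or (node[0] == 'v' and node[2] in (0, n))}
    all_roots = {find(node) for node in list(parent)}
    return len(all_roots - open_roots)

# Example: four arc tiles closing one small loop in the centre.
assert count_loops([['c', 'b'],
                    ['b', 'c']]) == 1
```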

Conclusion

The AlphaZero algorithm is very interesting, and understanding its possibilities and limits may lead to better future algorithms. With this objective, we have found that AlphaZero does not solve the elementary board game presented here, and that it neither generalizes its learning nor works for large sizes of the game. The existence of a valuation function growing monotonically with the advance of the game seems to be a prerequisite. In that direction, we must continue to investigate the properties of search algorithms. The loop-game, being very simple, allows quick checks of strategies, while its rules can be slightly complicated (for example: no immediate symmetric play allowed) to investigate less obvious winning strategies and compare them with the results of algorithms.

References:

[1] Silver et al., “Mastering the game of Go without human knowledge,” Nature 550.7676 (2017): 354–359. https://www.gwern.net/docs/rl/2017-silver.pdf

[2] Belkhir, Dreo, Savéant, Schoenauer (Thales/INRIA), “Per Instance Algorithm Configuration of CMA-ES with Limited Budget,” GECCO 2017, Berlin, July 2017.

[3] Gouzènes, L., “The very slow learning rate of AlphaZero,” Feb 28, 2018. https://km2conseil.fr/en/2018/02/28/the-very-slow-learning-rate-of-alphazero/

[4] Gouzènes, L., “Conceptual limits of neural networks: Reasoning or advanced reflexes?” LinkedIn, January 12, 2017. https://www.linkedin.com/pulse/conceptual-limits-neural-networks-reasoning-advanced-laurent-gouz%25C3%25A8nes/