Using Neural Networks to Play Board Games


By Max Candocia | July 22, 2017


Last Updated 07/30/2017

Update: an in-depth follow-up article for Machi Koro is here!

The other night I was playing a tabletop game called "Machi Koro". The concept of the game is similar to Settlers of Catan™, where your dice roll determines whether or not you get resources. In Machi Koro, your only resource is your money, but you can buy properties that generate money on certain roll values, as well as properties that can take money away from others when they make certain rolls.

Initially, the only decision players make in the game is what property to buy, if any. Later in the game, though, players can choose to roll one or two dice, reroll their dice, or, in rare cases, steal coins directly from another player or swap one of their buildings with an opponent's.

In the end, the only thing that matters for victory is that you build four special buildings, three of which are very expensive.

After I played the game, I wondered if I could make an AI that could play the game effectively. The board state is known to all players all the time, and there aren't too many variables to keep track of. The following day, I built a simulator integrated with Keras, a neural network library for Python, in order to see how effectively I could train the AI to play the game. (The source code can be found here).

The training process is as follows:

  1. A game is initially created, along with four players.
  2. Five initially randomized neural networks are shared among all four players, with each of the five networks representing a type of decision that can be made.
  3. 50 games are played, with the game state being recorded for each player at each decision they make.
  4. Each of the neural networks is trained over 10 epochs (run-throughs of the data), and then step 3 repeats 9 more times, appending data to the existing data each time.
  5. The game history is erased to allow for higher-quality data (more realistic, intelligent gameplay), and step 4 is repeated 24 more times.
  6. I repeated all of the above on the same networks. During the first run-through, I let the AI make more random decisions, while the second time it almost exclusively made the "best" decisions. The best decision is the one the network determined to give the highest probability of winning the game. A simplified sketch of this training loop appears after this list.
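For concreteness, here is a minimal sketch of that outer loop in Python. The `simulate_game` callable, the structure of `models`, and the exact loop sizes are illustrative assumptions, not the actual implementation from the linked source.

```python
import numpy as np

def train_networks(models, simulate_game, n_rounds=25, n_batches=10,
                   n_games=50, n_epochs=10):
    """Alternate between simulating games and refitting the shared decision networks.

    `models` maps a decision type (e.g. 'purchase', 'reroll') to a Keras model that
    predicts win/loss from an encoded game state. `simulate_game(models)` is assumed
    to play one 4-player game and return, per decision type, the state vectors seen
    and one-hot win/loss outcomes for the player who faced each decision.
    """
    for _ in range(n_rounds):
        # erase the history so later rounds train on higher-quality (smarter) play
        history = {name: {'X': [], 'y': []} for name in models}
        for _ in range(n_batches):
            for _ in range(n_games):
                records = simulate_game(models)
                for name, (states, outcomes) in records.items():
                    history[name]['X'].extend(states)
                    history[name]['y'].extend(outcomes)
            # refit each decision network on all data gathered so far in this round
            for name, model in models.items():
                X = np.array(history[name]['X'])
                y = np.array(history[name]['y'])  # one-hot win/loss labels
                model.fit(X, y, epochs=n_epochs, verbose=0)
```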

Results

Number of Turns to Complete Game

One of the major results I noticed is that the average number of turns went from 110 (26.5 per player) when the AI made random decisions, to an average of 61 turns (15.25 per player) when I more or less fully trained the network. That is reasonably fast for a game. Below is a histogram of the number of turns over 1,000 simulated games.

[Figure: histogram of the number of turns taken in a Machi Koro game with the trained AI; peaked in the 50s, with a smaller peak in the 70s]

Compare this to an untrained network's number of turns:

[Figure: histogram of the number of turns taken when an untrained AI is used; wide peak in the 100s-110s]

Best First Move

Additionally, the best first move is, apparently, to purchase a ranch, which produces one coin when anyone rolls a 2. This isn't too surprising, as it is one of the cheapest buildings, and later in the game each ranch increases the number of coins that another property, the cheese factory, produces.

Turn Order Winning Bias

There is a starting bias in terms of win probability, at least for the strategies adopted by the neural network. In particular, the first two players to take their turn have a significant advantage over the other two players, as can be seen in the figure below:

Buildings Built by Winning Player

It is also interesting to see which buildings the winning AI built, on average, as well as the average differences between the winner and the losers:


Trained AI vs. Untrained AI

Out of curiosity, I also pitted the trained AI against three untrained AIs. Over 1,000 games, the trained AI won 985 of them, or 98.5%. The trained AI focused more heavily on cafes, money-stealing establishments, when facing untrained AIs than when it faced its well-trained equals.

For reference: a log of a few dozen AI-played games to see what decisions they make

Additional Exploration

I have a Jupyter notebook with some examples of the code being run here: https://github.com/mcandocia/machi_ai/blob/master/exploration_post_modulation.ipynb

Model Info

Input Variables

Each of the five networks has between 400 and 900 inputs. Most of these record the game state, including an indicator variable for each possible count an opponent could have of a particular building, as well as a discrete-valued variable for each opponent's number of coins. For each player, the state of the game was recorded as their own board state, followed by the board state of the next player, and so on.
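As a rough illustration of that encoding (the building names and count cap below are placeholders, not the exact ones used), each player's board expands into one indicator per possible building count plus a coin variable, and the boards are concatenated starting with the player making the decision:

```python
import numpy as np

# Illustrative only: the real game has more buildings and different count limits.
BUILDINGS = ['wheat_field', 'ranch', 'cafe', 'cheese_factory']
MAX_COUNT = 5  # one indicator variable per possible count 0..MAX_COUNT

def encode_player(building_counts, coins):
    """One-hot encode a single player's building counts, then append their coin total."""
    parts = []
    for building in BUILDINGS:
        onehot = np.zeros(MAX_COUNT + 1)
        onehot[min(building_counts.get(building, 0), MAX_COUNT)] = 1.0
        parts.append(onehot)
    parts.append(np.array([coins], dtype=float))
    return np.concatenate(parts)

def encode_state(players, current_index):
    """Concatenate boards starting with the deciding player, then each opponent in turn order."""
    n = len(players)
    ordered = [players[(current_index + i) % n] for i in range(n)]
    return np.concatenate([encode_player(p['buildings'], p['coins']) for p in ordered])
```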

Network Architecture

The architecture that currently seems successful is a stack of 512-256-128 ReLU-activated layers, with a small amount of dropout between them. The output is a simple 2-class softmax representing winning or losing the game.
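A Keras definition matching that description might look like the following sketch; the dropout rate, input size, and optimizer here are illustrative guesses rather than the exact settings used.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_decision_network(n_inputs=600):
    """512-256-128 ReLU stack with light dropout and a two-class softmax (win/lose) output."""
    model = Sequential([
        Dense(512, activation='relu', input_shape=(n_inputs,)),
        Dropout(0.05),
        Dense(256, activation='relu'),
        Dropout(0.05),
        Dense(128, activation='relu'),
        Dropout(0.05),
        Dense(2, activation='softmax'),  # [P(win), P(lose)]
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model
```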


