July 22, 2017
Last Updated 07/30/2017
The other night I was playing a tabletop game called "Machi Koro". The concept of the game is similar to Settlers of Catan™, where your dice roll determines whether or not you get resources. In Machi Koro, your only resource is your money, but you can buy properties that generate money on certain roll values, as well as properties that can take money away from others when they make certain rolls.
Initially, the only decision players make in the game is what property to buy, if any. Later in the game, though, players can choose to roll one or two dice, reroll their dice, or, in rare cases, steal coins directly from another player or swap one of their buildings with an opponent's.
In the end, the only thing that matters for victory is that you build four special buildings, three of which being very expensive.
After I played the game, I wondered if I could make an AI that could play the game effectively. The board state is known to all players all the time, and there aren't too many variables to keep track of. The following day, I built a simulator integrated with Keras, a neural network library for Python, in order to see how effectively I could train the AI to play the game. (The source code can be found here).
The training process is as follows:
One of the major results I noticed is that the average number of turns went from 110 (26.5 per player) when the AI made random decisions, to an average of 61 turns (15.25 per player) when I more or less fully trained the network. That is reasonably fast for a game. Below is a histogram of the number of turns over 1,000 simulated games.
Compare this to an untrained network's number of turns:
Additionally, the best first move is, apparently, to purchase a ranch, which produces one coin when anyone rolls a 2. This isn't too surprising, as it is one of the cheapest buildings, and later in the game it increases the amount of coins another property, cheese factories, produces.
There is a starting bias in terms of win probability, at least for the strategies adopted by the neural network. In particular, the first two players to take their turn have a significant advantage over the other two players, as can be seen in the figure below:
It is also interesting to look to see what buildings the winning AI built, on average, as well as what the differences between the winner and losers (on average) were:
Out of curiosity, I also ran the AI versus three untrained AI. Over 1,000 games, the trained AI won 985 of them, or 98.5%. The trained AI decided to focus more heavily on cafes, money-stealing establishments, when facing untrained AI than when it faced its well-trained equals.
I have a Jupyter notebook with some example of the code being run here: https://github.com/mcandocia/machi_ai/blob/master/exploration_post_modulation.ipynb
The numbers for each of theinputs to the five networks are between 400 and 900. Most of those were recording the game state, which included a unique variable for each possible count an opponent could have of a particular building, as well as a discrete-valued variable for each opponent's number of coins. For each player, the state of the game was recorded as their current board state, followed by the board state of the next player and so on.
The current architecture that seems to be successful is a set of 512-256-128 ReLU-activated layers, with a tiny amount of dropout between them. The output is a simple 2-class softmax, representing winning or losing the game.