Reinforcement Learning approach to Crosses-and-Noughts (TicTacToe) and it's various generalizations.
Tic-Tac-Toe[1] is a well-know pencil-and-paper game, played by two players: X-player, or Crosses and O-player, or Noughts. The goal is to make 3 figures, X or O, in a horizontal line, vertical line, or any of two diagonals, on a boards with size 3x3. The state space consists of 5478[2] different positions, which can be further reduces to 765[1] due to symmetries. The limited state space size makes Tic-Tac-Toe a perfect playground for Reinforcement Learning tabular methods.
m,n,k-game[3] is a generalization of Tic-Tac-Toe to wider boards and longer lines.
The board is of size m x n
, and the goal is to place k
crosses or nought in a row.
Some examples of such game are Connect Four[4] which is (7,6,4)
, Gomoku[5] which is
(15,15,5)
or (19,19,5)
(the Go board), Connect Six[6] which is (19,19,6)
or even (59,59,6)
.
Crosses player makes first move, so it always has advantage. Therefore, for perfect play
two outcomes are possible: crosses player always wins, and there is always a draw.
This outcome depends on game parameters, m
, n
, and k
.
m,n,k-game is widely studied, and following results are known:
- if
(M,N,K)
is a draw,(m,n,k)
is a draw as well if eitherK
≤k
orM x N
board is bigger thanm x n
board, since if it is not possible to make shorter line on a bigger board, it is also impossible to make longer line on a smaller board - conversely, if
(M,N,K)
is a win,(m,n,k)
is a win as well if eitherK
≥k
orM x N
board is smaller thanm x n
board, since if it is possible to make longer line on a smaller board, it is also possible to make shorter line on a bigger board (3,3,3)
, or classical Tic-Tac-Toe is a draw(3,4,3)
and(4,3,3)
are wins, it implies that(m,n,3)
is also a win ifm+n
≥7
(5,5,4)
is a draw, it implies(4,4,4)
and(5,4,4)
are draws as well(6,5,4)
is a win, it implies that, in particular, Connect Four[4] is a win as well(8,8,5)
is a draw, it implies(m,n,5)
is a draw for anym
≤8
andn
≤8
(15,15,5)
, or Gomoku[5] is a win
Another generalization of classical Tic-Tac-Toe is make the board multidimensional N^d
hypercube[7]. Few examples are 3^3
or 3D Tic-Tac-Toe[8],
and 4^3
known as Qubic[9], which is a Tic-Tac-Toe on a 4 x 4 x 4
board and a line is made of 4 symbols.
The number of winning combinations (lines) for N^d
game is[10]
(N+2)^d - (N^d)
---------------
2
for d=3
it is reduced to the polynomial 3*N^2 + 6*N + 4
.
Conclusions similar to existing to m,n,k-game can be made in the case of multidimensional Tic Tac Toe, and following results are known:
- If
d
- dimensional game is a win, anyD
- dimensional game is a win as well forD
≥d
, since there is always possible to reduce the game ond
- dimensional hyperplane ofD
- dimensional board - If
D
- dimensional game is a draw, anyd
- dimensional game is a draw as well ford
≤D
, since if it is not possible to make a line in an outer space, it is also impossible to make it in any hyperplane 3 x 3 x 3
3D Tic-Tac-Toe[8] is a win in a simple way, it has49
winning combinations:8
for each of 3 hyperplanes9
verticals6
xz and6
yz diagonals4
xyz diagonals
4 x 4 x 4
, or Qubic[9] is also a win, however explicit win strategy is more difficult, it has76
winning combinations:10
for each of 4 hyperplanes16
verticals8
xz and8
yz diagonals4
xyz diagonals
N^d
game is a draw[1] ifN
is odd andN
≥3^d-1
, for 3d it meansN=27,29,31,...
N
is even andN
≥2^d-2
, for 3d it meansN=6,8,10,...
Ultimate Tic-Tac-Toe or Super Tic-Tac-Toe[11] is a fractal generalization of classical Tic-Tac-Toe.
The board size is 9 x 9
, and this board is viewed as a 3 x 3
super-board of 3 x 3
sub-boards.
Move on a sub-board determines a super-board, where the opponent should make the next move.
For example, if the last move is to the upper-right corner of any sub-board,
then the opponent can only play on upper-right corner super-board.
Each sub board can be a draw, or can be won by any of player. Winner is the player who can build it's won sub-board in a line. If no more moves can be done and none of two players has winning line, it's a draw.
The following variations are known:
- Standard Ultimate Tic-Tac-Toe, where sub-board where any of two players has already won is sealed, and if a player is directed to this sub-board, it can make next move on any board which is not yet sealed
- Alternative Ultimate Tic-Tac-Toe where already won sub-boards are not sealed, and can be used for further play. It was shown that this version is a won[12]
- Tic-Tac-Ku, where winner is a player winning at least 5 of 9 sub-boards
Ultimate Tic-Tac-Toe can also be generalized in an m,n,k-game similar way. For example,
one can consider Ultimate 4,4,4-game which is played on a 4 x 4
super-board made
of 4 x 4
sub-boards and the goal is to win on the super-board 4 sub-boards in a line.
The sub-boards and super-board do not necessary have square form, one can play
Ultimate 4,3,3-game which is played on 4 x 3
super board made of 4 x 3
sub-boards
and the goal is to win on the super-board 3 sub-boards in a line.
[1] Tic-Tac-Toe
[2] TicTacToe State Space Choose Calculation
[3] m,n,k-game
[4] Connect Four
[5] Gomoku
[6] Connect 6
[7] Nd-game
[8] 3D Tic-Tac-Toe
[9] Qubic
[10] Hypercube Tic-Tac-Toe, Golomn, Hales, 2002
[11] Ultimate Tic-Tac-Toe
[12] At Most 43 Moves, At Least 29: Optimal Strategies and Bounds for Ultimate Tic-Tac-Toe, Bertholon et al, 2020