This project is inspired by a project by harpribot (https://github.com/harpribot/Rl-TicTacToe).
In this project I have used Q-learning to train an agent so that it achieves either a win or a draw. Q tables for 100,000 and 1,000,000 training runs are already stored; to use them, load the corresponding .mat files and then run TrainingChecker.m. Experiment with different epsilon-greedy values and iteration counts to get different outcomes.
To summarize:
- run Q_learning.m
- run TrainingChecker.m
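The core of the training step above is a tabular Q-learning update with epsilon-greedy action selection. The repository implements this in MATLAB; the following is a minimal illustrative sketch in Python, with assumed hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`) and an assumed board encoding (a 9-character string with `-` for empty cells), not the repository's actual code.

```python
import random
from collections import defaultdict

# Assumed hyperparameters, not taken from the repository.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

# Q table: Q[(state, action)] defaults to 0.0 for unseen pairs.
Q = defaultdict(float)

def legal_actions(state):
    """Indices of empty cells on a 9-character board string."""
    return [i for i, c in enumerate(state) if c == '-']

def choose_action(state, rng=random):
    """Epsilon-greedy: explore with probability EPSILON, else pick
    the action with the highest current Q value."""
    actions = legal_actions(state)
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, terminal):
    """Standard Q-learning backup:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = 0.0 if terminal else max(
        Q[(next_state, a)] for a in legal_actions(next_state))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])
```

For example, rewarding a winning terminal move with `q_update(state, action, 1.0, next_state, terminal=True)` nudges that state-action value toward the reward by a factor of `ALPHA`, which is how wins and draws propagate back through the table over many self-play games.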
To understand this project in detail, you can also read the following project report: https://drive.google.com/open?id=1uW471k9STu7ZrxOD6jTdhmaS4Ac09Uke