Teach a computer to play Tic Tac Toe using reinforcement learning.
This is an exercise problem (1.4) in this Book. Here the computer plays against itself. It is trained for 10,000 episodes; the training data is stored in data.csv and the graph is in this image.
After 10,000 episodes it shows good results against a human opponent :) .
The reward for a winning state is 1.0, for a losing or drawn state 0.0, and for any other state 0.5. In every iteration it updates its estimate of the possible reward:
V(s) = V(s) + alpha * [V(s')-V(s)]
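The update rule above can be sketched in a few lines of Python. This is only an illustration, not the code from this repository: the function name `td_update`, the dictionary-based value table, and the step size `alpha=0.1` are assumptions. The default of 0.5 for unseen states follows the rewards described above.

```python
def td_update(values, state, next_state, alpha=0.1, default=0.5):
    """Move V(state) a fraction alpha toward V(next_state).

    values: dict mapping a state to its estimated value (assumed layout).
    alpha:  step size; 0.1 is an assumed value, not taken from the project.
    """
    v_s = values.get(state, default)          # unseen states start at 0.5
    v_next = values.get(next_state, default)
    values[state] = v_s + alpha * (v_next - v_s)
    return values[state]


# Example: a state that leads directly to a win (value 1.0)
values = {"win": 1.0}
new_value = td_update(values, "before_win", "win", alpha=0.1)
# 0.5 + 0.1 * (1.0 - 0.5) = 0.55
```

Repeating this update over many episodes pulls the value of each state toward the value of the states that tend to follow it.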
The computer player plays greedily 90% of the time (it picks the move that leads to the highest-valued state) and explores the other 10% of the time, playing a random move among the available choices. This is where it explores new states and learns something new :p .
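A minimal sketch of this 90%/10% move selection is shown below. It is not the project's actual code: the name `choose_move`, the board encoded as a 9-character string with `" "` for empty cells, and the dictionary value table are all assumptions made for the example.

```python
import random


def choose_move(board, values, player, epsilon=0.1, rng=random):
    """Pick a cell index for `player` ("X" or "O") on a 9-character board.

    With probability epsilon, play a random available move (exploration);
    otherwise play the move whose resulting state has the highest value.
    """
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if rng.random() < epsilon:
        return rng.choice(moves)              # explore

    def value_after(move):
        next_board = board[:move] + player + board[move + 1:]
        return values.get(next_board, 0.5)    # unseen states default to 0.5

    return max(moves, key=value_after)        # exploit (greedy)


# Example: X can complete the top row by playing cell 2
board = "XX OO    "
values = {"XXXOO    ": 1.0}
move = choose_move(board, values, "X", epsilon=0.0)  # greedy only
```

Setting `epsilon=0.0` turns exploration off, which is one way to play against a human after training.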