The othello-rl from dragonwarrior15

othello-rl's Issues

create basic players/agents

random player
uniformly samples from available legal moves
monte carlo rollout
run several simulations to determine expected utilities of all legal moves 47c72b2
min max player
runs the game till end and uses that to determine which branch is best to play ahead on 5f78714
alpha beta pruning
modification of min max player to intelligently reduce search space 5f78714
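
The difference between the min max player and alpha beta pruning can be sketched on a toy game tree. This is only an illustration, not the repository's agent code: the tree is a hypothetical nested list whose leaves are utilities, and the function names and the leaf counter are assumptions made for the example.

```python
import math

def minimax(node, maximizing, counter):
    """Plain min max over a nested-list tree; leaves are numeric utilities."""
    if not isinstance(node, list):
        counter[0] += 1  # count every leaf that gets evaluated
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, counter, alpha=-math.inf, beta=math.inf):
    """Same search, but skips branches that cannot change the result."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizing parent will never allow this branch
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizing parent already has a better option
            break
    return best

# Hypothetical tree: the root maximizes, its three children minimize.
tree = [[3, 5], [2, 9], [0, 7]]

plain_count = [0]
plain_value = minimax(tree, True, plain_count)

pruned_count = [0]
pruned_value = alphabeta(tree, True, pruned_count)

print(plain_value, plain_count[0])    # same value, all leaves visited
print(pruned_value, pruned_count[0])  # same value, fewer leaves visited
```

Both searches return the same value; pruning only reduces how many leaves are evaluated, which is what makes deeper searches affordable.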

Issues with commit 615eff3

next legal moves should be used to calculate discounted reward in train_agent in DeepQLearningAgent
next legal moves and next player should be added to the history object of Game class
next legal moves should be passed to the replay buffer
done should be 1 for each player's last move of the game
convert board to float32 before passing to the network (otherwise tf sometimes raises errors)
change dtype to np.uint64 whenever adding bitboard to numpy arrays for consistency
change buffer dtypes accordingly
make add_to_buffer in class Game generic to not hard code indices
in case of no next legal moves, the function train_agent in class DeepQLearningAgent produces nan because
np.max(np.where(next_legal_moves==1, discounted_reward, -np.inf),axis=1) * (1-done) evaluates -np.inf * 0, which is np.nan and not 0. Guard the max with a check using np.isfinite()
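
The nan problem from the last item and the np.isfinite() guard can be reproduced in a few lines. This is a minimal sketch and not the actual train_agent code: the array values, `gamma`, and the names `next_q`, `masked_max` and `safe_max` are assumptions chosen for the example.

```python
import numpy as np

gamma = 0.99  # assumed discount factor
reward = np.array([0.0, 1.0], dtype=np.float32)
done = np.array([0.0, 1.0], dtype=np.float32)

# Hypothetical network outputs for the next state (2 samples, 3 actions).
next_q = np.array([[0.2, 0.5, 0.1],
                   [0.3, 0.4, 0.6]], dtype=np.float32)

# The second sample is terminal and has no next legal moves at all.
next_legal_moves = np.array([[1, 0, 1],
                             [0, 0, 0]])

# The max over legal moves is -inf for the row with no legal moves.
masked_max = np.max(np.where(next_legal_moves == 1, next_q, -np.inf), axis=1)

# Naive target: -inf * 0 is nan, so (1 - done) does not zero the term out.
with np.errstate(invalid="ignore"):
    naive_target = reward + gamma * masked_max * (1 - done)

# Guarded target: replace non-finite maxima with 0 before multiplying.
safe_max = np.where(np.isfinite(masked_max), masked_max, 0.0)
target = reward + gamma * safe_max * (1 - done)

print(naive_target)  # the terminal row is nan
print(target)        # the terminal row is just the reward
```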

modify board history

change the board history; currently the following variables are stored
[current state, legal moves, current player, action, next state, next legal moves, next player, done, winner]
to
[current state, legal moves, current player, action, done, winner]
since the next state to store in the replay buffer is chosen selectively using a different logic
also, reducing the number of objects in this history will help speed up the board augmentation function
resolved in e40ba99

create simple ui to test and play the game

ui should allow quick testing using human players
animations etc can be added later on
~~can have a simple random player at the other side for purposes of testing~~ all players can be added since move function is exposed for all players
~~can use pygame/django~~ using flask with js for front end

compile features of the game environment

init
board size : default 8
count of frames : no of frames to keep in memory, default 4
game type : should the starting pieces be only in the center or on the corners as well, need to define base templates in the code, this cannot be changed until a new env object is created
this function is public since this is called during creation of the object
reset
takes no parameters, sets everything to default
initialize the deque for storing frames
initialize the board array here, along with the initial state defined in init
returns the starting state of the board, legal moves and whose turn it is
board contains three planes, one for black coins, one for white, and one denoting which player will play the next turn (all 1s for white and all 0s for black)
this function is public and should be called immediately after the object creation, must also be called when the game has ended and needs to be "reset"
step
takes action as the input
returns reward, next state, an information dict, whether the game has ended, legal moves for next state
is public, and is called whenever an action needs to be taken in the game
has the logic to perform the move and update the board accordingly
can have private functions to handle a variety of logic for updating the board
simulation environment
a separate environment can be created with the facility to run simulations such as monte carlo rollouts
this environment will take the current board state and an action as input and return the next board state, and hence will be memoryless
should implement logic similar to the base environment, but can be a separate inherited class
a game class (separate from environment)
a better idea could be to create a game class that handles all aspects of the game, like remembering which player is to play, keeping track of points, history of board states etc
a separate light environment will be created that encompasses the rules, and given a board, which player to play and action, executes all of it
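
The light, memoryless environment described above could look roughly like the sketch below. It is not the repository's implementation: the single-plane encoding (+1 black, -1 white, 0 empty), the function names and the return tuple are all assumptions for illustration, and rewards and the three-plane state are left out.

```python
import numpy as np

DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def initial_board(size=8):
    """Standard centre start; +1 is black, -1 is white (assumed encoding)."""
    board = np.zeros((size, size), dtype=np.int8)
    mid = size // 2
    board[mid - 1, mid - 1] = board[mid, mid] = -1
    board[mid - 1, mid] = board[mid, mid - 1] = 1
    return board

def flips_for(board, player, row, col):
    """Opponent coins captured if `player` plays at (row, col)."""
    if board[row, col] != 0:
        return []
    size = board.shape[0]
    captured = []
    for dr, dc in DIRECTIONS:
        line = []
        r, c = row + dr, col + dc
        while 0 <= r < size and 0 <= c < size and board[r, c] == -player:
            line.append((r, c))
            r, c = r + dr, c + dc
        # a line counts only if it ends on one of the player's own coins
        if line and 0 <= r < size and 0 <= c < size and board[r, c] == player:
            captured.extend(line)
    return captured

def legal_moves(board, player):
    """Binary mask of squares where `player` captures at least one coin."""
    mask = np.zeros_like(board, dtype=np.uint8)
    for r in range(board.shape[0]):
        for c in range(board.shape[1]):
            if flips_for(board, player, r, c):
                mask[r, c] = 1
    return mask

def step(board, player, action):
    """Memoryless transition: nothing is stored between calls."""
    row, col = action
    captured = flips_for(board, player, row, col)
    if not captured:
        raise ValueError("illegal move")
    next_board = board.copy()  # the input board is left untouched
    next_board[row, col] = player
    for r, c in captured:
        next_board[r, c] = player
    # the turn goes to the opponent unless they have to pass
    next_player = -player
    next_legal = legal_moves(next_board, next_player)
    if not next_legal.any():
        next_player = player
        next_legal = legal_moves(next_board, next_player)
    done = not next_legal.any()  # neither player can move
    return next_board, next_player, next_legal, done

board = initial_board()
start_legal = legal_moves(board, 1)
next_board, next_player, next_legal, done = step(board, 1, (2, 3))
```

Because step takes everything it needs as arguments, the same function can serve the game class and simulation code such as monte carlo rollouts.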

Added simple memoryless environment with init, reset and step functions 77f2e0b
To do

check logic in terminal conditions
add logic for situations like passing of turn, termination

Added bitboard environment c3f83a4
To do

optimize runtime of bitboard environment
partial optimization in 8db56f7 and slightly more optimization in 0ac8950
Further optimization will require converting the entire code to C which will make interacting with all agents very difficult
optimize board conversion if possible
add function for board augmentation b1f3fce
since running a game is expensive, an alternative to playing a large number of games is to prepare augmentations of board states, which are simply rotated and mirrored versions of the board (including actions and next states). bitboards support these via bit operations.
~~modify the board augmentation function to add boards with inverted colors~~ dropped because inverting colors changes the first player, so the resulting boards would never occur in a real game.
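
The mirrored and rotated boards mentioned above can be produced with a handful of bit operations. The sketch below is an illustration under assumptions, not the code from b1f3fce: it uses plain Python ints rather than np.uint64, assumes bit index 8 * row + col, and all function names are hypothetical. The same transform has to be applied to the state, the action and the next state so that they stay consistent.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def flip_vertical(bb):
    """Reverse the order of the 8 rows by reversing the 8 bytes."""
    return int.from_bytes(bb.to_bytes(8, "little"), "big")

def mirror_horizontal(bb):
    """Reverse the bits inside each byte (each byte is one row)."""
    k1 = 0x5555555555555555
    k2 = 0x3333333333333333
    k4 = 0x0F0F0F0F0F0F0F0F
    bb = ((bb >> 1) & k1) | ((bb & k1) << 1)  # swap neighbouring bits
    bb = ((bb >> 2) & k2) | ((bb & k2) << 2)  # swap neighbouring pairs
    bb = ((bb >> 4) & k4) | ((bb & k4) << 4)  # swap the two nibbles
    return bb & MASK64

def rotate_180(bb):
    """A half turn is a vertical flip followed by a horizontal mirror."""
    return mirror_horizontal(flip_vertical(bb))

def to_rows(bb):
    """Expand a bitboard into 8 lists of 8 bits, for checking only."""
    return [[(bb >> (8 * r + c)) & 1 for c in range(8)] for r in range(8)]

# Starting position and one action, each as a separate bitboard.
black = (1 << 28) | (1 << 35)   # squares (3, 4) and (4, 3)
white = (1 << 27) | (1 << 36)   # squares (3, 3) and (4, 4)
action = 1 << 19                # square (2, 3)

# One augmented sample: every bitboard goes through the same transform.
aug_black = flip_vertical(black)
aug_white = flip_vertical(white)
aug_action = flip_vertical(action)
```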

MCTS and Minimax

Modify the MCTS agent to use C functions for improvement in speed 3ccf8e1
Modify the Minimax agent to use C functions for improvement in speed 15412fb

MCTS Comparison of Python and C at 100 simulations per move

| Game Environment Type | Player Type | Speed |
| --- | --- | --- |
| Python | Python with Python Environment | 0.16 it/s |
| C | Python with C Environment | 0.33 it/s |
| C | C | 16 it/s |

Minimax Comparison of Python and C at max depth of 3

| Game Environment Type | Player Type | Speed |
| --- | --- | --- |
| Python | Python with Python Environment | 2 it/s |
| C | Python with C Environment | 5 it/s |
| C | C | 130 it/s |
| C | C (max depth 4) | 43 it/s |
| C | C (max depth 5) | 14 it/s |

Depths 4 and 5 give a more consistent win rate, though it is still not perfect
