othello-rl's People

Contributors

dragonwarrior15, guitarremote

Forkers

guitarremote

othello-rl's Issues

create basic players/agents

  • random player
      • uniformly samples from available legal moves
  • monte carlo rollout
      • run several simulations to determine expected utilities of all legal moves 47c72b2
  • min max player
      • runs the game till end and uses that to determine which branch is best to play ahead on 5f78714
  • alpha beta pruning
      • modification of min max player to intelligently reduce search space 5f78714
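
The relationship between the min max player and alpha beta pruning can be illustrated on a toy game tree. This is only a sketch and not the repository's agent code: the tree is a hypothetical nested list whose leaves are utilities for the maximizing player, and the function names `minimax` and `alphabeta` are assumptions made for the example.

```python
import math


def minimax(node, maximizing):
    """Plain min max: explore every branch down to the leaves."""
    if not isinstance(node, list):  # leaf holds a utility value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Same result as minimax, but skips branches that cannot change it."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record which leaves were evaluated
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizer above will never allow this
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizer above already has a better option
            break
    return best


# Hypothetical depth-2 tree: the root maximizes, its children minimize.
tree = [[3, 5], [2, 9], [0, 1]]
visited = []
value_full = minimax(tree, True)
value_pruned = alphabeta(tree, True, visited=visited)
```

Both searches return the same value for the root, while `visited` shows that the pruned search evaluates fewer leaves. In a real Othello agent the leaves would come from a depth limit plus an evaluation function rather than a fixed list.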

Issues with commit 615eff3

  • next legal moves should be used to calculate discounted reward in train_agent in DeepQLearningAgent
  • next legal moves and next player should be added to the history object of Game class
  • next legal moves should be passed to the replay buffer
  • done should be 1 for each player's last move of the game
  • convert board to float32 before passing to the network (otherwise raises errors in tf sometimes)
  • change dtype to np.uint64 whenever adding bitboard to numpy arrays for consistency
  • change buffer dtypes accordingly
  • make add_to_buffer in class Game generic to not hard code indices
  • when there are no next legal moves, the function train_agent in class DeepQLearningAgent produces nan because
    np.max(np.where(next_legal_moves==1, discounted_reward, -np.inf),axis=1) * (1-done) evaluates -np.inf * 0, which is np.nan rather than 0. Use a check with np.isfinite()
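
The nan problem in the last item can be reproduced in isolation. The sketch below is not the code of train_agent: `next_q`, `legal` and the helper `max_next_q` are hypothetical names, with `next_q` standing in for the discounted_reward array in the expression above.

```python
import numpy as np


def max_next_q(next_q, next_legal_moves, done):
    """Masked max over legal moves that stays finite for move-less states."""
    masked = np.where(next_legal_moves == 1, next_q, -np.inf)
    best = np.max(masked, axis=1)
    # Rows without any legal move evaluate to -inf; zero them out
    # before multiplying, because -inf * 0 is nan, not 0.
    best = np.where(np.isfinite(best), best, 0.0)
    return best * (1 - done)


# Hypothetical batch of two transitions; the second has no next legal move.
next_q = np.array([[0.5, 1.5, -0.2], [0.3, 0.1, 0.9]])
legal = np.array([[1, 1, 0], [0, 0, 0]])
done = np.array([0, 1])

with np.errstate(invalid="ignore"):  # silence the expected nan warning
    naive = np.max(np.where(legal == 1, next_q, -np.inf), axis=1) * (1 - done)

safe = max_next_q(next_q, legal, done)
```

Here `naive[1]` is nan while `safe[1]` is 0, so the guarded version can be added to the reward without poisoning the training targets.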

modify board history

change board history, currently the following variables are stored
[current state, legal moves, current player, action, next state, next legal moves, next player, done, winner]
to
[current state, legal moves, current player, action, done, winner]
since the next state to store in the replay buffer is selected by separate logic
also, reducing the number of objects in this history will help speed up the board augmentation function
resolved in e40ba99

create simple ui to test and play the game

  • ui should allow quick testing using human players
  • animations etc can be added later on
  • can have a simple random player on the other side for testing; any other player can be added since the move function is exposed for all players
  • can use pygame, or django/flask with js for the front end

compile features of the game environment

  • init
      • board size : default 8
      • count of frames : number of frames to keep in memory, default 4
      • game type : whether the starting pieces are only in the center or on the corners as well; base templates need to be defined in the code, and this cannot be changed until a new env object is created
      • this function is public since it is called during creation of the object
  • reset
      • takes no parameters, sets everything to default
      • initializes the deque for storing frames
      • initializes the board array, along with the initial state defined in init
      • returns the starting state of the board, legal moves and whose turn it is
      • board contains three planes, one for black coins, one for white, and one denoting which player will play the next turn (all 1s for white and all 0s for black)
      • this function is public and should be called immediately after the object creation; it must also be called when the game has ended and needs to be "reset"
  • step
      • takes action as the input
      • returns reward, next state, an information dict, whether the game has ended, legal moves for next state
      • is public, and is called whenever an action needs to be taken in the game
      • has the logic to perform the move and update the board accordingly
      • can have private functions to handle a variety of logic for updating the board
  • simulation environment
      • a separate environment can be created with the facility to run simulations such as monte carlo rollouts
      • this environment will take the current board state and an action as input, and return the next board state, and hence will be memoryless
      • should implement logic similar to the base environment, but can be a separate inherited class
  • a game class (separate from environment)
      • a better idea could be to create a game class that handles all aspects of the game, like remembering which player is to play, keeping track of points, history of board states etc
      • a separate light environment will be created that encompasses the rules and, given a board, the player to play and an action, executes the move
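
A memoryless "light" environment of the kind described in the last items might look roughly like the sketch below. It is an illustration only, not the repository's implementation: the list-of-lists board (1 for black, -1 for white, 0 for empty), the function names and the return signature of `step` are all assumptions, and rewards and the three-plane state are left out.

```python
# All eight directions in which coins can be captured.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def initial_board(size=8):
    """Center-only starting position (assumed encoding: 1 black, -1 white)."""
    board = [[0] * size for _ in range(size)]
    m = size // 2
    board[m - 1][m - 1] = board[m][m] = -1
    board[m - 1][m] = board[m][m - 1] = 1
    return board


def flips(board, player, row, col):
    """Opponent coins that would be captured by playing (row, col)."""
    size = len(board)
    if board[row][col] != 0:
        return []
    captured = []
    for dr, dc in DIRECTIONS:
        line = []
        r, c = row + dr, col + dc
        while 0 <= r < size and 0 <= c < size and board[r][c] == -player:
            line.append((r, c))
            r, c = r + dr, c + dc
        # A run of opponent coins counts only if closed by an own coin.
        if line and 0 <= r < size and 0 <= c < size and board[r][c] == player:
            captured.extend(line)
    return captured


def legal_moves(board, player):
    size = len(board)
    return [(r, c) for r in range(size) for c in range(size)
            if flips(board, player, r, c)]


def step(board, player, action):
    """Memoryless transition: returns (next_board, next_player, done)."""
    row, col = action
    captured = flips(board, player, row, col)
    if not captured:
        raise ValueError("illegal move")
    nxt = [r[:] for r in board]  # the input board is left untouched
    nxt[row][col] = player
    for r, c in captured:
        nxt[r][c] = player
    if legal_moves(nxt, -player):
        return nxt, -player, False
    if legal_moves(nxt, player):
        return nxt, player, False  # opponent has no move and passes
    return nxt, -player, True      # neither side can move: game over


board = initial_board()
next_board, next_player, done = step(board, 1, (2, 3))
```

Because `step` takes the board and player as arguments and returns a new board, a rollout or tree search can call it repeatedly without any shared state, while a separate game class keeps history, scores and turn order.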

Added simple memoryless environment with init, reset and step functions 77f2e0b
To do

  • check logic in terminal conditions
  • add logic for situations like passing of turn, termination

Added bitboard environment c3f83a4
To do

  • optimize runtime of bitboard environment
      • partial optimization in 8db56f7 and slightly more optimization in 0ac8950
      • Further optimization will require converting the entire code to C which will make interacting with all agents very difficult
  • optimize board conversion if possible
  • add function for board augmentation b1f3fce
      • since running a game is expensive, an alternative to playing a large number of games is to prepare augmentations of board states, which are simply rotated and mirrored versions of the board (including actions and next states). bitboards support these via bit operations.
      • [ ] modify the board augmentation function to add boards with inverted colors because inverting colors will change the first player and boards observed subsequently will never happen in reality.
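
The mirroring and rotation of bitboards mentioned above can be done with a few shifts and masks. The sketch below uses well-known 64-bit bit tricks and is not taken from the repository; it assumes that bit i of the bitboard corresponds to row i // 8 and column i % 8, and the function names are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # Python ints are unbounded, so clamp to 64 bits


def flip_vertical(x):
    """Swap row r with row 7 - r by reversing the byte order."""
    k1 = 0x00FF00FF00FF00FF
    k2 = 0x0000FFFF0000FFFF
    x = ((x >> 8) & k1) | ((x & k1) << 8)
    x = ((x >> 16) & k2) | ((x & k2) << 16)
    x = (x >> 32) | (x << 32)
    return x & MASK64


def mirror_horizontal(x):
    """Swap column c with column 7 - c by reversing bits inside each byte."""
    k1 = 0x5555555555555555
    k2 = 0x3333333333333333
    k4 = 0x0F0F0F0F0F0F0F0F
    x = ((x >> 1) & k1) | ((x & k1) << 1)
    x = ((x >> 2) & k2) | ((x & k2) << 2)
    x = ((x >> 4) & k4) | ((x & k4) << 4)
    return x & MASK64


def rotate_180(x):
    """A 180 degree rotation is a vertical flip followed by a mirror."""
    return mirror_horizontal(flip_vertical(x))


def augment(black, white, action_bit):
    """Apply each transform to both colors and to the one-hot action."""
    transforms = (flip_vertical, mirror_horizontal, rotate_180)
    return [(t(black), t(white), t(action_bit)) for t in transforms]


# Hypothetical position: one coin per color plus a one-hot action bitboard.
samples = augment(black=1 << 27, white=1 << 28, action_bit=1 << 19)
```

Encoding the action as a one-hot bitboard lets the same transform be applied to the state, the action and the next state, which keeps each augmented transition consistent.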

MCTS and Minimax

  • Modify the MCTS agent to use C functions for improvement in speed 3ccf8e1
  • Modify the Minimax agent to use C functions for improvement in speed 15412fb

MCTS Comparison of Python and C at 100 simulations per move

Game Environment Type   Player Type                      Speed
Python                  Python with Python Environment   0.16 it/s
C                       Python with C Environment        0.33 it/s
C                       C                                16 it/s

Minimax Comparison of Python and C at max depth of 3

Game Environment Type   Player Type                      Speed
Python                  Python with Python Environment   2 it/s
C                       Python with C Environment        5 it/s
C                       C                                130 it/s
C                       C (max depth 4)                  43 it/s
C                       C (max depth 5)                  14 it/s

Depth 4 and 5 bring good consistency in win rate, still not perfect though
