FLOW
- RLlib: vary the model, parameters, and algorithm => evaluate performance => but not efficient ~ 5/18?
- Tree-based search algorithms (MCTS, minimax) => possible, but need many heuristics ~ 5/25?
- AlphaZero: apply a policy net and a value net to MCTS => AWESOME POLICY ~ 6/01
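
Sketch for the AlphaZero step: one way the policy net and value net could plug into MCTS selection, assuming the PUCT rule used in AlphaZero-style search. The names Child, puct_score, select_action and the constant c_puct=1.5 are illustrative placeholders, not from our code; the max(1, ...) guard for an unvisited parent is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    """Statistics for one action edge (hypothetical structure)."""
    prior: float            # P(s, a): probability from the policy net
    visits: int = 0         # N(s, a): number of times this edge was taken
    value_sum: float = 0.0  # sum of value-net estimates backed up through it

    @property
    def q(self) -> float:
        # Mean backed-up value; 0.0 for an unvisited edge.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(child: Child, parent_visits: int, c_puct: float = 1.5) -> float:
    # Exploration bonus grows with the prior and shrinks as the edge is visited.
    # max(1, ...) keeps priors meaningful before any visit (assumption).
    u = c_puct * child.prior * math.sqrt(max(1, parent_visits)) / (1 + child.visits)
    return child.q + u


def select_action(children: dict, c_puct: float = 1.5):
    # Pick the action whose edge has the highest Q + U score.
    parent_visits = sum(c.visits for c in children.values())
    return max(children, key=lambda a: puct_score(children[a], parent_visits, c_puct))
```

Compared with plain MCTS, the prior replaces hand-written move-ordering heuristics and the value estimate replaces random rollouts, which is why this step comes after the tree-search step.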
TODO (~5/18)
- CNN Model + RLlib - Faeyza
- MCTS implementation - TAEGU
- PPO, SAC, DQN, DreamerV3, IMPALA - Gyojun