RL algorithms for the MicroTbs learning environment

This repository contains:

  • An implementation of the gym-compatible learning environment called MicroTbs (short for Micro Turn-Based Strategy)
  • Reinforcement learning algorithms aimed at solving this game:
    • A DQN algorithm, inspired by DeepMind's original work
    • Synchronous Advantage Actor-Critic (A2C), based on the original A3C algorithm and the OpenAI Baselines implementation
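
Since the environment is gym-compatible, it can be driven like any other Gym environment. Below is a minimal random-agent sketch; it assumes that importing microtbs_rl.envs registers the MicroTbs environments with Gym:

import gym
import microtbs_rl.envs  # assumption: importing this registers the MicroTbs envs with Gym

env = gym.make('MicroTbs-CollectWithTerrain-v2')
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random actions, just to drive the env
    obs, reward, done, info = env.step(action)
env.close()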

This repository was created for educational purposes, to try out and learn different RL techniques. It may be useful for people who also want to experiment with deep RL, or who are just looking for simple and easy-to-understand implementations of existing RL algorithms.

Here's how it looks:

  1. A2C solves a simple version of the environment, MicroTbs-CollectWithTerrain-v2:

    [animation: a2c_terrain]

    a longer animation: https://youtu.be/JykBihC0TvM

  2. Feed-forward A2C in a more challenging version of the environment, MicroTbs-CollectPartiallyObservable-v3 (clickable):

    [animation: Feed-forward A2C]

As seen in this example, a simple feed-forward method underperforms on this task, mostly for the following reasons:

  • A feed-forward architecture does not "remember" where the agent has been in previous steps, so it regularly gets stuck and fails to explore unseen parts of the environment. This can be solved by adding memory to the policy network (e.g. LSTM cells) and training it as a recurrent net (see the recurrent-policy sketch after this list).

  • The agent has only visual input; numeric information, such as the number of remaining movepoints, isn't passed to it. Therefore, in many cases it cannot plan its actions optimally.

  • Quite curiously, the agent avoids "stables" (the brown object that grants additional movepoints). This is caused by the tiny negative reward the agent receives for each step that does not collect gold. Early in training, when the agent moves mostly at random, the extra movepoints from stables simply mean more penalized steps, significantly increasing the total penalty received by the end of the episode.

  • The agent's policy is stochastic, and during training the policy net is penalized when the entropy of its action probability distribution is too low. This encourages exploration and prevents premature convergence to sub-optimal policies, but it can sometimes lead to seemingly stupid behavior during policy execution. It may be a good idea to decay the entropy penalty over the course of training (a sketch of such a schedule follows this list).

  • Due to the nature of the A2C algorithm, the agent fails to plan sufficiently far ahead and thus often gets stuck on obstacles. This could be addressed by something like a Monte-Carlo playout into the (predicted) future, or by an imagination module.
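
To make the memory idea from the first bullet concrete, here is a minimal PyTorch sketch of an actor-critic policy with an LSTM cell. It is an illustration with assumed shapes and layer sizes, not the model used in this repository:

import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Illustrative actor-critic policy with LSTM memory (not the repo's model)."""

    def __init__(self, obs_channels=3, num_actions=4, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(obs_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # fixed-size features regardless of map size
            nn.Flatten(),             # -> 32 * 4 * 4 = 512 features
        )
        self.lstm = nn.LSTMCell(512, hidden)  # carries memory across timesteps
        self.policy_head = nn.Linear(hidden, num_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs, state):
        h, c = self.lstm(self.encoder(obs), state)
        return self.policy_head(h), self.value_head(h), (h, c)

# One timestep: keep (h, c) between steps so the agent remembers where it has been.
policy = RecurrentPolicy()
obs = torch.zeros(1, 3, 20, 20)                     # dummy observation
state = (torch.zeros(1, 128), torch.zeros(1, 128))  # initial LSTM state
logits, value, state = policy(obs, state)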
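
Similarly, to illustrate the entropy penalty and one possible decay schedule, a small NumPy sketch; the coefficient values and the linear schedule are assumptions, not the settings used in this repository:

import numpy as np

def entropy(probs, eps=1e-8):
    """Entropy of a categorical action distribution."""
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def entropy_coeff(step, total_steps, start=0.01, end=0.0):
    """Linearly decay the entropy bonus over training (assumed schedule)."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

# In A2C, the per-step policy loss is roughly:
#   policy_loss = -advantage * log_prob - coeff * entropy
# so a larger entropy bonus pushes the action distribution towards uniform.
probs = np.array([0.7, 0.1, 0.1, 0.1])  # example action distribution
coeff = entropy_coeff(step=50_000, total_steps=100_000)
print('entropy = %.3f, coeff = %.4f' % (entropy(probs), coeff))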

About the environment

This environment was created as a playground for experimenting with various RL algorithms. It resembles some traits of certain turn-based strategy games, such as HOMM3, hence the name. The task is to collect as many resources (gold) as possible with a limited number of movepoints.

Cell types in the environment:

  • Red - a hero
  • Yellow - gold piles
  • Grey - walls, obstacles
  • Green - swamp, increases the movepoint penalty per move
  • Brown - stables, increase the hero's movepoints
  • Light blue - lookout tower, reveals the map within a certain radius
  • Black - fog-of-war, unknown territory
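
As a purely hypothetical illustration of how these cell types might be encoded in code (the identifiers below are made up for this example; the repository may use different names):

from enum import Enum

class CellType(Enum):
    # hypothetical names for the cell types listed above
    HERO = 'red'
    GOLD = 'yellow'
    WALL = 'grey'
    SWAMP = 'green'
    STABLES = 'brown'
    LOOKOUT_TOWER = 'light blue'
    FOG_OF_WAR = 'black'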

Versions of the environment:

  • CollectSimple - plain gold collection, no terrain or obstacles
  • CollectWithTerrain - same, but with walls and obstacles in the play area
  • CollectPartiallyObservable - with all types of cells; the map is bigger than the view size and must be explored

There's also a PvP version of the environment that allows experiments with self-play, but it is unfinished.

How-to

Play the environment yourself, with human controls:

python -m microtbs_rl.envs.gameplay

Train a DQN agent with default parameters and see how it works:

python -m microtbs_rl.algorithms.dqn.train_dqn
python -m microtbs_rl.algorithms.dqn.enjoy_dqn

Train an A2C agent with default parameters and see how it works:

python -m microtbs_rl.algorithms.a2c.train_a2c
python -m microtbs_rl.algorithms.a2c.enjoy_a2c

Train the OpenAI Baselines DQN implementation and see how it works:

python -m microtbs_rl.algorithms.baselines.openai_baselines.train_baseline_dqn
python -m microtbs_rl.algorithms.baselines.openai_baselines.enjoy_baseline_dqn

To compare the performance and learning curves of different algorithms, modify plotter.py to add the experiments you're interested in, then run:

python -m microtbs_rl.utils.plotter

Run unit tests:

python -m unittest

You can also install this package into your Python environment and use it as a dependency:

pip install -e .
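
After installation, the package can be used from your own code like any other dependency. A minimal sketch, again assuming the environments are registered when microtbs_rl.envs is imported:

import gym
import microtbs_rl.envs  # assumption: registers the MicroTbs-* environments

env = gym.make('MicroTbs-CollectPartiallyObservable-v3')
print(env.observation_space, env.action_space)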

If you have any questions or problems, feel free to reach me at [email protected], or just go ahead and open an issue here on GitHub.
