
Environment-related differences between Deep Q-Learning and Deep Double Q-Learning

This project was conducted for the Reinforcement Learning course of the Master's programme in Artificial Intelligence at the University of Amsterdam during the winter term 2018/2019.

Introduction

The Q-learning algorithm is known to overestimate state-action values under certain conditions. A positive bias is introduced because Q-learning uses the same function both to select the next action and to evaluate its value, so overoptimistic estimates are more likely to be used in updates. Van Hasselt et al. (2016) showed that this can indeed harm performance when deep neural networks are used and proposed the Deep Double Q-Learning (DDQN) algorithm, which decouples action selection from action evaluation. We apply both algorithms and compare their performance on different environments provided by OpenAI Gym (Brockman et al., 2016).
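To make this difference concrete, the following sketch shows how the two algorithms compute their bootstrap targets. It assumes PyTorch tensors rewards, next_states and dones (as floats) and two networks online_net and target_net; all names are illustrative and not taken from this repository.

import torch

def dqn_target(rewards, next_states, dones, target_net, discount=0.99):
    # DQN: the same (target) network both selects the maximising action and
    # evaluates it, so noise in the estimates propagates through the max.
    next_q = target_net(next_states).max(dim=1).values
    return rewards + discount * next_q * (1 - dones)

def ddqn_target(rewards, next_states, dones, online_net, target_net, discount=0.99):
    # DDQN: the online network selects the action, the target network
    # evaluates it, decoupling selection from evaluation.
    best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
    return rewards + discount * next_q * (1 - dones)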

Experiments

We used the environments CartPole-v1, Acrobot-v1, MountainCar-v0 and Pendulum-v0 from OpenAI Gym and performed a joint grid search over the hyperparameters, training 10 models with different random seeds per configuration for each of the two algorithms. The best configuration was selected based on the highest overall reward achieved. The hyperparameters can be found in hyperparameters.py.
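A minimal sketch of such a joint grid search is given below. It assumes a hypothetical train(env_name, seed, **hparams) function that returns the overall reward of a trained model; the grid values shown are illustrative, and the actual search and hyperparameter values live in hyperparameters.py and train.py.

from itertools import product

# Illustrative hyperparameter grid; the real values are defined in hyperparameters.py.
GRID = {
    "lr": [1e-3, 1e-4],
    "batch_size": [32, 64],
    "discount": [0.99, 0.999],
}

def grid_search(train, env_name, n_seeds=10):
    best_score, best_config = float("-inf"), None
    for values in product(*GRID.values()):
        config = dict(zip(GRID.keys(), values))
        # Average the overall reward over models trained with different random seeds.
        score = sum(train(env_name, seed=s, **config) for s in range(n_seeds)) / n_seeds
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score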

Afterwards, we trained 15 models per algorithm on each environment and used a Mann-Whitney U test to check whether the difference in Q-value estimates between DQN and DDQN is statistically significant (p = 0.05).
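Such a per-time-step comparison could, for example, be run with SciPy's mannwhitneyu. The sketch below assumes the Q-value estimates of the 15 models per algorithm are stored in arrays of shape (n_models, n_timesteps); this data layout is an assumption, not the repository's actual format.

from scipy.stats import mannwhitneyu

def significant_timesteps(q_dqn, q_ddqn, alpha=0.05):
    # q_dqn, q_ddqn: arrays of shape (n_models, n_timesteps) with Q-value estimates.
    significant = []
    for t in range(q_dqn.shape[1]):
        # Two-sided test on the per-model estimates at this time step.
        _, p = mannwhitneyu(q_dqn[:, t], q_ddqn[:, t], alternative="two-sided")
        if p < alpha:
            significant.append(t)
    return significant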

Results

The figures below show the average Q-value estimates during training for DQN and DDQN on four different environments, averaged over 15 models. Intervals are determined by averaging the two most extreme values at every time step. The true values are obtained via Monte Carlo rollouts: after training terminated, whole episodes are played out under the learned policy to compute the total rewards. These true returns are indicated by the dashed lines. Markers (at the bottom of the x-axis) indicate time steps with statistically significant differences between DQN and DDQN (p = 0.05).
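Such a Monte Carlo estimate of the true value could be computed along the following lines: play full episodes with the trained greedy policy and discount the collected rewards. The function below is only a sketch, assuming the classic Gym step API and a discrete-action Q-network; names such as q_network are illustrative and not taken from the repository.

import torch

def monte_carlo_return(env, q_network, discount=0.99, n_episodes=10):
    returns = []
    for _ in range(n_episodes):
        state, done, rewards = env.reset(), False, []
        while not done:
            # Act greedily with respect to the learned Q-function.
            with torch.no_grad():
                q_values = q_network(torch.as_tensor(state, dtype=torch.float32))
            state, reward, done, _ = env.step(q_values.argmax().item())
            rewards.append(reward)
        # Discounted return of the whole episode from its start state.
        g = 0.0
        for r in reversed(rewards):
            g = r + discount * g
        returns.append(g)
    return sum(returns) / len(returns)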


We observe the following:

  • Both algorithms perform well on CartPole-v1; the environment is less challenging because credit assignment is easy (immediate, positive and constant rewards)
  • Pendulum-v0: Both algorithms obtain similar Q-value estimates, but DQN performs better than DDQN. A possible explanation is the complex reward function, which requires carefully chosen actions
  • We can confirm the findings of Van Hasselt et al. (2016) for Acrobot-v1: DDQN performs better on this task while also producing Q-value estimates that are closer to the true values
  • Although both algorithms learn to solve MountainCar-v0, we suspect that parameterizing the Q-function with deep neural networks leads to unstable and unrealistic (non-negative) Q-values

Conclusion

  • Only Acrobot-v1 shows a significant performance improvement when using DDQN
  • Whether DDQN improves performance depends on the reward structure of the environment; in our environments the reward distributions are very narrow, so there is little room for maximization bias to occur (a small numerical illustration follows this list)
  • Due to the parameterization with deep neural networks, Q-values can be unstable while the agent still achieves the objective
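The maximization bias mentioned above can be illustrated with a small simulation (illustrative, not part of the repository): when all actions have true value zero but the estimates are noisy, the single-estimator value max_a Q(s, a) is positively biased, while selecting the action with one estimator and evaluating it with an independent one, as in Double Q-learning, is not.

import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials = 10, 10_000

# Two independent noisy estimates of action values whose true value is 0.
q1 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
q2 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))

single = q1.max(axis=1).mean()                               # clearly > 0: biased
double = q2[np.arange(n_trials), q1.argmax(axis=1)].mean()   # close to 0: unbiased
print(f"single estimator: {single:.3f}, double estimator: {double:.3f}")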

Usage

To run the experiments yourself, first install the necessary requirements:

cd quirky-quokka
pip3 install -r requirements.txt

Then run the training script:

python3 train.py

Models will be saved to the models/ folder; plots and the data they are based on are stored in img/ and data/, respectively. Lastly, trained models can be tested by rendering their environments using

python3 test.py ./models/<model_name>.pt

where the environment name has to be included in <model_name>.

References

Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI'16, pages 2094–2100. AAAI Press, 2016.

Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
