Deep Reinforcement Learning
DRL university course lecture notes & exercises

| Chapter | Sections recap |
| --- | --- |
| Hello world | Basic terminology and definitions (based on Spinning Up in Deep RL, by OpenAI) |
| RL Basics | MDPs, Policy/Value Iteration, MC, SARSA & Q-Learning |
| DQN & its derivatives | Deep Q-Network (DQN), Double DQN, Dueling DQN |
| Policy Gradients | REINFORCE, REINFORCE with Baseline, Actor-Critic methods |
| Imitation Learning | Apprenticeship, supervised and forward learning; DAgger, DAgger with coaching |
| Multi-Armed Bandit | Bandit algorithm, gradient-based algorithm, contextual bandits, Thompson sampling |
| RL use-case: AlphaGo | Monte Carlo Tree Search, AlphaGo, AlphaZero |
| Meta and Transfer Learning | Concepts in meta learning and transfer learning in the context of RL |


| Exercise | Description |
| --- | --- |
| ex1 | Q-Learning and Deep-Q-Learning (DQN) implementations from scratch |
| ex2 | REINFORCE (with and without baseline) and Monte Carlo Actor-Critic implementations from scratch |
| ex3 | |