reinforcement-learning by ashwinpn

RL Landscape
RL History
RL Agents Implementation
RL Environments
RL Mechanisms
RL Applications
References
- Reference Implementations
- Review Papers
- RL Platforms
- Deep Reinforcement Learning Papers

RL Landscape

Source: eleurent/phd-bibliography

RL Agents Implementation

Value Optimization
- [QR-DQN]
- [DQN] - [Slides] [Code] [rainbow]
- [Bootstrapped DQN]
- [DDQN]
- [NEC]
- [MMC]
- [N-step Q Learning]
- [PAL]
- [Categorical DQN]
- [NAF]
Policy Optimization
- [Policy Gradient]
- [Actor Critic]
  - [DDPG] [Code]
    - [HAC DDPG]
    - [DDPG with HER]
  - [Clipped PPO]
  - [PPO]
General
- [DFP]
Imitation
- [Behavioural cloning]
- [Inverse Reinforcement Learning] [Code] [irl-imitation-code]
- [Generative Adversarial Imitation Learning]
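Several of the value-optimization entries above (DQN, DDQN, N-step Q Learning) differ mainly in how they build the bootstrap target for the Q-function. The sketch below shows those three targets on plain Python lists. It is a minimal illustration under stated assumptions, not code from any of the linked implementations: the function names, the list-based Q-value representation, and the default `gamma` are all hypothetical choices made for this example.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """One-step DQN target: r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network picks the action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step return: discounted sum of n rewards plus a discounted bootstrap value."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Decoupling action selection from action evaluation, as in `double_dqn_target`, is what reduces the overestimation that the plain `max` in `dqn_target` can introduce.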

Imitation Learning Agents

Behavioral Cloning (BC) (code)

Hierarchical Reinforcement Learning Agents

Hierarchical Actor Critic (HAC) (code)

Memory Types

Exploration Techniques

E-Greedy (code)
Boltzmann (code)
Ornstein–Uhlenbeck process (code)
Normal Noise (code)
Truncated Normal Noise (code)
Bootstrapped Deep Q Network (code)
UCB Exploration via Q-Ensembles (UCB) (code)
Noisy Networks for Exploration (code)
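To make the first few techniques in this list concrete, here is a minimal standard-library sketch of E-Greedy and Boltzmann action selection for discrete actions, plus Ornstein–Uhlenbeck noise for continuous actions. It is not taken from the linked code: the function and class names, the parameter defaults (`theta=0.15`, `sigma=0.2`, `dt=1.0`), and the simple Euler discretization of the OU process are assumptions made for illustration.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; a lower temperature gives greedier choices."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_values, temperature, rng=random):
    """Sample an action index from the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


class OrnsteinUhlenbeckNoise:
    """Temporally correlated, mean-reverting noise (Euler discretization)."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, rng=random):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = rng
        self.x = mu  # start at the long-run mean

    def sample(self):
        # x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x
```

In use, the discrete selectors take the current Q-value estimates for one state, for example `epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1)`, while an `OrnsteinUhlenbeckNoise` sample would typically be added to a deterministic policy's continuous action at each step.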

RL History

Temporal difference (TD) learning (1988)
Q‐learning (1989)
BayesRL (2002)
RMAX (2002)
CBPI (2002)
PEGASUS (2002)
Least‐Squares Policy Iteration (2003)
Fitted Q‐Iteration (2005)
GTD (2009)
UCRL (2010)
REPS (2010)
DQN (2014) - DeepMind

RL Environments

[Acrobot]
[Bike]
[Blackjack]
[Cartpole]
[ContextBandit]
[Continuous Chain]
[Corridor]
[Discrete Chain]
[Discretiser (for continuous environments)]
[Double Loop]
[Environment]
[Gridworld]
[Inventory management]
[Linear context bandit]
[Linear dynamic quadratic]
[Mountaincar (2d and 3d)]
[POMDP Maze]
[Optimistic Task]
[Puddleworld]
[Random MDPs]
[Riverswim]

RL Mechanisms

[Attention and Memory]
[Unsupervised learning]
- [GANs]
- [GQN]
- [UNREAL]
[Hierarchical RL]
- [FuNs]
- [Option-Critic]
- [STRAW]
- [h-DQN]
- [Stochastic Neural Networks]
[Multi-agent RL]
[Relational RL]
[Learning to Learn, a.k.a. Meta-Learning]
- [Few/One/Zero-shot Learning]
  - [MAML]
- [Transfer and Multi-Task Learning]
- [Learning to Optimize]
- [Learning to Reinforcement Learn]
- [Learning Combinatorial Optimization]
- [AutoML]

RL Games

Chinook (1997; 2007) for checkers
Deep Blue (2002) for chess
Logistello (1999) for Othello
TD-Gammon (1994) for backgammon
GIB (2001) for contract bridge
MoHex (2017) for Hex
DQN (2016; 2018) for Atari 2600 games
AlphaGo (2016a) and AlphaGo Zero (2017) for Go
Alpha Zero (2017) for chess, shogi, and Go
Cepheus (2015), DeepStack (2017), and Libratus (2017a; b) for heads-up Texas Hold’em poker
Jaderberg et al. (2018) for Quake III Arena Capture the Flag
OpenAI Five for Dota 2 at 5v5 (https://openai.com/five/)
Zambaldi et al. (2018), Sun et al. (2018), and Pang et al. (2018) for StarCraft II

[Board Games]
- [Computer Go]
- [AlphaGo: Training pipeline with MCTS]
- [AlphaGo Zero]
- [Alpha Zero]
[Card Games]
- [DeepStack]
[Video Games]
- [Atari 2600 games]
- [StarCraft]
- [StarCraft II mini-games]
- [Quake III Arena]
- [Minecraft]
- [Super Smash Bros]
- [Doom]
- [ViZDoom]

DRL applied to Robotics

[Sim-to-Real]
- [MuJoCo]
[Imitation Learning]
[Value-based Learning]
[Policy-based Learning]
[Model-based Learning]
[Autonomous Driving Vehicles]

DRL applied to NLP

[Sequence Generation]
[Machine Translation]
[Dialogue Systems]

DRL applied to Vision

[Recognition]
[Motion Analysis]
[Scene Understanding]
[Vision + NLP]
[Visual Control]
[Interactive Perception]
