mpo's Introduction

MPO (Maximum a Posteriori Policy Optimization)

PyTorch implementation of MPO (papers cited below), written with the help of other repositories (also cited below).

Policy evaluation is done using Retrace.
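
For reference, the Retrace target (Munos et al., 2016) that this kind of policy-evaluation step computes can be written as below; the exact form used in mpo.py (for example, the choice of λ) may differ in detail:

Q^{ret}(x_t, a_t) = Q(x_t, a_t) + \sum_{s \ge t} \gamma^{s-t} \left( \prod_{i=t+1}^{s} c_i \right) \left[ r_s + \gamma \, \mathbb{E}_{\pi} Q(x_{s+1}, \cdot) - Q(x_s, a_s) \right],
\qquad c_i = \lambda \min\!\left( 1, \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)} \right)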

Currently, only discrete Gym environments are supported.

Usage

Look at main.py for examples of using MPO.

The architectures for Actor and Critic can be changed in mpo_net.py.
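
As an illustration of what such Actor and Critic networks can look like for a discrete action space, here is a minimal PyTorch sketch; the class layouts and layer sizes below are assumptions for illustration, not the actual definitions in mpo_net.py.

import torch
import torch.nn as nn

# Illustrative only: a minimal Actor/Critic pair for a discrete action space.
# The real architectures live in mpo_net.py and may differ from this sketch.

class Actor(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        # Action probabilities for a categorical policy.
        return torch.softmax(self.net(obs), dim=-1)


class Critic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        # Q(s, a) for every discrete action a.
        return self.net(obs)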

Citations

  • Maximum a Posteriori Policy Optimisation (Original MPO algorithm)

https://arxiv.org/abs/1806.06920

  • Relative Entropy Regularized Policy Iteration (Improved MPO algorithm)

https://arxiv.org/abs/1812.02256

  • daisatojp's mpo GitHub repository (MPO implementation used as a reference)

https://github.com/daisatojp/mpo

  • OpenAI's ACER GitHub repository (replay buffer implementation used as a reference)

https://github.com/openai/baselines/tree/master/baselines/acer

Training Results

MPO on LunarLander-v2 (training-curve plot)

  • 5 parallel environments

MPO on Acrobot-v1 (training-curve plot)

  • 5 parallel environments

mpo's People

Contributors

acyclics

mpo's Issues

q_ret update not used

I have enjoyed your really clean implementation of MPO. Thank you for making it available. I was looking at the critic update and think I may have spotted a bug. You update q_ret on line 163 according to Retrace, but as far as I can see, you do not actually use it anywhere. I think you might want to use it recursively on line 161 in place of q_retraces[step + 1].

MPO/mpo.py

Lines 160 to 163 in c84bf23

for step in reversed(range(nsteps)):
    q_ret = reward_batch[step] + self.γ * q_retraces[step + 1] * (1 - done_batch[step + 1])
    q_retraces[step] = q_ret
    q_ret = (rho_i[step] * (q_retraces[step] - q_i[step])) + val[step]
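
For context, a sketch of how the recursion could carry q_ret backward, as suggested above; the bootstrap initialisation from q_retraces[nsteps] is an assumption about the surrounding code, not something visible in the quoted lines:

q_ret = q_retraces[nsteps]  # bootstrap from the stored final value (assumption)
for step in reversed(range(nsteps)):
    # use the corrected q_ret from the later step instead of q_retraces[step + 1]
    q_ret = reward_batch[step] + self.γ * q_ret * (1 - done_batch[step + 1])
    q_retraces[step] = q_ret
    # Retrace correction: pull the target toward the current estimate q_i with the
    # truncated importance weight rho_i before propagating it to the earlier step
    q_ret = (rho_i[step] * (q_retraces[step] - q_i[step])) + val[step]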

My 'loss_l' goes to 1.837 and the model never improves

Using the code, loss_l goes to 1.837 and never improves from there.
I notice that when loss_l reaches 1.837, the values of kl_μ and kl_Σ are both zero.

I'm using your code here to train a model in MuJoCo. I wonder if some of the constraints should have different values or if there is a more obvious explanation for what is going wrong here.
