
bentrevett / pytorch-rl

256 stars · 6 watchers · 74 forks · 56.99 MB

Tutorials for reinforcement learning in PyTorch and Gym by implementing a few of the popular algorithms. [IN PROGRESS]

License: MIT License

Languages: Jupyter Notebook 98.84%, Python 1.16%
pytorch pytorch-tutorial pytorch-implmention pytorch-implementation reinforcement-learning reinforcement-learning-algorithms rl pytorch-tutorials pytorch-rl policy-gradient

pytorch-rl's Introduction

PyTorch Reinforcement Learning

This repo contains tutorials covering reinforcement learning using PyTorch 1.3 and Gym 0.15.4 using Python 3.7.

If you find any mistakes or disagree with any of the explanations, please do not hesitate to submit an issue. I welcome any feedback, positive or negative!

Getting Started

To install PyTorch, see installation instructions on the PyTorch website.

To install Gym, see installation instructions on the Gym GitHub repo.

Tutorials

All tutorials use Monte Carlo methods to train an agent on the CartPole-v1 environment, with the goal of reaching a total episode reward of 475 averaged over the last 25 episodes. There are also alternate versions of some algorithms to show how to use them with other environments.
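
As an illustration of that stopping criterion, a minimal sketch (the exact variable names and training loop differ per notebook; train_one_episode below is a hypothetical stand-in for each notebook's own training step):

    import numpy as np

    REWARD_THRESHOLD = 475
    N_TRIALS = 25
    MAX_EPISODES = 500

    episode_rewards = []
    for episode in range(1, MAX_EPISODES + 1):
        # train_one_episode is hypothetical; each notebook defines its own loop
        episode_rewards.append(train_one_episode())
        mean_reward = np.mean(episode_rewards[-N_TRIALS:])
        if mean_reward >= REWARD_THRESHOLD:
            print(f'Solved after {episode} episodes, mean reward {mean_reward:.1f}')
            break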

Potential algorithms covered in future tutorials: DQN, ACER, ACKTR.

pytorch-rl's People

Contributors

bentrevett

pytorch-rl's Issues

3 - Advantage Actor Critic (A2C) [CartPole].ipynb - Returns do not need to be detached

Hi Ben

Thanks for the interesting notebooks. Upon studying the "3 - Advantage Actor Critic (A2C) [CartPole].ipynb" notebook, I came to the conclusion that detaching the returns in the update_policy() function is not necessary. The returns are calculated only from the rewards, which are environment outputs and therefore not part of the computational graph, so even leaving out the .detach() call should not affect the model. Would you agree?
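
A quick way to check this claim (a minimal sketch, not code from the notebook): a returns tensor built from raw reward floats has requires_grad set to False, so calling .detach() on it has no effect on the graph.

    import torch

    # Rewards come straight from the environment as plain Python floats,
    # so the returns tensor is a leaf with no gradient history.
    rewards = [1.0, 1.0, 1.0]
    returns, R = [], 0
    for r in reversed(rewards):
        R = r + 0.99 * R
        returns.insert(0, R)

    returns = torch.tensor(returns)
    print(returns.requires_grad)  # False: detaching such a tensor changes nothing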

Taking 'done' into consideration while calculating returns

Hello, thank you for making this repo.
I think that while calculating the returns you should take done into consideration, as follows:


    def calculate_returns(self, rewards, dones, normalize=True):
        # Walk backwards through the collected trajectory, resetting the
        # running return whenever an episode boundary (done) is reached.
        returns = []
        R = 0
        for r, d in zip(reversed(rewards), reversed(dones)):
            if d:
                R = 0
            R = r + R * self.gamma
            returns.insert(0, R)

        # device is defined elsewhere in the notebook
        returns = torch.tensor(returns).to(device)

        if normalize:
            returns = (returns - returns.mean()) / returns.std()

        return returns

Also, can you please briefly describe Generalized Advantage Estimation (GAE) as used when calculating the advantages?
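
For reference, the standard GAE computation looks roughly like the sketch below (this is the textbook formulation, not necessarily the exact code in the PPO notebook; rewards, values and dones are assumed to be per-step lists, with values holding one extra bootstrap entry for the final state):

    import torch

    def calculate_gae(rewards, values, dones, gamma=0.99, lam=0.95):
        # values[t + 1] is the bootstrap value of the next state;
        # the mask zeroes it out at episode boundaries.
        advantages = []
        gae = 0.0
        for t in reversed(range(len(rewards))):
            mask = 1.0 - float(dones[t])
            delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
            gae = delta + gamma * lam * mask * gae
            advantages.insert(0, gae)
        return torch.tensor(advantages)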

Adding a sample_action method for ActorCritic

Hello! I've been learning how to code RL from your repo. I've replaced the duplicated code lines from
def train
def update_policy

with an agent method, self.sample_action(). It seems that the agent now solves the CartPole problem about 2x slower (in number of episodes), and this happens every time. I have no idea what is going on with torch and haven't found anything on the Internet.
Can you please help me?

https://github.com/lemikhovalex/pytorch-rl
5_tr - Proximal Policy Optimization (PPO) [CartPole]-Copy1.ipynb
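
For context, a sample_action method of the kind described might look roughly like this (a hypothetical sketch of the refactor, not code from the linked repo; the ActorCritic network and Categorical distribution are used as in the tutorials):

    import torch.nn.functional as F
    from torch.distributions import Categorical

    class Agent:
        def __init__(self, policy):
            self.policy = policy  # the ActorCritic network from the tutorials

        def sample_action(self, state):
            # Run the policy, sample an action from the resulting categorical
            # distribution, and return what the update step needs.
            action_pred, value_pred = self.policy(state)
            action_prob = F.softmax(action_pred, dim=-1)
            dist = Categorical(action_prob)
            action = dist.sample()
            log_prob_action = dist.log_prob(action)
            return action, log_prob_action, value_pred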

actor critic possible error

Hello, Ben!
Thank you for a great tutorial series. I have a question regarding your actor-critic notebook.
In the function update_policy:

def update_policy(returns, log_prob_actions, values, optimizer):

    # Monte Carlo returns are treated as fixed targets
    returns = returns.detach()

    # REINFORCE-style policy gradient loss, weighted by the returns
    policy_loss = - (returns * log_prob_actions).sum()

    # critic regression loss towards the same returns
    value_loss = F.smooth_l1_loss(returns, values).sum()

    optimizer.zero_grad()

    policy_loss.backward()
    value_loss.backward()

    optimizer.step()

    return policy_loss.item(), value_loss.item()

In the policy loss you calculate the usual policy gradient for the agent, and in the value loss you calculate the loss for the critic. They seem to be independent: the critic does not affect the actor at all. Shouldn't the returns for the policy loss be calculated with the values from the critic, or something like that?
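
For comparison, the variant this question is pointing at, where the critic enters the policy loss through an advantage term, might look roughly like the sketch below (a standard advantage actor-critic update, named update_policy_with_advantage here purely for illustration, not a statement about what the notebook intends):

    def update_policy_with_advantage(returns, log_prob_actions, values, optimizer):
        # Advantage = return - baseline; detach so the policy gradient
        # does not flow into the critic through the baseline term.
        advantages = (returns - values).detach()

        policy_loss = - (advantages * log_prob_actions).sum()
        value_loss = F.smooth_l1_loss(returns.detach(), values).sum()

        optimizer.zero_grad()
        (policy_loss + value_loss).backward()
        optimizer.step()

        return policy_loss.item(), value_loss.item()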
