random-network-distillation-pytorch's Introduction

Random Network Distillation

This is a PyTorch implementation of Random Network Distillation (RND) on Montezuma's Revenge.
Paper: https://arxiv.org/abs/1810.12894
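The core mechanism from the paper is a fixed, randomly initialized target network plus a predictor network trained to reproduce the target's outputs on observations the agent visits. The predictor's error on an observation is used as the intrinsic reward, so frequently visited states produce small bonuses and unfamiliar ones produce larger bonuses. The sketch below illustrates only that idea, using tiny linear "networks" in pure Python; the helper names (`make_linear`, `intrinsic_reward`, `train_predictor`) and the toy observations are hypothetical and are not this repository's PyTorch models.

```python
import random

# Hypothetical tiny linear "networks" for illustration only; the repository
# uses PyTorch models on game frames instead.
def make_linear(n_in, n_out, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def intrinsic_reward(predictor, target, x):
    """Mean squared error between predictor and (fixed) target outputs."""
    p, t = forward(predictor, x), forward(target, x)
    return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)


def train_predictor(predictor, target, x, lr=0.5):
    """One gradient step on the MSE; only the predictor is updated."""
    p, t = forward(predictor, x), forward(target, x)
    for j, row in enumerate(predictor):
        grad_out = 2.0 * (p[j] - t[j]) / len(t)
        for i in range(len(row)):
            row[i] -= lr * grad_out * x[i]


rng = random.Random(0)
target = make_linear(4, 8, rng)     # random and never trained
predictor = make_linear(4, 8, rng)  # trained to imitate the target

seen = [1.0, 0.0, 0.5, -0.5]   # a state the agent keeps visiting
novel = [-1.0, 2.0, 0.0, 1.0]  # a state it has never visited

before = intrinsic_reward(predictor, target, seen)
for _ in range(50):
    train_predictor(predictor, target, seen)
after = intrinsic_reward(predictor, target, seen)
print(f"seen:  {before:.4f} -> {after:.6f}")
print(f"novel: {intrinsic_reward(predictor, target, novel):.4f}")
```

In this toy setup, repeated training drives the seen state's bonus toward zero while the unvisited state keeps a comparatively large error, which is the signal RND uses to reward exploration.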

Setup

To run the program, first install the required packages by executing:

$ pip3 install -r requirements.txt

Play

Run the program with the pretrained model to watch the agent play:

$ python3 main.py --play --path models/pretrained_model.pth

Figure: policy entropy during training (x-axis: train steps, y-axis: entropy).
The entropy decreases over training: the agent starts with completely random movements and, after training, has learned a more focused but still stochastic policy.
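The plotted quantity is the Shannon entropy of the policy's action distribution. PPO implementations commonly add it to the objective, weighted by an entropy coefficient (listed in the settings table), to discourage the policy from collapsing too early. A minimal sketch of the entropy computation; the action count of 18 is an assumption chosen for illustration.

```python
import math


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


num_actions = 18  # assumed action count, for illustration only
uniform = [1.0 / num_actions] * num_actions               # fully random policy
peaked = [0.9] + [0.1 / (num_actions - 1)] * (num_actions - 1)  # trained-like

print(round(categorical_entropy(uniform), 4))  # ln(18) ≈ 2.8904
print(round(categorical_entropy(peaked), 4))
```

A uniform policy has the maximum entropy, ln(number of actions); a policy that concentrates probability on one action has lower, but still nonzero, entropy, which matches the downward trend described for the plot.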

The pretrained model models/pretrained_model.pth was obtained by training with the following settings:

Variable                                    Value
------------------------------------------  ------
environment type                            "MR"
number of train steps                       11400
normalization steps parameter               1000
number of environments                      64
number of epochs                            4
agent steps (rollout)                       128
number of mini-batches                      2
learning rate                               0.0001
discount factor                             0.999
intrinsic discount factor                   0.99
lambda (generalized advantage estimation)   0.95
clip (PPO)                                  0.1
value loss coefficient                      0.5
entropy coefficient                         0.001
predictor's update proportion               0.25
intrinsic advantage coefficient             1
extrinsic advantage coefficient             2
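Several of these settings fit together in an RND-style PPO update. Following the RND paper, extrinsic and intrinsic rewards get separate generalized advantage estimates with their own discount factors, the intrinsic stream is treated as non-episodic (episode ends do not cut it off), and the two advantages are combined using the advantage coefficients before entering PPO's clipped objective. The pure-Python sketch below shows that flow under those assumptions; the function names and the 4-step rollout numbers are hypothetical and not taken from this repository.

```python
import math


def gae(rewards, values, next_value, dones, gamma, lam):
    """Generalized advantage estimation for one environment's rollout.

    dones[t] == 1.0 means the episode ended after step t, so the next
    state's value is not bootstrapped across that boundary.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * v_next * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip):
    """Clipped PPO surrogate, negated so it can be minimized."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Settings from the table above.
GAMMA_EXT, GAMMA_INT, LAM, CLIP = 0.999, 0.99, 0.95, 0.1
EXT_COEF, INT_COEF = 2.0, 1.0

# Toy 4-step rollout (made-up numbers for illustration).
ext_rewards = [0.0, 0.0, 0.0, 1.0]
int_rewards = [0.3, 0.1, 0.2, 0.05]
dones = [0.0, 0.0, 0.0, 1.0]
ext_values, int_values = [0.1, 0.2, 0.4, 0.8], [0.5, 0.4, 0.3, 0.2]

ext_adv = gae(ext_rewards, ext_values, 0.0, dones, GAMMA_EXT, LAM)
# Intrinsic returns are treated as non-episodic, so done flags are ignored.
int_adv = gae(int_rewards, int_values, 0.25, [0.0] * len(dones), GAMMA_INT, LAM)
advantages = [EXT_COEF * e + INT_COEF * i for e, i in zip(ext_adv, int_adv)]

loss = ppo_clip_loss([-1.2, -0.9, -1.5, -0.7], [-1.3, -1.0, -1.4, -0.9],
                     advantages, CLIP)
print([round(a, 3) for a in advantages], round(loss, 4))
```

The clip value bounds how far the probability ratio between the new and old policy can move an update in the favorable direction; with clip 0.1, the ratio's contribution is capped at 1.1 for positive advantages and floored at 0.9 for negative ones.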

Train

You can train a model from scratch with the following command. Any variable you don't specify takes the default value listed in the table above. The save_int variable sets the interval at which model checkpoints are saved.

Several useful training plots are logged in TensorBoard format during training.

$ python3 main.py --train --num_env 64 --train_steps 12000 --predictor_update_p 0.25 --num_pre_norm_steps 10 --game_steps 128 --num_epoch 4 --mini_batch 2 --save_int 100
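Two flags in this command appear to correspond to RND-specific settings from the table: --predictor_update_p to "predictor's update proportion" and --num_pre_norm_steps to "normalization steps parameter" (this mapping is an assumption based on the names). In the RND paper, only a random fraction of each batch is used to train the predictor, and observations are normalized with running statistics that are first initialized by stepping a random agent before optimization begins, then clipped to [-5, 5]. The sketch below illustrates both ideas in pure Python; `RunningMeanStd` and `masked_predictor_loss` are hypothetical helpers, statistics are tracked for a single scalar rather than per pixel, and the observation values are made up.

```python
import math
import random


class RunningMeanStd:
    """Running mean/variance using the parallel (batch-merge) update."""

    def __init__(self):
        self.mean, self.var, self.count = 0.0, 1.0, 0.0

    def update(self, batch):
        b_count = len(batch)
        b_mean = sum(batch) / b_count
        b_var = sum((x - b_mean) ** 2 for x in batch) / b_count
        delta = b_mean - self.mean
        total = self.count + b_count
        m2 = (self.var * self.count + b_var * b_count
              + delta ** 2 * self.count * b_count / total)
        self.mean += delta * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x, clip=5.0):
        # Whiten with the running statistics, then clip to [-clip, clip].
        z = (x - self.mean) / math.sqrt(self.var + 1e-8)
        return max(-clip, min(clip, z))


def masked_predictor_loss(per_sample_losses, proportion, rng):
    """Average the predictor loss over a random subset of the samples."""
    mask = [1.0 if rng.random() < proportion else 0.0 for _ in per_sample_losses]
    kept = sum(mask)
    return sum(l * m for l, m in zip(per_sample_losses, mask)) / max(kept, 1.0)


rng = random.Random(0)

# Warm-up before training: feed observations from a random policy
# (a stand-in for the pre-normalization steps; values are made up).
obs_rms = RunningMeanStd()
for _ in range(10):
    obs_rms.update([rng.gauss(100.0, 20.0) for _ in range(64)])
print(round(obs_rms.normalize(130.0), 2))

losses = [rng.random() for _ in range(256)]
print(masked_predictor_loss(losses, 0.25, rng))
```

Training the predictor on only a fraction of the data slows how quickly it "catches up" with the target, which the paper uses to keep the intrinsic reward informative when many parallel environments are generating data.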

Train from a checkpoint:

$ python3 main.py --train --path logs/desired_checkpoint --num_env 64 --train_steps 12000 --predictor_update_p 0.25 --num_pre_norm_steps 10 --game_steps 128 --num_epoch 4 --mini_batch 2 --save_int 100

random-network-distillation-pytorch's People

Contributors

justkim, dependabot[bot]

random-network-distillation-pytorch's Issues

Reference for why the 'extra' dense layer is used in the value network

Hi @Justkim!

Thank you for this implementation :) I have a question. In your model implementation you use an extra dense layer in a way that resembles a residual connection. Why is that? Couldn't we simply make the value head deeper (add one more dense layer without the skip connection)? Could you point me to a reference that shows this helps and discusses why?

I'm talking about this line:

predicted_int_value = self.int_value(F.relu(self.extra(x)) + x)[:, 0]

Thanks,
Piotr
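The quoted line computes the intrinsic value as self.int_value(F.relu(self.extra(x)) + x): a dense layer with a ReLU, added back onto its input (a skip connection), followed by the output layer. The pure-Python sketch below mirrors that computation for a single feature vector with toy weights; the real line operates on a batch and takes column 0, and the helper names are hypothetical. One property it makes visible: if the extra layer outputs zeros, the head reduces exactly to a plain linear value head on x, so the extra layer learns a correction on top of an identity path instead of replacing it. The source does not say whether this was the author's motivation.

```python
def linear(weights, bias, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, vi) for vi in v]


def int_value_head(x, extra_w, extra_b, out_w, out_b):
    # Mirrors: self.int_value(F.relu(self.extra(x)) + x)[:, 0]
    h = [r + xi for r, xi in zip(relu(linear(extra_w, extra_b, x)), x)]
    return linear([out_w], [out_b], h)[0]


x = [0.5, -1.0, 2.0]
out_w, out_b = [0.1, 0.2, 0.3], 0.05
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

# With a zeroed extra layer, the skip path leaves only a linear head on x.
print(int_value_head(x, zero_w, zero_b, out_w, out_b))
print(linear([out_w], [out_b], x)[0])
```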
