
Arnold

Arnold is a PyTorch implementation of the agent presented in Playing FPS Games with Deep Reinforcement Learning (https://arxiv.org/abs/1609.05521), which won the 2017 edition of the ViZDoom AI Competition.


This repository contains:

  • The source code to train DOOM agents
  • A package with 17 selected maps that can be used for training and evaluation
  • 5 pretrained models that you can visualize and play against, including the ones that won the ViZDoom competition

Installation

Dependencies

Arnold was tested successfully on macOS and Linux distributions. You will need:

  • Python 2/3 with NumPy and OpenCV
  • PyTorch
  • ViZDoom

Follow the instructions at https://github.com/mwydmuch/ViZDoom to install ViZDoom. Make sure you can run import vizdoom in Python from any directory. To do so, you can either install the library with pip, or compile it and then move it to the site-packages directory of your Python installation, as explained here: https://github.com/mwydmuch/ViZDoom/blob/master/doc/Quickstart.md.
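
For instance, the following minimal check (not part of the repository) should succeed from any directory once ViZDoom is correctly installed:

import vizdoom

game = vizdoom.DoomGame()   # creating a game instance confirms the bindings load
print("ViZDoom imported from:", vizdoom.__file__)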

Code structure

.
├── pretrained                    # Examples of pretrained models
├── resources
│   ├── freedoom2.wad             # DOOM resources file (containing all textures)
│   └── scenarios                 # Folder containing all scenarios
│       ├── full_deathmatch.wad   # Scenario containing all deathmatch maps
│       ├── health_gathering.wad  # Simple test scenario
│       └── ...
├── src                           # Source files
│   ├── doom                      # Game interaction / API / scenarios
│   ├── model                     # DQN / DRQN implementations
│   └── trainer                   # Folder containing training scripts
├── arnold.py                     # Main file
└── README.md

Scenarios / Maps

Train a model

There are many parameters you can tune to train a model.

python arnold.py

## General parameters about the game
--freedoom "true"                # use freedoom resources
--height 60                      # screen height
--width 108                      # screen width
--gray "false"                   # use grayscale screen
--use_screen_buffer "true"       # use the screen buffer (what the player sees)
--use_depth_buffer "false"       # use the depth buffer
--labels_mapping ""              # use extra feature maps for specific objects
--game_features "target,enemy"   # game features prediction (auxiliary tasks)
--render_hud "false"             # render the HUD (status bar at the bottom of the screen)
--render_crosshair "true"        # render crosshair (targeting aid in the center of the screen)
--render_weapon "true"           # render weapon
--hist_size 4                    # history size
--frame_skip 4                   # frame skip (1 = keep every frame)

## Agent allowed actions
--action_combinations "attack+move_lr;turn_lr;move_fb"  # agent allowed actions
--freelook "false"               # allow the agent to look up and down
--speed "on"                     # make the agent run
--crouch "off"                   # make the agent crouch

## Training parameters
--batch_size 32                  # batch size
--replay_memory_size 1000000     # maximum number of frames in the replay memory
--start_decay 0                  # epsilon decay iteration start
--stop_decay 1000000             # epsilon decay iteration end
--final_decay 0.1                # final epsilon value
--gamma 0.99                     # discount factor gamma
--dueling_network "false"        # use a dueling architecture
--clip_delta 1.0                 # clip the delta loss
--update_frequency 4             # DQN update frequency
--dropout 0.5                    # dropout on CNN output layer
--optimizer "rmsprop,lr=0.0002"  # network optimizer

## Network architecture
--network_type "dqn_rnn"         # network type (dqn_ff / dqn_rnn)
--recurrence "lstm"              # recurrent network type (rnn / gru / lstm)
--n_rec_layers 1                 # number of layers in the recurrent network
--n_rec_updates 5                # number of updates by sample
--remember 1                     # remember all frames during evaluation
--use_bn "off"                   # use BatchNorm when processing the screen
--variable_dim "32"              # game variables embeddings dimension
--bucket_size "[10, 1]"          # bucket game variables (typically health / ammo)
--hidden_dim 512                 # hidden layers dimension

## Scenario parameters (these parameters will differ based on the scenario)
--scenario "deathmatch"          # scenario
--wad "full_deathmatch"          # WAD file (scenario file)
--map_ids_train "2,3,4,5"        # maps to train the model
--map_ids_test "6,7,8"           # maps to test the model
--n_bots 8                       # number of enemy bots
--randomize_textures "true"      # randomize wall / floor / ceiling textures during training
--init_bots_health 20            # reduce the initial health of enemy bots (helps a lot when using the pistol)

## Various
--exp_name new_train             # experiment name
--dump_freq 200000               # periodically dump the model
--gpu_id -1                      # GPU ID (-1 to run on CPU)
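
For reference, the --start_decay / --stop_decay / --final_decay flags above describe a linear exploration schedule. Here is a minimal sketch of what such a schedule typically computes; the exact formula used in the repository may differ:

def epsilon(iteration, start_decay=0, stop_decay=1000000, final_decay=0.1):
    # Linearly anneal epsilon from 1.0 down to final_decay between
    # start_decay and stop_decay iterations (illustrative helper).
    if iteration <= start_decay:
        return 1.0
    if iteration >= stop_decay:
        return final_decay
    progress = (iteration - start_decay) / float(stop_decay - start_decay)
    return 1.0 + progress * (final_decay - 1.0)

print(epsilon(500000))  # 0.55, halfway through the decay window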

Once your agent is trained, you can visualize it by running the same command, and using the following extra arguments:

--visualize 1                    # visualize the model (render the screen)
--evaluate 1                     # evaluate the agent
--manual_control 1               # manually make the agent turn around when it gets stuck
--reload PATH                    # path where the trained agent was saved

Here are some examples of training commands for 3 different scenarios:

Defend the center

In this scenario, the agent is in the middle of a circular map. Monsters regularly appear along the walls and walk towards the agent. The agent is given a pistol and limited ammo, and must turn around and kill the monsters before they reach it. The following command trains a standard DQN that should reach the optimal performance of 56 frags (the number of bullets in the pistol) in about 4 million steps:

python arnold.py --scenario defend_the_center --action_combinations "turn_lr+attack" --frame_skip 2
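
The --frame_skip flag used here means that each action chosen by the agent is repeated for that many game tics. Below is a rough sketch of the idea using the raw ViZDoom API; the repository wraps this in its own game class, and the config path is only illustrative:

import vizdoom

game = vizdoom.DoomGame()
game.load_config("scenarios/defend_the_center.cfg")  # hypothetical config path
game.init()

frame_skip = 2
action = [1, 0, 0]  # one-hot over the configured buttons, e.g. TURN_LEFT
# make_action repeats the action for frame_skip tics and returns the reward
reward = game.make_action(action, frame_skip)
game.close()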

Health gathering

In this scenario, the agent is walking on lava and loses health points at each time step. The agent has to move and collect as many health packs as possible in order to survive. The objective is to survive for as long as possible.

python arnold.py --scenario health_gathering --action_combinations "move_fb;turn_lr" --frame_skip 5

This scenario is very easy, and the model quickly reaches the maximum survival time of 2 minutes (35 frames per second * 120 seconds = 4200 frames). The scenario also provides a supreme mode, in which the map is more complicated and the health packs are much harder to collect:

python arnold.py --scenario health_gathering --action_combinations "move_fb;turn_lr" --frame_skip 5 --supreme 1

In this scenario, the agent takes about 1.5 million steps to reach the maximum survival time (but often dies before the end).

Deathmatch

In this scenario, the agent is trained to fight against the built-in bots of the game. Here is a command to train the agent using game features prediction (as described in [1]), and a DRQN:

python arnold.py --scenario deathmatch --wad deathmatch_rockets --n_bots 8 \
--action_combinations "move_fb;move_lr;turn_lr;attack" --frame_skip 4 \
--game_features "enemy" --network_type dqn_rnn --recurrence lstm --n_rec_updates 5

Pretrained models

Defend the center / Health gathering

We provide a pretrained model for each of these scenarios. You can visualize them by running:

./run.sh defend_the_center

or

./run.sh health_gathering

Visual Doom AI Competition 2017

We release the two agents submitted to the first and second tracks of the 2017 ViZDoom AI Competition. You can visualize them playing against the built-in bots using the following commands:

Track 1 - Arnold vs 10 built-in AI bots
./run.sh track1 --n_bots 10
Track 2 - Arnold vs 10 built-in AI bots - Map 2
./run.sh track2 --n_bots 10 --map_id 2
Track 2 - 4 Arnold playing against each other - Map 3
./run.sh track2 --n_bots 0 --map_id 3 --n_agents 4

We also trained an agent on a single map, using a single weapon (the SuperShotgun). This agent is extremely difficult to beat.

Shotgun - 4 Arnold playing against each other
./run.sh shotgun --n_bots 0 --n_agents 4
Shotgun - 3 Arnold playing against each other + 1 human player (to play against the agent)
./run.sh shotgun --n_bots 0 --n_agents 3 --human_player 1

References

If you find this code useful, please consider citing:

[1] G. Lample* and D.S. Chaplot*, Playing FPS Games with Deep Reinforcement Learning

@inproceedings{lample2017playing,
  title={Playing FPS Games with Deep Reinforcement Learning},
  author={Lample, Guillaume and Chaplot, Devendra Singh},
  booktitle={Proceedings of AAAI},
  year={2017}
}

[2] D.S. Chaplot* and G. Lample*, Arnold: An Autonomous Agent to Play FPS Games

@inproceedings{chaplot2017arnold,
  title={Arnold: An Autonomous Agent to Play FPS Games},
  author={Chaplot, Devendra Singh and Lample, Guillaume},
  booktitle={Proceedings of AAAI},
  year={2017},
  note={Best Demo Award}
}

Acknowledgements

We acknowledge the developers of ViZDoom for constant help and support during the development of this project. Some of the maps and wad files have been borrowed from the ViZDoom git repository. We also thank the members of the ZDoom community for their help with the Action Code Scripts (ACS).

Contributors

devendrachaplot, glample

Issues

training in deathmatch

Hi, when I run:

python arnold.py --scenario deathmatch --wad deathmatch_rockets --n_bots 8 --action_combinations "move_fb;move_lr;turn_lr;attack" --frame_skip 4 --game_features "enemy" --network_type dqn_rnn --recurrence lstm --n_rec_updates 5

an error happens:
INFO [25900] - 04/19/18 18:46:51 - 0:00:00 - Input shape: (3, 60, 108)
INFO [25900] - 04/19/18 18:46:51 - 0:00:00 - Conv layer output dim : 4608
INFO [25900] - 04/19/18 18:46:51 - 0:00:00 - Hidden layer input dim: 4672
Traceback (most recent call last):
  File "arnold.py", line 24, in <module>
    parse_game_args(remaining + ['--dump_path', dump_path])
  File "/home/wwx/Arnold/src/args.py", line 114, in parse_game_args
    module.main(parser, args)
  File "/home/wwx/Arnold/src/doom/scenarios/deathmatch.py", line 110, in main
    network = get_model_class(params.network_type)(params)
  File "/home/wwx/Arnold/src/model/dqn/recurrent.py", line 62, in __init__
    super(DQNRecurrent, self).__init__(params)
  File "/home/wwx/Arnold/src/model/dqn/base.py", line 119, in __init__
    self.module.cuda()
  File "/home/wwx/anaconda3/envs/doom/lib/python3.5/site-packages/torch/nn/modules/module.py", line 216, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/wwx/anaconda3/envs/doom/lib/python3.5/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/home/wwx/anaconda3/envs/doom/lib/python3.5/site-packages/torch/nn/modules/rnn.py", line 123, in _apply
    self.flatten_parameters()
  File "/home/wwx/anaconda3/envs/doom/lib/python3.5/site-packages/torch/nn/modules/rnn.py", line 111, in flatten_parameters
    params = rnn.get_parameters(fn, handle, fn.weight_buf)
  File "/home/wwx/anaconda3/envs/doom/lib/python3.5/site-packages/torch/backends/cudnn/rnn.py", line 165, in get_parameters
    assert filter_dim_a.prod() == filter_dim_a[0]
AssertionError

Different dimensions between model and checkpoint

I run this command at the terminal:
python arnold.py --scenario deathmatch --wad deathmatch_rockets --n_bots 8 --action_combinations "move_fb;move_lr;turn_lr;attack" --frame_skip 4 --game_features "enemy" --network_type dqn_rnn --recurrence lstm --n_rec_updates 5 --visualize 1 --evaluate 1 --manual_control 1 --reload /home/yang/src/Arnold/pretrained/deathmatch_shotgun.pth
And this error occurs:

Traceback (most recent call last):
  File "arnold.py", line 24, in <module>
    parse_game_args(remaining + ['--dump_path', dump_path])
  File "/home/yang/src/Arnold/src/args.py", line 114, in parse_game_args
    module.main(parser, args)
  File "/home/yang/src/Arnold/src/doom/scenarios/deathmatch.py", line 116, in main
    network.module.load_state_dict(reloaded)
  File "/home/yang/local/anaconda3/envs/py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 519, in load_state_dict
    .format(name, own_state[name].size(), param.size()))
RuntimeError: While copying the parameter named conv.0.weight, whose dimensions in the model are torch.Size([32, 3, 8, 8]) and whose dimensions in the checkpoint are torch.Size([32, 4, 8, 8]).

Do you know why? Thanks a lot.
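
One hedged way to diagnose this kind of mismatch is to compare the parameter shapes stored in the checkpoint with the shapes of the freshly built model: the extra input channel in the checkpoint (4 instead of 3) suggests it was trained with an additional input plane (for instance a labels buffer), so the command-line flags have to match the ones used at training time. For example:

import torch

# Print the parameter names and shapes stored in the checkpoint, to compare
# them with the model summary printed at startup.
checkpoint = torch.load("pretrained/deathmatch_shotgun.pth", map_location="cpu")
for name, tensor in checkpoint.items():
    print(name, tuple(tensor.shape))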

Error While training

When I try to train the model on Kaggle by running:

python arnold.py
--freedoom "true"                # use freedoom resources
--height 60                      # screen height
--width 108                      # screen width
--gray "false"                   # use grayscale screen
--use_screen_buffer "true"       # use the screen buffer (what the player sees)
--use_depth_buffer "false"       # use the depth buffer
--labels_mapping ""              # use extra feature maps for specific objects
--game_features "target,enemy"   # game features prediction (auxiliary tasks)
--render_hud "false"             # render the HUD (status bar in the bottom of the screen)
--render_crosshair "true"        # render crosshair (targeting aid in the center of the screen)
--render_weapon "true"           # render weapon
--hist_size 4                    # history size
--frame_skip 4                   # frame skip (1 = keep every frame)


--action_combinations "attack+move_lr;turn_lr;move_fb"  # agent allowed actions
--freelook "false"               # allow the agent to look up and down
--speed "on"                     # make the agent run
--crouch "off"                   # make the agent crouch


--batch_size 32                  # batch size
--replay_memory_size 1000000     # maximum number of frames in the replay memory
--start_decay 0                  # epsilon decay iteration start
--stop_decay 1000000             # epsilon decay iteration end
--final_decay 0.1                # final epsilon value
--gamma 0.99                     # discount factor gamma
--dueling_network "false"        # use a dueling architecture
--clip_delta 1.0                 # clip the delta loss
--update_frequency 4             # DQN update frequency
--dropout 0.5                    # dropout on CNN output layer
--optimizer "rmsprop,lr=0.0002"  # network optimizer


--network_type "dqn_rnn"         # network type (dqn_ff / dqn_rnn)
--recurrence "lstm"              # recurrent network type (rnn / gru / lstm)
--n_rec_layers 1                 # number of layers in the recurrent network
--n_rec_updates 5                # number of updates by sample
--remember 1                     # remember all frames during evaluation
--use_bn "off"                   # use BatchNorm when processing the screen
--variable_dim "32"              # game variables embeddings dimension
--bucket_size "[10, 1]"          # bucket game variables (typically health / ammo)
--hidden_dim 512                 # hidden layers dimension


--scenario "deathmatch"          # scenario
--wad "full_deathmatch"          # WAD file (scenario file)
--map_ids_train "2,3,4,5"        # maps to train the model
--map_ids_test "6,7,8"           # maps to test the model
--n_bots 8                       # number of enemy bots
--randomize_textures "true"      # randomize walls / floors / ceils textures during training
--init_bots_health 20            # reduce initial life of enemy bots (helps a lot when using pistol)

--exp_name new_train             # experiment name
--dump_freq 200000               # periodically dump the model
--gpu_id -1 

I get the following error.
Can someone help me fix this?

No such file or directory, confusing concatenation?

(I'm using a conda environment on Windows.) When trying to run from the root directory:

python arnold.py --exp_name test --main_dump_path dumped --scenario defend_the_center --frame_skip 2 --action_combinations "turn_lr+attack" --reload pretrained/defend_the_center.pth --evaluate 1 --visualize 1 --gpu_id 0

I get the error No such file or directory: 'dumped\\test\\<testid>\\pretrained/defend_the_center.pth'.
Looking into this, I saw that the paths are concatenated.

Shouldn't the weights be loaded from the pretrained directory instead? E.g.:

model_path = params.reload

What is the expected functionality?
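
As a possible direction (assuming the reload path is joined with the experiment dump path somewhere in the argument handling; the exact location may differ), a hypothetical helper like the following would prefer an existing path given on the command line over one relative to the dump directory:

import os

def resolve_reload_path(reload_arg, dump_path):
    # Hypothetical helper: if the path given on the command line exists,
    # use it directly; otherwise fall back to a path inside the dump folder.
    if reload_arg and os.path.isfile(reload_arg):
        return os.path.abspath(reload_arg)
    return os.path.join(dump_path, reload_arg)

print(resolve_reload_path("pretrained/defend_the_center.pth", "dumped/test"))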

What are the training settings for the track_1 and track_2 models?

First of all, thanks for releasing the source code for training successful Doom agents.
The pretrained models include the winning model from last year's competition. May I know the exact training settings for those two models? Do they require some curriculum learning stages?
Thanks very much.

Could not initialize SDL video:

I tried to run it on Linux (WSL), but things turned out to be more complicated:
Traceback (most recent call last):
  File "arnold.py", line 24, in <module>
    parse_game_args(remaining + ['--dump_path', dump_path])
  File "/mnt/d/yuan/Arnold-master/src/args.py", line 114, in parse_game_args
    module.main(parser, args)
  File "/mnt/d/yuan/Arnold-master/src/doom/scenarios/defend_the_center.py", line 90, in main
    evaluate_defend_the_center(game, network, params)
  File "/mnt/d/yuan/Arnold-master/src/doom/scenarios/defend_the_center.py", line 107, in evaluate_defend_the_center
    game.start(map_id=map_id, episode_time=params.episode_time, log_events=True)
  File "/mnt/d/yuan/Arnold-master/src/doom/game.py", line 491, in start
    self.game.init()
vizdoom.vizdoom.ViZDoomErrorException: Could not initialize SDL video:
No available video device

PyTorch 0.4.0 not supported

I tested the code with PyTorch 0.4.0 under Windows, and it raises an error because of the changed 'Variable' API in PyTorch 0.4.0. Then I tested with PyTorch 0.3.1, and it works fine.
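
For context, PyTorch 0.4.0 merged Variable into Tensor and removed the volatile flag, which breaks code written against the 0.3.x API. Roughly, the change looks like this (a sketch, not a patch for this repository):

import torch

# PyTorch <= 0.3.x style (what this code base expects):
#   from torch.autograd import Variable
#   x = Variable(torch.zeros(1, 3, 60, 108), volatile=True)

# PyTorch >= 0.4.0 style: tensors are used directly,
# and torch.no_grad() replaces volatile=True at evaluation time.
x = torch.zeros(1, 3, 60, 108)
with torch.no_grad():
    y = x * 2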

AssertionError

Traceback (most recent call last):
  File "arnold.py", line 24, in <module>
    parse_game_args(remaining + ['--dump_path', dump_path])
  File "/home/ricardoliu/Arnold/src/args.py", line 114, in parse_game_args
    module.main(parser, args)
  File "/home/ricardoliu/Arnold/src/doom/scenarios/health_gathering.py", line 101, in main
    evaluate_health_gathering(game, network, params)
  File "/home/ricardoliu/Arnold/src/doom/scenarios/health_gathering.py", line 144, in evaluate_health_gathering
    action = network.next_action(last_states)
  File "/home/ricardoliu/Arnold/src/model/dqn/base.py", line 193, in next_action
    scores, pred_features = self.f_eval(last_states)
  File "/home/ricardoliu/Arnold/src/model/dqn/feedforward.py", line 51, in f_eval
    [variables[-1, i] for i in range(self.params.n_variables)]
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ricardoliu/Arnold/src/model/dqn/feedforward.py", line 27, in forward
    for x in x_variables)
AssertionError

How to combine LSTM and experience replay

I notice that you use a replay buffer with experience decomposed into independent screens, while processing these screens with an LSTM layer. From my point of view, an LSTM relies strongly on temporal sequences and seems unsuitable for this kind of usage. Did I miss anything?
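
For reference, the usual way to reconcile recurrence with experience replay (and what the --n_rec_updates flag suggests) is to store whole episodes in the replay memory and sample short contiguous sub-sequences, unrolling the LSTM over each sampled sequence rather than over isolated frames. A rough sketch of such a sampler, not the repository's exact implementation:

import random

def sample_sequences(episodes, batch_size, seq_len):
    # Illustrative sampler: each episode is a list of transitions; the LSTM
    # is unrolled over the contiguous chunk returned for each batch element.
    long_enough = [ep for ep in episodes if len(ep) >= seq_len]
    batch = []
    for _ in range(batch_size):
        ep = random.choice(long_enough)
        start = random.randint(0, len(ep) - seq_len)
        batch.append(ep[start:start + seq_len])
    return batch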

FileNotFoundError:

When I download the code and run arnold.py using PyCharm on Windows, I get:

FileNotFoundError: [Errno 2] No such file or directory: 'D:\PyCharmproject\Arnold-master\Arnold-master\dumped\default\vhzihaaocq\train.log'

Can this program run successfully on Windows 10?

Question about batch size

Hello glample, I want to know why the default batch size is set to 32. Is there any special meaning? Have you tried a larger batch size for training? @glample

Problems comparing the code to the article "Playing FPS Games with Deep Reinforcement Learning"

As far as I understand, according to the article there are supposed to be two neural networks: one DQN for the navigation phase and one DRQN for the action phase and feature extraction. In the code I don't see the option of loading two networks simultaneously. Also, in the next_action function I don't see any use of the game features extracted from the DRQN.
