inoryy / reaver
Reaver: Modular Deep Reinforcement Learning Framework. Focused on StarCraft II. Supports Gym, Atari, and MuJoCo.
License: MIT License
First, if I run `python -m reaver.run --env MoveToBeacon --agent a2c --n_envs 4 2> stderr.log`
I get: UnimplementedError (see above for traceback): Generic conv implementation only supports NHWC tensor format for now.
So I changed line 67 in run.py to `if not int(args.gpu)`.
After that, this problem seems to be solved, but I get another error whenever the game finishes loading: ValueError: Argument is out of range for 12/Attack_screen (3/queued [2]; 0/screen [0, 0]), got: [[1], [8, 40]]
The argument that is out of range is not the same each time. Is there something I overlooked? Thanks.
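For reference, the fix described above amounts to choosing the conv data format based on the device. A minimal sketch (the function name is hypothetical, not reaver's actual API):

```python
def conv_data_format(gpu_flag: int) -> str:
    """Pick the conv data format from the device.

    NCHW ('channels_first') is the fast path on GPU via cuDNN, but
    TF 1.x CPU conv/bias kernels only implement NHWC, which is why
    forcing NCHW on a CPU-only run raises UnimplementedError.
    """
    return "channels_first" if int(gpu_flag) else "channels_last"

print(conv_data_format(0))  # channels_last on CPU
```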
Most likely due to faulty masking of unavailable actions. Maybe clipping probs to 1e-6 is a bad idea?
Related to #7
Dear Inorry,
Thanks your sharing, I can learn a lot. Now I have trained four minigames and they are consistent with your results. But the other three minigames can not run. the error is id 1/id 17 unknown. I use gtx1080 ti , ubuntu16.04. I wonder if it has something to do with it?
Hey,
I wanted to ask about the calculation in the sample function:
return tf.argmax(tf.log(u) / probs, axis=1)
It divides by probs. Does that mean that lower probabilities have a better chance of being picked? Better exploration?
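To answer empirically: no, lower probabilities are picked less often. Since log(u) is negative for u ~ Uniform(0, 1), argmax(log(u)/p_i) equals argmin(-log(u)/p_i), and -log(u)/p_i is an Exponential(rate p_i) draw, so index i wins with probability p_i (an "exponential race", from the same family as the Gumbel-max trick). A NumPy sketch of the same computation:

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.1, 0.3, 0.6])

# Same computation as tf.argmax(tf.log(u) / probs, axis=1),
# vectorized over 200k trials.
u = rng.random((200_000, 3))
samples = np.argmax(np.log(u) / probs, axis=1)

freqs = np.bincount(samples, minlength=3) / len(samples)
print(freqs)  # approximately [0.1, 0.3, 0.6]
```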
The current default is essentially /dev/null, which is probably not the expected behavior for people trying to run reaver from inside their own codebase.
Hello, I'm trying to run the script with these flags, which essentially specify both screen and feature observations:
```python
parser = argparse.ArgumentParser()
parser.add_argument("--gpu", type=int, default=0)
parser.add_argument("--sz", type=int, default=32)
parser.add_argument("--feature_screen_size", type=int, default=84)
parser.add_argument("--feature_minimap_size", type=int, default=64)
# action space: features 1, rgb 2 (needed if both rgb and features are on)
parser.add_argument("--action_space", type=str, default='features')
parser.add_argument("--rgb_screen_size", type=str, default="120")
parser.add_argument("--rgb_minimap_size", type=str, default="64")
parser.add_argument("--envs", type=int, default=32)
parser.add_argument("--render", type=int, default=1)
parser.add_argument("--steps", type=int, default=16)
parser.add_argument("--updates", type=int, default=1000000)
parser.add_argument('--lr', type=float, default=7e-4)
parser.add_argument('--vf_coef', type=float, default=0.25)
parser.add_argument('--ent_coef', type=float, default=1e-3)
parser.add_argument('--discount', type=float, default=0.99)
parser.add_argument('--clip_grads', type=float, default=1.)
parser.add_argument("--run_id", type=int, default=-1)
parser.add_argument("--map", type=str, default='MoveToBeacon')
parser.add_argument("--cfg_path", type=str, default='config.json.dist')
parser.add_argument("--test", type=bool, nargs='?', const=True, default=False)
parser.add_argument("--restore", type=bool, nargs='?', const=True, default=False)
parser.add_argument('--save_replay', type=bool, nargs='?', const=True, default=False)
```
but I got this error:
```
    return [self._preprocess(obs, _type) for _type in ['screen', 'minimap'] + self.feats['non_spatial']]
  File "/home/dstefanidis/starcraft_codes/pysc2-rl-agent/common/config.py", line 106, in _preprocess
    spatial = [[ob[_type][f.index] for f in self._feats(_type)] for ob in obs]
  File "/home/dstefanidis/starcraft_codes/pysc2-rl-agent/common/config.py", line 106, in <listcomp>
    spatial = [[ob[_type][f.index] for f in self._feats(_type)] for ob in obs]
  File "/home/dstefanidis/starcraft_codes/pysc2-rl-agent/common/config.py", line 106, in <listcomp>
    spatial = [[ob[_type][f.index] for f in self._feats(_type)] for ob in obs]
KeyError: 'screen'
```
Hello,
Can you please elaborate further on how you recorded the full-graphics replay? I am currently using SC2 Linux version 4.1.2 and have been trying to watch the full-graphics replay on Windows by logging into Battle.net, but I keep failing, presumably because of the version difference. You briefly mentioned it in the README, but could you explain in a little more detail how you did it?
Thank you.
There's no os.fork() on Windows, so it seems that when I launch a new worker it re-creates the ProcEnv object, which no longer has access to the MultiProcEnv shared memory reference. Need to either rewrite how I pass the reference or temporarily implement message-based communication instead for Windows.
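A minimal sketch of the message-based alternative, using a multiprocessing.Pipe so the child constructs its own env instead of inheriting shared state. Names and the toy env here are hypothetical, not reaver's actual implementation:

```python
import multiprocessing as mp

def env_worker(conn, env_fn):
    """Worker loop: build the env inside the child, then serve
    (cmd, data) messages. Works under Windows 'spawn', where children
    cannot inherit fork-shared state."""
    env = env_fn()
    while True:
        cmd, data = conn.recv()
        if cmd == "reset":
            conn.send(env.reset())
        elif cmd == "step":
            conn.send(env.step(data))
        elif cmd == "close":
            conn.close()
            break

class ToyEnv:
    """Stand-in env for illustration."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, a):
        self.t += a
        return self.t

if __name__ == "__main__":
    parent, child = mp.Pipe()
    p = mp.Process(target=env_worker, args=(child, ToyEnv))
    p.start()
    parent.send(("reset", None)); print(parent.recv())  # 0
    parent.send(("step", 3)); print(parent.recv())      # 3
    parent.send(("close", None)); p.join()
```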
What does the parameter "--sz" mean in main.py?
Thank you for always kindly answering.
The computer suddenly stopped during training, so I tried to use the restore function to resume from the results so far. However, I see in the log file that loading no longer progresses and stops at one point.
In this case, which part is the problem?
From Dohyeong
I received some errors when running the code.
InvalidArgumentError (see above for traceback): CPU BiasOp only supports NHWC. [[Node: Conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](Conv/convolution, Conv/biases/read)]]
I want to use 2 GPUs, so I modified args.gpu to 2.
Hey, I created a conda env to test reaver, and when I tried the command for MoveToBeacon I hit the logger issue. I followed your hotfix, but I end up with weird errors depending on the agent I specify (A2C/PPO).
For PPO, for instance, I get this error:
```
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\tf\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\ProgramData\Anaconda3\envs\tf\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\reaver\envs\base\msg_multiproc.py", line 48, in _run
    obs = self._env.reset()
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\reaver\envs\sc2.py", line 73, in reset
    obs, reward, done = self.obs_wrapper(self._env.reset())
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\reaver\envs\sc2.py", line 130, in __call__
    obs['feature_screen'][self.feature_masks['screen']],
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\pysc2\lib\named_array.py", line 145, in __getitem__
    index = _get_index(obj, index)
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\pysc2\lib\named_array.py", line 207, in _get_index
    "Can't index by type: %s; only int, string or slice" % type(index))
TypeError: Can't index by type: <class 'list'>; only int, string or slice
```
I love your work anyway and am looking forward to a fix, thanks.
Need to investigate whether clipping or scaling rewards improves performance.
Does it even make sense if I'm already clipping grads?
How will the agent know that one action is better than another if both get reward = 1?
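A sketch of what reward scaling (as opposed to clipping) could look like: dividing by a running standard deviation, so relative magnitudes survive, unlike clip-to-[-1, 1]. This is an illustration using Welford's online variance, not reaver's actual implementation:

```python
class RewardScaler:
    """Scale rewards by a running std (Welford's online algorithm)."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def __call__(self, r):
        self.n += 1
        d = r - self.mean
        self.mean += d / self.n
        self.m2 += d * (r - self.mean)
        std = (self.m2 / self.n) ** 0.5 if self.n > 1 else 1.0
        return r / max(std, 1e-8)

scaler = RewardScaler()
print([round(scaler(r), 3) for r in [0.0, 2.0, 4.0]])
```

Unlike clipping, a reward of 4 still comes out larger than a reward of 2 here, which addresses the "both get reward = 1" concern.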
Hi there! Thanks for your great work!
But I've hit an unexpected problem; my environment is Windows 10.
My code is as follows:

```python
import reaver as rvr
from multiprocessing import Process

if __name__ == '__main__':
    p = Process()
    p.start()
    env = rvr.envs.SC2Env(map_name='MoveToBeacon')
    agent = rvr.agents.A2C(env.obs_spec(), env.act_spec(),
                           rvr.models.build_fully_conv, rvr.models.SC2MultiPolicy, n_envs=1)
    agent.run(env)
```
But I get the traceback below, and the Marine just won't move anywhere:
```
Process Process-2:
Traceback (most recent call last):
  File "C:\Users\Saber\Anaconda3\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Users\Saber\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Saber\Anaconda3\lib\site-packages\reaver\envs\base\multiproc.py", line 52, in _run
    obs = self._env.reset()
  File "C:\Users\Saber\Anaconda3\lib\site-packages\reaver\envs\sc2.py", line 69, in reset
    obs, reward, done = self.obs_wrapper(self._env.reset())
  File "C:\Users\Saber\Anaconda3\lib\site-packages\reaver\envs\sc2.py", line 126, in __call__
    obs['feature_screen'][self.feature_masks['screen']],
  File "C:\Users\Saber\Anaconda3\lib\site-packages\pysc2\lib\named_array.py", line 145, in __getitem__
    index = _get_index(obj, index)
  File "C:\Users\Saber\Anaconda3\lib\site-packages\pysc2\lib\named_array.py", line 207, in _get_index
    "Can't index by type: %s; only int, string or slice" % type(index))
TypeError: Can't index by type: <class 'list'>; only int, string or slice
```
And I also get stuck on 'CartPole-v0': nothing is shown even after waiting quite a while. My code is:

```python
import reaver as rvr
from multiprocessing import Process

if __name__ == '__main__':
    p = Process()
    p.start()
    env = rvr.envs.GymEnv('CartPole-v0')
    agent = rvr.agents.A2C(env.obs_spec(), env.act_spec())
    agent.run(env)
```
Any ideas about this? Thanks!
Hi, I just wanted to reproduce the reported results by downloading the zip files from the releases.
I ran into two issues:
Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint.
Thanks again!
Hi,
Thank you very much for the nice open-source project! After installation, I get a `tf.summary.FileWriter is not compatible with eager execution` error when I try `agent = rvr.agents.A2C(env.obs_spec(), env.act_spec(), rvr.models.build_fully_conv, rvr.models.SC2MultiPolicy, n_envs=4)`. I think this is a TensorFlow version issue. I wonder how you handled it!
Thanks
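One possible workaround sketch for this error, assuming a TF 2.x install: reaver targets TF 1.x graph mode, so eager execution can be disabled through the compat API before the agent is constructed. This is a generic TF workaround, not a documented reaver fix:

```python
import tensorflow.compat.v1 as tf

# Must run before any graphs, sessions, or summary writers are built.
tf.disable_eager_execution()
```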
First, I followed the install instructions to install both reaver and pysc2 from source.
The error has changed: I realized I did not have TensorFlow Probability installed. However, I'm still receiving an error.
When running `import reaver` I receive the following error:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/__init__.py", line 1, in <module>
    import reaver.envs
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/__init__.py", line 6, in <module>
    from .gym import GymEnv
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/gym.py", line 3, in <module>
    from reaver.envs.atari import AtariPreprocessing
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/atari.py", line 29, in <module>
    import gin.tf
  File "/home/hf/.local/lib/python3.6/site-packages/gin/tf/__init__.py", line 20, in <module>
    from gin.tf.utils import GinConfigSaverHook
  File "/home/hf/.local/lib/python3.6/site-packages/gin/tf/utils.py", line 34, in <module>
    config.register_file_reader(tf.io.gfile.GFile, tf.io.gfile.exists)
AttributeError: module 'tensorflow._api.v1.io' has no attribute 'gfile'
```
====================
OLD ERROR:
When running `import reaver` I receive the following error:

```
  File "<stdin>", line 1, in <module>
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/__init__.py", line 1, in <module>
    import reaver.envs
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/__init__.py", line 2, in <module>
    from .sc2 import SC2Env
  File "/home/hf/.local/lib/python3.6/site-packages/reaver/envs/sc2.py", line 5, in <module>
    from pysc2.lib import actions
ModuleNotFoundError: No module named 'pysc2.lib'
```
However I have no problems with 'import pysc2'
Need to try adding a max-pooling layer to the model.
Intuitively the agent might benefit from spatial translation invariance on some maps like DefeatRoaches.
Why doesn't DeepMind use it?
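For intuition: 2x2 max-pooling keeps only the strongest activation per block, so a feature's exact pixel position matters less — that is the translation-invariance argument. A plain-NumPy sketch of the operation (not the model code):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pool over an NHWC tensor; H and W must be even."""
    n, h, w, c = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
print(max_pool_2x2(x)[0, :, :, 0])  # [[ 5.  7.] [13. 15.]]
```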
Thank you for the great code. When I tried new maps, I found a problem in runner.py. When there is more than one env, one env finishes before the others and then restarts its game. By the time all envs are done, the calculated reward covers many episodes, which gives a much bigger number. If you understand what I am describing, please tell me whether this is a real problem.
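The accounting issue described above — summing rewards across auto-restarted episodes — can be avoided with per-env bookkeeping: accumulate each env's reward separately and record it the moment that env's episode ends. A hypothetical sketch, not the actual runner.py logic:

```python
import numpy as np

class EpisodeTracker:
    """Track per-env episode returns across vectorized envs."""
    def __init__(self, n_envs):
        self.running = np.zeros(n_envs)
        self.finished = []  # one entry per completed episode

    def update(self, rewards, dones):
        self.running += rewards
        for i, done in enumerate(dones):
            if done:
                self.finished.append(self.running[i])
                self.running[i] = 0.0  # env auto-restarts a fresh episode
```

With this, averaging over `finished` gives a per-episode mean regardless of how many times each env restarted.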
Thank you for the great release. I am trying to train an agent on CollectMineralShards but cannot reproduce the reported performance. I made several attempts but only reach reward = 75 at 100k steps. Are there any config parameters I should change? Thanks~
Hi, inoryy.
How did you set your random seed, as shown in your learning curves?
When I run the demo code shown on the readme page, an error occurs as below:
RuntimeError: v1.summary.FileWriter is not compatible with eager execution. Use tf.summary.create_file_writer, or a `with v1.Graph().as_default():` context.
env.yml for conda

Hello, thank you for sharing good code.
I am trying to solve the DefeatRoaches minigame using a Relational Network.
I found an example of a Transformer for MNIST classification and modified the fully_conv.py file based on it. Unlike the original code, I only use the screen feature, without the minimap feature. But the result is still not good.
Could you give me a recommendation on how to modify it to reach DeepMind's performance?
Thank you.
From Dohyeong
Implementation code : https://github.com/kimbring2/pysc2_transformer/blob/master/graph_network.py
Seems using a single, separate variable for the (log?) standard deviation is more popular than making it part of the network, e.g. (Schulman et al., 2015). Should probably use that approach instead of the current implementation, at least while comparing algorithms against baselines.
Can't use tf.get_variable() though; this goes away in 2.0.
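A sketch of the state-independent variant: the network outputs only the mean, while log_std is a single free parameter shared across states. NumPy illustration under that assumption, not the current reaver code:

```python
import numpy as np

class GaussianHead:
    """Diagonal Gaussian policy head with state-independent log std."""
    def __init__(self, act_dim):
        # A single trainable vector, NOT produced by the network.
        self.log_std = np.zeros(act_dim)

    def sample(self, mean, rng):
        return mean + np.exp(self.log_std) * rng.standard_normal(mean.shape)

head = GaussianHead(act_dim=2)
rng = np.random.default_rng(0)
a = head.sample(np.zeros((4, 2)), rng)
print(a.shape)  # (4, 2)
```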
Need to update the code for PySC2 v2.0. Maybe TensorFlow too?
Related to #6
After carefully inspecting the code, I don't see a way to specify the replay directory or flag. However, in previous versions it seems this functionality was included.
I have received some errors when running the code. But I don't know why this happens.
```
Process Process-1:
Traceback (most recent call last):
  File "C:\Anaconda3\lib\multiprocessing\process.py", line 252, in _bootstrap
    self.run()
  File "C:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "E:\liupenghui\pysc2-rl-agent-master\common\env.py", line 22, in worker
    env = env_fn_wrapper.x()
  File "E:\liupenghui\pysc2-rl-agent-master\common\env.py", line 14, in _thunk
    env = sc2_env.SC2Env(**params)
  File "C:\Anaconda3\lib\site-packages\pysc2\env\sc2_env.py", line 132, in __init__
    self._setup((agent_race, bot_race, difficulty), **kwargs)
  File "C:\Anaconda3\lib\site-packages\pysc2\env\sc2_env.py", line 173, in _setup
    self.run_config = run_configs.get()
  File "C:\Anaconda3\lib\site-packages\pysc2\run_configs\__init__.py", line 38, in get
    if FLAGS.sc2_run_config is None:  # Find the highest priority as default.
  File "C:\Anaconda3\lib\site-packages\absl\flags\_flagvalues.py", line 488, in __getattr__
    raise _exceptions.UnparsedFlagAccessError(error_message)
absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --sc2_run_config before flags were parsed.
```
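The usual workaround sketch for this UnparsedFlagAccessError: pysc2 reads absl flags when the env is created, so parse them first when driving it from your own script. This is generic absl usage, not a reaver-specific fix:

```python
import sys
from absl import flags

# Parse (possibly empty) argv before calling sc2_env.SC2Env(...).
flags.FLAGS(sys.argv)
```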
Hi @inoryy, in addition to the results on these minigames, I notice there aren't any results on BuildMarines. May I ask if there is an update or a planned follow-up?
By the way, awesome repo!
At the last step, it shows:

ERROR: ld.so: object '/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

My code:

```python
import reaver as rvr

env = rvr.envs.SC2Env(map_name='MoveToBeacon')
agent = rvr.agents.A2C(env.obs_spec(), env.act_spec(), rvr.models.build_fully_conv, rvr.models.SC2MultiPolicy, n_envs=1)
agent.run(env)
```

1st error: ../pysc2/lib/features.py:737: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.

2nd error:

```
.../pysc2/lib/named_array.py", line 208, in _get_index
    "Can't index by type: %s; only int, string or slice" % type(index))
```
I received some errors when running the code:
OOM when allocating tensor with shape [512, 1850, 32, 32]
So I would like to ask how much memory is needed to run this code?
Thank you.
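For scale, the failing tensor alone is already several GiB at float32 (4 bytes per element), before counting activations, gradients, and optimizer state:

```python
# [512, 1850, 32, 32] tensor at float32
elems = 512 * 1850 * 32 * 32
gib = elems * 4 / 2**30
print(round(gib, 2))  # → 3.61 GiB for this single tensor
```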
Hi, thank you for the great reaver.
I tested run.py with --env MoveToBeacon --agent ppo --n_envs 1 on macOS without a GPU, but get the following error:
```
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/reaver/envs/base/shm_multiproc.py", line 48, in _run
    obs, rew, done = self._env.step(data)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/reaver/envs/sc2.py", line 87, in step
    obs, reward, done = self.obs_wrapper(self._env.step(self.act_wrapper(action)))
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/lib/stopwatch.py", line 212, in _stopwatch
    return func(*args, **kwargs)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/env/sc2_env.py", line 537, in step
    actions = [[f.transform_action(o.observation, a, skip_available=skip)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/env/sc2_env.py", line 537, in <listcomp>
    actions = [[f.transform_action(o.observation, a, skip_available=skip)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/env/sc2_env.py", line 537, in <listcomp>
    actions = [[f.transform_action(o.observation, a, skip_available=skip)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/lib/stopwatch.py", line 212, in _stopwatch
    return func(*args, **kwargs)
  File "/Users/xx/sc2/venv/lib/python3.8/site-packages/pysc2/lib/features.py", line 1608, in transform_action
    raise ValueError("Function %s/%s is currently not available" % (
ValueError: Function 331/Move_screen is currently not available
```
The action id in the error is not the same every time.
Then I ran `python3 -m pysc2.bin.agent --map MoveToBeacon`, with this result:
I1023 10:11:01.933034 4632368576 sc2_env.py:506] Starting episode 1: [terran] on MoveToBeacon
0/no_op ()
1/move_camera (1/minimap [64, 64])
2/select_point (6/select_point_act [4]; 0/screen [84, 84])
3/select_rect (7/select_add [2]; 0/screen [84, 84]; 2/screen2 [84, 84])
4/select_control_group (4/control_group_act [5]; 5/control_group_id [10])
7/select_army (7/select_add [2])
453/Stop_quick (3/queued [2])
451/Smart_screen (3/queued [2]; 0/screen [84, 84])
452/Smart_minimap (3/queued [2]; 1/minimap [64, 64])
331/Move_screen (3/queued [2]; 0/screen [84, 84])
332/Move_minimap (3/queued [2]; 1/minimap [64, 64])
333/Patrol_screen (3/queued [2]; 0/screen [84, 84])
334/Patrol_minimap (3/queued [2]; 1/minimap [64, 64])
12/Attack_screen (3/queued [2]; 0/screen [84, 84])
13/Attack_minimap (3/queued [2]; 1/minimap [64, 64])
274/HoldPosition_quick (3/queued [2])
My pysc2 version is 3.0.0 and my reaver version is 2.1.9.
I set ensure_available_actions=False and it works, but I don't think it's a good idea.
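A sketch of what proper masking could look like instead of skipping the availability check: zero out probabilities of actions pysc2 reports as unavailable and renormalize before sampling. Purely illustrative NumPy, not reaver's code:

```python
import numpy as np

def masked_sample(probs, available_ids, rng):
    """Sample an action id, restricted to the currently available set."""
    mask = np.zeros_like(probs)
    mask[available_ids] = 1.0
    masked = probs * mask
    masked = masked / masked.sum()  # assumes at least one action is available
    return rng.choice(len(probs), p=masked)

rng = np.random.default_rng(0)
probs = np.array([0.4, 0.1, 0.3, 0.2])
picks = {masked_sample(probs, [0, 2], rng) for _ in range(100)}
print(picks)  # only available ids (0 and 2) are ever sampled
```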
Hi @inoryy,
I am trying to figure out how to use the plot.py file in your utils folder; I want more data on the experiment I am running.
Sorry if this is basic, but could you explain how to use this util file?
Do I call it from the command line or from a Python file?
Hello,
I am trying to apply Relational Network in DefeatRoaches environment using the code you uploaded.
The size of the screen feature is 16; is that too small, hurting performance?
I would like to know whether the performance graph shown on the web page was produced with this screen feature size.
From Dohyeong Kim
Hello!
I am new to Reinforcement Learning and really wanted to implement a model that can play the game by itself, and I found your awesome project!
Is there a way to make it play itself, and to speed it up?