
universe-starter-agent's Introduction

This repository has been deprecated in favor of the Retro (https://github.com/openai/retro) library. See our Retro Contest (https://blog.openai.com/retro-contest) blog post for details.

universe-starter-agent

The codebase implements a starter agent that can solve a number of universe environments. It contains a basic implementation of the A3C algorithm, adapted for real-time environments.

Dependencies

Python 3.5, TensorFlow, six, gym[atari], universe, OpenCV, numpy, and scipy, plus the system packages tmux, htop, cmake, golang, and libjpeg (the Getting Started commands below install all of these).

Getting Started

conda create --name universe-starter-agent python=3.5
source activate universe-starter-agent

brew install tmux htop cmake golang libjpeg-turbo      # On Linux use sudo apt-get install -y tmux htop cmake golang libjpeg-dev

pip install "gym[atari]"
pip install universe
pip install six
pip install tensorflow
conda install -y -c https://conda.binstar.org/menpo opencv3
conda install -y numpy
conda install -y scipy

Add the following to your .bashrc so that you'll have the correct environment when the train.py script spawns new bash shells:

source activate universe-starter-agent

Atari Pong

python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong

The command above will train an agent on Atari Pong using the ALE simulator. It will spawn two workers that learn in parallel (the --num-workers flag) and will output intermediate results into the given log directory.

The code will launch the following processes:

  • worker-0 - a process that runs policy gradient
  • worker-1 - a process identical to worker-0 that uses different random noise from the environment
  • ps - the parameter server, which synchronizes the parameters among the different workers
  • tb - a tensorboard process for convenient display of the statistics of learning

Once you start the training process, it will create a tmux session with a window for each of these processes. You can connect to them by typing tmux a in the console. Once in the tmux session, you can see all your windows with ctrl-b w. To switch to window number 0, type: ctrl-b 0. Look up tmux documentation for more commands.

To access TensorBoard to see various monitoring metrics of the agent, open http://localhost:12345/ in a browser.

Using 16 workers, the agent should be able to solve PongDeterministic-v3 (not VNC) within 30 minutes (often less) on an m4.10xlarge instance. Using 32 workers, the agent is able to solve the same environment in 10 minutes on an m4.16xlarge instance. If you run this experiment on a high-end MacBook Pro, the above job will take just under 2 hours to solve Pong.

Add the '--visualise' toggle if you want to visualise the worker using env.render(), as follows:

python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong --visualise

[image: pong]

For best performance, it is recommended that the number of workers not exceed the number of available CPU cores.

You can stop the experiment with the tmux kill-session command.

Playing games over remote desktop

The main difference from the previous experiment is that now we are going to play the game over the VNC protocol. The VNC environments are hosted on the EC2 cloud and have an interface that's different from a conventional Atari Gym environment; luckily, with the help of several wrappers (used within the envs.py file) the experience should look to the agent as if it were playing locally. The problem itself is more difficult because the observations and actions are delayed by the latency of the network.

More interestingly, you can also peek at what the agent is doing with a VNCViewer.

Note that the default behavior of train.py is to start the remotes on the local machine. Take a look at https://github.com/openai/universe/blob/master/doc/remotes.rst for documentation on managing your remotes. Pass the additional -r flag to point to pre-existing instances.
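For example, assuming you already have a remote running, you can point the agent at it like this (the vnc://host:vnc_port+rewarder_port remotes format is described in the documentation linked above; the host and ports here are illustrative):

python train.py --num-workers 2 --env-id gym-core.PongDeterministic-v3 --log-dir /tmp/vncpong -r vnc://localhost:5900+15900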

VNC Pong

python train.py --num-workers 2 --env-id gym-core.PongDeterministic-v3 --log-dir /tmp/vncpong

Peeking into the agent's environment with TurboVNC

You can use your system viewer (open vnc://localhost:5900 or open vnc://${docker_ip}:5900) or connect TurboVNC to that ip/port. The VNC password is "openai".

[image: pong]

Important caveats

One of the novel challenges in using Universe environments is that they operate in real time, and in addition, it takes time for the environment to transmit the observation to the agent. This creates a lag: the greater the lag, the harder it is to solve the environment with today's RL algorithms. Thus, to get the best possible results it is necessary to reduce the lag, which can be achieved by having both the environments and the agent live on the same high-speed computer network. For example, if you have a fast local network, you could host the environments on one set of machines and the agent on another machine that can speak to the environments with low latency. Alternatively, you can run the environments and the agent in the same EC2/Azure region. Other configurations tend to have greater lag.

To keep track of your lag, look for the phrase reaction_time in stderr. If you run both the agent and the environment on nearby machines in the cloud, your reaction_time should be as low as 40ms. The reaction_time statistic is printed to stderr because we wrap our environment with the Logger wrapper, as done here.
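For reference, attaching the wrapper looks roughly like this (a minimal sketch, assuming the universe.wrappers.Logger interface that this repository's envs.py uses; the env id is just an example):

import gym
import universe  # registers the universe environments
from universe import wrappers

env = gym.make('gym-core.PongDeterministic-v3')
env = wrappers.Logger(env)  # periodically logs reaction_time and other stats to stderr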

Generally speaking, the environments most affected by lag are games that place a lot of emphasis on reaction time. For example, this agent is able to solve VNC Pong (gym-core.PongDeterministic-v3) in under 2 hours when both the agent and the environment are co-located in the cloud, but it had difficulty solving VNC Pong when the environment was in the cloud and the agent was not.

A note on tuning

This implementation has been tuned to do well on VNC Pong, and we do not guarantee its performance on other tasks. It is meant as a starting point.

Playing flash games

You may run the following command to launch the agent on the game Neon Race:

python train.py --num-workers 2 --env-id flashgames.NeonRace-v0 --log-dir /tmp/neonrace

What the agent sees when playing Neon Race (you can connect to this view via the note above): [image: neon]

Getting 80% of the maximal score takes between 1 and 2 hours with 16 workers, and getting to 100% of the score takes about 12 hours. Also, flash games are run at 5fps by default, so it should be possible to productively use 16 workers on a machine with 8 (and possibly even 4) cores.

Next steps

Now that you have seen an example agent, develop agents of your own. We hope that you will find doing so to be an exciting and enjoyable task.

universe-starter-agent's People

Contributors

ajbouh, avital, baoblackcoal, calebelston, conchylicultor, drjimfan, eddiepierce, ematvey, futurely, gdb, html5cat, ilyasu123, jenstimmerman, jietang, jkramar, jonathathan, juggernaut93, louiehelm, matthewsbarnes, michaelblume, moaazsidat, nhdaly, nottombrown, odellus, openai-sys-okta-integration, rafaljozefowicz, tlbtlbtlb, tscohen, yaroslavvb

universe-starter-agent's Issues

tmux can't find pane

Using Python 3.5 on Ubuntu 16.04, with tmux 2.1. It may be an issue with the tmux version; which version did you use for the experiment?

❯ python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong
mkdir -p /tmp/pong
tmux kill-session
tmux new-session -s a3c -n ps -d
tmux new-window -t a3c -n w-0
tmux new-window -t a3c -n w-1
tmux new-window -t a3c -n tb
tmux new-window -t a3c -n htop
sleep 1
tmux send-keys -t ps 'CUDA_VISIBLE_DEVICES= /home/dom/conda/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name ps' Enter
tmux send-keys -t w-0 'CUDA_VISIBLE_DEVICES= /home/dom/conda/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name worker --task 0 --remotes 1' Enter
tmux send-keys -t w-1 'CUDA_VISIBLE_DEVICES= /home/dom/conda/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name worker --task 1 --remotes 1' Enter
tmux send-keys -t tb 'tensorboard --logdir /tmp/pong --port 12345' Enter
tmux send-keys -t htop 'htop' Enter
can't find pane ps
can't find pane w-0
can't find pane w-1
can't find pane tb
can't find pane htop

Need some examples for running universe-starter-agent locally and default passwd=openai

Some examples of running universe-starter-agent locally would be helpful.
For example: python train.py --num-workers 1 -r 1 --env-id flashgames.NeonRace-v0 --log-dir /tmp/neonrace
Also, for the end user to be able to watch through VNC, the password is "openai".
One more point: on macOS the default viewer is Screen Sharing, which sometimes does not work properly, whereas https://www.realvnc.com/download/viewer/macosx/ works better.

cv2.resize() incredibly slow

For processing the frames, I see that cv2.resize() is being used. I tried running the starter code on both a c4.4xlarge (8 workers) and an m4.10xlarge (16 workers) instance. On both, my global_step/sec sits at only around 120. If I use scipy's resize instead of cv2.resize(), I get much better performance (around 1000 steps/sec). Running perf top, I see a lot of time spent in libgomp.so. I tried setting OMP_NUM_THREADS=1, but the problem seems to persist. Can someone please explain how to fix this?
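One workaround worth trying (a sketch; it assumes the time in libgomp.so comes from OpenCV's internal OpenMP thread pool, which OMP_NUM_THREADS does not always control):

import cv2
cv2.setNumThreads(0)  # make OpenCV ops such as cv2.resize() run single-threaded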

some threads die occasionally

Some threads die, and it varies from run to run. With the same code, sometimes everything runs fine; other times several threads break.
the command I use is
python train.py --num-workers 32 --env-id PongDeterministic-v3 --log-dir /tmp/pong

I found two situations in which a thread will die:

First: FailedPreconditionError: Attempting to use uninitialized value local/l2/W
Second: a traceback ending in Queue.Empty, which also causes Attempting to use uninitialized value

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
[2017-01-04 20:42:26,465] Writing logs to file: /tmp/universe-31816.log
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: illidan-gpu-3
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: illidan-gpu-3
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 375.26.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.26  Thu Dec  8 18:36:43 PST 2016
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 375.26.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:293] kernel version seems to match DSO: 375.26.0
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12222}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:12223, 1 -> 127.0.0.1:12224, 2 -> 127.0.0.1:12225, 3 -> 127.0.0.1:12226, 4 -> 127.0.0.1:12227, 5 -> 127.0.0.1:12228, 6 -> 127.0.0.1:12229, 7 -> 127.0.0.1:12230, 8 -> 127.0.0.1:12231, 9 -> 127.0.0.1:12232, 10 -> 127.0.0.1:12233, 11 -> localhost:12234, 12 -> 127.0.0.1:12235, 13 -> 127.0.0.1:12236, 14 -> 127.0.0.1:12237, 15 -> 127.0.0.1:12238, 16 -> 127.0.0.1:12239, 17 -> 127.0.0.1:12240, 18 -> 127.0.0.1:12241, 19 -> 127.0.0.1:12242, 20 -> 127.0.0.1:12243, 21 -> 127.0.0.1:12244, 22 -> 127.0.0.1:12245, 23 -> 127.0.0.1:12246, 24 -> 127.0.0.1:12247, 25 -> 127.0.0.1:12248, 26 -> 127.0.0.1:12249, 27 -> 127.0.0.1:12250, 28 -> 127.0.0.1:12251, 29 -> 127.0.0.1:12252, 30 -> 127.0.0.1:12253, 31 -> 127.0.0.1:12254}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:206] Started server with target: grpc://localhost:12234
[2017-01-04 20:42:26,535] Making new env: PongDeterministic-v3
[2017-01-04 20:42:31,470] Events directory: /tmp/pong/train_11
[2017-01-04 20:42:31,606] Starting session. If this hangs, we're mostly likely waiting to connect to the parameter server. One common cause is that the parameter server DNS name isn't resolving yet, or is misspecified.
I tensorflow/core/distributed_runtime/master_session.cc:928] Start master session a05a01fed67b6549 with config:
device_filters: "/job:ps"
device_filters: "/job:worker/task:11/cpu:0"

I tensorflow/core/distributed_runtime/master_session.cc:928] Start master session eba1a87afe6c8f45 with config:
device_filters: "/job:ps"
device_filters: "/job:worker/task:11/cpu:0"

[2017-01-04 20:43:06,143] Resetting environment
[2017-01-04 20:43:06,150] Starting training at step=35960
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/linkaixi/anaconda2/envs/tensorflow11/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/linkaixi/Dropbox/code/a3c.py", line 91, in run
    self._run()
  File "/home/linkaixi/Dropbox/code/a3c.py", line 100, in _run
    self.queue.put(next(rollout_provider), timeout=600.0)
  File "/home/linkaixi/Dropbox/code/a3c.py", line 121, in env_runner
    fetched = policy.act(last_state, *last_features)
  File "/home/linkaixi/Dropbox/code/model.py", line 83, in act
    {self.x: [ob], self.state_in[0]: c, self.state_in[1]: h})
  File "/home/linkaixi/anaconda2/envs/tensorflow11/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 717, in run
    run_metadata_ptr)
  File "/home/linkaixi/anaconda2/envs/tensorflow11/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 915, in _run
    feed_dict_string, options, run_metadata)
  File "/home/linkaixi/anaconda2/envs/tensorflow11/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _do_run
    target_list, options, run_metadata)
  File "/home/linkaixi/anaconda2/envs/tensorflow11/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 985, in _do_call
    raise type(e)(node_def, op, message)
FailedPreconditionError: Attempting to use uninitialized value local/value/b
         [[Node: local/value/b/read = Identity[T=DT_FLOAT, _class=["loc:@local/value/b"], _device="/job:worker/replica:0/task:11/cpu:0"](local/value/b)]]

Traceback (most recent call last):
  File "worker.py", line 122, in <module>
    server.join()
  File "/home/linkaixi/anaconda2/envs/tensorflow11/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "worker.py", line 114, in main
    if args.job_name == "worker":
  File "worker.py", line 61, in run
    trainer.process(sess)
  File "/home/linkaixi/Dropbox/code/a3c.py", line 257, in process
    rollout = self.pull_batch_from_queue()
  File "/home/linkaixi/Dropbox/code/a3c.py", line 241, in pull_batch_from_queue
    rollout = self.runner.queue.get(timeout=600.0)
  File "/home/linkaixi/anaconda2/envs/tensorflow11/lib/python2.7/Queue.py", line 176, in get
    raise Empty
Queue.Empty

The only thing I changed is replacing the tmux interface with nohup in train.py, and sometimes it runs fine for all threads. Code attached: train.py.zip

Is this expected, or am I missing something? Any suggestions on how to fix this? Thanks a lot!

Persisting Docker Images

I was having an issue with viewing my environments through VNC using open vnc://localhost:5900 and have found that it is an issue with old Docker containers persisting. If you have multiple Docker containers stored locally the localhost port iteratively increases, i.e. 5901, 5902, 5903...

I've been using tmux kill-session to terminate my agents, but it looks like that does not terminate the local Docker containers as well. Restarting Docker seems to clean out old containers, but the containers are still hosted starting at localhost:5901 and up, not localhost:5900.
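One blunt way to clear out the stale containers (a destructive sketch: this stops and removes every container on the machine, so use with care):

docker rm -f $(docker ps -aq)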

Agent stops working after a while

I am using a 16-CPU EC2 instance and running 8 workers on it. Eventually my rewards go to zero, and when I connect with VNC to check, I see no action being taken by the worker. After a while, I get the following info; I don't have any idea what may cause it. My error looks like this:

universe-RaMjBj-0 | [2017-01-08 18:30:19,623] [INFO:universe.pyprofile] [pyprofile] period=5.02s timers={"rewarder.sleep": {"std": "142.90us", "mean": "16.24ms", "calls": 301}, "rewarder.frame": {"std": "
23.78us", "mean": "16.79ms", "calls": 301}, "reward.parsing.score": {"std": "12.80us", "mean": "281.21us", "calls": 23}, "score.crop_cache.get.MatchImage": {"std": "2.96us", "mean": "47.54us", "calls": 23
}, "rewarder.compute_reward": {"std": "124.65us", "mean": "243.00us", "calls": 301}, "reward.parsing.gameover": {"std": "5.12us", "mean": "103.48us", "calls": 23}, "vnc_env.VNCEnv.vnc_session.step": {"std
": "17.97us", "mean": "72.58us", "calls": 301}, "score.crop_cache.get.OCRScorerV0": {"std": "3.24us", "mean": "84.88us", "calls": 23}} counters={"agent_conn.reward": {"std": 0, "mean": 0.0, "calls": 1}, "
reward.vnc.updates.n": {"std": 0.26609850880794444, "mean": 0.07641196013289042, "calls": 301}, "score.crop_cache.hit.MatchImage": {"std": 0.0, "mean": 1.0, "calls": 23}, "score.crop_cache.hit.OCRScorerV0
": {"std": 0.0, "mean": 1.0, "calls": 23}} gauges={"reward_parser.score.last_score": {"std": 0.0, "mean": 41.0, "calls": 23, "value": 41.0}} (export_time=108.96us)
universe-RaMjBj-0 | [2017-01-08 18:30:19,623] [INFO:universe.rewarder.remote] [Rewarder] Over past 5.02s, sent 1 reward messages to agent: reward=0 reward_min=0 reward_max=0 done=False info={'rewarder.vnc
.updates.bytes': 0, 'rewarder.vnc.updates.n': 0, 'rewarder.vnc.updates.pixels': 0, 'rewarder.profile': '<1573 bytes>'}
universe-RaMjBj-0 | [2017-01-08 18:30:24,639] [INFO:universe.wrappers.logger] Stats for the past 5.02s: vnc_updates_ps=4.6 n=1 reaction_time=None observation_lag=None action_lag=None reward_ps=0.0 reward_
total=0.0 vnc_bytes_ps[total]=5930.2 vnc_pixels_ps[total]=43993.2 reward_lag=None rewarder_message_lag=None fps=60.01
universe-RaMjBj-0 | [2017-01-08 18:30:24,640] [INFO:universe.pyprofile] [pyprofile] period=5.02s timers={"rewarder.sleep": {"std": "155.25us", "mean": "16.23ms", "calls": 301}, "rewarder.frame": {"std": "
24.60us", "mean": "16.79ms", "calls": 301}, "reward.parsing.score": {"std": "12.91us", "mean": "280.01us", "calls": 23}, "score.crop_cache.get.MatchImage": {"std": "1.40us", "mean": "47.06us", "calls": 23
}, "rewarder.compute_reward": {"std": "126.64us", "mean": "243.67us", "calls": 301}, "reward.parsing.gameover": {"std": "3.80us", "mean": "101.80us", "calls": 23}, "vnc_env.VNCEnv.vnc_session.step": {"std
": "47.10us", "mean": "74.96us", "calls": 301}, "score.crop_cache.get.OCRScorerV0": {"std": "4.33us", "mean": "85.29us", "calls": 23}} counters={"agent_conn.reward": {"std": 0, "mean": 0.0, "calls": 1}, "
reward.vnc.updates.n": {"std": 0.2660985088079445, "mean": 0.07641196013289045, "calls": 301}, "score.crop_cache.hit.MatchImage": {"std": 0.0, "mean": 1.0, "calls": 23}, "score.crop_cache.hit.OCRScorerV0"
: {"std": 0.0, "mean": 1.0, "calls": 23}} gauges={"reward_parser.score.last_score": {"std": 0.0, "mean": 41.0, "calls": 23, "value": 41.0}} (export_time=107.53us)
universe-RaMjBj-0 | [2017-01-08 18:30:24,640] [INFO:universe.rewarder.remote] [Rewarder] Over past 5.02s, sent 1 reward messages to agent: reward=0 reward_min=0 reward_max=0 done=False info={'rewarder.vnc
.updates.bytes': 0, 'rewarder.vnc.updates.n': 0, 'rewarder.vnc.updates.pixels': 0, 'rewarder.profile': '<1572 bytes>'}
universe-RaMjBj-0 | [2017-01-08 18:30:29,656] [INFO:universe.wrappers.logger] Stats for the past 5.02s: vnc_updates_ps=4.6 n=1 reaction_time=None observation_lag=None action_lag=None reward_ps=0.0 reward_
total=0.0 vnc_bytes_ps[total]=6052.5 vnc_pixels_ps[total]=44785.4 reward_lag=None rewarder_message_lag=None fps=60.01
universe-RaMjBj-0 | [2017-01-08 18:30:29,656] [INFO:universe.pyprofile] [pyprofile] period=5.02s timers={"rewarder.sleep": {"std": "143.80us", "mean": "16.23ms", "calls": 301}, "rewarder.frame": {"std": "
22.46us", "mean": "16.79ms", "calls": 301}, "reward.parsing.score": {"std": "15.82us", "mean": "283.26us", "calls": 23}, "score.crop_cache.get.MatchImage": {"std": "6.98us", "mean": "49.24us", "calls": 23
}, "rewarder.compute_reward": {"std": "121.48us", "mean": "243.62us", "calls": 301}, "reward.parsing.gameover": {"std": "8.45us", "mean": "106.37us", "calls": 23}, "vnc_env.VNCEnv.vnc_session.step": {"std
": "17.72us", "mean": "73.30us", "calls": 301}, "score.crop_cache.get.OCRScorerV0": {"std": "3.60us", "mean": "85.27us", "calls": 23}} counters={"agent_conn.reward": {"std": 0, "mean": 0.0, "calls": 1}, "
reward.vnc.updates.n": {"std": 0.2660985088079444, "mean": 0.07641196013289046, "calls": 301}, "score.crop_cache.hit.MatchImage": {"std": 0.0, "mean": 1.0, "calls": 23}, "score.crop_cache.hit.OCRScorerV0"
: {"std": 0.0, "mean": 1.0, "calls": 23}} gauges={"reward_parser.score.last_score": {"std": 0.0, "mean": 41.0, "calls": 23, "value": 41.0}} (export_time=102.52us)
universe-RaMjBj-0 | [2017-01-08 18:30:29,657] [INFO:universe.rewarder.remote] [Rewarder] Over past 5.02s, sent 1 reward messages to agent: reward=0 reward_min=0 reward_max=0 done=False info={'rewarder.vnc
.updates.bytes': 0, 'rewarder.vnc.updates.n': 0, 'rewarder.vnc.updates.pixels': 0, 'rewarder.profile': '<1578 bytes>'}

After watching this inaction on all my workers for a while, I got this error:

universe-U6zfHu-0 | [2017-01-08 18:35:52,590] [INFO:universe.rewarder.remote] [Rewarder] Over past 5.02s, sent 1 reward messages to agent: reward=0 reward_min=0 reward_max=0 done=False info={'rewarder.vnc.updates.bytes': 0, 'rewarder.vnc.updates.n': 0, 'rewarder.profile': '<1577 bytes>', 'rewarder.vnc.updates.pixels': 0}
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *47 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *48 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *49 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *50 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *51 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *52 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *53 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *54 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *55 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [nginx] 2017/01/08 18:35:56 [info] 62#62: *56 client closed connection while waiting for request, client: 172.17.0.1, server: 0.0.0.0:15900
universe-U6zfHu-0 | [2017-01-08 18:35:57,606] [INFO:universe.wrappers.logger] Stats for the past 5.02s: vnc_updates_ps=4.6 n=1 reaction_time=None observation_lag=None action_lag=None reward_ps=0.0 reward_total=0.0 vnc_bytes_ps[total]=5908.2 vnc_pixels_ps[total]=43801.3 reward_lag=None rewarder_message_lag=None fps=60.01
universe-U6zfHu-0 | [2017-01-08 18:35:57,606] [INFO:universe.pyprofile] [pyprofile] period=5.02s timers={"score.crop_cache.get.OCRScorerV0": {"mean": "85.79us", "calls": 23, "std": "7.72us"}, "score.crop_cache.get.MatchImage": {"mean": "49.62us", "calls": 23, "std": "2.81us"}, "rewarder.compute_reward": {"mean": "249.79us", "calls": 301, "std": "121.38us"}, "rewarder.sleep": {"mean": "16.23ms", "calls": 301, "std": "139.39us"}, "vnc_env.VNCEnv.vnc_session.step": {"mean": "75.64us", "calls": 301, "std": "18.30us"}, "rewarder.frame": {"mean": "16.79ms", "calls": 301, "std": "30.29us"}, "rewarder_protocol.latency.rtt.skew_unadjusted": {"mean": "1.21ms", "calls": 10, "std": "491.26us"}, "reward.parsing.score": {"mean": "278.84us", "calls": 23, "std": "12.59us"}, "reward.parsing.gameover": {"mean": "107.23us", "calls": 23, "std": "6.68us"}} counters={"agent_conn.reward": {"std": 0, "calls": 1, "mean": 0.0}, "reward.vnc.updates.n": {"std": 0.26609850880794456, "calls": 301, "mean": 0.0764119601328904}, "rewarder_protocol.messages": {"std": 0.0, "calls": 10, "mean": 1.0}, "rewarder_protocol.messages.v0.control.ping": {"std": 0.0, "calls": 10, "mean": 1.0}, "score.crop_cache.hit.MatchImage": {"std": 0.0, "calls": 23, "mean": 1.0}, "score.crop_cache.hit.OCRScorerV0": {"std": 0.0, "calls": 23, "mean": 1.0}} gauges={"reward_parser.score.last_score": {"mean": 41.0, "std": 0.0, "calls": 23, "value": 41.0}} (export_time=116.83us)
universe-U6zfHu-0 | [2017-01-08 18:35:57,606] [INFO:universe.rewarder.remote] [Rewarder] Over past 5.02s, sent 1 reward messages to agent: reward=0 reward_min=0 reward_max=0 done=False info={'rewarder.vnc.updates.bytes': 0, 'rewarder.vnc.updates.n': 0, 'rewarder.profile': '<1904 bytes>', 'rewarder.vnc.updates.pixels': 0}
Traceback (most recent call last):
  File "worker.py", line 123, in <module>
    tf.app.run()
  File "/home/ubuntu/anaconda3/envs/agents/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "worker.py", line 115, in main
    run(args, server)
  File "worker.py", line 62, in run
    trainer.process(sess)
  File "/home/ubuntu/Downloads/universe-starter-agent/a3c.py", line 257, in process
    rollout = self.pull_batch_from_queue()
  File "/home/ubuntu/Downloads/universe-starter-agent/a3c.py", line 241, in pull_batch_from_queue
    rollout = self.runner.queue.get(timeout=600.0)
  File "/home/ubuntu/anaconda3/envs/agents/lib/python3.5/queue.py", line 172, in get
    raise Empty
queue.Empty
[2017-01-08 18:35:59,661] Killing and removing container: id=8d5388841769fd6245a92c3b017ee048654af9a0c79867354276b7706a3c8e73

entropy prematurely plummeting

I have been playing around with the starter code for quite some time. I wanted to tweak the architecture a bit and check whether I can get the same results as the original A3C paper. Instead of an LSTM I use just an FC layer after the convolutional layers, and 84x84 images instead of 42x42. The only changes I make to the existing code are the following:

class FCPolicy(object):
    def __init__(self, ob_space, ac_space):
        self.x = x = tf.placeholder(tf.float32, [None] + list(ob_space))
        x = tf.nn.elu(conv2d(x, 16, "l{}".format(1), [8, 8], [4, 4], pad='VALID'))
        x = tf.nn.elu(conv2d(x, 32, "l{}".format(2), [4, 4], [2, 2], pad='VALID'))
        x = flatten(x)
        fc = tf.nn.elu(linear(x, 256, 'fc', normalized_columns_initializer(1.0)))
        self.logits = linear(fc, ac_space, "action", normalized_columns_initializer(0.01))
        self.vf = tf.reshape(linear(fc, 1, "value", normalized_columns_initializer(1.0)), [-1])
        self.sample = categorical_sample(self.logits, ac_space)[0, :]
        self.var_list = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,tf.get_variable_scope().name)

        self.dummy_features = [np.zeros(5), np.zeros(5)] #placeholders (required for LSTM but not for FC)

    def get_initial_features(self):
        return self.dummy_features

I am providing a wrapper which will give me four stacked frames in the following way

class AtariEnvWithHistory(gym.Wrapper):
    def __init__(self, env):
        super(AtariEnvWithHistory, self).__init__(env)
        self.env = env
        shape = [84,84,4]
        self.observation_space = Box(0, 255, shape)

    def _reset(self):
        x_t = self.env.reset()
        self.s_t = np.stack((x_t,x_t,x_t,x_t), axis=2)
        return self.s_t

    def _step(self, action):
        x_t1, reward, terminal, info = self.env.step(action)
        s_t1 = np.append(self.s_t[:,:,1:], np.expand_dims(x_t1,axis=2),axis=2)
        self.s_t = s_t1
        return s_t1, reward, terminal, info

and I process my frames as follows:

def _process_frame(frame):
    frame = frame[34:34+160, :160]
    frame = cv2.resize(frame, (84,84))
    frame = frame.mean(2)
    frame = frame.astype(np.float32)
    frame *= (1.0 / 255.0)
    frame = np.reshape(frame, [84,84])
    return frame

But the network converges to a very bad deterministic policy; the entropy quickly drops to zero. This was on PongDeterministic-v3.

[image: entropy]

I think possible fixes would be to increase the entropy penalty or decrease the learning rate, but I was deliberately sticking to the parameters provided in the paper (A3C is supposed to be more robust, right?). I wanted to know if I am doing something more fundamentally wrong.

The NeonRace example is not learning

To work around the tmux problem described in #2, the example was run in separate shell windows with the following commands.

CUDA_VISIBLE_DEVICES= /usr/bin/python worker.py --log-dir /tmp/neonrace --env-id flashgames.NeonRace-v0 --num-workers 2 --job-name ps

CUDA_VISIBLE_DEVICES= /usr/bin/python worker.py --log-dir /tmp/neonrace --env-id flashgames.NeonRace-v0 --num-workers 2 --job-name worker --task 0 --remotes 1

CUDA_VISIBLE_DEVICES= /usr/bin/python worker.py --log-dir /tmp/neonrace --env-id flashgames.NeonRace-v0 --num-workers 2 --job-name worker --task 1 --remotes 1

tensorboard --logdir /tmp/neonrace --port 12345

The agents seemed to be doing nothing and the TensorBoard was empty.

[image]

OS: Ubuntu 16.04

TensorFlow: 0.12.1

universe.flashgames: 0.20.21

getting started problem

$ pip install conda
Collecting conda
Downloading conda-4.2.7.tar.gz (235kB)
100% |████████████████████████████████| 235kB 1.7MB/s
Building wheels for collected packages: conda
Running setup.py bdist_wheel for conda ... done
Stored in directory: /Users/sergejkrivonos/Library/Caches/pip/wheels/aa/94/d8/a845fe13112e0e3bdc3f907702a06c0b3ba66d90903060e467
Successfully built conda
Installing collected packages: conda
Successfully installed conda-4.2.7
MacBook-Sergej:~ sergejkrivonos$ conda create --name universe-starter-agent python=3.5
Traceback (most recent call last):
File "/usr/local/bin/conda", line 7, in
from conda.cli.main import main
File "/usr/local/lib/python2.7/site-packages/conda/cli/init.py", line 8, in
from .main import main # NOQA
File "/usr/local/lib/python2.7/site-packages/conda/cli/main.py", line 46, in
from ..base.context import context
File "/usr/local/lib/python2.7/site-packages/conda/base/context.py", line 13, in
from .constants import DEFAULT_CHANNELS, DEFAULT_CHANNEL_ALIAS, ROOT_ENV_NAME, SEARCH_PATH, conda
File "/usr/local/lib/python2.7/site-packages/conda/base/constants.py", line 13, in
from enum import Enum
ImportError: No module named enum
MacBook-Sergej:~ sergejkrivonos$ pip install enum numpy
Collecting enum
Downloading enum-0.4.6.tar.gz
Collecting numpy
Downloading numpy-1.12.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (4.4MB)
100% |████████████████████████████████| 4.4MB 215kB/s
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/site-packages (from enum)
Building wheels for collected packages: enum
Running setup.py bdist_wheel for enum ... done
Stored in directory: /Users/sergejkrivonos/Library/Caches/pip/wheels/d9/b9/23/e7aa8b7d643b49a3df6532fddf18d7ba3863d706c836d59515
Successfully built enum
Installing collected packages: enum, numpy
Successfully installed enum-0.4.6 numpy-1.12.0
MacBook-Sergej:~ sergejkrivonos$ conda create --name universe-starter-agent python=3.5
Traceback (most recent call last):
File "/usr/local/bin/conda", line 7, in
from conda.cli.main import main
File "/usr/local/lib/python2.7/site-packages/conda/cli/init.py", line 8, in
from .main import main # NOQA
File "/usr/local/lib/python2.7/site-packages/conda/cli/main.py", line 46, in
from ..base.context import context
File "/usr/local/lib/python2.7/site-packages/conda/base/context.py", line 18, in
from ..common.configuration import (Configuration, MapParameter, PrimitiveParameter,
File "/usr/local/lib/python2.7/site-packages/conda/common/configuration.py", line 40, in
from ruamel.yaml.comments import CommentedSeq, CommentedMap # pragma: no cover
ImportError: No module named ruamel.yaml.comments
MacBook-Sergej:~ sergejkrivonos$ pip install ruamel.yaml.comments
Collecting ruamel.yaml.comments
Could not find a version that satisfies the requirement ruamel.yaml.comments (from versions: )
No matching distribution found for ruamel.yaml.comments

missing cmake dependency

A clean install following the readme produces this error:

Running setup.py install for atari-py ... error
Complete output from command /Users/millarm/miniconda3/envs/universe-starter-agent/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/kq/gklfsgfd4fq8yc00kj3sg5zm0000gn/T/pip-build-3f_kk38_/atari-py/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /var/folders/kq/gklfsgfd4fq8yc00kj3sg5zm0000gn/T/pip-j9a7ikbv-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
mkdir -p build && cd build && cmake .. && make -j4
/bin/sh: cmake: command not found
make: *** [build] Error 127

On OSX, this is simply fixed by adding

brew install cmake

to the instructions.

Error on python train.py

Running the command python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong in the universe-starter-agent conda env, I get the following error:
python: can't open file 'train.py': [Errno 2] No such file or directory

First task doesn't seem to run properly

Ran the first task:
python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong

Got the following result:

mkdir -p /tmp/pong
tmux kill-session
tmux new-session -s a3c -n ps -d
tmux new-window -t a3c -n w-0
tmux new-window -t a3c -n w-1
tmux new-window -t a3c -n tb
tmux new-window -t a3c -n htop
sleep 1
tmux send-keys -t ps 'python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name ps' Enter
tmux send-keys -t w-0 'python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name worker --task 0' Enter
tmux send-keys -t w-1 'python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name worker --task 1' Enter
tmux send-keys -t tb 'tensorboard --logdir /tmp/pong --port 22012' Enter
tmux send-keys -t htop 'htop' Enter

On http://localhost:22012/#events in Chrome:
[screenshot 2016-11-27 21:40:33]

discount computation

Can someone explain the discount computation in a3c.py? Thanks a lot.

def discount(x, gamma):
    return scipy.signal.lfilter([1], [1, -gamma], x[::-1], axis=0)[::-1]
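The lfilter call computes discounted returns right to left; an equivalent (but slower) pure-NumPy loop for the 1-D case, shown only to clarify what the filter does:

import numpy as np

def discount_reference(x, gamma):
    # out[t] = x[t] + gamma * out[t+1], accumulated from the last timestep
    out = np.zeros(len(x))
    running = 0.0
    for t in reversed(range(len(x))):
        running = x[t] + gamma * running
        out[t] = running
    return out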

env.render() / vnc / go vnc driver throws exception "NSInternalInconsistencyException"

Hi, when I use env.render (or the --visualise) option for the A3C agent on OS X, my workers die due to some vnc/os x issue.

(universe-starter-agent) bash-4.4$ python train.py --num-workers 2 --env-id gym-core.PongDeterministic-v3 --log-dir /tmp/vncpong --visualise

output from one worker: (left off the first part of the output where stuff was running normally)

universe-bJY8M8-0 | [2017-02-02 06:51:03,129] Running environment: env_id=PongDeterministic-v3
[2017-02-01 22:51:03,131] [0:localhost:5901] Initial reset complete: episode_id=1
[2017-02-01 22:51:03,157] Resetting environment
2017-02-01 22:51:03.260 python[53025:1081291] WARNING: nextEventMatchingMask should only be called from the Main Thread! This will throw an exception in the future.
2017-02-01 22:51:03.261 python[53025:1081291] *** Assertion failure in +[NSUndoManager _endTopLevelGroupings], /Library/Caches/com.apple.xbs/Sources/Foundation/Foundation-1349.25/Misc.subproj/NSUndoManager.m:363
2017-02-01 22:51:03.262 python[53025:1081291] *** Assertion failure in +[NSUndoManager _endTopLevelGroupings], /Library/Caches/com.apple.xbs/Sources/Foundation/Foundation-1349.25/Misc.subproj/NSUndoManager.m:363
2017-02-01 22:51:03.263 python[53025:1081291] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: '+[NSUndoManager(NSInternal) _endTopLevelGroupings]
is only safe to invoke on the main thread.'
*** First throw call stack:
(
        0   CoreFoundation                      0x00007fffbb776e7b __exceptionPreprocess + 171
        1   libobjc.A.dylib                     0x00007fffd0360cad objc_exception_throw + 48
        2   CoreFoundation                      0x00007fffbb77bb82 +[NSException raise:format:arguments:] + 98
        3   Foundation                          0x00007fffbd1c5d50 -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 195
        4   Foundation                          0x00007fffbd1503f3 +[NSUndoManager(NSPrivate)
_endTopLevelGroupings] + 170
        5   AppKit                              0x00007fffb9213047 -[NSApplication run] + 1200
        6   go_vncdriver.so                     0x000000010fae3d5a initializeAppKit + 314
        7   go_vncdriver.so                     0x000000010fae3b1d _glfwPlatformCreateWindow + 29
        8   go_vncdriver.so                     0x000000010fadc7d0 glfwCreateWindow + 864
        9   go_vncdriver.so                     0x000000010fae9c62 _cgo_21dbbd5d2900_Cfunc_glfwCreateWindow + 82
        10  go_vncdriver.so                     0x000000010f92a3e0 runtime.asmcgocall + 112
)
libc++abi.dylib: terminating with uncaught exception of type NSException
Abort trap: 6

The last few lines of my /tmp/universe-*.log look like this:

[2017-02-01 22:51:03,131] [0:localhost:5901] RewardBuffer: dropping stale episode data: dropped={'0', None} episode_id=1
[2017-02-01 22:51:03,131] [0:localhost:5901] Initial reset complete: episode_id=1
[2017-02-01 22:51:03,133] [0:localhost:5901] RewardState: popping reward 0.0 from episode_id 1
[2017-02-01 22:51:03,133] [0:localhost:5901] Episode began: episode_id=1 env_state=running
[2017-02-01 22:51:03,133] [0] Sending out new action probe: [('KeyEvent', 782065, True), ('KeyEvent', 782065, False)]
[2017-02-01 22:51:03,156] [0] Could not find metadata anchor pixel
[2017-02-01 22:51:03,157] Resetting environment
[2017-02-01 22:51:03,183] [0:localhost:5901] RewardState: popping reward 0.0 from episode_id 1
[2017-02-01 22:51:03,183] [0:localhost:5901] RewardState: popping reward 0.0 from episode_id 1
[2017-02-01 22:51:03,203] [0:localhost:5901] RewardState: popping reward 0.0 from episode_id 1
(END)

This happens consistently, and I'm not sure why. Turning off env.render() works fine. Has anyone else run into this?

Attempting to use uninitialized value

When I try to use more than one worker, the first worker starts fine but all the others fail with an error like this one:

Caused by op 'local/hidden_1/w/read', defined at:
  File "worker.py", line 122, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "worker.py", line 114, in main
    run(args, server)
  File "worker.py", line 23, in run
    trainer = A3C(env, args.task)
  File "/Users/matehegedus/Downloads/universe-starter-agent/a3c.py", line 181, in __init__
    self.local_network = pi = DensePolicy(env.observation_space.shape, env.action_space.n)
  File "/Users/matehegedus/Downloads/universe-starter-agent/model.py", line 53, in __init__
    x = linear(x, 32, "hidden_1", normalized_columns_initializer(0.01))
  File "/Users/matehegedus/Downloads/universe-starter-agent/model.py", line 37, in linear
    w = tf.get_variable(name + "/w", [x.get_shape()[1], size], initializer=initializer)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 677, in _get_single_variable
    expected_shape=shape)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 224, in __init__
    expected_shape=expected_shape)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 370, in _init_from_args
    self._snapshot = array_ops.identity(self._variable, name="read")
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1424, in identity
    result = _op_def_lib.apply_op("Identity", input=input, name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value local/hidden_1/w
         [[Node: local/hidden_1/w/read = Identity[T=DT_FLOAT, _class=["loc:@local/hidden_1/w"], _device="/job:worker/replica:0/task:1/cpu:0"](local/hidden_1/w)]]

Performance not as good as expected when running Pong in remote mode

Following the README, I built an environment in remote mode and trained the Pong game with 2 parallel workers for more than ten hours. But the results show that my agent has only managed to tie the game. The global statistics from TensorBoard are depicted below; the reaction_time is about 40ms.

[image: tb]

I have also trained the AI as the README demo describes for the Neon Race game, and the result is still not good; see the figure below. The reaction_time there is about 80ms.
[image: neonrace]

My training host is an Ubuntu 16.04 LTS VM with two cores. Why can't I reach the goal that the README demo did? And how can I debug or tune my agent? Any suggestion would be appreciated.

Tmux not keeping virtual environments active

When train.py is called and generates the tmux sessions, the virtual environment is not carried over into them.

I installed everything in an anaconda python 3 environment and it failed because the environment was not kept active. After installing everything in the main conda environment, it worked. I was looking for a way to make tmux keep the environment active automatically, but haven't found one. If anyone else has solved this, it would be good to see a solution; I'll post one if I figure it out.

Multi-Discrete Action Space

I am attempting to adapt this agent to work with a custom gym/universe environment I'm building. The action space of my environment uses a MultiDiscrete space. This starter agent explicitly references env.action_space.n from the Discrete space which doesn't exist on a MultiDiscrete object. I am able to change those references to env.action_space.shape instead, but that breaks for Discrete spaces, and, more significantly in my case, it results in the agent choosing a single action from the MultiDiscrete space to 'max out' with these calls:

value = tf.squeeze(tf.multinomial(logits - tf.reduce_max(logits, [1], keep_dims=True), 1), [1])
tf.one_hot(value, d)
[...]
env.step(action.argmax())

My action space looks like this:

self.action_space = spaces.MultiDiscrete([[-80, 80],
                                          [-80, 80],
                                          [0, 1],
                                          [0, 1],
                                          [0, 1]])

The agent chooses an action like [0. 0. 0. 0. 1.] or [0. 1. 0. 0. 0.]. I would like it to choose an action from within the action space such as [74. -52. 0. 1. 0.].

Does anyone have any suggestions as to how I can accommodate this? Ultimately, I think it would be best to build support for all the types of Space since it seems the idea is to abstract away environment details from the agent anyway. I'm new to Gym, Universe and Tensorflow and really ML in general, so any help you can offer would be much appreciated. Thanks.
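One common approach (an illustrative sketch, not something this repo implements) is to give the policy one categorical head per sub-action and sample each head independently; the sampled index is then shifted into each sub-action's range:

import tensorflow as tf

# Sketch: independent categorical heads for a MultiDiscrete space.
# `features` is assumed to be the policy's last hidden layer; `nvec` holds the
# number of choices per sub-action, e.g. [161, 161, 2, 2, 2] for the space
# above (the first two heads' samples are shifted by -80 afterwards).
def multi_discrete_sample(features, nvec):
    samples = []
    for i, n in enumerate(nvec):
        logits = tf.layers.dense(features, n, name="head_%d" % i)
        samples.append(tf.multinomial(logits, 1)[:, 0])  # one draw per head
    return tf.stack(samples, axis=1)  # shape [batch, len(nvec)]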

How long does it take to solve PongDeterministic-v3 ?

Hi,

I was having an issue where I could not replicate the training time as reported.
The readme says the training time is under 2 hours even on a MacBook Pro. I have tried both 16 workers on 10 CPU cores (Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz) and 32 workers on 6 CPU cores (Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz). The time to solve is more than 4.5 hours in both cases.

16 workers
[image]

I also tried 6 workers on 6 CPU cores as recommended; the reward is around -19 after training for 2 hours. Are there any tricks that can accelerate the training?

Also, I get output like this:
no server running on /tmp/tmux-1002/default
Will this affect training a lot?

Thanks a lot!

ImportError: cannot import name 'DiscreteToVNCAction'

Using revision c0058c51 of the universe package:

$ ~/anaconda3/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name ps
Traceback (most recent call last):
  File "worker.py", line 9, in <module>
    from envs import create_env
  File "/Users/scribu/git/universe-starter-agent/envs.py", line 9, in <module>
    from universe.wrappers import BlockingReset, DiscreteToVNCAction, EpisodeID, Unvectorize, Vectorize, Vision, Logger
ImportError: cannot import name 'DiscreteToVNCAction'

Universe-starter-agent kills any pre-existing tmux sessions

Reproduction:
start a new tmux session with
tmux new -s testing_daemon
then run
python train.py --num-workers 8 --env-id Pong-v0 --log-dir /tmp/notimportant
in a separate shell from the universe-starter-agent dir, and observe that the original testing_daemon session is now dead. We're currently using tmux for other projects, and I noticed that they were dead after universe-starter-agent initialization.

Additional dependencies

I'm not sure if this is covered by installing Anaconda (I didn't install it myself). Anyway, I was able to get the universe-starter-agent running under Python 2.7 with very minor changes.

I installed the following dependencies through pip:

  • opencv-python
  • scipy

Continuous action space

Would it be possible to also make a continuous action space version of A3C available? I find it non-trivial to make the adjustments described in the original paper and would love to see how it's done.

How to modify network architecture

I know that the neural network is defined in model.py as the LSTMPolicy class. I am trying to change it to be just a simple densely connected network, in order to understand this starter kit better.

However, I am having difficulties. Would it be possible to post an extremely simple 1 layer dense neural network policy in order to make it easier to iterate and define custom architectures?
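In the spirit of that request, here is a rough sketch of a one-hidden-layer dense policy with the same interface as LSTMPolicy (assumptions: the flatten, linear, normalized_columns_initializer, and categorical_sample helpers from this repo's model.py, and dummy recurrent features so the surrounding A3C plumbing stays untouched):

import numpy as np
import tensorflow as tf
from model import flatten, linear, normalized_columns_initializer, categorical_sample

class DensePolicy(object):
    def __init__(self, ob_space, ac_space):
        self.x = x = tf.placeholder(tf.float32, [None] + list(ob_space))
        hid = tf.nn.elu(linear(flatten(x), 256, "hidden", normalized_columns_initializer(0.01)))
        self.logits = linear(hid, ac_space, "action", normalized_columns_initializer(0.01))
        self.vf = tf.reshape(linear(hid, 1, "value", normalized_columns_initializer(1.0)), [-1])
        self.sample = categorical_sample(self.logits, ac_space)[0, :]
        self.var_list = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, tf.get_variable_scope().name)
        # Dummy recurrent state, fed but never used, so A3C's feed_dict code still works.
        self.state_in = [tf.placeholder(tf.float32, [1, 1]), tf.placeholder(tf.float32, [1, 1])]
        self.state_out = []

    def get_initial_features(self):
        return [np.zeros((1, 1), np.float32), np.zeros((1, 1), np.float32)]

    def act(self, ob, c, h):
        sess = tf.get_default_session()
        a, v = sess.run([self.sample, self.vf], {self.x: [ob]})
        return a, v[0], c, h  # pass (c, h) through unchanged

    def value(self, ob, c, h):
        sess = tf.get_default_session()
        return sess.run(self.vf, {self.x: [ob]})[0]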

Support for Python 2.7?

The universe project supposedly supports Python 2.7 ("We run Python 3.5 internally, so the Python 3.5 variants will be much more thoroughly performance tested; please let us know if you see any weirdness on Python 2.7."), which brings the question: should the universe-starter-agent support Python 2.7 as well?

If so, the current starter code has a few issues:

  • Use of the unpacking operator on line 94 of a3c.py
  • Use of the queue module (was Queue)

AttributeError: 'VectorizeFilter' object has no attribute 'filter_n'

Following the setup instructions on OSX I get most of the way toward a working setup, but training fails with this message:

[2017-01-11 14:46:17,526] Starting training at step=0
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/Users/jhurliman/anaconda/envs/universe-starter-agent/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/Users/jhurliman/Code/openai/universe-starter-agent/a3c.py", line 92, in run
    self._run()
  File "/Users/jhurliman/Code/openai/universe-starter-agent/a3c.py", line 101, in _run
    self.queue.put(next(rollout_provider), timeout=600.0)
  File "/Users/jhurliman/Code/openai/universe-starter-agent/a3c.py", line 112, in env_runner
    last_state = env.reset()
  File "/Users/jhurliman/anaconda/envs/universe-starter-agent/lib/python3.5/site-packages/gym/core.py", line 123, in reset
    observation = self._reset()
  File "/Users/jhurliman/anaconda/envs/universe-starter-agent/lib/python3.5/site-packages/universe/wrappers/vectorize.py", line 46, in _reset
    observation_n = self.env.reset()
  File "/Users/jhurliman/anaconda/envs/universe-starter-agent/lib/python3.5/site-packages/gym/core.py", line 123, in reset
    observation = self._reset()
  File "/Users/jhurliman/anaconda/envs/universe-starter-agent/lib/python3.5/site-packages/universe/vectorized/vectorize_filter.py", line 27, in _reset
    observation_n = [filter._after_reset(observation) for filter, observation in zip(self.filter_n, observation_n)]
AttributeError: 'VectorizeFilter' object has no attribute 'filter_n'

unknown env spec tag

I've got this error:

File ".../universe-starter-agent/a3c.py", line 142, in env_runner
    if terminal or length >= timestep_limit:
TypeError: unorderable types: int() >= NoneType()

It points to the code

timestep_limit = env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps')
if terminal or length >= timestep_limit:

But I was not able to find where that tag gets loaded into the spec. I was working with the PongDeterministic-v3 env; flashgames.NeonRace-v0 works fine.
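A defensive guard avoids the None comparison when the tag is missing (a sketch; the fallback value is a hypothetical cap, and env.spec.timestep_limit is the older gym attribute):

timestep_limit = env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps')
if timestep_limit is None:
    timestep_limit = getattr(env.spec, 'timestep_limit', None) or 100000
if terminal or length >= timestep_limit:
    pass  # handle the end of the episode as before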

Performance worse than README claims

README says that for NeonRace:

Getting 80% of the maximal score takes between 1 and 2 hours with 16 workers, and getting to 100% of the score takes about 12 hours. Also, flash games are run at 5fps by default, so it should be possible to productively use 16 workers on a machine with 8 (and possibly even 4) cores.

I get about 35000 in NeonRace after 6 hours with 8 workers. Someone else gets 90000 points with his branch of vnc-agents after 5M steps.

8 workers brings an 8-core machine (like tlb-0.devbox.sci) down to 0% idle, and the frame rate varies between 3.5 and 4.8 when we requested 5.

automatic start of docker instances in MacBook Pro

The automatic start of docker instances doesn't work on my MacBook Pro.

On terminal (1) I can see this error after a minute:

docker.errors.DockerException: Error while fetching server API version: HTTPSConnectionPool(host='192.168.59.103', port=2376): Max retries exceeded with url: /version (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x1204313c8>, 'Connection to 192.168.59.103 timed out. (connect timeout=60)'))

It seems like it tries to connect to the default docker IP (192.168.59.103) instead of the one actually configured on the system. When I start the docker image beforehand myself, everything runs fine.
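If you are using docker-machine, exporting its environment into the shell before launching train.py usually points the client at the right IP (assuming your machine is named default):

eval "$(docker-machine env default)"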

Trained agent not performing as well as TensorBoard claims...

I've been playing around with the example code for a few days now, but I keep getting the same issue:
e.g. for Pong, after about 3 hours of training, the TensorBoard global/episode_reward goes to about +20, so I'm assuming that means the agent wins by on average 20 points.

Now, I have changed the saver in worker.py to also dump the meta_graph and added some ops from model.py to a tf_collection. Next, I've created the following IPython notebook to visualise the trained Agent:

import numpy as np
import gym
import cv2
import tensorflow as tf
import time
import os
import scipy.misc

def _process_frame42(frame):
    frame = frame[34:34+160, :160]
    frame = cv2.resize(frame, (80, 80))
    frame = cv2.resize(frame, (42, 42))
    frame = frame.mean(2)
    frame = frame.astype(np.float32)
    frame *= (1.0 / 255.0)
    frame = np.reshape(frame, [42, 42, 1])
    return frame

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / np.sum(e_x)

def categorical_sample(x,logistic = True):
    softmax_logits = softmax(x)
    if logistic:
        return np.argmax(np.random.multinomial(1, softmax_logits))
    else:
        return np.argmax(x)

folder = "/home/tr1pzz/Desktop/Universe/universe_dumps/logs2/visualise"
games_to_play = 200
games_to_play_before_rendering = 5

games_played = 0
total_episode_reward = 0

all_episode_rewards = []
env = gym.make("Pong-v0")
#env = gym.make("Breakout-v0")
input_frame = [_process_frame42(env.reset())]

for filename in os.listdir(folder):
    if "ckpt" in filename and "meta" not in filename:
        save_path = folder+"/"+filename
print("Model will be loaded from: %s" %save_path)

initial_state = np.zeros((1,256)).astype(float)
prev_state = [initial_state, initial_state]

tf.reset_default_graph()

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
    new_saver = tf.train.import_meta_graph(save_path+".meta",clear_devices=True)
    new_saver.restore(sess, save_path)
    
    inference_op = tf.get_collection("inference_logits")[0]
    state_out_0 = tf.get_collection("state_out_0")[0]
    state_out_1 = tf.get_collection("state_out_1")[0]
    get_sample_op = tf.get_collection("action_sample")[0]
    
    get_output_state = [state_out_0, state_out_1]
    print("Model loaded, ready for simulation!")
    
    while games_played<games_to_play:
        if(games_played>games_to_play_before_rendering):
            env.render()
        #time.sleep(0.003) 
        feed_dict = {"global/input_pixels:0": input_frame, 
                     "global/LSTM_cin:0": prev_state[0], 
                     "global/LSTM_hin:0": prev_state[1]}
        
        logits, state, sampled_action = sess.run([inference_op, get_output_state, get_sample_op], feed_dict = feed_dict)
        #action = categorical_sample(logits[0])
        action_tf = sampled_action.argmax()   # the TF sample op returns a one-hot vector
        #print(action, action_tf)
        
        observation, reward, done, info = env.step(action_tf)
        total_episode_reward += reward
        input_frame = [_process_frame42(observation)]
        prev_state = state
        
        if done:
            print("Game done. Score was %d" %total_episode_reward)
            all_episode_rewards.append(total_episode_reward)
            observation = env.reset()
            input_frame = [_process_frame42(observation)]
            total_episode_reward = 0
            games_played += 1

The problem is that when I run the trained agent, it loses most of its games, averaging a score of around -14... Any idea what I am doing wrong? Initially I thought it might have something to do with the LSTM state needing to 'warm up', but that doesn't seem to be the case.

Additionally, the first time I ran the code with 7 workers it ran at about 45 fps per thread; now I only get half that speed on the same (local) laptop. Any ideas as to what might cause this? Finally, is there an easier way to run a trained agent, or is what I'm doing the general way to approach this?

Thanks in advance!

The agent hit the global step limit; how do I restore from checkpoints and resume training?

I am working on an idea that requires long training runs, about 10 days. I forgot to modify the global step limit, so the agent stopped at 100M steps. I want to restore the model and continue training. I have been looking through the code and wondered what I should do.
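
For reference, a sketch of what seems to be involved, assuming your copy matches the upstream worker.py where the cap is a hard-coded constant:

# In worker.py's run() -- a sketch; the exact line may differ in your copy:
num_global_steps = 100000000  # raise this cap (e.g. add a zero) to train longer

Relaunching train.py with the same --log-dir should then resume automatically: the tf.train.Supervisor is constructed with that log directory and restores the newest checkpoint from it before training continues.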

Sincere thanks for open-sourcing this project; it is very respectable work and has helped a lot with my research. Truly, we find it really enjoyable to develop agents. Thank you.

Is this repo compatible with tensorflow 1.0?

I have been using it with TF 1.0, but I just saw that the documentation says we should use tensorflow 0.12. Should we? Or have the docs just not been updated?

Thanks for this great resource!

tf.VERSION: object has no attribute 'VERSION'

Just a simple issue:

File "a3c.py", line 10
use_tf12_api = distutils.version.LooseVersion(tf.VERSION) >= distutils.version.LooseVersion('0.12.0')
AttributeError: 'module' object has no attribute 'VERSION'

tf.__version__ works, though. I am still using version 0.10.0.
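
A small compatibility shim (a sketch, not upstream code) that should work across releases, since tf.__version__ exists on both old and new versions while tf.VERSION does not:

import distutils.version
import tensorflow as tf

# Prefer tf.VERSION when present; fall back to tf.__version__ on old releases.
tf_version = getattr(tf, 'VERSION', tf.__version__)
use_tf12_api = distutils.version.LooseVersion(tf_version) >= distutils.version.LooseVersion('0.12.0')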

Helping to learn

Hello
I'm on Ubuntu 16.04, correctly running NeonRace-v0, the example you show in universe-starter-agent.
When I connect via VNC to NeonRace, I can drive the car with the keyboard keys.
When I do this, am I actually helping the learning process?
Or are my keyboard actions useless?

Thanks

Playing game locally

Excuse me,
I've finished training a model on Pong (using the command: python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong).
How can I use the model checkpoint to watch how it performs in the game visually?
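
For reference, a minimal sketch of one way to do this (not an official tool; the checkpoint directory and the need to export ops to collections are assumptions that depend on how you saved the model, as discussed in the visualisation issue above):

import gym
import tensorflow as tf

# Locate the newest checkpoint written by the worker (path is an assumption;
# this repo's workers appear to write checkpoints under <log-dir>/train).
ckpt = tf.train.latest_checkpoint("/tmp/pong/train")

env = gym.make("PongDeterministic-v3")
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(ckpt + ".meta", clear_devices=True)
    saver.restore(sess, ckpt)
    # From here, fetch the policy's ops (e.g. from collections you exported
    # when saving) and run an env.step() loop with env.render(), as in the
    # notebook shown in the earlier issue.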

flashgames.json is missing from pip installation

For some reason, every time I use pip to install it, that file is missing from the runtimes folder. The other files seem normal; if I put flashgames.json in place manually, everything runs without problems.

EDIT: wrong repository, sorry

train.py - no server running

$ python --version
Python 3.5.2 :: Anaconda 4.2.0 (x86_64)
$ python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong
mkdir -p /tmp/pong
tmux kill-session
tmux new-session -s a3c -n ps -d
tmux new-window -t a3c -n w-0
tmux new-window -t a3c -n w-1
tmux new-window -t a3c -n tb
tmux new-window -t a3c -n htop
sleep 1
tmux send-keys -t ps 'python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name ps' Enter
tmux send-keys -t w-0 'python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name worker --task 0' Enter
tmux send-keys -t w-1 'python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name worker --task 1' Enter
tmux send-keys -t tb 'tensorboard --logdir /tmp/pong --port 22012' Enter
tmux send-keys -t htop 'htop' Enter
no server running on /private/tmp/tmux-502/default
lost server
no server running on /private/tmp/tmux-502/default
(the previous line is printed once for each of the remaining tmux commands)

/tmp/pong just contains an empty train_0 directory.

If I run the first task manually, it seems to work:

$ python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name ps
[2016-11-27 02:39:59,295] Writing logs to file: /tmp/universe-4626.log
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPortsGrpcChannelCache for job ps -> {localhost:12222}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPortsGrpcChannelCache for job worker -> {127.0.0.1:12223, 127.0.0.1:12224}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:12222

Output from running the second task in a separate terminal:

$ python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 2 --job-name worker --task 0
[2016-11-27 02:41:17,209] Writing logs to file: /tmp/universe-4868.log
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPortsGrpcChannelCache for job ps -> {127.0.0.1:12222}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPortsGrpcChannelCache for job worker -> {localhost:12223, 127.0.0.1:12224}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:12223
[2016-11-27 02:41:17,230] Making new env: PongDeterministic-v3
/Users/scribu/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py:1811: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  result_shape.insert(dim, 1)
[2016-11-27 02:41:18,046] Events directory: /tmp/pong/train_0
[2016-11-27 02:41:18,070] Starting session. If this hangs, we're mostly likely waiting to connect to the parameter server. One common cause is that the parameter server DNS name isn't resolving yet, or is misspecified.
[2016-11-27 02:41:18,730] Initializing all parameters.
E tensorflow/core/util/events_writer.cc:62] Could not open events file: /tmp/pong/train_0/events.out.tfevents.1480207280.imac: Failed precondition: /tmp/pong/train_0/events.out.tfevents.1480207280.imac
E tensorflow/core/util/events_writer.cc:95] Write failed because file could not be opened.
(the two events_writer errors above repeat continuously; omitted below)
[2016-11-27 02:41:20,260] Starting training at step=0
[2016-11-27 02:41:20,269] Resetting environment
[2016-11-27 02:41:24,171] Episode terminating: episode_reward=-21.0 episode_length=884
[2016-11-27 02:41:24,183] Resetting environment
Episode finished. Sum of rewards: -21. Length: 884
[2016-11-27 02:41:27,559] Episode terminating: episode_reward=-21.0 episode_length=764
[2016-11-27 02:41:27,571] Resetting environment
Episode finished. Sum of rewards: -21. Length: 764
^C
...

ImportError: No module named 'tensorflow.contrib.rnn.python.ops.core_rnn_cell'

Using tensorflow 0.12.1:

Traceback (most recent call last):
  File "worker.py", line 8, in <module>
    from a3c import A3C
  File "/Users/tlb/openai/universe-starter-agent/a3c.py", line 5, in <module>
    from model import LSTMPolicy
  File "/Users/tlb/openai/universe-starter-agent/model.py", line 4, in <module>
    from tensorflow.contrib.rnn.python.ops.core_rnn_cell import BasicLSTMCell
ImportError: No module named 'tensorflow.contrib.rnn.python.ops.core_rnn_cell'

@futurely Which version of tensorflow did you find this worked on?
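
One way to paper over the moved module (a sketch; the fallback is the location TF 0.12.x shipped, but verify against your installed version):

# BasicLSTMCell moved between TensorFlow releases; try the newer location
# first and fall back to the TF 0.12.x path.
try:
    from tensorflow.contrib.rnn.python.ops.core_rnn_cell import BasicLSTMCell
except ImportError:
    from tensorflow.python.ops.rnn_cell import BasicLSTMCell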

NeonRace indefinitely using only ArrowUp

Hi,

Testing NeonRace with four workers, I notice that at the beginning of the game all the keys are used (ArrowUp, ArrowLeft, ArrowRight), but after a few hours only ArrowUp is used. As a result the agent never gets past the first level and always runs off the road; even after 48 hours it is still stuck using only ArrowUp. However, the documentation says:

Getting 80% of the maximal score takes between 1 and 2 hours with 16 workers, and getting to 100% of the score takes about 12 hours

In my case it never gets beyond the first level and, of course, never reaches 100%.

Is this behavior normal?

Thanks

Issue with Viewing VNC

I managed to view the gym-core.PongDeterministic-v3 environment through VNC last night using open vnc://localhost:5900, but now when I run that command I get an error saying "You cannot control your own screen" (using Screen Sharing as the default VNC app on OS X 10.12).

I've tried connecting through both TurboVNC and VNC Viewer, and when I connect to localhost:5900 it just connects to a copy of my own desktop rather than to the Gym environment.

I'm running Docker locally and using Python 2.7. When I look at the tmux and TensorBoard output, everything seems to be running fine; I just can't view it through VNC.

Ubuntu getting started

root@name-System-Product-Name:/home/name# pip install gym[atari]
Collecting gym[atari]
Downloading gym-0.7.3.tar.gz (150kB)
100% |████████████████████████████████| 153kB 1.6MB/s
Requirement already satisfied: numpy>=1.10.4 in /usr/lib/python2.7/dist-packages (from gym[atari])
Requirement already satisfied: requests>=2.0 in /usr/local/lib/python2.7/dist-packages (from gym[atari])
Requirement already satisfied: six in /usr/lib/python2.7/dist-packages (from gym[atari])
Collecting pyglet>=1.2.0 (from gym[atari])
Downloading pyglet-1.2.4-py2-none-any.whl (964kB)
100% |████████████████████████████████| 972kB 981kB/s
Collecting atari_py>=0.0.17 (from gym[atari])
Downloading atari-py-0.0.18.tar.gz (750kB)
100% |████████████████████████████████| 757kB 1.2MB/s
Requirement already satisfied: Pillow in /usr/local/lib/python2.7/dist-packages (from gym[atari])
Collecting PyOpenGL (from gym[atari])
Downloading PyOpenGL-3.1.0.tar.gz (1.2MB)
100% |████████████████████████████████| 1.2MB 827kB/s
Requirement already satisfied: olefile in /usr/local/lib/python2.7/dist-packages (from Pillow->gym[atari])
Installing collected packages: pyglet, atari-py, PyOpenGL, gym
Running setup.py install for atari-py ... done
Running setup.py install for PyOpenGL ... done
Running setup.py install for gym ... done
Successfully installed PyOpenGL-3.1.0 atari-py-0.0.18 gym-0.7.3 pyglet-1.2.4
root@name-System-Product-Name:/home/name# pip install universe
Collecting universe
Downloading universe-0.21.2.tar.gz (133kB)
100% |████████████████████████████████| 143kB 1.1MB/s
Requirement already satisfied: autobahn>=0.16.0 in /usr/local/lib/python2.7/dist-packages (from universe)
Requirement already satisfied: docker-py==1.10.3 in /usr/local/lib/python2.7/dist-packages (from universe)
Requirement already satisfied: docker-pycreds==0.2.1 in /usr/local/lib/python2.7/dist-packages (from universe)
Collecting fastzbarlight>=0.0.13 (from universe)
Downloading fastzbarlight-0.0.14.tar.gz (728kB)
100% |████████████████████████████████| 737kB 1.0MB/s
Collecting go-vncdriver>=0.4.8 (from universe)
Downloading go_vncdriver-0.4.19.tar.gz (638kB)
100% |████████████████████████████████| 645kB 1.2MB/s
Requirement already satisfied: gym>=0.7.0 in /usr/local/lib/python2.7/dist-packages (from universe)
Requirement already satisfied: Pillow>=3.3.0 in /usr/local/lib/python2.7/dist-packages (from universe)
Collecting PyYAML>=3.12 (from universe)
Downloading PyYAML-3.12.tar.gz (253kB)
100% |████████████████████████████████| 256kB 2.2MB/s
Requirement already satisfied: six>=1.10.0 in /usr/lib/python2.7/dist-packages (from universe)
Collecting twisted>=16.5.0 (from universe)
Downloading Twisted-16.6.0.tar.bz2 (3.0MB)
100% |████████████████████████████████| 3.0MB 366kB/s
Collecting ujson>=1.35 (from universe)
Downloading ujson-1.35.tar.gz (192kB)
100% |████████████████████████████████| 194kB 2.8MB/s
Requirement already satisfied: txaio>=2.5.2 in /usr/local/lib/python2.7/dist-packages (from autobahn>=0.16.0->universe)
Requirement already satisfied: backports.ssl-match-hostname>=3.5; python_version < "3.5" in /usr/local/lib/python2.7/dist-packages (from docker-py==1.10.3->universe)
Requirement already satisfied: requests<2.11,>=2.5.2 in /usr/local/lib/python2.7/dist-packages (from docker-py==1.10.3->universe)
Requirement already satisfied: websocket-client>=0.32.0 in /usr/local/lib/python2.7/dist-packages (from docker-py==1.10.3->universe)
Requirement already satisfied: ipaddress>=1.0.16; python_version < "3.3" in /usr/lib/python2.7/dist-packages (from docker-py==1.10.3->universe)
Requirement already satisfied: numpy in /usr/lib/python2.7/dist-packages (from go-vncdriver>=0.4.8->universe)
Requirement already satisfied: pyglet>=1.2.0 in /usr/local/lib/python2.7/dist-packages (from gym>=0.7.0->universe)
Requirement already satisfied: olefile in /usr/local/lib/python2.7/dist-packages (from Pillow>=3.3.0->universe)
Requirement already satisfied: zope.interface>=3.6.0 in /usr/lib/python2.7/dist-packages (from twisted>=16.5.0->universe)
Collecting constantly>=15.1 (from twisted>=16.5.0->universe)
Downloading constantly-15.1.0-py2.py3-none-any.whl
Collecting incremental>=16.10.1 (from twisted>=16.5.0->universe)
Downloading incremental-16.10.1-py2.py3-none-any.whl
Installing collected packages: fastzbarlight, go-vncdriver, PyYAML, constantly, incremental, twisted, ujson, universe
Running setup.py install for fastzbarlight ... error
Complete output from command /usr/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-lWeysK/fastzbarlight/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-pryYFZ-record/install-record.txt --single-version-externally-managed --compile:
/usr/lib/python2.7/distutils/extension.py:133: UserWarning: Unknown Extension options: 'optional'
warnings.warn(msg)
running install
running build
error: [Errno 13] Permission denied

----------------------------------------

Command "/usr/bin/python2.7 -u -c "import setuptools, tokenize;file='/tmp/pip-build-lWeysK/fastzbarlight/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-pryYFZ-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-lWeysK/fastzbarlight/
root@name-System-Product-Name:/home/name# pip install six
Requirement already satisfied: six in /usr/lib/python2.7/dist-packages
root@name-System-Product-Name:/home/name# pip install tensorflow
Collecting tensorflow
Downloading tensorflow-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl (43.1MB)
100% |████████████████████████████████| 43.1MB 27kB/s
Collecting mock>=2.0.0 (from tensorflow)
Downloading mock-2.0.0-py2.py3-none-any.whl (56kB)
100% |████████████████████████████████| 61kB 5.3MB/s
Requirement already satisfied: numpy>=1.11.0 in /usr/lib/python2.7/dist-packages (from tensorflow)
Collecting protobuf>=3.1.0 (from tensorflow)
Downloading protobuf-3.2.0-py2.py3-none-any.whl (360kB)
100% |████████████████████████████████| 368kB 2.1MB/s
Collecting wheel (from tensorflow)
Downloading wheel-0.29.0-py2.py3-none-any.whl (66kB)
100% |████████████████████████████████| 71kB 4.6MB/s
Requirement already satisfied: six>=1.10.0 in /usr/lib/python2.7/dist-packages (from tensorflow)
Collecting funcsigs>=1; python_version < "3.3" (from mock>=2.0.0->tensorflow)
Downloading funcsigs-1.0.2-py2.py3-none-any.whl
Requirement already satisfied: pbr>=0.11 in /usr/local/lib/python2.7/dist-packages (from mock>=2.0.0->tensorflow)
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/dist-packages/setuptools-18.1-py2.7.egg (from protobuf>=3.1.0->tensorflow)
Installing collected packages: funcsigs, mock, protobuf, wheel, tensorflow
Found existing installation: funcsigs 0.4
Uninstalling funcsigs-0.4:
Successfully uninstalled funcsigs-0.4
Successfully installed funcsigs-1.0.2 mock-2.0.0 protobuf-3.2.0 tensorflow-0.12.1 wheel-0.29.0
root@name-System-Product-Name:/home/name# conda install -y -c https://conda.binstar.org/menpo opencv3
Traceback (most recent call last):
  File "/usr/local/bin/conda", line 9, in <module>
    load_entry_point('conda==4.2.7', 'console_scripts', 'conda')()
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 558, in load_entry_point
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 2682, in load_entry_point
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 2355, in load
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 2361, in resolve
  File "/usr/local/lib/python2.7/dist-packages/conda/cli/__init__.py", line 8, in <module>
    from .main import main  # NOQA
  File "/usr/local/lib/python2.7/dist-packages/conda/cli/main.py", line 46, in <module>
    from ..base.context import context
  File "/usr/local/lib/python2.7/dist-packages/conda/base/context.py", line 18, in <module>
    from ..common.configuration import (Configuration, MapParameter, PrimitiveParameter,
  File "/usr/local/lib/python2.7/dist-packages/conda/common/configuration.py", line 40, in <module>
    from ruamel.yaml.comments import CommentedSeq, CommentedMap  # pragma: no cover
ImportError: No module named ruamel.yaml.comments
root@name-System-Product-Name:/home/name# conda install -y numpy
(same ruamel.yaml ImportError traceback as above)
root@name-System-Product-Name:/home/name# conda install -y scipy
(same ruamel.yaml ImportError traceback as above)
root@name-System-Product-Name:/home/name#

[SOLVED] Trouble watching VNC from Docker

SOLVED the issue right as I finished writing it up, but it was enough of a headache that I am posting it here for future reference. Scroll down for the solution.

I am currently using Docker to run the starter agent, but I am having trouble viewing it over VNC. I have tried viewing it with both TurboVNC and in the browser. I am running docker with the following command:

docker run --privileged --rm -it \
    -v /usr/bin/docker:/usr/bin/docker \
    -v /root/.docker:/root/.docker \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e DOCKER_NET_HOST=172.17.0.1 \
    -p 12345:12345 \
    -p 5900:5900 \
    -v `pwd`:/srv/ \
    universe

I know the browser version is partially working because it shows connecting -> disconnecting -> disconnect timeout, so something is happening, but I never see it render. When I point TurboVNC at the Docker IP on port 5900, I get "Failed to connect to server". I've tried iterating the port number, but that doesn't help. All tmux processes are running without error, and I can see it training on TensorBoard. Any ideas are appreciated.

Solution
The VNC server is hosted in a different Docker container than the one the agent is started from, so claiming port 5900 as I did above prevents it from being mapped into the child container that actually hosts the VNC. The solution is to run docker like this:

docker run --privileged --rm -it \
    -v /usr/bin/docker:/usr/bin/docker \
    -v /root/.docker:/root/.docker \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e DOCKER_NET_HOST=172.17.0.1 \
    -p 12345:12345 \
    -v `pwd`:/srv/ \
    universe

Daniel

Any possibility to use the GPU and make it faster?

The project is great and I appreciate it very much. However, it does not yet run on the GPU, and I wonder whether modifying a few lines of the code would make it work on the GPU and run faster. I have a TensorFlow-GPU-ready platform and tensorflow 1.0 (well, I have made a few unimportant changes to the code so that it works fine with tensorflow 1.0).
Again, thanks a lot for open-sourcing this project; it's been a great contribution.
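
For reference, the likely starting point (a sketch based on the device strings in a3c.py, not a tested change) is to move the worker's local network off the pinned CPU device:

# In a3c.py the local policy is pinned to the CPU:
#   worker_device = "/job:worker/task:{}/cpu:0".format(task)
# A first experiment is to place it on a GPU instead:
worker_device = "/job:worker/task:{}/gpu:0".format(task)
# With allow_soft_placement enabled in the session config, ops that lack a
# GPU kernel will transparently fall back to the CPU.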

tmux config renames windows

The training script was failing with tmux can't find pane XXX for each of the window names the script was trying to create.

After looking around, it seems that the default config on OS X (I didn't have a .tmux.conf in my home directory) renames each window to the path the script was run from.

The following setting added to .tmux.conf disables the behavior and the script can run as expected:

set-option -g allow-rename off

Performance on other Atari games

Hi, Pong is a good sanity check. Has anyone tried/adapted the code (A3C-LSTM) on other Atari games like BreakoutDeterministic-v3 and SpaceInvadersDeterministic-v3, and managed to get average scores of 500+ and 2500+ respectively?

I understand there are many differences from the A3C paper in this implementation (reward clipping, shared RMSProp optimization, network architecture, input image size...), but I still can't reproduce the results on Breakout and Space Invaders after modifying the code.

Any suggestions/discussion welcome!
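
For concreteness, reward clipping as used in the papers is just a sign function applied to the raw reward. A minimal gym wrapper sketch (assuming the gym-0.7-era wrapper API with underscore-prefixed methods; not part of this repo):

import numpy as np
import gym

class ClipReward(gym.RewardWrapper):
    # Clip rewards to {-1, 0, +1}, as in the DQN/A3C papers.
    def _reward(self, reward):
        return float(np.sign(reward))

env = ClipReward(gym.make("BreakoutDeterministic-v3"))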

VNC into environment

Could you add a quick note on how to VNC into the simulated env on a Mac? I tried using TurboVNC, but it seems to fail to connect.
Thank you!
