zhejz / carla-roach
Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach. ICCV 2021.
Home Page: https://zhejz.github.io/roach
License: Other
I get this time-out error when I train the RL model:
RuntimeError: time-out of 60000ms while waiting for the simulator, make sure the simulator is ready and connected to localhost:2000
Data collection, on the other hand, seems to work normally.
Can you give me some suggestions?
CARLA version: 0.9.10
Platform/OS: Ubuntu 22.04
CUDA: 12.2
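Not an official fix, but a minimal connectivity check (assuming the standard carla Python client, with host localhost and port 2000 as in the error message) can rule out server-side problems before train_rl.py is involved:

import carla

# Hedged sketch: ping the simulator the same way the training code would.
client = carla.Client('localhost', 2000)
client.set_timeout(10.0)  # seconds; raise this on slow machines
print('server version:', client.get_server_version())
print('client version:', client.get_client_version())

If this already times out, the problem is the server side (EGL/driver setup), not the RL code.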
When I run the two commands below, I encounter an error that seems to prevent the CARLA simulator from starting. Any help would be appreciated.
bash /home/tech/carla/CarlaUE4.sh
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
./run/train_rl.sh
CarlaUE4-Linux: no process found
[2023-09-11 11:41:53,139][utils.server_utils][INFO] - Kill Carla Servers!
CarlaUE4-Linux: no process found
[2023-09-11 11:41:54,156][utils.server_utils][INFO] - Kill Carla Servers!
[2023-09-11 11:41:54,156][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2000
[2023-09-11 11:41:54,161][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2005
[2023-09-11 11:41:54,165][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2010
[2023-09-11 11:41:54,168][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2015
[2023-09-11 11:41:54,171][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2020
[2023-09-11 11:41:54,174][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2025
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
[2023-09-11 11:42:01,081][agents.rl_birdview.rl_birdview_agent][INFO] - Resume checkpoint latest ckpt/ckpt_6082560.pth
[2023-09-11 11:42:05,036][agents.rl_birdview.rl_birdview_agent][INFO] - Loading wandb checkpoint: ckpt/ckpt_6082560.pth
carla_map load_world(): Town02
carla_map load_world(): Town05
carla_map load_world(): Town04
carla_map load_world(): Town06
carla_map load_world(): Town01
carla_map load_world(): Town03
Traceback (most recent call last):
File "train_rl.py", line 64, in main
env = SubprocVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "/home/njtech/anaconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 98, in __init__
observation_space, action_space = self.remotes[0].recv()
File "/home/njtech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/njtech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/njtech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2023-09-11 11:43:07,615][wandb.sdk.internal.internal][INFO] - Internal process exited
PYTHON_RETURN=0!!! Start Over!!!
CarlaUE4-Linux: no process found
Bash script done.
Hello, I'm trying to run run/benchmark.sh, and I need to modify agent.cilrs.rl_run_path and agent.cilrs.rl_ckpt_step so that I can load my checkpoint. Sorry, I'm a rookie with wandb: as shown in the official docs, agent.cilrs.rl_run_path should be in 'entity/project' format, and if I just follow the leaderboard run in run/train_il.sh, the entity is '' and the project is 'il_leaderboard_roach'. But it keeps telling me that the project is not found. Any help? Thanks!!
CARLA version: 0.9.14 on Ubuntu.
Hi all, when I run the data_collect.py file, I am getting this error -
"from gym.wrappers.monitoring.video_recorder import ImageEncoder
ImportError: cannot import name 'ImageEncoder' from 'gym.wrappers.monitoring.video_recorder"
When I search for error I got to know that ImageEncoder is deprecated. So how to change code for this line?
Thanks in advance.
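One possible workaround, not from the authors: re-implement the small part of ImageEncoder that the code uses on top of imageio. This assumes imageio and imageio-ffmpeg are installed, and it mirrors the old gym interface only as far as needed; check the call site for the exact arguments used.

import imageio

class ImageEncoder:
    # Minimal stand-in for gym's removed ImageEncoder.
    def __init__(self, output_path, frame_shape, frames_per_sec, output_frames_per_sec=None):
        fps = output_frames_per_sec or frames_per_sec
        self._writer = imageio.get_writer(output_path, fps=fps)

    def capture_frame(self, frame):
        # frame: HxWx3 uint8 numpy array
        self._writer.append_data(frame)

    def close(self):
        self._writer.close()

Then replace the broken import with this class. Alternatively, pinning gym to the version listed in the repo's environment file avoids the change entirely.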
Hi,
First of all, End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
is excellent work, and thank you for sharing it along with your code. However, I couldn't
understand some points in the paper. I would appreciate it if you could help me.
In my understanding, history information of vehicles and pedestrians is given to the BEV space while training the Roach expert. On the other hand, this seems to conflict with the Markov assumption, since a Markov model conditions only on the current state. What is the motivation for adding history information to the state? What advantage does it bring to the RL model?
In addition, the imitation model of the paper does not use any history information. Do you think this might cause a problem? The RL expert can extract additional information from the history, while the imitation model receives none of this information in its state; yet we expect similar behaviour from both models.
In your project you use PPO in the coach stage. Have you tried other RL algorithms, such as SAC?
Hello, I'm running run/train_rl.sh and hit this error. I don't know what the problem is.
CarlaUE4-Linux: no process found
[2023-09-05 15:34:29,212][utils.server_utils][INFO] - Kill Carla Servers!
CarlaUE4-Linux: no process found
[2023-09-05 15:34:30,225][utils.server_utils][INFO] - Kill Carla Servers!
[2023-09-05 15:34:30,225][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2000
[2023-09-05 15:34:30,232][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2005
[2023-09-05 15:34:30,238][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2010
[2023-09-05 15:34:30,242][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2015
[2023-09-05 15:34:30,246][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2020
[2023-09-05 15:34:30,250][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2025
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Process ForkServerProcess-6:
Traceback (most recent call last):
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 13, in _worker
env = env_fn_wrapper.var()
File "train_rl.py", line 64, in <lambda>
env = SubprocVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "train_rl.py", line 57, in env_maker
seed=cfg.seed, no_rendering=True, **config['env_configs'])
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 145, in make
return registry.make(id, **kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 90, in make
env = spec.make(**kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 60, in make
env = cls(**_kwargs)
File "/home/tech/carla-roach/carla_gym/envs/suites/endless_env.py", line 9, in __init__
obs_configs, reward_configs, terminal_configs, all_tasks)
File "/home/tech/carla-roach/carla_gym/carla_multi_agent_env.py", line 28, in __init__
self._init_client(carla_map, host, port, seed=seed, no_rendering=no_rendering)
File "/home/tech/carla-roach/carla_gym/carla_multi_agent_env.py", line 151, in _init_client
self._world = client.load_world(carla_map)
RuntimeError: map not found
/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
wandb: Currently logged in as: greatest-of-all-time (use `wandb login --relogin` to force relogin)
/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
wandb: Tracking run with wandb version 0.10.12
wandb: Syncing run roach
wandb: ⭐️ View project at https://wandb.ai/greatest-of-all-time/train_rl_experts
wandb: 🚀 View run at https://wandb.ai/greatest-of-all-time/train_rl_experts/runs/22x9vtjs
wandb: Run data is saved locally in /home/tech/carla-roach/outputs/2023-09-05/15-34-28/wandb/run-20230905_153439-22x9vtjs
wandb: Run `wandb offline` to turn off syncing.
wandb: WARNING Symlinked 3 files into the W&B run directory, call wandb.save again to sync new files.
trainable parameters: 1.53M
Traceback (most recent call last):
File "train_rl.py", line 74, in main
agent.learn(env, total_timesteps=int(cfg.total_timesteps), callback=callback, seed=cfg.seed)
File "/home/tech/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 107, in learn
model.learn(total_timesteps, callback=callback, seed=seed)
File "/home/tech/carla-roach/agents/rl_birdview/models/ppo.py", line 216, in learn
self.env.seed(seed)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 114, in seed
remote.send(("seed", seed + idx))
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
wandb: Waiting for W&B process to finish, PID 943313
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
Process ForkServerProcess-5:
loaded (0.00MB deduped)
Traceback (most recent call last):
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 13, in _worker
env = env_fn_wrapper.var()
File "train_rl.py", line 64, in <lambda>
env = SubprocVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "train_rl.py", line 57, in env_maker
seed=cfg.seed, no_rendering=True, **config['env_configs'])
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 145, in make
return registry.make(id, **kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 90, in make
env = spec.make(**kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 60, in make
env = cls(**_kwargs)
File "/home/tech/carla-roach/carla_gym/envs/suites/endless_env.py", line 9, in __init__
obs_configs, reward_configs, terminal_configs, all_tasks)
File "/home/tech/carla-roach/carla_gym/carla_multi_agent_env.py", line 28, in __init__
self._init_client(carla_map, host, port, seed=seed, no_rendering=no_rendering)
File "/home/tech/carla-roach/carla_gym/carla_multi_agent_env.py", line 152, in _init_client
self._tm = client.get_trafficmanager(port+6000)
RuntimeError: trying to create rpc server for traffic manager; but the system failed to create because of bind error.
wandb:
wandb: Find user logs for this run at: /home/tech/carla-roach/outputs/2023-09-05/15-34-28/wandb/run-20230905_153439-22x9vtjs/logs/debug.log
wandb: Find internal logs for this run at: /home/tech/carla-roach/outputs/2023-09-05/15-34-28/wandb/run-20230905_153439-22x9vtjs/logs/debug-internal.log
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 4 other file(s)
wandb:
wandb: Synced roach: https://wandb.ai/greatest-of-all-time/train_rl_experts/runs/22x9vtjs
PYTHON_RETURN=0!!! Start Over!!!
Bash script done.
Hi, I wonder whether you have tried adding the scenarios from CARLA's scenario manager to your environment? I plan to do so, but I am not sure whether py_trees, used in the scenario manager, supports multi-processing training (SubprocVecEnv). Thank you in advance!
Thank you for your wonderful work! I am very impressed with the consistency at which the RL training progresses; that must have required some phenomenal hyperparameter tuning and debugging skills. I would like to learn to tune hyperparameters for big and dynamic environments like these. I would appreciate it a lot if you could give me some suggestions and tips for hyperparameter tuning and debugging the RL training process. Thank you once again! Cheers!
Hi
I'm trying to benchmark the RL expert 'iccv21-roach/trained-models/1929isj0: Roach' from W&B, but I couldn't reproduce the results in the paper. The test suite is nocrash_dense, and the success rate is less than 0.5.
Does the trained model on W&B correspond to the test results in the paper? How many steps does it need to train to get the results in the paper?
Hello, thank you for the wonderful work. I want to benchmark the model on the unseen Towns 7 and 10. I generated the town07.h5 map using the carla_gym/utils/birdview_map.py code. However, to use the benchmarking script, I need route.xml files for Towns 7 and 10. Could you please tell me how to go about this?
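In case it helps, here is a hedged sketch of generating a leaderboard-style route file; the exact attributes carla-roach parses should be checked against the Town01-06 route files shipped with the repo, and all coordinates below are placeholders.

import xml.etree.ElementTree as ET

# Write a minimal route file for Town07. Waypoints are world coordinates;
# realistic ones can be picked e.g. from map.get_spawn_points().
routes = ET.Element('routes')
route = ET.SubElement(routes, 'route', id='0', town='Town07')
for x, y, z in [(10.0, 5.0, 0.0), (120.0, 5.0, 0.0)]:
    ET.SubElement(route, 'waypoint', x=str(x), y=str(y), z=str(z),
                  pitch='0.0', roll='0.0', yaw='0.0')
ET.ElementTree(routes).write('routes_town07.xml')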
Hi,
First of all, thank you very much for sharing your code; your work in the paper is very interesting. But I encounter a problem while training the RL expert, and I would appreciate your help.
I installed CARLA 0.9.10 as you described in Installation.md. I can run benchmark.py and observe the car's behavior in the video log. However, when I run train_rl.py, it crashes with a segmentation fault. Moreover, I noticed that the problem occurs when self._world tries to get vehicle_bbox_list and walker_bbox_list in chauffeurnet.py.
Have you encountered a similar problem while training your RL expert?
Thank you,
Dear zhejz,
I would like to use train_rl.sh to train a new model from scratch in Town10HD. I have changed the Town02 part of endless_all.yaml into a Town10HD version, following the same pattern.
Besides, I have used carla_gym/utils/birdview_map.py to generate a new BEV hdf5 map of Town10, saved as Town10HD.h5 in /maps. However, it reports the errors shown below:
Process ForkServerProcess-1:
Traceback (most recent call last):
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 13, in _worker
env = env_fn_wrapper.var()
File "train_rl.py", line 64, in
env = SubprocVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "train_rl.py", line 57, in env_maker
seed=cfg.seed, no_rendering=True, **config['env_configs'])
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 145, in make
return registry.make(id, **kwargs)
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 90, in make
env = spec.make(kwargs)
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 60, in make
env = cls(_kwargs)
File "/home/hcis-s09/Downloads/carla-roach/carla_gym/envs/suites/endless_env.py", line 9, in init
obs_configs, reward_configs, terminal_configs, all_tasks)
File "/home/hcis-s09/Downloads/carla-roach/carla_gym/carla_multi_agent_env.py", line 28, in init
self._init_client(carla_map, host, port, seed=seed, no_rendering=no_rendering)
File "/home/hcis-s09/Downloads/carla-roach/carla_gym/carla_multi_agent_env.py", line 152, in _init_client
self._world = client.load_world(carla_map)
RuntimeError: time-out of 60000ms while waiting for the simulator, make sure the simulator is ready and connected to localhost:2000
I guess that just changing the town name from Town02 to Town10HD in endless_all.yaml is not sufficient to change the target town of RL training.
Could you please give me some hints on how to retrain a new RL model in Town10?
Thank you for all of your beautiful work!
Sincerely,
sonicokuo
I have collected NoCrash-dense data successfully:
https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/run/data_collect_bc_NeilBranch0.sh
https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/data_collect_NeilBranch0.py
When I run my version of train_rl.py ( https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/train_rl_NeilBranch0.py ), I get the following error:
Traceback (most recent call last):
File "train_rl_NeilBranch0.py", line 87, in main
agent = AgentClass('config_agent.yaml')
File "/home/nsambhu/github/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 31, in init
self.setup(path_to_conf_file)
File "/home/nsambhu/github/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 205, in setup
self._policy, self._train_cfg['kwargs'] = self._policy_class.load(self._ckpt)
File "/home/nsambhu/github/carla-roach/agents/rl_birdview/models/ppo_policy.py", line 226, in load
saved_variables = th.load(path, map_location=device)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 702, in _legacy_load
result = unpickler.load()
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 665, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 737, in restore_location
return default_restore_location(storage, map_location)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 156, in default_restore_location
result = fn(storage, location)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 136, in _cuda_deserialize
return storage_type(obj.size())
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/cuda/init.py", line 480, in _lazy_new
return super(_CudaBase, cls).new(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
My shell script to call train_rl.py is listed here: https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/run/train_rl_NeilBranch0.sh
I have already reduced the batch size from 256 to 1 and the error persists: https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/config/agent/ppo/training/ppo.yaml
Output from ( https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/train_rl_NeilBranch0.py#L78 ) to show the batch size decreased:
cfg.agent[agent_name] {'entry_point': 'agents.rl_birdview.rl_birdview_agent:RlBirdviewAgent', 'wb_run_path': '', 'wb_ckpt_step': None, 'env_wrapper': {'entry_point': 'agents.rl_birdview.utils.rl_birdview_wrapper:RlBirdviewWrapper', 'kwargs': {'input_states': ['control', 'vel_xy'], 'acc_as_action': True}}, 'policy': {'entry_point': 'agents.rl_birdview.models.ppo_policy:PpoPolicy', 'kwargs': {'policy_head_arch': [256, 256], 'value_head_arch': [256, 256], 'features_extractor_entry_point': 'agents.rl_birdview.models.torch_layers:XtMaCNN', 'features_extractor_kwargs': {'states_neurons': [256, 256]}, 'distribution_entry_point': 'agents.rl_birdview.models.distributions:BetaDistribution', 'distribution_kwargs': {'dist_init': None}}}, 'training': {'entry_point': 'agents.rl_birdview.models.ppo:PPO', 'kwargs': {'learning_rate': 1e-05, 'n_steps_total': 12288,
'batch_size': 1,
'n_epochs': 20, 'gamma': 0.99, 'gae_lambda': 0.9, 'clip_range': 0.2, 'clip_range_vf': None, 'ent_coef': 0.01, 'explore_coef': 0.05, 'vf_coef': 0.5, 'max_grad_norm': 0.5, 'target_kl': 0.01, 'update_adv': False, 'lr_schedule_step': 8}}, 'obs_configs': {'birdview': {'module': 'birdview.chauffeurnet', 'width_in_pixels': 192, 'pixels_ev_to_bottom': 40, 'pixels_per_meter': 5.0, 'history_idx': [-16, -11, -6, -1], 'scale_bbox': True, 'scale_mask_col': 1.0}, 'speed': {'module': 'actor_state.speed'}, 'control': {'module': 'actor_state.control'}, 'velocity': {'module': 'actor_state.velocity'}}}
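For what it's worth, the traceback shows the OOM inside torch.load, i.e. while the checkpoint's CUDA tensors are being restored to the GPU, before the batch size matters at all. A hedged workaround is to deserialize to CPU first (the path below is a placeholder; in this repo it would mean changing the map_location passed in ppo_policy.py's load()):

import torch as th

# Load checkpoint tensors onto the CPU instead of the GPU they were saved
# from; move the policy to the desired device afterwards.
saved_variables = th.load('path/to/ckpt.pth', map_location='cpu')

It is also worth checking with nvidia-smi that no stale CARLA servers are still holding the GPU memory.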
Hello, I'm executing run/train_rl.sh and found that several CARLA windows opened at the same time and then crashed. I learned from your paper that the RL experts are trained in Towns 1-6 simultaneously, so I think my GPU may not meet the code's requirements. Could you please tell me the number and model of the graphics cards you used when training the RL experts?
Hi, thank you for sharing your excellent work! I want to train your RL agent on a machine with six 1080 Ti GPUs (6 CARLA servers on 6 different GPUs) and 56 cores. But it takes around 5-6 minutes per 12288 steps, so the full 10M steps would take around 50 days, which is not acceptable. Do you know what the possible reason is, or how to improve the speed? Thank you!
Hi,
Has the run path been updated?
I am trying to collect your dataset from the run path and it says:
Could not find run <Run zhejun/il_nocrash_ap/2pilkrol (not found)>
Or has it been unhosted?
Dear authors, first of all, thanks for sharing your paper and code! It is excellent.
I am an RL newbie. In carla_gym/core/obs_manager/birdview/chauffeurnet.py, in the function get_observation, I noticed that when rasterizing vehicles' bboxes, the code only rasterizes the ego vehicle's current bbox and specifically excludes the ego vehicle's history.
I am wondering whether you have tried rasterizing the ego vehicle's history bboxes. Is this setting a common practice (from another codebase)? I am curious about the reason. (Causal confusion? Markov property? RL instability?)
Thanks for your time.
Hello
Hi. When I run train_rl.py, I get an error: module 'tensorflow_estimator.python.estimator.api._v1.estimator' has no attribute 'distributions'. I use tensorflow==2.6.0. How can I fix this error?
Hi, thanks for this awesome project!
I wonder if there is a way to visualize the RL training process, i.e., how the ego car is driven in CARLA?
I want to do this to see the improvement of the RL model.
Appreciate any advice.
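Not a feature of the repo as far as I know, but one hedged way to watch training without touching the training code is a separate spectator script attached to the same server. This assumes the ego vehicle carries role_name 'hero'; adjust host/port to the server you want to watch, and note it only shows anything if the server window is visible (not launched off-screen) and no_rendering is off.

import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()
spectator = world.get_spectator()
while True:
    world.wait_for_tick()
    for actor in world.get_actors().filter('vehicle.*'):
        if actor.attributes.get('role_name') == 'hero':
            tf = actor.get_transform()
            tf.location.z += 30.0      # hover above the ego car
            tf.rotation.pitch = -90.0  # look straight down
            spectator.set_transform(tf)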
Hi, guys!
I have 2 GPUs in my computer and I want to train the RL model on the 2nd one. I replaced gpu: [0] with gpu: [1] in endless_all.yaml, and the final shell command of the CARLA startup is CUDA_VISIBLE_DEVICES=1 bash /media/carla/CarlaUE4.sh -fps=10 -quality-level=Low -carla-rpc-port=2000. However, I find that all CARLA servers are still running on GPU 0, as shown in the picture.
I am really confused and don't know how to solve this problem. Thanks for any help!
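A hedged observation rather than a confirmed fix: the logs earlier on this page show two launch conventions, CUDA_VISIBLE_DEVICES=... and SDL_HINT_CUDA_DEVICE=... with SDL_VIDEODRIVER=offscreen. On some driver/UE4 combinations the renderer ignores CUDA_VISIBLE_DEVICES, so setting both hints, sketched below in the style of utils/server_utils.py with paths and flags copied from the commands quoted above, may be needed to actually land the server on GPU 1.

import os
import subprocess

env = dict(os.environ,
           CUDA_VISIBLE_DEVICES='1',
           SDL_HINT_CUDA_DEVICE='1',
           SDL_VIDEODRIVER='offscreen')
subprocess.Popen(['bash', '/media/carla/CarlaUE4.sh', '-fps=10',
                  '-quality-level=Low', '-carla-rpc-port=2000'], env=env)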
Hi,
Currently, I am trying to train IL models from scratch.
I am using a Tesla V100 (32GB) and 8 CPU cores to train models. Lastly, the batch size is 192.
My dataset consists of 183000 data points collected with your dataset collection code.
At present, it takes approximately 3-4 hours to train one epoch.
Was this the case for you? Can you let me know the GPU specs, training properties, and, finally, the epoch durations?
I would really appreciate it if you could help with this issue.
Best.
After I start running train_rl.sh, when the program executes the line self.policy.forward(self._last_obs) in ppo.py, it gets stuck for a while and then raises the above error. @zhejz Do you have any idea?
Thanks for this wonderful work. As the experiments show, the exploration loss greatly improves PPO performance. What is the intuition behind it? And how are the different Beta distributions for different events defined? For example, when running a red light or colliding with other agents, why introduce the distribution Beta(1, 2.5) as p_z? Finally, are there any mathematical modeling works one could refer to for a better understanding?
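For intuition on the Beta(1, 2.5) choice (this is standard Beta-distribution math, not something stated in the thread): the Beta density is

f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \qquad f(x; 1, 2.5) \propto (1-x)^{1.5},

which is monotonically decreasing on [0, 1] with its mode at x = 0. If, as with acc_as_action, the unit interval is affinely mapped to the acceleration range [-1, 1], such a prior concentrates mass on braking, which is a plausible "suggestion" to impose after a red-light or collision event.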
Related issue: #8
From here:
#8 (comment)
I spent some time trying to integrate the scenario_runner into the multi-processing RL training but it didn't work out smoothly.
What I want to try is to use your Roach expert to collect data from .xml routes and .json scenarios, based on the official CARLA leaderboard and scenario code, to see how Roach performs on the leaderboard. Since, as said here:
more naturally than hand-crafted CARLA experts
and based on the official leaderboard and scenarios, I could compare results on the same routes and scenarios rather than on random ones as this repo does.
But when I read the code behind the README's data-collection instructions (https://github.com/zhejz/carla-roach#quick-start-collect-an-expert-dataset-using-roach), it seems the agent file, e.g.:
class RlBirdviewAgent():
is not suitable for running on the leaderboard.
Did anyone try this on the offline official leaderboard with self-defined XML and JSON?
@zhejz How do I view the sensor data collected by the self-driving car models? In the README.md I see training instructions for the agents. I would like to be able to view and modify the self-driving car models (e.g. model layers), and I want to see the input to these networks.
Dear zhejz,
When I run train_rl.sh, the training process hits an EOFError. I have 40 GB of RAM and am running on a 3090 GPU; the error occurs frequently at different epochs, after n_epoch: 0, n_epoch: 8, and n_epoch: 25.
Here is the full error:
Error executing job with overrides: ['agent.ppo.wb_run_path=null', 'wb_project=train_rl_experts', 'wb_name=roach', 'agent/ppo/policy=xtma_beta', 'agent.ppo.training.kwargs.explore_coef=0.05', 'carla_sh_path=/media/carla/AVRL/carla/CarlaUE4.sh']
Traceback (most recent call last):
File "train_rl.py", line 75, in main
agent.learn(env, total_timesteps=int(cfg.total_timesteps), callback=callback, seed=cfg.seed)
File "/media/carla/AVRL/roach/DML_AVRL/agents/rl_birdview/rl_birdview_agent.py", line 109, in learn
model.learn(total_timesteps, callback=callback, seed=seed)
File "/media/carla/AVRL/roach/DML_AVRL/agents/rl_birdview/models/ppo.py", line 249, in learn
callback.on_training_end()
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/callbacks.py", line 95, i
n on_training_end
self._on_training_end()
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/callbacks.py", line 179,
in _on_training_end
callback.on_training_end()
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/callbacks.py", line 95, i
n on_training_end
self._on_training_end()
File "/media/carla/AVRL/roach/DML_AVRL/agents/rl_birdview/utils/wandb_callback.py", line 67, in _on_training_end
avg_ep_stat, ep_events = self.evaluate_policy(self.vec_env, self.model.policy, eval_video_path)
File "/media/carla/AVRL/roach/DML_AVRL/agents/rl_birdview/utils/wandb_callback.py", line 158, in evaluate_policy
obs, reward, done, info = env.step(actions)
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py",
line 161, in step
return self.step_wait()
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.p
y", line 107, in step_wait
results = [remote.recv() for remote in self.remotes]
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.p
y", line 107, in
results = [remote.recv() for remote in self.remotes]
File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/usr/lib/python3.8/multiprocessing/connection.py", line 420, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 389, in _recv
raise EOFError
EOFError
Hi,
Very impressive work!
One simple question: when setting the n_episodes argument, do we need to take the number of towns into account? For example, if I want to collect 1 episode per town (Towns 1, 3, 4, 6), should I set n_episodes to 1 or to 4?
And for each episode, is the route (the start point and the end point) always the same?
Cheers,
Yi
In ego_vehicle_handler.py, is "score_route" the "success rate"? I know "score_composed" is the "driving score".
Hi,
I have a question about the TaskVehicle class in carla_gym/core/task_actor/common/task_vehicle.py.
Its __init__ takes 4 arguments: vehicle, target_transforms, spawn_transforms, endless.
My question concerns target_transforms and spawn_transforms.
What data structure are the waypoints expected to be in?
Are the target_transforms supposed to be the route waypoints from the route.xml files?
Are the spawn_transforms meant to be the initial vehicle transform?
What coordinate system are they expected to use, the same one as the .xml files?
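My reading, offered only as a guess to be checked against the repo: both arguments are built from carla.Transform objects in world coordinates (the same frame as the route .xml files), with target_transforms being the route waypoints to reach and spawn_transforms the candidate initial poses. For illustration (all values are placeholders):

import carla

# Hypothetical waypoints of the type TaskVehicle presumably consumes.
spawn_tf = carla.Transform(carla.Location(x=10.0, y=5.0, z=0.3),
                           carla.Rotation(pitch=0.0, yaw=90.0, roll=0.0))
target_tf = carla.Transform(carla.Location(x=120.0, y=5.0, z=0.0))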
Hi,
As far as I understand, it is enough to change the observation configs to add new fields to the dataset. This works, for example, when I change the observation manager of an individual sensor (i.e., the ObsManager class of GNSS, adding the sensor noise value to the observation dictionary for the sake of example). However, when I run the default data collection code (i.e., data_collect_bc.sh), it does not add the navigation.waypoint_plan and birdview.chauffeurnet observation dictionaries to the dataset, which exist in agent/cilrs/obs_configs=central_rgb_wide.
Probably I am missing something here; that is, there are other things that also need to be set. At the least, I would like to access the navigation.waypoint_plan observation dictionary during data acquisition. So, how can I add the observation dictionaries of other ObsManager modules to the dataset? I would really appreciate your help.
Best.
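In case it helps, my hedged understanding is that the data collector only saves what is listed in the active agent's obs_configs, so the extra managers have to appear there. A Python-dict view of the intended shape, with the birdview kwargs copied from the config dump earlier on this page and the waypoint_plan kwargs as unverified placeholders:

obs_configs = {
    # ... keep the existing camera entries from central_rgb_wide here ...
    'waypoint_plan': {'module': 'navigation.waypoint_plan', 'steps': 20},
    'birdview': {'module': 'birdview.chauffeurnet', 'width_in_pixels': 192,
                 'pixels_ev_to_bottom': 40, 'pixels_per_meter': 5.0,
                 'history_idx': [-16, -11, -6, -1], 'scale_bbox': True,
                 'scale_mask_col': 1.0},
}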
Hi, I'm running run/train_rl.sh and keep receiving this error:
[2022-05-15 08:09:58,133][utils.server_utils][INFO] - Kill Carla Servers!
CarlaUE4-Linux: no process found
[2022-05-15 08:09:59,167][utils.server_utils][INFO] - Kill Carla Servers!
[2022-05-15 08:09:59,168][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/thoaican/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2000
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Traceback (most recent call last):
File "train_rl.py", line 40, in main
agent = AgentClass('config_agent.yaml')
File "/home/thoaican/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 15, in __init__
self.setup(path_to_conf_file)
File "/home/thoaican/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 27, in setup
f = max(all_ckpts, key=lambda x: int(x.name.split('_')[1].split('.')[0]))
ValueError: max() arg is an empty sequence
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2022-05-15 08:10:05,478][wandb.sdk.internal.internal][INFO] - Internal process exited
I've browsed the issues page and found the same error reported by someone else here; the suggested solution was to delete outputs/checkpoint.txt, but that didn't help in my case.
Thank you for your excellent work. When I run train_rl.py, I set no-rendering to false, but I can only see the running status of the Carla client and cannot see the BEV image. Is there any method to observe the BEV image during reinforcement learning training?
" PYTHON_RETURN=1!!! Start Over!!! "
hello,how can I solve this problem?
Hi,
I'm trying to run the carla-roach benchmark with CARLA 0.9.13. I am currently running into some segmentation faults when benchmarking the CARLA roaming agent. When using a max_step of 1 in the run_single function (benchmark.py), I only get a segmentation fault in the benchmarks 'WetNoon_03' and 'SoftRainSunset_03'. However, when increasing max_step to 5, it also occurs in many of the other benchmarks. The segmentation faults always occur while creating the zombie walkers in the following loop (zombie_walker_handler.py):
for w_id, c_id in zip(walker_ids, controller_ids):
self.zombie_walkers[w_id] = ZombieWalker(w_id, c_id, self._world)
return self.zombie_walkers
It completes some iterations, but after a while it gives the segmentation fault (usually around iteration 200-250). This code runs during the _zw_handler reset in carla_multi_agent_env.py. Does anyone know what could be the cause of these segmentation faults?
NOTE: I am using vehicle.audi.a2 instead of vehicle.lincoln.mkz2017, because the lincoln doesn't seem to be recognized in carla 0.9.13.
Hello, when I execute run/train_rl.sh, I encounter the following problem:
Traceback (most recent call last):
File "train_rl.py", line 62, in main
env = DummyVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "/home/whm/anaconda3/envs/roach/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 23, in __init__
self.envs = [fn() for fn in env_fns]
File "/home/whm/anaconda3/envs/roach/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 23, in <listcomp>
self.envs = [fn() for fn in env_fns]
File "train_rl.py", line 62, in <lambda>
env = DummyVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "train_rl.py", line 58, in env_maker
env = EnvWrapper(env, **wrapper_kargs)
File "/home/whm/roach/agents/rl_birdview/utils/rl_birdview_wrapper.py", line 25, in __init__
assert len(env._obs_configs) == 1
File "/home/whm/anaconda3/envs/roach/lib/python3.8/site-packages/gym/core.py", line 228, in __getattr__
raise AttributeError(f"attempted to get missing private attribute '{name}'")
AttributeError: attempted to get missing private attribute '_obs_configs'
What's the issue?
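A hedged guess based on the AttributeError text: newer gym releases refuse to forward attribute access for names starting with an underscore, so the wrapper's assertion has to reach the base env explicitly, e.g. in rl_birdview_wrapper.py:

# Access the private attribute on the unwrapped base env instead of
# going through the wrapper's __getattr__ forwarding.
assert len(env.unwrapped._obs_configs) == 1

Alternatively, installing the gym version pinned in the repo's environment file sidesteps the behaviour change.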
As you said, I registered and logged in to wandb, but the following error occurred at runtime:
CarlaUE4-Linux: no process found
[2021-12-23 11:21:27,462][utils.server_utils][INFO] - Kill Carla Servers!
CarlaUE4-Linux: no process found
[2021-12-23 11:21:28,486][utils.server_utils][INFO] - Kill Carla Servers!
[2021-12-23 11:21:28,486][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/jjuv/carla/CARLA_0.9.10.1/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2000
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
[2021-12-23 11:21:33,555][main][INFO] - making port 2000
/home/jjuv/anaconda3/envs/roach/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
wandb: Currently logged in as: yqlol (use `wandb login --relogin` to force relogin)
wandb: Tracking run with wandb version 0.10.12
wandb: Syncing run roach
wandb: ⭐️ View project at https://wandb.ai/yqlol/train_rl_experts
wandb: 🚀 View run at https://wandb.ai/yqlol/train_rl_experts/runs/37rjz3na
wandb: Run data is saved locally in /home/jjuv/carla-roach-main/outputs/2021-12-23/11-21-26/wandb/run-20211223_112136-37rjz3na
wandb: Run `wandb offline` to turn off syncing.
wandb: WARNING Symlinked 3 files into the W&B run directory, call wandb.save again to sync new files.
trainable parameters: 1.53M
It is stuck at 'trainable parameters: 1.53M'. What's going on?
Originally posted by @Yiquan-lol in #9 (comment)
Thanks for sharing your excellent work!
I've trained the RL model twice. It learns well at first; however, after 7M steps the agent tends to get stuck at traffic lights and won't start again when the light turns green. The agent acts very conservatively, driving at a low speed or moving forward only a little after a long time.
The checkpoint I got after 10M steps can't even complete a single route due to this problem. I didn't modify the reward code and tried to use the same training parameters as in the paper, with batch_size=256, n_steps_total=12288, and 6 towns at the same time. Below is a screenshot of the problem during training (I used the -quality-level=Low option when starting CARLA to monitor the training process); the green car is the agent.
I find that the total loss begins to grow after 7M steps.
Thanks for any help or suggestion!
I am interested in multi-agent research. I am wondering how to go about configuring all vehicles with a tri-camera setup and collecting data from all vehicles simultaneously. Rather than having one ego vehicle, I would like to make every vehicle an "ego" vehicle from which I can collect data.
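Not something the repo supports out of the box as far as I can tell, but at the CARLA API level a hedged sketch would attach a rig to every vehicle in the world; the camera pose and the yaw angles below are made-up placeholders for a left/center/right tri-camera setup.

import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()
cam_bp = world.get_blueprint_library().find('sensor.camera.rgb')
cameras = []
for vehicle in world.get_actors().filter('vehicle.*'):
    for yaw in (-55.0, 0.0, 55.0):  # left / center / right camera of the rig
        tf = carla.Transform(carla.Location(x=1.3, z=2.3), carla.Rotation(yaw=yaw))
        cam = world.spawn_actor(cam_bp, tf, attach_to=vehicle)
        # Bind vehicle id and yaw now so each callback writes its own files.
        cam.listen(lambda image, vid=vehicle.id, y=int(yaw): image.save_to_disk(
            'out/%d_%d_%06d.png' % (vid, y, image.frame)))
        cameras.append(cam)

Note that saving every frame from three cameras on every vehicle will saturate disk I/O and slow the simulation quickly; in practice one would subsample frames or buffer them in queues.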