zhejz / carla-roach
Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach. ICCV 2021.
Home Page: https://zhejz.github.io/roach
License: Other
I get this time-out error when I train the RL model:
RuntimeError: time-out of 60000ms while waiting for the simulator, make sure the simulator is ready and connected to localhost:2000
Data collection, on the other hand, seems to work normally.
Can you give me some suggestions?
CARLA version: 0.9.10
Platform/OS: Ubuntu 22.04
CUDA: 12.2
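Not an official fix, but a minimal connectivity check (assuming the standard carla Python client, with host localhost and port 2000 as in the error message) can rule out server-side problems before train_rl.py is involved:

import carla

# Hedged sketch: ping the simulator the same way the training code would.
client = carla.Client('localhost', 2000)
client.set_timeout(10.0)  # seconds; raise this on slow machines
print('server version:', client.get_server_version())
print('client version:', client.get_client_version())

If this already times out, the problem is the server side (EGL/driver setup), not the RL code.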
When I run the two commands below, I encounter an error that seems to prevent the CARLA simulator from starting. Any help would be appreciated.
bash /home/tech/carla/CarlaUE4.sh
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
./run/train_rl.sh
CarlaUE4-Linux: no process found
[2023-09-11 11:41:53,139][utils.server_utils][INFO] - Kill Carla Servers!
CarlaUE4-Linux: no process found
[2023-09-11 11:41:54,156][utils.server_utils][INFO] - Kill Carla Servers!
[2023-09-11 11:41:54,156][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2000
[2023-09-11 11:41:54,161][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2005
[2023-09-11 11:41:54,165][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2010
[2023-09-11 11:41:54,168][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2015
[2023-09-11 11:41:54,171][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2020
[2023-09-11 11:41:54,174][utils.server_utils][INFO] - SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 bash /home/njtech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2025
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
libEGL warning: egl: failed to create dri2 screen
[2023-09-11 11:42:01,081][agents.rl_birdview.rl_birdview_agent][INFO] - Resume checkpoint latest ckpt/ckpt_6082560.pth
[2023-09-11 11:42:05,036][agents.rl_birdview.rl_birdview_agent][INFO] - Loading wandb checkpoint: ckpt/ckpt_6082560.pth
carla_map load_world(): Town02
carla_map load_world(): Town05
carla_map load_world(): Town04
carla_map load_world(): Town06
carla_map load_world(): Town01
carla_map load_world(): Town03
Traceback (most recent call last):
File "train_rl.py", line 64, in main
env = SubprocVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "/home/njtech/anaconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 98, in __init__
observation_space, action_space = self.remotes[0].recv()
File "/home/njtech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/njtech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/njtech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2023-09-11 11:43:07,615][wandb.sdk.internal.internal][INFO] - Internal process exited
PYTHON_RETURN=0!!! Start Over!!!
CarlaUE4-Linux: no process found
Bash script done.
Hello, I'm trying to run run/benchmark.sh, and I need to modify agent.cilrs.rl_run_path and agent.cilrs.rl_ckpt_step so that I can load my checkpoint. Sorry, I'm a rookie with wandb: as shown in the official docs, agent.cilrs.rl_run_path should be in 'entity/project' format, and if I just follow the leaderboard run in run/train_il.sh, the entity is '' and the project is 'il_leaderboard_roach'. But it keeps telling me that the project is not found. Any help? Thanks!!
CARLA version: 0.9.14 on Ubuntu.
Hi all, when I run the data_collect.py file, I am getting this error -
"from gym.wrappers.monitoring.video_recorder import ImageEncoder
ImportError: cannot import name 'ImageEncoder' from 'gym.wrappers.monitoring.video_recorder"
When I search for error I got to know that ImageEncoder is deprecated. So how to change code for this line?
Thanks in advance.
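One possible workaround, not from the authors: re-implement the small part of ImageEncoder that the code uses on top of imageio. This assumes imageio and imageio-ffmpeg are installed, and it mirrors the old gym interface only as far as needed; check the call site for the exact arguments used.

import imageio

class ImageEncoder:
    # Minimal stand-in for gym's removed ImageEncoder.
    def __init__(self, output_path, frame_shape, frames_per_sec, output_frames_per_sec=None):
        fps = output_frames_per_sec or frames_per_sec
        self._writer = imageio.get_writer(output_path, fps=fps)

    def capture_frame(self, frame):
        # frame: HxWx3 uint8 numpy array
        self._writer.append_data(frame)

    def close(self):
        self._writer.close()

Then replace the broken import with this class. Alternatively, pinning gym to the version listed in the repo's environment file avoids the change entirely.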
Hi,
First of all, End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
is excellent work, and thank you for sharing it along with your code. However, I couldn't
understand some points in the paper. I would appreciate it if you could help me.
In my understanding, history information of vehicles and pedestrians is given to the BEV space while training the Roach expert. On the other hand, this seems to conflict with the Markov assumption, since a Markov model conditions only on the current state. What is the motivation for adding history information to the state? What advantage does it bring to the RL model?
In addition, the imitation model of the paper does not use any history information. Do you think this might cause a problem? The RL expert can extract additional information from the history, while the imitation model receives none of this information in its state; yet we expect similar behaviour from both models.
In your project you use PPO in the coach stage. Have you tried other RL algorithms, such as SAC?
Hello, I'm running run/train_rl.sh and hit this error. I don't know what the problem is.
CarlaUE4-Linux: no process found
[2023-09-05 15:34:29,212][utils.server_utils][INFO] - Kill Carla Servers!
CarlaUE4-Linux: no process found
[2023-09-05 15:34:30,225][utils.server_utils][INFO] - Kill Carla Servers!
[2023-09-05 15:34:30,225][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2000
[2023-09-05 15:34:30,232][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2005
[2023-09-05 15:34:30,238][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2010
[2023-09-05 15:34:30,242][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2015
[2023-09-05 15:34:30,246][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2020
[2023-09-05 15:34:30,250][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/tech/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2025
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Process ForkServerProcess-6:
Traceback (most recent call last):
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 13, in _worker
env = env_fn_wrapper.var()
File "train_rl.py", line 64, in <lambda>
env = SubprocVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "train_rl.py", line 57, in env_maker
seed=cfg.seed, no_rendering=True, **config['env_configs'])
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 145, in make
return registry.make(id, **kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 90, in make
env = spec.make(**kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 60, in make
env = cls(**_kwargs)
File "/home/tech/carla-roach/carla_gym/envs/suites/endless_env.py", line 9, in __init__
obs_configs, reward_configs, terminal_configs, all_tasks)
File "/home/tech/carla-roach/carla_gym/carla_multi_agent_env.py", line 28, in __init__
self._init_client(carla_map, host, port, seed=seed, no_rendering=no_rendering)
File "/home/tech/carla-roach/carla_gym/carla_multi_agent_env.py", line 151, in _init_client
self._world = client.load_world(carla_map)
RuntimeError: map not found
/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
wandb: Currently logged in as: greatest-of-all-time (use `wandb login --relogin` to force relogin)
/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
wandb: Tracking run with wandb version 0.10.12
wandb: Syncing run roach
wandb: ⭐️ View project at https://wandb.ai/greatest-of-all-time/train_rl_experts
wandb: 🚀 View run at https://wandb.ai/greatest-of-all-time/train_rl_experts/runs/22x9vtjs
wandb: Run data is saved locally in /home/tech/carla-roach/outputs/2023-09-05/15-34-28/wandb/run-20230905_153439-22x9vtjs
wandb: Run `wandb offline` to turn off syncing.
wandb: WARNING Symlinked 3 files into the W&B run directory, call wandb.save again to sync new files.
trainable parameters: 1.53M
Traceback (most recent call last):
File "train_rl.py", line 74, in main
agent.learn(env, total_timesteps=int(cfg.total_timesteps), callback=callback, seed=cfg.seed)
File "/home/tech/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 107, in learn
model.learn(total_timesteps, callback=callback, seed=seed)
File "/home/tech/carla-roach/agents/rl_birdview/models/ppo.py", line 216, in learn
self.env.seed(seed)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 114, in seed
remote.send(("seed", seed + idx))
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
wandb: Waiting for W&B process to finish, PID 943313
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
Process ForkServerProcess-5:
loaded (0.00MB deduped)
Traceback (most recent call last):
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/tech/anaconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 13, in _worker
env = env_fn_wrapper.var()
File "train_rl.py", line 64, in <lambda>
env = SubprocVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "train_rl.py", line 57, in env_maker
seed=cfg.seed, no_rendering=True, **config['env_configs'])
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 145, in make
return registry.make(id, **kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 90, in make
env = spec.make(**kwargs)
File "/home/tech/anaconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 60, in make
env = cls(**_kwargs)
File "/home/tech/carla-roach/carla_gym/envs/suites/endless_env.py", line 9, in __init__
obs_configs, reward_configs, terminal_configs, all_tasks)
File "/home/tech/carla-roach/carla_gym/carla_multi_agent_env.py", line 28, in __init__
self._init_client(carla_map, host, port, seed=seed, no_rendering=no_rendering)
File "/home/tech/carla-roach/carla_gym/carla_multi_agent_env.py", line 152, in _init_client
self._tm = client.get_trafficmanager(port+6000)
RuntimeError: trying to create rpc server for traffic manager; but the system failed to create because of bind error.
wandb:
wandb: Find user logs for this run at: /home/tech/carla-roach/outputs/2023-09-05/15-34-28/wandb/run-20230905_153439-22x9vtjs/logs/debug.log
wandb: Find internal logs for this run at: /home/tech/carla-roach/outputs/2023-09-05/15-34-28/wandb/run-20230905_153439-22x9vtjs/logs/debug-internal.log
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 4 other file(s)
wandb:
wandb: Synced roach: https://wandb.ai/greatest-of-all-time/train_rl_experts/runs/22x9vtjs
PYTHON_RETURN=0!!! Start Over!!!
Bash script done.
Hi, I wonder whether you have tried adding the scenarios from CARLA's scenario manager to your environment? I plan to do so, but I am not sure whether py_trees, used in the scenario manager, supports multi-processing training (SubprocVecEnv). Thank you in advance!
Thank you for your wonderful work! I am very impressed with the consistency at which the RL training progresses; that must have required some phenomenal hyperparameter tuning and debugging skills. I would like to learn to tune hyperparameters for big and dynamic environments like these. I would appreciate it a lot if you could give me some suggestions and tips for hyperparameter tuning and debugging the RL training process. Thank you once again! Cheers!
Hi
I'm trying to benchmark the RL expert 'iccv21-roach/trained-models/1929isj0: Roach' from W&B, but I couldn't reproduce the results in the paper. The test suite is nocrash_dense, and the success rate is less than 0.5.
Does the trained model on W&B correspond to the test results in the paper? How many steps does it need to train to get the results in the paper?
Hello, thank you for the wonderful work. I want to benchmark the model on the unseen Towns 7 and 10. I generated the town07.h5 map using the carla_gym/utils/birdview_map.py code. However, to use the benchmarking script, I need route.xml files for Towns 7 and 10. Could you please tell me how to go about this?
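In case it helps, here is a hedged sketch of generating a leaderboard-style route file; the exact attributes carla-roach parses should be checked against the Town01-06 route files shipped with the repo, and all coordinates below are placeholders.

import xml.etree.ElementTree as ET

# Write a minimal route file for Town07. Waypoints are world coordinates;
# realistic ones can be picked e.g. from map.get_spawn_points().
routes = ET.Element('routes')
route = ET.SubElement(routes, 'route', id='0', town='Town07')
for x, y, z in [(10.0, 5.0, 0.0), (120.0, 5.0, 0.0)]:
    ET.SubElement(route, 'waypoint', x=str(x), y=str(y), z=str(z),
                  pitch='0.0', roll='0.0', yaw='0.0')
ET.ElementTree(routes).write('routes_town07.xml')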
Hi,
First of all, thank you very much for sharing your code; your work in the paper is very interesting. But I encounter a problem while training the RL expert, and I would appreciate your help.
I installed CARLA 0.9.10 as you described in Installation.md. I can run benchmark.py and observe the car's behavior in the video log. However, when I run train_rl.py, it crashes with a segmentation fault. Moreover, I noticed that the problem occurs when self._world tries to get vehicle_bbox_list and walker_bbox_list in chauffeurnet.py.
Have you encountered a similar problem while training your RL expert?
Thank you,
Dear zhejz,
I would like to use train_rl.sh to train a new model from scratch in Town10HD. I have changed the Town02 part of endless_all.yaml into a Town10HD version, following the same pattern.
Besides, I have used carla_gym/utils/birdview_map.py to generate a new BEV hdf5 map of Town10, saved as Town10HD.h5 in /maps. However, it reports the errors shown below:
Process ForkServerProcess-1:
Traceback (most recent call last):
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 13, in _worker
env = env_fn_wrapper.var()
File "train_rl.py", line 64, in
env = SubprocVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "train_rl.py", line 57, in env_maker
seed=cfg.seed, no_rendering=True, **config['env_configs'])
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 145, in make
return registry.make(id, **kwargs)
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 90, in make
env = spec.make(kwargs)
File "/home/hcis-s09/miniconda3/envs/carla/lib/python3.7/site-packages/gym/envs/registration.py", line 60, in make
env = cls(_kwargs)
File "/home/hcis-s09/Downloads/carla-roach/carla_gym/envs/suites/endless_env.py", line 9, in init
obs_configs, reward_configs, terminal_configs, all_tasks)
File "/home/hcis-s09/Downloads/carla-roach/carla_gym/carla_multi_agent_env.py", line 28, in init
self._init_client(carla_map, host, port, seed=seed, no_rendering=no_rendering)
File "/home/hcis-s09/Downloads/carla-roach/carla_gym/carla_multi_agent_env.py", line 152, in _init_client
self._world = client.load_world(carla_map)
RuntimeError: time-out of 60000ms while waiting for the simulator, make sure the simulator is ready and connected to localhost:2000
I guess that just changing the town name from Town02 to Town10HD in endless_all.yaml is not sufficient to change the target town of RL training.
Could you please give me some hints on how to retrain a new RL model in Town10?
Thank you for all of your beautiful work!
Sincerely,
sonicokuo
I have collected NoCrash-dense data successfully:
https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/run/data_collect_bc_NeilBranch0.sh
https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/data_collect_NeilBranch0.py
When I run my version of train_rl.py ( https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/train_rl_NeilBranch0.py ), I get the following error:
Traceback (most recent call last):
File "train_rl_NeilBranch0.py", line 87, in main
agent = AgentClass('config_agent.yaml')
File "/home/nsambhu/github/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 31, in init
self.setup(path_to_conf_file)
File "/home/nsambhu/github/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 205, in setup
self._policy, self._train_cfg['kwargs'] = self._policy_class.load(self._ckpt)
File "/home/nsambhu/github/carla-roach/agents/rl_birdview/models/ppo_policy.py", line 226, in load
saved_variables = th.load(path, map_location=device)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 702, in _legacy_load
result = unpickler.load()
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 665, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 737, in restore_location
return default_restore_location(storage, map_location)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 156, in default_restore_location
result = fn(storage, location)
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/serialization.py", line 136, in _cuda_deserialize
return storage_type(obj.size())
File "/home/nsambhu/anaconda3/envs/carla/lib/python3.7/site-packages/torch/cuda/init.py", line 480, in _lazy_new
return super(_CudaBase, cls).new(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
My shell script to call train_rl.py is listed here: https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/run/train_rl_NeilBranch0.sh
I have already reduced the batch size from 256 to 1 and the error persists: https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/config/agent/ppo/training/ppo.yaml
Output from ( https://github.com/neilsambhu/carla-roach/blob/NeilBranch0/train_rl_NeilBranch0.py#L78 ) to show the batch size decreased:
cfg.agent[agent_name] {'entry_point': 'agents.rl_birdview.rl_birdview_agent:RlBirdviewAgent', 'wb_run_path': '', 'wb_ckpt_step': None, 'env_wrapper': {'entry_point': 'agents.rl_birdview.utils.rl_birdview_wrapper:RlBirdviewWrapper', 'kwargs': {'input_states': ['control', 'vel_xy'], 'acc_as_action': True}}, 'policy': {'entry_point': 'agents.rl_birdview.models.ppo_policy:PpoPolicy', 'kwargs': {'policy_head_arch': [256, 256], 'value_head_arch': [256, 256], 'features_extractor_entry_point': 'agents.rl_birdview.models.torch_layers:XtMaCNN', 'features_extractor_kwargs': {'states_neurons': [256, 256]}, 'distribution_entry_point': 'agents.rl_birdview.models.distributions:BetaDistribution', 'distribution_kwargs': {'dist_init': None}}}, 'training': {'entry_point': 'agents.rl_birdview.models.ppo:PPO', 'kwargs': {'learning_rate': 1e-05, 'n_steps_total': 12288,
'batch_size': 1,
'n_epochs': 20, 'gamma': 0.99, 'gae_lambda': 0.9, 'clip_range': 0.2, 'clip_range_vf': None, 'ent_coef': 0.01, 'explore_coef': 0.05, 'vf_coef': 0.5, 'max_grad_norm': 0.5, 'target_kl': 0.01, 'update_adv': False, 'lr_schedule_step': 8}}, 'obs_configs': {'birdview': {'module': 'birdview.chauffeurnet', 'width_in_pixels': 192, 'pixels_ev_to_bottom': 40, 'pixels_per_meter': 5.0, 'history_idx': [-16, -11, -6, -1], 'scale_bbox': True, 'scale_mask_col': 1.0}, 'speed': {'module': 'actor_state.speed'}, 'control': {'module': 'actor_state.control'}, 'velocity': {'module': 'actor_state.velocity'}}}
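For what it's worth, the traceback shows the OOM inside torch.load, i.e. while the checkpoint's CUDA tensors are being restored to the GPU, before the batch size matters at all. A hedged workaround is to deserialize to CPU first (the path below is a placeholder; in this repo it would mean changing the map_location passed in ppo_policy.py's load()):

import torch as th

# Load checkpoint tensors onto the CPU instead of the GPU they were saved
# from; move the policy to the desired device afterwards.
saved_variables = th.load('path/to/ckpt.pth', map_location='cpu')

It is also worth checking with nvidia-smi that no stale CARLA servers are still holding the GPU memory.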
Hello, I'm executing run/train_rl.sh and found that several CARLA windows opened at the same time and then crashed. I learned from your paper that the RL experts are trained in Towns 1-6 simultaneously, so I think my GPU may not meet the code's requirements. Could you please tell me the number and model of the graphics cards you used when training the RL experts?
Hi, thank you for sharing your excellent work! I want to train your RL agent on a machine with six 1080 Ti GPUs (6 CARLA servers on 6 different GPUs) and 56 cores. But it takes around 5-6 minutes per 12288 steps, so the full 10M steps would take around 50 days, which is not acceptable. Do you know what the possible reason is, or how to improve the speed? Thank you!
Hi,
Has the run path been updated?
I am trying to collect your dataset from the run path and it says:
Could not find run <Run zhejun/il_nocrash_ap/2pilkrol (not found)>
Or has it been unhosted?
Dear authors, first of all, thanks for sharing your paper and code! It is excellent.
I am an RL newbie. In carla_gym/core/obs_manager/birdview/chauffeurnet.py, in the function get_observation, I noticed that when rasterizing vehicles' bboxes, the code only rasterizes the ego vehicle's current bbox and specifically excludes the ego vehicle's history.
I am wondering whether you have tried rasterizing the ego vehicle's history bboxes. Is this setting a common practice (from another codebase)? I am curious about the reason. (Causal confusion? Markov property? RL instability?)
Thanks for your time.
Hello
Hi. When I run train_rl.py, I get an error: module 'tensorflow_estimator.python.estimator.api._v1.estimator' has no attribute 'distributions'. I use tensorflow==2.6.0. How can I fix this error?
Hi, thanks for this awesome project!
I wonder if there is a way to visualize the RL training process, i.e., how the ego car is driven in CARLA?
I want to do this to see the improvement of the RL model.
Appreciate any advice.
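Not a feature of the repo as far as I know, but one hedged way to watch training without touching the training code is a separate spectator script attached to the same server. This assumes the ego vehicle carries role_name 'hero'; adjust host/port to the server you want to watch, and note it only shows anything if the server window is visible (not launched off-screen) and no_rendering is off.

import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()
spectator = world.get_spectator()
while True:
    world.wait_for_tick()
    for actor in world.get_actors().filter('vehicle.*'):
        if actor.attributes.get('role_name') == 'hero':
            tf = actor.get_transform()
            tf.location.z += 30.0      # hover above the ego car
            tf.rotation.pitch = -90.0  # look straight down
            spectator.set_transform(tf)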
Hi, guys!
I have 2 GPUs in my computer and I want to train the RL model on the 2nd one. I replaced gpu: [0] with gpu: [1] in endless_all.yaml, and the final shell command of the CARLA startup is CUDA_VISIBLE_DEVICES=1 bash /media/carla/CarlaUE4.sh -fps=10 -quality-level=Low -carla-rpc-port=2000. However, I find that all CARLA servers are still running on GPU 0, as shown in the picture.
I am really confused and don't know how to solve this problem. Thanks for any help!
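A hedged observation rather than a confirmed fix: the logs earlier on this page show two launch conventions, CUDA_VISIBLE_DEVICES=... and SDL_HINT_CUDA_DEVICE=... with SDL_VIDEODRIVER=offscreen. On some driver/UE4 combinations the renderer ignores CUDA_VISIBLE_DEVICES, so setting both hints, sketched below in the style of utils/server_utils.py with paths and flags copied from the commands quoted above, may be needed to actually land the server on GPU 1.

import os
import subprocess

env = dict(os.environ,
           CUDA_VISIBLE_DEVICES='1',
           SDL_HINT_CUDA_DEVICE='1',
           SDL_VIDEODRIVER='offscreen')
subprocess.Popen(['bash', '/media/carla/CarlaUE4.sh', '-fps=10',
                  '-quality-level=Low', '-carla-rpc-port=2000'], env=env)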
Hi,
Currently, I am trying to train IL models from scratch.
I am using a Tesla V100 (32GB) and 8 CPU cores to train models. Lastly, the batch size is 192.
My dataset consists of 183000 data points collected with your dataset collection code.
At present, it takes approximately 3-4 hours to train one epoch.
Was this the case for you? Can you let me know the GPU specs, training properties, and, finally, the epoch durations?
I would really appreciate it if you could help with this issue.
Best.
After I start running train_rl.sh, when the program executes the line self.policy.forward(self._last_obs) in ppo.py, it gets stuck for a while and then raises the above error. @zhejz Do you have any idea?
Thanks for this wonderful work. As the experiments show, the exploration loss greatly improves PPO performance. What is the intuition behind it? And how are the different Beta distributions for different events defined? For example, when running a red light or colliding with other agents, why introduce the distribution Beta(1, 2.5) as p_z? Finally, are there any mathematical modeling works one could refer to for a better understanding?
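For intuition on the Beta(1, 2.5) choice (this is standard Beta-distribution math, not something stated in the thread): the Beta density is

f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \qquad f(x; 1, 2.5) \propto (1-x)^{1.5},

which is monotonically decreasing on [0, 1] with its mode at x = 0. If, as with acc_as_action, the unit interval is affinely mapped to the acceleration range [-1, 1], such a prior concentrates mass on braking, which is a plausible "suggestion" to impose after a red-light or collision event.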
Related issue: #8
From here:
#8 (comment)
I spent some time trying to integrate the scenario_runner into the multi-processing RL training but it didn't work out smoothly.
What I want to try is to use your Roach expert to collect data from .xml routes and .json scenarios, based on the official CARLA leaderboard and scenario code, to see how Roach performs on the leaderboard. Since, as said here:
more naturally than hand-crafted CARLA experts
and based on the official leaderboard and scenarios, I could compare results on the same routes and scenarios rather than on random ones as this repo does.
But when I read the code behind the README's data-collection instructions (https://github.com/zhejz/carla-roach#quick-start-collect-an-expert-dataset-using-roach), it seems the agent file, e.g.:
class RlBirdviewAgent():
is not suitable for running on the leaderboard.
Did anyone try this on the offline official leaderboard with self-defined XML and JSON?
@zhejz How do I view the sensor data collected by the self-driving car models? In the README.md I see training instructions for the agents. I would like to be able to view and modify the self-driving car models (e.g. model layers), and I want to see the input to these networks.
Dear zhejz,
When I run train_rl.sh, the training process hits an EOFError. I have 40 GB of RAM and am running on a 3090 GPU; the error occurs frequently at different epochs, after n_epoch: 0, n_epoch: 8, and n_epoch: 25.
Here is the full error:
Error executing job with overrides: ['agent.ppo.wb_run_path=null', 'wb_project=train_rl_experts', 'wb_name=roach', 'agent/ppo/policy=xtma_beta', 'agent.ppo.training.kwargs.explore_coef=0.05', 'carla_sh_path=/media/carla/AVRL/carla/CarlaUE4.sh']
Traceback (most recent call last):
File "train_rl.py", line 75, in main
agent.learn(env, total_timesteps=int(cfg.total_timesteps), callback=callback, seed=cfg.seed)
File "/media/carla/AVRL/roach/DML_AVRL/agents/rl_birdview/rl_birdview_agent.py", line 109, in learn
model.learn(total_timesteps, callback=callback, seed=seed)
File "/media/carla/AVRL/roach/DML_AVRL/agents/rl_birdview/models/ppo.py", line 249, in learn
callback.on_training_end()
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/callbacks.py", line 95, i
n on_training_end
self._on_training_end()
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/callbacks.py", line 179,
in _on_training_end
callback.on_training_end()
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/callbacks.py", line 95, i
n on_training_end
self._on_training_end()
File "/media/carla/AVRL/roach/DML_AVRL/agents/rl_birdview/utils/wandb_callback.py", line 67, in _on_training_end
avg_ep_stat, ep_events = self.evaluate_policy(self.vec_env, self.model.policy, eval_video_path)
File "/media/carla/AVRL/roach/DML_AVRL/agents/rl_birdview/utils/wandb_callback.py", line 158, in evaluate_policy
obs, reward, done, info = env.step(actions)
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py",
line 161, in step
return self.step_wait()
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.p
y", line 107, in step_wait
results = [remote.recv() for remote in self.remotes]
File "/media/carla/AVRL/roach/env/carla/lib/python3.8/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.p
y", line 107, in
results = [remote.recv() for remote in self.remotes]
File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/usr/lib/python3.8/multiprocessing/connection.py", line 420, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 389, in _recv
raise EOFError
EOFError
Hi,
Very impressive work!
One simple question: when setting the n_episodes argument, do we need to take the number of towns into account? For example, if I want to collect 1 episode per town (Towns 1, 3, 4, 6), should I set n_episodes to 1 or to 4?
And for each episode, is the route (the start point and the end point) always the same?
Cheers,
Yi
In ego_vehicle_handler.py, is "score_route" the "success rate"? I know "score_composed" is the "driving score".
Hi,
I have a question about the TaskVehicle class in carla_gym/core/task_actor/common/task_vehicle.py.
Its __init__ takes 4 arguments: vehicle, target_transforms, spawn_transforms, endless.
My question concerns target_transforms and spawn_transforms.
What data structure are the waypoints expected to be in?
Are the target_transforms supposed to be the route waypoints from the route.xml files?
Are the spawn_transforms meant to be the initial vehicle transform?
What coordinate system are they expected to use, the same one as the .xml files?
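My reading, offered only as a guess to be checked against the repo: both arguments are built from carla.Transform objects in world coordinates (the same frame as the route .xml files), with target_transforms being the route waypoints to reach and spawn_transforms the candidate initial poses. For illustration (all values are placeholders):

import carla

# Hypothetical waypoints of the type TaskVehicle presumably consumes.
spawn_tf = carla.Transform(carla.Location(x=10.0, y=5.0, z=0.3),
                           carla.Rotation(pitch=0.0, yaw=90.0, roll=0.0))
target_tf = carla.Transform(carla.Location(x=120.0, y=5.0, z=0.0))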
Hi,
As far as I understand, it is enough to change the observation configs to add new fields to the dataset. This works, for example, when I change the observation manager of an individual sensor (i.e., the ObsManager class of GNSS, adding the sensor noise value to the observation dictionary for the sake of example). However, when I run the default data collection code (i.e., data_collect_bc.sh), it does not add the navigation.waypoint_plan and birdview.chauffeurnet observation dictionaries to the dataset, which exist in agent/cilrs/obs_configs=central_rgb_wide.
Probably I am missing something here; that is, there are other things that also need to be set. At the least, I would like to access the navigation.waypoint_plan observation dictionary during data acquisition. So, how can I add the observation dictionaries of other ObsManager modules to the dataset? I would really appreciate your help.
Best.
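In case it helps, my hedged understanding is that the data collector only saves what is listed in the active agent's obs_configs, so the extra managers have to appear there. A Python-dict view of the intended shape, with the birdview kwargs copied from the config dump earlier on this page and the waypoint_plan kwargs as unverified placeholders:

obs_configs = {
    # ... keep the existing camera entries from central_rgb_wide here ...
    'waypoint_plan': {'module': 'navigation.waypoint_plan', 'steps': 20},
    'birdview': {'module': 'birdview.chauffeurnet', 'width_in_pixels': 192,
                 'pixels_ev_to_bottom': 40, 'pixels_per_meter': 5.0,
                 'history_idx': [-16, -11, -6, -1], 'scale_bbox': True,
                 'scale_mask_col': 1.0},
}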
Hi, I'm running run/train_rl.sh and keep receiving this error:
[2022-05-15 08:09:58,133][utils.server_utils][INFO] - Kill Carla Servers!
CarlaUE4-Linux: no process found
[2022-05-15 08:09:59,167][utils.server_utils][INFO] - Kill Carla Servers!
[2022-05-15 08:09:59,168][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/thoaican/carla/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2000
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Traceback (most recent call last):
File "train_rl.py", line 40, in main
agent = AgentClass('config_agent.yaml')
File "/home/thoaican/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 15, in __init__
self.setup(path_to_conf_file)
File "/home/thoaican/carla-roach/agents/rl_birdview/rl_birdview_agent.py", line 27, in setup
f = max(all_ckpts, key=lambda x: int(x.name.split('_')[1].split('.')[0]))
ValueError: max() arg is an empty sequence
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2022-05-15 08:10:05,478][wandb.sdk.internal.internal][INFO] - Internal process exited
I've browsed the issues page and found the same error reported by someone else here; the suggested solution was to delete outputs/checkpoint.txt, but that didn't help in my case.
Thank you for your excellent work. When I run train_rl.py, I set no-rendering to false, but I can only see the running status of the Carla client and cannot see the BEV image. Is there any method to observe the BEV image during reinforcement learning training?
" PYTHON_RETURN=1!!! Start Over!!! "
hello,how can I solve this problem?
Hi,
I'm trying to run the carla-roach benchmark with CARLA 0.9.13. I am currently running into some segmentation faults when benchmarking the CARLA roaming agent. When using a max_step of 1 in the run_single function (benchmark.py), I only get a segmentation fault in the benchmarks 'WetNoon_03' and 'SoftRainSunset_03'. However, when increasing max_step to 5, it also occurs in many of the other benchmarks. The segmentation faults always occur while creating the zombie walkers in the following loop (zombie_walker_handler.py):
for w_id, c_id in zip(walker_ids, controller_ids):
self.zombie_walkers[w_id] = ZombieWalker(w_id, c_id, self._world)
return self.zombie_walkers
It completes some iterations, but after a while it gives the segmentation fault (usually around iteration 200-250). This code runs during the _zw_handler reset in carla_multi_agent_env.py. Does anyone know what could be the cause of these segmentation faults?
NOTE: I am using vehicle.audi.a2 instead of vehicle.lincoln.mkz2017, because the lincoln doesn't seem to be recognized in carla 0.9.13.
Hello, when I execute run/train_rl.sh, I encounter the following problem:
Traceback (most recent call last):
File "train_rl.py", line 62, in main
env = DummyVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "/home/whm/anaconda3/envs/roach/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 23, in __init__
self.envs = [fn() for fn in env_fns]
File "/home/whm/anaconda3/envs/roach/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 23, in <listcomp>
self.envs = [fn() for fn in env_fns]
File "train_rl.py", line 62, in <lambda>
env = DummyVecEnv([lambda config=config: env_maker(config) for config in server_manager.env_configs])
File "train_rl.py", line 58, in env_maker
env = EnvWrapper(env, **wrapper_kargs)
File "/home/whm/roach/agents/rl_birdview/utils/rl_birdview_wrapper.py", line 25, in __init__
assert len(env._obs_configs) == 1
File "/home/whm/anaconda3/envs/roach/lib/python3.8/site-packages/gym/core.py", line 228, in __getattr__
raise AttributeError(f"attempted to get missing private attribute '{name}'")
AttributeError: attempted to get missing private attribute '_obs_configs'
What's the issue?
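A hedged guess based on the AttributeError text: newer gym releases refuse to forward attribute access for names starting with an underscore, so the wrapper's assertion has to reach the base env explicitly, e.g. in rl_birdview_wrapper.py:

# Access the private attribute on the unwrapped base env instead of
# going through the wrapper's __getattr__ forwarding.
assert len(env.unwrapped._obs_configs) == 1

Alternatively, installing the gym version pinned in the repo's environment file sidesteps the behaviour change.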
As you said, I registered and logged in to wandb, but the following error occurred at runtime:
CarlaUE4-Linux: no process found
[2021-12-23 11:21:27,462][utils.server_utils][INFO] - Kill Carla Servers!
CarlaUE4-Linux: no process found
[2021-12-23 11:21:28,486][utils.server_utils][INFO] - Kill Carla Servers!
[2021-12-23 11:21:28,486][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=0 bash /home/jjuv/carla/CARLA_0.9.10.1/CarlaUE4.sh -fps=10 -quality-level=Epic -carla-rpc-port=2000
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
[2021-12-23 11:21:33,555][main][INFO] - making port 2000
/home/jjuv/anaconda3/envs/roach/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
wandb: Currently logged in as: yqlol (use `wandb login --relogin` to force relogin)
wandb: Tracking run with wandb version 0.10.12
wandb: Syncing run roach
wandb: ⭐️ View project at https://wandb.ai/yqlol/train_rl_experts
wandb: 🚀 View run at https://wandb.ai/yqlol/train_rl_experts/runs/37rjz3na
wandb: Run data is saved locally in /home/jjuv/carla-roach-main/outputs/2021-12-23/11-21-26/wandb/run-20211223_112136-37rjz3na
wandb: Run `wandb offline` to turn off syncing.
wandb: WARNING Symlinked 3 files into the W&B run directory, call wandb.save again to sync new files.
trainable parameters: 1.53M
It is stuck at 'trainable parameters: 1.53M'. What's going on?
Originally posted by @Yiquan-lol in #9 (comment)
Thanks for sharing your excellent work!
I've trained the RL model twice. It learns well at first; however, after 7M steps the agent tends to get stuck at traffic lights and won't start again when the light turns green. The agent acts very conservatively, driving at a low speed or moving forward only a little after a long time.
The checkpoint I got after 10M steps can't even complete a single route due to this problem. I didn't modify the reward code and tried to use the same training parameters as in the paper, with batch_size=256, n_steps_total=12288, and 6 towns at the same time. Below is a screenshot of the problem during training (I used the -quality-level=Low option when starting CARLA to monitor the training process); the green car is the agent.
I find that the total loss begins to grow after 7M steps.
Thanks for any help or suggestion!
I am interested in multi-agent research. I am wondering how to go about configuring all vehicles with a tri-camera setup and collecting data from all vehicles simultaneously. Rather than having one ego vehicle, I would like to make every vehicle an "ego" vehicle from which I can collect data.
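Not something the repo supports out of the box as far as I can tell, but at the CARLA API level a hedged sketch would attach a rig to every vehicle in the world; the camera pose and the yaw angles below are made-up placeholders for a left/center/right tri-camera setup.

import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()
cam_bp = world.get_blueprint_library().find('sensor.camera.rgb')
cameras = []
for vehicle in world.get_actors().filter('vehicle.*'):
    for yaw in (-55.0, 0.0, 55.0):  # left / center / right camera of the rig
        tf = carla.Transform(carla.Location(x=1.3, z=2.3), carla.Rotation(yaw=yaw))
        cam = world.spawn_actor(cam_bp, tf, attach_to=vehicle)
        # Bind vehicle id and yaw now so each callback writes its own files.
        cam.listen(lambda image, vid=vehicle.id, y=int(yaw): image.save_to_disk(
            'out/%d_%d_%06d.png' % (vid, y, image.frame)))
        cameras.append(cam)

Note that saving every frame from three cameras on every vehicle will saturate disk I/O and slow the simulation quickly; in practice one would subsample frames or buffer them in queues.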