
resco's Introduction

RESCO


Source code implementing the Reinforcement Learning Benchmarks for Traffic Signal Control (RESCO).

The benchmark uses the Simulation of Urban MObility (SUMO), which must be installed separately. The SUMO_HOME environment variable must be set; this is done automatically by the SUMO installer on Windows and Ubuntu. SUMO 1.9.0 and 1.9.1 have been tested.

On Ubuntu, simulation speed may be greatly increased by using libsumo. Set the environment variable LIBSUMO_AS_TRACI to any value and pass --libsumo True to main.py. Note that libsumo cannot be used with multi-threading.
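
For illustration, a minimal sketch of the environment checks described above, assuming a standard SUMO install layout; this is not RESCO's actual startup code:

    # Minimal sketch of the environment checks described above; not RESCO's code.
    import os
    import sys

    if 'SUMO_HOME' not in os.environ:
        sys.exit("Set the SUMO_HOME environment variable to your SUMO installation directory.")

    # SUMO's Python tools (traci, sumolib) live under $SUMO_HOME/tools.
    sys.path.append(os.path.join(os.environ['SUMO_HOME'], 'tools'))

    if os.environ.get('LIBSUMO_AS_TRACI'):
        import libsumo as traci  # much faster, but cannot be used with multi-threading
    else:
        import traci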

Python 3.7.4 is required for TensorFlow, which is used by the MA2C and FMA2C implementations.

agent_config defines parameters for the available agents. An agent is selected with the --agent argument to main.py.

map_config specifies the SUMO scenario parameters, road network, and demand files.

mdp_config supplies constants to the state and reward functions (e.g. for normalization).

signal_config defines each signal of every SUMO scenario. Valid green phases are determined from the road network's TLSLogic; yellow phases are inserted as required. phase_pairs gives the directional indices of phase combinations, following the order defined in TLSLogic. valid_acts provides a translation table for shared controllers whose action definitions vary across signals. For each signal, inbound lanes are given by the direction of traffic. Finally, each signal defines which signals are downstream of it for the purposes of coordination (neighbors, pressure, etc.).
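
To make the structure concrete, here is a hypothetical, heavily simplified sketch of what a single entry might look like. The field names follow the terms used above, but the exact schema, ordering, and values should be taken from the real signal_config in this repository, not from this illustration:

    # Hypothetical illustration only; consult the repository's signal_config for the real schema.
    example_signal_config = {
        'my_scenario': {
            'phase_pairs': [[0, 4], [2, 6], [1, 5], [3, 7]],  # movement-index pairs, ordered as in TLSLogic
            'valid_acts': None,  # translation table for shared controllers (illustrative placeholder)
            'my_junction_id': {
                'lane_sets': {  # inbound lanes keyed by movement direction (e.g. 'S-W')
                    'S-N': ['edgeA_0'],
                    'S-W': ['edgeA_1'],
                    'N-S': ['edgeB_0'],
                    'N-E': ['edgeB_1'],
                },
                'downstream': {  # neighboring signals used for coordination/pressure; None if none
                    'N': 'other_junction_id',
                    'E': None,
                    'S': None,
                    'W': None,
                },
            },
        },
    }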

An example command to train IDQN on the Ingolstadt region scenario is:

python main.py --agent IDQN --map ingolstadt21

SUMO scenarios are supplied in the environments directory. All scenarios are distributed under their original licenses. Information on the Cologne scenario can be found at https://sumo.dlr.de/docs/Data/Scenarios/TAPASCologne.html. Information on the Ingolstadt scenarios can be found at https://github.com/silaslobo/InTAS. For more scenarios, see https://sumo.dlr.de/docs/Data/Scenarios.html.

Benchmark performance for the baselines (Fixed Time, Greedy, Max Pressure) and the learning algorithms (IDQN, IPPO, MPLight, Extended MPLight (MPLight*), FMA2C) is shown in the results figure below.

[benchmark results figure]

Citing RESCO

This project was used in Reinforcement Learning Benchmarks for Traffic Signal Control. If you use RESCO in your work, please include a citation:

@inproceedings{ault2021reinforcement,
  title={Reinforcement Learning Benchmarks for Traffic Signal Control},
  author={James Ault and Guni Sharon},
  booktitle={Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021) Datasets and Benchmarks Track},
  month={December},
  year={2021}
}

EPyMARL

RESCO has been updated to be compatible with the EPyMARL benchmark for cooperative RL algorithms. Some modifications within EPyMARL are currently required; a modified repository is available here. Clone the modified repository and run EPyMARL algorithms against the RESCO benchmark using EPyMARL's main.py:

python main.py --config=qmix --env-config=gymma with env_args.time_limit=1 env_args.key=resco_benchmark:cologne3-qmix-v1


resco's Issues

Signal Configs

Hello everyone, I've noticed that creating signal configurations has been a bit tricky for me. I've tried to figure it out, even by looking at various closed issues and the documentation in the repo, but I'm still struggling.

I can't quite grasp the concept of phase_pairs. For instance, when I look at ingolstadt1.xml and the pair N-N, I can't find that specific pair in there. I'm also unsure why we use pairs and how we decide how many of them to use. To be honest, I'm pretty confused about these signal configurations.

I'd really appreciate it if someone could explain how this all works. If there are any images or visual aids available, that would be great, but even a simple explanation would help. Thanks a lot for your help and patience.

UPDATE:

I believe I've grasped the meaning behind 'N-N' now. I followed the example of cologne1 and examined the lane_sets. Now let's take the example of S-W: essentially, it means that the traffic is coming from the north but moving in the directions of south and west. Similarly, N-N implies that traffic is coming from the south and heading north. Have I understood this correctly?

If I understand correctly, phase_pairs represent the number of traffic light phases we have, right? Does this mean that we can only activate two lanes per phase? Is that the idea?

"Spikes" during training MPLight and DQN

Hi there, I have observed regular "spikes" when plotting the training curve (wait time) of MPLight and DQN every 10 episodes. I think this pattern is also observable in some scenarios in your paper. I just want to check whether you have any clues as to whether this is inherent to the algorithm/implementation, or whether it can be diminished by changing some parameters or the workflow. Thanks a lot!

[training curve plot]

TODO grab info. directly from tllogic python interface

Hi! Thanks for sharing your code, it's very nice. I found a comment in multi_signal.py that says "TODO grab info. directly from tllogic python interface". I figured it out and tested it on my project, and it works! Here is the code. I hope it is helpful :)

    # Collect, for each signal, the distinct phase state strings defined in its
    # TLS program logics, using the TraCI trafficlight API directly.
    for lightID in self.signal_ids:
        for logic in self.sumo.trafficlight.getAllProgramLogics(lightID):
            for phase in logic.getPhases():
                valid_phases.setdefault(lightID, [])
                if phase.state not in valid_phases[lightID]:
                    valid_phases[lightID].append(phase.state)
    self.step_sim()

Delay calculation

I want to benchmark in a paper against your models.

How do you calculate the delays that you plot in your README? Is it just the sum of all waiting times across each lane for the whole simulation, divided by the number of cars at the end? Where can I find that in the code?
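
For reference, a hedged sketch of one common way to approximate average delay from the tripinfo.xml that SUMO writes, using the standard per-vehicle timeLoss attribute. Whether this matches the exact metric plotted in the README is an assumption; the authoritative calculation is in the RESCO code itself:

    # Hedged sketch: average per-vehicle delay from a SUMO tripinfo file, using the
    # standard 'timeLoss' attribute. Not necessarily identical to RESCO's plotted metric.
    import xml.etree.ElementTree as ET

    def average_delay(tripinfo_path):
        root = ET.parse(tripinfo_path).getroot()
        losses = [float(trip.get('timeLoss')) for trip in root.iter('tripinfo')]
        return sum(losses) / len(losses) if losses else 0.0

    print(average_delay('tripinfo.xml'))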

code for processing new traffic network files in signal_config

Hi :)
Thank you for sharing the code. I'm wondering if it would be possible to get the part of the code that you use to generate the signal_config file, since I have a new network and I'd like to test it using the benchmark. Looking forward to your reply! Thanks!

'STOCHASTICAgent' object has no attribute 'save'

Hi,

first of all - thanks a lot for this code!

Both my Python skills and my reinforcement learning skills are still under development. However, when I run a standard scenario from main.py with the stochastic agent (in my case the Ingolstadt 7 scenario), I get the following AttributeError:

'STOCHASTICAgent' object has no attribute 'save'

Indeed, I did not find such a method (neither in agents/stochastic.py nor in agents/agent.py).

I know I could add such a method, but any help on how to actually implement the save would be very much appreciated.

Thank you!
Gabriel
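
A hypothetical sketch of such a no-op save(), for illustration only; the real class in agents/stochastic.py and the signature expected by the caller may differ:

    # Hypothetical sketch, not the repository's class: a minimal random-action agent
    # with a no-op save(), illustrating the missing method.
    import random

    class StochasticAgent:
        def __init__(self, num_actions):
            self.num_actions = num_actions

        def act(self, observation):
            # Ignore the observation and pick a uniformly random action.
            return random.randrange(self.num_actions)

        def save(self, path):
            # A random policy has no learned parameters, so there is nothing to persist.
            pass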

Convert to Traci Subscriptions?

Any interest in switching the library to use traci/libsumo subscriptions? From experience it speeds up the SUMO simulation code dramatically. If so, I am happy to help!
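
For context, a hedged sketch of what the subscription pattern looks like with plain TraCI, using standard TraCI calls; how it would be wired into RESCO's Signal/MultiSignal classes is left open:

    # Hedged sketch of the TraCI subscription pattern; integration with RESCO's classes is not shown.
    import traci
    import traci.constants as tc

    traci.start(['sumo', '-c', 'scenario.sumocfg'])  # illustrative config path

    for lane_id in traci.lane.getIDList():
        # Ask SUMO to push these per-lane values every step instead of polling them.
        traci.lane.subscribe(lane_id, [tc.LAST_STEP_VEHICLE_NUMBER,
                                       tc.LAST_STEP_VEHICLE_ID_LIST])

    traci.simulationStep()
    for lane_id in traci.lane.getIDList():
        results = traci.lane.getSubscriptionResults(lane_id)  # dict keyed by variable ID
        vehicle_count = results[tc.LAST_STEP_VEHICLE_NUMBER]

    traci.close()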

signal_config

Hi, I am trying to use a custom network, but I do not understand what signal_config is doing. I have read the previously opened issues about it as well; it seems to be a common issue in this repository. Could you provide an explanation? Thanks!

key error when running python main.py --agent IDQN --map ingolstadt21

I always get a KeyError like the following:

    File "/resco_benchmark/agents/agent.py", line 35, in observe
      if info['eps'] % self.config['save_freq'] == 0:
    KeyError: 'save_freq'

    File "/resco_benchmark/agents/pfrl_dqn.py", line 43, in observe
      if self.config['load']:
    KeyError: 'load'

How can I fix this kind of problem?
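
One possible workaround, as a hedged sketch: make sure the agent configuration dictionary carries the keys the tracebacks complain about before the agent is constructed. The key names come from the tracebacks above; the values and where to set them are assumptions:

    # Hedged sketch: supply defaults for the missing config keys. Values are illustrative,
    # not RESCO's defaults; 'agt_config' refers to the agent config dict used in main.py.
    agt_config.setdefault('save_freq', 100)   # e.g. checkpoint every 100 episodes
    agt_config.setdefault('load', False)      # start training from scratch instead of loading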

EPyMARL_RESCO errors

When I run the IPPO, MAA2C, and IA2C algorithms, the following error occurs:

--config=ippo --env-config=gymma with env_args.time_limit=1 env_args.key=resco_benchmark:cologne8-ippo-v1
Traceback (most recent call last):
  File "/home/mostafa/pyenv/epymarl_resco/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
    return self.run(
  File "/home/mostafa/pyenv/epymarl_resco/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
    run()
  File "/home/mostafa/pyenv/epymarl_resco/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "/home/mostafa/pyenv/epymarl_resco/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "/home/mostafa/epymarl_resco/src/main.py", line 36, in my_main
    run(_run, config, _log)
  File "/home/mostafa/epymarl_resco/src/run.py", line 55, in run
    run_sequential(args=args, logger=logger)
  File "/home/mostafa/epymarl_resco/src/run.py", line 185, in run_sequential
    episode_batch = runner.run(test_mode=False)
  File "/home/mostafa/epymarl_resco/src/runners/nstep_runner.py", line 61, in run
    self.batch.update(pre_transition_data, ts=self.t)
  File "/home/mostafa/epymarl_resco/src/components/episode_buffer.py", line 105, in update
    target[k][_slices] = v.view_as(target[k][_slices])
RuntimeError: shape '[1, 0, 240]' is invalid for input of size 240

How do I set up testing for the model?

For example, using the IDQN model, main.py contains only the training process. So how do I set up testing for the model? Specifically, how should I load the saved model and avoid continuing to optimize the model parameters during testing? Can you give some suggestions? Thanks.
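
A hedged sketch of how evaluation might look, based on pfrl's documented API (RESCO's IDQN builds on pfrl, which provides load(), eval_mode(), act(), and observe()). How this maps onto RESCO's own wrapper classes and environment objects is an assumption; 'pfrl_agent' and 'env' below are placeholders:

    # Hedged sketch using pfrl's documented API; 'pfrl_agent' and 'env' are placeholders
    # for whatever objects main.py constructs, not RESCO's actual attribute names.
    pfrl_agent.load('path/to/saved/agent_dir')   # directory written by agent.save()

    with pfrl_agent.eval_mode():                 # disables exploration and learning updates
        obs = env.reset()
        done = False
        while not done:
            action = pfrl_agent.act(obs)
            obs, reward, done, info = env.step(action)
            pfrl_agent.observe(obs, reward, done, reset=False)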

Cumulative rewards and Delay time after training

Thank you for sharing your code. Your code is very helpful but there are some things I don't understand.

After training a model with "python main.py --agent IDQN --map ingolstadt21", I found that it generates several files in the logs directory, including agent.pt, metric.csv and tripinfo.xml.
But I don't see the cumulative reward per episode, which I want in order to know when the model converges. I also don't see where the model is saved; how can I test the model after training (trained_model.h5)?
Thank you very much

traci.start() label error

First of all, thank you so much for your project code.
When I run the code with IDQN and ingolstadt1, I get the following error:

[error screenshot]

I know it relates to the label parameter of SUMO's traci.start(), and I have not modified any code. I checked the code and could not find any problem, so could you tell me how to solve it?
My configuration matches the requirements in setup.py, but my SUMO version is 1.12.0. Could this be related to the SUMO version?

Indiscriminate Yellow Steps

I am walking through the implementation, and it seems that MultiSignal.step enforces a peculiarity: if the actor resolution is less than the yellow time of the signals, then all actors are unable to make observations and act while any one actor is in its transition mode.

Could we move the yellow state internal to the traffic signal itself and simply prevent state changes until after the transition to green is made? It would also be nice to enforce some minimum green time.

Another option is to let SUMO handle the transition timing by using detector overrides, functionality that they have added. This would be a big change to the project structure, but it would ultimately be more realistic with respect to traffic signals' internal behavior.

    for step in range(self.yellow_length):
        self.step_sim()
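
To illustrate the per-signal approach suggested above, here is a hypothetical sketch (not RESCO's code; names and structure are illustrative) of keeping the yellow transition and a minimum green time inside the signal itself:

    # Hypothetical sketch: track the yellow transition and a minimum green time per signal,
    # so other actors are never blocked while one signal is in transition.
    class PhaseGuard:
        def __init__(self, yellow_length, min_green):
            self.yellow_length = yellow_length
            self.min_green = min_green
            self.time_in_phase = 0
            self.in_yellow = False

        def tick(self):
            # Called once per simulation step for this signal only.
            self.time_in_phase += 1
            if self.in_yellow and self.time_in_phase >= self.yellow_length:
                self.in_yellow = False          # transition to green is complete
                self.time_in_phase = 0

        def can_switch(self):
            # Disallow a new phase request during yellow or before min green has elapsed.
            return not self.in_yellow and self.time_in_phase >= self.min_green

        def request_switch(self):
            if self.can_switch():
                self.in_yellow = True           # start the yellow transition
                self.time_in_phase = 0
                return True
            return False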

The reward of FMA2C

In FMA2C, the reward for a worker consists of two parts. The first part is the reward that includes the delay and the number of queued vehicles, and the second part is the cosine function between the actions and states of the manager. The second part of the reward appears to be missing in the code (reward.py).

Question about the calculation of 'arrivals' and 'departures' in full_observation

Hi, thank you for sharing the code. I have a question about the calculation of 'arrivals' and 'departures' in traffic_signal.Signal.observe. I think 'arrivals' means the new vehicles in the lane, so why is 'arrivals' calculated as self.last_step_vehicles.difference(all_vehicles) rather than all_vehicles.difference(self.last_step_vehicles)?
Thanks!

    else:
        full_observation['arrivals'] = self.last_step_vehicles.difference(all_vehicles)
        departs = all_vehicles.difference(self.last_step_vehicles)
        full_observation['departures'] = departs
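
For reference, a hedged sketch of the swap this question implies; whether this matches the authors' intended semantics is an assumption:

    # Hedged sketch of the suggested swap: with these definitions, 'arrivals' are vehicles
    # present now but not in the previous step, and 'departures' are vehicles that were
    # present in the previous step and have since left the signal's lanes.
    full_observation['arrivals'] = all_vehicles.difference(self.last_step_vehicles)
    full_observation['departures'] = self.last_step_vehicles.difference(all_vehicles)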

KeyError FMA2C 'top_mgr'

Hi, thanks for the code you shared.

I tried to run the FMA2C agent with the available environment, and it gives me an error about 'top_mgr', as shown below:

Traceback (most recent call last):
  File "C:/Users/hp/Downloads/RESCO-V1/RESCO-main/resco_benchmark/main.py", line 112, in <module>
    main()
  File "C:/Users/hp/Downloads/RESCO-V1/RESCO-main/resco_benchmark/main.py", line 36, in main
    run_trial(args, args.tr)
  File "C:/Users/hp/Downloads/RESCO-V1/RESCO-main/resco_benchmark/main.py", line 99, in run_trial
    agent = alg(agt_config, obs_act, args.map, trial)
  File "C:\Users\hp\Downloads\RESCO-V1\RESCO-main\resco_benchmark\agents\fma2c.py", line 45, in __init__
    self.managers[manager] = MA2CAgent(config, obs_act[manager][0], mgr_act_size, mgr_fingerprint_size, 0,
KeyError: 'top_mgr'

I tried to fix it by changing obs_act in fma2c.py at line 48 to:

self.managers[manager] = MA2CAgent(config, obs_act[worker_ids[0]][0], mgr_act_size, mgr_fingerprint_size, 0,
                                                   manager + str(thread_number), self.sess)

but it gives me another error like this:

Traceback (most recent call last):
  File "C:/Users/hp/Downloads/RESCO-V1/RESCO-main/resco_benchmark/main.py", line 112, in <module>
    main()
  File "C:/Users/hp/Downloads/RESCO-V1/RESCO-main/resco_benchmark/main.py", line 36, in main
    run_trial(args, args.tr)
  File "C:/Users/hp/Downloads/RESCO-V1/RESCO-main/resco_benchmark/main.py", line 105, in run_trial
    act = agent.act(obs)
  File "C:\Users\hp\Downloads\RESCO-V1\RESCO-main\resco_benchmark\agents\fma2c.py", line 115, in act
    acts[agent_id] = self.managers[agent_id].act(combine)
  File "C:\Users\hp\Downloads\RESCO-V1\RESCO-main\resco_benchmark\agents\ma2c.py", line 127, in act
{'coef': 0.4, 'coop_gamma': 0.9, 'clip_wave': 4.0, 'clip_wait': 4.0, 'norm_wave': 5.0, 'norm_wait': 100.0, 'alpha': 0.75, 'management': {'top_mgr': ['360082', '360086'], 'bot_mgr': ['GS_cluster_2415878664_254486231_359566_359576']}, 'management_neighbors': {'top_mgr': ['bot_mgr'], 'bot_mgr': ['top_mgr']}, 'supervisors': {'360082': 'top_mgr', '360086': 'top_mgr', 'GS_cluster_2415878664_254486231_359566_359576': 'bot_mgr'}, 'cologne3': {...}}
    policy, self.value = self.model.forward(observation, False)
  File "C:\Users\hp\Downloads\RESCO-V1\RESCO-main\resco_benchmark\agents\ma2c.py", line 215, in forward
    return self.policy.forward(self.sess, obs, done, out_type)
  File "C:\Users\hp\Downloads\RESCO-V1\RESCO-main\resco_benchmark\agents\ma2c.py", line 377, in forward
    self.states: self.states_fw})
  File "C:\Users\hp\Downloads\RESCO-main\RESCO-main\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\Users\hp\Downloads\RESCO-main\RESCO-main\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 19) for Tensor 'Placeholder:0', which has shape '(1, 20)'
Error: tcpip::Socket::recvAndCheck @ recv: peer shutdown
Quitting (on error).

Thank You

Result interpretation

A metrics_i.csv file is generated for every $i^{th}$ route while training any model. Each observation record (taken every $10^{th}$ second) has three dictionaries containing numerical values against each key (the keys being junction IDs).

How do I interpret those results? Specifically, what do these three dictionaries represent?
