cts198859 / deeprl_network

multi-agent deep reinforcement learning for networked system control.

Python 66.53% Jupyter Notebook 33.34% Shell 0.13%

deeprl_network's Issues

How is the traffic distribution graph drawn?

Sorry to bother you. Could you tell me how to draw this graph? I haven't found the source code for it in the repository. Thank you very much.

Can this framework be run on a custom environment? How?
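
Judging from the tracebacks elsewhere on this page, the trainer calls `env.step(action)` and unpacks four values: `next_ob, reward, done, global_reward`. A minimal sketch of a custom environment that follows that shape is given below. The class name `ToyLineEnv`, the `reset` method, the attribute names, and the dynamics are all hypothetical; the real trainer very likely expects further attributes, so compare against the existing environment classes under `envs/` before relying on this.

```python
import random


class ToyLineEnv:
    """Hypothetical multi-agent environment: agents on a line, one scalar state each."""

    def __init__(self, n_agent=3, episode_len=5, seed=0):
        self.n_agent = n_agent
        self.episode_len = episode_len
        self.rng = random.Random(seed)
        # neighbor_mask[i][j] == 1 when agent j is adjacent to agent i on the line.
        self.neighbor_mask = [
            [1 if abs(i - j) == 1 else 0 for j in range(n_agent)]
            for i in range(n_agent)
        ]
        self.t = 0
        self.state = [0.0] * n_agent

    def reset(self):
        self.t = 0
        self.state = [self.rng.uniform(-1.0, 1.0) for _ in range(self.n_agent)]
        return [[s] for s in self.state]  # one observation vector per agent

    def step(self, action):
        # action: one discrete action per agent; 1 pushes the state up, 0 pushes it down.
        assert len(action) == self.n_agent
        self.state = [s + (0.1 if a == 1 else -0.1) for s, a in zip(self.state, action)]
        self.t += 1
        reward = [-abs(s) for s in self.state]  # per-agent reward
        global_reward = sum(reward)             # sum over all agents
        done = self.t >= self.episode_len
        next_ob = [[s] for s in self.state]
        return next_ob, reward, done, global_reward


env = ToyLineEnv()
ob = env.reset()
done = False
steps = 0
while not done:
    ob, reward, done, global_reward = env.step([1] * env.n_agent)
    steps += 1
```

The loop at the end mirrors how the trainer appears to consume the environment: reset once, then step until `done` is true.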

Size mismatch when running the environment

When I run the given models in ATSC Monaco (without changing any code), the following error appears:
'RuntimeError: size mismatch, m1: [1 x 56], m2: [48 x 64] at /opt/conda/conda-bld/pytorch_1579022071601/work/aten/src/TH/generic/THTensorMath.cpp:136'

The complete error is:
'
Traceback (most recent call last):
File "main.py", line 161, in <module>
train(args)
File "main.py", line 104, in train
trainer.run()
File "/home/ziqi/deeprl_network/utils.py", line 218, in run
ob, done, R = self.explore(ob, done)
File "/home/ziqi/deeprl_network/utils.py", line 151, in explore
policy, action = self._get_policy(ob, done)
File "/home/ziqi/deeprl_network/utils.py", line 115, in _get_policy
policy = self.model.forward(ob, done, self.ps)
File "/home/ziqi/deeprl_network/agents/models.py", line 271, in forward
actions, out_type)
File "/home/ziqi/deeprl_network/agents/policies.py", line 221, in forward
h, new_states = self._run_comm_layers(ob, done, fp, self.states_fw)
File "/home/ziqi/deeprl_network/agents/policies.py", line 341, in _run_comm_layers
s_i = self._get_comm_s(i, n_n, x, h, p)
File "/home/ziqi/deeprl_network/agents/policies.py", line 486, in _get_comm_s
return F.relu(self.fc_x_layers[i](torch.cat([x[i].unsqueeze(0), nx_i], dim=1))) +
File "/root/anaconda3/envs/py35pt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/py35pt/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/root/anaconda3/envs/py35pt/lib/python3.5/site-packages/torch/nn/functional.py", line 1370, in linear
ret = torch.addmm(bias, input, weight.t())
'
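
In this error, `m1: [1 x 56]` is the input and `m2: [48 x 64]` is the transposed weight (see the `torch.addmm(bias, input, weight.t())` frame), so the linear layer was built for 48 input features but received 56. In `_get_comm_s` the input is the agent's own observation concatenated with `nx_i`, so one plausible cause is that the observation sizes assumed when the layer was constructed differ from those the environment actually returns. The sketch below is only a way to surface such a mismatch with a readable message; the function names and the sizes 16, 16, 24 are hypothetical and chosen just to reproduce the 48 versus 56 pattern.

```python
def concat_dim(own_dim, neighbor_dims):
    """Size of the vector obtained by concatenating own and neighbor observations."""
    return own_dim + sum(neighbor_dims)


def check_layer_input(layer_in_dim, own_dim, neighbor_dims):
    """Raise a readable error before the matrix multiply would fail."""
    actual = concat_dim(own_dim, neighbor_dims)
    if actual != layer_in_dim:
        raise ValueError(
            "layer expects %d input features but the concatenated "
            "observation has %d" % (layer_in_dim, actual)
        )
    return actual


# Illustrative numbers only: a layer built for 48 features ...
layer_in_dim = 48
# ... fed an own observation of size 16 plus neighbors of size 16 and 24 (total 56).
try:
    check_layer_input(layer_in_dim, 16, [16, 24])
    message = ""
except ValueError as exc:
    message = str(exc)
```

Printing the per-agent observation sizes next to the sizes used to build `fc_x_layers` should show which agent disagrees.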

The PyTorch implementation cannot use the GPU to train

I want to train on a GPU, but the code trains only on the CPU by default, so I changed the parameter to use_gpu=True as follows.

But then I got this error:

How can I fix this bug? Thank you very much.

Delay time and queue length

Thank you for sharing your code.

However, I did not see the code that produces the queue length and delay time shown in your paper.

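Concrete example of running the demo

I ran the code following the steps, but the very first command already fails; it keeps saying that arguments are missing.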

Continuous action spaces

Thanks for your great contribution. Can all the algorithms be used only in discrete action spaces, and not in continuous ones? If I want to use these algorithms in a continuous action space, how should I modify them?
Thanks!
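
A common way to move an actor-critic method from discrete to continuous actions is to replace the categorical policy head with a Gaussian one: the network outputs a mean and a log standard deviation, the action is sampled from that distribution, and its log-density takes the place of the log-probability in the policy loss. The sketch below shows only that arithmetic for a scalar action. It is not taken from this repository, and every name in it is hypothetical.

```python
import math
import random


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a scalar action under N(mean, exp(log_std)^2)."""
    std = math.exp(log_std)
    z = (action - mean) / std
    return -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)


def sample_action(mean, log_std, rng):
    """Draw a continuous action from the Gaussian policy."""
    return rng.gauss(mean, math.exp(log_std))


def policy_loss(action, mean, log_std, advantage):
    """A2C-style policy loss for one sample: -log pi(a|s) * advantage."""
    return -gaussian_log_prob(action, mean, log_std) * advantage


rng = random.Random(0)
mean, log_std = 0.5, math.log(0.2)  # in practice these come from the policy network
a = sample_action(mean, log_std, rng)
loss = policy_loss(a, mean, log_std, advantage=1.0)
```

The critic and the advantage computation can stay as they are; only the policy head, the sampling step, and the log-probability term change.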

Why does reward_norm differ between models?

For example, in config_ia2c_catchup.ini the reward_norm is 800, while in config_ma2c_dial_catchup.ini the reward_norm is 5000.
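
One possible explanation, offered as an assumption and not as the author's stated reason, is that the algorithms optimize rewards of different magnitude (for instance a local reward versus one aggregated over several agents), so each needs its own constant to bring the scaled reward into a similar range. The sketch below shows that effect with made-up raw reward values; `normalize_reward` and its `reward_clip` argument are hypothetical names.

```python
def normalize_reward(raw_reward, reward_norm, reward_clip=None):
    """Scale a raw reward by reward_norm and optionally clip it to [-clip, clip]."""
    scaled = raw_reward / reward_norm
    if reward_clip is not None:
        scaled = max(-reward_clip, min(reward_clip, scaled))
    return scaled


# Made-up raw values: a small local reward and a larger aggregated one.
local_reward = -400.0
aggregated_reward = -2500.0

r_local = normalize_reward(local_reward, reward_norm=800.0)
r_aggregated = normalize_reward(aggregated_reward, reward_norm=5000.0)
```

With these numbers both scaled rewards land on the same value, which is the point of choosing a per-configuration constant.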

FingerPrint algorithm

Hello,

In the original FingerPrint paper, Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning, the iteration number and the annealing rate are used as the fingerprint.

In your code, however, I cannot tell which fingerprint you are using, and it seems the fingerprint is never updated. Could you help me understand that?
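
For reference, the idea described in the question can be sketched as appending a small vector (training progress and exploration rate) to each agent's observation, so that the learner can condition on how far the other agents' training has advanced. This is an illustration of that idea only and says nothing about which fingerprint this repository uses; the schedule, the normalization, and all names are hypothetical.

```python
def exploration_rate(iteration, eps_start=1.0, eps_end=0.05, decay_iters=1000):
    """Linearly annealed exploration rate (hypothetical schedule)."""
    frac = min(1.0, iteration / decay_iters)
    return eps_start + frac * (eps_end - eps_start)


def add_fingerprint(observation, iteration, max_iters=1000):
    """Append (normalized iteration, exploration rate) to an observation vector."""
    eps = exploration_rate(iteration)
    return list(observation) + [iteration / max_iters, eps]


ob = [0.2, -0.1]
early = add_fingerprint(ob, iteration=0)     # fingerprint at the start of training
late = add_fingerprint(ob, iteration=1000)   # fingerprint after annealing has finished
```

The same raw observation yields different network inputs early and late in training, which is what lets the learner tell the two situations apart.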

Could you share the requirements.txt file?

Could you tell me which versions of Python and TensorFlow you are using for the code?

I am trying to run the basic CACC setup without SUMO and I am getting the following errors:
Traceback (most recent call last):
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 490, in apply_op
preferred_dtype=default_dtype)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 741, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 614, in _TensorTensorConversionFunction
% (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype bool for Tensor with dtype int64: 'Tensor("nc/boolean_mask/Reshape_1:0", shape=(8,), dtype=int64)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 161, in <module>
train(args)
File "main.py", line 99, in train
model = init_agent(env, config['MODEL_CONFIG'], total_step, seed)
File "main.py", line 63, in init_agent
total_step, config, seed=seed)
File "/network/home/jitanian/thesis/deeprl_network/agents/models.py", line 196, in __init__
total_step, seed, model_config)
File "/network/home/jitanian/thesis/deeprl_network/agents/models.py", line 110, in _init_algo
self.policy = self._init_policy()
File "/network/home/jitanian/thesis/deeprl_network/agents/models.py", line 240, in _init_policy
self.neighbor_mask, n_fc=self.n_fc, n_h=self.n_lstm)
File "/network/home/jitanian/thesis/deeprl_network/agents/policies.py", line 198, in __init__
self._init_policy(n_agent, neighbor_mask, n_h)
File "/network/home/jitanian/thesis/deeprl_network/agents/policies.py", line 329, in _init_policy
self.pi_fw, self.v_fw, self.new_states = self._build_net('forward')
File "/network/home/jitanian/thesis/deeprl_network/agents/policies.py", line 287, in _build_net
h, new_states = lstm_comm(ob, policy, done, self.neighbor_mask, self.states, 'lstm_comm')
File "/network/home/jitanian/thesis/deeprl_network/agents/utils.py", line 192, in lstm_comm
mi = tf.expand_dims(tf.reshape(tf.boolean_mask(out_m, masks[i]), [-1]), axis=0)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1117, in boolean_mask
return _apply_mask_1d(tensor, mask)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1089, in _apply_mask_1d
indices = squeeze(where(mask), squeeze_dims=[1])
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2326, in where
return gen_array_ops.where(input=condition, name=name)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3824, in where
result = _op_def_lib.apply_op("Where", input=input, name=name)
File "/network/home/jitanian/thesis/deeprl_network/spatio-temp/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 513, in apply_op
(prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'input' of 'Where' Op has type int64 that does not match expected type of bool.
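
The final error says that the mask reaching `tf.boolean_mask` has dtype int64, while the `Where` op in this TensorFlow version requires bool. Assuming the neighbor mask is a 0/1 integer array, one possible workaround is to convert it to boolean before it is used, for example with `tf.cast(mask, tf.bool)` or by creating the NumPy array with a boolean dtype. The plain-Python sketch below only illustrates the distinction between an integer mask and a boolean one; both helper names are hypothetical.

```python
def to_bool_mask(int_mask):
    """Convert a 0/1 integer mask to booleans, rejecting any other value."""
    for v in int_mask:
        if v not in (0, 1):
            raise ValueError("mask entries must be 0 or 1, got %r" % (v,))
    return [bool(v) for v in int_mask]


def boolean_mask(values, mask):
    """Keep values[k] where mask[k] is True; insist on a genuinely boolean mask."""
    if not all(isinstance(m, bool) for m in mask):
        raise TypeError("mask must be boolean")
    return [v for v, m in zip(values, mask) if m]


neighbor_mask = [0, 1, 0, 1, 0, 0, 0, 0]  # 8 entries, matching shape=(8,) in the error
messages = ["m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7"]
selected = boolean_mask(messages, to_bool_mask(neighbor_mask))
```

Passing `neighbor_mask` directly to `boolean_mask` here raises a `TypeError`, which mirrors the behavior reported in the traceback.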

Can you explain a bit about the Largegrid Env?

Thanks for sharing the code. It really helps me understand the paper and the algorithms. However, I can't figure out what the Largegrid Env is about. Is it an environment defined in someone else's paper? If not, could you explain it a bit? If so, could you point me to that paper?
Thanks again for your help.

size mismatch in `config_ia2c_grid` PyTorch implementation

When I use the PyTorch implementation to train with config_ia2c_grid, I encounter this bug

Could anyone give me some feedback? Thanks!

CPU or GPU?

Hello, thank you for sharing the code. It seems to run on the CPU and is very slow. Can I use GPU acceleration? If so, what should I do?

Will manual parallel sampling cause problems with the LSTM design?

Hi, since SUMO is slow and, as far as we know, does not support parallel sampling, we are trying to construct several parallel envs manually during training, each with its own SUMO instance at its core, and to step them serially. This seems to become an off-policy training process, since samples from several envs are collected. My concern is whether this will disturb the LSTM, since it records the global hidden states of a single env.
If we want to end up with parallel sampling, is asynchronous sampling necessary?
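
A common way to keep recurrent state consistent when several environments are stepped in turn is to store one hidden state per environment and to reset only that environment's state when its episode ends. The sketch below illustrates the bookkeeping only: the "recurrent update" is a stand-in moving average, not an LSTM, the class is hypothetical, and it does not address the on-policy versus off-policy question.

```python
class PerEnvRecurrentState:
    """Holds one hidden state per environment so that rollouts do not mix."""

    def __init__(self, n_env, init_value=0.0):
        self.init_value = init_value
        self.hidden = [init_value] * n_env

    def step(self, env_id, observation, done):
        # Reset the hidden state at an episode boundary of this environment only.
        if done:
            self.hidden[env_id] = self.init_value
        # Stand-in for an LSTM update: exponential moving average of observations.
        self.hidden[env_id] = 0.5 * self.hidden[env_id] + 0.5 * observation
        return self.hidden[env_id]


states = PerEnvRecurrentState(n_env=2)
# Step the two environments serially, one after the other.
h0 = states.step(0, observation=1.0, done=False)
h1 = states.step(1, observation=-1.0, done=False)
h0 = states.step(0, observation=1.0, done=False)
h1 = states.step(1, observation=-1.0, done=True)  # env 1 restarts; env 0 is unaffected
```

As long as every sample is paired with the hidden state of the environment that produced it, collecting from several environments in sequence does not by itself corrupt the recurrent state.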

setup_sumo.h failed on Ubuntu 18.04.3 LTS

libtool: link: g++-5 -Wall -Wformat -Woverloaded-virtual -Wshadow -O2 -DNDEBUG -Wuninitialized -ffast-math -fstrict-aliasing -finline-functions -fomit-frame-pointer -fexpensive-optimizations -DHAVE_JPEG_H=1 -DHAVE_PNG_H=1 -DHAVE_TIFF_H=1 -DHAVE_ZLIB_H=1 -DHAVE_BZ2LIB_H=1 -DHAVE_XFT_H=1 -I/usr/include/freetype2 -DHAVE_XSHM_H=1 -DHAVE_XSHAPE_H=1 -DHAVE_XCURSOR_H=1 -DHAVE_XRENDER_H=1 -DHAVE_XRANDR_H=1 -DHAVE_XFIXES_H=1 -DHAVE_XINPUT_H=1 -DNO_XIM -DHAVE_GLU_H=1 -DHAVE_GL_H=1 -o .libs/chart chart.o icons.o ./.libs/libCHART-1.6.so /tmp/fox-20210117-6371-x26beb/fox-1.6.56/src/.libs/libFOX-1.6.so ../src/.libs/libFOX-1.6.so -lX11 -lXext /usr/lib/x86_64-linux-gnu/libfreetype.so -lfontconfig -lXft -lXcursor -lXrender -lXrandr -lXfixes -lXi -lm -ldl -lpthread -lrt -ljpeg -lpng -ltiff -lz -lbz2 -lGLU -lGL -Wl,-rpath -Wl,/home/linuxbrew/.linuxbrew/Cellar/fox/1.6.56_2/lib
/home/linuxbrew/.linuxbrew/bin/ld: /home/linuxbrew/.linuxbrew/lib/libfontconfig.so: undefined reference to `FT_Done_MM_Var'

ubuntu@ubuntu-intel-nuc:~/deeprl_network$ nm -D /home/linuxbrew/.linuxbrew/lib/libfontconfig.so | grep FT_Done_MM_Var
U FT_Done_MM_Var

Errors pop up when running in the ATSC net environment

There is something wrong with the package "traci", and it stops the training every time.

2020-09-26 11:16:41,135 [INFO] Training: a dim [6, 4, 2, 2, 2, 4, 2, 4, 2, 5, 2, 2, 4, 2, 2, 4, 2, 3, 6, 3, 2, 4, 4, 4, 4, 4, 6, 3], agent dim: 28
2020-09-26 11:16:41,136 [INFO] Use cpu for pytorch...
2020-09-26 11:16:41,208 [ERROR] Can not find checkpoint for /home/liubo/deeprl_net/ia2c_net_1.0/model/
Loading configuration... done.
Error: Answered with error to command 0xc2: The phase duration must be given as an integer.
Traceback (most recent call last):
File "main.py", line 161, in <module>
train(args)
File "main.py", line 104, in train
trainer.run()
File "/home/liubo/deeprl_network/utils.py", line 218, in run
ob, done, R = self.explore(ob, done)
File "/home/liubo/deeprl_network/utils.py", line 156, in explore
next_ob, reward, done, global_reward = self.env.step(action)
File "/home/liubo/deeprl_network/envs/atsc_env.py", line 182, in step
self._set_phase(action, 'yellow', self.yellow_interval_sec)
File "/home/liubo/deeprl_network/envs/atsc_env.py", line 516, in _set_phase
self.sim.trafficlight.setPhaseDuration(node_name, phase_duration)
File "/home/liubo/virtual-env/py36/lib/python3.6/site-packages/traci/_trafficlight.py", line 283, in setPhaseDuration
tc.CMD_SET_TL_VARIABLE, tc.TL_PHASE_DURATION, tlsID, phaseDuration)
File "/home/liubo/virtual-env/py36/lib/python3.6/site-packages/traci/connection.py", line 141, in _sendDoubleCmd
self._sendExact()
File "/home/liubo/virtual-env/py36/lib/python3.6/site-packages/traci/connection.py", line 109, in _sendExact
raise TraCIException(err, prefix[1], _RESULTS[prefix[2]])
traci.exceptions.TraCIException: The phase duration must be given as an integer.
Error: tcpip::Socket::recvAndCheck @ recv: peer shutdown
Quitting (on error).
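
The TraCI message indicates that the SUMO version in use rejects a non-integer phase duration. Assuming `phase_duration` reaches `setPhaseDuration` as a float (or a NumPy scalar), one possible workaround is to convert it to an `int` just before the call; another is to install a SUMO version that matches the one the repository was developed against. The helper below is hypothetical and does not call TraCI; it only shows a conversion that refuses values that are not whole seconds instead of silently truncating them.

```python
def as_whole_seconds(duration):
    """Return the duration as an int, refusing values that are not whole seconds."""
    value = float(duration)
    if value < 0 or value != int(value):
        raise ValueError(
            "phase duration must be a non-negative whole number, got %r" % (duration,)
        )
    return int(value)


yellow_interval_sec = 2.0  # illustrative value, e.g. as read from a config file
phase_duration = as_whole_seconds(yellow_interval_sec)
# In the environment this value would then be passed on, e.g.
# self.sim.trafficlight.setPhaseDuration(node_name, phase_duration)
```

If the configured intervals are not whole numbers, decide explicitly whether to round or to change the configuration.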