
dcrnn_pytorch's Introduction

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

(Figure: Diffusion Convolutional Recurrent Neural Network.)

This is a PyTorch implementation of Diffusion Convolutional Recurrent Neural Network in the following paper:
Yaguang Li, Rose Yu, Cyrus Shahabi, Yan Liu, Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting, ICLR 2018.

Requirements

  • torch
  • scipy>=0.19.0
  • numpy>=1.12.1
  • pandas>=0.19.2
  • pyyaml
  • statsmodels
  • tensorflow>=1.3.0
  • tables
  • future

Dependencies can be installed using the following command:

pip install -r requirements.txt

Comparison with the TensorFlow Implementation

MAE on the METR-LA dataset (PEMS-BAY results coming soon):

Horizon   TensorFlow   PyTorch
1 hour    3.69         3.12
30 min    3.15         2.82
15 min    2.77         2.56

Data Preparation

The traffic data files for Los Angeles (METR-LA) and the Bay Area (PEMS-BAY), i.e., metr-la.h5 and pems-bay.h5, are available at Google Drive or Baidu Yun, and should be put into the data/ folder. The *.h5 files store the data in a pandas.DataFrame using the HDF5 file format. Here is an example:

                     sensor_0  sensor_1  sensor_2  ...  sensor_n
2018/01/01 00:00:00      60.0      65.0      70.0  ...       ...
2018/01/01 00:05:00      61.0      64.0      65.0  ...       ...
2018/01/01 00:10:00      63.0      65.0      60.0  ...       ...
...                       ...       ...       ...  ...       ...

Here is an article about Using HDF5 with Python.
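
For example, the readings can be loaded directly into a DataFrame (a minimal sketch; the key can usually be omitted because the file contains a single pandas object, and the METR-LA shape in the comment is taken from the paper's description of the dataset):

import pandas as pd

# Load the traffic readings: each column is a sensor, each row a 5-minute timestamp.
df = pd.read_hdf("data/metr-la.h5")
print(df.shape)         # (num_timesteps, num_sensors), e.g. (34272, 207) for METR-LA
print(df.iloc[:3, :3])  # peek at the first few readings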

Run the following commands to generate the train/val/test datasets at data/{METR-LA,PEMS-BAY}/{train,val,test}.npz.

# Create data directories
mkdir -p data/{METR-LA,PEMS-BAY}

# METR-LA
python -m scripts.generate_training_data --output_dir=data/METR-LA --traffic_df_filename=data/metr-la.h5

# PEMS-BAY
python -m scripts.generate_training_data --output_dir=data/PEMS-BAY --traffic_df_filename=data/pems-bay.h5
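
To sanity-check the generated splits, the arrays can be inspected as below (a sketch; the 'x'/'y' key names follow the script's output format, and the final feature dimension, speed plus time-of-day, is an assumption based on the default settings):

import numpy as np

data = np.load("data/METR-LA/train.npz")
print(data["x"].shape)  # (num_samples, 12, num_sensors, input_dim): 12-step input windows
print(data["y"].shape)  # (num_samples, 12, num_sensors, input_dim): 12-step prediction targets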

Graph Construction

Because the current implementation relies on pre-calculated road-network distances between sensors, it only supports the sensor IDs in Los Angeles (see data/sensor_graph/sensor_info_201206.csv).

python -m scripts.gen_adj_mx --sensor_ids_filename=data/sensor_graph/graph_sensor_ids.txt --normalized_k=0.1 \
    --output_pkl_filename=data/sensor_graph/adj_mx.pkl

In addition, the locations of the sensors in Los Angeles (METR-LA) are available at data/sensor_graph/graph_sensor_locations.csv.
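
For reference, the adjacency matrix follows the thresholded Gaussian kernel described in the paper: edge weights are computed from pairwise road-network distances, and weights below --normalized_k are zeroed out. The sketch below is a simplified re-implementation of that idea, not the script itself:

import numpy as np

def thresholded_gaussian_adj(dist, normalized_k=0.1):
    # dist: (N, N) road-network distances, with np.inf where no route exists.
    std = dist[~np.isinf(dist)].std()
    adj = np.exp(-np.square(dist / std))   # closer sensors get weights nearer to 1
    adj[adj < normalized_k] = 0.0          # drop weak edges to keep the graph sparse
    return adj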

Run the Pre-trained Model on METR-LA

# METR-LA
python run_demo_pytorch.py --config_filename=data/model/pretrained/METR-LA/config.yaml

# PEMS-BAY
python run_demo_pytorch.py --config_filename=data/model/pretrained/PEMS-BAY/config.yaml

The generated DCRNN predictions are saved in data/results/dcrnn_predictions.

Model Training

# METR-LA
python dcrnn_train_pytorch.py --config_filename=data/model/dcrnn_la.yaml

# PEMS-BAY
python dcrnn_train_pytorch.py --config_filename=data/model/dcrnn_bay.yaml

There is a chance that the training loss will explode; the temporary workaround is to restart from the last saved model before the explosion, or to decrease the learning rate earlier in the learning-rate schedule.

Evaluate Baseline Methods

# METR-LA
python -m scripts.eval_baseline_methods --traffic_reading_filename=data/metr-la.h5

PyTorch Results

(Figures: prediction results of the PyTorch implementation.)

Citation

If you find this repository (e.g., the code and the datasets) useful in your research, please cite the following paper:

@inproceedings{li2018dcrnn_traffic,
  title={Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting},
  author={Li, Yaguang and Yu, Rose and Shahabi, Cyrus and Liu, Yan},
  booktitle={International Conference on Learning Representations (ICLR '18)},
  year={2018}
}

dcrnn_pytorch's People

Contributors

liyaguang


dcrnn_pytorch's Issues

About the performance improvement compared with the TensorFlow implementation

Hi, thank you very much for the implementation, it helps me a lot! BTW, I am really confused about the performance improvement of the PyTorch implementation compared with the original TensorFlow implementation. I would be very grateful if you could give me some guidance. Thank you in advance!

it doesn't seem to improve in the test run

I test-ran the code in Google Colab and so far I got the following output:



2024-02-06 18:09:14,584 - INFO - Log directory: data/model/dcrnn_DR_2_h_12_64-64_lr_0.01_bs_192_0206180913/
2024-02-06 18:09:35,626 - INFO - Model created
2024-02-06 18:09:38,948 - INFO - Loaded model at 50
2024-02-06 18:09:40,199 - INFO - Start training ...
2024-02-06 18:09:40,204 - INFO - num_batches:125
2024-02-06 18:18:39,040 - INFO - epoch complete
2024-02-06 18:18:39,045 - INFO - evaluating now!
2024-02-06 18:19:24,359 - INFO - Epoch [50/100] (6375) train_mae: 1.9753, val_mae: 2.9198, lr: 0.010000, 584.1s
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:432: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  warnings.warn("To get the last learning rate computed by the scheduler, "
2024-02-06 18:19:24,384 - INFO - Saved model at 50
2024-02-06 18:19:24,391 - INFO - Val loss decrease from inf to 2.9198, saving to models/epo50.tar
2024-02-06 18:28:27,688 - INFO - epoch complete
2024-02-06 18:28:27,692 - INFO - evaluating now!
...
2024-02-06 21:27:25,031 - INFO - Epoch [69/100] (8750) train_mae: 1.9429, val_mae: 2.9616, lr: 0.000100, 589.0s
2024-02-06 21:28:55,730 - INFO - Epoch [69/100] (8750) train_mae: 1.9429, test_mae: 3.2499,  lr: 0.000100, 589.0s
2024-02-06 21:37:59,490 - INFO - epoch complete
2024-02-06 21:37:59,494 - INFO - evaluating now!
2024-02-06 21:38:44,803 - INFO - Epoch [70/100] (8875) train_mae: 1.9318, val_mae: 2.9033, lr: 0.001000, 589.1s
2024-02-06 21:38:44,823 - INFO - Saved model at 70
2024-02-06 21:38:44,827 - INFO - Val loss decrease from 2.9198 to 2.9033, saving to models/epo70.tar
2024-02-06 21:47:48,164 - INFO - epoch complete
2024-02-06 21:47:48,169 - INFO - evaluating now!
2024-02-06 21:48:33,495 - INFO - Epoch [71/100] (9000) train_mae: 1.9262, val_mae: 2.9057, lr: 0.001000, 588.7s
2024-02-06 21:57:36,690 - INFO - epoch complete
2024-02-06 21:57:36,698 - INFO - evaluating now!
...
2024-02-06 23:57:38,667 - INFO - Epoch [84/100] (10625) train_mae: 1.9336, val_mae: 2.9073, lr: 0.000100, 588.8s
2024-02-07 00:06:42,161 - INFO - epoch complete
2024-02-07 00:06:42,165 - INFO - evaluating now!
2024-02-07 00:07:27,510 - INFO - Epoch [85/100] (10750) train_mae: 1.9430, val_mae: 2.9063, lr: 0.000100, 588.8s

The validation and test results don't seem to be improving, and the learning rate stays unchanged. Is this expected, and will they get better before the 100 epochs finish? Or is something wrong?
Thanks!

CUDA out of memory error

When I run python dcrnn_train_pytorch.py --config_filename=data/model/dcrnn_la.yaml, I get the error:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 5.80 GiB total capacity; 4.36 GiB already allocated; 6.69 MiB free; 4.63 GiB reserved in total by PyTorch)

Problem with curriculum learning?

The decoder of the DCRNNModel in model/pytorch/dcrnn_model.py seems to be fed with its own output when use_curriculum_learning is set to False:

for t in range(self.decoder_model.horizon):
    decoder_output, decoder_hidden_state = self.decoder_model(decoder_input,
                                                              decoder_hidden_state)
    decoder_input = decoder_output
    outputs.append(decoder_output)
    if self.training and self.use_curriculum_learning:
        c = np.random.uniform(0, 1)
        if c < self._compute_sampling_threshold(batches_seen):
            decoder_input = labels[t]

However, I think it should be fed with the ground-truth labels, shouldn't it?
Cf. the DCRNN paper, page 4, first line.
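
For context, scheduled sampling in the paper feeds the decoder the ground truth with a probability that follows an inverse sigmoid decay in the number of batches seen. The sketch below illustrates that decay under the assumption that the repository uses the same form, with a hypothetical cl_decay_steps parameter:

import numpy as np

def compute_sampling_threshold(batches_seen, cl_decay_steps=2000):
    # Probability of feeding the ground truth: close to 1 early in training,
    # decaying toward 0 as batches_seen grows (the decoder then consumes its own output).
    return cl_decay_steps / (cl_decay_steps + np.exp(batches_seen / cl_decay_steps))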

How were the figures generated?

I was wondering how you reversed the DCRNN normalization in METRLADatasetLoader to plot the speed on the y-axis in your figures: https://github.com/chnsh/DCRNN_PyTorch/blob/d92490b808ba5c5be2f23d427d96e9a56b066d7f/README.md#pytorch-results

I'm using the following PyTorch notebook: https://colab.research.google.com/drive/132hNQ0voOtTVk3I4scbD3lgmPTQub0KR?usp=sharing#scrollTo=EzrkqXPxFwIx

Your chart: (screenshot)

The one in the notebook where I want the y-axis to be like yours: (screenshot)
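
In case it helps, if the loader applies z-score normalization, the plotted values can be mapped back to speeds by inverting that transform with the mean and standard deviation of the raw data; a minimal sketch (not the loader's own code):

import pandas as pd

raw = pd.read_hdf("data/metr-la.h5").values
mean, std = raw.mean(), raw.std()

def denormalize(z):
    # Invert z-score normalization to recover speeds (e.g. mph) for plotting.
    return z * std + mean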

ImportError

I was running:

python run_demo_pytorch.py --config_filename=data/model/pretrained/METR-LA/config.yaml

This is what I got:

Traceback (most recent call last):
  File "run_demo_pytorch.py", line 8, in <module>
    from model.pytorch.dcrnn_supervisor import DCRNNSupervisor
  File "C:\Users\mi\Desktop\DCRNN_PyTorch-pytorch_scratch\model\pytorch\dcrnn_supervisor.py", line 6, in <module>
    from torch.utils.tensorboard import SummaryWriter
ImportError: No module named 'torch.utils.tensorboard'

Testing results during training

Hello, firstly I would like to thank you for the implementation. I've been trying to use it and I've noticed a big difference: during training, when evaluating (e.g. every 10 steps), you only report the MAE (over all 12 time steps), while DCRNN reports MAE/MAPE/RMSE for every time step. I would be interested to see those numbers during training, or at least at the end of training, so I can compare with other models. Do you have any suggestions on how I could do this?
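
One possible way to get these numbers is to collect the stacked predictions and targets and compute the metrics per horizon step yourself. A minimal sketch, assuming arrays of shape (horizon, num_samples, num_sensors) and treating zero-valued targets as missing, as the original implementation does:

import numpy as np

def per_horizon_metrics(preds, labels, null_val=0.0):
    """preds, labels: arrays of shape (horizon, num_samples, num_sensors)."""
    for t in range(preds.shape[0]):
        p, y = preds[t], labels[t]
        mask = y != null_val               # ignore missing readings
        mae = np.abs(p[mask] - y[mask]).mean()
        rmse = np.sqrt(np.square(p[mask] - y[mask]).mean())
        mape = np.abs((p[mask] - y[mask]) / y[mask]).mean()
        print(f"Horizon {t + 1:02d}: MAE {mae:.2f}, RMSE {rmse:.2f}, MAPE {mape:.4f}")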

Test error calculation is not correct

I tried to use DCRNNSupervisor.evaluate() to calculate the test MAE but found that the results differ substantially when using different test batch sizes. I then found that the current implementation directly calculates the (masked) MAE per batch and simply averages those values. This is not correct, since the masked average is not a linear operation, so it cannot be computed per batch and then averaged. The correct approach is to first collect all predictions (and the corresponding targets) and then calculate the (masked) MAE over this full set of data.
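
The distinction can be shown concretely: averaging per-batch masked MAEs weights each batch equally regardless of how many valid (non-missing) entries it contains, while computing the MAE over all collected predictions weights every valid entry equally. A minimal illustration, assuming zeros denote missing values:

import numpy as np

def masked_mae(pred, true, null_val=0.0):
    mask = true != null_val
    return np.abs(pred[mask] - true[mask]).mean()

# Two batches with different numbers of valid entries.
true_batches = [np.array([60.0, 0.0, 0.0, 0.0]), np.array([50.0, 55.0, 58.0, 62.0])]
pred_batches = [np.array([64.0, 1.0, 2.0, 3.0]), np.array([51.0, 56.0, 59.0, 63.0])]

per_batch = np.mean([masked_mae(p, t) for p, t in zip(pred_batches, true_batches)])
full_set = masked_mae(np.concatenate(pred_batches), np.concatenate(true_batches))
print(per_batch, full_set)  # 2.5 vs 1.6: the per-batch average over-weights the sparse batch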

Lr scheduler resets when resuming model

Hi, many thanks for converting the TensorFlow implementation of DCRNN to PyTorch; it helped me a lot.

I noticed that when resuming a model, the learning-rate scheduling is being reset. Any idea why this happens? I uploaded a picture of a test case below.
If you need any further details, let me know.

(screenshot of the test case)

Using no convolution better than DCRNN paper result?

Hi @chnsh,

Hope you're doing well. While testing your PyTorch implementation of the DCRNN code, I stumbled across a weird result. When turning off the convolution with max_diffusion_step=0, the result greatly improved over the original DCRNN paper's result.

I tested this on the PeMS dataset; the configuration is shown below:
(screenshot of the configuration)
Link to file: model_config_issue.txt. In short, I simplified the model a bit: the 15-minute forecast, the 'laplacian' filter, no curriculum learning, and only 60 epochs. Of course, max_diffusion_step=0 is also used to disable the convolution.

This resulted in val_mae: 1.2388 at the 60th epoch, as can be seen in the log snippet (screenshot) or the full info.log. This result is better than the full published DCRNN, which reported val_mae: 1.38. The fact that even a simpler model without convolution beats the original DCRNN should raise concern about the soundness of this implementation. This is essentially the same problem as issue #3.

I'm not familiar with TensorFlow, which is why your implementation has been a great help for my thesis. Because this observation could become a bottleneck for me down the road, I would like to pin down the reason for this behaviour as early as possible. I think you have more insight into the workings of the original TensorFlow implementation of DCRNN, so I would like to ask you to have another look to find the problem. I have a gut feeling the problem lies somewhere in the calculation of the error/loss.

Hope you can find the time to look into this issue. Thanks in advance.

Double sigmoid inside RNNCell

I was looking at the implementation of the DCGRUCell, and I've spotted something out of order. If we use 'fc' for the U and R gates, we apply the sigmoid twice (lines 121 and 96). Is this how it's intended to work, or is there a bug?

About Missing Data (0 values)

There is a lot of missing data (values of 0) in both datasets (METR-LA and PEMS-BAY). Did you adopt any interpolation approach (such as linear interpolation) to fill in these missing values?
(example of missing values)

Run the Pre-trained Model on METR-LA

I have a question.
For the model proposed in this paper, why do we need a pre-trained model?
Can you tell me the difference between run_demo and dcrnn_train?
Thank you very much!

About the function "_setup_graph()"

Could you tell me why there must be a call to "_setup_graph()" before loading the model?
I am a PyTorch beginner and hope to get your answer.
The code is here:

def load_model(self):
    self._setup_graph()
    assert os.path.exists('models/epo%d.tar' % self._epoch_num), 'Weights at epoch %d not found' % self._epoch_num
    checkpoint = torch.load('models/epo%d.tar' % self._epoch_num, map_location='cpu')
    self.dcrnn_model.load_state_dict(checkpoint['model_state_dict'])
    self._logger.info("Loaded model at {}".format(self._epoch_num))

def _setup_graph(self):
    with torch.no_grad():
        self.dcrnn_model = self.dcrnn_model.eval()

        val_iterator = self._data['val_loader'].get_iterator()

        for _, (x, y) in enumerate(val_iterator):
            x, y = self._prepare_data(x, y)
            output = self.dcrnn_model(x)
            break

Figure generation

Which part of the code actually generates the figures in the results section? Any help on how to generate the plots would be appreciated.

PEMS-BAY

Hi,

Thanks for the great work.

I am sort of confused about whether I need more files to run the PEMS-BAY dataset. Specifically, data/sensor_graph seems to be present for the LA dataset but not for the BAY dataset. So to train on the BAY dataset, what additional steps need to be taken? Thank you.

Weights at epoch 64 not found

I was running the run_demo_pytorch.py script using the command:
python run_demo_pytorch.py --config_filename=data/model/pretrained/METR-LA/config.yaml

This is what I got:
Traceback (most recent call last):
  File "run_demo_pytorch.py", line 33, in <module>
    run_dcrnn(args)
  File "run_demo_pytorch.py", line 18, in run_dcrnn
    supervisor = DCRNNSupervisor(adj_mx=adj_mx, **supervisor_config)
  File "/home/cyd/DCRNN_PyTorch/model/pytorch/dcrnn_supervisor.py", line 50, in __init__
    self.load_model()
  File "/home/cyd/DCRNN_PyTorch/model/pytorch/dcrnn_supervisor.py", line 93, in load_model
    assert os.path.exists('models/epo%d.tar' % self._epoch_num), 'Weights at epoch %d not found' % self._epoch_num
AssertionError: Weights at epoch 64 not found

Could you please upload 'models/epo64.tar' to the repo? I hope to reproduce the MAE results shown in the README. Thanks!
