DSAN (Dynamic Switch-Attention Network)

[1] Haoxing Lin, Rufan Bai, Weijia Jia, Xinyu Yang, Yongjian You. Preserving Dynamic Attention for Long-Term Spatial-Temporal Prediction. KDD 2020. arXiv link: https://arxiv.org/abs/2006.08849

@inproceedings{dsan,
    title={Preserving Dynamic Attention for Long-Term Spatial-Temporal Prediction},
    author={Haoxing Lin and Rufan Bai and Weijia Jia and Xinyu Yang and Yongjian You},
    booktitle={Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20)},
    year={2020} 
}

1. About DSAN

Dynamic Switch-Attention Network (DSAN) is designed to achieve effective long-term spatial-temporal prediction by filtering out spatial noise and alleviating long-term error propagation. It relies on an attention mechanism instead of a CNN or RNN to measure spatial-temporal correlation. You are welcome to check the technical details and experimental results in our paper.

Model architecture:

2. Environments

To install the dependencies quickly, run: pip install -r requirements.txt

Prerequisites:

Ubuntu Server 18.04
Python 3.6
TensorFlow & TensorFlow-GPU 2.3.0
CUDA 10.1
cuDNN 7.6.5

Docker is strongly recommended:

docker pull tensorflow/tensorflow:2.3.0-gpu

We have tested our model on two different machines:

A dual Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz machine with 256 GB RAM and 4 NVIDIA RTX 2080 Ti GPUs
NVIDIA DGX-2

We configured the dependencies on the first machine manually and used the Docker image mentioned above on the NVIDIA DGX-2 without any modification. Both setups produced the same results; the only difference was the training time. If you have problems configuring the environment, we recommend pulling the official Docker image.

3. Pretrained DSAN Checkpoints

We have provided some pretrained checkpoints in the checkpoints.zip file with the corresponding training and testing logs. Checkpoints of other DSAN variants are coming soon.

Checkpoints info:

Checkpoint	Dataset	#GPU	Batch Size	Mem per GPU	JT Weights
taxi-64x1	Taxi-NYC	V100 x 1	64	4.8G	equal
taxi-256x2	Taxi-NYC	V100 x 2	512	17.6G	equal
bike-64x1	Bike-NYC	2080Ti x 1	64	< 2G	equal

** JT Weights denotes the setting of joint training weights.

You don't have to use the same GPUs listed here, but make sure that your GPU memory is large enough.
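The checkpoint names in the table appear to follow a `<dataset>-<per-GPU batch>x<#GPU>` pattern, which would explain why taxi-256x2 lists a batch size of 512. This reading is inferred from the table rather than stated by the authors, and the helper names below are hypothetical. A minimal sketch of that arithmetic:

```python
def parse_checkpoint_name(name):
    """Split e.g. 'taxi-256x2' into ('taxi', 256, 2).

    Assumes the '<dataset>-<per-GPU batch>x<#GPU>' naming inferred
    from the checkpoint table; this is not documented by the authors.
    """
    dataset, layout = name.rsplit("-", 1)
    per_gpu, num_gpus = layout.split("x")
    return dataset, int(per_gpu), int(num_gpus)


def global_batch_size(name):
    """Total batch size across all GPUs for a checkpoint name."""
    _, per_gpu, num_gpus = parse_checkpoint_name(name)
    return per_gpu * num_gpus


if __name__ == "__main__":
    for name in ("taxi-64x1", "taxi-256x2", "bike-64x1"):
        print(name, global_batch_size(name))
```

Under this reading, the memory figure in the table depends on the per-GPU batch, so adding GPUs raises the total batch size without raising the memory needed on each card.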

If you train DSAN on your own data, a checkpoint is saved after every epoch. If the program is killed accidentally before training finishes, set remove_old_files to False and run the program again to continue from the last saved epoch.
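The resume behaviour can be pictured with a small sketch. It is not the repository's code: the `ckpt-<epoch>.index` file naming and the `latest_epoch` and `prepare_run` helpers are hypothetical, and only the `remove_old_files` flag name comes from the text above.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Hypothetical naming scheme: one marker file per epoch, e.g. "ckpt-3.index".
CKPT_PATTERN = re.compile(r"^ckpt-(\d+)\.index$")


def latest_epoch(ckpt_dir):
    """Return the highest epoch number found in ckpt_dir, or 0 if none."""
    epochs = []
    for path in Path(ckpt_dir).glob("ckpt-*.index"):
        match = CKPT_PATTERN.match(path.name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs, default=0)


def prepare_run(ckpt_dir, remove_old_files):
    """Return the 1-based epoch at which training should start."""
    ckpt_dir = Path(ckpt_dir)
    if remove_old_files and ckpt_dir.exists():
        shutil.rmtree(ckpt_dir)  # discard earlier checkpoints, start over
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    return latest_epoch(ckpt_dir) + 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpts = Path(tmp) / "checkpoints"
        ckpts.mkdir()
        for epoch in (1, 2, 3):
            (ckpts / "ckpt-{}.index".format(epoch)).touch()
        print(prepare_run(ckpts, remove_old_files=False))  # continue after epoch 3
        print(prepare_run(ckpts, remove_old_files=True))   # start from scratch
```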

4. Start training

Before running the model, unzip data.zip. Then run the following command to train the model on one GPU:

python3 main_1gpu.py

If you have multiple powerful GPUs, for example 4 NVIDIA RTX 2080 Ti cards, you can run:

python3 main_4gpus.py

The two main files are basically the same. They differ only in the settings of the gpu_ids, BATCH_SIZE, warmup_steps, and es_epoch parameters. Training on multiple GPUs is faster: on our first machine, training takes 9 to 16 hours with 1 GPU and 3 to 5 hours with 4 GPUs. BATCH_SIZE also matters, since setting it to 512 instead of 64 should speed up training. When you change BATCH_SIZE, adjust warmup_steps and es_epoch accordingly. You can choose how many GPUs to use based on your own machine.
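To see why warmup_steps is tied to BATCH_SIZE, consider the warmup schedule from the original Transformer paper, sketched below. Whether the main files use exactly this formula is an assumption, and d_model=64, warmup_steps=4000, and the sample count are illustrative values, not the repository's settings.

```python
import math


def transformer_lr(step, d_model=64, warmup_steps=4000):
    """Warmup schedule from the original Transformer paper.

    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    The rate rises linearly for warmup_steps steps, then decays.
    """
    step = max(step, 1)  # avoid 0 ** -0.5
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def steps_per_epoch(num_samples, batch_size):
    """Number of optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    num_samples = 100000  # illustrative data set size, not from the paper
    for batch_size in (64, 512):
        per_epoch = steps_per_epoch(num_samples, batch_size)
        warmup_epochs = 4000 / per_epoch
        print(batch_size, per_epoch, round(warmup_epochs, 1))
```

A larger batch means fewer optimizer steps per epoch, so an unchanged warmup_steps would stretch the warmup phase over many more epochs. The same reasoning applies to any setting counted in steps or epochs.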

To use a different data set, run

python3 main_1gpu.py --dataset=ctm  # options: taxi, bike, ctm

You can check the parameters by running

python3 main_1gpu.py --help

or adjust them inside the main files.
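For readers who adjust the parameters inside the main files, the sketch below shows how a `--dataset` option restricted to the three data sets could be declared with argparse. It is a hypothetical parser, not the one in main_1gpu.py: only the flag name and the values taxi, bike, and ctm come from the text above, and the default is an assumption.

```python
import argparse


def build_parser():
    """Build an illustrative parser with a restricted --dataset option."""
    parser = argparse.ArgumentParser(description="Illustrative DSAN-style CLI")
    parser.add_argument(
        "--dataset",
        choices=["taxi", "bike", "ctm"],
        default="taxi",  # assumed default, not confirmed by the source
        help="data set to train on",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--dataset=ctm"])
    print(args.dataset)
```

Declaring `choices` makes argparse reject a misspelled data set name at startup and list the valid values in the `--help` output.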

5. Results, checkpoints, and TensorBoard

When DSAN runs, the training, evaluation, and testing results are automatically written to a txt file in the results folder. The model also executes summary_write to create TensorBoard output in the tensorboard folder. The checkpoints mentioned above are saved to the checkpoints folder.

6. Train on your own data

If you want to use your own data, see utils/dataloader.py and data_parameters.py for how to preprocess it. DSAN requires 7 inputs, which are detailed in the experimental section and the appendix of our paper:

dae_inp_g: Enc-G input, shape: (batch_size, his_time_step, height_g, width_g, features)
dae_inp: Enc-D input, shape: (batch_size, his_time_step, height_d, width_d, features)
dae_inp_ex: external information. If you don't have any, just use the one-hot time information vectors, shape: (batch_size, his_time_step, one_hot)
sad_inp: SAD input, shape: (batch_size, future_time_step, features)
sad_inp_ex: external information for sad_inp, shape: (batch_size, future_time_step, one_hot)
cors_g
cors

cors_g and cors are the coordinate matrices generated by our model automatically, which are used for producing the spatial positional encoding. To make sure they are generated correctly, the only thing you need to do is to specify the information of your own data in data_parameters.py.
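Before training on new data, it can help to compare the shapes of your preprocessed arrays against the list above. The sketch below does this for the five inputs whose shapes are given; cors_g and cors are left out because their shapes are not stated here. The function names and the dimension values in the usage example are hypothetical.

```python
def expected_input_shapes(batch_size, his_time_step, future_time_step,
                          height_g, width_g, height_d, width_d,
                          features, one_hot):
    """Return the documented shape of each DSAN input as a tuple."""
    return {
        "dae_inp_g": (batch_size, his_time_step, height_g, width_g, features),
        "dae_inp": (batch_size, his_time_step, height_d, width_d, features),
        "dae_inp_ex": (batch_size, his_time_step, one_hot),
        "sad_inp": (batch_size, future_time_step, features),
        "sad_inp_ex": (batch_size, future_time_step, one_hot),
    }


def find_shape_mismatches(actual_shapes, expected_shapes):
    """Return {name: (actual, expected)} for every input that differs."""
    mismatches = {}
    for name, expected in expected_shapes.items():
        actual = actual_shapes.get(name)  # None if the input is missing
        if actual != expected:
            mismatches[name] = (actual, expected)
    return mismatches


if __name__ == "__main__":
    # Illustrative dimensions only; take the real ones from data_parameters.py.
    expected = expected_input_shapes(
        batch_size=64, his_time_step=12, future_time_step=6,
        height_g=16, width_g=12, height_d=9, width_d=9,
        features=2, one_hot=24,
    )
    actual = dict(expected)
    actual["sad_inp"] = (64, 12, 2)  # wrong: uses his_time_step
    print(find_shape_mismatches(actual, expected))
```

With NumPy or TensorFlow arrays, `actual_shapes` can be built from `tuple(array.shape)` for each input.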
