st-ssl's Introduction

ST-SSL: Spatio-Temporal Self-Supervised Learning for Traffic Prediction

This is a PyTorch implementation of ST-SSL in the following paper:

[Figure: the ST-SSL framework]

new 27/10/2023: This paper was featured by leading WeChat official accounts in the fields of data mining and transportation: 当交通遇上机器学习 | 时空实验室 | AI蜗牛车.

new 22/04/2023: The post about this paper was selected for a headline tweet by PaperWeekly, a leading AI academic platform in China, and received nearly 7,000 reads.

new 09/02/2023: The video replay of the academic presentation at AAAI 2023 is available.

new 04/02/2023: J. Ji was invited to give a talk at the AAAI 2023 Beijing Pre-Conference on Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction.

Requirements

We built this project with Python 3.8 and the following packages:

numpy==1.21.2
pandas==1.3.5
PyYAML==6.0
torch==1.10.1

Datasets

Four datasets are supported: {NYCBike1, NYCBike2, NYCTaxi, BJTaxi}. You can download them from the GitHub repo, Beihang Cloud Drive, or Google Drive.

Each dataset is composed of 4 files, namely train.npz, val.npz, test.npz, and adj_mx.npz.

|----NYCBike1\
|    |----train.npz    # training data
|    |----adj_mx.npz   # predefined graph structure
|    |----test.npz     # test data
|    |----val.npz      # validation data

The train/val/test data is composed of 4 numpy.ndarray objects:

  • X: input data. It is a 4D tensor of shape (#samples, #lookback_window, #nodes, #flow_types), where # stands for "number of".
  • Y: data to be predicted. It is a 4D tensor of shape (#samples, #predict_horizon, #nodes, #flow_types). Note that X and Y are paired in the sample dimension. For instance, (X_i, Y_i) is the i-th data sample, with i indexing the sample dimension.
  • X_offset: a list indicating offsets of X's lookback window relative to the current time with offset 0.
  • Y_offset: a list indicating offsets of Y's prediction horizon relative to the current time with offset 0.
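To make the layout above concrete, the following sketch writes a small synthetic .npz file and prints the shape of every array in it. The lowercase key names (x, y, x_offset, y_offset) and all sizes are assumptions chosen for illustration; iterating over `data.files` shows the actual names stored in a real file.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for train.npz. Key names and sizes are assumptions.
num_samples, lookback, horizon, num_nodes, num_flows = 8, 19, 1, 128, 2
rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "train.npz")
np.savez(
    path,
    x=rng.random((num_samples, lookback, num_nodes, num_flows)),
    y=rng.random((num_samples, horizon, num_nodes, num_flows)),
    x_offset=np.arange(-lookback + 1, 1).reshape(-1, 1),
    y_offset=np.arange(1, horizon + 1).reshape(-1, 1),
)


def describe_npz(npz_path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(npz_path) as data:
        return {name: data[name].shape for name in data.files}


shapes = describe_npz(path)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

Pointing `describe_npz` at a downloaded train.npz, val.npz, or test.npz should show whether the arrays follow the shapes listed above.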

For all datasets, the flows of the previous 2 hours, together with the flows around the predicted time on each of the previous 3 days, are used to forecast the flows at the next time step.
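The sliding-window idea can be sketched as follows. This is not the authors' preprocessing code, which is not included here; it is a minimal illustration that assumes a continuous array of shape (time, nodes, flow_types) and uses the x_offset values quoted in one of the issues on this page. The function name `sliding_window` and the toy series are hypothetical.

```python
import numpy as np


def sliding_window(series, x_offsets, y_offsets):
    """Build paired (X, Y) samples from a (time, nodes, flow_types) array.

    For each valid current time t, X gathers series[t + x_offsets] and
    Y gathers series[t + y_offsets]. Assumes min(x_offsets) <= 0.
    """
    x_offsets = np.asarray(x_offsets)
    y_offsets = np.asarray(y_offsets)
    first_t = -int(x_offsets.min())                       # earliest t with full history
    last_t = series.shape[0] - 1 - int(y_offsets.max())   # latest t with a full target
    xs, ys = [], []
    for t in range(first_t, last_t + 1):
        xs.append(series[t + x_offsets])
        ys.append(series[t + y_offsets])
    return np.stack(xs), np.stack(ys)


# Offsets as reported in an issue: three 5-step blocks from earlier days
# plus the 4 most recent steps (19 steps in total, not contiguous).
x_offsets = np.concatenate([
    np.arange(-73, -68),
    np.arange(-49, -44),
    np.arange(-25, -20),
    np.arange(-3, 1),
])
y_offsets = np.array([1])  # assumed: predict the single next step

# Toy series: 100 time steps, 3 nodes, 2 flow types.
series = np.arange(100 * 3 * 2, dtype=float).reshape(100, 3, 2)
X, Y = sliding_window(series, x_offsets, y_offsets)
print(X.shape, Y.shape)
```

Because the offsets are not contiguous, each sample spans 75 time steps of the raw series while storing only 19 of them as input, so the original series cannot be recovered by simply concatenating samples.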

adj_mx.npz is the graph adjacency matrix that indicates the spatial relation of every two regions/nodes in the studied area.
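The README does not describe how adj_mx.npz was produced. Purely to illustrate what such a matrix can look like, the sketch below assumes that regions form a height x width grid and that two regions are connected when they are among each other's 8 surrounding cells. Both the grid layout and the neighbourhood rule are assumptions, and `grid_adjacency` is a hypothetical helper.

```python
import numpy as np


def grid_adjacency(height, width):
    """Binary adjacency for a height x width grid with 8-neighbour connectivity."""
    n = height * width
    adj = np.zeros((n, n), dtype=np.float32)
    for r in range(height):
        for c in range(width):
            i = r * width + c  # row-major node index
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < height and 0 <= cc < width
                    if (dr, dc) != (0, 0) and inside:
                        adj[i, rr * width + cc] = 1.0
    return adj


adj = grid_adjacency(3, 4)
print(adj.shape, int(adj.sum(axis=1).max()))
```

The result is symmetric with a zero diagonal; corner regions have 3 neighbours and interior regions have 8.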

⚠️ Note that all datasets are processed as a sliding window view. Raw data of NYCBike1 and BJTaxi are collected from STResNet. Raw data of NYCBike2 and NYCTaxi are collected from STDN. If needed, one can download the original datasets from this link.

Model training and Evaluation

Once the environment is ready, run the following commands to train the model on one of the datasets {NYCBike1, NYCBike2, NYCTaxi, BJTaxi}.

>> cd ST-SSL
>> ./runme 0 NYCBike1   # 0 specifies the GPU id, NYCBike1 gives the dataset

Note that this repo only contains the NYCBike1 data, because including all datasets would make the repo too large.

Cite

If you find the paper useful, please cite the following:

@article{ji2023spatio, 
  title={Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction}, 
  author={Ji, Jiahao and Wang, Jingyuan and Huang, Chao and Wu, Junjie and Xu, Boren and Wu, Zhenhe and Zhang, Junbo and Zheng, Yu},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence}, 
  volume={37},
  number={4},
  pages={4356-4364},
  year={2023}
}

st-ssl's People

Contributors

echo-ji


st-ssl's Issues

Where can I download the four datasets?

Hello, where can I download the four datasets, including NYCBike1? I tried to download them from the relevant links of STResNet and STDN, but those links are not working.

File "D:\AICode\ST-SSL\model\aug.py", line 53, in aug_topology
    drop_prob = torch.softmax(sim_mx[edge_mask], dim=0)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

I encountered this problem when running the program. While debugging, I found that sim_mx is on the CPU and edge_mask is on the GPU. After I moved edge_mask to the CPU, the program ran successfully. Is this the right change to make?

My change is: edge_mask = (input_graph > 0).tril(diagonal=-1).cpu()
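A device-agnostic variant of that change is sketched below: instead of hard-coding `.cpu()`, the mask is moved to whatever device sim_mx lives on. Whether this matches the authors' intent is not confirmed here; the function name `edge_drop_probabilities` and the toy tensors are hypothetical, and only the two lines inside the function mirror the code quoted in the error message.

```python
import torch


def edge_drop_probabilities(sim_mx, input_graph):
    """Softmax over similarity scores of the strictly lower-triangular edges."""
    # Build the boolean edge mask, then move it to the device of sim_mx so
    # that the index tensor and the indexed tensor always agree.
    edge_mask = (input_graph > 0).tril(diagonal=-1).to(sim_mx.device)
    return torch.softmax(sim_mx[edge_mask], dim=0)


# Reproduce the mismatch when a GPU is available: sim_mx on the CPU,
# input_graph on the GPU. On a CPU-only machine both are on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
sim_mx = torch.rand(4, 4)
input_graph = torch.ones(4, 4, device=device)

drop_prob = edge_drop_probabilities(sim_mx, input_graph)
print(drop_prob.shape, float(drop_prob.sum()))
```

If later code expects drop_prob on the GPU, moving sim_mx to the GPU instead may be the better fix; which tensor to move depends on the rest of the pipeline.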

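About the datasets

Thank you for your excellent work. I would like to know how the raw datasets were converted into .npz files. Could you provide the corresponding conversion code? Thanks again!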

RuntimeError

File "D:\tfp_pro\ST-SSL-main\model\aug.py", line 54, in aug_topology
    drop_prob = torch.softmax(sim_mx[edge_mask], dim=0)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

I ran into this problem when running the code after cloning it locally.

Raw datasets of STSSL

You can download the raw datasets from this link.

The dataset names used in STSSL correspond to the names at the given link as follows:

STSSL       The given link
NYCBike1    NYCBike20140409
NYCBike2    NYCBike20160708
NYCTaxi     NYCTaxi20150103
BJTaxi      TAXIBJ

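A question about the datasets

Does the author have the raw dataset? I would like to reprocess the data to fit my model, but I could not find the original version online. The dataset you provide is split into 19 time slices per sample, and I would like the raw data so that I can re-split it with more time slices per sample. I see that x_offset is [[-73][-72][-71][-70][-69][-49][-48][-47][-46][-45][-25][-24][-23][-22][-21][ -3][ -2][ -1][ 0]], so simply multiplying the first dimension by the second to recover the full run of consecutive time slices does not seem reliable. How should I go about this? Thanks.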

ValueError: Cannot load file containing pickled data when allow_pickle=False

Hi, I have a question.
When I run with the taxibj dataset, I get the error 'ValueError: Cannot load file containing pickled data when allow_pickle=False' at the line 'cat_data = np.load(os.path.join(data_dir, dataset, category + '.npz'))'.
I then changed this line to 'cat_data = np.load(os.path.join(data_dir, dataset, category + '.npz'), allow_pickle=True)', but got another error: 'pickle.UnpicklingError: Failed to interpret file 'data/BJTaxi/train.npz' as a pickle'.
I also tried replacing np.load with np.loadtxt, which did not work either.

Could you tell me how to load the taxibj dataset?
Thank you!
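An .npz file is a zip archive, so one quick diagnostic is to check whether the file on disk is a valid archive before calling np.load. The sketch below assumes, without confirmation, that the error comes from an incomplete or placeholder file rather than from the loading code; the README notes that the repo itself only ships the NYCBike1 data. The helper name `check_npz` and the demo files are hypothetical.

```python
import os
import tempfile
import zipfile

import numpy as np


def check_npz(path):
    """Return a short diagnosis of whether `path` looks like a usable .npz."""
    if not os.path.isfile(path):
        return "missing"
    if not zipfile.is_zipfile(path):
        return "not a zip archive (possibly an incomplete or placeholder download)"
    with np.load(path) as data:
        return "ok: " + ", ".join(sorted(data.files))


# Demo with one valid archive and one text file posing as an .npz.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "good.npz")
np.savez(good, x=np.zeros((2, 3)), y=np.ones((2, 1)))
bad = os.path.join(tmp, "bad.npz")
with open(bad, "w") as fh:
    fh.write("this is not a real archive\n")

print(check_npz(good))
print(check_npz(bad))
```

If data/BJTaxi/train.npz is reported as not being a zip archive, downloading the dataset again from one of the links in the Datasets section is the natural next step.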

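The loss function Ls and the pooler

(1)
    l1 = - torch.mean(torch.sum(q1 * F.log_softmax(zc2 / self.tau, dim=1), dim=1))
    l2 = - torch.mean(torch.sum(q2 * F.log_softmax(zc1 / self.tau, dim=1), dim=1))

Question: according to the paper, shouldn't computing l2 alone be enough? Why does the code also compute l1? (The generated representation is of higher quality, so its clustering result is used as the label to guide the learning of the original representation.)

(2) Question: does the implementation of the pooler class differ from the paper? It does not seem quite the same.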

[Data Preprocessing] Questions about the data preprocessing procedure.

I would greatly appreciate it if you could elaborate on how to process the dataset.

In the Datasets section, it says that all datasets are processed as a sliding window view, and the format is composed of 4 numpy.ndarray objects.

Could you explain what "x, y, x_offset, y_offset" mean? Or, better yet, could you release the preprocessing code?

Thank you very much for your time and attention to my inquiries.

about adj_mx.npz

Excuse me, I would like to ask how to generate the adj_mx.npz file?

How to change dataset?

Hi, I would like to ask how to run the project with another dataset. I can successfully run it with the NYCBike1 dataset, but I don't know how to switch to BJTaxi.
