Download the preprocessed data and unzip the downloaded .zip file.
Set the PREFIX_PATH variable in my_constants.py to the path of this extracted folder.
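As a minimal sketch, the assignment in my_constants.py might look like the snippet below. The path is a placeholder, and `list_data_folders` is a hypothetical helper that is not part of the repository. It only lists the sub-folders found under the prefix, so you can confirm the path points at the extracted data before starting a run.

```python
import os

# Placeholder: replace with the folder produced by unzipping the download.
PREFIX_PATH = "/path/to/preprocessed_data/"


def list_data_folders(prefix_path):
    """Return the sorted names of the sub-folders directly under prefix_path.

    Hypothetical helper for a quick sanity check; raises FileNotFoundError
    if the path does not exist or is not a directory.
    """
    if not os.path.isdir(prefix_path):
        raise FileNotFoundError(f"PREFIX_PATH does not exist: {prefix_path}")
    return sorted(
        name
        for name in os.listdir(prefix_path)
        if os.path.isdir(os.path.join(prefix_path, name))
    )
```

Calling `list_data_folders(PREFIX_PATH)` should show the per-city folders if the path is set correctly; the exact folder names depend on the downloaded archive.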
For each city (Chengdu, Harbin, etc.), there are two types of data: trip data and map data.
The trip data is stored as a pickled Python list of tuples, where each tuple has the form (trip_id, trip, time_info). Here each trip is a list of edge identifiers.
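The tuple layout described above can be read back with the standard pickle module. The following is a minimal sketch: the function names are hypothetical, the file path is whatever pickle file your extracted folder contains, and time_info is treated as opaque because its format is not described here.

```python
import pickle


def load_trips(path):
    """Load a pickled list of (trip_id, trip, time_info) tuples."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize_trips(trips):
    """Return (number of trips, mean number of edges per trip)."""
    if not trips:
        return 0, 0.0
    # Each trip is a list of edge identifiers, so len(trip) counts its edges.
    total_edges = sum(len(trip) for _trip_id, trip, _time_info in trips)
    return len(trips), total_edges / len(trips)
```

Only unpickle files from a source you trust, since pickle can execute arbitrary code while loading.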
The map folder contains the following files:
- nodes.shp : Contains OSM node information (global node id mapped to (latitude, longitude))
- edges.shp : Contains network connectivity information (global edge id mapped to corresponding node ids)
- graph_with_haversine.pkl : Pickled NetworkX graph corresponding to the OSM data
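The file name graph_with_haversine.pkl suggests that the graph carries haversine (great-circle) distances, presumably computed from the node coordinates; that reading is an assumption and is not stated here. For reference, a minimal sketch of the haversine formula for two (latitude, longitude) pairs in degrees is shown below. The function name and the choice of kilometres are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

One degree of longitude along the equator comes out at roughly 111.19 km with this radius, which is a convenient check on any stored distances.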
The code has been tested with Python 3.7.7 and CUDA 10.2. We recommend that you use the same versions.
To create and activate a virtual environment using conda, run:
conda create -n ENV_NAME python=3.7.7
conda activate ENV_NAME
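All dependencies can be installed by running the following commands. Note that the torch-scatter, torch-sparse, torch-cluster and torch-spline-conv wheels are fetched from the index built for PyTorch 1.6.0 with CUDA 10.2: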
pip install -r requirements.txt
pip install --no-index torch-scatter -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install --no-index torch-sparse -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install --no-index torch-cluster -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install --no-index torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install torch-geometric
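After installation, it can help to confirm that the environment matches the tested versions. The snippet below is a minimal sketch with a hypothetical function name; it reports the Python version and, if PyTorch is importable, the PyTorch version, the CUDA version it was built against, and whether a GPU is visible.

```python
import sys


def environment_report():
    """Collect version information for comparison with the tested setup."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False on a machine with a GPU, the installed PyTorch build and the driver are the first things to check.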
After setting PREFIX_PATH in the my_constants.py file, the script can be run directly as follows:
python -i main.py -dataset harbin_data -gnn GCN -lipschitz
Other functionality can be enabled by adding the corresponding arguments, for example:
python -i main.py -dataset DATASET -gpu_index GPU_ID -eval_frequency EVALUATION_PERIOD_IN_EPOCHS -epochs NUM_EPOCHS
python -i main.py -traffic -attention
python -i main.py -check_script
python -i main.py -cpu
A brief description of the other arguments:
Argument | Functionality |
---|---|
-check_script | to run on a fixed subset of train_data, as a sanity test |
-cpu | forces computation on a cpu instead of the available gpu |
-gnn | chooses between a GCN and a GAT |
-gnn_layers | number of layers for the graph neural network used |
-epochs | number of epochs to train for |
-percent_data | percentage of the data used for training |
-fixed_embeddings | to make the embeddings static, i.e. not learnt as parameters of the network |
-embedding_size | the dimension of embeddings used |
-hidden_size | hidden dimension for the MLP |
-traffic | to toggle the use of traffic information (see args.py for the exact behaviour) |
-attention | to toggle the attention module |
For exact details about the expected format and possible inputs, please refer to the args.py and my_constants.py files.
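To illustrate how single-dash flags such as the ones above can be declared, here is a hypothetical argparse sketch. It is not the contents of args.py. The value types, the choice to model switches as store_true flags, and the None defaults are all assumptions, because the real defaults are not stated here.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags named in this README."""
    parser = argparse.ArgumentParser(description="Sketch of the main.py command line")
    # Arguments that take a value; defaults left as None (real defaults unknown).
    parser.add_argument("-dataset", type=str, default=None)
    parser.add_argument("-gpu_index", type=int, default=None)
    parser.add_argument("-eval_frequency", type=int, default=None)
    parser.add_argument("-epochs", type=int, default=None)
    parser.add_argument("-gnn", type=str, choices=["GCN", "GAT"], default=None)
    parser.add_argument("-gnn_layers", type=int, default=None)
    parser.add_argument("-percent_data", type=float, default=None)
    parser.add_argument("-embedding_size", type=int, default=None)
    parser.add_argument("-hidden_size", type=int, default=None)
    # Switches, assumed to be boolean flags because the examples pass no value.
    for flag in ("-lipschitz", "-traffic", "-attention",
                 "-check_script", "-cpu", "-fixed_embeddings"):
        parser.add_argument(flag, action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, `build_parser().parse_args(["-dataset", "harbin_data", "-gnn", "GCN", "-lipschitz"])` yields a namespace in which `gnn` is "GCN" and `lipschitz` is True, while unspecified switches stay False.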