HeaRT

Official code for the NeurIPS'23 paper "Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking".

Installation

Please see installation.md for instructions on installing the required dependencies.

Download Data

All data can be downloaded by running the download_data.sh script:

cd HeaRT  # Must be in the root directory
bash download_data.sh

This includes the negative samples generated by HeaRT and the splits for Cora, Citeseer, and Pubmed. The data for the OGB datasets will be downloaded automatically via the ogb package.
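
For reference, below is a minimal sketch of how an OGB link-prediction dataset is fetched through the ogb package; the root directory is illustrative and not necessarily the one used by the benchmarking scripts.

# Minimal sketch: fetch an OGB link-prediction dataset via the ogb package.
# The root path is an illustrative assumption, not the repo's actual setting.
from ogb.linkproppred import PygLinkPropPredDataset

dataset = PygLinkPropPredDataset(name="ogbl-collab", root="dataset/")
split_edge = dataset.get_edge_split()  # train/valid/test edge splits
graph = dataset[0]                     # the underlying PyG Data object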

Reproduce Results

The commands needed to reproduce all results with the appropriate hyperparameters can be found in the scripts/hyperparameter directory. We include a file for each dataset that contains the commands to train and evaluate each method.

For example, to reproduce the results on ogbl-collab under the existing evaluation setting, the command for each method can be found in the ogbl-collab.sh file located in the scripts/hyperparameter/existing_setting_ogb/ directory.

To run the code, first change to the appropriate setting directory. The options are:

  • benchmarking/exist_setting_small: Run models on Cora, Citeseer, and Pubmed under the existing setting.
  • benchmarking/exist_setting_ogb: Run models on ogbl-collab, ogbl-ppa, and ogbl-citation2 under the existing setting.
  • benchmarking/exist_setting_ddi: Run models on ogbl-ddi under the existing setting.
  • benchmarking/HeaRT_small: Run models on Cora, Citeseer, and Pubmed under HeaRT.
  • benchmarking/HeaRT_ogb: Run models on ogbl-collab, ogbl-ppa, and ogbl-citation2 under HeaRT.
  • benchmarking/HeaRT_ddi: Run models on ogbl-ddi under HeaRT.

Below we give examples of running GCN on the different groups of datasets under both settings:

Cora/Citeseer/Pubmed under the existing setting:

cd benchmarking/exist_setting_small/
python main_gnn_CoraCiteseerPubmed.py --data_name cora --gnn_model GCN --lr 0.01 --dropout 0.3 --l2 1e-4 --num_layers 1 --num_layers_predictor 3 --hidden_channels 128 --epochs 9999 --kill_cnt 10 --eval_steps 5 --batch_size 1024

ogbl-collab under the existing setting (similar for ogbl-ppa and ogbl-citation2):

cd benchmarking/exist_setting_ogb/
python main_gnn_ogb.py --use_valedges_as_input --data_name ogbl-collab --gnn_model GCN --hidden_channels 256 --lr 0.001 --dropout 0. --num_layers 3 --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --batch_size 65536

ogbl-ddi under the existing setting:

cd benchmarking/exist_setting_ddi/
python main_gnn_ddi.py --data_name ogbl-ddi --gnn_model GCN --lr 0.01 --dropout 0.5 --num_layers 3 --num_layers_predictor 3 --hidden_channels 256 --epochs 9999 --eval_steps 1 --kill_cnt 100 --batch_size 65536

Cora/Citeseer/Pubmed under HeaRT:

cd benchmarking/HeaRT_small/
python main_gnn_CoraCiteseerPubmed.py --data_name cora --gnn_model GCN --lr 0.001 --dropout 0.5 --l2 0 --num_layers 1 --hidden_channels 256 --num_layers_predictor 3 --epochs 9999 --kill_cnt 10 --eval_steps 5 --batch_size 1024

ogbl-collab under HeaRT (similar for ogbl-ppa and ogbl-citation2):

cd benchmarking/HeaRT_ogb/
python main_gnn_ogb.py --data_name ogbl-collab --use_valedges_as_input --gnn_model GCN --lr 0.001 --dropout 0.3 --num_layers 3 --hidden_channels 256 --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --eval_steps 1 --batch_size 65536

ogbl-ddi under HeaRT:

cd benchmarking/HeaRT_ddi/
python main_gnn_ddi.py --data_name ogbl-ddi --gnn_model GCN --lr 0.01 --dropout 0 --num_layers 3 --hidden_channels 256 --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --eval_steps 1 --batch_size 65536

Generate Negative Samples using HeaRT

The set of negative samples generated by HeaRT and used in the study can be reproduced via the scripts in the scripts/HeaRT/ directory.

A custom set of negative samples can be produced by running the heart_negatives/create_heart_negatives.py script. Several options exist to customize the negative samples (an example invocation is given after the list). These include:

  • The CN-based metric used. Can be either CN (Common Neighbors) or RA (Resource Allocation); the default is RA. Specified via the --cn-metric argument.
  • The aggregation function used. Can be either min or mean (default is min). Specified via the --agg argument.
  • The number of negatives generated per positive sample. Specified via the --num-samples argument (default is 500).
  • The PPR parameters. This includes the tolerance used for approximating the PPR (--eps argument) and the teleportation probability (--alpha argument). alpha is fixed at 0.15 for all datasets. For the tolerance, eps, we recommend following the settings found in scripts/HeaRT.
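
For example, a hypothetical invocation combining the arguments above might look as follows; the eps value is purely illustrative, and any additional required arguments (such as dataset selection) are omitted, so see the scripts in scripts/HeaRT/ for the exact commands used in the study:

python heart_negatives/create_heart_negatives.py --cn-metric RA --agg min --num-samples 500 --alpha 0.15 --eps 5e-5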

Updates

November 3rd, 2023

  • Modified the negative samples for ogbl-collab to allow train/valid positive samples to be used as negatives. Please see Appendix I in the paper for our rationale.

Cite

@inproceedings{li2023evaluating,
  title={Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking},
  author={Li, Juanhui and Shomer, Harry and Mao, Haitao and Zeng, Shenglai and Ma, Yao and Shah, Neil and Tang, Jiliang and Yin, Dawei},
  booktitle={Neural Information Processing Systems {NeurIPS}, Datasets and Benchmarks Track},
  year={2023}
}
