GOOD-D: On Unsupervised Graph Out-Of-Distribution Detection

This is the source code for the WSDM'23 paper "GOOD-D: On Unsupervised Graph Out-Of-Distribution Detection".

Requirements

This code requires the following:

  • Python==3.9
  • PyTorch==1.11.0
  • PyTorch Geometric==2.0.4
  • Numpy==1.21.2
  • Scikit-learn==1.0.2
  • OGB==1.3.3
  • NetworkX==2.7.1
  • FAISS-GPU==1.7.2
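
As a quick sanity check before running the scripts, the pinned versions above can be compared against the local environment. The sketch below is not part of the repository: the helper name `check_requirements` is hypothetical, and the PyPI distribution names (`torch`, `torch-geometric`, `faiss-gpu`, and so on) are assumptions about how the packages were installed.

```python
from importlib import metadata

# Pinned versions from the list above, keyed by the PyPI distribution
# name each package is assumed to be installed under.
PINNED = {
    "torch": "1.11.0",
    "torch-geometric": "2.0.4",
    "numpy": "1.21.2",
    "scikit-learn": "1.0.2",
    "ogb": "1.3.3",
    "networkx": "2.7.1",
    "faiss-gpu": "1.7.2",
}


def check_requirements(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every pin."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_requirements().items():
        # PyTorch wheels may carry a local suffix such as "+cu113".
        ok = installed is not None and installed.split("+")[0] == expected
        print(f"{name:16s} expected {expected:8s} found {installed} [{'ok' if ok else 'MISMATCH'}]")
```

A mismatch does not necessarily mean the code will fail; the list above only states the versions the authors used.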

Usage

Run the script that corresponds to the experiment and dataset you want. For example:

  • Run out-of-distribution detection with BZR as the ID dataset and COX2 as the OOD dataset:
    bash script/oodd_BZR+COX2.sh
  • Run anomaly detection on the PROTEINS_full dataset:
    bash script/ad_PROTEINS_full.sh

Statistics of the Graph-level OOD Detection Benchmark

The statistics of each dataset pair in our benchmark are as follows.

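| No. | ID dataset | ID # Graph (Train/Test) | ID # Node (avg.) | ID # Edge (avg.) | OOD dataset | OOD # Graph (Test) | OOD # Node (avg.) | OOD # Edge (avg.) |
|-----|------------|-------------------------|------------------|------------------|-------------|--------------------|-------------------|-------------------|
| 1   | BZR        | 364/41                  | 35.8             | 38.4             | COX2        | 41                 | 41.2              | 43.5              |
| 2   | PTC-MR     | 309/35                  | 14.3             | 14.7             | MUTAG       | 35                 | 17.9              | 19.8              |
| 3   | AIDS       | 1,800/200               | 15.7             | 16.2             | DHFR        | 200                | 42.4              | 44.5              |
| 4   | ENZYMES    | 540/60                  | 32.6             | 62.1             | PROTEIN     | 60                 | 39.1              | 72.8              |
| 5   | IMDB-B     | 1,350/150               | 19.8             | 96.5             | IMDB-M      | 150                | 13.0              | 65.9              |
| 6   | Tox21      | 7,047/784               | 18.6             | 19.3             | SIDER       | 784                | 33.6              | 35.4              |
| 7   | FreeSolv   | 577/65                  | 8.7              | 8.4              | ToxCast     | 65                 | 18.8              | 19.3              |
| 8   | BBBP       | 1,835/204               | 24.1             | 26.0             | BACE        | 204                | 34.1              | 36.9              |
| 9   | ClinTox    | 1,329/148               | 26.2             | 27.9             | LIPO        | 148                | 27.0              | 29.5              |
| 10  | Esol       | 1,015/113               | 13.3             | 13.7             | MUV         | 113                | 24.2              | 26.3              |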

Statistics of the Graph-level Anomaly Detection Datasets

The statistics of each dataset in the anomaly detection experiments are as follows.

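| Dataset       | # Graph (Train/Test) | # Node (avg.) | # Edge (avg.) |
|---------------|----------------------|---------------|---------------|
| PROTEINS-full | 360/223              | 39.1          | 72.8          |
| ENZYMES       | 400/120              | 32.6          | 62.1          |
| AIDS          | 1280/400             | 15.7          | 16.2          |
| DHFR          | 368/152              | 42.4          | 44.5          |
| BZR           | 69/81                | 35.8          | 38.4          |
| COX2          | 81/94                | 41.2          | 43.5          |
| DD            | 390/236              | 284.3         | 715.7         |
| NCI1          | 1646/822             | 29.8          | 32.3          |
| IMDB-B        | 400/200              | 19.8          | 96.5          |
| REDDIT-B      | 800/400              | 429.6         | 497.8         |
| COLLAB        | 1920/1000            | 74.5          | 2457.8        |
| HSE           | 423/267              | 16.9          | 17.2          |
| MMP           | 6170/238             | 17.6          | 18.0          |
| p53           | 8088/269             | 17.9          | 18.3          |
| PPAR-gamma    | 219/267              | 17.4          | 17.7          |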

Implementation Details

Hyper-parameters

For efficiency, we set both structural encoding dimensions, $d_s^{(rw)}$ and $d_s^{(dg)}$, to $16$. The encoders are 5-layer GINs with a hidden dimension of $16$. The projected embeddings have the same dimension as the node embeddings. The batch size is chosen between $16$ and $128$ according to the graph sizes in each dataset. The number of clusters $K$ and the self-adaptiveness parameter $\alpha$ are selected by grid search over $\{2, 3, 5, 10, 15, 20, 30\}$ and $\{0, 0.2, 0.4, 0.6, 0.8, 1.0\}$, respectively. The model is trained with the Adam optimizer at a learning rate of $0.0001$ until convergence.
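
The grid search over $K$ and $\alpha$ can be sketched as follows. This is a minimal illustration and not the repository's code: the configuration key names and the `evaluate` callable are hypothetical, and the selection criterion (here simply "higher score is better") is an assumption, since the text does not state which metric or split drives the search.

```python
from itertools import product

# Fixed settings quoted from the paragraph above (key names are hypothetical).
FIXED = {
    "rw_dim": 16,        # d_s^(rw)
    "dg_dim": 16,        # d_s^(dg)
    "num_layers": 5,     # GIN layers
    "hidden_dim": 16,
    "lr": 1e-4,          # Adam learning rate
}

# Search scopes for the number of clusters K and the parameter alpha.
K_VALUES = [2, 3, 5, 10, 15, 20, 30]
ALPHA_VALUES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def grid_search(evaluate, k_values=K_VALUES, alpha_values=ALPHA_VALUES):
    """Try every (K, alpha) pair and return the best config and its score.

    `evaluate` is a hypothetical callable that trains and scores one
    configuration (higher is better); it stands in for a real run.
    """
    best_config, best_score = None, float("-inf")
    for k, alpha in product(k_values, alpha_values):
        config = {**FIXED, "num_clusters": k, "alpha": alpha}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    # Dummy scorer so the sketch runs without any training code.
    def dummy(config):
        return -abs(config["num_clusters"] - 10) - abs(config["alpha"] - 0.4)

    print(grid_search(dummy))
```

With the scopes above, the search visits 7 × 6 = 42 configurations per dataset.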

Computing Infrastructure

We conduct the experiments on a Linux server with an Intel Xeon Gold 6226R CPU and two Tesla V100S GPUs. We implement our method with PyTorch 1.11.0 and PyTorch Geometric 2.0.4.

Cite

If you compare with, build on, or use aspects of this work, please cite the following:

@inproceedings{liu2023goodd,
  title={GOOD-D: On Unsupervised Graph Out-Of-Distribution Detection},
  author={Liu, Yixin and Ding, Kaize and Liu, Huan and Pan, Shirui},
  booktitle={Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining},
  year={2023}
}
