GAGA: Label Information Enhanced Fraud Detection against Low Homophily in Graphs (WWW '23)

The PyTorch implementation of GAGA.

Abstract: Node classification is a substantial problem in graph-based fraud detection. Many existing works adopt Graph Neural Networks (GNNs) to enhance fraud detectors. While promising, most current GNN-based fraud detectors fail to generalize to the low homophily setting. In addition, label utilization has been shown to be a significant factor in node classification, but we find that existing label utilization methods are less effective in fraud detection tasks because of the low homophily in graphs. In this work, we propose GAGA, a novel Group AGgregation enhanced TrAnsformer, to tackle these challenges. Specifically, group aggregation provides a portable method to cope with the low homophily issue: it explicitly integrates label information to generate distinguishable neighborhood information. Along with group aggregation, we propose an end-to-end trainable group encoding that augments the original feature space with the class labels. We also devise two additional learnable encodings to recognize the structural and relational context. We then combine group aggregation and the learnable encodings in a Transformer encoder to capture semantic information. Experimental results show that GAGA outperforms other competitive graph-based fraud detectors by up to 24.39% on two trending public datasets and a real-world industrial dataset from Baidu. Moreover, group aggregation is shown to outperform other label utilization methods (e.g., C&S, BoT/UniMP) in the low homophily setting.
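
To make the group aggregation idea from the abstract more concrete, here is a simplified, dependency-free sketch for a single hop and a single relation. The three groups (benign, fraud, unknown), the mean aggregator, and all function names are assumptions made for illustration; this is not the code used in this repository.

```python
from collections import defaultdict

# Group ids used in this sketch (hypothetical names, not taken from the repository).
BENIGN, FRAUD, UNKNOWN = 0, 1, 2


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def group_aggregate(target, neighbors, features, train_labels):
    """Return [target feature, benign mean, fraud mean, unknown mean] for one hop.

    neighbors:    node ids adjacent to `target`
    features:     dict mapping node id -> feature list
    train_labels: dict mapping node id -> 0/1, holding only visible training labels
    """
    dim = len(features[target])
    groups = defaultdict(list)
    for n in neighbors:
        # Neighbors without a visible training label fall into the UNKNOWN group.
        groups[train_labels.get(n, UNKNOWN)].append(features[n])
    grouped = [mean_vector(groups[g], dim) for g in (BENIGN, FRAUD, UNKNOWN)]
    return [features[target]] + grouped


features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 0.0], 3: [2.0, 2.0], 4: [0.0, 6.0]}
train_labels = {1: 0, 2: 1, 3: 1}  # node 4 has no visible label
seq = group_aggregate(0, [1, 2, 3, 4], features, train_labels)
```

Each target node becomes a short sequence of vectors instead of a single mixed neighborhood average, so fraud and benign neighbors stay separable even when most edges connect nodes of different classes. A sequence of this kind is what a Transformer encoder can then consume.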

Reproduction Tutorial

  1. Setup.

    • The two public datasets, YelpChi and Amazon, are downloaded automatically on the first run. Alternatively, you can download both datasets from GitHub.

    • Requirements

      conda (an open source package management tool) is recommended.

      conda create -n gaga python=3.8
      conda activate gaga
      
      conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
      conda install -c dglteam/label/cu121 dgl
      
      conda install -c conda-forge ruff
      
      pip install numpy pandas scikit-learn tensorboard matplotlib wandb
      
      conda clean --all
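
After installing, a quick check that the packages can be found in the active environment may save a failed run later. The snippet below is a minimal sketch, not part of this repository; the `REQUIRED` list and the function name are assumptions based on the install commands above.

```python
import importlib.util

# Import names differ from install names: pytorch -> torch, scikit-learn -> sklearn.
REQUIRED = ["torch", "dgl", "numpy", "pandas", "sklearn", "tensorboard", "matplotlib", "wandb"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing packages:", ", ".join(missing) if missing else "none")
```

Once `torch` imports cleanly, `torch.cuda.is_available()` reports whether the CUDA build can see a GPU.
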
  2. Prepare sequence data.

    cd preprocessing
    
    # dataset splitting
    python dataset_split.py --dataset yelp  --save_dir seq_data --train_size 0.4 --val_size 0.1
    python dataset_split.py --dataset amazon  --save_dir seq_data --train_size 0.4 --val_size 0.1
    
    # preprocess feature sequence
    python graph2seq_mp.py --dataset yelp --fanouts -1 -1  --save_dir seq_data --train_size 0.4 --val_size 0.1 --n_workers 8 --add_self_loop --norm_feat
    python graph2seq_mp.py --dataset amazon --fanouts -1 -1  --save_dir seq_data --train_size 0.4 --val_size 0.1 --n_workers 8 --add_self_loop --norm_feat
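
With `--train_size 0.4 --val_size 0.1`, 40% of the nodes go to training, 10% to validation, and the remaining 50% are presumably left for testing. The sketch below illustrates such a split in plain Python. It assumes a class-stratified split, which is a common choice for imbalanced fraud data; `dataset_split.py` may do this differently, and the function name is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_size=0.4, val_size=0.1, seed=0):
    """Split node ids into train/val/test lists while preserving class ratios.

    labels: list where labels[i] is the class of node i.
    Whatever is not assigned to train or val ends up in test.
    """
    by_class = defaultdict(list)
    for node_id, label in enumerate(labels):
        by_class[label].append(node_id)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        n_train = int(round(train_size * len(ids)))
        n_val = int(round(val_size * len(ids)))
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]
    return sorted(train), sorted(val), sorted(test)


labels = [0] * 90 + [1] * 10  # toy data: 10% fraud
train_ids, val_ids, test_ids = stratified_split(labels)
print(len(train_ids), len(val_ids), len(test_ids))  # 40 10 50
```

Splitting per class keeps the fraud ratio the same in all three parts, so validation and test metrics are computed on the same class balance as training.
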
  3. Run main_transformer.py

    python main_transformer.py --config configs/yelpchi_paper.json --gpu 0  --log_dir logs --early_stop 100
    python main_transformer.py --config configs/amazon_paper.json --gpu 0  --log_dir logs --early_stop 100
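
The `--early_stop 100` flag presumably sets an early-stopping patience: training ends once the validation metric has not improved for 100 consecutive epochs. That reading is an assumption, and the class below is a generic sketch of patience-based early stopping, not the logic in `main_transformer.py`.

```python
class EarlyStopper:
    """Stop when the validation score has not improved for `patience` epochs."""

    def __init__(self, patience=100):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, score):
        """Record one epoch; return True if training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy run with a small patience so the stop is easy to follow.
stopper = EarlyStopper(patience=3)
val_scores = [0.70, 0.74, 0.73, 0.74, 0.72, 0.71]
for epoch, score in enumerate(val_scores):
    if stopper.step(epoch, score):
        break
print(stopper.best_epoch, epoch)  # 1 4
```

In this toy run the best score is reached at epoch 1, and training stops at epoch 4 after three epochs without a strict improvement. Results would then be reported from the best checkpoint rather than the last one.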

Cite

If you use GAGA in a scientific publication, we would appreciate citations to the following paper:

@inproceedings{wang2023label,
  title={Label Information Enhanced Fraud Detection against Low Homophily in Graphs},
  author={Wang, Yuchen and Zhang, Jinghui and Huang, Zhengjie and Li, Weibin and Feng, Shikun and Ma, Ziheng and Sun, Yu and Yu, Dianhai and Dong, Fang and Jin, Jiahui and Wang, Beilun and Luo, Junzhou},
  booktitle={Proceedings of the ACM Web Conference 2023},
  year={2023}
}

Email: [email protected]

Acknowledgements

GAGA is inspired by the recent success of graph-based fraud detectors (e.g., CARE-GNN, PC-GNN, RioGNN) and of label utilization in node classification tasks (e.g., BoT, UniMP). We also thank the authors of these works for sharing their code.
