Git Product home page Git Product logo

ecnu-fedtree-challenge's Introduction

FedTree Challenge for ECNU

Introduction

This is the Federated Learning Challenge for ECNU. The challenge is based on the FedTree framework, which is a federated learning framework for decision tree.

In this challenge, we will investigate federated learning for fraud detection. Fraud detection is a very important task. The United Nations estimated that the amount of money laundered globally in one year is 2 - 5% of global GDP. In reality, transaction data is distributed in multiple organizations (e.g., multiple banks). Thus, federated learning for fraud detection is a promising direction.

In this challenge, we will give you a dataset for fraud detection. You need to partition the dataset to multiple subsets to simulate the federated setting. Then, you need to develop the training model based on FedTree. The goal is to achieve high AUPRC for fraud detection.

Setup

Step 1 - Install FedTree

Please fork the FedTree repo and install it from source according to its document (install the standalone version is enough).

Step 2 - Simulate the Federated Setting

You need to download the fraud detection dataset here and put creditcard.csv under data directory. Then, you can run python train_test_split.py to split the dataset into training dataset and test dataset.

Then, you can create the partitions of the dataset to simulate the federated setting with the help of partitions/partition.py, which has the following parameters

Parameter Description
n_parties Number of parties, default = 2.
partition The partition way. Options: homo, noniid-labeldir, noniid-#label1 (or 2, 3, ..., which means the fixed number of labels each party owns), iid-diff-quantity. Default = homo
init_seed The initial seed, default = 0.
datadir The path of the dataset, default = ./data/creditcard_train.csv.
outputdir The path of the output directory, default = ./data/partitioned_creditcard/.
beta The concentration parameter of the Dirichlet distribution for heterogeneous partition, default = 0.5.

In this challenge, you need to try the following six federated settings.

  • n_parties = 2, partition = homo
  • n_parties = 2, partition = noniid-labeldir, beta = 0.5
  • n_parties = 10, partition = noniid-labeldir, beta = 0.5
  • n_parties = 10, partition = noniid-labeldir, beta = 0.1
  • n_parties = 10, partition = noniid-#label1
  • n_parties = 10, partition = iid-diff-quantity, beta = 0.5

You can run the scripts partitions/partition[1-6].sh to generate the partitions.

Step 3

You can run FedTree with the partitioned dataset and the configuration file. An example of the configuration file is shown below.

data=./data/creditcard1/0.csv,./data/creditcard1/1.csv
test_data=./data/creditcard_test.csv
n_parties=2
num_class=2
mode=horizontal
objective=binary:logistic
data_format=csv
privacy_tech=none
model_path=fedtree.model
max_num_bin=16
learning_rate=0.1
max_depth=6
n_trees=10

Then, you can run the following command to train the model.

./build/bin/FedTree-train example.conf

The expected output is AUC = 0.85367, which is the AUROC of the test dataset.

Metric

The main focus of the challenge is to achieve high AUC for each setting. We will take the average of the AUC of the six settings as the final score. Besides AUC, we also take efficiency (e.g., the running time of your training) into consideration. You are encouraged to develop any techniques to improve the accuracy and efficiency.

Requirement

You can modify any tree-related parameters (e.g., learning_rate, n_trees) to achieve the best AUC. You can also do any feature engineering work for each partitioned dataset, as long as the data information of each party is kept locally. You may refer to FedTree documentation for a full list of parameters. Moreover, you can modify the source code of FedTree if you want to implement any new tree-based algorithms or functionalities. The goal is to achieve the best AUC for each federated setting listed in Step 2 of Setup.

Submission

You need to submit a zip file including 1) a report to briefly describe your understanding of the problem, your solution, results, and findings (<= 2 pages); 2) the configuration file for each setting; 3) link to the source code you have implemented for your solution (if any). You do not need to submit the code of FedTree if you do not modify it.

ecnu-fedtree-challenge's People

Contributors

fedtree avatar qinbinli avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.