Towards a Human-Centered Fairness Analysis

Repository of the paper "Towards a Human-Centered Fairness Analysis: From Binary to Multiclass and Multigroup Assessment in Graph Neural Network-Based User Modeling Tasks" by Erasmo Purificato, Ludovico Boratto and Ernesto William De Luca.

Abstract

User modeling is a key topic in many applications, mainly social networks and information retrieval systems. To assess the effectiveness of a user modeling approach, its capability to classify personal characteristics (e.g., the gender, age, or consumption grade of the users) is evaluated. Due to the fact that some of the attributes to predict are multiclass (e.g., age usually encompasses multiple ranges), assessing fairness in user modeling becomes a challenge since most of the related metrics work with binary attributes. As a workaround, the original multiclass attributes are usually binarized to meet standard fairness metrics definitions where both the target class and sensitive attribute (such as gender or age) are binary. However, this alters the original conditions, and fairness is evaluated on classes that differ from those used in the classification. In this article, we extend the definitions of four existing fairness metrics (related to disparate impact and disparate mistreatment) from binary to multiclass scenarios, considering different settings where either the target class or the sensitive attribute includes more than two groups. Our work endeavors to bridge the gap between formal definitions and real use cases in bias detection. The results of the experiments, conducted on four real-world datasets by leveraging two state-of-the-art graph neural network-based models for user modeling, show that the proposed generalization of fairness metrics can lead to a more effective and fine-grained comprehension of disadvantaged sensitive groups and, in some cases, to a better analysis of machine learning models originally deemed to be fair.

Requirements

The code has been executed under Python 3.9.18, with the dependencies listed below.

CatGCN

metis==0.2a5
networkx==2.6.3
numpy==1.22.0
pandas==1.3.5
scikit_learn==1.1.2
scipy==1.7.3
texttable==1.6.4
torch==1.10.1+cu113
torch_geometric==2.0.3
torch_scatter==2.0.9
tqdm==4.62.3

RHGN

dgl==0.9.1
dgl_cu113==0.7.2
hickle==4.0.4
matplotlib==3.5.1
numpy==1.22.0
pandas==1.3.5
scikit_learn==1.1.2
scipy==1.7.3
torch==1.10.1+cu113

Notes:

  • the file requirements.txt lists all dependencies for both models (install them with pip install -r requirements.txt);
  • the dependencies including cu113 target CUDA 11.3; install the build that matches your version of CUDA.
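
As a rough way to confirm that an environment matches the pinned versions, the check below compares installed packages against a few of the pins listed above. It is only a sketch: the helper name `check_pins` and the `PINS` dictionary are illustrative and not part of the repository, and only three of the listed dependencies are included.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the pins listed above (shared by CatGCN and RHGN).
PINS = {
    "numpy": "1.22.0",
    "pandas": "1.3.5",
    "torch": "1.10.1+cu113",  # CUDA 11.3 build; adjust for your CUDA version
}


def check_pins(pins):
    """Map each package name to (wanted, installed_or_None, matches)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {installed} [{status}]")
```

A mismatch on the torch entry is expected on machines with a different CUDA version, in which case the pin itself should be changed rather than the check.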

Datasets

The preprocessed files required for running each model are included as a zip file within the related folder.
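
The archives can be unpacked by hand or with a few lines of Python. The sketch below assumes that the zip files sit directly inside each model folder (CatGCN and RHGN) and that they should be extracted in place; neither the archive names nor their internal layout are stated here, so the helper `extract_archives` is hypothetical and simply unpacks whatever `*.zip` files it finds.

```python
import zipfile
from pathlib import Path


def extract_archives(model_dir):
    """Extract every *.zip found directly inside model_dir, in place.

    Returns the names of the archives that were extracted.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return []
    extracted = []
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)  # unpack next to the archive
        extracted.append(archive.name)
    return extracted


if __name__ == "__main__":
    # Folder names taken from the run commands; run from the repository root.
    for model_dir in ("CatGCN", "RHGN"):
        print(model_dir, extract_archives(model_dir))
```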

The raw datasets are available at:

Multiclass and Multigroup Fairness Metrics

The repository implements the generalised Multiclass and Multigroup Fairness Metrics presented in the paper.

Let:

  • $M$ be the number of classes;
  • $N$ be the number of demographic groups;
  • $y \in \lbrace 0, ..., M-1 \rbrace$ be the target class;
  • $\hat{y} \in \lbrace 0, ..., M-1 \rbrace$ be the predicted class;
  • $s \in \lbrace 0, ..., N-1 \rbrace$ be the sensitive attribute.

For a model to be considered fair, the score of each of the metrics displayed below should be equal across every class and group:

Multiclass and multigroup statistical parity

$$ P(\hat{y} = m | s = n), \forall m \in \lbrace 0,...,M-1 \rbrace \land \forall n \in \lbrace 0,...,N-1 \rbrace $$
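
The probability above can be estimated from predictions by simple counting. The following is a dependency-free sketch, not the repository's implementation: the function name `statistical_parity` and the toy data are illustrative, and groups with no samples are reported as NaN by assumption.

```python
import math
from collections import Counter


def statistical_parity(y_pred, s, num_classes, num_groups):
    """Return table[n][m], the empirical estimate of P(y_hat = m | s = n)."""
    group_sizes = Counter(s)          # number of samples in each group
    joint = Counter(zip(s, y_pred))   # number of samples per (group, predicted class)
    table = []
    for n in range(num_groups):
        size = group_sizes[n]
        row = [joint[(n, m)] / size if size else math.nan
               for m in range(num_classes)]
        table.append(row)
    return table


# Toy data with M = 3 classes and N = 2 groups (illustrative values only).
y_pred = [0, 1, 2, 1, 0, 0, 2, 2]
s = [0, 0, 0, 0, 1, 1, 1, 1]
sp = statistical_parity(y_pred, s, num_classes=3, num_groups=2)
print(sp)  # [[0.25, 0.5, 0.25], [0.5, 0.0, 0.5]]
```

In this toy example class 1 is predicted for half of group 0 and never for group 1, so the two groups differ on that class.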

Multiclass and multigroup equal opportunity

$$ P(\hat{y} = m | y = m, s = n), \forall m \in \lbrace 0,...,M-1 \rbrace \land \forall n \in \lbrace 0,...,N-1 \rbrace $$
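
This quantity is the per-class true positive rate computed separately for each group. A minimal counting sketch follows; the name `equal_opportunity` and the toy data are illustrative rather than taken from the repository, and a (group, class) pair with no ground-truth samples is reported as NaN by assumption.

```python
import math
from collections import Counter


def equal_opportunity(y_true, y_pred, s, num_classes, num_groups):
    """Return table[n][m], the empirical estimate of P(y_hat = m | y = m, s = n)."""
    # Samples per (group, true class).
    support = Counter(zip(s, y_true))
    # Correctly classified samples per (group, true class).
    hits = Counter((g, t) for g, t, p in zip(s, y_true, y_pred) if t == p)
    table = []
    for n in range(num_groups):
        row = []
        for m in range(num_classes):
            total = support[(n, m)]
            row.append(hits[(n, m)] / total if total else math.nan)
        table.append(row)
    return table


# Toy data with M = 2 classes and N = 2 groups (illustrative values only).
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
s = [0, 0, 0, 0, 1, 1, 1, 1]
eo = equal_opportunity(y_true, y_pred, s, num_classes=2, num_groups=2)
print(eo)  # [[0.5, 1.0], [1.0, 0.5]]
```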

Multiclass and multigroup overall accuracy equality

$$ \sum_{m=0}^{M-1} P(\hat{y} = m | y = m, s = n), \forall n \in \lbrace 0,...,N-1 \rbrace $$
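
Read literally, the expression above sums the per-class rates within each group, so each group's score lies between 0 and $M$. The sketch below follows that literal reading; the function name `overall_accuracy_equality` and the toy data are illustrative, and classes with no ground-truth samples in a group are skipped (they contribute 0), which is an assumption of this sketch.

```python
from collections import Counter


def overall_accuracy_equality(y_true, y_pred, s, num_classes, num_groups):
    """Return scores[n] = sum over m of the estimate of P(y_hat = m | y = m, s = n)."""
    support = Counter(zip(s, y_true))
    hits = Counter((g, t) for g, t, p in zip(s, y_true, y_pred) if t == p)
    scores = []
    for n in range(num_groups):
        score = 0.0
        for m in range(num_classes):
            if support[(n, m)]:  # skip classes absent from this group
                score += hits[(n, m)] / support[(n, m)]
        scores.append(score)
    return scores


# Toy data with M = 2 classes and N = 2 groups (illustrative values only).
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
oae = overall_accuracy_equality(y_true, y_pred, s, num_classes=2, num_groups=2)
print(oae)  # [1.5, 0.5]
```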

Multiclass and multigroup treatment equality

$$ \frac{P(\hat{y} = m | y \neq m, s = n)}{P(\hat{y} \neq m | y = m, s = n)}, \forall m \in \lbrace 0,...,M-1 \rbrace \land \forall n \in \lbrace 0,...,N-1 \rbrace $$
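
The ratio above divides, for class $m$ and group $n$, the rate of false positives by the rate of false negatives. The sketch below estimates both rates by counting; the name `treatment_equality` and the toy data are illustrative, and returning NaN when either conditioning set is empty or the denominator is zero is a choice made for this sketch, not something stated by the definition.

```python
import math


def treatment_equality(y_true, y_pred, s, num_classes, num_groups):
    """Return table[n][m] = P(y_hat = m | y != m, s = n) / P(y_hat != m | y = m, s = n)."""
    table = []
    for n in range(num_groups):
        # (true, predicted) pairs restricted to group n.
        pairs = [(t, p) for g, t, p in zip(s, y_true, y_pred) if g == n]
        row = []
        for m in range(num_classes):
            preds_when_not_m = [p for t, p in pairs if t != m]
            preds_when_m = [p for t, p in pairs if t == m]
            if not preds_when_not_m or not preds_when_m:
                row.append(math.nan)  # one of the conditional rates is undefined
                continue
            fp_rate = sum(p == m for p in preds_when_not_m) / len(preds_when_not_m)
            fn_rate = sum(p != m for p in preds_when_m) / len(preds_when_m)
            row.append(fp_rate / fn_rate if fn_rate > 0 else math.nan)
        table.append(row)
    return table


# Toy data with M = 2 classes and N = 2 groups (illustrative values only).
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
te = treatment_equality(y_true, y_pred, s, num_classes=2, num_groups=2)
print(te)  # [[1.0, 1.0], [2.0, 0.5]]
```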

Run the code

Example test runs for each model-dataset combination.

CatGCN - Alibaba dataset

$ cd CatGCN
$ python3 main.py --seed 11 --gpu 0 --learning-rate 0.1 --weight-decay 1e-5 \
--dropout 0.1 --diag-probe 1 --graph-refining agc --aggr-pooling mean --grn-units 64 \
--bi-interaction nfm --nfm-units none --graph-layer pna --gnn-hops 1 --gnn-units none \
--aggr-style sum --balance-ratio 0.7 --edge-path ./input_ali_data/user_edge.csv \
--field-path ./input_ali_data/user_field.npy --target-path ./input_ali_data/user_buy.csv \
--labels-path ./input_ali_data/user_labels.csv --sens-attr age --label buy

CatGCN - JD dataset

$ cd CatGCN
$ python3 main.py --seed 11 --gpu 0 --learning-rate 1e-2 --weight-decay 1e-5 \
--dropout 0.1 --diag-probe 39 --graph-refining agc --aggr-pooling mean --grn-units 64 \
--bi-interaction nfm --nfm-units none --graph-layer pna --gnn-hops 1 --gnn-units none \
--aggr-style sum --balance-ratio 0.7 --edge-path ./input_jd_data/user_edge.csv \
--field-path ./input_jd_data/user_field.npy --target-path ./input_jd_data/user_expense.csv \
--labels-path ./input_jd_data/user_labels.csv --sens-attr bin_age --label expense

CatGCN - Pokec dataset

$ cd CatGCN
$ python3 main.py --seed 11 --gpu 0 --learning-rate 1e-3 --weight-decay 1e-5 \
--dropout 0.7 --diag-probe 1 --graph-refining agc --aggr-pooling mean --grn-units 64 \
--bi-interaction nfm --nfm-units none --graph-layer pna --gnn-hops 1 --gnn-units none \
--aggr-style sum --balance-ratio 0.1 --edge-path ./input_pokec_data/edges.csv \
--field-path ./input_pokec_data/categories.npy --target-path ./input_pokec_data/user_workfield.csv \
--labels-path ./input_pokec_data/users.csv --sens-attr bin_age --label work_field

CatGCN - NBA dataset

$ cd CatGCN
$ python3 main.py --seed 3 --gpu 0 --learning-rate 0.1 --weight-decay 1e-4 \
--dropout 0.9 --diag-probe 39 --graph-refining agc --aggr-pooling mean --grn-units 64 \
--bi-interaction nfm --nfm-units none --graph-layer pna --gnn-hops 1 --gnn-units 64 \
--aggr-style sum --balance-ratio 0.7 --edge-path ./input_nba_data/edges.csv \
--field-path ./input_nba_data/points.npy --target-path ./input_nba_data/user_bin_salary.csv \
--labels-path ./input_nba_data/users.csv --sens-attr bin_age --label bin_salary

RHGN - Alibaba dataset

$ cd RHGN
$ python3 ali_main.py --seed 42 --gpu 0 --model RHGN --data_dir ./input_ali_data/ \
--graph G --max_lr 0.1 --n_hid 32 --clip 2 --n_epoch 100 \
--label bin_buy --sens_attr bin_age

RHGN - JD dataset

$ cd RHGN
$ python3 jd_main.py --seed 3 --gpu 0 --model RHGN --data_dir ./input_jd_data/ \
--graph G --max_lr 1e-3 --n_hid 64 --clip 1 --n_epoch 100 \
--label bin_exp --sens_attr bin_age

RHGN - Pokec dataset

$ cd RHGN
$ python3 pokec_main.py --seed 11 --gpu 0 --model RHGN --data_dir ./input_pokec_data/ \
--graph G --max_lr 1e-3 --n_hid 64 --clip 2 --n_epoch 100 \
--label bin_work_field --sens_attr age

RHGN - NBA dataset

$ cd RHGN
$ python3 nba_main.py --seed 11 --gpu 0 --model RHGN --data_dir ./input_nba_data/ \
--graph G --max_lr 0.1 --n_hid 32 --clip 1 --n_epoch 100 \
--label salary --sens_attr age

Contact

[email protected]
