
Don't blame me, blame my data

Identify a subset of the training data that makes a classifier unfair. In this demo, we use Demographic Parity as our fairness constraint: unless $p(\hat{Y} | A = a) = p(\hat{Y} | A = a') \;\; \forall a, a'$, we declare the classifier unfair. For simplicity, we first consider the case of a single binary protected attribute.

We evaluate the distance between these two distributions using the KL divergence: $$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_y P(y) \log\frac{P(y)}{Q(y)}$$ where $P(y) = p(\hat{Y} = y | A = 0)$ and $Q(y) = p(\hat{Y} = y | A = 1)$, and $A = 0$ indicates the protected group. We use the KL divergence as our unfairness metric: the larger the divergence between these two distributions, the more unfair our classifier is.
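In code, this metric is just a comparison of the two group-conditional prediction distributions. Below is a minimal sketch assuming binary predictions and a binary protected attribute held in NumPy arrays; the function name `demographic_parity_kl` and the smoothing constant are illustrative additions, not part of any library.

```python
import numpy as np

def demographic_parity_kl(y_pred, a, eps=1e-12):
    """D_KL between p(Y_hat | A=0) and p(Y_hat | A=1) for binary predictions.

    y_pred: array of predicted labels in {0, 1}
    a:      array of protected-attribute values in {0, 1}, where 0 is the protected group
    eps:    small constant to avoid log(0) when a group never receives one of the labels
    """
    y_pred, a = np.asarray(y_pred), np.asarray(a)
    p1 = y_pred[a == 0].mean()           # p(Y_hat = 1 | A = 0)
    q1 = y_pred[a == 1].mean()           # p(Y_hat = 1 | A = 1)
    p = np.array([1.0 - p1, p1]) + eps   # P(y)
    q = np.array([1.0 - q1, q1]) + eps   # Q(y)
    return float(np.sum(p * np.log(p / q)))
```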

We take two approaches to determining how each training example affects our fairness on a test set.

  1. Leave-one-out retraining: retrain the model with each training point removed and evaluate the change in unfairness of each retrained model.
  2. Computing the influence of each training point on the unfairness over the test set. We follow the approach of Koh and Liang, extending it to the case where we evaluate the influence of a training point on a criterion different from the loss function the model was trained on (see the sketch after this list).
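The influence-based approach can be sketched roughly as follows, assuming a model whose training-loss Hessian and per-example gradients are available (e.g. from an autodiff framework). The array names are placeholders, and for large models the explicit Hessian solve would be replaced by the Hessian-vector-product tricks described by Koh and Liang.

```python
import numpy as np

def influence_on_unfairness(grad_unfairness, hessian_loss, grads_loss_train):
    """Approximate effect of upweighting each training point on test-set unfairness.

    As in Koh & Liang (2017), I(z) = -g_f^T H^{-1} g_z, except that g_f is the
    gradient of the unfairness criterion (the KL divergence over the test set)
    with respect to the parameters, rather than the gradient of the test loss.

    grad_unfairness:  (d,)   gradient of the unfairness criterion at the trained parameters
    hessian_loss:     (d, d) Hessian of the training loss at the trained parameters
    grads_loss_train: (n, d) per-example gradients of the training loss
    """
    h_inv_g = np.linalg.solve(hessian_loss, grad_unfairness)  # solve H x = g_f once
    scores = -grads_loss_train @ h_inv_g                      # (n,) influence scores
    # A positive score means upweighting that point increases unfairness,
    # so dropping it is expected to decrease unfairness.
    return scores
```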

After evaluating the impact of each training point on our classifier's fairness, we select the subset most responsible for the unfairness exhibited on the test set. Similarly to Khanna et al., we take advantage of the submodularity of this objective and greedily insert training examples into this set, as sketched below.
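A direct (and expensive) version of the greedy construction might look like the following sketch; `eval_unfairness_without` is a hypothetical callback that retrains the model without a given set of indices and returns the test-set unfairness, and in practice the influence scores above would stand in for the inner retraining loop.

```python
def greedy_harmful_subset(eval_unfairness_without, n_train, budget):
    """Greedily build a set of training indices whose removal most reduces unfairness.

    eval_unfairness_without: callable mapping a set of training indices to the
        test-set unfairness of a model retrained without those points (hypothetical hook)
    n_train: number of training examples
    budget:  maximum size of the returned subset
    """
    selected = set()
    current = eval_unfairness_without(selected)
    for _ in range(budget):
        best_idx, best_val = None, current
        for i in range(n_train):
            if i in selected:
                continue
            val = eval_unfairness_without(selected | {i})
            if val < best_val:
                best_idx, best_val = i, val
        if best_idx is None:   # no remaining point improves fairness
            break
        selected.add(best_idx)
        current = best_val
    return selected
```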

Results

| Method | Unfairness | Accuracy |
| --- | --- | --- |
| Original | 0.0553 | 0.8107 |
| Dropping random non-protected and positive class | 0.0002 | 0.7697 |
| Balancing distribution of positive and negative across protected class | 0.0058 | 0.7915 |
| Retrain, drop top 100 | ? | ? |
| Influence, drop all with harmful influence | 0.0001 | 0.7698 |

![fish_pca_hurtful](./fish_pca_hurtful.png)

| Is Male | >=50k | Fraction of dropped examples |
| --- | --- | --- |
| X | X | 0.66 |
| X |   | 0.20 |
|   | X | 0.12 |
|   |   | 0.02 |

Further inspection of the examples dropped by the influence-function method reveals that, afterwards, the distribution of features in the protected class better matches the distribution in the non-protected class. For example, before dropping any training examples, among training points in the positive class (earning more than \$50k per year), the mean hours per week worked by women in the training set is 36, while for men it is 42. After dropping the training examples deemed harmful by the influence-function method, the mean value for women increases to 43 hours. This is consistent with the intuition that, in order to achieve demographic parity, the protected attribute should not be correlated with the prediction.
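A quick way to reproduce this kind of check is to compare group-conditional feature means before and after dropping. The column names below ('sex', 'hours_per_week', 'label') follow the usual Adult census naming and are assumptions about the data frame layout, not exact names from this repo.

```python
import pandas as pd

def mean_hours_by_sex(df: pd.DataFrame, dropped_idx=None) -> pd.Series:
    """Mean hours-per-week among positive-class (> $50k) examples, split by sex."""
    if dropped_idx is not None:
        df = df.drop(index=dropped_idx)    # remove examples flagged as harmful
    positive = df[df["label"] == 1]        # earning more than $50k per year
    return positive.groupby("sex")["hours_per_week"].mean()

# before = mean_hours_by_sex(train_df)
# after  = mean_hours_by_sex(train_df, dropped_idx=harmful_indices)
```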
