Recourse Explanation Library in JAX

Home Page: https://birkhoffg.github.io/ReLax/

License: Apache License 2.0

relax's Introduction

ReLax


Overview | Installation | Tutorials | Documentation | Citing ReLax

Important

📣 This repository has migrated to a new location: https://github.com/BirkhoffG/jax-relax.

Overview

ReLax (Recourse Explanation Library in Jax) is a library built on top of jax to generate counterfactual and recourse explanations for Machine Learning algorithms. By leveraging vectorization through vmap/pmap and just-in-time compilation in jax (a high-performance auto-differentiation library), ReLax offers massive speed improvements in generating individual (or local) explanations for predictions made by Machine Learning algorithms.
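As a toy illustration of this vmap/jit pattern (not ReLax's actual API; the `recourse_cost` function below is made up for the example):

```python
import jax
import jax.numpy as jnp

# A toy "recourse objective": squared distance between an input and its counterfactual.
def recourse_cost(x, cf):
    return jnp.sum((x - cf) ** 2)

# jax.vmap vectorizes the per-instance function over a batch of inputs,
# and jax.jit compiles the batched version once for fast repeated calls.
batched_cost = jax.jit(jax.vmap(recourse_cost))

xs = jnp.array([[0.0, 0.0], [1.0, 1.0]])
cfs = jnp.array([[1.0, 0.0], [1.0, 3.0]])
costs = batched_cost(xs, cfs)  # per-instance costs, computed in one batched call
```

The same pattern extends to whole recourse-generation loops: writing the per-instance logic once and batching it with vmap is what gives ReLax its speedups over Python-level loops.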

Some of the key features are as follows:

  • ๐Ÿƒ Fast recourse generation via jax.jit, jax.vmap/jax.pmap.

  • ๐Ÿš€ Accelerated over cpu, gpu, tpu.

  • ๐Ÿช“ Comprehensive set of recourse methods implemented for benchmarking.

  • ๐Ÿ‘ Customizable API to enable the building of entire modeling

  • and interpretation pipelines for new recourse algorithms.

Installation

The latest ReLax release can be installed directly from PyPI:

pip install jax-relax

or installed directly from the repository:

pip install git+https://github.com/BirkhoffG/ReLax.git 

To further unleash the power of accelerators (i.e., GPU/TPU), we suggest first installing this library via pip install jax-relax. Then, follow the steps in the official install guidelines to install the right jax version for GPU or TPU.

An Example of using ReLax

See Getting Started with ReLax.

Citing ReLax

To cite this repository:

@software{relax2023github,
  author = {Hangzhi Guo and Xinchang Xiong and Amulya Yadav},
  title = {{R}e{L}ax: Recourse Explanation Library in Jax},
  url = {http://github.com/birkhoffg/ReLax},
  version = {0.1.0},
  year = {2023},
}

relax's People

Contributors: birkhoffg, grahams-uncle

relax's Issues

Support hyper-parameter searching for CF explanation methods

Supporting hyper-parameter search enables us to properly benchmark the algorithms. This issue is a thread for discussing how to support hyper-parameter search in CF explanation methods.

In essence, this is a multi-objective optimization problem (i.e., minimizing both invalidity and cost).
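One simple way to make the two objectives searchable is to scalarize them. A hypothetical sketch (the weighting scheme and the candidate numbers are illustrative, not ReLax API):

```python
# Score a hyper-parameter configuration by a weighted sum of the two
# objectives mentioned above: invalidity and recourse cost. Lower is better.
def score_config(invalidity: float, cost: float, w: float = 0.5) -> float:
    """w trades off invalidity against cost; w=1 ignores cost entirely."""
    return w * invalidity + (1.0 - w) * cost

# Made-up (invalidity, cost) results for two candidate configurations.
candidates = {
    "lr=0.1": (0.10, 0.8),
    "lr=0.01": (0.05, 1.2),
}
best = min(candidates, key=lambda k: score_config(*candidates[k]))
```

A scalarized score like this can be plugged directly into any single-objective hyper-parameter search library; a Pareto-front approach would be the alternative if we want to avoid committing to a fixed weight.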

Some open-source libraries for hyper-parameter search:

Support aux arguments of `pred_fn` to be passed to `generate_cf_explanations`

Currently, we assume pred_fn is a function of only one input x. E.g., it is something like:

pred_fn = lambda x: 2 * x + 1

However, it is possible that user-defined pred_fn takes other arguments.

Hence, I propose:

def generate_cf_explanations(
    cf_module: BaseCFModule,
    datamodule: TabularDataModule,
    pred_fn: Callable[[jnp.DeviceArray], jnp.DeviceArray] = None,
    *,
    t_configs=None,
    pred_fn_args: dict=None
)

where inside, we call pred_fn as

pred_fn(x, **pred_fn_args)

This offers additional flexibility for models that are not implemented using our framework.
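A minimal sketch of the proposed calling convention (the names `pred_fn` and `pred_fn_args` come from the proposal above; the `call_pred_fn` helper and the `temperature` argument are made up for illustration):

```python
# A user-defined pred_fn that takes an extra keyword argument beyond x.
def pred_fn(x, temperature=1.0):
    return (2 * x + 1) / temperature

def call_pred_fn(pred_fn, x, pred_fn_args=None):
    """Forward any user-supplied keyword arguments through to pred_fn."""
    pred_fn_args = pred_fn_args or {}
    return pred_fn(x, **pred_fn_args)

assert call_pred_fn(pred_fn, 3) == 7.0                         # defaults only
assert call_pred_fn(pred_fn, 3, {"temperature": 7.0}) == 1.0   # extra args forwarded
```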

CI/CD takes too long

CI seems to run some unnecessary tests (e.g., training models) during testing.

Get rid of the Pytorch dependencies in `TabularDataModule`

Pytorch is only needed for loading data. Our library mainly handles tabular data, so data loading is unlikely to be a bottleneck in most scenarios; the Pytorch DataLoader is overkill for our project in most use cases.

Purpose

Write a drop-in NumpyLoader.

ToDo

Delete the Pytorch Dependency

https://github.com/BirkhoffG/cfnet/blob/24783713dc787cd5b13e70aa483e455c4856198f/settings.ini#L18

https://github.com/BirkhoffG/cfnet/blob/24783713dc787cd5b13e70aa483e455c4856198f/cfnet/datasets.py#L9

Next, modify the following code so that these classes no longer inherit from the Pytorch Dataset and DataLoader:

https://github.com/BirkhoffG/cfnet/blob/24783713dc787cd5b13e70aa483e455c4856198f/cfnet/datasets.py#L12-L22

https://github.com/BirkhoffG/cfnet/blob/24783713dc787cd5b13e70aa483e455c4856198f/cfnet/datasets.py#L35-L51

Expected Functionalities

NumpyDataset should contain all the input data.

# x, y are jax.numpy arrays, such that len(x) == len(y)
dataset = NumpyDataset(x, y)

x, y = dataset[:]       # access all of the data in x, y
x_5, y_5 = dataset[:5]  # access the first five elements of x, y

NumpyLoader iterates over the NumpyDataset in mini-batches. See the Pytorch Docs.

batch_size = 128
dataloader = NumpyLoader(
    dataset,  # a `NumpyDataset`
    batch_size=batch_size,
    shuffle=True,    # if True, shuffle the data; else, return the data in order
    drop_last=False  # if True, discard the last incomplete batch (when len(dataset) % batch_size != 0); else, return it as a smaller batch
)

for x, y in dataloader:
    assert len(x) <= batch_size  # with drop_last=False, the last batch may be smaller
    assert len(x) == len(y)
    ...

Refactor util functions

Move

  • binary_cross_entropy in cfnet.methods.vanilla
  • grad_update, cat_normalize in cfnet.training_module

into cfnet.utils

Pass `seed` and `batch_size` to the dataloader functions in `TabularDataModule`

  1. Pass seed and batch_size to TabularDataModule.train_dataloader, TabularDataModule.val_dataloader, and TabularDataModule.test_dataloader.

  2. batch_size should also be an argument in TrainingConfigs
    https://github.com/BirkhoffG/cfnet/blob/2ee1a3203a9935e89b2ed8adf175ee1150fd9960/cfnet/train.py#L15

  3. Deprecate batch_size in DataConfigs

  4. Finally, pass appropriate arguments:
    https://github.com/BirkhoffG/cfnet/blob/2ee1a3203a9935e89b2ed8adf175ee1150fd9960/cfnet/train.py#L58-L59

Customize DataModule-dependent constraints

We use cat_normalize for encoding features, and clip continuous features to [0, 1]. This is because we use one-hot encoding for categorical features and a min-max scaler for continuous features.

If a user wants to use other encoding methods (e.g., standardized continuous features), our current way of handling normalized data is not applicable.
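To illustrate why the constraint must depend on the DataModule's encoding (a hypothetical sketch; the `project` helper is not ReLax API):

```python
import numpy as np

# The valid range of a continuous feature depends on how it was scaled.
# Clipping to [0, 1] is only correct for min-max scaling; a standardized
# (zero-mean, unit-variance) feature is unbounded, so a hard-coded clip
# would silently distort valid counterfactual values.
def project(cf, scaling="minmax"):
    if scaling == "minmax":
        return np.clip(cf, 0.0, 1.0)
    return cf  # e.g. standardized features: no fixed range to clip to

cf = np.array([-0.3, 0.5, 1.7])
minmax_cf = project(cf)               # clipped into [0, 1]
standard_cf = project(cf, "standard") # left unchanged
```

This suggests the constraint/projection step should be supplied by (or derived from) the DataModule rather than hard-coded in the recourse methods.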

Proposed features:

Provide Default Data Configs for `TabularDataModule`

Proposal

data_module = TabularDataModule('adult')

As such, TabularDataModule will automatically load data_configs of the adult dataset.

We should also allow TabularDataModule to pass user-defined configs (i.e., current argument data_configs: str | dict).
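A hypothetical dispatch sketch for this proposal (the registry contents are placeholders, not ReLax's actual configs):

```python
# Built-in default configs keyed by dataset name. The column names here are
# illustrative placeholders for the adult dataset.
DEFAULT_DATA_CONFIGS = {
    "adult": {"data_name": "adult", "continuous_cols": ["age", "hours_per_week"]},
}

def resolve_data_configs(data_configs):
    """Accept either a known dataset name (str) or a user-defined config dict."""
    if isinstance(data_configs, str):
        return DEFAULT_DATA_CONFIGS[data_configs]
    return data_configs  # user-defined dict passed through unchanged
```

TabularDataModule.__init__ could then call a resolver like this on its data_configs argument, keeping both the convenient string form and the current dict form working.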
