Dynamask - Explaining Time Series Predictions with Dynamic Masks

Code Author: Jonathan Crabbé ([email protected])

This repository contains the implementation of Dynamask, a method to identify the features that are salient for a model to issue its prediction when the data is represented as time series. For more details on the theoretical side, please read our ICML 2021 paper: 'Explaining Time Series Predictions with Dynamic Masks'.

Some of the experiments in our paper rely on FIT, another repository associated with the NeurIPS 2020 paper 'What went wrong and when? Instance-wise feature importance for time-series black-box models'. We have included all the relevant files in the folder fit.

Installation

To install the relevant packages from shell:

  1. Clone the repository
  2. Create a new virtual environment with Python 3.8
  3. Run the following command from the repository folder:
    pip install -r requirements.txt #install requirements

Once the packages are installed, Dynamask can be used directly.
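
To check that everything is set up correctly, you can run a quick import check from the repository folder (it only uses modules that appear in the examples below):

import torch
from attribution.mask import Mask
from attribution.perturbation import GaussianBlur
from utils.losses import mse

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")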

Toy example

It is very easy to fit a mask on a time series model. Below, you can find a toy demonstration where we fit a mask to an input time series. In this case, the mask area is fixed to 0.1 (the 10% most important features are highlighted by the mask). All the relevant code can be found in the file attribution/mask.py.

import torch
from attribution.mask import Mask
from attribution.perturbation import GaussianBlur
from utils.losses import mse

torch.manual_seed(42)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define a pseudo-black box:
def black_box(input):
    output = input[-1, :]  # The black-box returns the features of the last time step
    return output

# Define a random input:
X = torch.randn(10, 3).to(device) # The shape of the input has to be (T, N_features)

# Fit a mask to the input with a Gaussian Blur perturbation:
pert = GaussianBlur(device)
mask = Mask(pert, device)
mask.fit(X, black_box, loss_function=mse, keep_ratio=0.1, size_reg_factor_init=0.01) # Select the 10% most important features

# Plot the resulting saliency map:
mask.plot_mask()
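
For intuition, here is a minimal sketch of a mask-modulated Gaussian blur in the spirit of the paper's perturbation operator (an illustration, not the library code; the kernel form, the normalization and the sigma_max default are assumptions). Each entry is replaced by a Gaussian average over time whose width shrinks as the mask value grows, so entries with a mask value close to 1 are left almost unperturbed:

import torch

def gaussian_blur_perturbation(X, M, sigma_max=2.0):
    # X: input of shape (T, N_features); M: mask of the same shape with entries in [0, 1].
    T = X.shape[0]
    t = torch.arange(T, dtype=X.dtype, device=X.device).unsqueeze(1)        # (T, 1)
    t_prime = torch.arange(T, dtype=X.dtype, device=X.device).unsqueeze(0)  # (1, T)
    sigma = sigma_max * (1 - M) + 1e-8  # (T, N): salient entries get a narrow kernel
    # One Gaussian window per (time step, feature) pair, shape (T, T, N):
    K = torch.exp(-((t - t_prime) ** 2).unsqueeze(-1) / (2 * sigma.unsqueeze(1) ** 2))
    # Normalized Gaussian average over the time axis:
    return (K * X.unsqueeze(0)).sum(dim=1) / K.sum(dim=1)

With M[t, i] = 1 the kernel collapses onto the current time step and the entry is preserved; with M[t, i] = 0 it is maximally blurred.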

If the proportion of features to select is unknown, a good approach is to fit a group of masks with different areas. Then, the extremal mask can be extracted from the group. The relevant code can be found in the file attribution/mask_group.py.

import torch
from attribution.mask_group import MaskGroup
from attribution.perturbation import GaussianBlur
from utils.losses import mse

torch.manual_seed(42)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define a pseudo-black box:
def black_box(input):
    output = input[-1, :]  # The black-box returns the features of the last time step
    return output

# Define a random input:
X = torch.randn(10, 3).to(device) # The shape of the input has to be (T, N_features)

# Fit a group of masks to the input with a Gaussian Blur perturbation:
areas = [.1, .15, .2, .25] # These are the areas of the different masks
pert = GaussianBlur(device)
masks = MaskGroup(pert, device)
masks.fit(X, black_box, loss_function=mse, area_list=areas, size_reg_factor_init=0.01)

# Extract the extremal mask:
epsilon = 0.01
mask = masks.get_extremal_mask(threshold=epsilon)

# Plot the resulting saliency map:
mask.plot_mask()
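
The selection rule behind get_extremal_mask can be sketched as follows (an illustration of the idea, not the library implementation): among the fitted masks, keep the one with the smallest area whose error remains below the threshold epsilon.

def select_extremal_area(areas, errors, threshold):
    # areas: mask areas in increasing order; errors: the corresponding losses between
    # the black-box outputs on the perturbed and unperturbed inputs.
    admissible = [a for a, e in zip(areas, errors) if e < threshold]
    # If no mask meets the threshold, fall back to the largest area (lowest error).
    return min(admissible) if admissible else areas[-1]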

Replicate experiments

All the experiments in the ICML paper can be replicated easily. The necessary code is in the experiments folder. Below, we detail the procedure for each experiment.

Replicate the Rare experiments

  1. Run the following commands from the repository folder:
    python -m experiments.rare_feature # Runs the Rare Feature experiment
    python -m experiments.rare_time # Runs the Rare Time experiment
    To run the experiment with different seeds, add the following option to these commands:
    Options:
    --cv # An integer that sets the random seed (first run cv=0, second run cv=1, ...)
  2. The results of these experiments are saved in the folders experiments/results/rare_feature and experiments/results/rare_time. To process the results and compute the associated metrics, run:
    python -m experiments.results.rare_feature.get_results
    python -m experiments.results.rare_time.get_results
    The following options need to be specified:
    Options:
    --CV # The number of runs you have done for the experiment
    --explainers # The baselines you have used among: dynamask, fo, fp, ig, shap (separated by a space)
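
For example, a complete run with three seeds could look like this (an illustrative sequence assembled from the options documented above):
    python -m experiments.rare_feature --cv 0
    python -m experiments.rare_feature --cv 1
    python -m experiments.rare_feature --cv 2
    python -m experiments.results.rare_feature.get_results --CV 3 --explainers dynamask fo fp ig shap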

Replicate the State experiment

  1. Run this command to generate the synthetic data and store it in data/state:

    python -m fit.data_generator.state_data --signal_len 200 --signal_num 1000
  2. Run the following command to fit a model together with a baseline saliency method:

    python -m fit.evaluation.baselines --explainer fit --train   

    To run the experiment with other baselines, change the explainer:

    Options:
    --explainer # The baselines can be: fit, lime, retain, integrated_gradient, deep_lift, fo, afo, gradient_shap
    --train # Only put this option when fitting the FIRST baseline (this is to avoid retraining a model for each baseline)
    --cv # An integer that sets the random seed (first run cv=0, second run cv=1, ...)
  3. The models and baseline saliency maps are all saved in this folder. Now fit a mask for each of these time series by running:

    python -m experiments.state

    Please use the same --cv option as for the previous command.

  4. The masks are all saved in this folder. To process the results and compute the associated metrics, run:

    python -m experiments.results.state.get_results

    The following options need to be specified:

    Options:
    --CV # The number of runs you have done for the experiment
    --explainers # The baselines you have used among: dynamask, fo, afo, deep_lift, fit, gradient_shap, integrated_gradient, lime, retain (separated by a space)
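
For example, a single-seed run with two baselines could look like this (an illustrative sequence assembled from the commands documented above):
    python -m fit.data_generator.state_data --signal_len 200 --signal_num 1000
    python -m fit.evaluation.baselines --explainer fit --train --cv 0
    python -m fit.evaluation.baselines --explainer deep_lift --cv 0
    python -m experiments.state --cv 0
    python -m experiments.results.state.get_results --CV 1 --explainers dynamask fit deep_lift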

Replicate the MIMIC experiment

  1. MIMIC-III is a private dataset. For the following, you need to have the MIMIC-III database running on a local server. For more information, please refer to the official MIMIC-III documentation.

  2. Run this command to acquire the data and store it:

    python fit/data_generator/icu_mortality.py --sqluser YOUR_USER --sqlpass YOUR_PASSWORD

    If everything goes well, two files named adult_icu_vital.gz and adult_icu_lab.gz are stored in data/mimic.

  3. Run this command to preprocess the data:

    python fit/data_generator/data_preprocess.py

    If everything goes well, a file named patient_vital_preprocessed.pkl is stored in data/mimic.

  4. Run the following command to fit a model together with a baseline saliency method:

    python -m fit.evaluation.baselines --data mimic --explainer fit --train   

    To run the experiment with other baselines, change the explainer:

    Options:
    --explainer # The baselines can be: fit, lime, retain, integrated_gradient, deep_lift, fo, afo, gradient_shap
    --train # Only put this option when fitting the FIRST baseline (this is to avoid retraining a model for each baseline)
    --cv # An integer that sets the random seed (first run cv=0, second run cv=1, ...)
  5. The models and baseline saliency maps are all saved in this folder. Now fit a mask for each of these time series by running:

    python -m experiments.mimic

    Please use the same --cv option as for the previous command.

    Options:
    --cv # Same as in the previous command
    --area # The area of the mask to fit (a number between 0 and 1)
  6. The masks are all saved in this folder. To process the results and compute the associated metrics, run:

    python -m experiments.results.mimic.plot_benchmarks

    The following options need to be specified:

    Options:
    --CV # The number of runs you have done for the experiment
    --explainers # The baselines you have used among: dynamask, fo, afo, deep_lift, fit, gradient_shap, integrated_gradient, lime, retain (separated by a space)
    --areas # The mask areas that you have computed (separated by a space)

    The resulting plots are saved in this folder.
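
For example, after the data acquisition and preprocessing steps above, a single-seed run with one mask area could look like this (an illustrative sequence assembled from the commands documented above):
    python -m fit.evaluation.baselines --data mimic --explainer fit --train --cv 0
    python -m experiments.mimic --cv 0 --area 0.1
    python -m experiments.results.mimic.plot_benchmarks --CV 1 --explainers dynamask fit --areas 0.1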

Citing

If you use this code, please cite the associated paper:

@inproceedings{Crabbe2021Dynamask,
  title={Explaining Time Series Predictions with Dynamic Masks},
  author={Crabbé, Jonathan and van der Schaar, Mihaela},
  year={2021},
  booktitle={Proceedings of the 38th International Conference on Machine Learning (ICML 2021)},
  organization={PMLR}
}
