understandable-machine-intelligence-lab / quantus

Quantus is an eXplainable AI toolkit for responsible evaluation of neural network explanations

Home Page: https://quantus.readthedocs.io/

License: Other

Python 10.44% Jupyter Notebook 89.56%
deep-learning explainable-ai interpretability machine-learning pytorch quantification-evaluation-methods reproducibility tensorflow xai

quantus's People

Contributors

3-nan, aaarrti, abarbosa94, annahedstroem, annariasdu, artem-sereda, dilyabareeva, dkrako, ferranpares, leanderweber, p-wojciechowski, rodrigobdz, sebastian-lapuschkin, sltzgs, vedal, wickstrom


quantus's Issues

Ability to Specify Seed for RandomLogit

The current implementation of RandomLogit uses random to select target labels, but it doesn't seem to provide a way to control the random process.

I just wonder whether it would be a good idea to provide another argument (e.g., seed) for that.

The same idea might be applicable for ModelParameterRandomisation.
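A minimal sketch of how such an argument could be threaded through (the parameter name and placement are hypothetical, not the current Quantus API):

import numpy as np

class RandomLogit:
    def __init__(self, num_classes: int, seed: int = 42, **kwargs):
        self.num_classes = num_classes
        # A dedicated generator keeps the off-label sampling reproducible
        # without touching the global random state.
        self.rng = np.random.default_rng(seed)

    def sample_off_label(self, y: int) -> int:
        # Draw a random target label that differs from the true label y.
        candidates = np.setdiff1d(np.arange(self.num_classes), y)
        return int(self.rng.choice(candidates))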

Default normalisation function

For a lot of quantification metrics (I only checked faithfulness_metrics.py for this), the default seems to be

self.normalise = self.kwargs.get("normalise", True)
self.normalise_func = self.kwargs.get("normalise_func", normalise_by_negative)

with normalise_by_negative being defined in normalise_func.py as

def normalise_by_negative(a: np.ndarray) -> np.ndarray:
    """Normalise relevance given a relevance matrix (r) [-1, 1]."""
    if a.min() >= 0.0:
        return a / a.max()
    if a.max() <= 0.0:
        return -a / a.min()
    return (a > 0.0) * a / a.max() - (a < 0.0) * a / a.min()

I think this type of normalisation (as default) may lead to some unexpected/unintended behavior for many metrics (RegionPerturbation, for instance), since it normalises positive and negative parts of an attribution map by different values, and then changes the sign of the negative parts (thus basically taking the abs()). I believe this changes not only the ordering of attribution values (as abs() would), but also their relative magnitudes.

For this reason, a better default may be either self.normalise = self.kwargs.get("normalise", False) or self.normalise_func = self.kwargs.get("normalise_func", normalise_by_max)?
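To make the relative-magnitude point concrete, here is a quick check assuming the normalise_by_negative definition quoted above:

import numpy as np

a = np.array([4.0, -1.0])
# The positive part is scaled by 1/a.max() = 1/4, the negative part by 1/|a.min()| = 1/1,
# so the 4:1 ratio between |4.0| and |-1.0| collapses to 1:1 after normalisation.
normalised = (a > 0.0) * a / a.max() - (a < 0.0) * a / a.min()
print(normalised)  # [ 1. -1.]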

Suggestion on Readability of README

While going through the README, I noticed that the section describing the implemented metrics is very long. Perhaps we could use <details></details> tags to hide the content.

More precisely, we could do it like:

(below is snippet from README with the suggested approach)

...
The library contains implementations of the following evaluation metrics:

<details>
<summary>Faithfulness: ...</summary>

  • Paper 1
  • Paper 2
</details>

<details>
<summary>Robustness: ...</summary>

  • Paper 1
  • Paper 2
</details>

taking absolute before normalisation makes normalise_by_negative defective

I'm not sure if this is intended, but all metrics first take the absolute of the attributions, then apply normalisation.

But if we want to normalise the positive and the negative sides differently, like it's done in normalise_by_negative, it would be more reasonable to normalise first and then take the absolute value.

Am I missing something, or is this a bug?
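To illustrate the interaction, a small check assuming the normalise_by_negative definition quoted in the issue above:

import numpy as np

a = np.array([4.0, -2.0])

# abs() first: the array becomes all-positive, so normalise_by_negative only ever hits its
# first branch (a / a.max()); the separate scaling of the negative side is effectively dead code.
abs_then_norm = np.abs(a) / np.abs(a).max()  # -> [1. , 0.5]

# normalise first, abs() afterwards: positive and negative parts are scaled by
# different values before the sign is dropped.
norm = (a > 0.0) * a / a.max() - (a < 0.0) * a / a.min()
norm_then_abs = np.abs(norm)  # -> [1. , 1. ]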

Loosen restrictions on number of steps for Pixel-Flipping

Pixel-Flipping has a check called assert_max_steps that relates to the maximum number of steps passed via the max_steps_per_input argument. The check seems too restrictive, and I am not sure about its exact purpose or meaning; perhaps it is actually meant for Region Perturbation.

Instead, a natural check for the max_steps_per_input parameter would be an upper bound: it should not exceed the total number of pixels in the input, so that the same pixels are not flipped more than once.

asserts.assert_max_steps(
    max_steps_per_input=self.max_steps_per_input,
    input_shape=x_batch_s.shape[2:],
)

def assert_max_steps(max_steps_per_input: int, input_shape: Tuple[int, ...]) -> None:
    """Assert that max steps per inputs is compatible with the image size."""
    assert np.prod(input_shape) % max_steps_per_input == 0, (
        "Set 'max_steps_per_input' so that the modulo remainder "
        "returns zero given the product of the input shape."
    )
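A minimal sketch of the upper-bound check suggested above (the function name is hypothetical, not a proposed final API):

import numpy as np
from typing import Tuple

def assert_max_steps_upper_bound(max_steps_per_input: int, input_shape: Tuple[int, ...]) -> None:
    """Check that the number of steps does not exceed the number of input features."""
    n_features = int(np.prod(input_shape))
    assert 0 < max_steps_per_input <= n_features, (
        f"'max_steps_per_input' must lie in [1, {n_features}] for input shape {input_shape}."
    )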

Potentially dead code

Hi Anna, please check this potentially dead/erroneous code. I think we might need to remove it.

@property
def aggregated_score(self):
    """
    Implements a continuity correlation score (an addition to the original method) to evaluate the
    relationship between change in explanation and change in function output. It can be seen as a
    quantitative interpretation of visually determining how similar f(x) and R(x1) curves are.
    """
    return np.mean(
        [
            self.similarity_func(
                self.last_results[sample][self.nr_patches],
                self.last_results[sample][ix_patch],
            )
            for ix_patch in range(self.nr_patches)
            for sample in self.last_results.keys()
        ]
    )

elif isinstance(y_batch, (float, int)):
    y_batch_off = np.array(
        [
            random.choice(
                [
                    y
                    for y in list(np.arange(0, self.num_classes))
                    if y != y_batch
                ]
            )
        ]
    )

Duplicated random and uniform options for perturb_baseline

Subject: Random number generation

The random and uniform values for the perturb_baseline parameter in the metrics are semantically overlapping. Both draw random baseline values, but random is more restrictive regarding its bounds: it always samples from [0, 1), whereas uniform samples between the minimum and maximum of the array.

Giving the user the choice between random and uniform can be confusing; it would be preferable to only offer uniform.

"random": float(random.random()),
"uniform": float(random.uniform(arr.min(), arr.max())),


Bug: Performance bottleneck in Pixel-Flipping algorithm

The Pixel-Flipping metric might have significant room for improvement regarding its performance because it is taking longer than expected for a batch of only one element. I'm not sure what the root cause of this issue might be.

Expected behavior

Calculation of Pixel-Flipping metric using Quantus for 100 steps should finish within 2 minutes of execution, which is roughly what my own implementation of Pixel-Flipping (see demo.ipynb) takes.

Current behavior

Calculation had run for 45 minutes (still wasn't finished) when I interrupted the execution.

Reproduction steps

Prerequisites

  • Download and unzip the pr-attachments.zip, which contains the tensors X and R—as .pt files—used in the Minimal Working Example (MWE) below.

  • Load tensors X and R using torch.load

MWE

import quantus
import torchvision
import numpy
import torch
from typing import Union, Dict

# Init required arguments (X and R are the tensors loaded beforehand via torch.load, see Prerequisites)
input: torch.Tensor = X.clone().detach()
x_batch: numpy.ndarray = input.numpy()
y_batch: numpy.ndarray = numpy.array([483])
a_batch: numpy.ndarray = R.clone().detach().numpy()
model = torchvision.models.vgg16(pretrained=True)
model.eval()

# Init metric
metric_params: Dict[str, Union[str,bool]] = {
  'perturb_baseline': 'uniform',
  'disable_warnings': True,
  "display_progressbar": True,
  "max_steps_per_input": 98,
}
metric: quantus.Metric = quantus.PixelFlipping(abs=True, normalise=False, **metric_params)

# Run Pixel-Flipping algorithm
call_params: Dict[str, bool] = {
  'channel_first': True,
}
scores = metric(model=model, x_batch=x_batch, y_batch=y_batch, a_batch=a_batch, **call_params)

Details

Both X and R (relevance scores) are in NCHW format and have shape torch.Size([1, 3, 224, 224]).

Meaning of NCHW format:
- N: number of images in the batch
- C: number of channels of the image (3 for RGB, 1 for grayscale)
- H: height of the image
- W: width of the image

Unify Requirements of Perturb Functions and their Usage in the Metrics

The currently implemented perturb functions have different requirements, e.g., in terms of expected input shape, and are thus not really interchangeable across all metrics. The requirements should either be unified across perturb functions, or at least an informative error message should be raised by a metric when an inapplicable perturb function is used.
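For the second option, a hedged sketch of what an informative error could look like (the wrapper is hypothetical, not existing Quantus code):

def apply_perturb_func(perturb_func, arr, **kwargs):
    """Call a perturb function and translate shape-related failures into a readable error."""
    try:
        return perturb_func(arr, **kwargs)
    except (ValueError, IndexError) as err:
        raise ValueError(
            f"Perturb function '{getattr(perturb_func, '__name__', perturb_func)}' could not handle "
            f"an input of shape {arr.shape}; it may expect a different input layout."
        ) from err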

Add support for TF saved model / TF hub models

In TF it's common practice to use the tf.saved_model format for (de)serialising trained models. However, this format has a somewhat different API than tf.keras.Model. It would be great to have Quantus support it.

Also, while we're at it, IMO it would make sense to also support https://tfhub.dev/, since this is another popular format for sharing pre-trained TF models.
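For reference, a rough sketch of the API difference (paths and the hub handle are placeholders):

import tensorflow as tf
import tensorflow_hub as hub

# tf.keras model: has .predict(), layer access, etc.
keras_model = tf.keras.models.load_model("path/to/keras_model")

# SavedModel: a generic callable object; inference typically goes through its signatures.
saved = tf.saved_model.load("path/to/saved_model")
infer = saved.signatures["serving_default"]

# TF Hub: returns a SavedModel-style object (or can be wrapped as a Keras layer).
hub_model = hub.load("https://tfhub.dev/<publisher>/<model>/<version>")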


Tutorial notebook TypeError: Callable() takes no arguments

Hi!
I'm trying to run the tutorial_model_training_explanation_robustness.ipynb notebook and I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[<ipython-input-7-432ed12102bf>](https://localhost:8080/#) in <module>()
     50                                                                         y_batch=y_batch.cpu().numpy(),
     51                                                                         a_batch=None,
---> 52                                                                         **{"method": "Saliency", "device": device, "img_size": 28}) 
     53 
     54     print(f"Epoch {epoch+1}/{epochs} - loss {loss.item():.2f} - test accuracy: {(100 * test_acc):.2f}% - max sensitivity {np.mean(sensitivities[epoch]):.2f}")

1 frames
[/usr/local/lib/python3.7/dist-packages/quantus/metrics/robustness_metrics.py](https://localhost:8080/#) in __call__(self, model, x_batch, y_batch, a_batch, *args, **kwargs)
    432                 inputs=x_batch,
    433                 targets=y_batch,
--> 434                 **self.kwargs,
    435             )
    436 

[/usr/lib/python3.7/typing.py](https://localhost:8080/#) in __call__(self, *args, **kwargs)
    676             raise TypeError(f"Type {self._name} cannot be instantiated; "
    677                             f"use {self._name.lower()}() instead")
--> 678         result = self.__origin__(*args, **kwargs)
    679         try:
    680             result.__orig_class__ = self

TypeError: Callable() takes no arguments

[Bug] NameError: name 'tf' is not defined

For quantus==0.1.4,
the following example code snippet comes from quantus.metrics.localisation_metrics.PointingGame.

import torch
from quantus.helpers.models import LeNet
import torchvision
from captum.attr import Saliency
from quantus import PointingGame

# Enable GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load a pre-trained LeNet classification model (architecture at quantus/helpers/models).
model = LeNet()
model.load_state_dict(torch.load("tutorials/assets/mnist"))

# Load MNIST datasets and make loaders.
test_set = torchvision.datasets.MNIST(root='./sample_data', download=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=24)

# Load a batch of inputs and outputs to use for XAI evaluation.
x_batch, y_batch = iter(test_loader).next()
x_batch, y_batch = x_batch.cpu().numpy(), y_batch.cpu().numpy()

# Generate Saliency attributions for the batch from the test set.
a_batch_saliency = Saliency(model).attribute(inputs=x_batch, target=y_batch, abs=True).sum(axis=1)
a_batch_saliency = a_batch_saliency.cpu().numpy()

# Initialise the metric and evaluate explanations by calling the metric instance.
metric = PointingGame(abs=True, normalise=False)
scores = metric(model=model, x_batch=x_batch, y_batch=y_batch, a_batch=a_batch_saliency, **{})

It produces the following error:

NameError                                 Traceback (most recent call last)
/tmp/ipykernel_19546/1671044940.py in <module>
      1 import torch
----> 2 from quantus.helpers.models import LeNet
      3 import torchvision
      4 from captum.attr import Saliency
      5 from quantus import PointingGame

~/.pyenv/versions/3.8.5/envs/env/lib/python3.8/site-packages/quantus/__init__.py in <module>
----> 1 from .helpers import *
      2 from .metrics import *
      3 from .evaluation import *

~/.pyenv/versions/3.8.5/envs/env/lib/python3.8/site-packages/quantus/helpers/__init__.py in <module>
     18     from .models import *
     19 if __EXTRAS__:
---> 20     from .explanation_func import *

~/.pyenv/versions/3.8.5/envs/env/lib/python3.8/site-packages/quantus/helpers/explanation_func.py in <module>
     95 
     96 def generate_tf_explanation(
---> 97     model: tf.keras.Model, inputs: np.array, targets: np.array, **kwargs
     98 ) -> np.ndarray:
     99     """

NameError: name 'tf' is not defined

A similar error occurs in quantus/helpers/utils.py:L200 and seems to be related to type checking. I'm not sure whether the best option is to continue wrapping function declarations with if util.find_spec("tensorflow"):, or to choose a different strategy (see the sketch below for one alternative).
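One possible alternative is a guarded-import pattern that avoids evaluating tf at import time (a sketch, not the library's current code):

from typing import TYPE_CHECKING
import numpy as np

if TYPE_CHECKING:
    # Only evaluated by static type checkers, never at runtime.
    import tensorflow as tf

def generate_tf_explanation(model: "tf.keras.Model", inputs: np.ndarray, targets: np.ndarray, **kwargs) -> np.ndarray:
    # Import tensorflow lazily, only when a TF explanation is actually requested.
    import tensorflow as tf
    ...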

New metrics: implement Consistency and Sufficiency

Implement Consistency (that belongs to the Robustness category) and Sufficiency (that belongs to the Sufficiency category):

Consistency: roughly, two instances x, x' that get the same explanation should also have the same prediction. For instance, if two different images are assigned the same explanation, e(x) = e(x') = "contains a zebra", then their assigned labels should also be the same.
Sufficiency: if x is assigned an explanation e(x) = π that also holds for another instance x' (even if e(x') ≠ π), then x' should have the same label as x.

Paper: https://arxiv.org/pdf/2202.00734.pdf
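A rough sketch of the Consistency idea, assuming explanations have already been discretised so that exact equality is meaningful (not the paper's exact formulation):

import numpy as np

def consistency_score(explanations: np.ndarray, predictions: np.ndarray) -> float:
    """For each instance, check how often other instances with an identical (discretised)
    explanation share the same predicted label; average over all instances that have a match."""
    scores = []
    for i in range(len(explanations)):
        same_expl = np.all(explanations == explanations[i], axis=1)
        same_expl[i] = False  # exclude the instance itself
        if same_expl.any():
            scores.append(np.mean(predictions[same_expl] == predictions[i]))
    return float(np.mean(scores)) if scores else float("nan")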

Allow for more variable shapes of attributions and inputs

Shapes of accepted inputs/attributions in the metrics are currently hard-coded and very restrictive towards specific data domains.

Mainly opening this issue as a reminder to extend functionality to other, non-image domains, and allow for shape flexibility.
For instance, in some applications, channel-wise attributions may carry meaningful information.

Of course, this may only be feasible for a subset of quantification metrics.

Create a "disable_parameter_printing" feature

Quantus is designed to be easy to use: users can initialise the metrics without explicitly setting any metric arguments (there are default params for everything), which means that metrics can easily be looped over etc., making the library very flexible.

But since "the devil is in the details" when it comes to metric parameterisation, we should create a feature that highlights the current parameter setting of each metric, so as to identify potential typos etc.

It could be a simple feature that prints the values of a metric's arguments at metric initialisation; see the sketch below.
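A very simple sketch of the idea (attribute and parameter names are illustrative, not Quantus' actual ones):

class Metric:
    def __init__(self, disable_parameter_printing: bool = False, **params):
        self.params = params
        if not disable_parameter_printing:
            # Print the full parameter setting once at initialisation so typos are easy to spot.
            settings = ", ".join(f"{k}={v!r}" for k, v in sorted(params.items()))
            print(f"{self.__class__.__name__} initialised with: {settings}")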

New metric: implement Relative Stability

Implement Relative Stability (belongs to the Robustness category).

Relative Stability: leverages model information to evaluate the stability of an explanation with respect to changes in (a) the input data, (b) the intermediate representations, and (c) the output logits of the underlying prediction model.

Paper: https://arxiv.org/pdf/2203.06877.pdf
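For orientation, the paper defines Relative Input Stability (RIS) roughly as follows (paraphrased; the representation- and logit-based variants replace the denominator with the corresponding relative change):

\mathrm{RIS}(x, x', e_x, e_{x'}) \;=\; \max_{x'} \frac{\lVert (e_x - e_{x'}) / e_x \rVert_p}{\max\!\left( \lVert (x - x') / x \rVert_p,\; \epsilon_{\min} \right)}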

To clarify, the changes expected in the PR should be reflected in the following folders:

New data domain: Quantus + NLP = True

Goal: make the Quantus source code compatible with NLP tasks, including a tutorial that showcases how Quantus can help quantify the goodness of NLP explanations.

Updates are expected in:

  • source code
  • tutorial
  • pytests

raise exceptions instead of asserting conditions

This is just a minor issue most of the time, but using assertions for checking input variables can lead to problems for people using this package.

The problem is that assertions are deactivated when Python is launched with the optimisation flag -O. So, as an example, if a user wants to optimise the interpreter for memory efficiency, all of the assertions in this package will be ignored.
Most people don't use this flag, but I think we should be aware of the issue and maybe even enforce this rule for newly written code.

Here are some discussions on stackoverflow which can be helpful:
https://stackoverflow.com/questions/944592/best-practice-for-using-assert
https://stackoverflow.com/questions/28608385/assert-asserting-when-debug-false
https://stackoverflow.com/questions/56990925/is-this-assert-for-development-not-for-production

TLDR:

  • use exceptions for the user-side of the package, where things can go wrong by unintended input/output
  • use assertions for the developer-side of the package, to assert conditions that should absolutely never go wrong, independent of user input
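As a concrete illustration of the user-facing pattern, the earlier assert_max_steps check could be rewritten roughly like this (a sketch, not a proposed final API):

import numpy as np
from typing import Tuple

def check_max_steps(max_steps_per_input: int, input_shape: Tuple[int, ...]) -> None:
    """Raise instead of assert, so the check also runs under python -O."""
    if np.prod(input_shape) % max_steps_per_input != 0:
        raise ValueError(
            "Set 'max_steps_per_input' so that the modulo remainder "
            "returns zero given the product of the input shape."
        )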

AOC metric computes AUC instead

The metric IterativeRemovalOfFeatures should compute AOC but is computing AUC instead—see snippets below as proof.

Bug Description

The value being computed and appended to the list last_results is the AUC (see the get_auc_score definition). It seems that the commented-out line of code contains the correct AOC computation:

# Correct AOC computation
self.last_results.append(1-get_auc_score(preds, np.arange(0, len(preds))))  

AUC Definition

Typo aside (it is being fixed in #112), the docstring should read area under the curve (AUC), not (AOC).

def get_auc_score(self):
    """Calculate the area under the curve (AOC) score for several test samples."""
    return [np.trapz(np.array(results), dx=1.0) for results in self.all_results]

1. AOC computing AUC instead

# self.last_results.append(1-auc(preds, np.arange(0, len(preds))))
self.last_results.append(np.trapz(np.array(preds), dx=1.0))

2. AUC being appended to all_results

self.all_results.append(self.last_results)

3. Final aggregated score contains AUC scores instead of AOC

def aggregated_score(self):
    """Calculate the area over the curve (AOC) score for several test samples."""
    return [np.mean(results) for results in self.all_results]

Enhancement: split up metrics' **kwargs, remove *args and set default values

To make the parametrisation of metrics in Quantus more explicit (and to mitigate the risk of user typos and other undefined behaviour, e.g., when one large kwargs dict is passed on to different perturb_func calls in large-scale experiments), we want to update the argument lists of all metric initialisations (def __init__(self, ...)) in the library as follows:

  • Split the kwargs for every Callable used in the metric init, such as explain_func, perturb_func and similarity_func, into explain_func_kwargs, perturb_func_kwargs and similarity_func_kwargs (see the signature sketch below)
  • Set default values to None where argument types are mutable, and reset them to their actual data type inside the init, e.g., perturb_func_kwargs: Union[None, dict] = None
  • Remove *args since it is not needed

AND:

  • Update the docstrings to reflect these changes
  • Update code in the library where the metrics are called e.g., in the tutorials and pytests
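A sketch of what such an init signature could look like (names are illustrative, not the final API):

from typing import Callable, Optional, Union

class SomeMetric:
    def __init__(
        self,
        abs: bool = False,
        normalise: bool = True,
        normalise_func: Optional[Callable] = None,
        perturb_func: Optional[Callable] = None,
        perturb_func_kwargs: Union[None, dict] = None,   # was part of one shared **kwargs
        explain_func_kwargs: Union[None, dict] = None,
    ):
        # Mutable defaults are declared as None in the signature and materialised here.
        self.perturb_func_kwargs = perturb_func_kwargs if perturb_func_kwargs is not None else {}
        self.explain_func_kwargs = explain_func_kwargs if explain_func_kwargs is not None else {}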

use batched processing instead of processing by instance

Currently all of the metrics are more or less structured by the following scheme:

x: array
y: array
a: array

for x_instance, y_instance, a_instance in zip(x, y, a):
    for perturbation_step in range(perturbation_steps):
        x_perturbed = perturb_instance(x_instance, a_instance, perturbation_step)
        y_perturbed = model(x_perturbed)
        score = calculate_score_for_instance(y_instance, y_perturbed)

The choice of perturb_instance arguments is just for simplicity; the actual code is of course more complex than presented.

But this kind of implementation doesn't exploit the performance benefits of batched model prediction and vectorised numpy functions.
Instead, we could speed up computations by an order of magnitude with the following approach:

x: array
y: array
a: array
batch_size: int

generator = BatchGenerator(x, y, a, batch_size)
for x_batch, y_batch, a_batch in generator:
    for perturbation_step in range(perturbation_steps):
        x_batch_perturbed = perturb_batch(x_batch, a_batch, perturbation_step)
        y_batch_perturbed = model(x_batch_perturbed)
        score = calculate_score_for_batch(y_batch, y_batch_perturbed)

Some perturb_batch functions may still need an inner for-loop, but others could certainly be computed on the whole batch.
Depending on the dataset size and model complexity, this should lead to significant improvements in performance.
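A minimal sketch of the BatchGenerator idea (a hypothetical helper, not existing Quantus code):

import numpy as np

def batch_generator(x: np.ndarray, y: np.ndarray, a: np.ndarray, batch_size: int):
    """Yield aligned mini-batches of inputs, targets and attributions."""
    for start in range(0, len(x), batch_size):
        end = start + batch_size
        yield x[start:end], y[start:end], a[start:end]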

`quantus.explain` shows zennit warning when generating explanations for a TF model using tf-explain

If zennit and tf-explain are installed, running e.g.

a_batch = quantus.explain(
    model,
    x_batch,
    y_batch,
    method='GradCam',
    gc_layer='test_conv'
)

prints UserWarning: Using quantus 'explain' function as an explainer without specifying 'attributor' in kwargs will produce a vanilla 'Gradient' explanation.
Nothing is really broken here, it is just misleading. After uninstalling zennit, the warning is gone.

Rotten link in README/Tutorials

Hi,

It seems that the link of this tutorial is rotten: https://github.com/understandable-machine-intelligence-lab/Quantus/blob/main/README.md?plain=1#L226.

The link is

https://.../quantus/blob/main/tutorials/tutorial_[sensitivity]_parameterisation.ipynb

and the tutorial actually locates at

https://.../Quantus/blob/main/tutorials/tutorial_[sensivitivty]_parameterisation.ipynb

Note that I added the brackets to emphasise the part in which the two links differ, which causes the 404.

Given that it looks like a small typo in the tutorial filename, we could fix the filename and update the link in README.md at the same time.

[bug] `quantus.evaluate` with method `GradCAM` gives error for common input

In the current version quantus==0.1.4, the following:

import numpy as np
import torch
from torch import nn
from torchvision import models
import quantus

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        resnet18 = models.resnet18()
        children = list(resnet18.children())

        self.backbone = nn.Sequential(*children[:-2])
        self.head = nn.Sequential(
            *children[-2:-1],
            nn.Flatten(start_dim=1),
            children[-1]
        )

    def forward(self, batch):
        return self.head(self.backbone(batch))


model = Net()

x_batch = torch.rand(1, 3, 256, 256)
y_batch = [1]

a_batch = quantus.explain(
                model=model,
                inputs=x_batch,
                targets=y_batch,
                method='GradCAM',
                gc_layer=list(model.named_modules())[-6][1],
                normalise=True,
)


quantus.evaluate(
    metrics={
        "PointingGame": quantus.PointingGame(disable_warnings=True),
    },
    xai_methods={"GradCAM": a_batch},
    model=model.head,
    x_batch=model.backbone(x_batch),
    y_batch=y_batch,
    s_batch=np.ones(shape=(1, 1, 8, 8)),
    agg_func=np.mean,
    **{"explain_func": quantus.explain}
)

produces the error

ValueError: Ambiguous input shape. Cannot infer channel-first/channel-last order.

This error makes sense for attributions at the input layer, but is probably unintended behavior for GradCAM, as GradCAM is usually applied to an intermediate conv-layer.

PixelFlipping metric only flips first color channel

Hi,
I think I found a bug in the PixelFlipping implementation. Using the perturbation baseline "black", I noticed that only the first color channel (red) is set to 0. I would have expected all channels to be set to 0.
I initialized the metric as demonstrated in the tutorials with the following code:

quantus.PixelFlipping(**{
      "features_in_step": 28, 
      "perturb_baseline": "black", 
      "perturb_func": quantus.baseline_replacement_by_indices
})

Plotting the perturbed images for a couple of iterations leads to the following images, where top is produced by the current implementation and bottom is what I expected (perturbing all color channels):

I think the problem is caused here, because x_perturbed has dimension num_col_channels x img_size x img_size and a_ix only has dimension img_size x img_size.

I solved this locally with a quick workaround (which produced the bottom images in the figure above):

num_channels = 3
a_ix = np.concatenate([a_ix + (ch * len(a)) for ch in range(num_channels)])

However, since the perturbation is performed on normalized images, the normalized value for 0 can differ by color channel. This is not addressed in this quick fix.
