
inseq-team / inseq


Interpretability for sequence generation models 🐛 🔍

Home Page: https://inseq.org

License: Apache License 2.0

Makefile 0.65% Python 99.29% Dockerfile 0.06%
interpretability deep-learning transformers explainable-ai captum huggingface attribution-methods natural-language-processing generative-ai language-generation

inseq's People

Contributors

bbjoverbeek, carschno, danielsc4, dependabot[bot], g8a9, gsarti, lsickert, nfelnlp, xuan25


inseq's Issues

Set tqdm to iterate over sentences when doing attributions for multiple sentences

Description

When computing attributions for a list of sentences, the tqdm progress bar reports per-token iterations, which gives no insight into how far along you are in the corpus of sentences being attributed. I would suggest that, when attributing over a list of strings, tqdm iterates per sentence and drops the per-token iteration.


Commit to Help

Happy to have a go at this if you agree this could be nice.

  • I'm willing to help with this feature.

Error of input text not matching decoded output of tokenizer

πŸ› Bug Report

If the text passed to the attribute method contains spaces before non-alphanumeric characters, the tokenizer's decoded output no longer matches the input, leading to the assertion error below.

🔬 How To Reproduce

Give as input a text containing a special character (e.g. . or ?) preceded by a whitespace, while using a GPT-like model (and tokenizer).

Code sample

Steps to reproduce the behavior:

import inseq
model = inseq.load_model('gpt2', attribution_method='input_x_gradient') # Or any other gpt-like model
model.attribute(input_texts='Hello . This is an example')

Returns the following error: AssertionError: Forced generations with decoder-only models must start with the input texts.

Environment

  • OS: macOS (but independent from the OS)
  • Python version: Python 3.9.15
  • Inseq version: 0.4.0

Expected behavior

The decoded text output from the tokenizer should be identical to the input text, allowing the assertion to pass:

assert all(
    generated_texts[idx].startswith(input_texts[idx]) for idx in range(len(input_texts))
), "Forced generations with decoder-only models must start with the input texts."

Additional context

The issue is related to the type of tokenizer used, already reported in huggingface/transformers#21119. To solve the problem, it is recommended to use the clean_up_tokenization_spaces=False flag when decoding the text.
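For illustration, a minimal standalone snippet (outside Inseq) showing the effect of the flag on the GPT-2 tokenizer; the exact call site inside Inseq is not shown here:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("Hello . This is an example").input_ids
# With the default cleanup, the space before "." is removed and the round-trip no longer matches the input
print(tokenizer.decode(ids))
# Disabling cleanup should reproduce the original text exactly
print(tokenizer.decode(ids, clean_up_tokenization_spaces=False))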

Use `compute_transition_scores` for step scores

Description

🤗 Transformers v4.26.0 introduces the compute_transition_scores function to simplify the return of log probabilities. Example taken from the docs:

from transformers import GPT2Tokenizer, AutoModelForCausalLM
import numpy as np

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token_id = tokenizer.eos_token_id
inputs = tokenizer(["Today is"], return_tensors="pt")

# Example 1: Print the scores for each token generated with Greedy Search
outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)
input_length = inputs.input_ids.shape[1]
generated_tokens = outputs.sequences[:, input_length:]
for tok, score in zip(generated_tokens[0], transition_scores[0]):
    # | token | token string | logits | probability
    print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}")

# Output
|   262 |  the     | -1.414 | 24.33%
|  1110 |  day     | -2.609 | 7.36%
|   618 |  when    | -2.010 | 13.40%
|   356 |  we      | -1.859 | 15.58%
|   460 |  can     | -2.508 | 8.14%

Motivation

We want to use compute_transition_scores to calculate the probability step score in Inseq, and all derivative scores, to ensure continued compatibility with Transformers.

Time tracking / Runtime report

🚀 Feature Request

In addition to tqdm, it could be helpful to get some runtime report.
After generating many explanations, this would give insight into the average computation time needed for one example.
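A minimal sketch of how such timing could be collected today from user code, using only the standard library (the model, method, and report format here are illustrative assumptions):

import time
import inseq

model = inseq.load_model("gpt2", "saliency")
texts = ["Hello world!", "How are you?"]

start = time.perf_counter()
out = model.attribute(texts)
elapsed = time.perf_counter() - start
# Average wall-clock time per attributed example
print(f"total: {elapsed:.2f}s, per example: {elapsed / len(texts):.2f}s")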

Inconsistent behavior for batched attribution

πŸ› Bug Report

Attribution patterns differ when more than one example is attributed at the same time, even for deterministic methods (e.g. IG).

🔬 How To Reproduce

Attribute the same sentence twice using any attribution model and method, once as the only attributed text and once together with other sentences. The resulting attributions differ, even though the model's predictions are the same.

Code sample

from inseq import load

model = load("Helsinki-NLP/opus-mt-en-de", "integrated_gradients")
single_out = model.attribute("This is an example sentence")
multi_out = model.attribute(["This is an example sentence", "This is another example"])
assert single_out.attributions == multi_out[0].attributions # raises AssertionError

Environment

  • OS: Linux 20.04
  • Python version: Python 3.8

📈 Expected behavior

The attributions must be the same regardless of the number of examples that are attributed at once.

📎 Additional context

The question is whether this is a bug or a methodological error due to the architecture of the seq2seq models. A first step would be to start from the BERT tutorial in Captum and change the methods to accept multiple examples in input. If the attribution patterns are consistent, in principle the problem is somewhere in the cross-attention mechanism.

.source_attributions tensor returns float64 with integrated gradients method

πŸ› Bug Report

The output tensor .source_attributions in FeatureAttributionSequenceOutput is of type float64 when using "integrated_gradients" method, rather than the expected float32.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Run any model using "integrated_gradients" method
  2. Inspect the dtype of out.sequence_attributions[0].source_attributions

Code sample

import inseq
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "integrated_gradients")
out = model.attribute(
  "The developer argued with the designer because her idea cannot be implemented.",
  n_steps=100
)
print(out.sequence_attributions[0].source_attributions.dtype)

Environment

Python 3.8.16

📈 Expected behavior

The dtype should be float32.

📎 Additional context

Other methods ("saliency", "input_x_gradient", "deeplift") return float32.
Interestingly, "discretized_integrated_gradients" also returns float32, but "layer_integrated_gradients" returns float64.

Add support for decoder-only models

🚀 Feature Request

Adding support for decoder-only models like GPT-2 on top of the AttributionModel abstraction. The change will involve a radical refactoring of the whole attribution pipeline to enable target-only attribution and passing Batch objects instead of EncoderDecoderBatch if decoder-only attribution is performed.

The output attribution classes would mostly stay the same, with the exception of source attributions becoming optional.

[Summary] Add metrics for feature attribution evaluation

🚀 Feature Request

The following is a non-exhaustive list of feature attribution evaluation metrics that could be added to the library:

Method name Source Code implementation Status
Sensitivity Yeh et al. '19 pytorch/captum
Infidelity Yeh et al. '19 pytorch/captum
Log Odds Shrikumar et al. '17 INK-USC/DIG
Sufficiency De Young et al. '20 INK-USC/DIG
Comprehensiveness De Young et al. '20 INK-USC/DIG
Human Agreement Atanasova et al. '20 copenlu/xai-benchmark
Confidence Indication Atanasova et al. '20 copenlu/xai-benchmark
Cross-Model Rationale Consistency Atanasova et al. '20 copenlu/xai-benchmark
Cross-Example Rationale Consistency (Dataset Consistency) Atanasova et al. '20 copenlu/xai-benchmark
Sensitivity Yin et al. '22 Iuclanlp/NLP-Interpretation-Faithfulness
Stability Yin et al. '22 Iuclanlp/NLP-Interpretation-Faithfulness

Notes:

  1. The Log Odds metric is just the negative logarithm of the Comprehensiveness metric. The application of - log can be controlled by a parameter do_log_odds: bool = False in the same function. The reciprocal can be obtained for the Sufficiency metric.

  2. All metrics that control masking/dropping a portion of the inputs via a top_k parameter can benefit from a recursive application to ensure the masking of most salient tokens at all times, as described in Madsen et al. '21. This could be captured by a parameter recursive_steps: Optional[int] = None. If specified, a masking of size top_k // recursive_steps + int(top_k % recursive_steps > 0) is performed for recursive_steps times, with the last step having size equal to top_k % recursive_steps if top_k % recursive_steps > 0.

  3. The Sensitivity and Infidelity methods add noise to input embeddings, which could produce unrealistic input embeddings for the model (see discussion in Sanyal et al. '21). Both sensitivity and infidelity can include a parameter discretize: bool = False that when turned on replaces the top-k inputs with their nearest neighbors in the vocabulary embedding space instead of their noised versions. Using Stability is more principled in this context since fluency is preserved by the two step procedure presented by Alzantot et al. '18, which includes a language modeling component. An additional parameter sample_topk_neighbors: int = 1 can be used to control the nearest neighbors' pool size used for replacement.

  4. Sensitivity by Yin et al. '22 is an adaptation to the NLP domain of Sensitivity-n by Yeh et al. '19. An important difference is that the norm of the noise vector causing the prediction to flip is used as a metric in Yin et al. '22, while the original Sensitivity in Captum uses the difference between original and noised prediction scores. The first should be prioritized for implementation.

  5. Cross-Lingual Faithfulness by Zaman and Belinkov '22 (code) is a special case of the Dataset Consistency metric by Atanasova et al. 2020 in which the pair is constituted by an example and its translated variant.

Overviews

A Comparative Study of Faithfulness Metrics for Model Interpretability Methods, Chan et al. '22

Slow `DiscretizedIntegratedGradientAttribution` method, also on GPU

πŸ› Bug Report

Inference on a Google Colab GPU is very slow. There is no significant difference whether the model runs on CUDA or CPU.

🔬 How To Reproduce

The following model.attribute(...) call runs for around 33 to 47 seconds both on a Colab CPU and GPU. I tried passing the device to the model, and model.device confirms that it is running on CUDA, but it still takes very long to run only 2 sentences. (I don't know the underlying attribution computations well enough to know whether this is expected or should be faster. If it is always this slow, it seems practically infeasible to analyse larger corpora.)

import inseq
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"

print(inseq.list_feature_attribution_methods())
model = inseq.load_model("google/flan-t5-small", attribution_method="discretized_integrated_gradients", device=device)

model.to(device)

out = model.attribute(
    input_texts=["We were attacked by hackers. Was there a cyber attack?", "We were not attacked by hackers. Was there a cyber attack?"],
)

model.device

Environment

  • OS: linux, google colab
  • Python version: Python 3.8.10
  • Inseq version: 0.3.3

Expected behavior

Faster inference with a GPU/cuda

(Thanks, btw, for the fix for returning the per-token scores in a dictionary; the new method works well :) )

FeatureAttributionSequenceOutput has no __repr__ implemented

πŸ› Bug Report

I am running IG attributions for a list of strings and was trying to work with the resulting inseq.data.attribution.FeatureAttributionOutput object. To inspect what is in there, I was looking at sequence_attributions, but I can't print this object because the objects it contains don't have a __repr__ method.

🔬 How To Reproduce

Code sample

import inseq

model = inseq.load_model("gpt2", "integrated_gradients")

sens = ["this is the first sentence. followed by a second"]
prefix = [sen[:sen.index('.')+1] for sen in sens]

attributions = model.attribute(
    prefix,
    generated_texts=sens,
    n_steps=500,
    internal_batch_size=50
)

print(attributions.sequence_attributions)

Returns:

TypeError: __repr__ returned non-string (type dict)

Environment

  • Google Colab
  • Inseq 0.4.0

[Summary] Add popular NLG faithfulness evaluation datasets

🚀 Feature Request

In order to facilitate the evaluation of different interpretability techniques, I propose to identify a set of commonly used datasets from the literature, create 🤗 Datasets loading scripts to have them in a shared format, and host them on the Inseq organization in the Hugging Face hub.

This would provide a shared interface for:

  • Faithfulness metrics applied at a dataset level.
  • Future support of instance attribution methods.

The following table summarizes some of the datasets used in the literature:

Name Task Data source Paper Description
SCAT Translation neulab/contextual-mt Yin et al. '21 Contextual coreference in translation, with disambiguating context highlights from translators
Lambada + Rationales Language Modeling keyonvafa/sequential-rationales Vafa et al. '21 Next word prediction with human-annotated previous relevant context
Europarl Gold Alignments Translation TBD TBD Gold alignments for various language pairs in the Europarl corpus

The ExNLP Datasets website summarizes various sources available for NLP explainability; verify what is relevant to generation.

Limit the plots displayed through `.show()` function


Description

Currently, the .show() function on the attribution output will display plots for all generated attributions by default. For larger batches, this can lead to huge outputs in a notebook or large numbers of HTML files being generated. It might be preferable to provide a sensible default for the number of plots displayed in a notebook and allow users to specify how many (or which) attributions they want visualized by referring to their index.

Similar functionality is already possible now by manually choosing the indices of out.sequence_attributions and calling the .show() methods of the individual attribution outputs, so this function would mainly be a convenience function for new users.
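A sketch of the manual workaround described above, with arbitrarily chosen indices:

# Visualize only selected attributed sequences instead of the whole batch
for idx in (0, 1):
    out.sequence_attributions[idx].show()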

Motivation

see above

Additional context

Commit to Help

  • I'm willing to help with this feature.

Add decoding probabilities to the `attribute` method

🚀 Feature Request

Allow users to extract the probabilities associated with each generated token at every attribution step. This behavior can be controlled by a parameter output_probabilities: bool = False passed to the attribute function.

🔈 Motivation

Uncertainty features such as generation probabilities have proven useful in many cases to estimate the quality of the generated sentence (see Fomicheva et al., 2020 as an example for QE in NMT).

📓 Notes

The Huggingface library does not support the extraction of token-by-token probabilities from the generate method at the moment.

Add new `FeatureAttributionOutput` class

🚀 Feature Request

Adding this new class to become the default output for the AttributionModel.attribute method. This will entail the following naming changes:

  • FeatureAttributionStepOutput --> FeatureAttributionRawStepOutput
  • FeatureAttributionOutput --> FeatureAttributionStepOutput
  • NEW FeatureAttributionOutput, replacing both OneOrMoreFeatureAttributionSequenceOutputs and OneOrMoreFeatureAttributionSequenceOutputsWithStepOutputs.

Advantages:

  • Unify the outputs into a single dataclass to ensure consistency in output types.
  • Allow for extra fields to preserve information about the attribution process in the generated output, with the purpose of avoiding values-only classes lacking the original information about models and methods used for attribution.
  • Extensible for other types of attribution

Initial formulation:

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class FeatureAttributionOutput:
    """
    Output produced by the `AttributionModel.attribute` method.

    Attributes:
        sequence_attributions (list of :class:`~inseq.data.FeatureAttributionSequenceOutput`): List
            containing all attributions performed on input sentences (one per input sentence, including
            source and optionally target-side attribution).
        step_attributions (list of :class:`~inseq.data.FeatureAttributionStepOutput`, optional): List
            containing all step attributions (one per generation step performed on the batch), returned if
            `output_step_attributions=True`.
        info (dict with str keys and str values): Dictionary including all available parameters used to
            perform the attribution.
    """

    # Field declarations inferred from the docstring above
    sequence_attributions: List["FeatureAttributionSequenceOutput"]
    step_attributions: Optional[List["FeatureAttributionStepOutput"]] = None
    info: Dict[str, str] = field(default_factory=dict)
  • Move save_attributions and load_attributions inside the class, removing the global methods.
  • Add a show method calling the one in every FeatureAttributionSequenceOutput and concatenating outputs if return_html=True.
  • Add a join method allowing the sequence_attributions and step_attributions lists to be extended if info matches, raising a ValueError ("attributions produced under different settings cannot be combined") otherwise.

allow negative values for `attr_pos_end` parameter


Description

Allow negative values to be defined for the attr_pos_end parameter.

Motivation

This would make it possible to, e.g., specify that the last token (which is often just the EOS token) should be excluded from the generated attributions. It probably needs to be evaluated whether this is possible for batched attributions, but at least for single attributions it could be a nice quality-of-life feature, especially when attributing over multiple sentences, where using a positive value is more complicated due to different sentence lengths.
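A minimal sketch of how a negative attr_pos_end could be normalized against the generation length, mirroring Python's negative indexing (the function name and placement are illustrative, not the actual Inseq internals):

def resolve_attr_pos_end(attr_pos_end: int, seq_len: int) -> int:
    # e.g. attr_pos_end=-1 with seq_len=10 -> 9, dropping the final (EOS) position
    if attr_pos_end < 0:
        attr_pos_end = seq_len + attr_pos_end
    if not 0 < attr_pos_end <= seq_len:
        raise ValueError(f"attr_pos_end out of range for sequence of length {seq_len}")
    return attr_pos_end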

Additional context

Commit to Help

  • I'm willing to help with this feature.

`petals` compatibility issue tracker

πŸ› Bug Report

In principle, inseq should work with the DistributedBloomModel implemented in the petals package to perform feature attribution of the 176B Bloom model in a distributed setup. However, some compatibility issues currently undermine the interoperability of the two libraries.

🔬 How To Reproduce

Refer to bigscience-workshop/petals#178 for additional details.

`CUDA out of memory` for larger datasets during attribution

πŸ› Bug Report

When loading inseq with a larger dataset on a CUDA device, an out-of-memory error occurs regardless of the defined batch_size. I believe this is caused by the call to self.encode in attribution_model.py (lines 345 and 347), which operates on the full inputs instead of a single batch and moves all inputs to the CUDA device after encoding.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Load any model without pre-generated targets
  2. Load a larger dataset with at least 1000 samples
  3. Call the .attribute() method with any batch_size parameter

Code sample

Environment

  • OS: macOS

  • Python version: 3.10

  • Inseq version: 0.4.0

Expected behavior

The input texts should ideally only be encoded or moved to the GPU once they are actually processed.

Additional context

Visualize Attention Weights for a Decoder Only Model

Question

How can I visualize the attention weights for a decoder only model like Pythia for a given input prompt?

Additional context

I went over the tutorial here, which uses an encoder-decoder model, and wanted to try this out for a decoder-only model.

I tried replacing only the model name, but it does not seem to work:

model = inseq.load_model("EleutherAI/pythia-70m-deduped", "input_x_gradient")

out = model.attribute(
    input_texts="Hello everyone, hope you're enjoying the tutorial!",
    attribute_target=True,
    method="attention"
)
# out[0] is a shortcut for out.sequence_attributions[0]
out.sequence_attributions[0].source_attributions.shape

but I get the following error:

AttributeError                            Traceback (most recent call last)
[<ipython-input-5-6df0e921faca>](https://localhost:8080/#) in <cell line: 11>()
      9 )
     10 # out[0] is a shortcut for out.sequence_attributions[0]
---> 11 out.sequence_attributions[0].source_attributions.shape

AttributeError: 'NoneType' object has no attribute 'shape'

However, strangely, I can still look at the outputs using:

out.sequence_attributions[0]._aggregator
out.show()

Is this the intended functioning?
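For reference, decoder-only models have no separate source side, so source_attributions is expected to be None (see the decoder-only support issue above) and the scores should live in target_attributions instead; a quick check, assuming the attribution ran successfully:

seq_attr = out.sequence_attributions[0]
print(seq_attr.source_attributions)         # None for decoder-only models
print(seq_attr.target_attributions.shape)   # attribution scores over the prompt/target tokens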

Also, I would love to get some help interpreting the generated plot (screenshot omitted).

I'm confused about why there are some full rows and some rows with certain values masked, and about what exactly a cell signifies. I know this might be a trivial thing :(

Checklist

  • I've searched the project's issues.

GPT2 Integrated Gradients - empty input gives false results

πŸ› Bug Report

When leaving the input text empty for GPT-2 with integrated gradients, the saliency map seems to be incorrect and gives false results. The goal is to give only <|endoftext|>, the BOS token, as input (essentially letting GPT-2 generate from nothing), which can be done by leaving the input empty.


The problem is here:

sequences = self.attribution_model.formatter.get_text_sequences(self.attribution_model, batch)

@staticmethod
def get_text_sequences(attribution_model: "DecoderOnlyAttributionModel", batch: DecoderOnlyBatch) -> TextSequences:
    return TextSequences(
        sources=None,
        targets=attribution_model.convert_tokens_to_string(batch.input_tokens, as_targets=True),
    )

The call to TextSequences in this method sets skip_special_tokens to True, removing the <|endoftext|> from the input. This also prevents a user from giving <|endoftext|> as the only input (and at the start of the generated text), since it is removed in the input. In that case, when running, there will be an error that the generated text does not begin with the input text.

It can be resolved by temporarily changing the line to:

sequences = TextSequences(
    sources=None,
    targets=self.attribution_model.convert_tokens_to_string(
        batch.input_tokens, as_targets=True, skip_special_tokens=False
    ),
)


However, the feature attribution is zero for every <|endoftext|> token in the input and the output. I'm not sure whether this is intended; the same process with the ecco package assigns attribution to this token. Also, the first token (in this case, This) gets zero attribution, which is probably not supposed to be the case.

Summary:

  1. Visual glitch when leaving the GPT-2 input empty.
  2. Unable to give <|endoftext|> as input because it is removed when processing.
  3. The temporary fix described above reveals that the feature attribution to <|endoftext|> is zero. This is probably not correct.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Run the code sample.

Code sample

import inseq
model = inseq.load_model("gpt2", "integrated_gradients")
model.attribute(
    "",
    "This is a demo sentence."
).show()

Environment

  • OS: Windows 10
  • Python version: 3.10.9
  • Inseq version: 0.5.0.dev0 (pulled from the main branch on 1 June 2023)

Expected behavior

See the bug report above. For comparison, the ecco package produces a different integrated gradients result on the same sentence (screenshot omitted).
I assume this would be correct; however, they leave the baseline at its default.

models cannot be loaded with instantiated `PreTrainedTokenizerFast`

πŸ› Bug Report

When loading a model with an already instantiated fast tokenizer from Hugging Face, the error below is thrown:

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'MBart50TokenizerFast(name_or_path='facebook/mbart-large-50-many-to-many-mmt', vocab_size=250054, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '', 'eos_token': '', 'unk_token': '', 'sep_token': '', 'pad_token': '', 'cls_token': '', 'mask_token': '', 'additional_special_tokens': ['ar_AR', 'cs_CZ', 'de_DE', 'en_XX', 'es_XX', 'et_EE', 'fi_FI', 'fr_XX', 'gu_IN', 'hi_IN', 'it_IT', 'ja_XX', 'kk_KZ', 'ko_KR', 'lt_LT', 'lv_LV', 'my_MM', 'ne_NP', 'nl_XX', 'ro_RO', 'ru_RU', 'si_LK', 'tr_TR', 'vi_VN', 'zh_CN', 'af_ZA', 'az_AZ', 'bn_IN', 'fa_IR', 'he_IL', 'hr_HR', 'id_ID', 'ka_GE', 'km_KH', 'mk_MK', 'ml_IN', 'mn_MN', 'mr_IN', 'pl_PL', 'ps_AF', 'pt_XX', 'sv_SE', 'sw_KE', 'ta_IN', 'te_IN', 'th_TH', 'tl_XX', 'uk_UA', 'ur_PK', 'xh_ZA', 'gl_ES', 'sl_SI']}, clean_up_tokenization_spaces=True)'. Use 'repo_type' argument if needed.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Try to load a model with an instantiated fast tokenizer (see the code sample below)

Code sample

import inseq
from transformers import (MBartForConditionalGeneration, MBart50TokenizerFast)

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

de_tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt", src_lang="de_DE", tgt_lang="ko_KR")

attr_model = inseq.load_model(model, "attention", tokenizer=de_tokenizer)

Environment

  • OS: macOS

  • Python version: 3.10.9

  • Inseq version: 0.4.0

Expected behavior

Providing pretrained fast-tokenizers should be supported.

Additional context

The issue seems to originate in huggingface_model.py, where several type definitions are set to transformers.PreTrainedTokenizer instead of transformers.PreTrainedTokenizerBase, effectively disallowing any fast tokenizer. The issue seems to be line 112 in particular; changing the type definitions solves the issue.
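A sketch of the kind of annotation change suggested; the function name and signature here are illustrative, not the actual code in huggingface_model.py:

from typing import Union

from transformers import PreTrainedModel, PreTrainedTokenizerBase

# Annotating with the common base class admits both slow and fast tokenizers
def illustrative_loader(
    model: Union[str, PreTrainedModel],
    tokenizer: Union[str, PreTrainedTokenizerBase],
) -> None:
    ...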

Minor fixes to v0.3

πŸ› Bug Report

Track here minimal fixes needed to v0.3:

  • Bump required version of transformers to 4.22.0 in pyproject.toml, required for M1 support and new target tokenization API
  • Fix tqdm finishing at N-1 in attribute.

Minor issues with v0.2.0, update CLI commands

πŸ› Bug Report

  • FeatureAttributionSequenceOutput.show() raises an error in console mode due to TokenWithId having replaced str tokens.

  • Loading a FeatureAttributionOutput object with load() should instantiate default AggregableMixin attributes to guarantee that show() will work out of the box after loading.

  • format_input_texts in AttributionModel can be moved to utils/misc, since it does not require self.

  • The __repr__ of TensorWrapper and FeatureAttributionOutput classes should direct to the prettified __str__ representation by default.

  • The CLI in __main__ is not working; it should be adapted to the updated library and refactored so that inseq attribute text [PARAMS] calls the normal attribution function. A future inseq attribute file [PARAMS] command will be added to directly attribute sentences from a file. Create a separate commands folder in the library to group those.

How to programmatically extract attribution scores per token?

Checklist

  • I've searched the project's issues.

❓ Question

How do I programmatically extract the per-token scores to have them in a list or dictionary, mapped to each token?

I understand how to show the scores per token visually, but I don't know how to extract them from the "out" object for further downstream processing.

model = inseq.load_model("google/flan-t5-base", attribution_method="discretized_integrated_gradients")
out = model.attribute(
    input_texts=["We were attacked by hackers. Was there a cyber attack?", "We were not attacked by hackers. Was there a cyber attack?"],
)
out.sequence_attributions[0]
out.show()
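One possible way to get the scores programmatically, continuing from the snippet above, is the get_scores_dicts() helper referenced in other issues on this page (the exact structure of the returned dictionaries may vary across versions):

# One dictionary per attributed sequence, mapping tokens to their attribution and step scores
score_dicts = out.get_scores_dicts()
print(score_dicts[0].keys())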

HuggingFace's FSMTForConditionalGeneration is not properly integrated in the library

πŸ› Bug Report

I get an error when trying to load an FSMTForConditionalGeneration model. It doesn't seem to have a get_decoder function, and instead the decoder attribute should be used. In this thread, we can also collect other seq2seq and decoder-only models from HuggingFace that may not be integrated properly in Inseq.

🔬 How To Reproduce

Steps to reproduce the behavior:

I have only tested it on the fix-macos-issues branch, but I suspect it is a more general problem.
Loading an FSMT model leads to an AttributeError: 'FSMTForConditionalGeneration' object has no attribute 'get_decoder'

Code sample

model = inseq.load_model("facebook/wmt19-en-de", "integrated_gradients")
out = model.attribute(
  "The developer argued with the designer because her idea cannot be implemented.",
  n_steps=100
)
out.show()

Environment

  • OS: macOS
  • Python version: 3.10.7

📈 Expected behavior

Inseq should load the FSMT model without any problems.

[Summary] Add perturbation feature attribution methods

🚀 Feature Request

The following is a non-exhaustive list of perturbation-based feature attribution methods that could be added to the library:

Method name Source In Captum Code implementation Status
(Layer) Feature Ablation 1 - ✅ pytorch/captum
Occlusion Zeiler and Fergus '13 ✅ pytorch/captum ✅
Shapley Value Sampling Castro et al. '09 ✅ pytorch/captum
Lime Ribeiro et al. '16 ✅ pytorch/captum ✅
KernelShap Lundberg and Lee '17 ✅ pytorch/captum
Editing 2 - - -
Greedy Rationalization 3 Vafa et al. '21 - keyonvafa/sequential-rationales
Information Bottleneck Jiang et al. '20 - DFKI-NLP/thermostat
BayesLime Slack et al. '21 - dylan-slack/Modeling-Uncertainty-Local-Explainability
BayesSHAP Slack et al. '21 - dylan-slack/Modeling-Uncertainty-Local-Explainability
Input Reduction Feng et al. '18 - -
Input Marginalization Kim et al. '20 - -
Occlusion & Language Modeling Harbecke and Alt '20 - DFKI-NLP/OLM
Context Probing 4 CΓ­fka and Liutkus '22 - cifkao/context-probing
Weighted SHAP Kwon and Zou '22 - ykwon0407/WeightedSHAP
Value Zeroing Mohebbi et al. '23 - hmohebbi/ValueZeroing #173
Comprehensiveness-as-a-metric Zhou et al. '23 - YilunZhou/solvability-explainer
Sufficiency-as-a-metric Zhou et al. '23 - YilunZhou/solvability-explainer
Causal Tracing Meng et al. '22 - kmeng01/rome
Attention Knockout 5 Geva et al. '23 - -
ReAGent Zhao et al. '24 - casszhao/ReAGent #250
SyntaxSHAP Amara et al. '24 - k-amara/syntax-shap

Notes:

  1. For more information on Editing, see point 3 in #112 .

Footnotes

  1. Called ablation, but performs masking of features using a baseline.
  2. Editing replaces tokens with their nearest neighbors in the vocabulary embedding space and measures saliency as the drop in performance for the target. In the future, this can allow users to specify a custom editing strategy via an input Callable.
  3. Possibly overlapping with feature ablation to some extent.
  4. Valid only for decoder-only models.
  5. Verify whether it would be exactly equivalent to Value Zeroing; include only if functionally different (alias otherwise).

Add custom attribution baseline

🚀 Feature Request

Add an optional baselines field to the attribute method of AttributionModel. If not specified, baselines takes a default value of None and preserves the default behavior of using UNK tokens as a "no-information" baseline for attribution methods requiring one (e.g. integrated gradients, deeplift). The argument can take one of the following values (an illustrative sketch of the proposed usage follows the list):

  • str: The baseline is an alternative text. In this case, the text needs to be encoded and embedded inside FeatureAttribution.prepare to fill the baseline_ids and baseline_embeds fields of the Batch class. For now, only strings matching the original input length after tokenization are supported.

  • sequence(int): The baseline is a list of input ids. In this case, we embed the ids as described above. Again, the length must match the original input ids length.

  • torch.tensor: We would be interested in passing baseline embeddings explicitly, e.g. to allow for baselines not matching the original input shape that could be derived by averaging embeddings of different spans. In this case, the baseline embeddings field of Batch is populated directly (after checking that the shape is consistent with input embeddings) and the baseline ids field will be populated with some special id (e.g. -1) to mark that the ids were not provided. Important: This modality should raise a ValueError if used in combination with a layer method since layer methods that require a baseline use baseline ids explicitly as inputs for the forward_func used for attribution instead of baseline embeddings.

  • tuple of previous types: If we want to specify both source and target baselines when using attribute_target=True, the input will be a tuple of one of the previous types. The same procedure will be applied separately to define source and target baselines, except for the encoding that will require the tokenizer.as_target_tokenizer() context manager to encode strings.

  • list or tuple of lists of previous types: When multiple baselines are specified, we return the expected attribution score (i.e. average, assuming normality) by computing attributions for all available baselines and averaging the final results. See Section 2.2 of Erion et al. 2020 for more details.
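A purely illustrative sketch of what the proposed baselines argument could look like in use; this reflects the proposal above, not the current Inseq interface:

import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "integrated_gradients")

# Single string baseline: must match the tokenized length of the input
out = model.attribute("This is an example sentence", baselines="That was one random utterance")

# Multiple baselines: attributions would be computed per baseline and averaged
out = model.attribute(
    "This is an example sentence",
    baselines=["That was one random utterance", "Here goes another neutral sentence"],
)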

🔈 Motivation

When working on minimal pairs, we might be interested in defining the contribution of specific words in the source or the target prefix not only in absolute terms by using a "no-information" baseline, but as the relative effect between the words composing the pair. Adding the possibility of using a custom baseline would enable this type of comparison.

🛰 Notes

  • It will be important to validate whether the hooked method makes use of a baseline via the use_baseline attribute, raising a warning that the value of the custom input baseline will be ignored otherwise.

  • Since baselines will support all input types (str, ids, embeds), it would be the right time to enable such support for the input of the attribute function. This could be achieved by an extra attribution_input field set to None by default that will substitute input_texts in the call to prepare_and_attribute, and get set to input_texts if not specified.

Bug-Tracker MPS issues

πŸ› Bug Report

Even after updating to the newest PyTorch version (1.13.1), several issues with the MPS backend still remain when it is enabled in the code. There still seems to be some inconsistency across the different devices depending on the operations that are run, as can be seen below.

The goal of this issue is primarily to collect and highlight these problems.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Go to inseq/utils/torch_utils and change "cpu" to "mps" in line 229 to enable the MPS backend
  2. Run make fast-test to run the tests

Code sample

see above

Environment

  • OS: macOS
  • Python version: 3.9.7

Screenshots

Running the tests this way generates the following error report:

========================================================================================== short test summary info ===========================================================================================
FAILED tests/attr/feat/test_feature_attribution.py::test_mcd_weighted_attribution - NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - NotImplementedError: The operator 'aten::cumsum.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
==================================================================== 3 failed, 25 passed, 442 deselected, 6 warnings in 76.36s (0:01:16) =====================================================================

When run with the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 set, the following errors still occur:

========================================================================================== short test summary info ===========================================================================================
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - AssertionError: assert 26 == 27
==================================================================== 2 failed, 26 passed, 442 deselected, 6 warnings in 113.36s (0:01:53) ====================================================================

These errors do not occur when running the tests on other backends, implying that there is still some inconsistency between mps and the other torch backends.

📈 Expected behavior

All tests should run consistently across all torch backends.

📎 Additional context

[Summary] Add gradient-based attribution methods

🚀 Feature Request

The following is a non-exhaustive list of gradient-based feature attribution methods that could be added to the library:

Method name Source In Captum Code implementation Status
DeepLiftSHAP - ✅ pytorch/captum
GradientSHAP 1 Lundberg and Lee '17 ✅ pytorch/captum
Guided Backprop Springenberg et al. '15 ✅ pytorch/captum
LRP 2 Bach et al. '15 ✅ pytorch/captum
Guided Integrated Gradients Kapishnikov et al. '21 PAIR-code/saliency
Projected Gradient Descent (PGD) 3 Madry et al. '18, Yin et al. '22 uclanlp/NLP-Interpretation-Faithfulness
Sequential Integrated Gradients Enguehard '23 josephenguehard/time_interpret
Greedy PIG 4 Axiotis et al. '23
AttnLRP Achtibat et al. '24 rachtibat/LRP-for-Transformers

Notes:

  1. The Deconvolution method can also be added, but it seems to perform the same procedure as Guided Backprop, so it wasn't included to avoid duplication.

Footnotes

  1. The method was already present in inseq but was removed due to instability in the single-example vs. batched setting; reintroducing it will require this problem to be fixed.
  2. Custom rules for the supported architectures need to be defined in order to adapt the LRP attribution method to our use-case. An existing implementation of LRP rules for Transformer models in Tensorflow is available here: [lena-voita/the-story-of-heads](https://github.com/lena-voita/the-story-of-heads).
  3. The method leverages gradient information to perform adversarial replacement, so its collocation in the gradient-based family should be reviewed.
  4. Similar to Sequential Integrated Gradients, but instead of focusing on one word at a time, at every iteration the top features identified by attribution are fixed (i.e. the baseline is set to identity) and the remaining ones are attributed again in the next round.

Add support for PEFT models


Description

Currently, only models that are instances of PreTrainedModel are supported. It would be useful to add support for models using Parameter-Efficient Fine-Tuning (🤗 PEFT) methods.

Motivation

Adding support for 🤗 PEFT models would allow the same analyses to be performed on models optimised and trained to be efficient on consumer hardware.

Additional context

Mostly TBD, as PEFT uses a small number of (trainable) parameters different from those in the original PreTrainedModel.

Commit to Help

  • I'm willing to help with this feature.

Export the visualization without jupyter

🚀 Feature Request

Accessing the visualization (via .show()) currently requires Jupyter. It would be nice to have an option to export it as an image from the console.

🔈 Motivation

With out being a FeatureAttributionOutput, html = out.show(return_html=True) returns an error:

AttributeError: return_html=True is can be used only inside an IPython environment.

If the HTML content is returned without error, one can use e.g. imgkit to create images.
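Assuming the return_html issue above is resolved (or the call is made from a notebook), the imgkit route could look like this sketch (imgkit requires the wkhtmltoimage binary to be installed on the system):

import imgkit

html = out.show(return_html=True)  # currently raises the AttributeError above outside IPython
imgkit.from_string(html, "attribution.png")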


Add optional `NoiseTunnel` smoothing wrapper

Description

This issue requests the inclusion of a wrapper for the NoiseTunnel method to make it available for all attribution classes.

Motivation

Smoothing techniques like the one proposed by Smilkov et al. 2017 can provide a more robust estimation of feature attributions, but they are largely ignored for NLP applications. Including support for noise-injecting techniques in the library would encourage their adoption in the broader research community.
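For context, this is roughly how Captum exposes the wrapper on a plain embedding-level forward function (a standalone sketch, not the proposed Inseq integration; the toy forward function is an assumption, and argument names follow recent Captum versions):

import torch
from captum.attr import IntegratedGradients, NoiseTunnel

# Toy forward function standing in for a model's embedding-to-score pass
def forward_func(embeds: torch.Tensor) -> torch.Tensor:
    return embeds.sum(dim=(1, 2))

embeds = torch.randn(1, 5, 8)
ig = IntegratedGradients(forward_func)
smoothed_ig = NoiseTunnel(ig)
# SmoothGrad-style averaging over noisy copies of the input embeddings
attributions = smoothed_ig.attribute(embeds, nt_type="smoothgrad", nt_samples=10, stdevs=0.1)
print(attributions.shape)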

Migrate HTML visualizations to Gradio Blocks

🚀 Feature Request

Use the newly introduced PySvelte library and the Svelte framework to decouple HTML visualization from the main body of the library.

Update: Given the new capabilities of Gradio v3.0 and the introduction of support for custom components with Gradio Blocks (using Svelte as frontend) and tabbed interfaces enabling multi-visualization widgets, Gradio becomes the simplest and most interesting choice for Inseq visualizations.

torchtyping requires typeguard version <3.0

πŸ› Bug Report

In the requirements.txt it is stated that typeguard==3.0.1, but torchtyping is incompatible with typeguard versions >3.0 (https://github.com/patrick-kidger/torchtyping#installation).

For me this led to issues with importing inseq, raising the following import error:

ImportError: cannot import name 'LiteralString' from 'typing_extensions'

However, it turned out the issue there did not stem from the typing_extensions library, but the typeguard version: once I had set that to version 2.13.3 the error disappeared.

[Summary] Add internals-based feature attribution methods

🚀 Feature Request

The following is a non-exhaustive list of attention-based feature attribution methods that could be added to the library:

Method name Source Code implementation Status
Last-Layer Attention Jain and Wallace '19 successar/AttentionExplanation ✅
Aggregated Attention Jain and Wallace '19 successar/AttentionExplanation ✅
Attention Flow Abnar and Zuidema '20 samiraabnar/attention_flow
Attention Rollout Abnar and Zuidema '20 samiraabnar/attention_flow
Attention with Values Norm (Attn-N) Kobayashi et al '20 gorokoba560/norm-analysis-of-transformer
Attention with Residual Norm (AttnRes-N) Kobayashi et al '20 gorokoba560/norm-analysis-of-transformer
Attention with Attention Block Norm (AttnResLn-N or LnAttnRes-N) Kobayashi et al '21 gorokoba560/norm-analysis-of-transformer
Attention-driven Relevance Propagation Chefer et al. '21 hila-chefer/Transformer-MM-Explainability
ALTI+ Ferrando et al '22 mt-upc/transformer-contributions-nmt
GlobEnc Modarressi et al. '22 mohsenfayyaz/globenc
Attention with Attention Block + FFN Norm (AttnResLnFF-N or LnAttnResFF-N) Kobayashi et al '23 -
Attention x Transformer Block Norm Kobayashi et al '23 -
Logit Ferrando et al '23 mt-upc/logit-explanations
ALTI-Logit Ferrando et al '23 mt-upc/logit-explanations
DecompX Modarressi et al '23 mohsenfayyaz/DecompX

Notes:

  1. Add the possibility to scale attention weights by the norm of value vectors, shown to be effective for alignment and encoder models (Ferrando and Costa-jussà '21, Treviso et al. '21)
  2. The ALTI+ technique is an extension of the ALTI method by Ferrando et al. '22 (paper, code) to Encoder-Decoder architectures. It was recently used by the Facebook team to detect hallucinated toxicity by highlighting toxic keywords paying attention to the source (NLLB paper, Figure 31).
  3. Attention Flow is very computationally expensive to compute but has proven SHAP guarantees for same-layer attribution, which is not the case for Rollout or other methods. Flow and rollout should be propagation methods rather than stand-alone approaches since they are used for most attention-based attributions.
  4. GlobEnc corresponds roughly to Attention x Transformer Block Norm but ignores the FFN part, which in the latter is incorporated via a localized application of Integrated Gradients with 0-valued baselines (the authors' default)

Generation pass seems not to be batched.

πŸ› Bug Report

I'm trying to generate multiple attributions with a large LM (10B+ params) on a dataset of 2000+ sentences and no constrained decoding.
Apparently, the generate step in the pipeline crashes with CUDA OOM no matter the batch_size I set (even with batch_size=1). The generation itself seems not to be batched, since if I pass a smaller set of texts the attribution goes smoothly.

🔬 How To Reproduce

Steps to reproduce the behavior:

model = inseq.load_model(
    args.model_name_or_path, # 10B+ model
    "integrated_gradients",
    load_in_8bit=True,
    device_map="auto",
)

out = model.attribute(
    input_texts=texts, # 2000+ sentences
    n_steps=50,
    return_convergence_delta=True,
    step_scores=["probability"],
    batch_size=1,
)

Whereas if I do

model = inseq.load_model(
    args.model_name_or_path,
    "integrated_gradients",
    load_in_8bit=True,
    device_map="auto",
)

# raise NotImplementedError()
n_batches = len(texts) // args.batch_size
print("Splitting texts into n batches", n_batches)
batches = np.array_split(texts, n_batches)

for batch in tqdm(batches, desc="Batch", total=len(batches)):
    out = model.attribute(
        input_texts=batch.tolist(),
        n_steps=50,
        return_convergence_delta=True,
        step_scores=["probability"],
        batch_size=len(batch),
        internal_batch_size=len(batch),
        generation_args=asdict(generation_args),
        show_progress=True,
    )

it all seems to work.

Environment

  • OS: Linux / Windows / macOS
  • Python version:
  • Inseq version: 0.4.0

implementation of get_post_variable_assignment_hook

Question

I'm not sure if I'm looking in the wrong places or if it is missing, but I cannot find an implementation of get_post_variable_assignment_hook, which is mentioned in the __all__ of inseq.utils and used in the value-zeroing implementation.

Checklist

  • I've searched the project's issues.

Bug: MPS not working properly on pytorch 1.12

πŸ› Bug Report

As long as PyTorch 1.12 is still used (basically until 1.13.1 comes out), the "mps" backend seems to be too unstable to use, failing several of the tests. Even setting PYTORCH_ENABLE_MPS_FALLBACK=1 in the environment does not fully remove this issue.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. run make fast-test (or any other command) on macOS with "mps" support

Environment

  • OS: macOS
  • Python version: 3.9.7

📈 Expected behavior

Tests should run successfully

📎 Additional context

A quickfix would be to set the default device in inseq.utils.torch_utils.py to "cpu" for mps-environments as well for now, until pytorch 1.13.1 is released.

def get_default_device() -> str:
    if is_cuda_available() and is_cuda_built():
        return "cuda"
    elif is_mps_available() and is_mps_built():
        # Intentionally fall back to CPU until the MPS backend stabilizes (PyTorch >= 1.13.1)
        return "cpu"
    else:
        return "cpu"

Add Saliency Cards to documentation

Description

Saliency cards (Paper | Repository) introduce a structured framework to document feature attribution methods' strengths and applicability to different use-cases. Introducing saliency cards specific to sequential generation tasks would help Inseq users in selecting more principled approaches for their analysis.

Motivation

Copying from the original paper's abstract:

Saliency methods are a common class of machine learning interpretability techniques that calculate how important each input feature is to a model’s output. We find that, with the rapid pace of development, users struggle to stay informed of the strengths and limitations of new methods and, thus, choose methods for unprincipled reasons (e.g., popularity). Moreover, despite a corresponding rise in evaluation metrics, existing approaches assume universal desiderata for saliency methods (e.g., faithfulness) that do not account for diverse user needs. In response, we introduce saliency cards: structured documentation of how saliency methods operate and their performance across a battery of evaluative metrics.

Additional context

  • Introducing ad-hoc cards in Inseq should be preferable to contributing to the original saliency cards repository since 1) they will be more easily used and improved by the Inseq community and 2) the original authors focus solely on vision-centric applications.

  • The following sections are relevant for the integration of saliency cards into Inseq:

    • Determinism: Determinism measures if a saliency method will always produce the same saliency map given a particular input, label, and model.

    • Hyperparameter Dependence: Hyperparameter dependence measures a saliency method’s sensitivity to user-specified parameters. By documenting a method’s hyperparameter dependence, saliency cards inform users of consequential parameters and how to set them appropriately.

    • Model Agnosticism: Model agnosticism measures how much access to the model a saliency method requires. Since several future methods need access to specific modules (see #173 for example), this part could document which parameters will need to be defined in the ModelConfig class before usage.

    • Computational Efficiency: Computational efficiency measures how computationally intensive it is to produce the saliency map. Using the same models, we could report unified benchmarks across different methods (and different parameterizations, in some cases).

    • Semantic Directness: Saliency methods abstract different aspects of model behavior, and semantic directness represents the complexity of this abstraction (i.e. what the reported scores correspond to). For example, discussing the difference between salience and sensitivity for raw gradients vs. input x gradient (see Appendix B of Geva et al. 2023)

    • (Added) Granularity: Specifying the granularity of the scores returned by the attribution method (e.g. raw gradient attribution returns one score per hidden dimension of the model embeddings, corresponding to the gradient with respect to the attributed_fn propagated through the model).

    • (Added) Target dependence: Specifying whether the method relies on model final predictions to derive importance scores, or whether these are extracted from model internal processes (e.g. for raw attention weights).

  • The Sensitivity Testing and Perceptibility Testing sections describe empirical measurements of minimality/robustness rather than inherent properties of methods. As such, they should be added only in the presence of a reproducible study using Inseq to compare different methods.

Add `scores_precision` parameter to `FeatureAttributionOutput.save`

Description

This issue addresses the high space requirements of large attribution scores tensors by adding a scores_precision parameter to FeatureAttributionOutput.save method.

Proposed by: @g8a9

Motivation

Currently, tensors in FeatureAttributionOutput objects (attributions and step scores) are serialized in float32 precision as a default when using out.save(). While it is possible to compress the representation of these values with ndarray_compact=True, the resulting JSON files are usually quite large. Using more parsimonious data types could reduce the size of saved objects and facilitate systematic analyses leveraging large amounts of data.

Proposal

float32 precision should probably remain the default behavior, as we do not want to cause any information loss by default.

float16 and float8 should also be considered, both in the signed and unsigned variants, since leveraging the strictly positive nature of some score types would allow supporting greater precision while halving space requirements. Unsigned values will be used as defaults if no negative scores are present in a tensor.

float16 can be easily used by casting tensors to the native torch.float16 data type, which would preserve precision up to 4 decimal values for scores normalized in the [-1;1] interval (8 for unsigned tensors). This corresponds to 2 or 4 decimal places for float8. However, float8 is not supported natively in PyTorch, so tensors should be converted to torch.int8 and torch.uint8 instead and converted back to floats upon reloading the object.
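A hedged sketch of the int8 round-trip suggested above for scores normalized in [-1, 1]; the scaling scheme is an assumption, not the final design:

import torch

def to_int8(scores: torch.Tensor) -> torch.Tensor:
    # Map [-1, 1] float scores to the signed int8 range
    return (scores.clamp(-1, 1) * 127).round().to(torch.int8)

def from_int8(scores: torch.Tensor) -> torch.Tensor:
    # Recover approximate float32 scores on load
    return scores.to(torch.float32) / 127

scores = torch.rand(4, 4) * 2 - 1
restored = from_int8(to_int8(scores))
print((scores - restored).abs().max())  # quantization error around 4e-3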

Add registered contrastive logits difference step function

Motivation

Contrastive attributions are currently supported thanks to custom attributed targets (see #138). The current definition of the contrastive attribution custom function can be found here.

The current implementation is problematic for integrated_gradients and similar methods using multiple approximation steps since the contrastive forward uses static ids instead of embeddings obtained as steps between the original contrastive input and a baseline. Moreover, the current implementation allows only for granular token-based comparisons proposed in the original work by Yin and Neubig (2022), but comparing spans of different lengths could also be desirable.

Given this will add further complexity to the custom attributed function, and given the interest in such an application, it would be ideal to include a pre-registered version of the contrastive attribution step function inside Inseq to enable easy and quick usage.

Design

Step function name: contrast_probs_diff, since the contrastive comparison is done by taking the difference of output probabilities between a regular and a contrastive example.

Extra arguments:

  • contrast_input: required, can be either an input text, a sequence of ids or embeddings for the contrastive example. The function will handle the formatting to match the original input.
  • input_start_span_ids, contrast_start_span_ids: Two lists containing the initial ids of every span in the input and the contrast that we want to consider as single units for attribution purposes (e.g. [0, 2, 5] for input_start_span_ids). By default None, set to list(range(len(input_ids))) and list(range(len(contrast_ids))) respectively (i.e. every token is treated separately, as in normal feature attribution).

Notes for span ids:

  • Must verify that all ids are valid given the string length, and that the two lists have the same length to ensure a contrastive comparison for every span step.
  • The step function is called at every token, but we consider multi-token spans. If the current token is a span start token for the input, we compute the product of probabilities for all tokens in the current input span and the corresponding product for all tokens in the current contrast span, and output their difference. If it is not a start token (i.e. its id is not included in input_start_span_ids), 0 is returned.

Aggregation:

The default abs_max function used for span aggregation (span_aggregate) of attributions will return the attributions for the first token of every span (see description in the previous section) if the same spans are used with a ContiguousSpanAggregator. The aggregate_map for the step score should also be set to abs_max upon registration.

Add Fairseq support

🚀 Feature Request

Adding support for Fairseq models on top of the AttributionModel abstraction, similarly to what was done for πŸ€— transformers models.

🔈 Motivation

pytorch/fairseq is a core library for training seq2seq models in Pytorch. Adding support would allow for extended experimentation with state-of-the-art model, especially for NMT.

πŸ”— Additiona details

keyonvafa/sequential-rationales uses different attribution methods on FairseqEncoderDecoderModel models and can provide inspiration for an implementation aiming to access the internals of such models.
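
For reference, this is the kind of model such a wrapper would need to expose. The torch.hub entry point and the translate call follow the fairseq documentation (fairseq, sacremoses and fastBPE must be installed); any Inseq-side integration remains hypothetical at this stage.

import torch

# Load a pretrained WMT19 en-de transformer through the fairseq torch.hub entry point
en2de = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt19.en-de",
    checkpoint_file="model1.pt",
    tokenizer="moses",
    bpe="fastbpe",
)
en2de.eval()
print(en2de.translate("Machine learning is great!"))

# The underlying FairseqEncoderDecoderModel whose encoder/decoder internals an
# AttributionModel wrapper would need to access:
fairseq_model = en2de.models[0]
print(type(fairseq_model).__name__)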

Confusing call of `merge_attributions` method

Description

The merge_attributions method of the FeatureAttributionOutput class can be used as a static method, but it can also be called from an instance of that class.
In the latter case the output is unintended, since the merge is performed only for the objects in the parameter list and not with the calling object itself.
Therefore, it would be advisable to move the method out of the class and use it as a module-level utility (a sketch is given after the examples below).

Motivation

See above.

Additional context

Example code used as a demonstration of the method behaviour:

>>> import inseq
>>> seq_model = inseq.load_model('gpt2', attribution_method='input_x_gradient')
>>> out1 = seq_model.attribute(input_texts=['hello world!', 'How are you? '])
>>> out2 = seq_model.attribute(input_texts=['I am going to ', 'My name is '])

Correct usage:

>>> inseq.FeatureAttributionOutput.merge_attributions([out1, out2])
FeatureAttributionOutput({
    sequence_attributions: list with 4 elements of type GranularFeatureAttributionSequenceOutput:[
...

Misleading behaviour, leading to the loss of attributions contained in out1:

>>> out1.merge_attributions([out2])
FeatureAttributionOutput({
    sequence_attributions: list with 2 elements of type GranularFeatureAttributionSequenceOutput:[
...
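
A minimal sketch of the proposed module-level utility; the function name merge_attribution_outputs is hypothetical and simply delegates to the existing static method, so that no calling object can be silently dropped.

from typing import Sequence

from inseq import FeatureAttributionOutput

def merge_attribution_outputs(
    outputs: Sequence[FeatureAttributionOutput],
) -> FeatureAttributionOutput:
    # Delegate to the existing static method with the full list of outputs,
    # so out1 in the example above cannot be left out by mistake.
    return FeatureAttributionOutput.merge_attributions(list(outputs))

# merged = merge_attribution_outputs([out1, out2])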

Commit to Help

  • I'm willing to help with this feature.

Bug about get_scores_dicts() function

πŸ› Bug Report

Hello, thanks for your contribution firstly! The tool is really helpful and beautiful.
However, when I try to use get_scores_dicts() to output the attributions, I got an error 'index out of bounds'.
image
I print out the aggr and find it seems to be a bug:
image

πŸ”¬ How To Reproduce

import inseq
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
import pandas as pd

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

data = [
        {
            'prompt': 'people usually take shower in the',
            'target_true': ' morning',
            'target_false': ' evening',
            }]

for layer in range(24):
    print('layer:', str(layer))
    attrib_model = inseq.load_model(
        model,
        "layer_gradient_x_activation",
        tokenizer="bigscience/bloom-560m",
        target_layer=model.transformer.h[layer].mlp,
    )
    for i, ex in enumerate(data):
        print(ex)
        # e.g. "The capital of Spain is"
        #prompt = ex["relation"].format(ex["subject"])
        prompt = ex["prompt"]
        # e.g. "The capital of Spain is Madrid"
        true_answer = prompt + ex["target_true"]
        # e.g. "The capital of Spain is Paris"
        false_answer = prompt + ex["target_false"]
        # Contrastive attribution of true vs false answer
        out = attrib_model.attribute(
            prompt,
            true_answer,
            attributed_fn="contrast_prob_diff",
            contrast_targets=false_answer,
            step_scores=["contrast_prob_diff"],
            show_progress=False,
        )
        out.show()
        out.get_scores_dicts()

Environment

  • OS: Linux
  • Python version: 3.11
  • Inseq version: 0.5.0.dev0

Add ALTI+ implementation

Description

The ALTI+ method is an extension of ALTI for encoder-decoder (and by extension, decoder-only) models.

Authors: @gegallego @javiferran

Implementation notes:

  • The current implementation extracts input features for the key, query and value projections and computes intermediate steps using the Kobayashi refactoring to obtain the transformed vectors used in the final ALTI computation.

  • The computation of attention layer outputs is carried out up to the resultant (i.e. the actual output of the attention layer) in order to check that the result matches the original output of the attention layer forward pass. This is only done for sanity-checking purposes, but it is not especially heavy from a computational perspective, so it can be preserved (e.g. raise an error if the outputs don't match, to signal that the model may not be supported).

  • Focusing on GPT-2 as an example model, the per-head attention weights and outputs (i.e. matmul of weights and value vectors) are returned here so they can be extracted with a hook and used to compute the transformed vectors needed for ALTI.

  • Pre- and post-layer-norm models are handled differently because the transformed vectors are the final outputs of the attention block, regardless of the position of the layer norm (it needs to be included in any case). In the Kobayashi decomposition of the attention layer the bias component needs to be kept separate both for the layer norm and for the output projection, so we need to check whether this is possible out of the box or whether it needs to be computed in an ad-hoc hook.

  • If we are interested in the output vectors before the bias is added, we can extract the bias vector alongside the output of the attention module and subtract the former from the latter (see the sketch after the reference below).

  • For aggregating ALTI+ scores in order to obtain overall importance we will use the extended rollout implementation that is currently being developed in #173.

Reference implementation: mt-upc/transformer-contributions-nmt
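
The following is a minimal sketch, not the ALTI+ implementation itself: it only shows how per-layer attention outputs can be captured with forward hooks and how pre-bias vectors can be recovered by subtracting the output projection bias, assuming the Hugging Face GPT-2 module naming (model.transformer.h[l].attn, with output projection attn.c_proj).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.eval()

captured = {}

def make_hook(layer_idx):
    def hook(module, inputs, outputs):
        # outputs[0] is the attention block output after the c_proj projection
        attn_out = outputs[0]
        # Pre-bias output, i.e. the sum of per-head contributions only
        captured[layer_idx] = attn_out - module.c_proj.bias
    return hook

handles = [
    block.attn.register_forward_hook(make_hook(i))
    for i, block in enumerate(model.transformer.h)
]

enc = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    out = model(**enc, output_attentions=True)  # per-head weights in out.attentions

for handle in handles:
    handle.remove()

print(len(captured), captured[0].shape)  # one pre-bias attention output per layer

The per-head value-weighted outputs needed for the full decomposition would additionally require hooking inside the attention module itself, as noted in the GPT-2 point above.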

Add target-side attribution

🚀 Feature Request

Allow users to perform feature attributions on the target prefix. The behavior is controlled by a new attribute_target: bool = False parameter passed to the AttributionModel.attribute method.

🔈 Motivation

Attributing only on the source is reductive, since the influence of the target prefix is often fundamental in determining the outcome of the next generation step (e.g. a prefix Ladies and will strongly bias the next token towards Gentlemen, regardless of the source sequence).
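
A sketch of the proposed usage, assuming the parameter name introduced above and the Marian model used elsewhere in these issues:

import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "input_x_gradient")
# With attribute_target=True, attributions are computed for both the source
# tokens and the generated target prefix at every step.
out = model.attribute(
    "Ladies and gentlemen, welcome on board.",
    attribute_target=True,
)
out.show()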

Test case for larger (prompting-based) language models (+ LLM.int8())

🚀 Feature Request

Following our discussion, it might be valuable to add a test case including larger language models that work with prompting, such as T0 and its variants. Since the availability of these kinds of models is becoming more common (see "Motivation"), we should show an example of feature attribution for them.

🔈 Motivation

The bitsandbytes integration and 8-bit precision (Dettmers et al., 2022) released in August enable the use of larger models on single-GPU setups.
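
A hedged sketch of what such a test case could look like: the model name, the attention-based method and the 8-bit loading flags are illustrative assumptions (gradient-based methods may require extra care with int8 weights), and bitsandbytes plus accelerate must be installed.

import inseq
from transformers import AutoModelForSeq2SeqLM

model_name = "bigscience/T0_3B"  # illustrative prompting-based model
hf_model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,  # LLM.int8() quantization via bitsandbytes
)
inseq_model = inseq.load_model(hf_model, "attention", tokenizer=model_name)

out = inseq_model.attribute(
    "Is this review positive or negative? Review: the best cast iron skillet you will ever buy.",
    show_progress=False,
)
out.show()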

Inconsistent batching for DiscretizedIntegratedGradients attributions

πŸ› Bug Report

Despite fixing batched attribution so that results are consistent with individual attribution (see #110), the method DiscretizedIntegratedGradients still produces different results when applied to a batch of examples.

🔬 How To Reproduce

  1. Instantiate an AttributionModel with the discretized_integrated_gradients method.
  2. Perform an attribution for a batch of examples
  3. Perform an attribution for a single example present in the previous batch
  4. Compare the attributions obtained in the two cases

Code sample

import torch

import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "discretized_integrated_gradients")

out_multi = model.attribute(
    [
        "This aspect is very important",
        "Why does it work after the first?",
        "This thing smells",
        "Colorless green ideas sleep furiously"
    ],
    n_steps=20,
    return_convergence_delta=True,
)

out_single = model.attribute(
    [ "Why does it work after the first?" ],
    n_steps=20,
    return_convergence_delta=True,
)

assert torch.allclose(
    out_single.sequence_attributions[0].source_attributions,
    out_multi.sequence_attributions[1].source_attributions,
)  # raises AssertionError

Environment

  • OS: Ubuntu 20.04
  • Python version: 3.8

📈 Expected behavior

Same as #110

📎 Additional context

The problem is most likely due to a faulty scaling of the gradients in the _attribute method of the DiscretizedIntegratedGradients class.

Bug about 'contrast_prob_diff'

πŸ› Bug Report

Hi Inseq team, thanks again for your contribution.
I just noticed a bug when using 'contrast_prob_diff' - output attributions seem to be reversed.
It outputs the attribution of false_answer but not true_answer.

πŸ”¬ How To Reproduce

What I did is:

  1. using contrast_prob_diff, the output attributions for 'morning' and 'evening' are:
    (screenshots)

  2. using probability (where the variable target_false is unused), the output attributions for 'morning' and 'evening' are:
    (screenshots)

The code for the attribute part is shown in the screenshot below (top: contrast_prob_diff; bottom: probability):
(screenshot)
