transformers-interpret's Introduction


Explainability for any 🤗 Transformers models in 2 lines.

Transformers Interpret is a model explainability tool designed to work exclusively with the 🤗 transformers package.

In line with the philosophy of the Transformers package, Transformers Interpret allows any transformers model to be explained in just two lines. Explainers are available for both text and computer vision models. Visualizations are also available in notebooks and as savable PNG and HTML files.

Check out the streamlit demo app here

Install

pip install transformers-interpret

Quick Start

Text Explainers

Sequence Classification Explainer and Pairwise Sequence Classification

Let's start by initializing a transformers' model and tokenizer, and running it through the `SequenceClassificationExplainer`.

For this example we are using distilbert-base-uncased-finetuned-sst-2-english, a DistilBERT model fine-tuned on a sentiment analysis task.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# With both the model and tokenizer initialized we are now able to get explanations on an example text.

from transformers_interpret import SequenceClassificationExplainer
cls_explainer = SequenceClassificationExplainer(
    model,
    tokenizer)
word_attributions = cls_explainer("I love you, I like you")

Which will return the following list of tuples:

>>> word_attributions
[('[CLS]', 0.0),
 ('i', 0.2778544699186709),
 ('love', 0.7792370723380415),
 ('you', 0.38560088858031094),
 (',', -0.01769750505546915),
 ('i', 0.12071898121557832),
 ('like', 0.19091105304734457),
 ('you', 0.33994871536713467),
 ('[SEP]', 0.0)]

Positive attribution numbers indicate that a word contributes positively towards the predicted class, while negative numbers indicate that a word contributes negatively towards the predicted class. Here we can see that "I love you" receives the most positive attribution.
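Because the return value is an ordinary list of (token, score) tuples, you can also post-process it with plain Python (this is not a library call); for example, to list the strongest positive contributors:

top_attributions = sorted(word_attributions, key=lambda pair: pair[1], reverse=True)[:3]
print(top_attributions)
# e.g. [('love', 0.779...), ('you', 0.385...), ('you', 0.339...)]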

If you want to know what the predicted class actually is, you can use predicted_class_index:

>>> cls_explainer.predicted_class_index
array(1)

And if the model has label names for each class, we can see these too using predicted_class_name:

>>> cls_explainer.predicted_class_name
'POSITIVE'

Visualize Classification attributions

Sometimes the numeric attributions can be difficult to read, particularly when there is a lot of text. To help with that we also provide the visualize() method, which uses Captum's built-in visualization library to create an HTML file highlighting the attributions.

If you are in a notebook, calls to the visualize() method will display the visualization in-line. Alternatively you can pass a filepath in as an argument and an HTML file will be created, allowing you to view the explanation HTML in your browser.

cls_explainer.visualize("distilbert_viz.html")

Explaining Attributions for a Non-Predicted Class

Attribution explanations are not limited to the predicted class. Let's test a more complex sentence that contains mixed sentiments.

In the example below we pass class_name="NEGATIVE" as an argument, indicating that we would like the attributions to be explained for the NEGATIVE class regardless of what the actual prediction is. Effectively, because this is a binary classifier, we are getting the inverse attributions.

cls_explainer = SequenceClassificationExplainer(model, tokenizer)
attributions = cls_explainer("I love you, I like you, I also kinda dislike you", class_name="NEGATIVE")

In this case, predicted_class_name still returns the POSITIVE class, because the model has made the same prediction; nonetheless, we can look at the attributions for the NEGATIVE class regardless of the predicted result.

>>> cls_explainer.predicted_class_name
'POSITIVE'

But when we visualize the attributions we can see that the words "...kinda dislike" are contributing to a prediction of the "NEGATIVE" class.

cls_explainer.visualize("distilbert_negative_attr.html")

Getting attributions for different classes is particularly insightful for multiclass problems as it allows you to inspect model predictions for a number of different classes and sanity-check that the model is "looking" at the right things.
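As a minimal sketch (assuming the model's config populates id2label, as Transformers models typically do), you could loop over every class name and collect attributions for each:

# illustrative helper: collect attributions for every label defined in the model config
all_attributions = {}
for label in model.config.id2label.values():
    all_attributions[label] = cls_explainer(
        "I love you, I like you, I also kinda dislike you", class_name=label
    )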

For a detailed explanation of this example please check out this multiclass classification notebook.

Pairwise Sequence Classification

The PairwiseSequenceClassificationExplainer is a variant of the SequenceClassificationExplainer designed to work with classification models that expect the input sequence to be two inputs separated by the model's separator token. Common examples of this are NLI models and cross-encoders, which are typically used to score the similarity of two inputs to one another.

This explainer calculates pairwise attributions for two passed inputs text1 and text2 using the model and tokenizer given in the constructor.

Also, since a common use case for pairwise sequence classification is to compare the similarity of two inputs, models of this nature typically have a single output node rather than one per class. The pairwise sequence classification explainer has some useful utility functions to make interpreting single-node outputs clearer.

By default, for models that output a single node, the attributions are with respect to the inputs pushing the score closer to 1.0; if you want to see the attributions with respect to scores closer to 0.0 you can pass flip_sign=True. This is useful for similarity-based models: the model might predict a score closer to 0.0 for the two inputs, in which case flipping the sign of the attributions explains why the two inputs are dissimilar.

Let's start by initializing a cross-encoder model and tokenizer from the suite of pre-trained cross-encoders provided by sentence-transformers.

For this example we are using "cross-encoder/ms-marco-MiniLM-L-6-v2", a high quality cross-encoder trained on the MSMarco dataset a passage ranking dataset for question answering and machine reading comprehension.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

from transformers_interpret import PairwiseSequenceClassificationExplainer

model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")

pairwise_explainer = PairwiseSequenceClassificationExplainer(model, tokenizer)

# The pairwise explainer requires two string inputs. In this case, given the nature of the model,
# we pass a query string and a context string. The question we are asking of our model is
# "does this context contain a valid answer to our question?" - the higher the score, the better the fit.

query = "How many people live in Berlin?"
context = "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."
pairwise_attr = pairwise_explainer(query, context)

Which returns the following attributions:

>>> pairwise_attr
[('[CLS]', 0.0),
 ('how', -0.037558652124213034),
 ('many', -0.40348581975409786),
 ('people', -0.29756140282349425),
 ('live', -0.48979015417391764),
 ('in', -0.17844527885888117),
 ('berlin', 0.3737346097442739),
 ('?', -0.2281428913480142),
 ('[SEP]', 0.0),
 ('berlin', 0.18282430604641564),
 ('has', 0.039114659489254834),
 ('a', 0.0820056652212297),
 ('population', 0.35712150914643026),
 ('of', 0.09680870840224687),
 ('3', 0.04791760029513795),
 (',', 0.040330986539774266),
 ('520', 0.16307677913176166),
 (',', -0.005919693904602767),
 ('03', 0.019431649515841844),
 ('##1', -0.0243808667024702),
 ('registered', 0.07748341753369632),
 ('inhabitants', 0.23904087299731255),
 ('in', 0.07553221327346359),
 ('an', 0.033112821611999875),
 ('area', -0.025378852244447532),
 ('of', 0.026526373859562906),
 ('89', 0.0030700151809002147),
 ('##1', -0.000410387092186983),
 ('.', -0.0193147139126114),
 ('82', 0.0073800833347678774),
 ('square', 0.028988305990861576),
 ('kilometers', 0.02071182933829008),
 ('.', -0.025901070914318036),
 ('[SEP]', 0.0)]

Visualize Pairwise Classification attributions

Visualizing the pairwise attributions is no different from the sequence classification explainer. We can see that in both the query and the context there is a lot of positive attribution for the word berlin, as well as for the words population and inhabitants in the context - good signs that our model understands the textual context of the question asked.

pairwise_explainer.visualize("cross_encoder_attr.html")

If we were more interested in highlighting the input attributions that pushed the model away from the positive class of this single node output we could pass:

pairwise_attr = pairwise_explainer(query, context, flip_sign=True)

This simply inverts the sign of the attributions ensuring that they are with respect to the model outputting 0 rather than 1.

MultiLabel Classification Explainer

This explainer is an extension of the SequenceClassificationExplainer and is thus compatible with all sequence classification models from the Transformers package. The key change in this explainer is that it calculates attributions for each label in the model's config and returns a dictionary of word attributions w.r.t. each label. The visualize() method also displays a table of attributions calculated per label.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import MultiLabelClassificationExplainer

model_name = "j-hartmann/emotion-english-distilroberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


cls_explainer = MultiLabelClassificationExplainer(model, tokenizer)


word_attributions = cls_explainer("There were many aspects of the film I liked, but it was frightening and gross in parts. My parents hated it.")

This produces a dictionary of word attributions, mapping each label to a list of (word, attribution score) tuples.

Click to see word attribution dictionary
>>> word_attributions
{'anger': [('<s>', 0.0),
           ('There', 0.09002208622000409),
           ('were', -0.025129709879675187),
           ('many', -0.028852677974079328),
           ('aspects', -0.06341968013631565),
           ('of', -0.03587626320752477),
           ('the', -0.014813095892961287),
           ('film', -0.14087587475098232),
           ('I', 0.007367876912617766),
           ('liked', -0.09816592066307557),
           (',', -0.014259517291745674),
           ('but', -0.08087144668471376),
           ('it', -0.10185214349220136),
           ('was', -0.07132244710777856),
           ('frightening', -0.4125361737439814),
           ('and', -0.021761663818889918),
           ('gross', -0.10423745223600908),
           ('in', -0.02383646952201854),
           ('parts', -0.027137622525091033),
           ('.', -0.02960415694062459),
           ('My', 0.05642774605113695),
           ('parents', 0.11146648216326158),
           ('hated', 0.8497975489280364),
           ('it', 0.05358116678115284),
           ('.', -0.013566277162080632),
           ('', 0.09293256725788422),
           ('</s>', 0.0)],
 'disgust': [('<s>', 0.0),
             ('There', -0.035296263203072),
             ('were', -0.010224922196739717),
             ('many', -0.03747571761725605),
             ('aspects', 0.007696321643436715),
             ('of', 0.0026740873113235107),
             ('the', 0.0025752851265661335),
             ('film', -0.040890035285783645),
             ('I', -0.014710007408208579),
             ('liked', 0.025696806663391577),
             (',', -0.00739107098314569),
             ('but', 0.007353791868893654),
             ('it', -0.00821368234753605),
             ('was', 0.005439709067819798),
             ('frightening', -0.8135974168445725),
             ('and', -0.002334953123414774),
             ('gross', 0.2366024374426269),
             ('in', 0.04314772995234148),
             ('parts', 0.05590472194035334),
             ('.', -0.04362554293972562),
             ('My', -0.04252694977895808),
             ('parents', 0.051580790911406944),
             ('hated', 0.5067406070057585),
             ('it', 0.0527491071885104),
             ('.', -0.008280280618652273),
             ('', 0.07412384603053103),
             ('</s>', 0.0)],
 'fear': [('<s>', 0.0),
          ('There', -0.019615758046045408),
          ('were', 0.008033402634196246),
          ('many', 0.027772367717635423),
          ('aspects', 0.01334130725685673),
          ('of', 0.009186049991879768),
          ('the', 0.005828877177384549),
          ('film', 0.09882910753644959),
          ('I', 0.01753565003544039),
          ('liked', 0.02062597344466885),
          (',', -0.004469530636560965),
          ('but', -0.019660439408176984),
          ('it', 0.0488084071292538),
          ('was', 0.03830859527501167),
          ('frightening', 0.9526443954511705),
          ('and', 0.02535156284103706),
          ('gross', -0.10635301961551227),
          ('in', -0.019190425328209065),
          ('parts', -0.01713006453323631),
          ('.', 0.015043169035757302),
          ('My', 0.017068079071414916),
          ('parents', -0.0630781275517486),
          ('hated', -0.23630028921273583),
          ('it', -0.056057044429020306),
          ('.', 0.0015102052077844612),
          ('', -0.010045048665404609),
          ('</s>', 0.0)],
 'joy': [('<s>', 0.0),
         ('There', 0.04881772670614576),
         ('were', -0.0379316152427468),
         ('many', -0.007955371089444285),
         ('aspects', 0.04437296429416574),
         ('of', -0.06407011137335743),
         ('the', -0.07331568926973099),
         ('film', 0.21588462483311055),
         ('I', 0.04885724513463952),
         ('liked', 0.5309510543276107),
         (',', 0.1339765195225006),
         ('but', 0.09394079060730279),
         ('it', -0.1462792330432028),
         ('was', -0.1358591558323458),
         ('frightening', -0.22184169339341142),
         ('and', -0.07504142930419291),
         ('gross', -0.005472075984252812),
         ('in', -0.0942152657437379),
         ('parts', -0.19345218754215965),
         ('.', 0.11096247277185402),
         ('My', 0.06604512262645984),
         ('parents', 0.026376541098236207),
         ('hated', -0.4988319510231699),
         ('it', -0.17532499366236615),
         ('.', -0.022609976138939034),
         ('', -0.43417114685294833),
         ('</s>', 0.0)],
 'neutral': [('<s>', 0.0),
             ('There', 0.045984598036642205),
             ('were', 0.017142566357474697),
             ('many', 0.011419348619472542),
             ('aspects', 0.02558593440287365),
             ('of', 0.0186162232003498),
             ('the', 0.015616416841815963),
             ('film', -0.021190511300570092),
             ('I', -0.03572427925026324),
             ('liked', 0.027062554960050455),
             (',', 0.02089914209290366),
             ('but', 0.025872618597570115),
             ('it', -0.002980407262316265),
             ('was', -0.022218157611174086),
             ('frightening', -0.2982516449116045),
             ('and', -0.01604643529040792),
             ('gross', -0.04573829263548096),
             ('in', -0.006511536166676108),
             ('parts', -0.011744224307968652),
             ('.', -0.01817041167875332),
             ('My', -0.07362312722231429),
             ('parents', -0.06910711601816408),
             ('hated', -0.9418903509267312),
             ('it', 0.022201795222373488),
             ('.', 0.025694319747309045),
             ('', 0.04276690822325994),
             ('</s>', 0.0)],
 'sadness': [('<s>', 0.0),
             ('There', 0.028237893283377526),
             ('were', -0.04489910545229568),
             ('many', 0.004996044977269471),
             ('aspects', -0.1231292680125582),
             ('of', -0.04552690725956671),
             ('the', -0.022077819961347042),
             ('film', -0.14155752357877663),
             ('I', 0.04135347872193571),
             ('liked', -0.3097732540526099),
             (',', 0.045114660009053134),
             ('but', 0.0963352125332619),
             ('it', -0.08120617610094617),
             ('was', -0.08516150809170213),
             ('frightening', -0.10386889639962761),
             ('and', -0.03931986389970189),
             ('gross', -0.2145059013625132),
             ('in', -0.03465423285571697),
             ('parts', -0.08676627134611635),
             ('.', 0.19025217371906333),
             ('My', 0.2582092561303794),
             ('parents', 0.15432351476960307),
             ('hated', 0.7262186310977987),
             ('it', -0.029160655114499095),
             ('.', -0.002758524253450406),
             ('', -0.33846410359182094),
             ('</s>', 0.0)],
 'surprise': [('<s>', 0.0),
              ('There', 0.07196110795254315),
              ('were', 0.1434314520711312),
              ('many', 0.08812238369489701),
              ('aspects', 0.013432396769890982),
              ('of', -0.07127508805657243),
              ('the', -0.14079766624810955),
              ('film', -0.16881201614906485),
              ('I', 0.040595668935112135),
              ('liked', 0.03239855530171577),
              (',', -0.17676382558158257),
              ('but', -0.03797939330341559),
              ('it', -0.029191325089641736),
              ('was', 0.01758013584108571),
              ('frightening', -0.221738963726823),
              ('and', -0.05126920277135527),
              ('gross', -0.33986913466614044),
              ('in', -0.018180366628697),
              ('parts', 0.02939418603252064),
              ('.', 0.018080129971003226),
              ('My', -0.08060162218059498),
              ('parents', 0.04351719139081836),
              ('hated', -0.6919028585285265),
              ('it', 0.0009574844165327357),
              ('.', -0.059473118237873344),
              ('', -0.465690452620123),
              ('</s>', 0.0)]}
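Since the result is just a dictionary of (token, score) lists, it is easy to post-process with plain Python; for example, to pull out the most strongly attributed token for each label:

for label, attributions in word_attributions.items():
    token, score = max(attributions, key=lambda pair: pair[1])
    print(f"{label}: {token} ({score:.3f})")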

Visualize MultiLabel Classification attributions

Sometimes the numeric attributions can be difficult to read, particularly when there is a lot of text. To help with that we also provide the visualize() method, which uses Captum's built-in visualization library to create an HTML file highlighting the attributions. For this explainer, attributions are shown w.r.t. each label.

If you are in a notebook, calls to the visualize() method will display the visualization in-line. Alternatively you can pass a filepath in as an argument and an HTML file will be created, allowing you to view the explanation HTML in your browser.

cls_explainer.visualize("multilabel_viz.html")
Zero Shot Classification Explainer

Models used with this explainer must have been trained on NLI classification downstream tasks and have a label in the model's config called either "entailment" or "ENTAILMENT".

This explainer allows attributions to be calculated for zero-shot classification. To achieve this we use the same methodology employed by Hugging Face: zero-shot classification works by exploiting the "entailment" label of NLI models. Here is a link to a paper explaining more about it. A list of NLI models guaranteed to be compatible with this explainer can be found on the model hub.
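To make the trick concrete, here is a rough standalone sketch using plain transformers (not this package): for each candidate label you form a hypothesis sentence and read off the entailment probability. The hypothesis wording here is just an illustrative template.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/nli-deberta-base")
nli_tokenizer = AutoTokenizer.from_pretrained("cross-encoder/nli-deberta-base")

premise = "Today apple released the new Macbook showing off a range of new features."
hypothesis = "This example is about technology."  # one hypothesis per candidate label

inputs = nli_tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = nli_model(**inputs).logits

# look up the index of the "entailment" label from the model's config
entailment_id = {label.lower(): idx for idx, label in nli_model.config.id2label.items()}["entailment"]
score = logits.softmax(dim=-1)[0, entailment_id].item()  # higher score = better fit for this label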

Let's start by initializing a transformers' sequence classification model and tokenizer trained specifically on a NLI task, and passing it to the ZeroShotClassificationExplainer.

For this example we are using cross-encoder/nli-deberta-base, a deberta-base checkpoint trained on the SNLI and MultiNLI datasets. This model typically predicts whether a sentence pair is an entailment, neutral, or a contradiction; for zero-shot, however, we only look at the entailment label.

Notice that we pass our own custom labels ["finance", "technology", "sports"] to the class instance. Any number of labels can be passed, including as few as one. Whichever label scores highest for entailment can be accessed via predicted_label; the attributions themselves, however, are calculated for every label. If you want the attributions for one particular label, it is recommended to pass in just that label so the attributions are guaranteed to be calculated w.r.t. it.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import ZeroShotClassificationExplainer

tokenizer = AutoTokenizer.from_pretrained("cross-encoder/nli-deberta-base")

model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/nli-deberta-base")


zero_shot_explainer = ZeroShotClassificationExplainer(model, tokenizer)


word_attributions = zero_shot_explainer(
    "Today apple released the new Macbook showing off a range of new features found in the proprietary silicon chip computer. ",
    labels = ["finance", "technology", "sports"],
)

Which will return the following dict of attribution tuple lists for each label:

>>> word_attributions
{'finance': [('[CLS]', 0.0),
  ('Today', 0.144761198095125),
  ('apple', 0.05008283286211926),
  ('released', -0.29790757134109724),
  ('the', -0.09931162582050683),
  ('new', -0.151252730475885),
  ('Mac', 0.19431968978659608),
  ('book', 0.059431761386793486),
  ('showing', -0.30754747734942633),
  ('off', 0.0329034397830471),
  ('a', 0.04198035048519715),
  ('range', -0.00413947940202566),
  ('of', 0.7135069733740484),
  ('new', 0.2294990755900286),
  ('features', -0.1523457769188503),
  ('found', -0.016804346228170633),
  ('in', 0.1185751939327566),
  ('the', -0.06990875734316043),
  ('proprietary', 0.16339657649559983),
  ('silicon', 0.20461302470245252),
  ('chip', 0.033304742383885574),
  ('computer', -0.058821677910955064),
  ('.', -0.19741292299059068)],
 'technology': [('[CLS]', 0.0),
  ('Today', 0.1261355373492264),
  ('apple', -0.06735584800073911),
  ('released', -0.37758515332894504),
  ('the', -0.16300368060788886),
  ('new', -0.1698884472100767),
  ('Mac', 0.41505959302727347),
  ('book', 0.321276307285395),
  ('showing', -0.2765988420377037),
  ('off', 0.19388699112601515),
  ('a', -0.044676708673846766),
  ('range', 0.05333370699507288),
  ('of', 0.3654053610507722),
  ('new', 0.3143976769670845),
  ('features', 0.2108588137592185),
  ('found', 0.004676960337191403),
  ('in', 0.008026783104605233),
  ('the', -0.09961358108721637),
  ('proprietary', 0.18816708356062326),
  ('silicon', 0.13322691438800874),
  ('chip', 0.015141805082331294),
  ('computer', -0.1321895049108681),
  ('.', -0.17152401596638975)],
 'sports': [('[CLS]', 0.0),
  ('Today', 0.11751821789941418),
  ('apple', -0.024552367058659215),
  ('released', -0.44706064525430567),
  ('the', -0.10163968191086448),
  ('new', -0.18590036257614642),
  ('Mac', 0.0021649499897370725),
  ('book', 0.009141161101058446),
  ('showing', -0.3073791152936541),
  ('off', 0.0711051596941137),
  ('a', 0.04153236257439005),
  ('range', 0.01598478741712663),
  ('of', 0.6632118834641558),
  ('new', 0.2684728052423898),
  ('features', -0.10249856013919137),
  ('found', -0.032459999377294144),
  ('in', 0.11078761617308391),
  ('the', -0.020530085754695244),
  ('proprietary', 0.17968209761431955),
  ('silicon', 0.19997909769476027),
  ('chip', 0.04447720580439545),
  ('computer', 0.018515748463790047),
  ('.', -0.1686603393466192)]}

We can find out which label was predicted with:

>>> zero_shot_explainer.predicted_label
'technology'

Visualize Zero Shot Classification attributions

For the ZeroShotClassificationExplainer the visualize() method returns a table similar to the SequenceClassificationExplainer but with attributions for every label.

zero_shot_explainer.visualize("zero_shot.html")
Question Answering Explainer

Let's start by initializing a transformers' Question Answering model and tokenizer, and running it through the QuestionAnsweringExplainer.

For this example we are using bert-large-uncased-whole-word-masking-finetuned-squad, a BERT model fine-tuned on SQuAD.

from transformers import AutoModelForQuestionAnswering, AutoTokenizer
from transformers_interpret import QuestionAnsweringExplainer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

qa_explainer = QuestionAnsweringExplainer(
    model,
    tokenizer,
)

context = """
In Artificial Intelligence and machine learning, Natural Language Processing relates to the usage of machines to process and understand human language.
Many researchers currently work in this space.
"""

word_attributions = qa_explainer(
    "What is natural language processing ?",
    context,
)

Which will return the following dict containing word attributions for both the predicted start and end positions for the answer.

>>> word_attributions
{'start': [('[CLS]', 0.0),
  ('what', 0.9177170660377296),
  ('is', 0.13382234898765258),
  ('natural', 0.08061747350142005),
  ('language', 0.013138062762511409),
  ('processing', 0.11135923869816286),
  ('?', 0.00858057388924361),
  ('[SEP]', -0.09646373141894966),
  ('in', 0.01545633993975799),
  ('artificial', 0.0472082598707737),
  ('intelligence', 0.026687249355110867),
  ('and', 0.01675371260058537),
  ('machine', -0.08429502436554961),
  ('learning', 0.0044827685126163355),
  (',', -0.02401013152520878),
  ('natural', -0.0016756080249823537),
  ('language', 0.0026815068421401885),
  ('processing', 0.06773157580722854),
  ('relates', 0.03884601576992908),
  ('to', 0.009783797821526368),
  ('the', -0.026650922910540952),
  ('usage', -0.010675019721821147),
  ('of', 0.015346787885898537),
  ('machines', -0.08278008270160107),
  ('to', 0.12861387892768839),
  ('process', 0.19540146386642743),
  ('and', 0.009942879959615826),
  ('understand', 0.006836894853320319),
  ('human', 0.05020451122579102),
  ('language', -0.012980795199301),
  ('.', 0.00804358248127772),
  ('many', 0.02259009321498161),
  ('researchers', -0.02351650942555469),
  ('currently', 0.04484573078852946),
  ('work', 0.00990399948294476),
  ('in', 0.01806961211334615),
  ('this', 0.13075899776164499),
  ('space', 0.004298315347838973),
  ('.', -0.003767904539347979),
  ('[SEP]', -0.08891544093454595)],
 'end': [('[CLS]', 0.0),
  ('what', 0.8227231947501547),
  ('is', 0.0586864942952253),
  ('natural', 0.0938903563379123),
  ('language', 0.058596976016400674),
  ('processing', 0.1632374290269829),
  ('?', 0.09695686057123237),
  ('[SEP]', -0.11644447033554006),
  ('in', -0.03769172371919206),
  ('artificial', 0.06736158404049886),
  ('intelligence', 0.02496399001288386),
  ('and', -0.03526028847762427),
  ('machine', -0.20846431491771975),
  ('learning', 0.00904892847529654),
  (',', -0.02949905488474854),
  ('natural', 0.011024507784743872),
  ('language', 0.0870741751282507),
  ('processing', 0.11482449622317169),
  ('relates', 0.05008962090922852),
  ('to', 0.04079118393166258),
  ('the', -0.005069048880616451),
  ('usage', -0.011992752445836278),
  ('of', 0.01715183316135495),
  ('machines', -0.29823535624026265),
  ('to', -0.0043760160855057925),
  ('process', 0.10503217484645223),
  ('and', 0.06840313586976698),
  ('understand', 0.057184000619403944),
  ('human', 0.0976805947708315),
  ('language', 0.07031163646606695),
  ('.', 0.10494566513897102),
  ('many', 0.019227154676079487),
  ('researchers', -0.038173913797800885),
  ('currently', 0.03916641120002003),
  ('work', 0.03705371672439422),
  ('in', -0.0003155975107591203),
  ('this', 0.17254932354022232),
  ('space', 0.0014311439625599323),
  ('.', 0.060637932829867736),
  ('[SEP]', -0.09186286505530596)]}

We can get the text span for the predicted answer with:

>>> qa_explainer.predicted_answer
'usage of machines to process and understand human language'
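As with the other explainers, the returned dictionary can be post-processed with plain Python (not a library call); for example, to list the tokens that most influenced the predicted start position:

top_start = sorted(word_attributions["start"], key=lambda pair: pair[1], reverse=True)[:5]
print(top_start)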

Visualize Question Answering attributions

For the QuestionAnsweringExplainer the visualize() method returns a table with two rows. The first row represents the attributions for the answer's start position and the second row represents the attributions for the answer's end position.

qa_explainer.visualize("bert_qa_viz.html")
Token Classification (NER) explainer

This is currently an experimental explainer under active development and is not yet fully tested. The explainer's API is subject to change, as are the attribution methods; if you find any bugs please let me know.

Let's start by initializing a transformers' Token Classification model and tokenizer, and running it through the TokenClassificationExplainer.

For this example we are using dslim/bert-base-NER, a BERT model fine-tuned on the CoNLL-2003 Named Entity Recognition dataset.

from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers_interpret import TokenClassificationExplainer

model = AutoModelForTokenClassification.from_pretrained('dslim/bert-base-NER')
tokenizer = AutoTokenizer.from_pretrained('dslim/bert-base-NER')

ner_explainer = TokenClassificationExplainer(
    model,
    tokenizer,
)

sample_text = "We visited Paris last weekend, where Emmanuel Macron lives."

word_attributions = ner_explainer(sample_text, ignored_labels=['O'])

In order to reduce the number of attributions that are calculated, we tell the explainer to ignore tokens whose predicted label is 'O'. We could also tell the explainer to ignore certain indexes by providing a list via the ignored_indexes parameter (a short sketch of this appears after the output below).

Which will return the following dict, including the predicted label and the attributions for each token, except those which were predicted as 'O':

>>> word_attributions
{'paris': {'label': 'B-LOC',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', -0.014352325471387907),
   ('visited', 0.32915222186559123),
   ('paris', 0.9086791784795596),
   ('last', 0.15181203147624034),
   ('weekend', 0.14400210630677038),
   (',', 0.01899744327012935),
   ('where', -0.039402005463239465),
   ('emmanuel', 0.061095284002642025),
   ('macro', 0.004192922551105228),
   ('##n', 0.09446355513057757),
   ('lives', -0.028724312616455003),
   ('.', 0.08099007392937585),
   ('[SEP]', 0.0)]},
 'emmanuel': {'label': 'B-PER',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', -0.006933030636686712),
   ('visited', 0.10396962390436904),
   ('paris', 0.14540758744233165),
   ('last', 0.08024018944451371),
   ('weekend', 0.10687970996804418),
   (',', 0.1793198466387937),
   ('where', 0.3436407835483767),
   ('emmanuel', 0.8774892642652167),
   ('macro', 0.03559399361048316),
   ('##n', 0.1516315604785551),
   ('lives', 0.07056441327498127),
   ('.', -0.025820924624605487),
   ('[SEP]', 0.0)]},
 'macro': {'label': 'I-PER',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', 0.05578067326280157),
   ('visited', 0.00857021283406586),
   ('paris', 0.16559056506114297),
   ('last', 0.08285256685903823),
   ('weekend', 0.10468727443796395),
   (',', 0.09949509071515888),
   ('where', 0.3642458274356929),
   ('emmanuel', 0.7449335213978788),
   ('macro', 0.3794625659183485),
   ('##n', -0.2599031433800762),
   ('lives', 0.20563450682196147),
   ('.', -0.015607017319486929),
   ('[SEP]', 0.0)]},
 '##n': {'label': 'I-PER',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', 0.025194121717285252),
   ('visited', -0.007415022865239864),
   ('paris', 0.09478357303107598),
   ('last', 0.06927939834474463),
   ('weekend', 0.0672008033510708),
   (',', 0.08316907214363504),
   ('where', 0.3784915854680165),
   ('emmanuel', 0.7729352621546081),
   ('macro', 0.4148652759139777),
   ('##n', -0.20853534512145033),
   ('lives', 0.09445057087678274),
   ('.', -0.094274985907366),
   ('[SEP]', 0.0)]},
 '[SEP]': {'label': 'B-LOC',
  'attribution_scores': [('[CLS]', 0.0),
   ('we', -0.3694351403796742),
   ('visited', 0.1699038407402483),
   ('paris', 0.5461587414992369),
   ('last', 0.0037948102770307517),
   ('weekend', 0.1628100955702496),
   (',', 0.4513093410909263),
   ('where', -0.09577409464161038),
   ('emmanuel', 0.48499459835388914),
   ('macro', -0.13528905587653023),
   ('##n', 0.14362969934754344),
   ('lives', -0.05758007024257254),
   ('.', -0.13970977266152554),
   ('[SEP]', 0.0)]}}
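The ignored_indexes parameter mentioned earlier works in a similar way; a brief sketch (the index values here are purely illustrative):

# skip attributions for the first three token positions instead of skipping by label
word_attributions_by_index = ner_explainer(sample_text, ignored_indexes=[0, 1, 2])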

Visualize NER attributions

For the TokenClassificationExplainer the visualize() method returns a table with as many rows as tokens.

ner_explainer.visualize("bert_ner_viz.html")

For more details about how the TokenClassificationExplainer works, you can check the notebook notebooks/ner_example.ipynb.

Vision Explainers

Image Classification Explainer

The ImageClassificationExplainer is designed to work with all models from the Transformers library that are trained for image classification (Swin, ViT etc). It provides attributions for every pixel in the image, which can be easily visualized using the explainer's built-in visualize method.

Initialising an image classification explainer is very simple: all you need is an image classification model fine-tuned or trained to work with Hugging Face, and its feature extractor.

For this example we are using google/vit-base-patch16-224, a Vision Transformer (ViT) model pre-trained on ImageNet-21k that predicts from 1000 possible classes.

from transformers import AutoFeatureExtractor, AutoModelForImageClassification
from transformers_interpret import ImageClassificationExplainer
from PIL import Image
import requests

model_name = "google/vit-base-patch16-224"
model = AutoModelForImageClassification.from_pretrained(model_name)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# With both the model and feature extractor initialized we are now able to get explanations on an image, we will use a simple image of a golden retriever.
image_link = "https://imagesvc.meredithcorp.io/v3/mm/image?url=https%3A%2F%2Fstatic.onecms.io%2Fwp-content%2Fuploads%2Fsites%2F47%2F2020%2F08%2F16%2Fgolden-retriever-177213599-2000.jpg"

image = Image.open(requests.get(image_link, stream=True).raw)

image_classification_explainer = ImageClassificationExplainer(model=model, feature_extractor=feature_extractor)

image_attributions = image_classification_explainer(
    image
)

print(image_attributions.shape)

Which will print the shape of the attributions tensor:

>>> torch.Size([1, 3, 224, 224])
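Because the attributions come back as a tensor with the same dimensions as the model input, you can also post-process them yourself. A minimal sketch (assuming the shape printed above) that collapses the colour channels into a single 224x224 map:

# sum the attributions over the colour channels to get one score per pixel
pixel_attributions = image_attributions.sum(dim=1).squeeze(0)
print(pixel_attributions.shape)  # torch.Size([224, 224])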

Visualizing Image Attributions

Because we are dealing with images, visualization is even more straightforward than with text models.

Attributions can be easily visualized using the visualize method of the explainer. There are currently 4 supported visualization methods.

  • heatmap - a heatmap of positive and negative attributions is drawn using the dimensions of the image.
  • overlay - the heatmap is overlaid on a grayscale version of the original image.
  • masked_image - the absolute value of attributions is used to create a mask over the original image.
  • alpha_scaling - sets the alpha channel (transparency) of each pixel to the normalized attribution value.

Heatmap

image_classification_explainer.visualize(
    method="heatmap",
    side_by_side=True,
    outlier_threshold=0.03,
)

Overlay

image_classification_explainer.visualize(
    method="overlay",
    side_by_side=True,
    outlier_threshold=0.03,
)

Masked Image

image_classification_explainer.visualize(
    method="masked_image",
    side_by_side=True,
    outlier_threshold=0.03,
)

Alpha Scaling

image_classification_explainer.visualize(
    method="alpha_scaling",
    side_by_side=True,
    outlier_threshold=0.03,
)

Future Development

This package is still in active development and there is much more planned. For a 1.0.0 release we're aiming to have:

  • Clean and thorough documentation website
  • Support for Question Answering models
  • Support for NER models
  • Support for Zero Shot Classification models.
  • Ability to show attributions for multiple embedding types, rather than just the word embeddings.
  • Support for SentenceTransformer embedding models and other image embeddings
  • Additional attribution methods
  • Support for vision transformer models
  • In depth examples
  • A nice logo (thanks @Voyz)
  • and more... feel free to submit your suggestions!

Contributing

If you would like to make a contribution please check out our contribution guidelines.

Questions / Get In Touch

The maintainer of this repository is @cdpierse.

If you have any questions, suggestions, or would like to make a contribution (please do 😁), feel free to get in touch at [email protected]

I'd also highly suggest checking out Captum if you find model explainability and interpretability interesting.

This package stands on the shoulders of the incredible work being done by the teams at PyTorch Captum and Hugging Face, and would not exist if not for the amazing job they are both doing in the fields of ML and model interpretability respectively.

Reading and Resources

Captum

All of the attributions within this package are calculated using PyTorch's explainability package Captum. See below for some useful links related to Captum.

Attribution

Integrated Gradients (IG) and a variation of it, Layer Integrated Gradients (LIG), are the core attribution methods on which Transformers Interpret is currently built. Below are some useful resources, including the original paper and some video links explaining the inner mechanics. If you are curious about what is going on inside Transformers Interpret, I highly recommend checking out at least one of these resources.

Miscellaneous

Captum Links

Below are some links I used to help me get this package together using Captum. Thank you to @davidefiocco for your very insightful GIST.

transformers-interpret's People

Contributors

cdpierse, cwenner, dependabot[bot], lalitpagaria, owaiskhan9654, pabvald, rinapch, voyz


transformers-interpret's Issues

change tokenizer parameters

Hi,

It is OK when I use SequenceClassificationExplainer with short texts, but for long texts it throws an error like:
RuntimeError: The expanded size of the tensor (583) must match the existing size (514) at non-singleton dimension 1. Target sizes: [1, 583]. Tensor sizes: [1, 514]

I think it would solve the problem if I could modify or pass some parameters like padding="max_length", truncation=True, max_length=max_length to the explainer.

Do you have any suggestions for this problem? How can I solve it?

Example usage:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import MultiLabelClassificationExplainer, SequenceClassificationExplainer

model = AutoModelForSequenceClassification.from_pretrained("model_name")
tokenizer = AutoTokenizer.from_pretrained("model_name")

explainer = SequenceClassificationExplainer(model, tokenizer)

example_text = """some long text"""
word_attributions = explainer(example_text)

Exception:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_247623/3310833535.py in <module>
      1 example_text = """some long text"""
----> 2 word_attributions = explainer(preprocess(example_text), class_name="riskli")

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/transformers_interpret/explainers/sequence_classification.py in __call__(self, text, index, class_name, embedding_type, internal_batch_size, n_steps)
    312         if internal_batch_size:
    313             self.internal_batch_size = internal_batch_size
--> 314         return self._run(text, index, class_name, embedding_type=embedding_type)
    315 
    316     def __str__(self):

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/transformers_interpret/explainers/sequence_classification.py in _run(self, text, index, class_name, embedding_type)
    266         self.text = self._clean_text(text)
    267 
--> 268         self._calculate_attributions(embeddings=embeddings, index=index, class_name=class_name)
    269         return self.word_attributions  # type: ignore
    270 

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/transformers_interpret/explainers/sequence_classification.py in _calculate_attributions(self, embeddings, index, class_name)
    225 
    226         reference_tokens = [token.replace("Ġ", "") for token in self.decode(self.input_ids)]
--> 227         lig = LIGAttributions(
    228             self._forward,
    229             embeddings,

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/transformers_interpret/attributions.py in __init__(self, custom_forward, embeddings, tokens, input_ids, ref_input_ids, sep_id, attention_mask, token_type_ids, position_ids, ref_token_type_ids, ref_position_ids, internal_batch_size, n_steps)
     60             )
     61         elif self.position_ids is not None:
---> 62             self._attributions, self.delta = self.lig.attribute(
     63                 inputs=(self.input_ids, self.position_ids),
     64                 baselines=(

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/captum/log/__init__.py in wrapper(*args, **kwargs)
     33             @wraps(func)
     34             def wrapper(*args, **kwargs):
---> 35                 return func(*args, **kwargs)
     36 
     37             return wrapper

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/captum/attr/_core/layer/layer_integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta, attribute_to_layer_input)
    363             self.device_ids = getattr(self.forward_func, "device_ids", None)
    364 
--> 365         inputs_layer = _forward_layer_eval(
    366             self.forward_func,
    367             inps,

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/captum/_utils/gradient.py in _forward_layer_eval(forward_fn, inputs, layer, additional_forward_args, device_ids, attribute_to_layer_input, grad_enabled)
    180     grad_enabled: bool = False,
    181 ) -> Union[Tuple[Tensor, ...], List[Tuple[Tensor, ...]]]:
--> 182     return _forward_layer_eval_with_neuron_grads(
    183         forward_fn,
    184         inputs,

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/captum/_utils/gradient.py in _forward_layer_eval_with_neuron_grads(forward_fn, inputs, layer, additional_forward_args, gradient_neuron_selector, grad_enabled, device_ids, attribute_to_layer_input)
    443 
    444     with torch.autograd.set_grad_enabled(grad_enabled):
--> 445         saved_layer = _forward_layer_distributed_eval(
    446             forward_fn,
    447             inputs,

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/captum/_utils/gradient.py in _forward_layer_distributed_eval(forward_fn, inputs, layer, target_ind, additional_forward_args, attribute_to_layer_input, forward_hook_with_return, require_layer_grads)
    292                     single_layer.register_forward_hook(hook_wrapper(single_layer))
    293                 )
--> 294         output = _run_forward(
    295             forward_fn,
    296             inputs,

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/captum/_utils/common.py in _run_forward(forward_func, inputs, target, additional_forward_args)
    454     additional_forward_args = _format_additional_forward_args(additional_forward_args)
    455 
--> 456     output = forward_func(
    457         *(*inputs, *additional_forward_args)
    458         if additional_forward_args is not None

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/transformers_interpret/explainers/sequence_classification.py in _forward(self, input_ids, position_ids, attention_mask)
    178 
    179         if self.accepts_position_ids:
--> 180             preds = self.model(
    181                 input_ids,
    182                 position_ids=position_ids,

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
   1198         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
   1199 
-> 1200         outputs = self.roberta(
   1201             input_ids,
   1202             attention_mask=attention_mask,

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/data/ai-nlp/miniconda3/envs/pl_test/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    814             if hasattr(self.embeddings, "token_type_ids"):
    815                 buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
--> 816                 buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
    817                 token_type_ids = buffered_token_type_ids_expanded
    818             else:

RuntimeError: The expanded size of the tensor (583) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 583].  Tensor sizes: [1, 514]
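One workaround I can think of (just a sketch, not an official fix) is to truncate the text with the tokenizer before handing it to the explainer, so the input never exceeds the model's maximum length:

# truncate to the model's maximum input length before explaining (workaround sketch)
max_length = tokenizer.model_max_length  # 512 for this RoBERTa-style model
token_ids = tokenizer.encode(example_text, truncation=True, max_length=max_length)
truncated_text = tokenizer.decode(token_ids, skip_special_tokens=True)
word_attributions = explainer(truncated_text)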

pip install does not work (for me) on windows 10

I'm getting an error when pip installing transformers-interpret

Collecting transformers-interpret
Using cached https://files.pythonhosted.org/packages/20/5c/190decb08671a1b30a7686cb7b26609d28d07f11f1ee164f7374a2cb7f66/transformers-interpret-0.1.4.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\T2NPAU~1.BRA\AppData\Local\Temp\pip-install-7xovqegv\transformers-interpret\setup.py", line 6, in
long_description = fh.read()
File "C:\Users\paul.bradbeer\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5676: character maps to

The code in setup.py reads the (UTF-8) README.md without specifying an encoding:

with open("README.md", "r") as fh:
long_description = fh.read()

If I change the file open to have encoding iso-8859-1 then it works
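For reference, the kind of explicit-encoding change being discussed would look like this (shown here with utf-8, since that appears to be what README.md actually uses; just a sketch, not the project's committed fix):

# declare the encoding explicitly instead of relying on the platform default
with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()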

Is this down to something inside README.md or is it something on my local python env/system?

Adding contribution guideline

@cdpierse
While contributing a PR to this awesome repo, I observed that it would be great to have a contribution guideline.

Some possible points that could be added to this guide are:

  • Fork the repo [If not done]
  • Fetch upstream (git fetch upstream)
  • Checkout dev branch
  • Rebase dev branch with upstream (git rebase upstream/dev)
  • Create new branch and add changes to that branch
  • Add/update/remove relevant test cases
  • Perform black formatting on changes
  • Raise PR against upstream dev branch
  • Ask for the review

A few observations about the CI process:

  • CI should run on every PR, so the PR raiser gets early feedback about test failures or build issues
  • Currently PR merging is very open; it does not check for PR approval or CI pass status

zero_shot_explainer not working xlm-roberta-large-xnli-anli

model_name="vicgalle/xlm-roberta-large-xnli-anli"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

IndexError Traceback (most recent call last)
in
3 zero_shot_explainer = ZeroShotClassificationExplainer(model, tokenizer)
4
----> 5 word_attributions = zero_shot_explainer(
6 "国家**美国",
7 labels = ["国家", "**", "美国"],

/opt/conda/lib/python3.8/site-packages/transformers_interpret/explainers/zero_shot_classification.py in call(self, text, labels, embedding_type, hypothesis_template, include_hypothesis, internal_batch_size, n_steps)
314 self.hypothesis_labels = [hypothesis_template.format(label) for label in labels]
315
--> 316 predicted_text_idx = self._get_top_predicted_label_idx(
317 text, self.hypothesis_labels
318 )

/opt/conda/lib/python3.8/site-packages/transformers_interpret/explainers/zero_shot_classification.py in _get_top_predicted_label_idx(self, text, hypothesis_labels)
143 )
144 attention_mask = self._make_attention_mask(input_ids)
--> 145 preds = self._get_preds(
146 input_ids, token_type_ids, position_ids, attention_mask
147 )

/opt/conda/lib/python3.8/site-packages/transformers_interpret/explainers/question_answering.py in _get_preds(self, input_ids, token_type_ids, position_ids, attention_mask)
212 ):
213 if self.accepts_position_ids and self.accepts_token_type_ids:
--> 214 preds = self.model(
215 input_ids,
216 token_type_ids=token_type_ids,

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

/opt/conda/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
993 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
994
--> 995 outputs = self.roberta(
996 input_ids,
997 attention_mask=attention_mask,

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

/opt/conda/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, output_attentions, output_hidden_states, return_dict)
685 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
686
--> 687 embedding_output = self.embeddings(
688 input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
689 )

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

/opt/conda/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds)
117 inputs_embeds = self.word_embeddings(input_ids)
118 position_embeddings = self.position_embeddings(position_ids)
--> 119 token_type_embeddings = self.token_type_embeddings(token_type_ids)
120
121 embeddings = inputs_embeds + position_embeddings + token_type_embeddings

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/sparse.py in forward(self, input)
122
123 def forward(self, input: Tensor) -> Tensor:
--> 124 return F.embedding(
125 input, self.weight, self.padding_idx, self.max_norm,
126 self.norm_type, self.scale_grad_by_freq, self.sparse)

/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1815
1816

IndexError: index out of range in self

Pass Tokens Instead of String

Is it possible to, instead of passing in a string and tokenizer, pass in the individual parameters required by model.forward?

E.g., instead of passing a string, I can provide input_ids, attention_mask, etc.

I have a custom BERT model with different/additional parameters I'd like to inspect.

TokenClassificationExplainer Example does not work (LIGAttributions receives unexpected keyword argument "target")

Hello, I want to use the TokenClassificationExplainer. But when I run the example from your README.md, I receive the error

Traceback (most recent call last):
  File "trans_int.py", line 14, in <module>
    word_attributions = ner_explainer(sample_text, ignored_labels=['O'])
  File "/vol/fob-vol7/mi19/harnisph/ti/transformers_interpret/explainers/token_classification.py", line 296, in __call__
    return self._run(text, embedding_type=embedding_type)
  File "/vol/fob-vol7/mi19/harnisph/ti/transformers_interpret/explainers/token_classification.py", line 264, in _run
    self._calculate_attributions(embeddings=embeddings)
  File "/vol/fob-vol7/mi19/harnisph/ti/transformers_interpret/explainers/token_classification.py", line 234, in _calculate_attributions
    n_steps=self.n_steps,
TypeError: __init__() got an unexpected keyword argument 'target'

After debugging I saw that the constructor of LIGAttributions receives target=7, and the class itself has an argument target: Optional[Union[int, Tuple, torch.Tensor, List]] = None, so I don't understand why it struggles with this argument, as it is obviously there and the value should also be a valid type.

Maybe someone can help me to understand/fix this. Thank you in advance.

Correctness of summing up embeddings by the sequence dimension

Hi, thanks for the great repo!

I would like to discuss the implementation of Integrated Gradients in combination with text models. I found that you sum IG outputs over the last dimension to get a single scalar for each token; I also remember that the AllenNLP implementation uses the same trick. But can we really simply sum them, given that the gradients can be negative? If some of the gradients for a particular token are negative, summing them with the rest will reduce the token's overall "score". Would it be more correct to take some kind of norm (L1, L2) for each token?
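To make the comparison concrete, here are the two aggregation options side by side (a sketch with a dummy tensor standing in for the per-token, per-dimension IG output):

import torch

attributions = torch.randn(12, 768)  # dummy attributions: 12 tokens x 768 embedding dimensions

summed = attributions.sum(dim=-1)         # current approach: signed values can cancel out
l2_norm = attributions.norm(p=2, dim=-1)  # proposed alternative: per-token magnitude, always >= 0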

Support for NER models

Discussed in #83

Originally posted by pabvald May 1, 2022
Hi!

I have tried to generate explanations for NER models but I haven't found any description of how to implement this, so I have tried to extrapolate from what has already been done for other tasks, such as multi-class classification.

I have followed the approach described in the Captum tutorial on interpreting BERT, in which a custom forward function gives access to a particular position of the prediction via a position input argument, and the attributions are computed with respect to the BertEmbeddings using the captum.attr.LayerIntegratedGradients class (sketched below).
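
A rough sketch of that approach. This is not this package's API; it uses Captum directly, the NER checkpoint dslim/bert-base-NER is just a convenient example, and the token position is picked arbitrarily:

import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForTokenClassification, AutoTokenizer

model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")

def ner_forward(input_ids, position, attention_mask=None):
    # Return the logits for a single token position, shape (batch, num_labels),
    # so Captum's `target` argument can pick a NER class for that token.
    logits = model(input_ids, attention_mask=attention_mask).logits
    return logits[:, position, :]

text = "Hugging Face is based in New York City"
inputs = tokenizer(text, return_tensors="pt")
position = 7  # arbitrary token index to explain (one of the location tokens)
target_class = int(model(**inputs).logits[0, position].argmax())

# Baseline: keep [CLS]/[SEP] in place, replace everything else with the pad token.
ref_ids = torch.full_like(inputs["input_ids"], tokenizer.pad_token_id)
ref_ids[0, 0] = tokenizer.cls_token_id
ref_ids[0, -1] = tokenizer.sep_token_id

lig = LayerIntegratedGradients(ner_forward, model.bert.embeddings)
attributions = lig.attribute(
    inputs["input_ids"],
    baselines=ref_ids,
    target=target_class,
    additional_forward_args=(position, inputs["attention_mask"]),
)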

I have generated two types of visualizations:

  1. Attributions considering each token (position) and its corresponding predicted NER label: this shows how the rest of the sentence influenced each token being classified into a particular NER class.
  2. Attributions considering a single token and all the possible NER classes: this shows how the rest of the sentence influenced a certain token being classified into one class rather than any of the others.

A notebook with these two visualizations can be found in this repo. I would like to know whether my approach makes sense or, if not, what I should do to correctly visualize explanations for NER models. Once I am sure my approach is right, I would like to contribute support for NER models to Transformers Interpret.

Thank you in advance for the help!

base_model_prefix

I love this library ❤️ However, I am trying to use it with a fine-tuned BERT classifier, but I keep getting:

"AttributeError: 'ClassificationModel' object has no attribute 'base_model_prefix".

My first question: does the interpreter work if I load a community fine-tuned model? According to the docs, it should work with all classification transformer models, so I don't quite get the issue here.

Second, I tried to hard-code it in the code (explainer.py), but I keep getting the same error.

self.model_prefix = "bert"

Any ideas?

Thanks!

Will this work for ZeroShotClassification?

Hello Charles,

I used the example from your Medium article with one change: on replacing the DistilBert model with the Bart model, I get an error that the attribute does not exist. The model I am trying to use is "facebook/bart-large-mnli".

File "C:\TextClassificationV5\lib\site-packages\transformers_interpret\explainers\sequence_classification.py", line 129, in _calculate_attributions
attribute = getattr(self.model, self.model_type)
File "C:\TextClassificationV5\lib\site-packages\torch\nn\modules\module.py", line 779, in __getattr__
type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'BartForSequenceClassification' object has no attribute 'bart'

The model class name is BartForSequenceClassification and the model_type is Bart.

Have I missed something?

Use model.embeddings.word_embeddings as in LIG

LIG attributions in the sequence classifier currently take model.model_name.embeddings as the embeddings layer. It seems that layer integrated gradients are computed separately w.r.t. each of the three possible embedding layers:

  • word_embeddings
  • position_embeddings
  • token_type_embeddings

For the calculation and display of word attributions, which this package is currently primarily focused on, it seems that using the word_embeddings as the default might be a better direction. To maintain flexibility, however, I can add an optional parameter to the explainers' init method for selecting which embedding layer (or all of them) to calculate attributions on.
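
As a sketch of the difference at the Captum level, assuming model is a DistilBERT-style sequence classifier (attribute paths vary by architecture):

from captum.attr import LayerIntegratedGradients

def forward_func(input_ids, attention_mask=None):
    return model(input_ids, attention_mask=attention_mask).logits

# Attributions w.r.t. the word embedding sub-layer only ...
lig_word = LayerIntegratedGradients(forward_func, model.distilbert.embeddings.word_embeddings)

# ... versus the full embedding module (word + position embeddings for DistilBERT).
lig_full = LayerIntegratedGradients(forward_func, model.distilbert.embeddings)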

Tips for using Pairwise Inputs?

I tried just passing 'input 1 [SEP] input 2' but kept getting the wrong predicted label from the SingleLabelClassifier -- I assume it's because the tokenizer isn't assigning token_type_ids. Is there any hacky way to fix this?
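
For reference, encoding the two inputs as a pair lets the tokenizer insert [SEP] itself and assign token_type_ids; whether the explainer accepts pre-encoded pairs is a separate question:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer("input 1", "input 2")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
print(encoded["token_type_ids"])  # 0s for the first segment, 1s for the second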

How are attributions calculated?

Thank you for your amazing work. The documentation for this project appears to be limited to code usage; I couldn't find much explanation of the actual method used to explain the model. Specifically, some comments for the _calculate_attributions() method would be helpful to give an idea of how attributions are calculated. Thanks!

Multiprocessing - AttributeError: Can't pickle local object

Hi

Running the code below, I get an error when using multiprocessing. Please help.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import ZeroShotClassificationExplainer

tokenizer = AutoTokenizer.from_pretrained("typeform/distilbert-base-uncased-mnli") #facebook/bart-large-mnli, typeform/distilbert-base-uncased-mnli
model = AutoModelForSequenceClassification.from_pretrained("typeform/distilbert-base-uncased-mnli")
model.cuda()
zero_shot_explainer = ZeroShotClassificationExplainer(model, tokenizer)

def get_att_label(tags,zero_shot_explainer,sentence):
    word_attributions = zero_shot_explainer(
    sentence,
    labels = tags)
    return zero_shot_explainer.predicted_label, word_attributions[zero_shot_explainer.predicted_label]

from torch.multiprocessing import Pool, Process, set_start_method
from functools import partial
from tqdm import tqdm
try:
    set_start_method('spawn')
except RuntimeError:
    pass

if __name__ == '__main__': 
    p = Pool(processes=5)
    get_att_label_fixed_params = partial(get_att_label, tags=tag_values, zero_shot_explainer=zero_shot_explainer)
    predictions = p.map(get_att_label_fixed_params,test_lst)
    p.close()
    p.terminate()
    p.join()

Error -


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-91ebdc8201d7> in <module>
      3     get_att_label_fixed_params = partial(get_att_label, tags=tag_values, zero_shot_explainer=zero_shot_explainer)
----> 4     predictions = p.map(get_att_label_fixed_params,test_lst)
      5     p.close()

~/anaconda3/envs/pytorch_p36/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    264         in a list that is returned.
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 
    268     def starmap(self, func, iterable, chunksize=None):

~/anaconda3/envs/pytorch_p36/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

~/anaconda3/envs/pytorch_p36/lib/python3.6/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    422                         break
    423                     try:
--> 424                         put(task)
    425                     except Exception as e:
    426                         job, idx = task[:2]

~/anaconda3/envs/pytorch_p36/lib/python3.6/multiprocessing/connection.py in send(self, obj)
    204         self._check_closed()
    205         self._check_writable()
--> 206         self._send_bytes(_ForkingPickler.dumps(obj))
    207 
    208     def recv_bytes(self, maxlength=None):

~/anaconda3/envs/pytorch_p36/lib/python3.6/multiprocessing/reduction.py in dumps(cls, obj, protocol)
     49     def dumps(cls, obj, protocol=None):
     50         buf = io.BytesIO()
---> 51         cls(buf, protocol).dump(obj)
     52         return buf.getbuffer()
     53 

AttributeError: Can't pickle local object 'LayerIntegratedGradients.attribute.<locals>.gradient_func'

Can you please assist ?

Regards,
Subham

pip install does not work (for me) on Windows 10

I'm getting an error when pip installing transformers-interpret

Collecting transformers-interpret
Using cached https://files.pythonhosted.org/packages/20/5c/190decb08671a1b30a7686cb7b26609d28d07f11f1ee164f7374a2cb7f66/transformers-interpret-0.1.4.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\T2NPAU~1.BRA\AppData\Local\Temp\pip-install-7xovqegv\transformers-interpret\setup.py", line 6, in
long_description = fh.read()
File "C:\Users\paul.bradbeer\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5676: character maps to <undefined>

Code in setup.py assumes utf-8

with open("README.md", "r") as fh:
long_description = fh.read()

If I change the file open to use encoding iso-8859-1, then it works.

Is this down to something inside README.md, or is it something in my local Python env/system?
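
The usual fix for this kind of error is to declare the encoding explicitly in setup.py rather than relying on the platform default (cp1252 on Windows), for example:

# setup.py
with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()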

Quick Start example fails with device error

Hi,

At least in my environment, with transformers 3.5.0 and pytorch 1.6.0, the quickstart example fails with an error about the device being used, see below. I was able to get it to run on CPU by setting cls_explainer.device = torch.device("cpu"), but I'm wondering if there's a way to run it on GPU?

import torch
import transformers
torch.__version__
'1.6.0'
transformers.__version__
'3.5.0'
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
from transformers_interpret import SequenceClassificationExplainer
cls_explainer = SequenceClassificationExplainer("I love you, I like you", model, tokenizer)
attributions = cls_explainer()
---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-4-b5ac24f5e40f> in <module>
      1 from transformers_interpret import SequenceClassificationExplainer
      2 cls_explainer = SequenceClassificationExplainer("I love you, I like you", model, tokenizer)
----> 3 attributions = cls_explainer()


/data/anaconda/lib/python3.7/site-packages/transformers_interpret/explainers/sequence_classification.py in __call__(self, text, index, class_name)
    133 
    134     def __call__(self, text: str = None, index: int = None, class_name: str = None):
--> 135         return self.run(text, index, class_name)
    136 
    137     def __str__(self):


/data/anaconda/lib/python3.7/site-packages/transformers_interpret/explainers/sequence_classification.py in run(self, text, index, class_name)
     51             self.text = text
     52 
---> 53         self._calculate_attributions(index=index, class_name=class_name)
     54         return self.attributions
     55 


/data/anaconda/lib/python3.7/site-packages/transformers_interpret/explainers/sequence_classification.py in _calculate_attributions(self, index, class_name)
    114                 self.selected_index = self.predicted_class_index
    115         else:
--> 116             self.selected_index = self.predicted_class_index
    117 
    118         if self.attribution_type == "lig":


/data/anaconda/lib/python3.7/site-packages/transformers_interpret/explainers/sequence_classification.py in predicted_class_index(self)
     62     def predicted_class_index(self):
     63         if self.input_ids is not None:
---> 64             preds = self.model(self.input_ids)[0]
     65             self.pred_class = torch.argmax(torch.softmax(preds, dim=0)[0])
     66             return torch.argmax(torch.softmax(preds, dim=1)[0]).detach().numpy()


/data/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),


/data/anaconda/lib/python3.7/site-packages/transformers/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
    629             output_attentions=output_attentions,
    630             output_hidden_states=output_hidden_states,
--> 631             return_dict=return_dict,
    632         )
    633         hidden_state = distilbert_output[0]  # (bs, seq_len, dim)


/data/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),


/data/anaconda/lib/python3.7/site-packages/transformers/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    479 
    480         if inputs_embeds is None:
--> 481             inputs_embeds = self.embeddings(input_ids)  # (bs, seq_length, dim)
    482         return self.transformer(
    483             x=inputs_embeds,


/data/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),


/data/anaconda/lib/python3.7/site-packages/transformers/modeling_distilbert.py in forward(self, input_ids)
    106         position_ids = position_ids.unsqueeze(0).expand_as(input_ids)  # (bs, max_seq_length)
    107 
--> 108         word_embeddings = self.word_embeddings(input_ids)  # (bs, max_seq_length, dim)
    109         position_embeddings = self.position_embeddings(position_ids)  # (bs, max_seq_length, dim)
    110 


/data/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),


/data/anaconda/lib/python3.7/site-packages/torch/nn/modules/sparse.py in forward(self, input)
    124         return F.embedding(
    125             input, self.weight, self.padding_idx, self.max_norm,
--> 126             self.norm_type, self.scale_grad_by_freq, self.sparse)
    127 
    128     def extra_repr(self) -> str:


/data/anaconda/lib/python3.7/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1812         # remove once script supports set_grad_enabled
   1813         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1814     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1815 
   1816 


RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select
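
For reference, the underlying mismatch is a model on the GPU receiving input_ids built on the CPU. In plain transformers/PyTorch terms (independent of this library), the forward pass only works when both share a device:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("I love you, I like you", return_tensors="pt").to(device)
logits = model(**inputs).logits  # works because model and inputs are on the same device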

Multiple GPU usage not working

I am attempting to use this by moving my model into the torch DataParallel class:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = CreateModel()

model= nn.DataParallel(model)
model.to(device)

and ensuring that attributes are accessible:

class MyDataParallel(nn.DataParallel):
    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(self.module, name)

I get runtime errors about tensor sizes when I use the CLS explainer. Are there any plans to enable multi-GPU utilization?

Link to paper?

Hi, could you please link me to the paper that this repo is based on?

Support for Tensorflow 2

Hi there, thanks a lot for creating and sharing this, it's very useful 👍. Are you planning to extend support to HF TensorFlow 2 models in the near future?

[SEP] index is 1 smaller than it should be

When using ZeroShotClassificationExplainer, the last token's value is not contained in the output.

word_attributions = zeroshot_explainer(
        text = "吾輩は猫である",
        labels = ["文学"],
        hypothesis_template = "この文は、{}に関するものである。",
)

print(word_attributions)

# the last token "ある" 's value is not contained in outputs
>> {'文学': [('[CLS]', 0.0), ('吾', -0.07784473169266609), ('##輩', -0.2678669763950776), ('は', -0.5911349137518062), ('猫', -0.6037091075576813), ('で', -0.45637956560324705)]}

I guess the reason is that the [SEP] index is 1 smaller than it should be.
In ZeroShotClassificationExplainer._make_input_reference_pair, L187 should be len(text_ids) + 1, not len(text_ids).

Add support for GPT2ForSequenceClassification classification

When testing the SequenceClassificationExplainer with DialogRPT, which uses the GPT2 sequence classification head, I noticed that it fails.

So far I can see two reasons for this:

  • the BOS and EOS tokens are handled differently.
  • The embeddings cannot be fetched the same way as they are with BERT-like models.

AttributeError: 'str' object has no attribute 'config'

Hi @cdpierse , your project looks very nice, congrats !

I have a problem with the SequenceClassificationExplainer(), which does not work with any classification model I pass in (including the DistilBertForSequenceClassification you used in your Towards Data Science article). I get the following error:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_21316/2774477167.py in <module>
      1 from transformers_interpret import SequenceClassificationExplainer
----> 2 cls_explainer = SequenceClassificationExplainer("I love you, I like you", model, tokenizer)

/dds/miniconda/envs/py39/lib/python3.9/site-packages/transformers_interpret/explainers/sequence_classification.py in __init__(self, model, tokenizer, attribution_type, custom_labels)
     55             AttributionTypeNotSupportedError:
     56         """
---> 57         super().__init__(model, tokenizer)
     58         if attribution_type not in SUPPORTED_ATTRIBUTION_TYPES:
     59             raise AttributionTypeNotSupportedError(

/dds/miniconda/envs/py39/lib/python3.9/site-packages/transformers_interpret/explainer.py in __init__(self, model, tokenizer)
     17         self.tokenizer = tokenizer
     18 
---> 19         if self.model.config.model_type == "gpt2":
     20             self.ref_token_id = self.tokenizer.eos_token_id
     21         else:

AttributeError: 'str' object has no attribute 'config'

I suspect it might be a version problem with the Hugging Face transformers lib. I'm currently using transformers 4.16.2 and transformers-interpret 0.6.0.

Thanks for your help

What algorithm is used to visualize text in SequenceClassificationExplainer

transformers-interpret is a good toolkit, but I want to know more about the algorithms involved. I'm curious about SequenceClassificationExplainer: its visualization directly calls captum.attr.visualization, but what algorithm is actually used to compute the text attributions that get visualized?

ZeroShotClassificationExplainer does not correctly explain ZeroShotClassificationPipeline results (single label)

In the case of a single label, the logic used to calculate the classification probability in the ZeroShotClassificationExplainer (see here) differs from the logic in the Hugging Face ZeroShotClassificationPipeline (see here):

  • The Huggingface ZeroShotClassificationPipeline calculates the softmax over entailment and contradiction scores and returns the resulting value for entailment, but
  • the ZeroShotClassificationExplainer returns just the sigmoid of the entailment score.

At least, if this is intended, it should be documented somewhere. My use case is multi-label classification and I used the single-label approach to simulate it, but it took me some time to figure out that this does not work to explain ZeroShotClassificationPipeline predictions.
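
A small sketch of the two computations the issue contrasts, using made-up logit values:

import torch

# Raw NLI logits for "contradiction" and "entailment" of one (premise, hypothesis) pair;
# the values are illustrative only.
contradiction, entailment = torch.tensor(-1.1), torch.tensor(2.3)

pipeline_score = torch.softmax(torch.stack([contradiction, entailment]), dim=0)[1]  # softmax over both classes
explainer_score = torch.sigmoid(entailment)                                         # sigmoid of entailment alone

print(float(pipeline_score), float(explainer_score))  # generally not equal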

[Question] Map attention weights to original input text instead of tokenized input

Thanks for creating this library. It's super useful :). I have some questions and hope you can share some of your insights.

How can I map the attention weights assigned to the tokenized text back to the original input text? Are there any libraries that could help with this?

The reason is that the tokenized text is not really UX friendly and not necessarily interpretable for non-ML people.

Some examples

  • "I have a new GPU" would become "i", "have", "a", "new", "gp", "##u".
  • "Don't you love 🤗Transformers Transformers? We sure do." would become don't you love [UNK] transformers? we sure do.. I want to be able to show which part of the original sentence the model looks at i.e. [UNK] is pointing to 🤗Transformers

https://stackoverflow.com/questions/70107997/mapping-huggingface-tokens-to-original-input-text
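
One way to do this outside the library is the offset mapping provided by Hugging Face fast tokenizers, which maps each sub-token back to a character span of the original text; aggregating attributions per span is then up to the caller. A small sketch:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # a fast tokenizer

text = "I have a new GPU"
enc = tokenizer(text, return_offsets_mapping=True)

for token, (start, end) in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"]), enc["offset_mapping"]):
    print(token, "->", repr(text[start:end]))  # "gp" and "##u" both map back into the "GPU" span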

CUDA out of memory when using Longformer base model

Thank you for your amazing work.
I want to use your transformers interpreter to explain a Longformer model. I literally replaced distilbert-base-uncased-finetuned-sst-2-english with allenai/longformer-base-4096 and encountered a CUDA out of memory error. I didn't touch anything else at all, just the model name in the example provided in the README file.

I'm using Google Colab, and it has no issue with training the Longformer model. Is this a known issue?
16 GB of GPU memory should be enough for such a task. Is there any workaround to quickly fix this? Thank you.

Package Version:
0.5.0

GPU spec:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-18-8b25e2022be5> in <module>()
----> 1 word_attributions = cls_explainer("I love you, I like you")
      2 word_attributions

22 frames
/usr/local/lib/python3.7/dist-packages/transformers/models/longformer/modeling_longformer.py in forward(self, hidden_states, attention_mask, layer_head_mask, is_index_masked, is_index_global_attn, is_global_attn, output_attentions)
    650 
    651         # softmax sometimes inserts NaN if all positions are masked, replace them with 0
--> 652         attn_probs = torch.masked_fill(attn_probs, is_index_masked[:, :, None, None], 0.0)
    653         attn_probs = attn_probs.type_as(attn_scores)
    654 

RuntimeError: CUDA out of memory. Tried to allocate 604.00 MiB (GPU 0; 15.90 GiB total capacity; 14.28 GiB already allocated; 357.75 MiB free; 14.67 GiB reserved in total by PyTorch)
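
For context, a sketch at the Captum level (which this package wraps) of the knobs that bound memory use during layer integrated gradients; forward_func, input_ids, ref_input_ids and predicted_class are placeholders, and whether they can be passed through the explainer depends on the installed transformers-interpret version:

from captum.attr import LayerIntegratedGradients

# Fewer interpolation steps and a small internal batch size limit how many
# interpolated inputs are held in GPU memory at once.
lig = LayerIntegratedGradients(forward_func, model.longformer.embeddings)
attrs = lig.attribute(
    input_ids,
    baselines=ref_input_ids,
    target=predicted_class,
    n_steps=20,
    internal_batch_size=2,
)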


GPU usage

Hi

How do I make sure I am utilising the GPU?

Code:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import ZeroShotClassificationExplainer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
zero_shot_explainer = ZeroShotClassificationExplainer(model, tokenizer)

%%time
word_attributions = zero_shot_explainer(
"Today apple released the new Macbook showing off a range of new features found in the proprietary silicon chip computer. ",
labels = ["finance", "technology", "sports"],
)

CPU times: user 2min 47s, sys: 3.41 s, total: 2min 51s
Wall time: 1min 26s

When I run nvidia-smi, it does not show any usage. Is there some other command to check GPU usage?

Tue Aug 3 05:59:18 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P8    29W / 149W |      3MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The server has 4 vCPUs and 64 GB RAM along with the above GPU.

torch.__version__ - '1.8.1+cu111'
torch.cuda.is_available() - True

Thanks,
Subham

@subhamkhemka

Binary Classification: How is predicted label computed?

Hi there,

I am observing the following (strange) behavior when using pipeline from the transformers library and transformers-interpret:

text = "Now Accord networks is a company in video, and he led the sales team, and the marketing group at Accord, and he took it from start up, sound familiar, it's from start up to $60 million company in two years."
classifier = pipeline('text-classification',  model=model, tokenizer=tokenizer, device=0)
classifier(text)
[{'label': 'LABEL_1', 'score': 0.9711543321609497}]

while transformers-interpret gives me slightly different scores:

explainer = SequenceClassificationExplainer(model, tokenizer)
attributions = explainer(text)
html = explainer.visualize()

[screenshot of the explainer visualization omitted]

In both cases I apply the exact same model and tokenizer...

I am grateful for any hint and/or advice! 🤗

Can we use this for multitasking pytorch model

I am solving a multitask classification problem with my own model definition. I am using SciBERT (shared) + MLPs (one per task). Is there any way to use this library to understand how SciBERT behaves on my data? Or do you know of any library which can help me?

Thanks

Support for list of sentences

The current version of SequenceClassificationExplainer does not provide a function for getting attributions for a list of sentences. Is this feature under development? If not, I would like to work on it.

[RobertaForSequenceClassification] RAM memory leaks during retrieving word attributions

Hi folks,

I'm experiencing memory leaks when using the Transformers Interpret (TI) library, which trigger out-of-memory errors and kill the model service process.

Setup

Here is how the TI lib is being used (roughly):

class NLPModel:
    """
    NLP Classification Model
    """

    def __init__(self):

        self.label_encoder: LabelEncoder = joblib.load(
           ...
        )
        self.tokenizer = AutoTokenizer.from_pretrained(
            ...
            padding="max_length",
            truncation=True,
        )

        self.model: "RobertaForSequenceClassification" = AutoModelForSequenceClassification.from_pretrained(
           ..., num_labels=self.num_classes
        )

        self.classifier = pipeline(
            "text-classification", model=self.model, tokenizer=self.tokenizer, return_all_scores=True
        )

        self.explainer = SequenceClassificationExplainer(self.model, self.tokenizer) # the TI library

    def get_word_attributions(self, agent_notes: str) -> List[WordAttribution]:
        """
        Retrieves attributions for each word based on the model explainer
        """

        with torch.no_grad():
            raw_word_attributions: List[RawWordAttribution] = self.explainer(agent_notes)[1:-1] 

        # some post processing of the raw_word_attributions
        # return processed word attributions

model = NLPModel()

def get_model() -> NLPModel:
     return model

The code is running as a FastAPI view with one server worker:

app = FastAPI()
router = APIRouter(prefix=URL_PREFIX)

# some views

@router.post("/predict/")
def get_predictions(payload: PredictionPayload, model: NLPModel = Depends(get_model)):
    samples = payload.samples
    predictions = model.predict(...)  # regular forward pass on the roberta classification model
    response: List[dict] = []

    for sample, prediction in zip(samples, predictions):
        word_attributions = model.get_word_attributions(sample.text)

        response.append(
            {
                # .... some other information
                "predictions": prediction,
                "word_attributions": [attribution.dict() for attribution in word_attributions],
            }
        )

    return JSONResponse(content=jsonable_encoder(response))

app.include_router(router)
uvicorn main:app --workers 1 --host 0.0.0.0 --port 8080 --log-level debug

The whole service is running on CPU/RAM, no GPU/CUDA is available.

Problem

Now when I start sending sequential requests to the service, it allocates more and more memory after each request. Eventually this leads to OOM errors and the server gets killed by the system.

Memory allocation roughly looks like this (these statistics were collected in my local Docker environment, which has a 6 GB RAM limit):
[screenshot of the memory usage graph omitted]

Through empirical experiments, I was able to determine that the problem lies in the following line:

raw_word_attributions: List[RawWordAttribution] = self.explainer(agent_notes)[1:-1] 

When I disabled this line, the service used no more than 500-700 MB and memory consumption stayed roughly constant.

From what I understand, the TI library calculates gradients in order to identify word attributions, and I suspect this is the cause of the issue. However, calling zero_grad() on the whole model did not help clean up RAM. I have tried more tricks, like forcing GC collection and removing the explainer and model instances, but none of them really helped.

Do you have any ideas how I could clean up the RAM that the service uses after running the Transformers Interpret library?

Appreciate your help 🙏

Various requests and LIG bug

This package is amazing!! Thank you so much for putting it together. I had a hard time using it, though, and noticed a few issues (detailed below) that eventually made me drop it in favour of writing my own abstractions, so I believe this feedback, if incorporated, would make your package much more helpful and popular! Sorry for dumping all of these issues/feature requests on you without putting in pull requests myself, but I hope it's useful :)

  1. I think this was the biggest issue for me: the interface is confusingly split across an object (SequenceClassificationExplainer) that also acts as a quasi-factory method. The constructor and instance overlap in their functionality, e.g. my understanding is that there are two ways to attribute a bunch of texts:
    for text, label in zip(texts, labels):
        explainer = SequenceClassificationExplainer(text, model, tokenizer)
        attributions = explainer(index=1)
        print(attributions.word_attributions)             # note how the products of the inference/attribution live in two different objects
        print(explainer.predicted_class_index)

or, equivalently

    explainer = SequenceClassificationExplainer("",  model, tokenizer)
    for text, label in zip(texts, labels):
        attributions = explainer(text= text, index=1)
        print(attributions.word_attributions)             # products of the inference/attribution still live in two different objects?
        print(explainer.predicted_class_index)

I think it might be much cleaner to have one object which is in charge of orchestrating the attribution!

    explainer = SequenceClassificationExplainer(model, tokenizer)    
    for text, label in zip(texts, labels):
        attributions = explainer.explain(text= text, index=1)
        print(attributions.word_attributions)             # products of the inference/attribution live in same object
        print(attributions.predicted_class_index)
  2. true_class in visualize() is expected to be a string, but is immediately cast to an int.

  3. I think there's a bug in how you're calling LIG: using the input_ids themselves as the baseline means there is nothing for LIG to interpolate between when computing gradients. I'm less sure about the token type ids, but I think the preferred approach is to use just the special tokens and pads as the baseline (see the sketch after this list).

  4. The argument "text" goes unused:

    self, pred_prob, pred_class, true_class, attr_class, text, all_tokens

  5. It would be nice if cls_explainer.visualize() optionally wrote to a file and instead returned the HTML so the user could decide what to do with it.
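
A minimal sketch of the kind of baseline point 3 refers to, assuming a tokenizer and an input text are in scope (these names are placeholders, not this package's API):

import torch

enc = tokenizer(text, return_tensors="pt")
input_ids = enc["input_ids"]

# Keep the special tokens in place and replace everything else with the pad token,
# giving LIG a neutral reference input to interpolate from.
ref_input_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
ref_input_ids[input_ids == tokenizer.cls_token_id] = tokenizer.cls_token_id
ref_input_ids[input_ids == tokenizer.sep_token_id] = tokenizer.sep_token_id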

Again, thank you so much for putting this together and hope this is useful for you!

Max class probability too low with a multi-class classifier

Hi @cdpierse , very nice project, congrats!

I am doing some experiments on bias in sentiment classifiers using our tool Rubrix together with transformers-interpret, and I have encountered an issue.

I am using the following sentiment pipeline:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

model_name = "cardiffnlp/twitter-roberta-base-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

cls_explainer = SequenceClassificationExplainer(model, tokenizer)

And one of the weird examples is:

word_attributions = cls_explainer("This woman is a secretary.")

I have three labels, and the model predicts LABEL_0 (negative) but the probability is very low (0.14), assuming the model is multi-class and not multi-label. Using the model widget on Hugging Face (https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment?text=This+woman+is+a+secretary.) I get a probability of 0.57. Maybe I'm missing something when creating the SequenceClassificationExplainer or loading the model.

Lastly, would it be possible to get all predicted labels and their probabilities, not only the max-probability label?
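
For the last question, the full per-class distribution can be read directly off the model (independently of the explainer), using the same model and tokenizer loaded above:

import torch

inputs = tokenizer("This woman is a secretary.", return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

for idx, p in enumerate(probs):
    print(model.config.id2label[idx], round(float(p), 3))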

Keep up the good work, and let me know if I can contribute a fix/enhancement if needed.

Segmentation fault when running zero shot classification

Hi,

Code :

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import ZeroShotClassificationExplainer

tokenizer = AutoTokenizer.from_pretrained("typeform/distilbert-base-uncased-mnli") #facebook/bart-large-mnli, typeform/distilbert-base-uncased-mnli
model = AutoModelForSequenceClassification.from_pretrained("typeform/distilbert-base-uncased-mnli")
model.cuda()
zero_shot_explainer = ZeroShotClassificationExplainer(model, tokenizer)

pred_class=[]
pred_class_attributions=[]
for i in test_lst:
    word_attributions = zero_shot_explainer(
    i,
    labels = tag_values)
    pred_class.append(zero_shot_explainer.predicted_label)
    pred_class_attributions.append(word_attributions[zero_shot_explainer.predicted_label])

The length of test_lst is 5 and tag_values contains 52 labels.

Cmd - python -Xfaulthandler interpret_v1.py

Logs -

(pytorch_latest_p37) [[email protected] test]$ python -Xfaulthandler interpret_v1.py 
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
0
['cocktail']
[[('[CLS]', 0.0), ('make', 0.07347648148605623), ('your', -0.05869353189040274), ('own', 0.09924983470472397), ('bis', -0.2031768971755763), ('##cuit', -0.07013737901068032), ('mix', -0.7664830511524773), ('party', -0.474903031843741), ('favour', 0.05818274110665375), ('with', 0.344515970736176)]]
1
['cocktail', 'clear']
[[('[CLS]', 0.0), ('make', 0.07347648148605623), ('your', -0.05869353189040274), ('own', 0.09924983470472397), ('bis', -0.2031768971755763), ('##cuit', -0.07013737901068032), ('mix', -0.7664830511524773), ('party', -0.474903031843741), ('favour', 0.05818274110665375), ('with', 0.344515970736176)], [('[CLS]', 0.0), ("'", 0.20192346184296878), ('brighter', 0.07367997923590573), ('days', 0.2074040520906648), ('are', 0.018519351322348894), ('ahead', -0.0365908221753947), ("'", 0.45054462728286815), ('art', 0.4889299062595618), ('print', -0.6143691546487715), (',', 0.14646573014999828), ('un', 0.17538161160139595), ('##frame', 0.19338197528230183)]]
2
free(): invalid pointer
Fatal Python error: Aborted

Thread 0x00007f212b4fb700 (most recent call first):

Current thread 0x00007f2243356740 (most recent call first):
  File "/home/ec2-user/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/transformers_interpret/explainers/zero_shot_classification.py", line 146 in _get_top_predicted_label_idx
  File "/home/ec2-user/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/transformers_interpret/explainers/zero_shot_classification.py", line 317 in __call__
  File "interpret_v1.py", line 60 in <module>
Aborted

Please help.

What are the definitions of the output?

@cdpierse, can you give definitions of the output terms? I'll put my understanding below; please correct anything that's wrong.

True Label: The index of the label given in class_name?
Predicted Label: The output of the model.
Predicted label in parentheses: I have no idea; I can't reproduce this number. It doesn't appear to be the predicted probability or the probability normalized with softmax.
Attribution Label: The label given in class_name.
Attribution Score: Not sure. It doesn't appear to be the score of the class_name label.
Legend: How each word affects the Attribution Label class?

Explainers for two input classification problem

Thanks for the project.

My question is: if we have an input composed of two sentences for a classification problem, which should be separated by a [SEP] token, how can we set up the explainers? As I understand from the source code, it only encodes a single text.

What can be done for MultipleChoice Models ?

I want to get some information about sentence importance. My models are the ForMultipleChoice variants from HF transformers. I was wondering if you have any suggestions for getting importance scores; I am mostly interested in finding out which tokens are primarily responsible for a correct option being chosen. I also talked about it here. You seem to have done something similar for sequence classification. If you have any suggestions for the multiple-choice use case, I am happy to try out transformers-interpret, as I know it's on the roadmap. If not, feel free to close this issue.

Possible Multilabel Extension?

Given that it's possible to generate word attributions for both the positive and negative class, it seems quite feasible to go through a list of labels in a multilabel setting and generate attributions for each one. Would that be a project of interest here?
