teamhg-memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions

Home Page: http://eli5.readthedocs.io

License: MIT License

Python 5.93% HTML 0.10% Jupyter Notebook 93.95% Shell 0.02%
scikit-learn machine-learning xgboost lightgbm crfsuite inspection explanation nlp data-science python

eli5's Introduction

ELI5


ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions.

explain_prediction for text data

explain_prediction for image data

It provides support for the following machine learning frameworks and packages:

  • scikit-learn. Currently ELI5 can explain weights and predictions of scikit-learn linear classifiers and regressors, print decision trees as text or as SVG, show feature importances, and explain predictions of decision trees and tree-based ensembles. ELI5 understands text processing utilities from scikit-learn and can highlight text data accordingly. Pipeline and FeatureUnion are supported. It also lets you debug scikit-learn pipelines which contain HashingVectorizer, by undoing hashing.
  • Keras - explain predictions of image classifiers via Grad-CAM visualizations.
  • xgboost - show feature importances and explain predictions of XGBClassifier, XGBRegressor and xgboost.Booster.
  • LightGBM - show feature importances and explain predictions of LGBMClassifier and LGBMRegressor.
  • CatBoost - show feature importances of CatBoostClassifier, CatBoostRegressor and catboost.CatBoost.
  • lightning - explain weights and predictions of lightning classifiers and regressors.
  • sklearn-crfsuite. ELI5 can inspect weights of sklearn_crfsuite.CRF models.

ELI5 also implements several algorithms for inspecting black-box models (see Inspecting Black-Box Estimators):

  • TextExplainer can explain predictions of any text classifier using the LIME algorithm (Ribeiro et al., 2016). There are also utilities for using LIME with non-text data and arbitrary black-box classifiers, but this feature is currently experimental.
  • The permutation importance method can be used to compute feature importances for black-box estimators.

Explanation and formatting are separated; you can get a text-based explanation to display in a console, an HTML version embeddable in an IPython notebook or a web dashboard, a pandas.DataFrame object if you want to process results further, or a JSON version which lets you implement custom rendering and formatting on the client.
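
For example, a minimal sketch of the typical flow (clf and vec stand for any fitted scikit-learn estimator and vectorizer):

import eli5
from eli5.formatters import format_as_text, format_as_html

expl = eli5.explain_weights(clf, vec=vec, top=20)

print(format_as_text(expl))          # text for the console
html = format_as_html(expl)          # HTML for notebooks / dashboards
df = eli5.format_as_dataframe(expl)  # pandas.DataFrame for further processing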

License is MIT.

Check docs for more.



eli5's People

Contributors

ashwinb-hat, guillemgsubies, ivanprado, jnothman, kmike, krkd, lopuhin, mehaase, rg2410, rmax, teabolt, zzz4zzz


eli5's Issues

unhashing: sign of a feature can be confusing in case of collisions

A follow-up to #10 and #18: when deciding whether a feature should be among the top positive or top negative features, we should take into account the sign of the most popular term; e.g. instead of

(-)people | considered | approximately +1.739 (as it is now)

it would be better to show

people | (-)considered | (-)approximately -1.739
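
A rough sketch of the proposed rule, assuming each colliding term comes with its hashing sign and a document frequency (the data layout here is illustrative):

def display_feature(terms, weight):
    # terms: list of (term, sign, frequency) tuples sharing one hash bucket.
    # Pick the most frequent term as the anchor: show it without a sign marker
    # and flip the displayed weight (and the other terms' markers) accordingly.
    anchor_sign = max(terms, key=lambda t: t[2])[1]
    shown = [term if sign == anchor_sign else '(-)' + term
             for term, sign, _freq in terms]
    return ' | '.join(shown), weight * anchor_sign

# display_feature([('people', -1, 120), ('considered', 1, 40),
#                  ('approximately', 1, 15)], +1.739)
# -> ('people | (-)considered | (-)approximately', -1.739)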

allow filtering features by their names

Sometimes it is useful to check coefficients only for some of the features. For example, here (scroll down to "What are important features?") one may want to check how e.g. query:... features affect the result, without looking at all the other features. This can also be helpful when adding a new feature.

What about adding a 'feature_re' or 'feature_patterns' argument to the explain_weights functions?
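
A sketch of what such a call could look like (the feature_re argument is the proposal here, not an existing option):

eli5.explain_weights(clf, vec=vec, feature_re=r'^query:')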

add IPython interactive widget

A widget could allow changing options, e.g. (a rough ipywidgets sketch follows the list):

  • change the number of features to show;
  • show only some of the classes;
  • filter features by name;
  • switch between layouts;
  • etc.
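
A rough sketch using ipywidgets, assuming clf and vec are already fitted and that show_weights forwards formatting options such as horizontal_layout:

from ipywidgets import interact
from IPython.display import display
import eli5

@interact(top=(5, 50, 5), horizontal=True)
def show(top=20, horizontal=True):
    # re-render the weights table whenever a widget value changes
    display(eli5.show_weights(clf, vec=vec, top=top,
                              horizontal_layout=horizontal))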

JSON serialization of Explanation

I think it makes sense to add something like an asdict method to Explanation that returns a JSON-serializable object (it would just call attr.asdict(self)).
We should also add a test that checks the result is indeed JSON-serializable (right now it can contain some numpy ints that are not serializable).
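
A minimal sketch of such a method, assuming numpy scalars are the only non-serializable values (the helper name is illustrative):

import attr
import numpy as np

def explanation_to_dict(expl):
    # Return a JSON-serializable dict for an Explanation (sketch).
    def convert(value):
        # numpy scalars are not JSON-serializable; cast them to Python types
        if isinstance(value, np.integer):
            return int(value)
        if isinstance(value, np.floating):
            return float(value)
        if isinstance(value, (list, tuple)):
            return [convert(v) for v in value]
        if isinstance(value, dict):
            return {k: convert(v) for k, v in value.items()}
        return value
    return convert(attr.asdict(expl))

# a test could then simply assert that json.dumps(explanation_to_dict(expl)) works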

drop scikit-learn 0.17.x support

I tried to add tests for scikit-learn 0.17, but it turns out the compatibility shims in eli5.lime don't work - e.g. KFold has a different API. What do you think about dropping scikit-learn 0.17 support and supporting only 0.18.x? //cc @lopuhin

Negative feature weights have different order in text and html

Order in text is wrong:

 $ py.test tests/test_sklearn_explain_prediction.py::test_explain_linear_regression[reg0] -s
============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.5.1, pytest-3.0.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/kostia/shub/memex/eli5, inifile: 
plugins: hypothesis-3.4.2
collected 25 items 

tests/test_sklearn_explain_prediction.py {'estimator': 'ElasticNet(alpha=1.0, copy_X=True, fit_intercept=True, '
              'l1_ratio=0.5,\n'
              '      max_iter=1000, normalize=False, positive=False, '
              'precompute=False,\n'
              "      random_state=42, selection='cyclic', tol=0.0001, "
              'warm_start=False)',
 'method': 'linear model',
 'targets': [{'feature_weights': {'neg': [('x10', -19.656206335733643),
                                          ('x12', -16.947217711388856),
                                          ('x9', -3.368443508747657),
                                          ('x7', -0.73147197826808674)],
                                  'neg_remaining': 0,
                                  'pos': [('<BIAS>', 38.96972344614295),
                                          ('x5', 6.8348858609128671),
                                          ('x11', 4.8082096167385444),
                                          ('x8', 1.8485323743243427),
                                          ('x0', 0.23929256935816867)],
                                  'pos_remaining': 0},
              'score': 11.997304333338633,
              'target': 'y'}]}
Explained as: linear model
'y' (score=11.997) top features
----------------
 +38.970  <BIAS>
  +6.835  x5    
  +4.808  x11   
  +1.849  x8    
  +0.239  x0    
 -19.656  x10   
 -16.947  x12   
  -3.368  x9    
  -0.731  x7    

It is hard to customize formatting in IPython notebook

Currently, in order to change formatting options in an IPython notebook, the user has to do something like this:

from IPython.display import HTML
from eli5 import explain_weights
from eli5.formatters import format_as_html

expl = explain_weights(clf, vec=fe, top=20)
HTML(format_as_html(expl, highlight_spaces=False, horizontal_layout=False))

It'd be nice to reduce it to a one-liner.
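
For comparison, something along these lines could be the one-liner, assuming show_weights forwards formatter options:

eli5.show_weights(clf, vec=fe, top=20,
                  highlight_spaces=False, horizontal_layout=False)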

add helpers for non-text data to eli5.lime

Extra white borders in html table for feature importances

At least when the table has no extra styles. Reproducing:

py.test tests/test_sklearn_explain_weights.py::test_explain_random_forest -s
open .html/test_sklearn_explain_weights_test_explain_random_forest_RandomForestClassifier.html

(screenshot)

TODO:

  • check weights table styles
  • check styles in ipython notebook

Make _weight_range and _weight_color functions from formatters.html public

And maybe also some other functions? They are needed if we want to render weights in HTML similarly to how the HTML formatter does it.
Another option would be to use an object instead of a (name, weight) tuple and add an hsl_color attribute to it. I'm not sure which is better; making the functions public feels like less of a commitment.
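
For reference, a rough standalone sketch of the kind of mapping these helpers implement (not the exact eli5 code): hue encodes the sign and lightness encodes magnitude relative to the largest absolute weight.

def weight_range(weights):
    # largest absolute weight, used to normalize colors
    return max((abs(w) for _, w in weights), default=1.0)

def weight_color(weight, w_range, min_lightness=0.6):
    # green-ish hue for positive weights, red-ish for negative;
    # smaller magnitudes get a lighter color
    hue = 120 if weight > 0 else 0
    rel = abs(weight) / w_range if w_range else 0.0
    lightness = 1.0 - (1.0 - min_lightness) * rel
    return 'hsl({}, 100%, {:.0f}%)'.format(hue, lightness * 100)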

Unstable test test_lime_utils.py::test_fit_proba

https://travis-ci.org/TeamHG-Memex/eli5/jobs/173112065 - I think this is the same failure I already saw; I added random_state but it did not help:

=================================== FAILURES ===================================
________________________________ test_fit_proba ________________________________
    def test_fit_proba():
        X = np.array([
            [0.0, 0.8],
            [0.0, 0.5],
            [1.0, 0.1],
            [0.9, 0.2],
            [0.7, 0.3],
        ])
        y_proba = np.array([
            [0.0, 1.0],
            [0.1, 0.9],
            [1.0, 0.0],
            [0.55, 0.45],
            [0.4, 0.6],
        ])
        y_bin = y_proba.argmax(axis=1)
    
        # fit on binary labels
        clf = LogisticRegression(C=10, random_state=42)
        clf.fit(X, y_bin)
        y_pred = clf.predict_proba(X)[:,1]
        mae = mean_absolute_error(y_proba[:,1], y_pred)
        print(y_pred, mae)
    
        # fit on probabilities
        clf2 = LogisticRegression(C=10, random_state=42)
        fit_proba(clf2, X, y_proba, expand_factor=200)
        y_pred2 = clf2.predict_proba(X)[:,1]
        mae2 = mean_absolute_error(y_proba[:,1], y_pred2)
        print(y_pred2, mae2)
    
        assert mae2 * 1.2 < mae
    
        # let's get 3th example really right
        sample_weight = np.array([0.1, 0.1, 0.1, 10.0, 0.1])
        clf3 = LogisticRegression(C=10, random_state=42)
        fit_proba(clf3, X, y_proba, expand_factor=200, sample_weight=sample_weight)
        y_pred3 = clf3.predict_proba(X)[:,1]
        print(y_pred3)
    
        val = y_proba[3][1]
        assert abs(y_pred3[3] - val) * 1.5 < abs(y_pred2[3] - val)
>       assert abs(y_pred3[3] - val) * 1.5 < abs(y_pred[3] - val)
E       assert (0.077946544208881308 * 1.5) < 0.10327808741270417
E        +  where 0.077946544208881308 = abs((0.3720534557911187 - 0.45000000000000001))
E        +  and   0.10327808741270417 = abs((0.34672191258729584 - 0.45000000000000001))
tests/test_lime_utils.py:53: AssertionError
----------------------------- Captured stdout call -----------------------------
[ 0.92137462  0.87156298  0.26152978  0.34672191  0.49837953] 0.114698148448
[ 0.99854408  0.90620802  0.1122826   0.31398412  0.59140365] 0.0529117527887
[ 0.9862338   0.94839957  0.23016764  0.37205346  0.59652343]

show_weights with OneVsRestClassifier

Hi guys, I really like this tool! I have a pipeline, say:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline

mlb = MultiLabelBinarizer()
y_train = mlb.fit_transform(y_train)
vec = TfidfVectorizer(ngram_range=(1, 2), stop_words='english')
clf = OneVsRestClassifier(LogisticRegressionCV())
pipeline = make_pipeline(vec, clf)
pipeline.fit(X_train, y_train)

show_prediction works neatly, but I run into 'LogisticRegressionCV' object has no attribute 'classes_' when calling eli5.show_weights(clf.estimator, vec=vec, target_names=mlb.classes_), or an 'unsupported class' error if I use clf directly.

Is it possible to work around this problem, or do you plan to add support for this soon?

Cheers!
Simon
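
One possible workaround, assuming the per-label sub-estimators are what you want to inspect, is to explain each fitted estimator from clf.estimators_ separately:

from IPython.display import display
import eli5

# after pipeline.fit, clf.estimators_ holds one fitted binary classifier per label
for label, est in zip(mlb.classes_, clf.estimators_):
    print(label)
    display(eli5.show_weights(est, vec=vec, top=10))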

defer generating dummy feature names in FeatureUnhasher

I think FeatureUnhasher.get_feature_names should have an option to use nan / None as feature names instead of generated FEATURE[%d] string names. Creating all these strings is the slowest part of this code, and it looks unnecessary because the printing/formatting code can easily generate missing feature names itself.
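
A hypothetical sketch of the idea (the generate_missing flag below is illustrative, not an existing argument):

def get_feature_names(unhashed, n_features, generate_missing=True):
    # unhashed: dict mapping column index -> recovered feature name
    if generate_missing:
        return [unhashed.get(i, 'FEATURE[%d]' % i) for i in range(n_features)]
    # leave unknown columns as None; the formatter can fill them in lazily
    return [unhashed.get(i) for i in range(n_features)]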

html features: preserve whitespaces

Features with leading whitespace get that whitespace removed in HTML.

Compare:

+2.837  spa 
+2.805   spa

and

(screenshot of the HTML rendering)

I think whitespaces should be replaced with &nbsp; for HTML display. It could also make sense to use a different background for the text, in order to make trailing whitespace visible.
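
A minimal sketch of the proposed replacement when emitting HTML:

import html

def html_feature(feature):
    # escape HTML special characters, then make leading/trailing spaces visible
    return html.escape(feature).replace(' ', '&nbsp;')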

Text highlighting: should we preserve density?

When highlighting a feature, we can highlight it regardless of its length (the current behaviour in master), or try to preserve density, coloring a longer feature with a less intense color. I tried the second approach in the preserve-density branch; here are some screenshots with the master behaviour on top (links to notebooks: https://github.com/TeamHG-Memex/eli5/blob/preserve-density/notebooks/explain_text_prediction.ipynb for words and https://github.com/TeamHG-Memex/eli5/blob/preserve-density/notebooks/explain_text_prediction_char.ipynb for chars).

(screenshots)
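
A rough sketch of the density idea: divide the weight by the character length of the highlighted span before mapping it to a color, so longer features get a proportionally less intense color.

def char_weight(weight, feature, preserve_density=False):
    # weight used for coloring one highlighted span (sketch)
    if preserve_density and len(feature):
        return weight / len(feature)
    return weight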
