maif / shapash

🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models

Home Page: https://maif.github.io/shapash/

License: Apache License 2.0

Makefile 0.03% Python 15.63% JavaScript 0.01% CSS 0.07% Jupyter Notebook 84.16% HTML 0.06% Jinja 0.04%
python machine-learning explainability explainable-ml transparency ethical-artificial-intelligence shap lime interpretability

shapash's People

Contributors

amnaabbassi, amorea04, blanoe, dependabot[bot], dragonwarrior15, florinegreciet, francesco-marini, gap01, githubyako, guerinclement, guillaume-vignal, johannmartin95, mathisbarthere, maxgdr, mlecardonnel, peterdhansen, ptitficus, sebastienbidault, susmitpy, thibaudreal, thomasbouche, yg79, yl79, yvanzubro


shapash's Issues

Persist postprocessing and preprocessing parameters

We need the preprocessing and postprocessing attributes in order to use them during the predict step (SmartPredictor).
Persist these two arguments in the preprocessing and postprocessing attributes.

Modify the tests to integrate these changes.

create to_smartpredictor method

To make it easy to declare a SmartPredictor object, develop a to_smartpredictor method on SmartExplainer.

End-user syntax to switch from a SmartExplainer to a SmartPredictor:
xpred = xpl.to_smartpredictor()
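A minimal sketch of what such a method might look like (the attributes passed mirror the list in the "Initialize SmartPredictor Object" issue below; the exact constructor signature is an assumption):

def to_smartpredictor(self):
    # Hedged sketch: build a SmartPredictor from the compiled explainer's state.
    return SmartPredictor(
        features_dict=self.features_dict,
        label_dict=self.label_dict,
        model=self.model,
        columns_dict=self.columns_dict,
        preprocessing=self.preprocessing,
        postprocessing=self.postprocessing,
    )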

Initialize SmartPredictor Object

Initialize the SmartPredictor object, which takes the following SmartExplainer attributes:
features_dict
label_dict
model
_case
_classes
columns_dict
postprocessing
preprocessing

Create the associated Python unit-test script.
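A hedged sketch of the initializer described above (the comments describing each attribute are assumptions based on how they are used elsewhere on this page):

class SmartPredictor:
    def __init__(self, features_dict, model, columns_dict,
                 label_dict=None, preprocessing=None, postprocessing=None,
                 _case=None, _classes=None):
        self.features_dict = features_dict    # technical name -> display name
        self.label_dict = label_dict          # target code -> display label
        self.model = model                    # fitted model used at predict time
        self._case = _case                    # "regression" or "classification"
        self._classes = _classes              # class list (classification only)
        self.columns_dict = columns_dict      # column position -> technical name
        self.preprocessing = preprocessing    # encoder(s) applied before the model
        self.postprocessing = postprocessing  # display transformations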

SmartExplainer - Run App Without ypred

Description of Problem:
ypred is an optional parameter of the SmartExplainer compile method;
however, ypred is needed to use the run_app method.
--> Shapash must compute the prediction by itself if the user wants to run the app.

Overview of the Solution:
smartexplainer.py - run_app() method: add a ypred check; if None, compute ypred with the predict() method
smartexplainer.py - add a predict() method that delegates to the predict function
model.py - add a predict function
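A hedged sketch of the proposed check (the app-launch call and attribute names are assumptions):

def run_app(self):
    if self.y_pred is None:
        # Compute the prediction ourselves so the app can still be launched.
        self.y_pred = self.predict()
    self.smartapp = SmartApp(self)  # hypothetical app object
    self.smartapp.run()

def predict(self):
    # Delegates to the new predict function in model.py.
    return predict(self.model, self.x_init)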

WebApp - Add title

Description of Problem:
Add a title at the top of Shapash WebApp

Overview of the Solution:

  • add a 'model_title' attribute to the SmartExplainer object
  • model_title could be specified by the Shapash user through the compile() or add() method (add a new parameter)
  • in the App code, manage truncation of an overly long title (there is already a function in util.py for that kind of problem)
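Hypothetical end-user syntax once the new parameter exists (the parameter name 'model_title' comes from this issue; passing it through compile() or add() is an assumption):

xpl.compile(x=Xtest, model=regressor, model_title="House Prices Regressor")
# or later, via add():
xpl.add(model_title="House Prices Regressor")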

SmartPredictor add_input

Create the add_input method with the following parameters:

x: optional pd.DataFrame (if the x attribute already exists, you don't need to overwrite it)
ypred: optional pd.DataFrame, if _case = "classification"
contributions: pd.DataFrame or list of DataFrames

add_input performs the following steps:

  • initialize ypred, contributions, and x if x is specified
  • check the x structure (shape, column types, names)
  • reorder the columns
  • check ypred: shape and values
  • check contributions (shape, column types, names)
  • apply the preprocessing
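A skeleton of the method following the steps above. The helper names (check_dataset_type, check_dataset_features, clean_data, check_ypred, apply_preprocessing) appear in the tracebacks quoted later on this page; the column-reordering line and the "x_preprocessed" key are assumptions:

def add_input(self, x=None, ypred=None, contributions=None):
    if x is not None:
        # shape, column types and names are validated here
        x = self.check_dataset_features(self.check_dataset_type(x))
        # reorder columns according to columns_dict (position -> name)
        x = x[[self.columns_dict[i] for i in sorted(self.columns_dict)]]
        self.data = self.clean_data(x)
    if ypred is not None:
        self.data["ypred_init"] = self.check_ypred(ypred)  # shape and values
    if contributions is not None:
        self.check_contributions(contributions)  # shape, column types, names
    self.data["x_preprocessed"] = self.apply_preprocessing()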

ERROR: Could not find a version that satisfies the requirement shapash

Hey there and thank you for using Issue Tracker!

Do the checklist before filing an issue:

  • Is this something you can debug and fix? Send a pull request! Bug fixes and documentation fixes are welcome.
  • Have a usage question? Ask it on StackOverflow. We use StackOverflow for usage questions and GitHub for bugs.

None of the above, create a bug report

Make sure to add all the information needed to understand the bug so that someone can help. If the info is missing we'll add the 'Needs more information' label and close the issue until there is enough information.

  • Provide a minimal code snippet that reproduces the bug.
  • Provide screenshots where appropriate.
  • What version of Python are you using?
  • Are you using Mac, Linux or Windows?

Python version :

Shapash version :

Operating System :

AttributeError when saving then loading SmartExplainer object

Code to reproduce the error :

xpl = SmartExplainer()
xpl.compile(
    x=Xtest,
    model=regressor,
    preprocessing=encoder, 
    y_pred=y_pred
)
# Saving compiled explainer object
xpl.save('smart_explainer.pickle')

# Loading saved explainer in a new explainer object and displaying contribution plot
xpl2 = SmartExplainer()
xpl2.load('smart_explainer.pickle')
xpl2.plot.contribution_plot(col='1stFlrSF')

Error :

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-f33554cc0550> in <module>
----> 1 xpl2.plot.contribution_plot(col='1stFlrSF')

~/Documents/missions/shapash/shapash/shapash/explainer/smart_plotter.py in contribution_plot(self, col, selection, label, violin_maxf, max_points, proba, width, height, file_name, auto_open)
   1127 
   1128         # Subset
-> 1129         if self.explainer.postprocessing_modifications:
   1130             feature_values = self.explainer.x_contrib_plot.loc[list_ind, col_name].to_frame()
   1131         else:

AttributeError: 'SmartExplainer' object has no attribute 'postprocessing_modifications'

Python version : 3.8

Shapash version : 1.1.0

Operating System : Mac OS

Switch from SmartPredictor to SmartExplainer

Hi Team,

Is it possible to generate local explanations on the new data added to a SmartPredictor object? I went through all the tutorials and I understand that when we are satisfied with the explainability results given by Shapash, we can use the SmartPredictor object for deployment. But my question is: how can we get local explanation charts after deployment, on the new data that is coming in? I hope my question is clear.

Thanks,
Chetan

SmartPredictor init - Check Model - Explainer - Features

The initialization of the SmartPredictor must check the consistency of the attributes model, explainer, preprocessing, postprocessing, features_dict, columns_dict, and features_types.

Develop the different functions/methods to check that consistency.
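A hedged sketch of the kind of consistency checks the initializer could run (the function names are assumptions):

def check_model(self):
    # The model must expose predict (and predict_proba for classification).
    if not hasattr(self.model, "predict"):
        raise ValueError("model must implement a predict method")

def check_features(self):
    # features_dict, columns_dict and features_types must describe the same columns.
    columns = set(self.columns_dict.values())
    if set(self.features_types) != columns:
        raise ValueError("features_types keys must match columns_dict values")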

predictor_load.add_input method not working

Thanks for your wonderful work on this !!

While exploring the tutorial03-Shapash-overview-model-in-production.ipynb notebook, I am running into the errors below.

Issue# 1
When executing the predictor_load.add_input(x=Xtrain, ypred=ytrain) method, it's clear from the error message that Xtrain's dtypes do not match the original dataset's dtypes: features_types holds the dtypes from before preprocessing, while Xtrain has the post-preprocessing dtypes, so the two will never match.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-74-f4ec8d8babcb> in <module>
----> 1 predictor_load.add_input(x=Xtrain, ypred=ytrain)

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_predictor.py in add_input(self, x, ypred, contributions)
    195         """
    196         if x is not None:
--> 197             x = self.check_dataset_features(self.check_dataset_type(x))
    198             self.data = self.clean_data(x)
    199             self.data["x_postprocessed"] = self.apply_postprocessing()

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_predictor.py in check_dataset_features(self, x)
    293         assert all(column in self.features_types.keys() for column in x.columns)
    294         if not all([str(x[feature].dtypes) == self.features_types[feature] for feature in x.columns]):
--> 295             raise ValueError("Types of features in x doesn't match with the expected one in features_types.")
    296         return x
    297 

ValueError: Types of features in x doesn't match with the expected one in features_types.

Issue#2

The check only accepts np.float and np.int. Should we also accept np.int32, np.float32, np.int64 and np.float64? (A possible relaxation is sketched after the traceback below.)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-f4ec8d8babcb> in <module>
----> 1 predictor_load.add_input(x=Xtrain, ypred=ytrain)

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_predictor.py in add_input(self, x, ypred, contributions)
    211 
    212         if ypred is not None:
--> 213             self.data["ypred_init"] = self.check_ypred(ypred)
    214 
    215         if contributions is not None:

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_predictor.py in check_ypred(self, ypred)
    305             User-specified prediction values.
    306         """
--> 307         return check_ypred(self.data["x"],ypred)
    308 
    309     def choose_state(self, contributions):

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\utils\check.py in check_ypred(x, ypred)
    135                 raise ValueError("y_pred must be a one column pd.Dataframe or pd.Series.")
    136             if not (ypred.dtypes[0] in [np.float, np.int]):
--> 137                 raise ValueError("y_pred must contain int or float only")
    138         if isinstance(ypred, pd.Series):
    139             if not (ypred.dtype in [np.float, np.int]):

ValueError: y_pred must contain int or float only
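A possible relaxation of the dtype check in shapash/utils/check.py, as suggested in Issue#2 above. This is a hedged sketch, not the project's fix: it accepts any integer or float dtype (int32/int64/float32/float64) instead of only np.int and np.float, which are also deprecated aliases in recent NumPy:

import numpy as np
import pandas as pd

def ypred_dtype_is_numeric(ypred: pd.Series) -> bool:
    # True for any NumPy integer or floating dtype, regardless of width.
    return np.issubdtype(ypred.dtype, np.integer) or np.issubdtype(ypred.dtype, np.floating)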

Add attribute explainer to SmartExplainer Object

Modify the SmartExplainer object to add an explainer attribute.
This attribute can be computed by shapash (shap_backend.py)

  • or it can be specified by the end user during the compile step (in this case, explainer must be a shap object)

If the user specifies an explainer, use it to compute the contributions.
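A hedged sketch of the compile-time logic (shap_contributions and x_init appear in the tracebacks quoted later on this page; the shap_values call assumes a standard shap explainer):

if explainer is not None and contributions is None:
    # the user supplied a shap explainer: use it to compute the contributions
    contributions = explainer.shap_values(self.x_init)
elif contributions is None:
    # otherwise shapash builds an explainer itself (shap_backend.py)
    contributions, explainer = shap_contributions(model, self.x_init)
self.explainer = explainer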

Bug on sphinx documentation

Bug on sphinx documentation :

  • Link to the installations_instructions (for jupyter) isn't available
    (screenshot omitted)

  • Display issues on the tutorial tutorial03-Shapash-overview-model-in-production
    (screenshot omitted)

Python version : 3.6/3.7/3.8
Shapash version : 1.1.0
Operating System : Windows

SmartPlotter - scatter plot and violin plot code refactoring

Description of Problem:
Many lines of code are duplicated between the plot_scatter and plot_violin methods.
It would be better to create subfunctions that group the duplicated code.
Similar code has been produced for the interaction plots; we should use the same coding approach here.

compare_plot

Add the compare_plot method to the SmartPlotter object. This plot allows the user to see the difference between two or more predictions and to focus on the main criteria that explain this difference.
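A hedged usage sketch (the exact signature is an assumption):

# Compare rows 12 and 47 and show the contributions that drive the gap.
xpl.plot.compare_plot(index=[12, 47], max_features=10)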

Switch from SmartPredictor to SmartExplainer

Description of Problem:
In deployment, we are using the SmartPredictor object to get predictions and explainability of our models on specific datasets. The issue here is about the possibility to generate charts to visualize our results after deployment on this new data.

Overview of the Solution:

  • add a new method on the SmartPredictor object, to_smartexplainer, to switch from a SmartPredictor object to a SmartExplainer one
    • start with a check that the add_input step has been done on the SmartPredictor, so we have the specific data on which we want to analyse the charts
    • initialize the SmartExplainer object with attributes from the SmartPredictor
    • run the compile step of SmartExplainer with the attributes and input used by the SmartPredictor as parameters

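Example: a hedged sketch of the proposed workflow (method names follow the issue; variable names are illustrative):

predictor.add_input(x=X_new, ypred=y_new)       # data must be added first
xpl_new = predictor.to_smartexplainer()         # back to a SmartExplainer
xpl_new.plot.contribution_plot('feature_name')  # charts on the deployment data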

Standalone report

Description:
🚀 We are currently working on a new feature that will produce a standalone report of the user project.

The idea is to deliver a static HTML file that will contain :

  • General information about the project
  • Dataset description
  • Model library, parameters, and other specifications
  • Dataset analysis
  • Global explainability of the model
  • Model performance

This report should be a useful starting point for a model audit, and also for anyone who wants to share general information about a data science project.

Interaction values - Documentation

Documentation:

  • Add a tutorial notebook for the interaction values (new feature) that explains how to plot and understand interactions between two features.

Interaction Values - SmartExplainer - add get_interaction_values

Description of Problem:

  • It would be interesting to get the interactions between two variables and their corresponding contributions. An implementation already exists in shap for the TreeExplainer type.

Overview of the Solution:

  • Add a get_interaction_values method to the SmartExplainer object (leaving room for other interaction features in later developments);
  • Add a get_shap_interaction_values function to shap_backend.py, used by the previous method. This function could call the shap_interaction_values method implemented in shap.
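A hedged sketch of the backend function, assuming a tree-based model supported by shap.TreeExplainer:

import shap

def get_shap_interaction_values(x_df, explainer):
    # Returns an (n_samples, n_features, n_features) array of interaction values.
    return explainer.shap_interaction_values(x_df)

# Usage sketch (tree-based models only):
# explainer = shap.TreeExplainer(model)
# interaction_values = get_shap_interaction_values(X, explainer)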

SmartPredictor - Predict_proba

Develop the two methods:

predict_proba():

  • return a pandas DataFrame with all probabilities if there is no y_pred data
  • return a pandas DataFrame with y_pred and the associated probability

detail_contributions():

  • needs a y_pred to display contributions
  • return a pandas DataFrame with y_pred and the associated contributions
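A hedged usage sketch (output shapes follow the bullets above):

proba = predictor.predict_proba()            # all class probabilities, plus
                                             # y_pred when it is available
contribs = predictor.detail_contributions()  # y_pred + one contribution column
                                             # per feature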

smartpredictor - summarize method

Develop the summarize method, which summarizes the contributions as specified in the mask_params dict.
You should adapt/refactor what was done in the filter method of the smart_explainer class.

The output should be a pd.DataFrame, like the smart_predictor.to_pandas output, containing:

  • input X index
  • ypred
  • proba (classification case)
  • features_n
  • values_n
  • contribution_n
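A hedged sketch of the expected output (the column layout mirrors the list above; the sample values are illustrative):

summary = predictor.summarize()
#        ypred  proba  feature_1  value_1  contribution_1  ...
# index
# 0          1   0.87        Sex     male            0.42  ...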

Smartpredictor - predict method

Develop the predict method of the SmartPredictor.

This method returns the prediction computed from the X input.
The end user can simply call this method:

smp.add_input(x=Xinput)
smp.predict()

In that case, you have to return the values and write the data['ypred'] value.

The end user can also compute detailed contributions without specifying ypred.
In that case, you have to compute data['ypred'] using the predict method.
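A hedged sketch of the method (the data dict keys follow the add_input issue above and are assumptions):

import pandas as pd

def predict(self):
    if self.data is None or self.data.get("x_preprocessed") is None:
        raise ValueError("add_input must be called before predict")
    ypred = pd.DataFrame(
        self.model.predict(self.data["x_preprocessed"]),
        columns=["ypred"],
        index=self.data["x_preprocessed"].index,
    )
    self.data["ypred"] = ypred  # stored so detail_contributions can reuse it
    return ypred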

plot methods - nbformat error

BugFix: when we use the xpl.plot plotting methods in JupyterHub, we get the error:
ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

Make changes in setup.py & requirements.dev.txt to define the appropriate environment.

Issue with xpl.plot.contribution_plot() method after loading serialized SmartExplainer

Hi -

Running contribution_plot() on a SmartExplainer object works fine, but after serializing it, loading it, and calling contribution_plot() again, it gives the error below. I hit this with the Wine Quality dataset.

xpl.save('WineQuality_xpl.pkl')

xpl = SmartExplainer()
xpl.load('WineQuality_xpl.pkl')
xpl.plot.contribution_plot("alcohol")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-17643c0a90cf> in <module>
----> 1 xpl.plot.contribution_plot("alcohol")

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_plotter.py in contribution_plot(self, col, selection, label, violin_maxf, max_points, proba, width, height, file_name, auto_open)
   1081 
   1082         # Subset
-> 1083         if self.explainer.postprocessing_modifications:
   1084             feature_values = self.explainer.x_contrib_plot.loc[list_ind, col_name].to_frame()
   1085         else:

AttributeError: 'SmartExplainer' object has no attribute 'postprocessing_modifications'

Python version : 3.8.5
Shapash version : 1.1.0
Operating System : Windows 10 64-bit

ValueError:

Could you help me? Can shapash only handle two categories?

ValueError: Length of list of contributions parameter is not equal to the number of classes in the target. Please check model and contributions parameters.

SmartPredictor - modify_mask

This method allows you to modify the mask_params values.
The parameters are the same as those of the SmartExplainer filter method (threshold, max_contrib, features_to_hide, positive).

Each parameter is optional; modify_mask modifies only the values of the elements specified as parameters (the others remain unchanged).
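A hedged usage sketch: each call updates only the parameters it names.

predictor.modify_mask(max_contrib=3)                 # keep the top 3 contributions
predictor.modify_mask(positive=True, threshold=0.1)  # max_contrib stays at 3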

Apply Preprocessing

Adapt the preprocessing step to the different user stories considered for the smart explainer.

Develop the apply_preprocessing function(s).
These functions will support:

  • dict
  • category_encoders
  • ColumnTransformer
  • list of category_encoders & ColumnTransformer

The function(s) will live in the following scripts, which you'll rename:

  • inverse_category_encoder.py --> category_encoder_backend.py
  • inverse_columntransformer --> columntransformer_backend.py
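A hedged sketch of a dispatch over the supported preprocessing kinds (apply_dict_preprocessing is a hypothetical helper; encoders are assumed to expose the standard transform method):

def apply_preprocessing(x, preprocessing):
    if preprocessing is None:
        return x
    if isinstance(preprocessing, dict):
        return apply_dict_preprocessing(x, preprocessing)
    if isinstance(preprocessing, list):
        # list of category_encoders and/or ColumnTransformer steps
        for step in preprocessing:
            x = apply_preprocessing(x, step)
        return x
    # single category_encoders encoder or sklearn ColumnTransformer
    return preprocessing.transform(x)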

Defaulting to user installation because normal site-packages is not writeable ERROR: Could not find a version that satisfies the requirement shapash (from versions: none) ERROR: No matching distribution found for shapash

Hey there and thank you for using Issue Tracker!

Do the checklist before filing an issue:

  • Is this something you can debug and fix? Send a pull request! Bug fixes and documentation fixes are welcome.
  • Have a usage question? Ask it on StackOverflow. We use StackOverflow for usage questions and GitHub for bugs.

None of the above, create a bug report

Make sure to add all the information needed to understand the bug so that someone can help. If the info is missing we'll add the 'Needs more information' label and close the issue until there is enough information.

  • Provide a minimal code snippet that reproduces the bug.
  • Provide screenshots where appropriate.
  • What version of Python are you using?
  • Are you using Mac, Linux or Windows?

Python version : 3.9.1

Shapash version :

Operating System : Fedora 33 working station

feature/postprocessing

Add a new parameter to the compile step of SmartExplainer: postprocessing.
The postprocessing parameter is a dict that specifies which transformations the user wants to apply to the dataset.

The postprocessing dict

{'feature1': {'type': 'prefix', 'rule': 'age: '},
 'feature2': {'type': 'suffix', 'rule': '$/week '},
 'feature3': {'type': 'transcoding', 'rule': {'code1': 'single', 'code2': 'married'}},
 'feature4': {'type': 'regex', 'rule': {'in': 'AND', 'out': ' & '}},
 'feature5': {'type': 'case', 'rule': 'lower'}}

  • Add a check_postprocessing() method to validate the feature names
  • Develop the postprocess functions
  • Change the x_pred attribute at the output of the compile step

Smartpredictor - to_pickle

Add to_pickle and load methods that allow saving the SmartPredictor object to disk and loading it back.
Take inspiration from what has been done for the SmartExplainer; if necessary, refactor the code.

NB: the data dict must not be saved in the pickle file.
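Hypothetical end-user syntax following the issue's naming (the exact API is an assumption):

predictor.to_pickle('./predictor.pkl')              # the data dict is not serialized
predictor = SmartPredictor.load('./predictor.pkl')  # restore the saved object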

Interaction Values - SmartPlotter - add method top_interactions_plot

Description of Problem:

  • Visualizing interactions can take a lot of time, as it requires visualizing p^2 plots, where p is the number of features (one for each pair of features).
  • The idea of this feature is to have a single plot that summarizes the top pairwise interactions to visualize.

Overview of the Solution:

  • Add a method in SmartPlotter (top_interactions_plot) that lets the user visualize the top n interactions between pairs of features.
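A hedged usage sketch (the parameter name is an assumption):

# Show the 5 strongest feature-pair interactions in a single plot.
xpl.plot.top_interactions_plot(nb_top_interactions=5)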

smartpredictor tutorial

Create a tutorial notebook that explains how to use SmartPredictor to deploy the computation and summarization of your explainability.

The different steps:

  • SmartExplainer to SmartPredictor
  • write to a pickle file
  • load the pickle file
  • make a simple prediction (API: dict input)
  • the batch mode

Problem with XGBoost contributions computation

Code

import numpy as np
import pandas as pd

import xgboost

import shap
from shapash.explainer.smart_explainer import SmartExplainer

X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)

xgb_train = xgboost.DMatrix(X, label=y)

params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}

model = xgboost.train(params_train, xgb_train, num_boost_round=5)

# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model
)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-8abc6cc3f74d> in <module>
     28 # Smart explainer creation
     29 xpl = SmartExplainer()
---> 30 xpl.compile(
     31     x=X,
     32     model=model,

~/.local/lib/python3.8/site-packages/shapash/explainer/smart_explainer.py in compile(self, x, model, explainer, contributions, y_pred, preprocessing, postprocessing)
    192             raise ValueError("You have to specify just one of these arguments: explainer, contributions")
    193         if contributions is None:
--> 194             contributions, explainer = shap_contributions(model, self.x_init, self.check_explainer(explainer))
    195         adapt_contrib = self.adapt_contributions(contributions)
    196         self.state = self.choose_state(adapt_contrib)

~/.local/lib/python3.8/site-packages/shapash/utils/shap_backend.py in shap_contributions(model, x_df, explainer)
     55 
     56     if str(type(model)) not in list(sum((simple_tree_model,catboost_model,linear_model,svm_model),())):
---> 57         raise ValueError(
     58             """
     59             model not supported by shapash, please compute contributions

ValueError: 
            model not supported by shapash, please compute contributions
            by yourself before using shapash

Hint:

str(type(model))
"<class 'xgboost.core.Booster'>"

Python version : 3.8

Shapash version : 1.1.0
XGBoost version : 1.0.0

Operating System : Linux

Interaction Values - SmartPlotter - add method plot_scatter_interactions

Description of Problem:

  • Implement a plot to observe the interactions between two variables and their corresponding shap values.
    See here for more information about what is already implemented in shap: https://slundberg.github.io/shap/notebooks/NHANES%20I%20Survival%20Model.html
    The goal is to reproduce these scatter plots in plotly, to get a better understanding of the interactions between two variables, which is not possible in shapash right now.

Overview of the Solution:
The plot_scatter_interactions method could:

  • Re-use the plot_scatter function of the SmartPlotter object
  • Use the get_interaction_values method of the SmartExplainer object (see #120 )

Shapash available with python 3.8

tried to install with: pip install shapash

Error: Could not find a version that satisfies the requirement shamash (from versions: none)
Error: No matching distribution found for shapash
