maif / shapash

🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models

Home Page: https://maif.github.io/shapash/

License: Apache License 2.0

Makefile 0.03% Python 15.63% JavaScript 0.01% CSS 0.07% Jupyter Notebook 84.16% HTML 0.06% Jinja 0.04%
python machine-learning explainability explainable-ml transparency ethical-artificial-intelligence shap lime interpretability

shapash's People

Contributors

amnaabbassi, amorea04, blanoe, dependabot[bot], dragonwarrior15, florinegreciet, francesco-marini, gap01, githubyako, guerinclement, guillaume-vignal, johannmartin95, mathisbarthere, maxgdr, mlecardonnel, peterdhansen, ptitficus, sebastienbidault, susmitpy, thibaudreal, thomasbouche, yg79, yl79, yvanzubro


shapash's Issues

Persist postprocessing and preprocessing parameters

We need the preprocessing and postprocessing attributes in order to use them during the predict step (SmartPredictor).
Persist these two arguments in the preprocessing and postprocessing attributes.

Modify the tests to integrate these changes.

create to_smartpredictor method

To make it easy to declare a SmartPredictor object, develop a to_smartpredictor method on SmartExplainer.

End-user syntax to switch from a SmartExplainer to a SmartPredictor:
xpred = xpl.to_smartpredictor()
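A minimal sketch of what such a method might look like (the attributes passed mirror the list in the "Initialize SmartPredictor Object" issue below; the exact constructor signature is an assumption):

def to_smartpredictor(self):
    # Hedged sketch: build a SmartPredictor from the compiled explainer's state.
    return SmartPredictor(
        features_dict=self.features_dict,
        label_dict=self.label_dict,
        model=self.model,
        columns_dict=self.columns_dict,
        preprocessing=self.preprocessing,
        postprocessing=self.postprocessing,
    )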

Initialize SmartPredictor Object

Initialize the SmartPredictor object, which takes the following SmartExplainer attributes:
features_dict
label_dict
model
_case
_classes
columns_dict
postprocessing
preprocessing

Create the associated Python unit-test script.
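A hedged sketch of the initializer described above (the comments describing each attribute are assumptions based on how they are used elsewhere on this page):

class SmartPredictor:
    def __init__(self, features_dict, model, columns_dict,
                 label_dict=None, preprocessing=None, postprocessing=None,
                 _case=None, _classes=None):
        self.features_dict = features_dict    # technical name -> display name
        self.label_dict = label_dict          # target code -> display label
        self.model = model                    # fitted model used at predict time
        self._case = _case                    # "regression" or "classification"
        self._classes = _classes              # class list (classification only)
        self.columns_dict = columns_dict      # column position -> technical name
        self.preprocessing = preprocessing    # encoder(s) applied before the model
        self.postprocessing = postprocessing  # display transformations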

SmartExplainer - Run App Without ypred

Description of Problem:
ypred is an optional parameter of the SmartExplainer compile method;
however, ypred is needed to use the run_app method.
--> Shapash must compute the prediction by itself if the user wants to run the app.

Overview of the Solution:
smartexplainer.py - run_app() method: add a ypred check; if None, compute ypred with the predict() method
smartexplainer.py - add a predict() method that delegates to the predict function
model.py - add a predict function
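A hedged sketch of the proposed check (the app-launch call and attribute names are assumptions):

def run_app(self):
    if self.y_pred is None:
        # Compute the prediction ourselves so the app can still be launched.
        self.y_pred = self.predict()
    self.smartapp = SmartApp(self)  # hypothetical app object
    self.smartapp.run()

def predict(self):
    # Delegates to the new predict function in model.py.
    return predict(self.model, self.x_init)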

WebApp - Add title

Description of Problem:
Add a title at the top of Shapash WebApp

Overview of the Solution:

  • add a 'model_title' attribute to the SmartExplainer object
  • model_title could be specified by the Shapash user through the compile() or add() method (add a new parameter)
  • in the App code, manage truncation of an overly long title (there is already a function in util.py for that kind of problem)
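Hypothetical end-user syntax once the new parameter exists (the parameter name 'model_title' comes from this issue; passing it through compile() or add() is an assumption):

xpl.compile(x=Xtest, model=regressor, model_title="House Prices Regressor")
# or later, via add():
xpl.add(model_title="House Prices Regressor")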

SmartPredictor add_input

Create the add_input method with the following parameters:

x: optional pd.DataFrame (if the x attribute already exists, you don't need to overwrite it)
ypred: optional pd.DataFrame, if _case = "classification"
contributions: pd.DataFrame or list of DataFrames

add_input performs the following steps:

  • initialize ypred, contributions, and x if x is specified
  • check the x structure (shape, column types, names)
  • reorder the columns
  • check ypred: shape and values
  • check contributions (shape, column types, names)
  • apply the preprocessing
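A skeleton of the method following the steps above. The helper names (check_dataset_type, check_dataset_features, clean_data, check_ypred, apply_preprocessing) appear in the tracebacks quoted later on this page; the column-reordering line and the "x_preprocessed" key are assumptions:

def add_input(self, x=None, ypred=None, contributions=None):
    if x is not None:
        # shape, column types and names are validated here
        x = self.check_dataset_features(self.check_dataset_type(x))
        # reorder columns according to columns_dict (position -> name)
        x = x[[self.columns_dict[i] for i in sorted(self.columns_dict)]]
        self.data = self.clean_data(x)
    if ypred is not None:
        self.data["ypred_init"] = self.check_ypred(ypred)  # shape and values
    if contributions is not None:
        self.check_contributions(contributions)  # shape, column types, names
    self.data["x_preprocessed"] = self.apply_preprocessing()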

ERROR: Could not find a version that satisfies the requirement shapash

Hey there and thank you for using Issue Tracker!

Do the checklist before filing an issue:

  • Is this something you can debug and fix? Send a pull request! Bug fixes and documentation fixes are welcome.
  • Have a usage question? Ask it on StackOverflow. We use StackOverflow for usage questions and GitHub for bugs.

None of the above, create a bug report

Make sure to add all the information needed to understand the bug so that someone can help. If the info is missing we'll add the 'Needs more information' label and close the issue until there is enough information.

  • Provide a minimal code snippet that reproduces the bug.
  • Provide screenshots where appropriate.
  • What version of Python are you using?
  • Are you using Mac, Linux or Windows?

Python version :

Shapash version :

Operating System :

AttributeError when saving then loading SmartExplainer object

Code to reproduce the error :

xpl = SmartExplainer()
xpl.compile(
    x=Xtest,
    model=regressor,
    preprocessing=encoder, 
    y_pred=y_pred
)
# Saving compiled explainer object
xpl.save('smart_explainer.pickle')

# Loading saved explainer in a new explainer object and displaying contribution plot
xpl2 = SmartExplainer()
xpl2.load('smart_explainer.pickle')
xpl2.plot.contribution_plot(col='1stFlrSF')

Error :

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-f33554cc0550> in <module>
----> 1 xpl2.plot.contribution_plot(col='1stFlrSF')

~/Documents/missions/shapash/shapash/shapash/explainer/smart_plotter.py in contribution_plot(self, col, selection, label, violin_maxf, max_points, proba, width, height, file_name, auto_open)
   1127 
   1128         # Subset
-> 1129         if self.explainer.postprocessing_modifications:
   1130             feature_values = self.explainer.x_contrib_plot.loc[list_ind, col_name].to_frame()
   1131         else:

AttributeError: 'SmartExplainer' object has no attribute 'postprocessing_modifications'

Python version : 3.8

Shapash version : 1.1.0

Operating System : Mac OS

Switch from SmartPredictor to SmartExplainer

Hi Team,

Is it possible to generate local explanations on the new data added to a SmartPredictor object? I went through all the tutorials and I understand that when we are satisfied with the explainability results given by Shapash, we can use the SmartPredictor object for deployment. But my question is: how can we get local explanation charts after deployment, on the new data that is coming in? I hope my question is clear.

Thanks,
Chetan

SmartPredictor init - Check Model - Explainer - Features

The initialization of the SmartPredictor must check the consistency of the attributes model, explainer, preprocessing, postprocessing, features_dict, columns_dict, and features_types.

Develop the different functions/methods to check that consistency.
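A hedged sketch of the kind of consistency checks the initializer could run (the function names are assumptions):

def check_model(self):
    # The model must expose predict (and predict_proba for classification).
    if not hasattr(self.model, "predict"):
        raise ValueError("model must implement a predict method")

def check_features(self):
    # features_dict, columns_dict and features_types must describe the same columns.
    columns = set(self.columns_dict.values())
    if set(self.features_types) != columns:
        raise ValueError("features_types keys must match columns_dict values")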

predictor_load.add_input method not working

Thanks for your wonderful work on this !!

While exploring the tutorial03-Shapash-overview-model-in-production.ipynb notebook, I am running into the errors below.

Issue# 1
When executing the predictor_load.add_input(x=Xtrain, ypred=ytrain) method, it's clear from the error message that Xtrain's dtypes do not match the original dataset's dtypes: features_types holds the dtypes from before preprocessing, while Xtrain has the post-preprocessing dtypes, so the two will never match.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-74-f4ec8d8babcb> in <module>
----> 1 predictor_load.add_input(x=Xtrain, ypred=ytrain)

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_predictor.py in add_input(self, x, ypred, contributions)
    195         """
    196         if x is not None:
--> 197             x = self.check_dataset_features(self.check_dataset_type(x))
    198             self.data = self.clean_data(x)
    199             self.data["x_postprocessed"] = self.apply_postprocessing()

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_predictor.py in check_dataset_features(self, x)
    293         assert all(column in self.features_types.keys() for column in x.columns)
    294         if not all([str(x[feature].dtypes) == self.features_types[feature] for feature in x.columns]):
--> 295             raise ValueError("Types of features in x doesn't match with the expected one in features_types.")
    296         return x
    297 

ValueError: Types of features in x doesn't match with the expected one in features_types.

Issue#2

The check only accepts np.float and np.int. Should we also accept np.int32, np.float32, np.int64 and np.float64? (A possible relaxation is sketched after the traceback below.)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-f4ec8d8babcb> in <module>
----> 1 predictor_load.add_input(x=Xtrain, ypred=ytrain)

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_predictor.py in add_input(self, x, ypred, contributions)
    211 
    212         if ypred is not None:
--> 213             self.data["ypred_init"] = self.check_ypred(ypred)
    214 
    215         if contributions is not None:

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_predictor.py in check_ypred(self, ypred)
    305             User-specified prediction values.
    306         """
--> 307         return check_ypred(self.data["x"],ypred)
    308 
    309     def choose_state(self, contributions):

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\utils\check.py in check_ypred(x, ypred)
    135                 raise ValueError("y_pred must be a one column pd.Dataframe or pd.Series.")
    136             if not (ypred.dtypes[0] in [np.float, np.int]):
--> 137                 raise ValueError("y_pred must contain int or float only")
    138         if isinstance(ypred, pd.Series):
    139             if not (ypred.dtype in [np.float, np.int]):

ValueError: y_pred must contain int or float only
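A possible relaxation of the dtype check in shapash/utils/check.py, as suggested in Issue#2 above. This is a hedged sketch, not the project's fix: it accepts any integer or float dtype (int32/int64/float32/float64) instead of only np.int and np.float, which are also deprecated aliases in recent NumPy:

import numpy as np
import pandas as pd

def ypred_dtype_is_numeric(ypred: pd.Series) -> bool:
    # True for any NumPy integer or floating dtype, regardless of width.
    return np.issubdtype(ypred.dtype, np.integer) or np.issubdtype(ypred.dtype, np.floating)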

Add attribute explainer to SmartExplainer Object

Modify the SmartExplainer object to add an explainer attribute.
This attribute can be computed by shapash (shap_backend.py)

  • or it can be specified by the end user during the compile step (in this case, explainer must be a shap object)

If the user specifies an explainer, use it to compute the contributions.
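A hedged sketch of the compile-time logic (shap_contributions and x_init appear in the tracebacks quoted later on this page; the shap_values call assumes a standard shap explainer):

if explainer is not None and contributions is None:
    # the user supplied a shap explainer: use it to compute the contributions
    contributions = explainer.shap_values(self.x_init)
elif contributions is None:
    # otherwise shapash builds an explainer itself (shap_backend.py)
    contributions, explainer = shap_contributions(model, self.x_init)
self.explainer = explainer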

Bug on sphinx documentation

Bug on sphinx documentation :

  • Link to the installations_instructions (for jupyter) isn't available
    (screenshot omitted)

  • Display issues on the tutorial tutorial03-Shapash-overview-model-in-production
    (screenshot omitted)

Python version : 3.6/3.7/3.8
Shapash version : 1.1.0
Operating System : Windows

SmartPlotter - scatter plot and violin plot code refactoring

Description of Problem:
Many lines of code are duplicated between the plot_scatter and plot_violin methods.
It would be better to create subfunctions that group the duplicated code.
Similar code has been produced for the interaction plots; we should use the same coding approach here.

compare_plot

Add the compare_plot method to the SmartPlotter object. This plot allows the user to see the difference between two or more predictions and to focus on the main criteria that explain this difference.
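A hedged usage sketch (the exact signature is an assumption):

# Compare rows 12 and 47 and show the contributions that drive the gap.
xpl.plot.compare_plot(index=[12, 47], max_features=10)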

Switch from SmartPredictor to SmartExplainer

Description of Problem:
In deployment, we are using the SmartPredictor object to get predictions and explainability of our models on specific datasets. The issue here is about the possibility to generate charts to visualize our results after deployment on this new data.

Overview of the Solution:

  • add a new method on the SmartPredictor object, to_smartexplainer, to switch from a SmartPredictor object to a SmartExplainer one
    • start with a check that the add_input step has been done on the SmartPredictor, so we have the specific data on which we want to analyse the charts
    • initialize the SmartExplainer object with attributes from the SmartPredictor
    • run the compile step of SmartExplainer with the attributes and input used by the SmartPredictor as parameters

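Example: a hedged sketch of the proposed workflow (method names follow the issue; variable names are illustrative):

predictor.add_input(x=X_new, ypred=y_new)       # data must be added first
xpl_new = predictor.to_smartexplainer()         # back to a SmartExplainer
xpl_new.plot.contribution_plot('feature_name')  # charts on the deployment data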

Standalone report

Description:
🚀 We are currently working on a new feature that will produce a standalone report of the user project.

The idea is to deliver a static HTML file that will contain :

  • General information about the project
  • Dataset description
  • Model library, parameters, and other specifications
  • Dataset analysis
  • Global explainability of the model
  • Model performance

This report should be a useful starting point for a model audit, and also for anyone who wants to share general information about a data science project.

Interaction values - Documentation

Documentation:

  • Add a tutorial notebook for the interaction values (new feature) that explains how to plot and understand interactions between two features.

Interaction Values - SmartExplainer - add get_interaction_values

Description of Problem:

  • It would be interesting to get the interactions between two variables and their corresponding contributions. An implementation already exists in shap for the TreeExplainer type.

Overview of the Solution:

  • Add a get_interaction_values method to the SmartExplainer object (leaving room for other interaction features in later developments);
  • Add a get_shap_interaction_values function to shap_backend.py, used by the previous method. This function could call the shap_interaction_values method implemented in shap.
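A hedged sketch of the backend function, assuming a tree-based model supported by shap.TreeExplainer:

import shap

def get_shap_interaction_values(x_df, explainer):
    # Returns an (n_samples, n_features, n_features) array of interaction values.
    return explainer.shap_interaction_values(x_df)

# Usage sketch (tree-based models only):
# explainer = shap.TreeExplainer(model)
# interaction_values = get_shap_interaction_values(X, explainer)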

SmartPredictor - Predict_proba

Develop the two methods:

predict_proba():

  • return a pandas DataFrame with all probabilities if there is no y_pred data
  • return a pandas DataFrame with y_pred and the associated probability

detail_contributions():

  • needs a y_pred to display contributions
  • return a pandas DataFrame with y_pred and the associated contributions
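A hedged usage sketch (output shapes follow the bullets above):

proba = predictor.predict_proba()            # all class probabilities, plus
                                             # y_pred when it is available
contribs = predictor.detail_contributions()  # y_pred + one contribution column
                                             # per feature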

smartpredictor - summarize method

Develop the summarize method, which summarizes the contributions as specified in the mask_params dict.
You should adapt/refactor what was done in the filter method of the smart_explainer class.

The output should be a pd.DataFrame, like the smart_predictor.to_pandas output, containing:

  • input X index
  • ypred
  • proba (classification case)
  • features_n
  • values_n
  • contribution_n
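A hedged sketch of the expected output (the column layout mirrors the list above; the sample values are illustrative):

summary = predictor.summarize()
#        ypred  proba  feature_1  value_1  contribution_1  ...
# index
# 0          1   0.87        Sex     male            0.42  ...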

Smartpredictor - predict method

Develop the predict method of the SmartPredictor.

This method returns the prediction computed from the X input.
The end user can simply call this method:

smp.add_input(x=Xinput)
smp.predict()

In that case, you have to return the values and write the data['ypred'] value.

The end user can also compute detailed contributions without specifying ypred.
In that case, you have to compute data['ypred'] using the predict method.
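A hedged sketch of the method (the data dict keys follow the add_input issue above and are assumptions):

import pandas as pd

def predict(self):
    if self.data is None or self.data.get("x_preprocessed") is None:
        raise ValueError("add_input must be called before predict")
    ypred = pd.DataFrame(
        self.model.predict(self.data["x_preprocessed"]),
        columns=["ypred"],
        index=self.data["x_preprocessed"].index,
    )
    self.data["ypred"] = ypred  # stored so detail_contributions can reuse it
    return ypred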

plot methods - nbformat error

BugFix: when we use the xpl.plot plotting methods in JupyterHub, we get the error:
ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

Make changes in setup.py & requirements.dev.txt to define the appropriate environment.

Issue with xpl.plot.contribution_plot() method after loading serialized SmartExplainer

Hi -

Running contribution_plot() on a SmartExplainer object works fine, but after serializing it, loading it, and calling contribution_plot() again, it gives the error below. I hit this with the Wine Quality dataset.

xpl.save('WineQuality_xpl.pkl')

xpl = SmartExplainer()
xpl.load('WineQuality_xpl.pkl')
xpl.plot.contribution_plot("alcohol")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-17643c0a90cf> in <module>
----> 1 xpl.plot.contribution_plot("alcohol")

c:\users\chetan_ambi\.conda\envs\shapash\lib\site-packages\shapash\explainer\smart_plotter.py in contribution_plot(self, col, selection, label, violin_maxf, max_points, proba, width, height, file_name, auto_open)
   1081 
   1082         # Subset
-> 1083         if self.explainer.postprocessing_modifications:
   1084             feature_values = self.explainer.x_contrib_plot.loc[list_ind, col_name].to_frame()
   1085         else:

AttributeError: 'SmartExplainer' object has no attribute 'postprocessing_modifications'

Python version : 3.8.5
Shapash version : 1.1.0
Operating System : Windows 10 64-bit

ValueError:

Could you help me? Can shapash only handle two categories?

ValueError: Length of list of contributions parameter is not equal to the number of classes in the target. Please check model and contributions parameters.

SmartPredictor - modify_mask

This method allows you to modify the mask_params values.
The parameters are the same as those of the SmartExplainer filter method (threshold, max_contrib, features_to_hide, positive).

Each parameter is optional; modify_mask modifies only the values of the elements specified as parameters (the others remain unchanged).
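A hedged usage sketch: each call updates only the parameters it names.

predictor.modify_mask(max_contrib=3)                 # keep the top 3 contributions
predictor.modify_mask(positive=True, threshold=0.1)  # max_contrib stays at 3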

Apply Preprocessing

Adapt the preprocessing step to the different user stories considered for the smart explainer.

Develop the apply_preprocessing function(s).
These functions will support:

  • dict
  • category_encoders
  • ColumnTransformer
  • list of category_encoders & ColumnTransformer

The function(s) will live in the following scripts, which you'll rename:

  • inverse_category_encoder.py --> category_encoder_backend.py
  • inverse_columntransformer --> columntransformer_backend.py
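A hedged sketch of a dispatch over the supported preprocessing kinds (apply_dict_preprocessing is a hypothetical helper; encoders are assumed to expose the standard transform method):

def apply_preprocessing(x, preprocessing):
    if preprocessing is None:
        return x
    if isinstance(preprocessing, dict):
        return apply_dict_preprocessing(x, preprocessing)
    if isinstance(preprocessing, list):
        # list of category_encoders and/or ColumnTransformer steps
        for step in preprocessing:
            x = apply_preprocessing(x, step)
        return x
    # single category_encoders encoder or sklearn ColumnTransformer
    return preprocessing.transform(x)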

Defaulting to user installation because normal site-packages is not writeable ERROR: Could not find a version that satisfies the requirement shapash (from versions: none) ERROR: No matching distribution found for shapash

Hey there and thank you for using Issue Tracker!

Do the checklist before filing an issue:

  • Is this something you can debug and fix? Send a pull request! Bug fixes and documentation fixes are welcome.
  • Have a usage question? Ask it on StackOverflow. We use StackOverflow for usage questions and GitHub for bugs.

None of the above, create a bug report

Make sure to add all the information needed to understand the bug so that someone can help. If the info is missing we'll add the 'Needs more information' label and close the issue until there is enough information.

  • Provide a minimal code snippet that reproduces the bug.
  • Provide screenshots where appropriate.
  • What version of Python are you using?
  • Are you using Mac, Linux or Windows?

Python version : 3.9.1

Shapash version :

Operating System : Fedora 33 working station

feature/postprocessing

Add a new parameter to the compile step of SmartExplainer: postprocessing.
The postprocessing parameter is a dict that specifies which transformations the user wants to apply to the dataset.

The postprocessing dict

{'feature1': {'type': 'prefix', 'rule': 'age: '},
 'feature2': {'type': 'suffix', 'rule': '$/week '},
 'feature3': {'type': 'transcoding', 'rule': {'code1': 'single', 'code2': 'married'}},
 'feature4': {'type': 'regex', 'rule': {'in': 'AND', 'out': ' & '}},
 'feature5': {'type': 'case', 'rule': 'lower'}}

  • Add a check_postprocessing() method to validate the feature names
  • Develop the postprocess functions
  • Change the x_pred attribute at the output of the compile step

Smartpredictor - to_pickle

Add to_pickle and load methods that allow saving the SmartPredictor object to disk and loading it back.
Take inspiration from what has been done for the SmartExplainer; if necessary, refactor the code.

NB: the data dict must not be saved in the pickle file.
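Hypothetical end-user syntax following the issue's naming (the exact API is an assumption):

predictor.to_pickle('./predictor.pkl')              # the data dict is not serialized
predictor = SmartPredictor.load('./predictor.pkl')  # restore the saved object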

Interaction Values - SmartPlotter - add method top_interactions_plot

Description of Problem:

  • Visualizing interactions can take a lot of time, as it requires visualizing p^2 plots, where p is the number of features (one for each pair of features).
  • The idea of this feature is to have a single plot that summarizes the top pairwise interactions to visualize.

Overview of the Solution:

  • Add a method in SmartPlotter (top_interactions_plot) that lets the user visualize the top n interactions between pairs of features.
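A hedged usage sketch (the parameter name is an assumption):

# Show the 5 strongest feature-pair interactions in a single plot.
xpl.plot.top_interactions_plot(nb_top_interactions=5)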

smartpredictor tutorial

Create a tutorial notebook that explains how to use SmartPredictor to deploy the computation and summarization of your explainability.

The different steps:

  • SmartExplainer to SmartPredictor
  • write to a pickle file
  • load the pickle file
  • make a simple prediction (API: dict input)
  • the batch mode

Problem with XGBoost contributions computation

Code

import numpy as np
import pandas as pd

import xgboost

import shap
from shapash.explainer.smart_explainer import SmartExplainer

X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)

xgb_train = xgboost.DMatrix(X, label=y)

params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}

model = xgboost.train(params_train, xgb_train, num_boost_round=5)

# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model
)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-8abc6cc3f74d> in <module>
     28 # Smart explainer creation
     29 xpl = SmartExplainer()
---> 30 xpl.compile(
     31     x=X,
     32     model=model,

~/.local/lib/python3.8/site-packages/shapash/explainer/smart_explainer.py in compile(self, x, model, explainer, contributions, y_pred, preprocessing, postprocessing)
    192             raise ValueError("You have to specify just one of these arguments: explainer, contributions")
    193         if contributions is None:
--> 194             contributions, explainer = shap_contributions(model, self.x_init, self.check_explainer(explainer))
    195         adapt_contrib = self.adapt_contributions(contributions)
    196         self.state = self.choose_state(adapt_contrib)

~/.local/lib/python3.8/site-packages/shapash/utils/shap_backend.py in shap_contributions(model, x_df, explainer)
     55 
     56     if str(type(model)) not in list(sum((simple_tree_model,catboost_model,linear_model,svm_model),())):
---> 57         raise ValueError(
     58             """
     59             model not supported by shapash, please compute contributions

ValueError: 
            model not supported by shapash, please compute contributions
            by yourself before using shapash

Hint:

str(type(model))
"<class 'xgboost.core.Booster'>"

Python version : 3.8

Shapash version : 1.1.0
XGBoost version : 1.0.0

Operating System : Linux

Interaction Values - SmartPlotter - add method plot_scatter_interactions

Description of Problem:

  • Implement a plot to observe the interactions between two variables and their corresponding shap values.
    See here for more information about what is already implemented in shap: https://slundberg.github.io/shap/notebooks/NHANES%20I%20Survival%20Model.html
    The goal is to reproduce these scatter plots in plotly, to get a better understanding of the interactions between two variables, which is not possible in shapash right now.

Overview of the Solution:
The plot_scatter_interactions method could:

  • Re-use the plot_scatter function of the SmartPlotter object
  • Use the get_interaction_values method of the SmartExplainer object (see #120 )

Shapash available with python 3.8

tried to install with: pip install shapash

Error: Could not find a version that satisfies the requirement shamash (from versions: none)
Error: No matching distribution found for shapash
