Comments (8)
@jayden526 SHAP values work well with regressions; in fact, the Boston housing example in the README is a least squares regression. The SHAP values are in the same units as the model output (for Tree SHAP in XGBoost, this is the output before the link function, such as a logistic). So if you are predicting dollars, the SHAP values will be in dollars, and together with the base value (the expected model output) they will sum to the output of the model.
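To make the "same units, sums to the model output" point concrete, here is a minimal sketch that computes exact Shapley values by brute force for a toy price model. It does not use the shap library and is not how Tree SHAP or Kernel SHAP are implemented; the model `predict_price`, its coefficients, and the `background` rows are all hypothetical values made up for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical linear price model; the output is in dollars.
def predict_price(row):
    area_m2, rooms, distance_km = row
    return 50_000 + 1_200 * area_m2 + 8_000 * rooms - 3_000 * distance_km

# Hypothetical background rows that stand in for "missing" features.
background = [
    (80.0, 2.0, 10.0),
    (120.0, 3.0, 5.0),
    (100.0, 4.0, 3.0),
]

def value(x, subset):
    """Mean prediction when features in `subset` come from x and the
    remaining features come from each background row."""
    total = 0.0
    for b in background:
        mixed = tuple(x[i] if i in subset else b[i] for i in range(len(x)))
        total += predict_price(mixed)
    return total / len(background)

def shapley_values(x):
    """Exact Shapley values by enumerating every feature subset."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                with_i = value(x, set(s) | {i})
                without_i = value(x, set(s))
                contribution += weight * (with_i - without_i)
        phi.append(contribution)
    return phi

x = (150.0, 3.0, 2.0)            # the house being explained
base_value = value(x, set())     # mean prediction over the background rows
phi = shapley_values(x)

print("base value (dollars):", base_value)
print("contributions (dollars):", phi)
print("base + contributions:", base_value + sum(phi))
print("model output:", predict_price(x))
```

With these made-up numbers the base value is 176,000 dollars and the three contributions come out to about 60,000, 0, and 12,000 dollars, which add up to the model's prediction of 248,000 dollars. Brute-force enumeration only scales to a handful of features, which is why the library relies on specialised algorithms.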
As for the error, if there is a simple example of how you got it, please post it and I'll fix it.
FYI: if you are using a tree model, I would suggest using XGBoost and getting the exact SHAP values rather than using the model-agnostic Kernel SHAP on scikit-learn models.
from shap.
I've used shap and the summary plot for a house list price problem before, which is a regression, and the explanations work just fine and match what I would expect from a logical standpoint. For example, construction area, distance to certain places of interest, and the geographical sector of the house were all top features. I don't have the plot at hand, but a mini app that uses an XGBoost model for house list price prediction (at least in my city) is available in my profile, although it still needs some fixes.
From what I've understood, the Shapley value of each feature plays a role similar to a weight or coefficient in a regression. There's also the bias, or intercept. This bias is the base value for the model's predictions, for example the average price of all houses in the dataset. For a single data point, each value represents the impact of that feature on the final prediction. For a classifier, these values and the base value are added up, then the sigmoid function is applied to the sum. The result of the sigmoid is the prediction the original model gave, which is a probability between 0 and 1. For regression models the process is the same, except that the sigmoid step is skipped, since the output is continuous rather than a probability between 0 and 1.
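As a rough sketch of that arithmetic: the snippet below adds a base value and per-feature contributions, then applies the sigmoid only in the classification case. All numbers and feature names are hypothetical, and it assumes a binary classifier whose contributions are expressed in log odds; it does not call the shap library.

```python
import math

def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Classification: hypothetical base value and contributions in log odds.
base_value_log_odds = -1.2
contributions_log_odds = {"feature_a": 0.9, "feature_b": -0.3, "feature_c": 1.1}

# Sum first, then squash through the sigmoid to get a probability.
margin = base_value_log_odds + sum(contributions_log_odds.values())
probability = sigmoid(margin)

# Regression: hypothetical base value and contributions in dollars.
base_value_dollars = 176_000.0
contributions_dollars = {"area": 60_000.0, "rooms": 0.0, "distance": 12_000.0}

# No sigmoid step; the sum is already the prediction.
prediction_dollars = base_value_dollars + sum(contributions_dollars.values())

print("log-odds margin:", margin)
print("probability:", probability)
print("price prediction (dollars):", prediction_dollars)
```

Note that the contributions add up in log-odds space, not in probability space, so individual values should not be read as direct changes in probability.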
@slundberg can give you better details though, so you should wait for his input.
Thank you @JuanCorp, I think you are right. Even for classification, the log odds need to be computed first in order to find the probability. The syntax I tried follows the classification example:
shap_values = shap.KernelExplainer(randomforest.predict, X_train).shap_values(X_test)
shap.summary_plot(shap_values, X_test)
Is this the same as yours? At least now I can get the SHAP values.
@slundberg Would you mind clarifying what the shap_values mean in regressions? If it is already covered in your paper, please let me know and I can check there. Thank you.
Sorry for asking again. I sometimes get a runtime error when I use a different number of samples in my X_test (sometimes it is fine, but sometimes, for example when I use only 100 samples of the test set, this error occurs):
Exception in thread Thread-15
RuntimeError: Set changed size during iteration
Could you help me with this? Thank you!
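For context on the message itself: Python raises this RuntimeError when a set is modified while something is iterating over it. The sketch below is a generic, single-threaded illustration of that mechanism with hypothetical function names; it is not a diagnosis of what happens inside shap, where the traceback suggests a background thread is involved.

```python
def remove_while_iterating(items):
    """Removes even numbers from the set while looping over it (unsafe)."""
    for item in items:
        if item % 2 == 0:
            items.remove(item)  # changes the set's size mid-iteration

def remove_from_snapshot(items):
    """Loops over a list copy, so the set itself can be changed safely."""
    for item in list(items):
        if item % 2 == 0:
            items.remove(item)

pending = {1, 2, 3, 4}
try:
    remove_while_iterating(pending)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

safe = {1, 2, 3, 4}
remove_from_snapshot(safe)

print("error:", error_message)
print("after safe removal:", safe)
```

When the set is shared between threads, the same error can appear intermittently, because it depends on whether one thread happens to modify the set while another is partway through iterating over it.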
@slundberg Thank you so much! I will definitely try XGBoost to see whether it works for me.
Sounds good.
@slundberg Hi, thanks for the great package! I don't understand how to use my own dataset with shap. What is shap.datasets for, and how can I use my own dataset in the form of (X, y) with SHAP? Thanks :)
Do you have a model and a dataset or just a dataset representing the output of the model? Perhaps clarifying what doesn't make sense about the examples in the README would be helpful.