reiinakano / scikit-plot
An intuitive library to add plotting functionality to scikit-learn objects.
License: MIT License
It would be best to let users select which curves they want to display. For example, some people might not want to display the macro- and micro-averaged curves, or might want to display only a specific class's ROC curve.
This approach could then be extended to precision-recall curves as well.
EDIT: ROC curves now have a curves argument, thanks to @doug-friedman.
When trying to install scikit-plot with pip3 I got this error:
Collecting scikit-plot
Downloading scikit-plot-0.1.dev3.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/setup.py", line 9, in <module>
import scikitplot
File "/private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/scikitplot/__init__.py", line 5, in <module>
from scikitplot.classifiers import classifier_factory
File "/private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/scikitplot/classifiers.py", line 9, in <module>
from sklearn.model_selection import learning_curve
ModuleNotFoundError: No module named 'sklearn'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/
My guess is that sklearn simply isn't available yet when setup.py imports scikitplot, which in turn imports classifiers.py.
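The traceback shows setup.py running `import scikitplot`, which pulls in sklearn before pip has had a chance to install it. One common way around this kind of failure is to read the version string from the source file as text instead of importing the package. The sketch below assumes the package keeps a `__version__ = "..."` line in its `__init__.py`; `read_version` is a hypothetical helper, not something taken from scikit-plot's setup.py.

```python
import re
from pathlib import Path


def read_version(init_path):
    """Return the __version__ string found in a Python source file.

    The file is read as plain text, so none of its imports are executed.
    """
    text = Path(init_path).read_text(encoding="utf-8")
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE
    )
    if match is None:
        raise RuntimeError(f"no __version__ found in {init_path}")
    return match.group(1)
```

A setup.py could then pass `version=read_version("scikitplot/__init__.py")` to `setup()` and list scikit-learn under `install_requires`, so the dependency is resolved by pip rather than needed at build time.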
skplot 0.3.5
matplotlib 2.2.3
skplt.metrics.plot_confusion_matrix(y_test, prediction)
/home/chris/.local/lib/python3.5/site-packages/matplotlib/cbook/deprecation.py:107: MatplotlibDeprecationWarning: Passing one of 'on', 'true', 'off', 'false' as a boolean is deprecated; use an actual boolean (True/False) instead.
warnings.warn(message, mplDeprecation, stacklevel=1)
Hi there,
I am getting an error when installing it:
pip install scikit-plot
Collecting scikit-plot
Downloading scikit-plot-0.2.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\setup.py", line 9, in <module>
import scikitplot
File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\__init__.py", line 5, in <module>
from scikitplot.classifiers import classifier_factory
File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\classifiers.py", line 7, in <module>
from scikitplot import plotters
File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\plotters.py", line 9, in <module>
from sklearn.metrics import confusion_matrix
ImportError: No module named sklearn.metrics
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\
Hello, I have installed scikitplot (version 0.3.1) from pip and get the following warning when I plot a confusion matrix:
C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\cbook\deprecation.py:106: MatplotlibDeprecationWarning: The spectral and spectral_r colormap was deprecated in version 2.0. Use nipy_spectral and nipy_spectral_r instead.
warnings.warn(message, mplDeprecation, stacklevel=1)
Cheers,
Nikos
The current method of appending plotting methods to scikit-learn objects may feel a little restrictive. Work is currently ongoing to develop a "Functions" API where stand-alone functions are exposed for maximum flexibility and compatibility with even non-scikit-learn objects. The current API will then need to be refactored to use the "Functions" API to prevent redundancy.
Add Jupyter notebook examples for metrics.plot_cumulative_gain and metrics.plot_lift_curve.
Noticed most of the tests use unittest.
Is there any interest in porting this over to pytest eventually? This gives several benefits, such as parametrization and monkey patching, whilst maintaining compatibility with unittest and thus allowing for a gradual overhaul.
An added benefit is that we would be more in line with the testing practices used by scikit-learn, which would increase compatibility between the two libraries!
For binary classification, when I pass numpy arrays containing the test labels and test probabilities, it throws the following error:
y_true = np.array(ytest)
y_probas = np.array(p_test)
skplt.metrics.plot_roc_curve(y_true,y_probas)
plt.show()
IndexError Traceback (most recent call last)
<ipython-input-49-1b02f082006a> in <module>()
----> 1 skplt.metrics.plot_roc_curve(y_true,y_probas)
2 plt.show()
/Users/tarun/anaconda/envs/gl-env/lib/python2.7/site-packages/scikitplot/metrics.pyc in plot_roc_curve(y_true, y_probas, title, curves, ax, figsize, cmap, title_fontsize, text_fontsize)
247 roc_auc = dict()
248 for i in range(len(classes)):
--> 249 fpr[i], tpr[i], _ = roc_curve(y_true, probas[:, i],
250 pos_label=classes[i])
251 roc_auc[i] = auc(fpr[i], tpr[i])
IndexError: too many indices for array
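The traceback fails at `probas[:, i]`, which indexes a second dimension. One likely cause (an assumption, since the shape of `p_test` is not shown) is that `p_test` is a 1-D array of positive-class probabilities rather than an array with one column per class. A minimal sketch of reshaping such an array follows; `to_two_column_probas` is a hypothetical helper and the sample values are made up.

```python
import numpy as np


def to_two_column_probas(positive_probas):
    """Turn a 1-D array of P(class 1) into an (n_samples, 2) array.

    Column 0 holds P(class 0) and column 1 holds P(class 1), matching the
    column order of scikit-learn's predict_proba for labels sorted as [0, 1].
    """
    p = np.asarray(positive_probas, dtype=float)
    if p.ndim != 1:
        raise ValueError("expected a 1-D array of positive-class probabilities")
    return np.column_stack([1.0 - p, p])


p_test = [0.9, 0.2, 0.6]            # made-up positive-class probabilities
y_probas = to_two_column_probas(p_test)
print(y_probas.shape)               # (3, 2)
```

If the probabilities come from a scikit-learn classifier, passing the full output of `predict_proba` (without slicing out a column) avoids the reshaping altogether.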
Hi there,
I was wondering if there is a way of defining the numerical precision (number of digits) of values such as roc_auc.
To see what I mean, let me point you to the sklearn API, such as Classification Report, where the digits parameter defines the precision with which the values are presented.
This is especially important, for example, when one is training classifiers that are already at the top, say 99.5%+ accuracy/precision/recall/AUC, and we want to study differences amongst classifiers that are competing at the 0.1% level.
In particular, I noticed that digit precision is not consistent throughout scikit-plot: roc_auc is presented with three-digit precision, while precision_recall is presented with four-digit precision.
As you can imagine, for scientific publication purposes it's a bit inelegant to present bounded metrics with different precision.
Thanks!
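To make the request concrete, here is a minimal sketch of how a `digits` option could drive the legend text. The function name `format_metric_label` and the label wording are hypothetical and are not taken from scikit-plot's source.

```python
def format_metric_label(name, value, digits=3):
    """Build a legend label with a caller-chosen number of decimal places."""
    if digits < 0:
        raise ValueError("digits must be non-negative")
    return f"{name} (area = {value:.{digits}f})"


print(format_metric_label("ROC curve of class 1", 0.99512, digits=4))
# ROC curve of class 1 (area = 0.9951)
```

Using one helper like this for both the ROC and precision-recall legends would keep the precision consistent across plots.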
Hey,
It would be cool if this worked in a virtual environment.
It's generally possible by using a different matplotlib backend, such as 'AGG'. This would only allow saving the plots as image files, though (I think).
The specific error I get when installing through virtualenv is:
RuntimeError: Python is not installed as a framework.
The Mac OS X backend will not be able to function correctly if Python is not installed as a framework.
See the Python documentation for more information on installing Python as a framework on Mac OS X.
Please either reinstall Python as a framework, or try one of the other backends.
If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'.
See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.
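As a workaround sketch for the backend error above: selecting the non-interactive Agg backend before pyplot is imported avoids the Mac OS X backend entirely, at the cost of only being able to save figures to disk. The output file name below is arbitrary.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # select the non-interactive backend before pyplot is imported
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], linestyle="--")
ax.set_title("Saved without a GUI backend")

# Agg cannot open a window, so write the figure to disk instead of calling plt.show().
out_path = os.path.join(tempfile.gettempdir(), "agg_backend_demo.png")
fig.savefig(out_path)
plt.close(fig)
```

The same effect can be had without code changes by setting the `MPLBACKEND=Agg` environment variable, or by setting `backend: Agg` in a matplotlibrc file.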
scikit-plot's so amazing, I decided to make a conda-forge recipe for it! It should be available via conda-forge within a few hours.
I would like to control the color in plot_cumulative_gain, and I think that, as in pandas, this could be done with pass-through arguments.
The following methods return wrong colour ranges for plotting:
skplt.plot_pca_2d_projection(pca, X, y)
plt.show()
Returns a single colour for all classes.
skplt.plot_precision_recall_curve(y_true=y, y_probas=probas)
plt.show()
Returns repeated colours when there are many classes.
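One way to avoid repeated or identical colours, sketched here independently of scikit-plot's internals, is to sample the colormap at evenly spaced positions, one per class. `class_colors` is a hypothetical helper; the choice of `nipy_spectral` as the default is an assumption.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless-safe; not required in an interactive session
import matplotlib.pyplot as plt


def class_colors(n_classes, cmap_name="nipy_spectral"):
    """Return n_classes RGBA colours spread evenly over a colormap."""
    cmap = plt.get_cmap(cmap_name)
    # Sample the full [0, 1] range so that neighbouring classes differ.
    return cmap(np.linspace(0.0, 1.0, n_classes))


colors = class_colors(30)
print(colors.shape)   # (30, 4)
```

Each row can then be passed as the `color` argument when plotting the corresponding class.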
Hello,
I am using cross-validation with a particular metric, Kappa score, rather than the standard accuracy metric.
cross_val_score(clf, x_train, y_train, scoring=kappa_scorer, cv=kf, n_jobs=-1)
I would like the CV done inside the plot_learning_curve method, for each set of train_sizes, to use the Kappa scorer and not the accuracy score. I would also like to use the Kappa scorer to evaluate the model's performance on the training set. Is there any way to set this in the plot_learning_curve method?
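Independently of what plot_learning_curve exposes, scikit-learn's own `learning_curve` accepts a `scoring` argument, and the scorer is applied to both the training and the validation folds. The sketch below uses synthetic data and a logistic regression as stand-ins for the `x_train`, `y_train`, and `clf` in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import StratifiedKFold, learning_curve

# Synthetic data stands in for x_train / y_train from the question.
x_train, y_train = make_classification(n_samples=200, n_features=8, random_state=0)

kappa_scorer = make_scorer(cohen_kappa_score)
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    x_train,
    y_train,
    cv=kf,
    scoring=kappa_scorer,          # used for both training and validation scores
    train_sizes=np.linspace(0.2, 1.0, 5),
)

print(train_scores.mean(axis=1))   # mean training kappa per training-set size
print(test_scores.mean(axis=1))    # mean cross-validated kappa per training-set size
```

The resulting arrays have one row per training-set size and one column per fold, so they can be plotted directly with matplotlib.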
Hi @reiinakano,
Thank you for this great repo! I am using plot_confusion_matrix(), but my counts are quite large, so the overlaid counts end up overlapping each other and result in a cluttered plot. I was wondering if I could submit a pull request that adds a hide_counts parameter to this function, giving the option not to plot the counts? I've already forked and created a branch with the changes. Thank you!
pred = clf.predict_proba(data_test)
skplt.plot_roc_curve(target_test, pred)
plt.show()
Its result is 0.81.
pred_y = clf.predict(data_test)
false_positive_rate, true_positive_rate, thresholds = roc_curve(target_test, pred_y)
roc_auc = auc(false_positive_rate, true_positive_rate)
plt.title('Receiver Operating Characteristic')
plt.plot(false_positive_rate, true_positive_rate, 'b', label='AUC = %0.2f'% roc_auc)
plt.legend(loc='lower right')
plt.plot([0,1],[0,1],'r--')
plt.xlim([0,1.0])
plt.ylim([0,1.0])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
Its result is 0.72.
The problem I encountered is with binary classification, not multi-class classification.
Thank you for your help!
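A likely explanation for the gap is that the first snippet scores the ROC curve with `predict_proba` output while the second uses hard labels from `predict`. Hard labels give a ROC curve with a single threshold point, which generally yields a different AUC. The made-up numbers below illustrate the effect; they are not the data from the question.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and positive-class probabilities, for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.6, 0.4, 0.7, 0.9])

# Hard predictions, as clf.predict would return with a 0.5 threshold.
y_label = (y_score >= 0.5).astype(int)

auc_from_scores = roc_auc_score(y_true, y_score)
auc_from_labels = roc_auc_score(y_true, y_label)

print(auc_from_scores, auc_from_labels)
```

Here the score-based AUC is 8/9 (about 0.89) and the label-based AUC is 2/3 (about 0.67), even though both come from the same model output. For a like-for-like comparison, the manual `roc_curve` call would need the positive-class column of `predict_proba` rather than `predict`.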
@reiinakano Hi, I have just found that scikit-plot is really helpful for those who want to do data analysis or machine learning, and I use it a lot. In my work, I have found that parameter selection could also be plotted for visualization. I have written a new method that plots cross-validation results for a specified parameter. I would like to create a new branch with the added method. Is that OK?
Plots such as plot_roc_curve must be able to take any array-like objects. As of today, they only take numpy arrays as input; otherwise an exception is raised. The conversion to numpy arrays must be done inside the function itself.
Example:
skplt.plot_roc_curve([0, 1], [[0.2, 0.8], [0.8, 0.2]])
does not work while
skplt.plot_roc_curve(np.array([0, 1]), np.array([[0.2, 0.8], [0.8, 0.2]]))
does
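The conversion described above could look roughly like the following at the top of each plotting function. `validate_inputs` is a hypothetical helper written for this illustration; it is not part of scikit-plot.

```python
import numpy as np


def validate_inputs(y_true, y_probas):
    """Convert array-like inputs to NumPy arrays and check that they line up."""
    y_true = np.asarray(y_true)
    y_probas = np.asarray(y_probas)
    if y_probas.ndim != 2:
        raise ValueError("y_probas must be 2-D: (n_samples, n_classes)")
    if y_true.shape[0] != y_probas.shape[0]:
        raise ValueError("y_true and y_probas must have the same number of samples")
    return y_true, y_probas


y_true, y_probas = validate_inputs([0, 1], [[0.2, 0.8], [0.8, 0.2]])
print(type(y_true).__name__, y_probas.shape)   # ndarray (2, 2)
```

`np.asarray` leaves existing arrays untouched, so callers who already pass numpy arrays pay no copying cost.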
I'm concerned about how long it takes, given that the model must be trained within the wrapper for the Factory API. The plot_learning_curve method has an n_jobs parameter, but plot_precision_recall_curve doesn't seem to. Am I missing something?
As pointed out by @frankherfert, when Seaborn is used (import seaborn) with scikit-plot, the confusion matrix tends to have weird white grid lines. Suggestions on how to get rid of this (especially from experienced Seaborn users) without adding a Seaborn dependency would be much appreciated.
Hi,
I think this project is a great idea, are you still working on it?
Dear Reiinakano,
first thank you for your really helpful scikit-plot package. I use it a lot.
I have just written a module with a function that creates an all-in-one gain + lift + probability plot for all classes, which would fit into your package well.
So feel free to integrate my code into scikit-plot. I would be very pleased.
You can have a look under:
https://sourceforge.net/projects/gains-chart
Best regards, Erich
http://scikit-plot.readthedocs.io/en/stable/apidocs.html#classifier-plots
in plot_confusion_matrix paragraph, the code is
rf = classifier_factory(RandomForestClassifier())
rf.plot_learning_curve(X, y, normalize=True)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
plt.show()
it should be
rf = classifier_factory(RandomForestClassifier())
rf.**plot_confusion_matrix**(X, y, normalize=True)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
plt.show()
Hello, I have a precision-recall curve which I plot as follows:
skplt.metrics.plot_precision_recall_curve(y_test, y_probas, curves=['each_class'])
I have two classes in the data (one positive and one negative class, with labels 1 and -1 respectively). Question: how can I plot ONLY the positive class?
Thank you
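One route that does not depend on scikit-plot's options is to compute the curve for the positive class with scikit-learn and plot it directly. The sketch below uses made-up data and assumes `y_probas` has one column per class in sorted label order, as `predict_proba` returns.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Made-up data: labels are -1 / 1 and y_probas has one column per class.
y_test = np.array([-1, -1, 1, 1, -1, 1])
y_probas = np.array([
    [0.90, 0.10],
    [0.60, 0.40],
    [0.30, 0.70],
    [0.20, 0.80],
    [0.55, 0.45],
    [0.65, 0.35],
])

classes = np.unique(y_test)                       # sorted labels: [-1, 1]
pos_col = int(np.where(classes == 1)[0][0])       # column holding P(class 1)

precision, recall, thresholds = precision_recall_curve(
    y_test, y_probas[:, pos_col], pos_label=1
)
```

The resulting `recall` and `precision` arrays can be drawn with `plt.plot(recall, precision)` to show the positive class alone.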
The plot_pca_2d_projection method plots scores and colors them by target. Biplots usually plot the scores and vectors with different scalings (scaling 1: distance, and scaling 2: correlation). Biplots could be included in plot_pca_2d_projection or added as a separate method. The target argument would preferably be optional. The ecopy library includes a nice biplot interface.
After
sudo pip install scikit-plot
I get
RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.
None of the other libraries I know and use requires this setup.
I mean a different kind of lift chart than the "lift chart" in the cumulative gain curve and lift curve.
Here is a sample:
https://medium.com/@inlinecoder/disrupting-the-entrance-point-to-a-predictive-data-analytics-12676aa91a8d
https://cran.r-project.org/web/packages/datarobot/vignettes/AdvancedVignette.html
I think this lift chart is quite common in machine learning and data science industry.
I wrote one for binary classification but not sure if it can be extended to multiclass.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plotLiftChart(actual, predicted):
    # Collect actual targets and predicted scores in one frame.
    df_dict = {'actual': list(actual), 'pred': list(predicted)}
    df = pd.DataFrame(df_dict)
    # Bucket rows into 100 equal-sized bins by predicted score (needs at least 100 rows).
    pred_ranks = pd.qcut(df['pred'].rank(method='first'), 100, labels=False)
    # Mean prediction and mean actual value within each prediction bin.
    pred_percentiles = df.groupby(pred_ranks).mean()
    plt.title('Lift Chart')
    plt.plot(np.arange(.01, 1.01, .01), np.array(pred_percentiles['pred']),
             color='darkorange', lw=2, label='Prediction')
    plt.plot(np.arange(.01, 1.01, .01), np.array(pred_percentiles['actual']),
             color='navy', lw=2, linestyle='--', label='Actual')
    plt.ylabel('Target Percentile')
    plt.xlabel('Population Percentile')
    plt.xlim([0.0, 1.0])
    plt.ylim([-0.05, 1.05])
    plt.legend(loc="best")
In preparation for v0.3.0, the Factory API will be deprecated in favor of the more flexible Functions API.
If anybody has any reason to keep the Factory API, best state it here. :)
Oh, and if anybody has suggestions for what should be included in v0.3.0, please do say here.
With v0.3.0, the Jupyter notebook examples are now outdated. It should be trivial to change the examples to the v0.3.0 format.
I think it could be useful, when one wants to plot only e.g. class 1, to have an option to produce consistent plots for both plot_cumulative_gain and plot_roc.
At the moment, only plot_roc supports such an option.
Thanks a lot
When I use the updated version, I get this error. It also seems that some of the sample code needs updating.
Thank you for the Keras example in README: scikit-plot looks like a very elegant solution for plotting ML / DL curves.
I was wondering if you have tried plotting PyTorch optimizers with scikit-plot? If you have an example of PyTorch it will help me out a lot.
Thank you again for an awesome library!
Several plots in the metrics package trigger the matplotlib deprecation warning as they use the 'spectral' colormap by default.
skplt.metrics.plot_roc_curve(y_train, y_probas)
MatplotlibDeprecationWarning: The spectral and spectral_r colormap was deprecated in version 2.0.
Use nipy_spectral and nipy_spectral_r instead.
Needs to be changed to 'nipy_spectral'
Hello,
I have an issue when trying to plot a confusion matrix with fewer classes in my test set than in training.
The class with 12,000+ occurrences in my sample should be labelled 'O'.
Is it possible to get around this, or to include the label set manually as an input?
It's not a big issue, but it would be nice if we could fix it.
Thanks for your help
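For reference, scikit-learn's `confusion_matrix` takes a `labels` argument that fixes both the set and the order of classes, including classes that never occur in the test data. The label names and predictions below are made up for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical label set seen during training; "B" never occurs in the test data.
all_labels = ["A", "B", "O"]
y_test = ["O", "O", "A", "O", "A"]
y_pred = ["O", "A", "A", "O", "O"]

cm = confusion_matrix(y_test, y_pred, labels=all_labels)
print(cm.shape)   # (3, 3)
```

The matrix keeps an all-zero row and column for the missing class, so the axis ticks stay aligned with the full label set.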
Dear sir
Thanks for your excellent work on scikit-plot. I am confused by the precision_recall_curve function. As we all know, the PR curve goes through two points: (0,1) and (1,0). But when I used the test code to draw the curve, I found that it does not go through the point (1,0). I used the code below to get the curve.
import scikitplot as skplt
rf = GaussianNB ()
rf = rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)
plt.show()
Hi, I tried to pip install scikit-plot but got the following error message:
Command "python setup.py egg_info" failed with error code 1.
Any suggestion?
Thanks very much.
Elena
I'm trying to plot the ROC curve, but I get ValueError: Found input variables with inconsistent numbers of samples.
Here's the code I use:
skplt.metrics.plot_roc(labels_test.values, pred_w2v_cnn.values)
plt.show()
Both labels_test.values and pred_w2v_cnn.values have the same length and both are of type np.ndarray. I'd be thankful if anyone can help me to solve this problem.
When you create a confusion matrix plot for a classification problem that has a lot of categories with long names, the names for the "Predicted label" axis can overlap, causing the axis to become unreadable. Even using large dimensions for the figsize parameter isn't enough in some cases.
Here is an example.
Perhaps there could be a new optional parameter added to allow the labels on the "Predicted label" axis to be rotated 90 degrees in order to make them easier to read.
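Until such a parameter exists, the tick labels can be rotated on the Axes object after plotting. The sketch below draws a stand-in matrix with plain matplotlib; the labels and counts are made up, and applying the same `plt.setp` call to the Axes returned by plot_confusion_matrix is an assumption about how one would use it.

```python
import matplotlib

matplotlib.use("Agg")  # headless-safe; not required in an interactive session
import matplotlib.pyplot as plt
import numpy as np

labels = ["a fairly long category name", "another long category name", "third category"]
matrix = np.array([[5, 1, 0], [2, 3, 1], [0, 0, 4]])   # made-up counts

fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(matrix, cmap="Blues")
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")

# Rotate only the x-axis ("Predicted label") tick labels.
plt.setp(ax.get_xticklabels(), rotation=90)
fig.tight_layout()
```

Calling `fig.tight_layout()` afterwards gives the rotated labels enough room at the bottom of the figure.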
As discussed in #11 and pointed out by @frankherfert, plots generated by scikit-plot are quite small in Jupyter notebooks. Adding figsize, title_fontsize, and text_fontsize parameters will let users adjust the size of the plot to their preferences.
figsize should accept a 2-tuple, default to None, and be passed to the plt.subplots function during figure creation. title_fontsize should default to "large" and text_fontsize to "medium".
Hello, I want to plot a precision-recall curve for SVC (support vector machine classifier), but the scikit-learn SVM classifier does not implement a predict_proba method by default. How can I do that in scikit-plot (as far as I can see in the documentation, it accepts prediction probabilities to plot the curve)?
Note that the scikit-learn documentation page has an example of precision-recall curve for SVC
Thank you,
Nikos
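For context, this is roughly how a precision-recall curve can be computed for an SVC with scikit-learn alone: `decision_function` provides ranking scores, which is all `precision_recall_curve` needs. The data is synthetic and the linear kernel is an arbitrary choice for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

clf = SVC(kernel="linear").fit(X_train, y_train)

# decision_function returns a signed distance to the separating hyperplane;
# precision_recall_curve only needs scores that rank the samples.
scores = clf.decision_function(X_test)
precision, recall, thresholds = precision_recall_curve(y_test, scores)
```

If probabilities are required, as they are for scikit-plot's probability-based plots, scikit-learn has long documented `SVC(probability=True)`, which enables `predict_proba` at the cost of slower training; whether that option is still recommended in the installed scikit-learn version is worth checking.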
Hello,
Thanks so much for your great work on scikit-plot. I've found it quite useful in my ML workflows.
I'm wondering: I work with imbalanced datasets pretty frequently, so it's important for me to be able to stratify my train/test splits. When I use the classifier factory to generate plots directly from the classifier object, I don't see any options to stratify my data (e.g. in the plot_confusion_matrix function). How can I accomplish this?
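One option that sidesteps the factory is to do the split yourself with scikit-learn, where `stratify=y` keeps the class proportions equal in both parts, and then pass the test labels and predictions to a function such as skplt.metrics.plot_confusion_matrix. The sketch below only shows the split, on synthetic imbalanced data.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 90% of samples in class 0.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(Counter(y_train), Counter(y_test))
```

With the split under your control, the model can be fitted on `X_train` and the plot drawn from `y_test` and the model's predictions on `X_test`.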
I've just tried to upgrade the package, but it gave the following error:
Collecting scikit-plot
Using cached scikit-plot-0.2.6.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-7wut1485/scikit-plot/setup.py", line 9, in <module>
import scikitplot
File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/__init__.py", line 5, in <module>
from scikitplot.classifiers import classifier_factory
File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/classifiers.py", line 5, in <module>
import matplotlib.pyplot as plt
File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/pyplot.py", line 115, in <module>
_backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup
globals(),locals(),[backend_name],0)
File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/backend_tkagg.py", line 6, in <module>
from six.moves import tkinter as Tk
File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 92, in __get__
result = self._resolve()
File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 115, in _resolve
return _import_module(self.mod)
File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 82, in _import_module
__import__(name)
File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/lib/python3.6/tkinter/__init__.py", line 36, in <module>
import _tkinter # If this fails your Python may not be configured for Tk
ModuleNotFoundError: No module named '_tkinter'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-7wut1485/scikit-plot/
Unfortunately, I don't know how to debug this problem. If you need some info, please don't hesitate to ask!
It would be nice to have Jupyter notebooks in the "examples" folder showing the different plots as used in a Jupyter notebook. It could contain the same exact code as the examples in the .py files, but adjusted for size (Jupyter notebook plots tend to come out much smaller).
Hi Team!
I would like to know if you have any plan of adding new functions to plot Gain Charts and Lift Charts since they are popular in data science projects.
Thank you!
Hi. I'm a beginner in Python and I got this error when I tried to run code that uses scikitplot on Windows 7. I'm using Spyder to do it. I used this command: "import scikitplot as skplt".
I checked whether scikitplot is installed, and it is installed on my computer. I ran this code on Ubuntu last year, but I need to run it again on Windows 7 and it is not working on this OS. How can I solve this issue, please?
While plotting the ROC curve I'm getting this error. Please help.
I am facing an IndexError but I don't know why, as my train and test sets are in the shape required by cross_val_score, yet I'm still getting this error. Any suggestions?