reiinakano / scikit-plot

An intuitive library to add plotting functionality to scikit-learn objects.

License: MIT License

Python 100.00%

scikit-learn visualization machine-learning data-science plotting plot

scikit-plot's Issues

Customization of Precision Recall curves

It would be useful to let users select which curves they want to display. For example, some people might not want to display the macro- and micro-averaged curves, or might want to display only a specific class's ROC curve.

This approach could then be extended to Precision Recall curves as well.

EDIT: ROC curves now have a curves argument thanks to @doug-friedman

The true label is displayed abnormally

Dear developer,
When I use your function metrics.plot_confusion_matrix I obtain the following result. The true label looks a little ugly. How can I fix it?
Thanks

Likely an import error?

When trying to install scikit-plot with pip3 I got this error:

Collecting scikit-plot
  Downloading scikit-plot-0.1.dev3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/setup.py", line 9, in <module>
        import scikitplot
      File "/private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/scikitplot/__init__.py", line 5, in <module>
        from scikitplot.classifiers import classifier_factory
      File "/private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/scikitplot/classifiers.py", line 9, in <module>
        from sklearn.model_selection import learning_curve
    ModuleNotFoundError: No module named 'sklearn'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/9g/pv05gc7d5zvb2n_pr_0tmn8r0000gn/T/pip-build-6c8p8iqs/scikit-plot/

My guess is that classifiers.py just doesn't import sklearn.

MatplotlibDeprecationWarning: Passing one of 'on', 'true', 'off', 'false' as a boolean is deprecated

skplot 0.3.5
matplotlib 2.2.3

skplt.metrics.plot_confusion_matrix(y_test, prediction)

/home/chris/.local/lib/python3.5/site-packages/matplotlib/cbook/deprecation.py:107: MatplotlibDeprecationWarning: Passing one of 'on', 'true', 'off', 'false' as a boolean is deprecated; use an actual boolean (True/False) instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)

Error installing: No module named sklearn.metrics

Hi there,
I am getting an error when installing it:

pip install scikit-plot
Collecting scikit-plot
  Downloading scikit-plot-0.2.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\setup.py", line 9, in <module>
        import scikitplot
      File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\__init__.py", line 5, in <module>
        from scikitplot.classifiers import classifier_factory
      File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\classifiers.py", line 7, in <module>
        from scikitplot import plotters
      File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\plotters.py", line 9, in <module>
        from sklearn.metrics import confusion_matrix
    ImportError: No module named sklearn.metrics

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\

matplotlib deprecation warning

Hello, I have installed scikitplot (version 0.3.1) from pip and get the following warning when I plot a confusion matrix:

C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\cbook\deprecation.py:106: MatplotlibDeprecationWarning: The spectral and spectral_r colormap was deprecated in version 2.0. Use nipy_spectral and nipy_spectral_r instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)

Cheers,
Nikos

Develop "Functions" API

The current method of appending plotting methods to scikit-learn objects may feel a little restrictive. Work is currently ongoing to develop a "Functions" API where stand-alone functions are exposed for maximum flexibility and compatibility with even non-scikit-learn objects. The current API will then need to be refactored to use the "Functions" API to prevent redundancy.

Add Jupyter notebook examples for plot_cumulative_gain and plot_lift_curve

Add Jupyter notebook examples for metrics.plot_cumulative_gain and metrics.plot_lift_curve

scikit-plot should decorate objects instead of adding new methods

Any interest in moving towards pytest testing framework?

I noticed that most of the tests use unittest.
Is there any interest in porting them over to pytest eventually? This gives several benefits, such as parametrization and monkey patching, while maintaining compatibility with unittest and thus allowing for a gradual overhaul.
An added benefit is that we would be more in line with the testing practices used by scikit-learn, which would increase compatibility between the two libraries!

Throws error "IndexError: too many indices for array" when trying to plot roc for binary classification

For binary classification, when I input numpy arrays holding the test labels and test probabilities, it throws the following error:


y_true = np.array(ytest)
y_probas = np.array(p_test)
skplt.metrics.plot_roc_curve(y_true,y_probas)
plt.show()

IndexError                                Traceback (most recent call last)
<ipython-input-49-1b02f082006a> in <module>()
----> 1 skplt.metrics.plot_roc_curve(y_true,y_probas)
      2 plt.show()


/Users/tarun/anaconda/envs/gl-env/lib/python2.7/site-packages/scikitplot/metrics.pyc in plot_roc_curve(y_true, y_probas, title, curves, ax, figsize, cmap, title_fontsize, text_fontsize)
    247     roc_auc = dict()
    248     for i in range(len(classes)):
--> 249         fpr[i], tpr[i], _ = roc_curve(y_true, probas[:, i],
    250                                       pos_label=classes[i])
    251         roc_auc[i] = auc(fpr[i], tpr[i])

IndexError: too many indices for array
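The traceback fails on `probas[:, i]`, which suggests that `y_probas` was one-dimensional (positive-class probabilities only) while the function indexes one column per class. A minimal sketch of a workaround, assuming that is the cause; `as_two_column_probas` is a hypothetical helper, not part of scikit-plot, and the values are made up:

```python
import numpy as np


def as_two_column_probas(p_positive):
    """Return an (n_samples, 2) array from positive-class probabilities.

    Hypothetical helper for illustration; not part of scikit-plot.
    """
    p = np.asarray(p_positive, dtype=float)
    if p.ndim == 2:
        return p  # already one column per class
    if p.ndim != 1:
        raise ValueError("expected a 1-D or 2-D array, got %d-D" % p.ndim)
    # Column 0: probability of the negative class; column 1: positive class.
    return np.column_stack([1.0 - p, p])


p_test = [0.1, 0.8, 0.65]  # made-up positive-class probabilities
y_probas = as_two_column_probas(p_test)
print(y_probas.shape)  # (3, 2)
```

The resulting array has the same layout as the output of a scikit-learn classifier's `predict_proba` for a binary problem.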

Add numerical digit precision parameter

Hi there,

I was wondering if there is a way of defining the digit numerical precision of values such as roc_auc.

To see what I mean, consider the sklearn API for the classification report, where the digits parameter defines the precision with which the values are presented.

This is especially important, for example, when one is training classifiers that are already at the top, say, above 99.5% accuracy/precision/recall/AUC, and we want to study differences amongst classifiers that are competing at the 0.1% level.

In particular, I noticed that digit precision is not consistent throughout scikit-plot: roc_auc is presented with three digits of precision, while precision_recall is presented with four.

As you can imagine, for scientific publication purposes it's a bit inelegant to present bounded metrics with different precision.

Thanks!
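To illustrate the request, here is a minimal sketch of how a `digits` option could drive the legend text. The function name `format_area_label` and the label wording are assumptions for illustration, not scikit-plot's actual code:

```python
def format_area_label(name, value, digits=3):
    """Build a legend label with a configurable number of decimals.

    Hypothetical helper; `digits` mirrors the parameter of the same name
    in scikit-learn's classification_report.
    """
    return "{} (area = {})".format(name, format(value, ".{}f".format(digits)))


# Two made-up AUC values that differ at the 0.1% level.
print(format_area_label("model A", 0.99512, digits=2))  # model A (area = 1.00)
print(format_area_label("model B", 0.99587, digits=2))  # model B (area = 1.00)
print(format_area_label("model A", 0.99512, digits=4))  # model A (area = 0.9951)
print(format_area_label("model B", 0.99587, digits=4))  # model B (area = 0.9959)
```

With two digits both models look identical; with four the difference becomes visible.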

Not working in virtualenv

Hey,

It would be cool if this worked in a virtual environment.
It's generally possible by using a different matplotlib backend, such as 'AGG'. This would only allow saving the plots as figure files, though (I think).

The specific error I get when installing through virtualenv is:

RuntimeError: Python is not installed as a framework. 
The Mac OS X backend will not be able to   function correctly if Python is not installed as a framework.
See the Python documentation for more information on installing Python as a framework on Mac OS X.
Please either reinstall Python as a framework, or try one of the other backends.
If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. 
See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.
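The error message suggests trying another backend. A minimal sketch of that workaround, assuming that saving figures to files is acceptable: matplotlib reads the `MPLBACKEND` environment variable, so it can be set before `matplotlib.pyplot` is first imported. The helper name `save_diagonal_plot` is hypothetical.

```python
import os

# Select the non-interactive Agg backend before matplotlib.pyplot is imported.
# setdefault keeps any backend that has already been chosen.
os.environ.setdefault("MPLBACKEND", "Agg")


def save_diagonal_plot(path):
    """Draw a trivial line plot and write it to `path` (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported late so the backend choice applies

    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    fig.savefig(path)  # Agg renders to a file; it cannot open a window
    plt.close(fig)
```

Calling `matplotlib.use("Agg")` before importing `pyplot` is an equivalent in-code alternative.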

conda-forge build!

scikit-plot's so amazing, I decided to make a conda-forge recipe for it! It should be available via conda-forge within a few hours.

add more argument pass through from plot_cumulative_gain to matplotlib

I would like to control the color in plot_cumulative_gain and think that, as in pandas, this could be done with pass-through arguments.

Problems with colours

The following methods return wrong colour ranges for plotting:

skplt.plot_pca_2d_projection(pca, X, y)
plt.show()

Returns a single colour for all classes.

skplt.plot_precision_recall_curve(y_true=y, y_probas=probas)
plt.show()

Returns repeated colours when there are many classes.

Custom Scorer for CV inside plot_learning_curve

Hello,

I am using cross-validation with a particular metric, Kappa score, rather than the standard accuracy metric.

cross_val_score(clf, x_train, y_train, scoring=kappa_scorer, cv=kf, n_jobs=-1)

I would like the CV done inside the plot_learning_curve method for each set of train_sizes to use the Kappa scorer and not the accuracy score. I would also like to use the Kappa scorer to evaluate the model's performance on the training set. Is there any way to set this in the plot_learning_curve method?

Adding a parameter to plot_confusion_matrix() to hide overlaid counts

Hi @reiinakano,

Thank you for this great repo! I am using plot_confusion_matrix() but my counts are quite large so the overlaid counts end up overlapping each other and result in a cluttered plot. I was wondering if I could submit a pull request to update this function to add a hide_counts parameter to give the option to not plot the counts? I've already forked and created a branch with the changes. Thank you!

Why is the AUC calculated by plot_roc_curve different from the one I calculate manually?

Calculated using plot_roc_curve:

pred = clf.predict_proba(data_test)
skplt.plot_roc_curve(target_test, pred)
plt.show()

Its result is 0.81.

Calculated manually:

pred_y = clf.predict(data_test)
false_positive_rate, true_positive_rate, thresholds = roc_curve(target_test, pred_y)
roc_auc = auc(false_positive_rate, true_positive_rate)

plt.title('Receiver Operating Characteristic')
plt.plot(false_positive_rate, true_positive_rate, 'b', label='AUC = %0.2f'% roc_auc)
plt.legend(loc='lower right')
plt.plot([0,1],[0,1],'r--')
plt.xlim([0,1.0])
plt.ylim([0,1.0])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

Its result is 0.72.

My problem is binary classification, not multi-class classification.

Thank you for your help!
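One difference between the two snippets is that the first passes the output of `predict_proba` while the second passes the hard labels from `predict` to `roc_curve`. Hard labels collapse the ROC curve to a single threshold, so the two areas generally differ. A small pure-Python sketch with made-up data; the helper `rank_auc` is hypothetical and uses the standard equivalence between ROC AUC and the probability that a random positive is scored above a random negative:

```python
def rank_auc(y_true, scores):
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5.

    Hypothetical helper for illustration, binary labels 0/1 only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 0, 1]                      # made-up labels
proba_pos = [0.1, 0.4, 0.35, 0.8, 0.45, 0.7]     # made-up P(class 1)
hard_labels = [1 if p >= 0.5 else 0 for p in proba_pos]

auc_from_probas = rank_auc(y_true, proba_pos)
auc_from_labels = rank_auc(y_true, hard_labels)
print(round(auc_from_probas, 3))  # 0.778
print(round(auc_from_labels, 3))  # 0.833
```

To compare like with like, the manual calculation would need the positive-class column of `predict_proba` rather than the output of `predict`.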

Add cross-validation curve for a specified parameter over param_range

@reiinakano Hi, I have found that scikit-plot is really helpful for those who want to do data analysis or machine learning, and I use it a lot. While working with it, I found that parameter selection could also be plotted for visualization. I have written a new method that plots the cross-validation curve for a specified parameter. I would like to create a new branch with the added method. Is that OK?

Make plotting functions work with array-like inputs

Plots such as plot_roc_curve must be able to take any array-like objects. As of today, they only take numpy arrays as input, otherwise an exception is raised. This numpy array conversion must be done inside the function itself.

Example:

skplt.plot_roc_curve([0, 1], [[0.2, 0.8], [0.8, 0.2]])

does not work while

skplt.plot_roc_curve(np.array([0, 1]), np.array([[0.2, 0.8], [0.8, 0.2]]))

does
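A minimal sketch of the conversion described above, done at the top of a plotting function. The name `check_inputs` and the exact checks are assumptions for illustration, not scikit-plot's implementation:

```python
import numpy as np


def check_inputs(y_true, y_probas):
    """Convert array-likes to numpy arrays and validate their shapes.

    Hypothetical helper; a plotting function could call it first.
    """
    y_true = np.asarray(y_true)
    y_probas = np.asarray(y_probas, dtype=float)
    if y_probas.ndim != 2:
        raise ValueError("y_probas must have shape (n_samples, n_classes)")
    if y_true.shape[0] != y_probas.shape[0]:
        raise ValueError("y_true and y_probas have different numbers of samples")
    return y_true, y_probas


# The plain-list input from the example is accepted after conversion.
y_true, y_probas = check_inputs([0, 1], [[0.2, 0.8], [0.8, 0.2]])
print(y_probas.shape)  # (2, 2)
```

`np.asarray` leaves existing numpy arrays untouched, so callers who already pass arrays pay no copy cost.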

n_jobs plot_learning_curve

I'm concerned about how long it takes, given that the model must be trained within the wrapper for the Factory API. The plot_learning_curve method has an n_jobs parameter, but plot_precision_recall_curve doesn't seem to. Am I missing something?

`plot_confusion_matrix` has weird white grid lines with Seaborn

As pointed out by @frankherfert, when Seaborn is used (import seaborn) with scikit-plot, the confusion matrix tends to have weird white grid lines. Suggestions on how to get rid of this (especially from experienced Seaborn users) without adding a Seaborn dependency would be much appreciated.

Is this project still maintained?

Hi,

I think this project is a great idea, are you still working on it?

Code to integrate

Dear Reiinakano,

First, thank you for your really helpful scikit-plot package. I use it a lot.
I have just written a module with a function that creates an all-in-one gain + lift + probability plot for all classes,
which would fit into your package well.
So feel free to integrate my code into scikit-plot. I would be very pleased.
You can have a look under:
https://sourceforge.net/projects/gains-chart

Best regards, Erich

Small error in the official documentation

http://scikit-plot.readthedocs.io/en/stable/apidocs.html#classifier-plots

In the plot_confusion_matrix paragraph, the code is

rf = classifier_factory(RandomForestClassifier())
rf.plot_learning_curve(X, y, normalize=True)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
plt.show()

It should be

rf = classifier_factory(RandomForestClassifier())
rf.plot_confusion_matrix(X, y, normalize=True)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
plt.show()

Plot ONLY one class

Hello, I have a precision-recall curve that I plot as follows:

skplt.metrics.plot_precision_recall_curve(y_test, y_probas, curves=['each_class'])

I have two classes in the data (one positive and one negative class, with labels 1 and -1 respectively). Question: how can I plot ONLY the positive class?

Thank you
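One possible workaround is to compute the curve for the positive class directly and plot only those points. The sketch below does this in plain Python for labels 1 and -1; `precision_recall_points` is a hypothetical helper and the data are made up. In practice scikit-learn's `precision_recall_curve` with `pos_label=1` serves the same purpose.

```python
def precision_recall_points(y_true, scores, pos_label=1):
    """Return (recall, precision) pairs, one per distinct score threshold.

    Thresholds are visited from the highest score to the lowest.
    Hypothetical helper for illustration only.
    """
    n_pos = sum(1 for y in y_true if y == pos_label)
    if n_pos == 0:
        raise ValueError("no samples with the positive label")
    pairs = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = []
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == pos_label:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


y_test = [1, -1, 1, -1]          # made-up labels
scores = [0.9, 0.8, 0.6, 0.2]    # made-up scores for the positive class
points = precision_recall_points(y_test, scores, pos_label=1)
print(points[0])   # (0.5, 1.0)
print(points[-1])  # (1.0, 0.5)
```

At the lowest threshold every sample is predicted positive, so recall is 1 and precision equals the share of positives in the data.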

PCA: biplot

The plot_pca_2d_projection method plots scores and colors them by target. Biplots usually plot the scores and vectors with different scalings (scaling 1: distance, and scaling 2: correlation). Biplots could be included in plot_pca_2d_projection or added as a separate method. The target argument would preferably be optional. The ecopy library includes a nice biplot interface.

Problems with installing on Anaconda (OS X)

After

sudo pip install scikit-plot

I get

RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.

None of the other libraries I know and use require this setup.

Create a function for plotting Lift Chart?

I mean a different lift chart for classification than the "lift chart" in the cumulative gain curve and lift curve.

Here are some samples:
https://medium.com/@inlinecoder/disrupting-the-entrance-point-to-a-predictive-data-analytics-12676aa91a8d
https://cran.r-project.org/web/packages/datarobot/vignettes/AdvancedVignette.html

I think this lift chart is quite common in machine learning and data science industry.

I wrote one for binary classification but not sure if it can be extended to multiclass.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plotLiftChart(actual, predicted):
    df_dict = {'actual': list(actual), 'pred': list(predicted)}
    df = pd.DataFrame(df_dict)
    # Assign each observation to one of 100 equal-sized bins by rank.
    pred_ranks = pd.qcut(df['pred'].rank(method='first'), 100, labels=False)
    actual_ranks = pd.qcut(df['actual'].rank(method='first'), 100, labels=False)
    # Mean of 'actual' and 'pred' within each bin.
    pred_percentiles = df.groupby(pred_ranks).mean()
    actual_percentiles = df.groupby(actual_ranks).mean()
    plt.title('Lift Chart')
    # Both lines use the bins defined by the predicted ranks.
    plt.plot(np.arange(.01, 1.01, .01), np.array(pred_percentiles['pred']),
             color='darkorange', lw=2, label='Prediction')
    plt.plot(np.arange(.01, 1.01, .01), np.array(pred_percentiles['actual']),
             color='navy', lw=2, linestyle='--', label='Actual')
    plt.ylabel('Target Percentile')
    plt.xlabel('Population Percentile')
    plt.xlim([0.0, 1.0])
    plt.ylim([-0.05, 1.05])
    plt.legend(loc="best")

Deprecation of Factory API + Suggestions for v0.3.0

In preparation for v0.3.0, the Factory API will be deprecated in favor of the more flexible Functions API.

If anybody has any reason to keep the Factory API, best state it here. :)

Oh, and if anybody has suggestions for what should be included in v0.3.0, please do say here.

Update Jupyter notebook examples

With v0.3.0, the Jupyter notebook examples are now outdated. It should be trivial to change the examples to the v0.3.0 format.

add classes_to_plot option to plot_cumulative_gain

I think it could be useful, when one wants to plot only e.g. class 1, to have an option to produce consistent plots for both plot_cumulative_gain and plot_roc.

At the moment, only plot_roc supports such an option.

Thanks a lot

AttributeError: module 'scikitplot' has no attribute 'metrics'

When I use the updated version I get this error. It also seems that some of the sample code needs updating.

Example Request: PyTorch Precision Recall

Thank you for the Keras example in README: scikit-plot looks like a very elegant solution for plotting ML / DL curves.

I was wondering if you have tried plotting PyTorch optimizers with scikit-plot? If you have an example of PyTorch it will help me out a lot.

Thank you again for an awesome library!

Deprecation of 'spectral' colormap in matplotlib

Several plots in the metrics package trigger the matplotlib deprecation warning as they use the 'spectral' colormap by default.

skplt.metrics.plot_roc_curve(y_train, y_probas)

MatplotlibDeprecationWarning: The spectral and spectral_r colormap was deprecated in version 2.0. 
Use nipy_spectral and nipy_spectral_r instead.

Needs to be changed to 'nipy_spectral'

Class mismatch in skplt.plot_confusion_matrix when test has fewer classes than training

Hello,
I have an issue when trying to plot a confusion matrix with fewer classes in my test set than in training.
The class with 12,000+ occurrences in my sample should be labelled 'O'.
Is it possible to get around this, or to include the label set manually as an input?

It's not a big issue, but it would be nice if we could fix it.
Thanks for your help
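Passing the full label set explicitly is the usual way around this: scikit-learn's `confusion_matrix` accepts a `labels` argument for that purpose, and it may be worth checking whether the installed scikit-plot version exposes a similar option. The plain-Python sketch below shows the idea; `confusion_counts` is a hypothetical helper and the class names other than 'O' are made up:

```python
def confusion_counts(y_true, y_pred, labels):
    """Count (true, predicted) pairs over a fixed, explicit label order.

    Labels absent from the test data still get a row and a column of zeros.
    Hypothetical helper for illustration only.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


all_labels = ["B-LOC", "I-LOC", "O"]   # label set known from training
y_true = ["O", "O", "B-LOC"]           # "I-LOC" never occurs in this test set
y_pred = ["O", "B-LOC", "B-LOC"]

for label, row in zip(all_labels, confusion_counts(y_true, y_pred, all_labels)):
    print(label, row)
```

Because rows and columns follow `all_labels`, every class keeps its intended tick label even when it is missing from the test set.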

Is there a problem in precision_recall_curve?

Dear sir,
Thanks for your excellent work on scikit-plot. I am confused by the precision_recall_curve function. As we all know, the PR curve goes through two points: (0,1) and (1,0). But when I used the test code to draw the curve, I found that it does not go through the point (1,0). I used the code below to get the curve.

import matplotlib.pyplot as plt
import scikitplot as skplt
from sklearn.naive_bayes import GaussianNB

rf = GaussianNB()
rf = rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)
plt.show()

installation error

Hi, I tried to pip install scikit-plot but got the following error message:
Command "python setup.py egg_info" failed with error code 1.

Any suggestion?

Thanks very much.

Elena

ValueError: Found input variables with inconsistent numbers of samples

I'm trying to plot the ROC curve, but I get ValueError: Found input variables with inconsistent numbers of samples.
Here's the code I use:

skplt.metrics.plot_roc(labels_test.values, pred_w2v_cnn.values)
plt.show()

Both labels_test.values and pred_w2v_cnn.values have the same length and both are of type np.ndarray. I'd be thankful if anyone can help me to solve this problem.

Confusion Matrix plots are confusing with long categories

When you create a confusion matrix plot for a classification problem that has a lot of categories with long names, the names for the "Predicted label" axis can overlap, causing the axis to become unreadable. Even using large dimensions for the figsize parameter isn't enough in some cases.

Here is an example.

Perhaps there could be a new optional parameter added to allow the labels on the "Predicted label" axis to be rotated 90 degrees in order to make them easier to read.

Add `figsize`, `title_fontsize`, and `text_fontsize` parameter for existing plots.

As discussed in #11 and pointed out by @frankherfert, plots generated by scikit-plot are quite small in Jupyter notebooks. Adding figsize, title_fontsize, and text_fontsize parameters will let users adjust the size of the plot based on their preferences.

figsize should accept a 2-tuple and be kept at a default value of None and passed to the plt.subplots function during figure creation. title_fontsize should be kept to default value "large" and text_fontsize to default value "medium"

Plot precision-recall curve for support vector machine classifier

Hello, I want to plot a precision-recall curve for SVC (support vector machine classifier), but the scikit-learn SVM classifier does not implement a predict_proba method. How can I do that in scikit-plot (as far as I can see in the documentation, it accepts prediction probabilities to plot the curve)?

Note that the scikit-learn documentation page has an example of a precision-recall curve for SVC.

Thank you,
Nikos
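Two options exist on the scikit-learn side: constructing the classifier as `SVC(probability=True)` enables `predict_proba`, and `decision_function` returns signed scores. Since a precision-recall curve depends only on the ranking of the scores, a monotonic mapping of decision scores into two columns should give the same curve for the positive class. A minimal sketch of that mapping, with the caveat that the values are not calibrated probabilities; `scores_to_two_columns` is a hypothetical helper and the scores are made up:

```python
import math


def scores_to_two_columns(decision_scores):
    """Map signed decision scores to [1 - p, p] rows with a logistic sigmoid.

    The mapping is monotonic, so the ranking of samples is preserved.
    The outputs are NOT calibrated probabilities. Hypothetical helper.
    """
    rows = []
    for s in decision_scores:
        if s >= 0:
            p = 1.0 / (1.0 + math.exp(-s))
        else:
            e = math.exp(s)  # avoids overflow for very negative scores
            p = e / (1.0 + e)
        rows.append([1.0 - p, p])
    return rows


decision_scores = [-2.0, 0.0, 3.5]  # made-up decision_function outputs
y_probas = scores_to_two_columns(decision_scores)
print(y_probas[1])  # [0.5, 0.5]
```

The resulting rows have the two-column layout that the plotting functions expect for a binary problem.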

How to stratify the data when using the classifier factory?

Hello,

Thanks so much for your great work on scikit-plot. I've found it quite useful in my ML workflows.

I'm wondering: I work with imbalanced datasets pretty frequently, so it's important for me to be able to stratify my train/test splits. When I use the classifier factory to generate plots directly from the classifier object, I don't see any options to stratify my data (e.g. in the plot_confusion_matrix function). How can I accomplish this?

0.2.3 to 0.2.6 update failed

I've just tried to upgrade the package, but it gave the following error:

Collecting scikit-plot
  Using cached scikit-plot-0.2.6.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-7wut1485/scikit-plot/setup.py", line 9, in <module>
        import scikitplot
      File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/__init__.py", line 5, in <module>
        from scikitplot.classifiers import classifier_factory
      File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/classifiers.py", line 5, in <module>
        import matplotlib.pyplot as plt
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/pyplot.py", line 115, in <module>
        _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup
        globals(),locals(),[backend_name],0)
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/backend_tkagg.py", line 6, in <module>
        from six.moves import tkinter as Tk
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 92, in __get__
        result = self._resolve()
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 115, in _resolve
        return _import_module(self.mod)
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 82, in _import_module
        __import__(name)
      File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/lib/python3.6/tkinter/__init__.py", line 36, in <module>
        import _tkinter # If this fails your Python may not be configured for Tk
    ModuleNotFoundError: No module named '_tkinter'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-7wut1485/scikit-plot/

Unfortunately, I don't know how to debug this problem. If you need some info, please don't hesitate to ask!

Add Jupyter notebook examples

It would be nice to have Jupyter notebooks in the "examples" folder showing the different plots as used in a Jupyter notebook. It could contain the same exact code as the examples in the .py files, but adjusted for size (Jupyter notebook plots tend to come out much smaller).

Add new features of plotting Gain Charts and Lift Charts?

Hi Team!

I would like to know if you have any plan of adding new functions to plot Gain Charts and Lift Charts since they are popular in data science projects.

https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/tutorials/mlp_bankloan_outputtype_02.html

https://docs.microsoft.com/en-us/sql/analysis-services/data-mining/lift-chart-analysis-services-data-mining

Thank you!

ModuleNotFoundError: No module named 'scikitplot'

Hi. I'm a beginner in Python and I got this error when I tried to run code that uses scikitplot on Windows 7. I'm using Spyder to do it. I used this command: "import scikitplot as skplt".
I checked whether scikitplot is installed, and it is installed on my computer. I ran this code on Ubuntu last year, but I need to run it again on Windows 7 and it is not working on this OS. How can I solve this issue, please?

multilabel-indicator format is not supported

While plotting the ROC curve I'm getting this error. Please help.

Too many indices For Array

I am facing an IndexError but I don't know why, as my train and test sets have exactly the shape required by cross_val_score. Still, I'm getting this error. Any suggestions?

K-NN for Amazon Fud Review.html.pdf
