
lucazav / binclass-tools


The binclass-tools package contains a set of Python wrappers and interactive plots that facilitate the analysis of binary classification problems.

Home Page: https://medium.com/towards-data-science/finding-the-best-classification-threshold-for-imbalanced-classifications-with-interactive-plots-7d65828dda38

License: BSD 3-Clause "New" or "Revised" License

Python 1.30% Jupyter Notebook 98.70%
binary-classification data-science machine-learning python

binclass-tools's People

Contributors

gretavilla


binclass-tools's Issues

Please define "amount" and "cost".

Hi.

I read the Medium article linked in the "About" section here. I got the sample code running for confusion_matrix_plot(). I read the README.md as well. I still have the following questions.

  1. What specifically is the "amount"? Amount of what? The amounts are all different (e.g. [7.42079112e-01 9.20114688e-01 2.32900389e+00 ...]).

  2. I assume the "cost" is the cost of a false positive and the cost of a false negative. If so, why are there multiple values for False Negative? I printed the entire "train_cost_dict" in your example and it gave: 'FN': array([9.97586566e-01, 1.51283561e+00, 6.72133633e-01, ...]).

Typically, the cost of a false negative is one constant value multiplied by the number of false negatives. In your example you set the FN cost to a range of values, "FN = np.abs(X_train[:, 12])". Why?

The False Positives make sense and are consistent. Your example sets "FP = 10". When I move the slider, I see that 2 FPs cost $20.

Thanks!
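The difference between a scalar cost and a per-observation cost, as raised in this question, can be illustrated with a small sketch. It rests on an assumption, not on the library's confirmed behaviour: a scalar cost is multiplied by the number of observations in that confusion-matrix cell, while an array cost holds one value per observation and only the values of the observations falling in that cell are summed. All data values and variable names below are hypothetical.

```python
import numpy as np

# Hypothetical labels and hard predictions for six observations.
y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

# Scalar cost: the same value for every false positive (as in "FP = 10").
fp_cost = 10

# Per-observation cost: one value per row (hypothetical numbers, standing in
# for something like np.abs(X_train[:, 12]) in the example from the issue).
fn_cost = np.array([3.0, 1.5, 0.7, 2.2, 4.0, 0.9])

fp_mask = (y_true == 0) & (y_pred == 1)  # rows 2 and 3
fn_mask = (y_true == 1) & (y_pred == 0)  # rows 0 and 4

# Assumed aggregation: count * scalar, or sum of the selected per-row values.
total_fp_cost = fp_cost * fp_mask.sum()
total_fn_cost = fn_cost[fn_mask].sum()

print(total_fp_cost)  # 2 false positives at 10 each
print(total_fn_cost)  # 3.0 + 4.0 for the two false negatives
```

Under this reading, a per-observation cost makes sense when a missed positive costs a different amount for each row, for example the value of an individual transaction.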

Return the plots

I would like the plot functions to return the figure when I call them, for example:

def curve_PR_plot(true_y, predicted_proba, beta = 1, title = "Precision Recall Curve", show_display_modebar = True):
    ....
    return area_under_pr_curve, full_fig

area_under_PR, plot = bc.curve_PR_plot(true_y=y_true, predicted_proba=proba, beta=1.5)

Understanding optimal thresholds

This code:

y_true=[False,True,True,True,False,False,False]
y_pred_prob=[0.6,0.7,0.4,0.3,0.2,0.15,0.1]

threshold_step = 0.05
optimize_threshold = 'all'
currency = '$'


train_cost_dict = bc.get_cost_dict(TN = 0, FP = 10, FN = 20, TP = 0)

cf_fig, var_metrics_df, invar_metrics_df, opt_thresh_df = bc.confusion_matrix_plot(
    true_y = y_true, 
    predicted_proba = y_pred_prob, 
    threshold_step = threshold_step, 
    # amounts = amounts, 
    cost_dict = train_cost_dict, 
    optimize_threshold = optimize_threshold, 
    #N_subsets = 70, subsets_size = 0.2, # default
    #with_replacement = False,           # default
    currency = currency,
    random_state = 123,
    title = 'Interactive Confusion Matrix for the Training Set')
cf_fig

This will produce the following plot:
image

It is not clear to me why 0.05 is the optimal threshold for COST, when the Total cost at this threshold is calculated to be 40.0, while at threshold 0.25 the Total cost is calculated to be 10.
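The totals quoted above can be checked by hand, independently of the library. The sketch below recomputes the total cost on the full seven-row sample at every threshold, assuming an observation is predicted positive when its probability is greater than or equal to the threshold (that convention, and the helper name `total_cost`, are assumptions for illustration). It says nothing about how the package itself selects its reported optimum.

```python
import numpy as np

# Data and costs taken from the issue above.
y_true = np.array([False, True, True, True, False, False, False])
y_prob = np.array([0.6, 0.7, 0.4, 0.3, 0.2, 0.15, 0.1])
FP_COST, FN_COST = 10, 20

# Thresholds 0.00, 0.05, ..., 1.00; rounding avoids float artifacts
# such as 0.30000000000000004.
thresholds = np.round(np.arange(21) * 0.05, 2)

def total_cost(threshold):
    """Total cost on the full sample at one threshold (hypothetical helper)."""
    y_pred = y_prob >= threshold          # assumed decision rule
    fp = np.sum(~y_true & y_pred)         # negatives predicted positive
    fn = np.sum(y_true & ~y_pred)         # positives predicted negative
    return FP_COST * fp + FN_COST * fn

costs = np.array([total_cost(t) for t in thresholds])
best_threshold = thresholds[np.argmin(costs)]  # first minimum if there are ties

for t, c in zip(thresholds, costs):
    print(f"threshold={t:.2f}  total cost={c}")
print("lowest-cost threshold:", best_threshold)
```

On this sample the hand computation agrees with the totals reported in the issue: 40 at threshold 0.05 and 10 at threshold 0.25.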

ValueError: Invalid property specified for object of type plotly.graph_objs.Heatmap: 'texttemplate'

When I try to run this code in my notebook:

cf_fig, var_metrics_df, invar_metrics_df, opt_thresh_df = bc.confusion_matrix_plot(
    true_y = y_train,
    predicted_proba = train_predicted_proba,
    threshold_step = threshold_step,
    amounts = amounts,
    cost_dict = train_cost_dict,
    currency = currency,
    #random_state = 123,
    title = 'Interactive Confusion Matrix for the Training Set')

cf_fig

I get this message:

ValueError Traceback (most recent call last)
/var/folders/bj/8cc70k514x71hctrwfbmzs8sf51jhw/T/ipykernel_79671/3959515032.py in
7 currency = currency,
8 #random_state = 123,
----> 9 title = 'Interactive Confusion Matrix for the Training Set')
10
11 cf_fig

~/.pyenv/versions/3.7.11/lib/python3.7/site-packages/bctools/plots.py in confusion_matrix_plot(true_y, predicted_proba, threshold_step, amounts, cost_dict, currency, title)
1363 colorscale = 'Blues',
1364 showscale = False,
-> 1365 visible=False), row=2, col=1)
1366
1367 # pivot metrics_dep_on_threshold_df

~/.pyenv/versions/3.7.11/lib/python3.7/site-packages/plotly/graph_objs/_heatmap.py in __init__(self, arg, autocolorscale, coloraxis, colorbar, colorscale, connectgaps, customdata, customdatasrc, dx, dy, hoverinfo, hoverinfosrc, hoverlabel, hoverongaps, hovertemplate, hovertemplatesrc, hovertext, hovertextsrc, ids, idssrc, legendgroup, legendgrouptitle, legendrank, meta, metasrc, name, opacity, reversescale, showlegend, showscale, stream, text, textsrc, transpose, uid, uirevision, visible, x, x0, xaxis, xcalendar, xgap, xhoverformat, xperiod, xperiod0, xperiodalignment, xsrc, xtype, y, y0, yaxis, ycalendar, ygap, yhoverformat, yperiod, yperiod0, yperiodalignment, ysrc, ytype, z, zauto, zhoverformat, zmax, zmid, zmin, zsmooth, zsrc, **kwargs)
2956 _v = name if name is not None else _v
2957 if _v is not None:
-> 2958 self["name"] = _v
2959 _v = arg.pop("opacity", None)
2960 _v = opacity if opacity is not None else _v

~/.pyenv/versions/3.7.11/lib/python3.7/site-packages/plotly/basedatatypes.py in _process_kwargs(self, **kwargs)
4335 """
4336 obj = self
-> 4337 if self.__validators is None:
4338
4339 class ValidatorCompat(object):

ValueError: Invalid property specified for object of type plotly.graph_objs.Heatmap: 'texttemplate'

Did you mean "hovertemplate"?

Valid properties:
    autocolorscale
        Determines whether the colorscale is a default palette
        (`autocolorscale: true`) or the palette determined by
        `colorscale`. In case `colorscale` is unspecified or
        `autocolorscale` is true, the default  palette will be
        chosen according to whether numbers in the `color`
        array are all positive, all negative or mixed.
    coloraxis
        Sets a reference to a shared color axis. References to
        these shared color axes are "coloraxis", "coloraxis2",
        "coloraxis3", etc. Settings for these shared color axes
        are set in the layout, under `layout.coloraxis`,
        `layout.coloraxis2`, etc. Note that multiple color
        scales can be linked to the same color axis.
    colorbar
        :class:`plotly.graph_objects.heatmap.ColorBar` instance
        or dict with compatible properties
    colorscale
        Sets the colorscale. The colorscale must be an array
        containing arrays mapping a normalized value to an rgb,
        rgba, hex, hsl, hsv, or named color string. At minimum,
        a mapping for the lowest (0) and highest (1) values are
        required. For example, `[[0, 'rgb(0,0,255)'], [1,
        'rgb(255,0,0)']]`. To control the bounds of the
        colorscale in color space, use`zmin` and `zmax`.
        Alternatively, `colorscale` may be a palette name
        string of the following list: Blackbody,Bluered,Blues,C
        ividis,Earth,Electric,Greens,Greys,Hot,Jet,Picnic,Portl
        and,Rainbow,RdBu,Reds,Viridis,YlGnBu,YlOrRd.
    connectgaps
        Determines whether or not gaps (i.e. {nan} or missing
        values) in the `z` data are filled in. It is defaulted
        to true if `z` is a one dimensional array and `zsmooth`
        is not false; otherwise it is defaulted to false.
    customdata
        Assigns extra data each datum. This may be useful when
        listening to hover, click and selection events. Note
        that, "scatter" traces also appends customdata items in
        the markers DOM elements
    customdatasrc
        Sets the source reference on Chart Studio Cloud for
        customdata .
    dx
        Sets the x coordinate step. See `x0` for more info.
    dy
        Sets the y coordinate step. See `y0` for more info.
    hoverinfo
        Determines which trace information appear on hover. If
        `none` or `skip` are set, no information is displayed
        upon hovering. But, if `none` is set, click and hover
        events are still fired.
    hoverinfosrc
        Sets the source reference on Chart Studio Cloud for
        hoverinfo .
    hoverlabel
        :class:`plotly.graph_objects.heatmap.Hoverlabel`
        instance or dict with compatible properties
    hoverongaps
        Determines whether or not gaps (i.e. {nan} or missing
        values) in the `z` data have hover labels associated
        with them.
    hovertemplate
        Template string used for rendering the information that
        appear on hover box. Note that this will override
        `hoverinfo`. Variables are inserted using %{variable},
        for example "y: %{y}" as well as %{xother}, {%_xother},
        {%_xother_}, {%xother_}. When showing info for several
        points, "xother" will be added to those with different
        x positions from the first point. An underscore before
        or after "(x|y)other" will add a space on that side,
        only when this field is shown. Numbers are formatted
        using d3-format's syntax %{variable:d3-format}, for
        example "Price: %{y:$.2f}".
        https://github.com/d3/d3-format/tree/v1.4.5#d3-format
        for details on the formatting syntax. Dates are
        formatted using d3-time-format's syntax
        %{variable|d3-time-format}, for example "Day:
        %{2019-01-01|%A}". https://github.com/d3/d3-time-
        format/tree/v2.2.3#locale_format for details on the
        date formatting syntax. The variables available in
        `hovertemplate` are the ones emitted as event data
        described at this link
        https://plotly.com/javascript/plotlyjs-events/#event-
        data. Additionally, every attributes that can be
        specified per-point (the ones that are `arrayOk: true`)
        are available.  Anything contained in tag `<extra>` is
        displayed in the secondary box, for example
        "<extra>{fullData.name}</extra>". To hide the secondary
        box completely, use an empty tag `<extra></extra>`.
    hovertemplatesrc
        Sets the source reference on Chart Studio Cloud for
        hovertemplate .
    hovertext
        Same as `text`.
    hovertextsrc
        Sets the source reference on Chart Studio Cloud for
        hovertext .
    ids
        Assigns id labels to each datum. These ids for object
        constancy of data points during animation. Should be an
        array of strings, not numbers or any other type.
    idssrc
        Sets the source reference on Chart Studio Cloud for
        ids .
    legendgroup
        Sets the legend group for this trace. Traces part of
        the same legend group hide/show at the same time when
        toggling legend items.
    legendgrouptitle
        :class:`plotly.graph_objects.heatmap.Legendgrouptitle`
        instance or dict with compatible properties
    legendrank
        Sets the legend rank for this trace. Items and groups
        with smaller ranks are presented on top/left side while
        with `*reversed* `legend.traceorder` they are on
        bottom/right side. The default legendrank is 1000, so
        that you can use ranks less than 1000 to place certain
        items before all unranked items, and ranks greater than
        1000 to go after all unranked items.
    meta
        Assigns extra meta information associated with this
        trace that can be used in various text attributes.
        Attributes such as trace `name`, graph, axis and
        colorbar `title.text`, annotation `text`
        `rangeselector`, `updatemenues` and `sliders` `label`
        text all support `meta`. To access the trace `meta`
        values in an attribute in the same trace, simply use
        `%{meta[i]}` where `i` is the index or key of the
        `meta` item in question. To access trace `meta` in
        layout attributes, use `%{data[n[.meta[i]}` where `i`
        is the index or key of the `meta` and `n` is the trace
        index.
    metasrc
        Sets the source reference on Chart Studio Cloud for
        meta .
    name
        Sets the trace name. The trace name appear as the
        legend item and on hover.
    opacity
        Sets the opacity of the trace.
    reversescale
        Reverses the color mapping if true. If true, `zmin`
        will correspond to the last color in the array and
        `zmax` will correspond to the first color.
    showlegend
        Determines whether or not an item corresponding to this
        trace is shown in the legend.
    showscale
        Determines whether or not a colorbar is displayed for
        this trace.
    stream
        :class:`plotly.graph_objects.heatmap.Stream` instance
        or dict with compatible properties
    text
        Sets the text elements associated with each z value.
    textsrc
        Sets the source reference on Chart Studio Cloud for
        text .
    transpose
        Transposes the z data.
    uid
        Assign an id to this trace, Use this to provide object
        constancy between traces during animations and
        transitions.
    uirevision
        Controls persistence of some user-driven changes to the
        trace: `constraintrange` in `parcoords` traces, as well
        as some `editable: true` modifications such as `name`
        and `colorbar.title`. Defaults to `layout.uirevision`.
        Note that other user-driven trace attribute changes are
        controlled by `layout` attributes: `trace.visible` is
        controlled by `layout.legend.uirevision`,
        `selectedpoints` is controlled by
        `layout.selectionrevision`, and `colorbar.(x|y)`
        (accessible with `config: {editable: true}`) is
        controlled by `layout.editrevision`. Trace changes are
        tracked by `uid`, which only falls back on trace index
        if no `uid` is provided. So if your app can add/remove
        traces before the end of the `data` array, such that
        the same trace has a different index, you can still
        preserve user-driven changes if you give each trace a
        `uid` that stays with it as it moves.
    visible
        Determines whether or not this trace is visible. If
        "legendonly", the trace is not drawn, but can appear as
        a legend item (provided that the legend itself is
        visible).
    x
        Sets the x coordinates.
    x0
        Alternate to `x`. Builds a linear space of x
        coordinates. Use with `dx` where `x0` is the starting
        coordinate and `dx` the step.
    xaxis
        Sets a reference between this trace's x coordinates and
        a 2D cartesian x axis. If "x" (the default value), the
        x coordinates refer to `layout.xaxis`. If "x2", the x
        coordinates refer to `layout.xaxis2`, and so on.
    xcalendar
        Sets the calendar system to use with `x` date data.
    xgap
        Sets the horizontal gap (in pixels) between bricks.
    xhoverformat
        Sets the hover text formatting rulefor `x`  using d3
        formatting mini-languages which are very similar to
        those in Python. For numbers, see:
        https://github.com/d3/d3-format/tree/v1.4.5#d3-format.
        And for dates see: https://github.com/d3/d3-time-
        format/tree/v2.2.3#locale_format. We add two items to
        d3's date formatter: "%h" for half of the year as a
        decimal number as well as "%{n}f" for fractional
        seconds with n digits. For example, *2016-10-13
        09:15:23.456* with tickformat "%H~%M~%S.%2f" would
        display *09~15~23.46*By default the values are
        formatted using `xaxis.hoverformat`.
    xperiod
        Only relevant when the axis `type` is "date". Sets the
        period positioning in milliseconds or "M<n>" on the x
        axis. Special values in the form of "M<n>" could be
        used to declare the number of months. In this case `n`
        must be a positive integer.
    xperiod0
        Only relevant when the axis `type` is "date". Sets the
        base for period positioning in milliseconds or date
        string on the x0 axis. When `x0period` is round number
        of weeks, the `x0period0` by default would be on a
        Sunday i.e. 2000-01-02, otherwise it would be at
        2000-01-01.
    xperiodalignment
        Only relevant when the axis `type` is "date". Sets the
        alignment of data points on the x axis.
    xsrc
        Sets the source reference on Chart Studio Cloud for  x
        .
    xtype
        If "array", the heatmap's x coordinates are given by
        "x" (the default behavior when `x` is provided). If
        "scaled", the heatmap's x coordinates are given by "x0"
        and "dx" (the default behavior when `x` is not
        provided).
    y
        Sets the y coordinates.
    y0
        Alternate to `y`. Builds a linear space of y
        coordinates. Use with `dy` where `y0` is the starting
        coordinate and `dy` the step.
    yaxis
        Sets a reference between this trace's y coordinates and
        a 2D cartesian y axis. If "y" (the default value), the
        y coordinates refer to `layout.yaxis`. If "y2", the y
        coordinates refer to `layout.yaxis2`, and so on.
    ycalendar
        Sets the calendar system to use with `y` date data.
    ygap
        Sets the vertical gap (in pixels) between bricks.
    yhoverformat
        Sets the hover text formatting rulefor `y`  using d3
        formatting mini-languages which are very similar to
        those in Python. For numbers, see:
        https://github.com/d3/d3-format/tree/v1.4.5#d3-format.
        And for dates see: https://github.com/d3/d3-time-
        format/tree/v2.2.3#locale_format. We add two items to
        d3's date formatter: "%h" for half of the year as a
        decimal number as well as "%{n}f" for fractional
        seconds with n digits. For example, *2016-10-13
        09:15:23.456* with tickformat "%H~%M~%S.%2f" would
        display *09~15~23.46*By default the values are
        formatted using `yaxis.hoverformat`.
    yperiod
        Only relevant when the axis `type` is "date". Sets the
        period positioning in milliseconds or "M<n>" on the y
        axis. Special values in the form of "M<n>" could be
        used to declare the number of months. In this case `n`
        must be a positive integer.
    yperiod0
        Only relevant when the axis `type` is "date". Sets the
        base for period positioning in milliseconds or date
        string on the y0 axis. When `y0period` is round number
        of weeks, the `y0period0` by default would be on a
        Sunday i.e. 2000-01-02, otherwise it would be at
        2000-01-01.
    yperiodalignment
        Only relevant when the axis `type` is "date". Sets the
        alignment of data points on the y axis.
    ysrc
        Sets the source reference on Chart Studio Cloud for  y
        .
    ytype
        If "array", the heatmap's y coordinates are given by
        "y" (the default behavior when `y` is provided) If
        "scaled", the heatmap's y coordinates are given by "y0"
        and "dy" (the default behavior when `y` is not
        provided)
    z
        Sets the z data.
    zauto
        Determines whether or not the color domain is computed
        with respect to the input data (here in `z`) or the
        bounds set in `zmin` and `zmax`  Defaults to `false`
        when `zmin` and `zmax` are set by the user.
    zhoverformat
        Sets the hover text formatting rulefor `z`  using d3
        formatting mini-languages which are very similar to
        those in Python. For numbers, see: https://github.com/d
        3/d3-format/tree/v1.4.5#d3-format.By default the values
        are formatted using generic number format.
    zmax
        Sets the upper bound of the color domain. Value should
        have the same units as in `z` and if set, `zmin` must
        be set as well.
    zmid
        Sets the mid-point of the color domain by scaling
        `zmin` and/or `zmax` to be equidistant to this point.
        Value should have the same units as in `z`. Has no
        effect when `zauto` is `false`.
    zmin
        Sets the lower bound of the color domain. Value should
        have the same units as in `z` and if set, `zmax` must
        be set as well.
    zsmooth
        Picks a smoothing algorithm use to smooth `z` data.
    zsrc
        Sets the source reference on Chart Studio Cloud for  z
        .

Did you mean "hovertemplate"?

Bad property path:
texttemplate
^^^^^^^^^^^^
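An "Invalid property" error like this one usually means the installed plotly release predates the property being passed. The sketch below checks the installed version against an assumed minimum. Two things here are assumptions to verify rather than facts from this issue: that `texttemplate` on `Heatmap` first shipped in plotly 5.5.0 (check the plotly.py release notes), and the helper names, which are hypothetical. It relies on `importlib.metadata`, which needs Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Assumed first plotly release whose Heatmap accepts `texttemplate`;
# confirm against the plotly.py release notes before relying on it.
ASSUMED_MIN_PLOTLY = (5, 5, 0)

def parse_version(text):
    """Turn '5.4.0' into (5, 4, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def supports_heatmap_texttemplate(plotly_version):
    """True if the version is at least the assumed minimum (hypothetical helper)."""
    return parse_version(plotly_version) >= ASSUMED_MIN_PLOTLY

try:
    installed = version("plotly")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("plotly is not installed in this environment")
elif supports_heatmap_texttemplate(installed):
    print(f"plotly {installed}: meets the assumed minimum")
else:
    print(f"plotly {installed}: older than the assumed minimum, consider upgrading")
```

If the installed version turns out to be older, upgrading plotly in the notebook's environment is the likely remedy, although the environment in the traceback runs Python 3.7, which may limit which plotly releases can be installed.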

Error on small data

Assume the following small test case:

y_true=[False,True,True,True,False,False,False]
y_pred_prob=[0.6,0.7,0.4,0.3,0.2,0.15,0.1]
PR_plot, area_under_PR = bc.curve_PR_plot(true_y = y_true, 
                                      predicted_proba = y_pred_prob,
                                      beta = 1)

The package will fail at:
--> 676 baseline = len(true_y[true_y==1]) / len(true_y)
TypeError: object of type 'bool' has no len()

The reason seems to be that after the subsetting, the list degenerates to a single item and ceases to be a list.
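The failure can be reproduced without the package. With a plain Python list, `true_y == 1` compares the whole list to `1` and evaluates to the single value `False`; indexing the list with `False` then returns element 0, a `bool`, which has no length. The sketch below shows this and one possible workaround, converting the labels to a NumPy array before the call. Whether the package should do that conversion internally is a suggestion, not something confirmed by its source.

```python
import numpy as np

y_true = [False, True, True, True, False, False, False]

# Plain list: the comparison is not element-wise.
mask = (y_true == 1)          # a single bool, not a list of bools
selected = y_true[mask]       # y_true[False] is y_true[0]
print(mask, selected)

try:
    len(selected)
except TypeError as err:
    print("TypeError:", err)  # same error as in the traceback above

# Possible workaround: use a NumPy array, where == is element-wise
# and boolean indexing selects the matching rows.
y_arr = np.asarray(y_true)
baseline = len(y_arr[y_arr == 1]) / len(y_arr)
print(baseline)               # 3 positives out of 7 observations
```

Passing `np.array(y_true)` and `np.array(y_pred_prob)` to the plotting functions is therefore a reasonable thing to try until the package accepts plain lists.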
