
Comments (8)

jwijffels avatar jwijffels commented on July 20, 2024

What I meant is that if you have a predictor y, that it finds several cutpoints alongside y to make a good classification of the outcome.
Instead of having only one cutpoint.

from cutpointr.

Thie1e avatar Thie1e commented on July 20, 2024

I see. We should distinguish multiple "optimal" and multiple "good" cutpoints.

In the case of multiple optimal cutpoints, cutpointr currently issues a warning. That warning is actually issued by the method function; specifically, only the minimize_metric and maximize_metric functions issue such warnings.
The oc_OptimalCutpoints wrapper has a break_ties argument so that, for example, the mean of the optimal cutpoints is returned (the default). oc_youden_kernel and oc_normal don't lead to multiple optimal cutpoints.

Possible ways of handling multiple optimal cutpoints are:

  • Handle multiple optimal cutpoints in cutpointr so that method functions can return several of them. cutpointr would then need an additional break_ties argument (or similar) to automatically select one of those cutpoints. I'm also not sure how best to return multiple cutpoints: if cutpointr selects one among the optimal ones, the output would need an additional column whose elements contain the vector of all optimal cutpoints. In any case, I'd like cutpointr to always return a single number in the optimal_cutpoint column.
  • As above, let cutpointr handle multiple returned cutpoints, but "throw away" the alternative optimal cutpoints and only break the ties.
  • Augment maximize_metric and minimize_metric with a break_ties argument to select a function for handling multiple optimal cutpoints (as in oc_OptimalCutpoints).
  • Don't change anything, i.e. let the minimize_metric and maximize_metric functions handle multiple optimal cutpoints themselves. The idea is that the method functions could be used separately from the cutpointr function.

When breaking ties using mean or median, note that the returned "optimal" cutpoint may not actually be optimal. In other words, that cutpoint may lead to a metric value that is below the optimal one. That is why maximize_metric and minimize_metric return the minimum or maximum of the optimal cutpoints. In general, we don't regard this issue as particularly important in the real world. I'd like to hear opinions, though.
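
To make that concrete, here is a small illustrative sketch in base R (toy numbers, not cutpointr code) showing how breaking ties with mean can land on a cutpoint whose metric value is below the optimum:

```r
# Toy metric over candidate cutpoints; two cutpoints tie for the maximum.
cutpoints <- c(1, 2, 3, 4, 5)
metric    <- c(0.60, 0.80, 0.70, 0.80, 0.60)

optimal    <- cutpoints[metric == max(metric)]  # cutpoints 2 and 4 tie
tie_broken <- mean(optimal)                     # 3

# The metric at the tie-broken cutpoint is 0.70, below the optimal 0.80,
# which is why returning min or max of the tied cutpoints stays optimal.
metric[cutpoints == tie_broken]                 # 0.70
```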

Concerning the handling of other "good" cutpoints: Currently we don't have plans to return these. Is there much demand for that? I assume the idea is to search for an optimal cutpoint using maximize_metric or minimize_metric and then get the additional, say, 5 next best cutpoints in an additional tibble column along with the corresponding metric value.

We'd probably need at least one additional argument to do that:

  • How many additional cutpoints
  • Or alternatively all cutpoints that perform within 90% of the chosen metric or something like that. Unfortunately, in the case of a continuous (not integer) predictor variable and many observations this could return a large number of additional cutpoints.
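
As a sketch of the second idea (illustrative base R, not part of the cutpointr API): keep every cutpoint whose metric value reaches at least 90% of the best one.

```r
# Toy candidate cutpoints with their metric values.
cutpoints <- seq(0.1, 0.9, by = 0.1)
metric    <- c(0.55, 0.61, 0.70, 0.74, 0.80, 0.78, 0.73, 0.66, 0.58)

best         <- max(metric)
near_optimal <- cutpoints[metric >= 0.9 * best]
near_optimal  # 0.4 0.5 0.6 0.7 -- can grow large for continuous predictors
```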

I'll leave this issue open for a while and I'm interested in other opinions. Thanks everyone.


Thie1e avatar Thie1e commented on July 20, 2024

With version 0.6.0, cutpointr was enhanced with the option to return multiple optimal cutpoints. The new break_ties argument specifies whether all optimal cutpoints should be returned or summarized, e.g. using mean or median. With break_ties = c, all optimal cutpoints are returned and the optimal_cutpoint column becomes a list column.

> dat <- data.frame(y = c(0,0,0,1,0,1,1,1), x = 1:8)
> cutpointr(dat, x = x, class = y, break_ties = c, pos_class = 1, direction = ">=")
Multiple optimal cutpoints found
# A tibble: 1 x 15
  direction optimal_cutpoint method          sum_sens_spec acc      
  <chr>     <list>           <chr>                   <dbl> <list>   
1 >=        <dbl [2]>        maximize_metric          1.75 <dbl [2]>
  sensitivity specificity   AUC pos_class neg_class prevalence outcome
  <list>      <list>      <dbl>     <dbl>     <dbl>      <dbl> <chr>  
1 <dbl [2]>   <dbl [2]>   0.938      1.00         0      0.500 y      
  predictor data             roc_curve            
  <chr>     <list>           <list>               
1 x         <tibble [8 × 2]> <data.frame [9 × 10]> 


jwijffels avatar jwijffels commented on July 20, 2024

Thank you. I'll try it out!


jgarces02 avatar jgarces02 commented on July 20, 2024

Hi @Thie1e,

Sorry for continuing this issue, but I don't know whether break_ties = c should always return multiple cutpoints... in my case it returns only one.

cutpointr(data = dff2, x = var, class = c_PFS, metric = accuracy,
          method = maximize_boot_metric, summary_func = median,
          boot_cut = 100, boot_stratify = T, boot_runs = 100,
          break_ties = c)

# A tibble: 1 x 16
  direction optimal_cutpoint method               accuracy      acc sensitivity specificity      AUC pos_class neg_class prevalence outcome
  <chr>                <dbl> <chr>                   <dbl>    <dbl>       <dbl>       <dbl>    <dbl> <fct>     <fct>          <dbl> <chr>  
1 >=                 0.85742 maximize_boot_metric 0.683735 0.683735    0.138889    0.946429 0.585627 1         0           0.325301 c_PFS  
  predictor data               roc_curve          boot 
  <chr>     <list>             <list>             <lgl>
1 var       <tibble [332 x 2]> <tibble [208 x 9]> NA

Thanks for your help!


Thie1e avatar Thie1e commented on July 20, 2024

Hi,

with method = maximize_boot_metric you won't get multiple optimal cutpoints, because the returned optimal cutpoint is (in the above example) the median of all optimal cutpoints that were calculated in the 100 (= boot_cut) bootstrap samples.

There may have been multiple optimal cutpoints in some of the bootstrap samples, but these just contribute to the median.
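
A minimal sketch of that logic (toy numbers, not the actual cutpointr internals):

```r
# One optimal cutpoint per bootstrap sample (after any within-sample
# tie-breaking); a summary function such as median then collapses them
# to a single number, so only one cutpoint is ever reported.
boot_cutpoints <- c(0.80, 0.85, 0.85, 0.90, 0.95)
median(boot_cutpoints)  # 0.85 -- the single reported optimal_cutpoint
```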

I'm rather wondering why the boot column is NA in the output, because it should be a tibble with 100 rows as specified by boot_runs. Does the above call really return an NA there? If so, can you post the data somewhere? Running a similar call on my machine returns the bootstrap data correctly.


jgarces02 avatar jgarces02 commented on July 20, 2024

Yes, you're right, it's a bit illogical to get all cutpoints from a bootstrap, totally agree (sorry for such a stupid question).

Regarding your question... no, this time no NA was returned. Indeed, I don't know what happened, but I ran the code again and the boot column was formed correctly (:dizzy_face:):

> cutpointr(data = dff2, x = var, class = c_PFS, metric = accuracy,
+           method = maximize_boot_metric, summary_func = median,
+           boot_cut = 100, boot_stratify = T, boot_runs = 100,
+           break_ties = c)
Assuming the positive class is 1
Assuming the positive class has higher x values
Running bootstrap...
# A tibble: 1 x 16
  direction optimal_cutpoint method               accuracy      acc sensitivity specificity      AUC pos_class neg_class prevalence outcome
  <chr>                <dbl> <chr>                   <dbl>    <dbl>       <dbl>       <dbl>    <dbl> <fct>     <fct>          <dbl> <chr>  
1 >=                    0.78 maximize_boot_metric 0.704735 0.704735    0.155963       0.944 0.623431 1         0           0.303621 c_PFS  
  predictor data               roc_curve          boot               
  <chr>     <list>             <list>             <list>             
1 pctCTCs   <tibble [359 x 2]> <tibble [209 x 9]> <tibble [100 x 23]>

So, thanks anyway for your help!


Thie1e avatar Thie1e commented on July 20, 2024

OK, glad to hear that. And don't worry, it's not a stupid question.

It makes a subtle difference, because break_ties still applies to the individual cutpoints of every bootstrap repetition, so the bootstrapped cutpoint may differ depending on break_ties, even if a seed was set. I just don't think that it makes a substantial difference, especially if boot_cut is large enough.
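
A small sketch of that subtlety (illustrative numbers, not cutpointr internals): if some bootstrap samples contain tied optima, the chosen tie-breaking function changes the per-sample cutpoints and therefore the summarized value.

```r
# Tied optimal cutpoints within three hypothetical bootstrap samples.
ties_per_sample <- list(c(2, 4), c(3, 3), c(2, 6))

per_sample_mean <- sapply(ties_per_sample, mean)  # 3 3 4
per_sample_min  <- sapply(ties_per_sample, min)   # 2 3 2

median(per_sample_mean)  # 3
median(per_sample_min)   # 2 -- a different summarized cutpoint
```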

