Comments (8)
What I meant is that if you have a predictor y, it would find several cutpoints along y to make a good classification of the outcome, instead of having only one cutpoint.
from cutpointr.
I see. We should distinguish multiple "optimal" and multiple "good" cutpoints.
In the case of multiple optimal cutpoints, cutpointr currently issues a warning. Issuing that warning is actually handled by the `method` function. Specifically, only the `minimize_metric` and `maximize_metric` functions issue such warnings. The `oc_OptimalCutpoints` wrapper has a break-ties argument so that, for example, the mean of the optimal cutpoints is returned (the default). `oc_youden_kernel` and `oc_normal` don't lead to multiple optimal cutpoints.
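To illustrate how such ties arise (a minimal Python sketch, not cutpointr's implementation): a search over candidate cutpoints can find several that attain the same maximal metric value. Here, Youden's J on the toy data used in the example further below has two maximizers.

```python
# Illustrative only, not cutpointr's code: maximizing a metric such as
# Youden's J over all candidate cutpoints can yield several equally good ones.

def youden_j(cut, xs, ys):
    """Sensitivity + specificity - 1 for the rule 'positive if x >= cut'."""
    tp = sum(y == 1 and x >= cut for x, y in zip(xs, ys))
    fn = sum(y == 1 and x < cut for x, y in zip(xs, ys))
    tn = sum(y == 0 and x < cut for x, y in zip(xs, ys))
    fp = sum(y == 0 and x >= cut for x, y in zip(xs, ys))
    return tp / (tp + fn) + tn / (tn + fp) - 1

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

scores = {c: youden_j(c, xs, ys) for c in xs}
best = max(scores.values())
optimal = [c for c, s in scores.items() if s == best]
print(optimal)  # [4, 6] -- two cutpoints attain the maximal J of 0.75
```

A `method` function then has to decide which of these (here, 4 or 6) to report, which is exactly the tie-breaking question discussed below.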
Possible ways of handling multiple optimal cutpoints are:
- Handle multiple optimal cutpoints in `cutpointr`, so that `method` functions can return multiple optimal cutpoints. Then `cutpointr` would need an additional `break_ties` argument or similar to automatically select one of those cutpoints. Also, I'm not sure how best to return multiple cutpoints. If `cutpointr` selects one cutpoint among the optimal ones, the output would need to be augmented by a column that contains a vector of optimal cutpoints in each of its elements. In any case, I'd like `cutpointr` to always return a single number in the `optimal_cutpoint` column.
- As above, i.e. let `cutpointr` handle multiple returned cutpoints, but "throw away" the alternative optimal cutpoints and only break the ties.
- Augment `maximize_metric` and `minimize_metric` with a `break_ties` argument to select a function for handling multiple optimal cutpoints (as in `oc_OptimalCutpoints`).
- Don't change anything. That is, let the `minimize_metric` and `maximize_metric` functions handle multiple optimal cutpoints. The idea is that the `method` functions could be used separately, without the `cutpointr` function.
When breaking ties using the mean or median, note that the returned "optimal" cutpoint may not actually be optimal. In other words, that cutpoint may lead to a metric value that is below the optimal one. That is why `maximize_metric` and `minimize_metric` return the minimum or maximum of the optimal cutpoints. In general, we don't regard this issue as particularly important in the real world. I'd like to hear opinions, though.
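This caveat is easy to demonstrate with a small sketch (illustrative Python, not cutpointr's code): on the toy data from the example further below, the cutpoints 4 and 6 both maximize Youden's J, but their mean, 5, scores strictly worse.

```python
# Sketch of the caveat above: the mean of two tied optimal cutpoints can
# itself yield a metric value below the optimum.

def youden_j(cut, xs, ys):
    """Sensitivity + specificity - 1 for the rule 'positive if x >= cut'."""
    tp = sum(y == 1 and x >= cut for x, y in zip(xs, ys))
    fn = sum(y == 1 and x < cut for x, y in zip(xs, ys))
    tn = sum(y == 0 and x < cut for x, y in zip(xs, ys))
    fp = sum(y == 0 and x >= cut for x, y in zip(xs, ys))
    return tp / (tp + fn) + tn / (tn + fp) - 1

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

optimal = [4, 6]                        # both attain the maximal J of 0.75
mean_cut = sum(optimal) / len(optimal)  # 5.0
print(youden_j(mean_cut, xs, ys))       # 0.5 -- below the optimum
```

Returning the minimum or maximum of the tied cutpoints, as described above, avoids this: both are members of the optimal set, so the reported metric value is always attained.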
Concerning the handling of other "good" cutpoints: currently we don't have plans to return these. Is there much demand for that? I assume the idea is to search for an optimal cutpoint using `maximize_metric` or `minimize_metric` and then get, say, the 5 next best cutpoints in an additional tibble column, along with the corresponding metric values.
We'd probably need at least one additional argument to do that:
- How many additional cutpoints to return
- Or, alternatively, return all cutpoints that perform within, say, 90% of the optimal metric value. Unfortunately, in the case of a continuous (not integer) predictor variable and many observations, this could return a large number of additional cutpoints.
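As a sketch of these two hypothetical selection rules (neither is an existing cutpointr feature), given a precomputed metric value per candidate cutpoint:

```python
# Hypothetical selection rules (not implemented in cutpointr):
# (a) the k next-best cutpoints after the optimum,
# (b) all cutpoints reaching at least 90% of the best metric value.
# Scores here are Youden's J per candidate cutpoint on a toy example.
scores = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 0.5, 6: 0.75, 7: 0.5, 8: 0.25}

ranked = sorted(scores, key=scores.get, reverse=True)  # stable: ties keep key order
best = scores[ranked[0]]

k = 3
next_best = ranked[1:1 + k]                                 # rule (a)
within_90 = [c for c in ranked if scores[c] >= 0.9 * best]  # rule (b)
```

Rule (a) always returns a fixed-size result, whereas rule (b)'s result size is data-dependent, which is exactly the concern above about continuous predictors with many observations.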
I'll leave this issue open for a while and I'm interested in other opinions. Thanks everyone.
With version 0.6.0, `cutpointr` was enhanced by the option to return multiple optimal cutpoints. The new `break_ties` argument specifies whether all optimal cutpoints should be returned or whether they should be summarized, e.g. using `mean` or `median`. If `break_ties = c`, all optimal cutpoints will be returned and the `optimal_cutpoint` column becomes a list.
```r
> dat <- data.frame(y = c(0, 0, 0, 1, 0, 1, 1, 1), x = 1:8)
> cutpointr(dat, x = x, class = y, break_ties = c, pos_class = 1, direction = ">=")
Multiple optimal cutpoints found
# A tibble: 1 x 15
  direction optimal_cutpoint method          sum_sens_spec acc
  <chr>     <list>           <chr>                   <dbl> <list>
1 >=        <dbl [2]>        maximize_metric          1.75 <dbl [2]>
  sensitivity specificity   AUC pos_class neg_class prevalence outcome
  <list>      <list>      <dbl>     <dbl>     <dbl>      <dbl> <chr>
1 <dbl [2]>   <dbl [2]>   0.938      1.00         0      0.500 y
  predictor data             roc_curve
  <chr>     <list>           <list>
1 x         <tibble [8 × 2]> <data.frame [9 × 10]>
```
Thank you. I'll try it out!
Hi @Thie1e,
Sorry for continuing this issue, but I don't know whether `break_ties = c` should always make multiple cutpoints appear... in my case, only one is appearing.

```r
cutpointr(data = dff2, x = var, class = c_PFS, metric = accuracy,
          method = maximize_boot_metric, summary_func = median,
          boot_cut = 100, boot_stratify = T, boot_runs = 100,
          break_ties = c)
# A tibble: 1 x 16
  direction optimal_cutpoint method               accuracy      acc sensitivity specificity      AUC pos_class neg_class prevalence outcome
  <chr>                <dbl> <chr>                   <dbl>    <dbl>       <dbl>       <dbl>    <dbl> <fct>     <fct>          <dbl> <chr>
1 >=                 0.85742 maximize_boot_metric 0.683735 0.683735    0.138889    0.946429 0.585627 1         0           0.325301 c_PFS
  predictor data               roc_curve          boot
  <chr>     <list>             <list>             <lgl>
1 var       <tibble [332 x 2]> <tibble [208 x 9]> NA
```

Thanks for your help!
Hi,
with `method = maximize_boot_metric` you won't get multiple optimal cutpoints, because the returned optimal cutpoint is (in the above example) the median of all optimal cutpoints that were calculated in the 100 (= `boot_cut`) bootstrap samples.
There may have been multiple optimal cutpoints in some of the bootstrap samples, but these just contribute to the median.
I'm rather wondering why the `boot` column is `NA` in the output, because it should be a tibble with 100 rows, as specified by `boot_runs`. Does the above call really return an `NA` there? If so, can you post the data somewhere? Running a similar call on my machine returns the bootstrap data correctly.
Yes, you're right, it's a bit illogical to get all cutpoints from a bootstrap, totally agree (sorry for such a stupid question).
Regarding your question... no, no `NA` was returned. And indeed, I don't know what happened, but I ran the code again and the `boot` column was formed correctly (:dizzy_face:):

```r
> cutpointr(data = dff2, x = var, class = c_PFS, metric = accuracy,
+           method = maximize_boot_metric, summary_func = median,
+           boot_cut = 100, boot_stratify = T, boot_runs = 100,
+           break_ties = c)
Assuming the positive class is 1
Assuming the positive class has higher x values
Running bootstrap...
# A tibble: 1 x 16
  direction optimal_cutpoint method               accuracy      acc sensitivity specificity      AUC pos_class neg_class prevalence outcome
  <chr>                <dbl> <chr>                   <dbl>    <dbl>       <dbl>       <dbl>    <dbl> <fct>     <fct>          <dbl> <chr>
1 >=                    0.78 maximize_boot_metric 0.704735 0.704735    0.155963       0.944 0.623431 1         0           0.303621 c_PFS
  predictor data               roc_curve          boot
  <chr>     <list>             <list>             <list>
1 pctCTCs   <tibble [359 x 2]> <tibble [209 x 9]> <tibble [100 x 23]>
```

So, thanks anyway for your help!
OK, glad to hear that. And don't worry, it's not a stupid question.
It makes a subtle difference, because `break_ties` still applies to the individual cutpoints of every bootstrap repetition, so the bootstrapped cutpoint may differ depending on `break_ties`, even if a seed was set. I just don't think that it makes a substantial difference, especially if `boot_cut` is large enough.
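A rough Python sketch of that pipeline (function and argument names invented here; the real logic lives in cutpointr's `maximize_boot_metric`): each bootstrap sample yields its own, possibly tied, set of optimal cutpoints; `break_ties` picks one per sample; and `summary_func` (here the median) aggregates the per-sample cutpoints into a single number, which is why no list of tied cutpoints survives to the output.

```python
# Illustrative sketch, not cutpointr's implementation.
import random
from statistics import median

def optimal_cutpoints(xs, ys):
    """All cutpoints maximizing Youden's J for the rule 'positive if x >= cut'."""
    def j(cut):
        tp = sum(y == 1 and x >= cut for x, y in zip(xs, ys))
        fn = sum(y == 1 and x < cut for x, y in zip(xs, ys))
        tn = sum(y == 0 and x < cut for x, y in zip(xs, ys))
        fp = sum(y == 0 and x >= cut for x, y in zip(xs, ys))
        if tp + fn == 0 or tn + fp == 0:   # degenerate one-class sample
            return float("-inf")
        return tp / (tp + fn) + tn / (tn + fp) - 1
    scores = {c: j(c) for c in sorted(set(xs))}
    best = max(scores.values())
    return [c for c, s in scores.items() if s == best]

def boot_cutpoint(xs, ys, boot_cut=100, break_ties=min, summary_func=median, seed=1):
    """One tie-broken cutpoint per bootstrap sample, then summarized."""
    rng = random.Random(seed)
    per_sample = []
    for _ in range(boot_cut):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        per_sample.append(break_ties(optimal_cutpoints(bx, by)))
    return summary_func(per_sample)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
print(boot_cutpoint(xs, ys))
```

With a fixed seed the result is deterministic, but swapping `break_ties` (e.g. `min` for `max`) can shift individual per-sample cutpoints and hence, slightly, their median, which is the subtle dependence described above.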