Comments (36)
from finalfit.
Thanks Joseph for the very useful example.
This probably gets us quite far. There are summary_factorlist()
features not yet included, most notably, labels. But that would be an easy addition.
Would be tidied up and split into component functions prior to being deployed.
Interested in thoughts.
library(questionr)
data(hdv2003)
library(dplyr)
hdv2003$etud <- recode_factor(
hdv2003$nivetud,
"N'a jamais fait d'etudes" = "Primary",
"A arrete ses etudes, avant la derniere annee d'etudes primaires" = "Primary",
"Derniere annee d'etudes primaires" = "Primary",
"1er cycle" = "Secondary",
"2eme cycle" = "Secondary",
"Enseignement technique ou professionnel court" = "Professionnal",
"Enseignement technique ou professionnel long" = "Professionnal",
"Enseignement superieur y compris technique superieur" = "High"
)
library(survey)
dw <- svydesign(ids = ~ 1, weights = ~ poids, data = hdv2003)
summary_survey <- function(.data, dependent, explanatory, cont = "mean", cont_range = FALSE,
p = FALSE, column = FALSE, digits = c(1, 1, 3, 1), fit_id = FALSE){
# Define variable type
explanatory_type = .data$variables %>%
select(explanatory) %>%
purrr::map(is.numeric)
# Hypothesis test
if(p){
p_tests = explanatory %>%
purrr::map2(explanatory_type,
~if(!.y){
survey::svychisq(as.formula(paste0("~", ., "+", dependent)), .data)$p.value %>%
p_tidy(digits[3])
} else if (.y & cont == "mean"){
survey::svyttest(as.formula(paste0(.x, "~", dependent)), .data)$p.value %>%
p_tidy(digits[3])
} else if (.y & cont == "median"){
survey::svyranktest(as.formula(paste0(.x, "~", dependent)), .data)$p.value %>%
p_tidy(digits[3])
}
)
}
# Output table
explanatory %>%
purrr::map2(explanatory_type,
~ if(!.y){
survey::svytable(as.formula(paste0("~", .x, "+", dependent)), .data) %>%
as.data.frame(stringsAsFactors = FALSE) %>%
{ if(column) {
dplyr::group_by(., !! sym(dependent)) %>%
dplyr::mutate(
total = sum(Freq),
prop = 100 * Freq / total
)
} else {
dplyr::group_by_at(., 1) %>%
dplyr::mutate(
total = sum(Freq),
prop = 100 * Freq / total
)
}
} %>%
dplyr::mutate(
value = paste0(Freq %>% round_tidy(digits[4]), " (",
prop %>% round_tidy(digits[4]), ")")
) %>%
dplyr::select(-total, -prop, -Freq) %>%
tidyr::pivot_wider(names_from = !! dependent, values_from = value) %>%
dplyr::mutate(
label = names(.)[1]
) %>%
dplyr::rename(levels = 1)
} else {
{ if(cont == "mean") {
survey::svyby(as.formula(paste0("~", .x)), as.formula(paste0("~", dependent)), .data, svymean, na.rm = TRUE) %>%
dplyr::mutate(
value = paste0(!! sym(.x) %>% round_tidy(digits[1]), " (",
se %>% round_tidy(digits[2]), ")")
) %>%
dplyr::select(-c(.x, se)) %>%
tidyr::pivot_wider(names_from = !! dependent, values_from = value) %>%
dplyr::mutate(
label = .x,
levels = "Mean (SE)"
)
} else if(cont == "median"){
survey::svyby(as.formula(paste0("~", .x)), as.formula(paste0("~", dependent)), .data,
svyquantile, quantiles = 1:3/4, ci = TRUE, na.rm = TRUE) %>%
dplyr::rename(Q1 = 2,
Q2 = 3,
Q3 = 4) %>%
dplyr::mutate(
IQR = Q3 - Q1) %>%
{ if(cont_range){
dplyr::mutate(.,
value = paste0(Q2 %>% round_tidy(digits[1]), " (",
Q1 %>% round_tidy(digits[2]), " to ",
Q3 %>% round_tidy(digits[2]), ")")
)
} else {
dplyr::mutate(.,
value = paste0(Q2 %>% round_tidy(digits[1]), " (",
IQR %>% round_tidy(digits[2]), ")")
)
}} %>%
dplyr::select(-c(2:8)) %>%
tidyr::pivot_wider(names_from = !! dependent, values_from = value) %>%
dplyr::mutate(
label = .x,
levels = "Median (IQR)"
)
}
}
}) %>%
# Add hypothesis test
{ if(p){
purrr::map2_df(., p_tests,
~ mutate(.x,
p = .y)
)} else {
dplyr::bind_rows(.)
}} %>%
dplyr::select(label, levels, dplyr::everything()) %>%
as.data.frame() %>%
{ if(fit_id){
levels_id = .$levels
drop = levels_id %in% c("Mean (SE)", "Median (IQR)")
levels_id[drop] = ""
mutate(.,
fit_id = paste0(label, levels_id),
index = 1:dim(.)[1])
} else {
.
}} %>%
rm_duplicate_labels()
}
library(finalfit)
dependent = "sport"
# Choose different options to test
explanatory = c("sexe")
explanatory = c("heures.tv")
explanatory = c("sexe", "heures.tv", "etud")
dw %>%
summary_survey(dependent, explanatory, cont = "mean", cont_range = FALSE, p = FALSE, column = FALSE,
digits = c(1, 1, 3, 0))
ff_merge(
summary_survey(dw, dependent, explanatory, fit_id = TRUE),
svyglmmulti(dw, dependent, explanatory, family = "quasibinomial") %>%
fit2df(exp = TRUE, estimate_name = "OR (multivariable)"),
last_merge = TRUE
)
from finalfit.
There are lots of different glm()
options that could be added. Poisson/weights/propensity score as you rightly say. Rather than make things more complicated, ff_merge()
allows any table to be created. Here are some examples of going from a simple logistic regression model, to a weighted model.
Standard approach:
explanatory = c("age.factor", "age", "sex.factor", "nodes", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
finalfit(dependent, explanatory)
Build each column separately approach:
colon_s %>%
summary_factorlist(dependent, explanatory, fit_id=TRUE) %>%
ff_merge(
glmuni(colon_s, dependent, explanatory) %>%
fit2df(estimate_suffix = " (univariable)")
) %>%
ff_merge(
glmmulti(colon_s, dependent, explanatory) %>%
fit2df(estimate_suffix = " (multivariable)")
) %>%
dplyr::select(-fit_id, -index) %>%
dependent_label(colon_s, dependent)
Incorporate more complex model:
colon_s %>%
summary_factorlist(dependent, explanatory, fit_id=TRUE) %>%
ff_merge(
glmuni(colon_s, dependent, explanatory) %>%
fit2df(estimate_suffix = " (univariable)")
) %>%
ff_merge(
glm(ff_formula(dependent, explanatory), data=colon_s,
family = "binomial", weights = NULL) %>%
fit2df(estimate_suffix = " (multivariable)")
) %>%
dplyr::select(-fit_id, -index) %>%
dependent_label(colon_s, dependent)
Let me know what you think.
from finalfit.
That's great for multivariate models, thanks!
What do you think about weighted univariate stats? Would that require using different functions within the glmuni() function?
from finalfit.
lmuni()
, lmmulti()
, glmuni()
and glmmulti()
now supported weights.
explanatory = c("age.factor", "age", "sex.factor", "nodes", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
wgts = runif(dim(colon_s)[1], 0, 1)
library(finalfit)
colon_s %>%
summary_factorlist(dependent, explanatory, fit_id=TRUE) %>%
ff_merge(
glmuni(colon_s, dependent, explanatory, weights = wgts) %>%
fit2df(estimate_suffix = " (univariable)")
) %>%
ff_merge(
glm(ff_formula(dependent, explanatory), data=colon_s,
family = "binomial", weights = wgts) %>%
fit2df(estimate_suffix = " (multivariable)")
) %>%
dplyr::select(-fit_id, -index) %>%
dependent_label(colon_s, dependent)
from finalfit.
from finalfit.
Thanks.
Could you provide an example of what you suggest.
from finalfit.
from finalfit.
Hi,
I don't know too much about this area. I've added these functions and would appreciate if you could let me know if this is what you were thinking of. Also, are the examples correct.
Many thanks for your help with this.
Ewen
library(dplyr)
library(survey)
# Examples from survey::svyglm() help page
data(api)
# Label data frame
apistrat = apistrat %>%
mutate(
api00 = ff_label(api00, "API in 2000 (api00)"),
ell = ff_label(ell, "English language learners (percent)(ell)"),
meals = ff_label(meals, "Meals eligible (percent)(meals)"),
mobility = ff_label(mobility, "First year at the school (percent)(mobility)"),
sch.wide = ff_label(sch.wide, "School-wide target met (sch.wide)")
)
# Linear example
dependent = "api00"
explanatory = c("ell", "meals", "mobility")
# Stratified design
dstrat = svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
# Univariable fit
fit_uni = dstrat %>%
svyglmuni(dependent, explanatory) %>%
fit2df(estimate_suffix = " (univariable)")
# Multivariable fit
fit_multi = dstrat %>%
svyglmmulti(dependent, explanatory) %>%
fit2df(estimate_suffix = " (multivariable)")
# Pipe together
apistrat %>%
summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
ff_merge(fit_uni) %>%
ff_merge(fit_multi) %>%
select(-fit_id, -index) %>%
dependent_label(apistrat, dependent)
# Dependent: API in 2000 (api00) Mean (sd) Coefficient (univariable) Coefficient (multivariable)
# English language learners (percent)(ell) [0,84] 652.8 (121.0) -3.73 (-4.35--3.11, p<0.001) -0.48 (-1.25-0.29, p=0.222)
# Meals eligible (percent)(meals) [0,100] 652.8 (121.0) -3.38 (-3.71--3.05, p<0.001) -3.14 (-3.70--2.59, p<0.001)
# First year at the school (percent)(mobility) [1,99] 652.8 (121.0) -1.43 (-3.30-0.44, p=0.137) 0.23 (-0.54-1.00, p=0.567)
# Binomial example
## Note model family needs specified and exponentiation if desired
dependent = "sch.wide"
explanatory = c("ell", "meals", "mobility")
# Univariable fit
fit_uni = dstrat %>%
svyglmuni(dependent, explanatory, family = "quasibinomial") %>%
fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (univariable)")
# Multivariable fit
fit_multi = dstrat %>%
svyglmmulti(dependent, explanatory, family = "quasibinomial") %>%
fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (multivariable)")
# Pipe together
apistrat %>%
summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
ff_merge(fit_uni) %>%
ff_merge(fit_multi) %>%
select(-fit_id, -index) %>%
dependent_label(apistrat, dependent)
# Dependent: School-wide target met (sch.wide) No Yes OR (univariable) OR (multivariable)
# English language learners (percent)(ell) Mean (SD) 22.5 (19.3) 20.5 (20.0) 1.00 (0.98-1.01, p=0.715) 1.00 (0.97-1.02, p=0.851)
# Meals eligible (percent)(meals) Mean (SD) 46.0 (29.1) 44.7 (29.0) 1.00 (0.99-1.01, p=0.968) 1.00 (0.98-1.01, p=0.732)
# First year at the school (percent)(mobility) Mean (SD) 13.9 (8.6) 17.2 (13.0) 1.06 (1.00-1.12, p=0.049) 1.06 (1.00-1.13, p=0.058)
from finalfit.
That's fantastic, I should have time in the next week to go through this and confirm. I will also test on my own data. I'll respond as soon as I do. Thank you for this great update!
from finalfit.
Thanks. One issue is that the summary statistics from summary_factorlist()
are not weighted. They cannot be weighted using this function. So to be useful it might be necessary to incorporate svymean()
or similar.
What would be useful is to know how you would actually use this. What your final table would actually look like. What variations would be common in practice etc. Best wishes.
from finalfit.
Ewen I just took a look and that looks correct, though like you mention a key feature that would be really nice is the weighted means and SD in the table using svymean()
. The final table would be just like you are demonstrating, except weighted mean/SD.
This is a basic example of what those tables would look like in a paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5624560/pdf/cureus-0009-00000001536.pdf
from finalfit.
Ideally, summary_factorlist() should test if .data is a svydesign object. In such case, all stats could be computed using survey functions. i.e. svytable, svymean, svychisq, etc.
The desired table is the same as with unweighted data. It's just that the computation of them should used survey functions.
from finalfit.
Thanks @larmarange
Is it worth a new function? Would people use it?
from finalfit.
Dear @ewenharrison thank you very much for your feedback. You have done a fantastic job on finalfit
.
I just spent 3 days teaching R to master students and teachers in population studies. They were very impressed by summary_fatorlist()
, finalfit()
and or_plot()
as they are very easy to use.
In population studies, most of them are working with national surveys. Such surveys have complex sampling and/or post-calibration. Almost all of them required to use the survey
package.
When I presented multivariate analysis using survey
, they were very frustrated not to be able to use finalfit
.
I would say it's really worth to extend summary_facorlist()
to be able to deal with a survey design object instead of a data.frame and to use therefore survey
functions (such as svytable()
, svymean()
, svychisq()
...) to generate the table.
Similarly, if summary_factorlist()
is adaped to manage survey.design
class (it's simple to test if an object inherits that class to load a specific function), then it should be quite easy to adapt finalfit()
and or_fit()
as well.
By the way, just for you to know, survey
also provide a svycoxph()
function.
from finalfit.
Ok, well if you could point me to a suitable dataset and provide 2 or 3 example analyses using survey, together with the table output you would like to to see then I can have a look.
Many of the examples in library(survey)
help seem very complex and difficult to generalise.
from finalfit.
Please find below an exemple using a dataset provided by questionr
package.
library(questionr)
data(hdv2003)
library(dplyr)
#>
#> Attachement du package : 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
hdv2003$etud <- recode_factor(
hdv2003$nivetud,
"N'a jamais fait d'etudes" = "Primary",
"A arrete ses etudes, avant la derniere annee d'etudes primaires" = "Primary",
"Derniere annee d'etudes primaires" = "Primary",
"1er cycle" = "Secondary",
"2eme cycle" = "Secondary",
"Enseignement technique ou professionnel court" = "Professionnal",
"Enseignement technique ou professionnel long" = "Professionnal",
"Enseignement superieur y compris technique superieur" = "High"
)
library(survey)
#> Le chargement a nécessité le package : grid
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : survival
#>
#> Attachement du package : 'survey'
#> The following object is masked from 'package:graphics':
#>
#> dotchart
dw <- svydesign(ids = ~ 1, weights = ~ poids, data = hdv2003)
# Note: summary.formula not compatible with svydesign objects
# dependent variable
freq(svytable(~ sport, dw))
#> n % val%
#> Non 6714760 60.7 60.7
#> Oui 4356466 39.3 39.3
# binary explanatory
tab <- svytable(~ sexe + sport, dw)
tab
#> sport
#> sexe Non Oui
#> Homme 2850210 2299173
#> Femme 3864550 2057293
rprop(tab)
#> sport
#> sexe Non Oui Total
#> Homme 55.4 44.6 100.0
#> Femme 65.3 34.7 100.0
#> Ensemble 60.7 39.3 100.0
svychisq(~ sexe + sport, dw)
#>
#> Pearson's X^2: Rao & Scott adjustment
#>
#> data: svychisq(~sexe + sport, dw)
#> F = 12.444, ndf = 1, ddf = 1999, p-value = 0.0004289
# explanotory with more modalities
tab <- svytable(~ etud + sport, dw)
tab
#> sport
#> etud Non Oui
#> Primary 1978619.6 234477.9
#> Secondary 1428124.8 671021.2
#> Professionnal 2050834.3 1315064.1
#> High 1071009.1 1504802.3
rprop(tab)
#> sport
#> etud Non Oui Total
#> Primary 89.4 10.6 100.0
#> Secondary 68.0 32.0 100.0
#> Professionnal 60.9 39.1 100.0
#> High 41.6 58.4 100.0
#> Ensemble 63.7 36.3 100.0
svychisq(~ etud + sport, dw)
#>
#> Pearson's X^2: Rao & Scott adjustment
#>
#> data: svychisq(~etud + sport, dw)
#> F = 45.593, ndf = 2.9831, ddf = 5963.2106, p-value < 2.2e-16
# quantitative explanatory
svyby(~ heures.tv, ~ sport, dw, svymean, na.rm = TRUE) # mean
#> sport heures.tv se
#> Non Non 2.417582 0.06689688
#> Oui Oui 1.785640 0.06573058
svyttest(heures.tv ~ sport, dw)
#>
#> Design-based t-test
#>
#> data: heures.tv ~ sport
#> t = -6.7382, df = 1993, p-value = 2.092e-11
#> alternative hypothesis: true difference in mean is not equal to 0
#> 95 percent confidence interval:
#> -0.8157581 -0.4481261
#> sample estimates:
#> difference in mean
#> -0.6319421
svyby(~ heures.tv, ~ sport, dw, svyquantile, quantiles = 1:3/4, ci = TRUE, na.rm = TRUE) # quartiles
#> Warning in vcov.svyquantile(X[[i]], ...): Only diagonal of vcov() available
#> Warning in vcov.svyquantile(X[[i]], ...): Only diagonal of vcov() available
#> sport 0.25 0.5 0.75 se.0.25 se.0.5 se.0.75
#> Non Non 1 2 3 0.0000000 0.0000000 0.2551067
#> Oui Oui 1 2 3 0.1020427 0.1371243 0.2551067
svyranktest(heures.tv ~ sport, dw)
#>
#> Design-based KruskalWallis test
#>
#> data: heures.tv ~ sport
#> t = -6.1587, df = 1993, p-value = 8.846e-10
#> alternative hypothesis: true difference in mean rank score is not equal to 0
#> sample estimates:
#> difference in mean rank score
#> -0.09905157
# binary regression
reg <- svyglm(sport ~ sexe + etud + heures.tv, family = quasibinomial(), design = dw)
questionr::odds.ratio(reg)
#> OR 2.5 % 97.5 % p
#> (Intercept) 0.18564 0.11762 0.2930 6.902e-13 ***
#> sexeFemme 0.79913 0.61522 1.0380 0.093059 .
#> etudSecondary 3.73818 2.32314 6.0151 6.261e-08 ***
#> etudProfessionnal 4.94649 3.18799 7.6750 1.398e-12 ***
#> etudHigh 10.16897 6.43759 16.0632 < 2.2e-16 ***
#> heures.tv 0.88979 0.81820 0.9677 0.006427 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
broom::tidy(reg, exponentiate = TRUE, conf.int = TRUE)
#> # A tibble: 6 x 7
#> term estimate std.error statistic p.value conf.low conf.high
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 0.186 0.233 -7.23 6.90e-13 0.118 0.293
#> 2 sexeFemme 0.799 0.133 -1.68 9.31e- 2 0.615 1.04
#> 3 etudSecondary 3.74 0.243 5.43 6.26e- 8 2.32 6.02
#> 4 etudProfessionn~ 4.95 0.224 7.13 1.40e-12 3.19 7.67
#> 5 etudHigh 10.2 0.233 9.94 9.76e-23 6.44 16.1
#> 6 heures.tv 0.890 0.0428 -2.73 6.43e- 3 0.818 0.968
from finalfit.
In terms of expected output, the idea would be to be able to do the following :
dep <- "sport"
expl <- c("sexe", "etud", "heures.tv")
library(finalfit)
dw %>% summary_factorlist(dep, expl, p = TRUE)
dw %>% finalfit(dep, expl)
The table will be exactly the same as with a classic data.frame, except that all stats would have been computed using survey
functions, taking into account sample weights and sample design.
Best regards
from finalfit.
Just for information, if you prefer working with dplyr
syntax, there is a package called srvyr
extening dplyr
verbs to survey
objects.
https://cran.r-project.org/web/packages/srvyr/vignettes/srvyr-vs-survey.html
With this package you can do things like:
dw <- as_survey_design(dw) # to add tbl_svy class to dw
dw %>% group_by(sport) %>% summarise(mean = survey_mean(heures.tv, na.rm = TRUE))
Here, the result will be a tibble.
from finalfit.
Thanks a lot. That's great!!!!
Just a small comment: for digits argument, it would be nice to consider a 5th value correspond to the number of digits for the number of observations (for example, to be able to use 0 for the number of observations while keeping 1 for percentage).
Otherwise, everything seems to behave as expected.
Thanks again
from finalfit.
from finalfit.
Thank you very much for this excellent development.
I also needed exactly this functionality.
I noticed that the currently Mean (SE) is used instead of (SD).
Definitely, more often we need SD for reporting together with means.
Unfortunately, svymean package does not provide SD but it can be derived easily from svyvar:
svyvar(~x,design)[1] %>% sqrt()
or
svyby(~x, ~dependent, design, svyvar, na.rm = TRUE)[2] %>% sqrt()
I would be great to have Mean (SD), which is also ensuring consistency with summary_factorlist().
Nobu Nishikiori
from finalfit.
from finalfit.
Thanks both.
Just to confirm, there is not a library(survey)
function that will give both the mean
and the sd
in the same function?
I have re-written the original summary_factorlist()
to remove some old dependencies and incorporate requested features.
We'll get that and this out soon. Thanks again. Ewen
from finalfit.
from finalfit.
I believe so.
Just to confirm, there is not a library(survey) function that will give
both the mean and the sd in the same function?
Thank you very much for incorporating the feature.
Nobu
from finalfit.
Please, I am a newbie to R and I am trying to adapt your "summary_survey(dw, dependent, explanatory, fit_id = TRUE)" solution to my work with below parameters:
surveydes=svydesign(id=~psu, strata=~strata, weights=~pweight, data=data)
dependent="knowledge"
explanatory=c("agegroup", "married", "children")
#knowledge (factor var: knowledgeable vs. not knowledgeable)
#agegroup (factor var: 15-24 vs. 25-34, 35-49)
#married (factor var: never married vs. ever married)
#children (integer: 0, 1, 2, ...,19
surveydes %>%
summary_survey(dependent, explanatory, cont = "mean", cont_range = FALSE, p = FALSE, column = FALSE,
digits = c(1, 1, 3, 0))
Each time I do run my code, it returns:
"Error in select(., explanatory) : unused argument (explanatory)"
I copied and pasted your "summary_survey" source codes in my R environment. But, I am certain there is something I have either misapplied or am not doing right.
Could you please advise?
Thank you,
Elhakim
from finalfit.
Hi, not sure why it is not working for you. Do you have a reproducible example?
from finalfit.
Thanks Joseph for the very useful example.
This probably gets us quite far. There are
summary_factorlist()
features not yet included, most notably, labels. But that would be an easy addition.Would be tidied up and split into component functions prior to being deployed.
Interested in thoughts.
library(questionr) data(hdv2003) library(dplyr) hdv2003$etud <- recode_factor( hdv2003$nivetud, "N'a jamais fait d'etudes" = "Primary", "A arrete ses etudes, avant la derniere annee d'etudes primaires" = "Primary", "Derniere annee d'etudes primaires" = "Primary", "1er cycle" = "Secondary", "2eme cycle" = "Secondary", "Enseignement technique ou professionnel court" = "Professionnal", "Enseignement technique ou professionnel long" = "Professionnal", "Enseignement superieur y compris technique superieur" = "High" ) library(survey) dw <- svydesign(ids = ~ 1, weights = ~ poids, data = hdv2003) summary_survey <- function(.data, dependent, explanatory, cont = "mean", cont_range = FALSE, p = FALSE, column = FALSE, digits = c(1, 1, 3, 1), fit_id = FALSE){ # Define variable type explanatory_type = .data$variables %>% select(explanatory) %>% purrr::map(is.numeric) # Hypothesis test if(p){ p_tests = explanatory %>% purrr::map2(explanatory_type, ~if(!.y){ survey::svychisq(as.formula(paste0("~", ., "+", dependent)), .data)$p.value %>% p_tidy(digits[3]) } else if (.y & cont == "mean"){ survey::svyttest(as.formula(paste0(.x, "~", dependent)), .data)$p.value %>% p_tidy(digits[3]) } else if (.y & cont == "median"){ survey::svyranktest(as.formula(paste0(.x, "~", dependent)), .data)$p.value %>% p_tidy(digits[3]) } ) } # Output table explanatory %>% purrr::map2(explanatory_type, ~ if(!.y){ survey::svytable(as.formula(paste0("~", .x, "+", dependent)), .data) %>% as.data.frame(stringsAsFactors = FALSE) %>% { if(column) { dplyr::group_by(., !! sym(dependent)) %>% dplyr::mutate( total = sum(Freq), prop = 100 * Freq / total ) } else { dplyr::group_by_at(., 1) %>% dplyr::mutate( total = sum(Freq), prop = 100 * Freq / total ) } } %>% dplyr::mutate( value = paste0(Freq %>% round_tidy(digits[4]), " (", prop %>% round_tidy(digits[4]), ")") ) %>% dplyr::select(-total, -prop, -Freq) %>% tidyr::pivot_wider(names_from = !! dependent, values_from = value) %>% dplyr::mutate( label = names(.)[1] ) %>% dplyr::rename(levels = 1) } else { { if(cont == "mean") { survey::svyby(as.formula(paste0("~", .x)), as.formula(paste0("~", dependent)), .data, svymean, na.rm = TRUE) %>% dplyr::mutate( value = paste0(!! sym(.x) %>% round_tidy(digits[1]), " (", se %>% round_tidy(digits[2]), ")") ) %>% dplyr::select(-c(.x, se)) %>% tidyr::pivot_wider(names_from = !! dependent, values_from = value) %>% dplyr::mutate( label = .x, levels = "Mean (SE)" ) } else if(cont == "median"){ survey::svyby(as.formula(paste0("~", .x)), as.formula(paste0("~", dependent)), .data, svyquantile, quantiles = 1:3/4, ci = TRUE, na.rm = TRUE) %>% dplyr::rename(Q1 = 2, Q2 = 3, Q3 = 4) %>% dplyr::mutate( IQR = Q3 - Q1) %>% { if(cont_range){ dplyr::mutate(., value = paste0(Q2 %>% round_tidy(digits[1]), " (", Q1 %>% round_tidy(digits[2]), " to ", Q3 %>% round_tidy(digits[2]), ")") ) } else { dplyr::mutate(., value = paste0(Q2 %>% round_tidy(digits[1]), " (", IQR %>% round_tidy(digits[2]), ")") ) }} %>% dplyr::select(-c(2:8)) %>% tidyr::pivot_wider(names_from = !! dependent, values_from = value) %>% dplyr::mutate( label = .x, levels = "Median (IQR)" ) } } }) %>% # Add hypothesis test { if(p){ purrr::map2_df(., p_tests, ~ mutate(.x, p = .y) )} else { dplyr::bind_rows(.) }} %>% dplyr::select(label, levels, dplyr::everything()) %>% as.data.frame() %>% { if(fit_id){ levels_id = .$levels drop = levels_id %in% c("Mean (SE)", "Median (IQR)") levels_id[drop] = "" mutate(., fit_id = paste0(label, levels_id), index = 1:dim(.)[1]) } else { . }} %>% rm_duplicate_labels() } library(finalfit) dependent = "sport" # Choose different options to test explanatory = c("sexe") explanatory = c("heures.tv") explanatory = c("sexe", "heures.tv", "etud") dw %>% summary_survey(dependent, explanatory, cont = "mean", cont_range = FALSE, p = FALSE, column = FALSE, digits = c(1, 1, 3, 0)) ff_merge( summary_survey(dw, dependent, explanatory, fit_id = TRUE), svyglmmulti(dw, dependent, explanatory, family = "quasibinomial") %>% fit2df(exp = TRUE, estimate_name = "OR (multivariable)"), last_merge = TRUE )
Many thanks for "summary_survey", it is a very useful function. Please I would like to have the row/col total in the output table.
from finalfit.
Many thanks for "summary_survey", it is a very useful function. Please I would like to have the row/col total in the output table.
from finalfit.
Many thanks for writing.
Yes, this is not yet implemented.
I guess you would need an observed total as well as a weighted total?
Best wishes,
Ewen
from finalfit.
@ewenharrison thanks for the reply
Yes, it would be great to have both totals if possible.
from finalfit.
Do you have published examples of how these tables are normally presented?
from finalfit.
label | levels | X1 | X2 | Total |
---|---|---|---|---|
Sexe | Féminin | 761.6 (77.1) | 23242.2 (85.7) | 24003.8(81.4) |
Masculin | 226.6 (22.9) | 3881.9 (14.3) | 4108.5 (18.6) | |
Total | 988.2(100.0) | 27124.1(100.0) | 28112.3(100.0) |
from finalfit.
@ewenharrison please I think the weighted total will be enough.
Since the observed total could be obtained from the summary_factorlist function
from finalfit.
Hi, thanks for emailing about this again.
Sorry if I misled, there is no plan for me to develop this function at the moment.
I guess there are a lot of people out there doing this sort of things - can't you find anything?
If you simply want the totals from the weighted analysis, could you not do this?
library(questionr)
library(tidyverse)
library(finalfit)
library(survey)
data(hdv2003)
dw <- svydesign(ids = ~ 1, weights = ~ poids, data = hdv2003)
dependent = "sport"
explanatory = c("sexe")
dw %>%
summary_survey(dependent, explanatory) %>%
bind_cols(Total = svytotal(as.formula(paste0("~", explanatory)), dw))
label levels Non Oui Total
1 sexe Homme 2850209.9 (55.4) 2299172.5 (44.6) 5149382
2 Femme 3864550.4 (65.3) 2057293.5 (34.7) 5921844
from finalfit.
Related Issues (20)
- ff_remove_ref for summaryfactorlist HOT 4
- finalfit results in table with "Coefficient" instead of "OR" HOT 3
- Why the box size is different when using hr_plot to draw the figure, how to adjust it? HOT 2
- inverse OR in finalfit in R HOT 6
- finalfit does not show median, only mean HOT 1
- inverse OR in finalfit in R continuation closed event
- Fine-gray model visualization HOT 2
- Extending cont_cut to or_plot HOT 1
- Adjust axes for coef plots? HOT 1
- Multinomial Logistic Regression HOT 4
- About modifying the hr_plot function HOT 3
- Error with or_plot HOT 2
- `fct_explicit_na()` was deprecated in forcats 1.0.0 HOT 1
- missing_compare - discrepancy in the manual (Kruskal-Wallis) HOT 1
- `Namespace in Imports field not imported from: ‘tidyselect’` HOT 1
- The output from the stratified models example does not align with the results of the stratified Cox model. HOT 4
- Table format of the finalfit output only gives HTML format script HOT 4
- Potential bugs finalfit command HOT 1
- Finalfit package: Descriptive statistics query for p values and getting Median as statistics HOT 1
- p>0.05, but 95% confidence interval for the odds ratio DOES NOT include 1. How can I check and standardise the methods used to calculate the p-values and confidence intervals? HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from finalfit.