leeper / cregg Goto Github PK
View Code? Open in Web Editor NEWSimple Conjoint Analyses, Tidying, and Visualization
License: Other
Simple Conjoint Analyses, Tidying, and Visualization
License: Other
This is not a bug report - but rather a clarification question. Any advice as to how to remove parentheses around feature headers in plots? Thanks a lot for your work on this!
Plot defaults to saying "BY" - this should be "Reference" or something
Please specify whether your issue is about:
When designs are constrained, AMCEs for the constrained features are only defined over subsets of the other features against which they are constrained. It would be nice to have an example (e.g., using the immigration dataset) that shows how to clearly partition based on the constraints.
Thanks for a great package.
I'm hitting an error when using weights. The traceback info is below. The MWE runs without a weights argument. Any suggestions would be welcome.
8: getElement(x, "call")
7: getCall.default(object)
6: getCall(object)
5: update.default(weights, 0 ~ .)
4: update(weights, 0 ~ .)
3: all.vars(update(weights, 0 ~ .))
2: mm(data = data, formula = formula, id = id, weights = weights,
feature_order = feature_order, feature_labels = feature_labels,
level_order = level_order, ...)
1: cj(data = test, formula = dv ~ IV1 + IV2, weights = "wts", estimate = "mm")
## load package
library("cregg")
test.formula <- dv ~ IV1 + IV2
test <- data.frame(dv = sample(c(0, 1), 1000, replace = TRUE),
ID = rep(1:500, 2),
IV1 = sample(letters[1:3], 1000, replace = TRUE),
IV2 = sample(letters[4:6], 1000, replace = TRUE),
wts = runif(1000, 0, 2))
mmtest <- cj(test, test.formula, id = ~ID, estimate = "mm",
weights = "wts")
## session info for your system
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cregg_0.3.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 survey_3.35-1 rstudioapi_0.10 magrittr_1.5 splines_3.5.3 tidyselect_0.2.5
[7] munsell_0.5.0 colorspace_1.4-1 lattice_0.20-38 R6_2.4.0 rlang_0.4.0 plyr_1.8.4
[13] dplyr_0.8.0.1 tools_3.5.3 grid_3.5.3 ggstance_0.3.1 gtable_0.3.0 survival_2.43-3
[19] yaml_2.2.0 lazyeval_0.2.2 assertthat_0.2.1 lmtest_0.9-36 tibble_2.1.1 crayon_1.3.4
[25] Matrix_1.2-15 purrr_0.3.2 ggplot2_3.1.0 glue_1.3.1 sandwich_2.5-0 compiler_3.5.3
[31] pillar_1.3.1 scales_1.0.0 pkgconfig_2.0.2 zoo_1.8-5
The feature_headers
code in plot.cj_...()
methods modifies x
filling in all other variables in the dataset with NA
. This messes with facetting by adding an (undesirable) NA
facet.
Is it sufficiently clear that mm() returns domain estimates rather than SEs based on subsetting the data?
x <- structure(list(level = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("John", "Kate"), class = "factor"), outcome = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L), weight = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L)), row.names = 1:8, class = "data.frame")
# what people might be expecting
with(subset(x, level == "John"), sqrt(sum((outcome - mean(outcome))^2)/3/4))
svymean(~outcome, svydesign(ids = ~1, weights = ~ 1, data = subset(x, level == "John")))
# what is actually returned (all are equivalent)
## mm()
mm(x, outcome ~ level)
## unweighted data, subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ 1, data = x), level == "John"))
## weighted data (Kate weight == 0), subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ weight, data = x), level == "John"))
## weighted data (Kate weight == 0), full data frame
svymean(~outcome, svydesign(ids = ~1, weights = ~ weight, data = x))
[ ] Document this better, pointing to vignette: https://cran.r-project.org/web/packages/survey/vignettes/domain.pdf
[ ] Add option to not calculate variances as if subsets are random samples of population?
In fully randomized, unconstrained designs, the AMCE = AME of each factor level. In constrained designs, the AMCE is the weighted average of AMEs for each possible combination (in their examples, weights are intended display probabilities).
Really, this statistic doesn't make any sense but it is what Hainmueller et al. propose. It's probably better to just present multiple AMEs for each feature across the levels of other features against which it is constrained. But, that's also difficult.
Please specify whether your issue is about:
Currently, CIs are from svymean()
but might be better from svyciprop()
when the outcome is binary. We can make this optional but it doesn't seem to work with svyby()
so may require refactoring.
My issue is about package functionality: when using facet_wrap
I get a panel for NA values. However, the gg
object (which the function plot()
plots does not contain any NAs. I can see you run into the same issue. From your website, and using the same coded provided (see below), a NA panel is produced.
## load package
library("cregg")
## plot with facetting
plot(stacked) + ggplot2::facet_wrap(~ contest_no, nrow = 1L)
## session info for your system
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Helsinki
tzcode source: internal
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] gtable_0.3.4 ggplot2_3.4.3 cregg_0.4.0 vtable_1.4.6 kableExtra_1.3.4
[6] dplyr_1.1.3 pacman_0.5.1
loaded via a namespace (and not attached):
[1] sandwich_3.0-2 utf8_1.2.3 generics_0.1.3 xml2_1.3.5 stringi_1.7.12
[6] lattice_0.21-9 digest_0.6.33 magrittr_2.0.3 evaluate_0.22 fastmap_1.1.1
[11] plyr_1.8.9 Matrix_1.6-1.1 DBI_1.1.3 survival_3.5-7 httr_1.4.7
[16] rvest_1.0.3 fansi_1.0.5 viridisLite_0.4.2 scales_1.2.1 cli_3.6.1
[21] crayon_1.5.2 mitools_2.4 rlang_1.1.1 cowplot_1.1.1 munsell_0.5.0
[26] splines_4.3.2 withr_2.5.1 tools_4.3.2 colorspace_2.1-0 webshot_0.5.5
[31] vctrs_0.6.3 R6_2.5.1 zoo_1.8-12 lifecycle_1.0.3 stringr_1.5.0
[36] ggstance_0.3.6 pkgconfig_2.0.3 pillar_1.9.0 Rcpp_1.0.11 glue_1.6.2
[41] systemfonts_1.0.4 lmtest_0.9-40 xfun_0.40 tibble_3.2.1 tidyselect_1.2.0
[46] rstudioapi_0.15.0 knitr_1.44 farver_2.1.1 htmltools_0.5.6.1 survey_4.2-1
[51] labeling_0.4.3 rmarkdown_2.25 svglite_2.1.2 compiler_4.3.2
I wonder if this is a possible bug, or just an error message that is not 100% user friendly.
Basically the MM is all fine, and the AMCE is not produced at all.
Do you need the reproducible example?
Put your code here:
## load package
library("cregg")
> all_mm <- mm(
+ datacj,
+ choice0 ~ location + reimburse + purchase + price + provider,
+ id = "userid",
+ weights = NULL,
+ feature_order = NULL,
+ feature_labels = my_labels,
+ level_order = c("descending"),
+ )
Warning messages:
1: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.
2: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.
3: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.
4: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.
5: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.
> amce(
+ datacj,
+ choice0 ~ location + reimburse + purchase + price + provider,
+ id = "userid",
+ weights = NULL,
+ feature_order = NULL,
+ feature_labels = my_labels,
+ level_order = c("descending")
+ )
Error in svydesign.default(ids = id, weights = ~1, data = data) :
Design has only one primary sampling unit
> traceback
function (x = NULL, max.lines = getOption("traceback.max.lines",
getOption("deparse.max.lines", -1L)))
{
n <- length(x <- .traceback(x, max.lines = max.lines))
if (n == 0L)
cat(gettext("No traceback available"), "\n")
else {
for (i in 1L:n) {
xi <- x[[i]]
label <- paste0(n - i + 1L, ": ")
m <- length(xi)
srcloc <- if (!is.null(srcref <- attr(xi, "srcref"))) {
srcfile <- attr(srcref, "srcfile")
paste0(" at ", basename(srcfile$filename), "#",
srcref[1L])
}
if (isTRUE(attr(xi, "truncated"))) {
xi <- c(xi, " ...")
m <- length(xi)
}
if (!is.null(srcloc)) {
xi[m] <- paste0(xi[m], srcloc)
}
if (m > 1)
label <- c(label, rep(substr(" ", 1L,
nchar(label, type = "w")), m - 1L))
cat(paste0(label, xi), sep = "\n")
}
}
invisible(x)
}
<bytecode: 0x12a605e78>
<environment: namespace:base>
## session info for your system
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.6
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 Hmisc_4.7-1 Formula_1.2-4 lattice_0.20-44 labelled_2.10.0 tikzDevice_0.12.3.1 gridExtra_2.3
[8] readstata13_0.10.0 miceadds_3.15-21 mice_3.15.0 haven_2.4.3 dplyr_1.0.7 cregg_0.4.0 cjoint_2.1.0
[15] survey_4.1-1 survival_3.2-11 Matrix_1.5-1 ggplot2_3.3.5 lmtest_0.9-38 zoo_1.8-9 sandwich_3.0-1
[22] tidyr_1.1.4 pacman_0.5.1
loaded via a namespace (and not attached):
[1] splines_4.1.1 shiny_1.7.1 assertthat_0.2.1 latticeExtra_0.6-30 ggstance_0.3.5 pillar_1.6.3 backports_1.2.1
[8] glue_1.4.2 digest_0.6.28 RColorBrewer_1.1-2 promises_1.2.0.1 checkmate_2.1.0 colorspace_2.0-2 cowplot_1.1.1
[15] htmltools_0.5.2 httpuv_1.6.3 plyr_1.8.6 pkgconfig_2.0.3 broom_0.7.9 purrr_0.3.4 xtable_1.8-4
[22] scales_1.1.1 jpeg_0.1-9 later_1.3.0 tibble_3.1.5 htmlTable_2.4.1 generics_0.1.0 farver_2.1.0
[29] ellipsis_0.3.2 withr_2.4.2 nnet_7.3-16 cli_3.0.1 magrittr_2.0.1 crayon_1.4.1 mime_0.12
[36] deldir_1.0-6 fansi_0.5.0 foreign_0.8-81 data.table_1.14.2 tools_4.1.1 hms_1.1.1 mitools_2.4
[43] lifecycle_1.0.1 stringr_1.4.0 interp_1.1-3 munsell_0.5.0 cluster_2.1.2 compiler_4.1.1 rlang_0.4.11
[50] rstudioapi_0.13 htmlwidgets_1.5.4 filehash_2.4-3 base64enc_0.1-3 labeling_0.4.2 gtable_0.3.0 DBI_1.1.1
[57] R6_2.5.1 knitr_1.36 fastmap_1.1.0 utf8_1.2.2 stringi_1.7.5 Rcpp_1.0.7 vctrs_0.3.8
[64] rpart_4.1-15 png_0.1-7 xfun_0.26 tidyselect_1.1.1
Please specify whether your issue is about:
We could do this. It's low priority for me but would gladly accept help.
Please specify whether your issue is about:
Hi Thomas: As mentioned, I'm having trouble manually changing the shapes of points when plotting marginal means or AMCEs. Note that using scale_colour_manual()
works but scale_shape_manual()
is ignored.
library("cregg")
data("immigration")
mm_by <- cj(immigration, ChosenImmigrant ~ Gender + Education + LanguageSkills, id = ~CaseID, estimate = "mm", by = ~contest_no)
plot(mm_by, group = "contest_no", vline = 0.5) + scale_shape_manual(values=c(1, 2, 3, 4, 5)) + scale_colour_manual(values=c("red", "blue", "green", "purple", "black"))
Possibilities:
Please specify whether your issue is about:
Reported by Farsan Ghassim via email:
## load package
library("cregg")
## code goes here
iris$foo <- factor(c("a", "b", ""))
cregg::cj(iris, Sepal.Length ~ Species, estimate = "mm", by = ~ foo)
## BY outcome statistic feature level estimate std.error z p lower upper foo
## 1 Sepal.Length mm Species versicolor 6.023529 0.13406704 44.92923 0 5.760763 6.286296 a
## 2 Sepal.Length mm Species setosa 4.950000 0.09657727 51.25430 0 4.760712 5.139288 a
## 3 Sepal.Length mm Species virginica 6.570588 0.15190449 43.25473 0 6.272861 6.868316 a
## 4 a Sepal.Length mm Species virginica 6.756250 0.14368309 47.02189 0 6.474636 7.037864 b
## 5 a Sepal.Length mm Species setosa 5.052941 0.09286290 54.41291 0 4.870933 5.234949 b
## 6 a Sepal.Length mm Species versicolor 5.770588 0.12793126 45.10694 0 5.519848 6.021329 b
## session info for your system
sessionInfo()
Please specify whether your issue is about:
This is a question about package functionality.
I successfully reproduced the immigration example of conditional marginal means. However, also in your documentation, I do not like the x-axis dimensions of the figure.
Does the cregg package allow to change the x-axis range per plot? I would like the ranges to have the same distances, but not the ranges. In this case: 0.3 to 0.7 for the High and Low Ethnocentrism ones and -0.2 to 0.2 for the difference plot.
Options like range free don't seem to work.
Any help on this package is highly appreciated.
Put your code here:
## load package
library("cregg")
data("immigration")
## code goes here
# calculate conditional MMs
mms <- cj(na.omit(immigration), ChosenImmigrant ~ ReasonForApplication + LanguageSkills, id = ~CaseID, estimate = "mm", by = ~ethnosplit)
diff_mms <- cj(na.omit(immigration), ChosenImmigrant ~ ReasonForApplication + LanguageSkills, id = ~CaseID, estimate = "mm_diff",
by = ~ethnosplit)
plot(rbind(mms, diff_mms)) + ggplot2::facet_wrap(~BY, ncol = 3L)
## session info for your system
sessionInfo()
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 13.3
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stargazer_5.2.3 data.table_1.14.8 gridExtra_2.3 xtable_1.8-4 Hmisc_4.7-1 Formula_1.2-4 survival_3.2-11
[8] lattice_0.20-44 cregg_0.4.0 tikzDevice_0.12.3.1 forcats_0.5.1 stringr_1.5.0 dplyr_1.1.1 purrr_1.0.1
[15] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.3.5 tidyverse_1.3.1 pacman_0.5.1
loaded via a namespace (and not attached):
[1] fs_1.6.1 lubridate_1.7.10 RColorBrewer_1.1-2 httr_1.4.2 tools_4.1.1 backports_1.2.1 utf8_1.2.3
[8] R6_2.5.1 rpart_4.1-15 DBI_1.1.1 colorspace_2.0-2 nnet_7.3-16 withr_2.5.0 tidyselect_1.2.0
[15] compiler_4.1.1 cli_3.6.1 rvest_1.0.1 htmlTable_2.4.1 xml2_1.3.3 sandwich_3.0-1 labeling_0.4.2
[22] scales_1.2.1 checkmate_2.1.0 lmtest_0.9-38 digest_0.6.28 foreign_0.8-81 base64enc_0.1-3 jpeg_0.1-9
[29] pkgconfig_2.0.3 htmltools_0.5.5 dbplyr_2.1.1 fastmap_1.1.0 htmlwidgets_1.6.2 rlang_1.1.0 readxl_1.4.1
[36] rstudioapi_0.13 farver_2.1.0 generics_0.1.3 zoo_1.8-9 jsonlite_1.7.2 magrittr_2.0.3 interp_1.1-3
[43] Matrix_1.5-1 Rcpp_1.0.7 munsell_0.5.0 fansi_1.0.4 lifecycle_1.0.3 stringi_1.7.12 plyr_1.8.6
[50] ggstance_0.3.5 grid_4.1.1 crayon_1.5.2 deldir_1.0-6 haven_2.5.2 splines_4.1.1 hms_1.1.3
[57] knitr_1.42 pillar_1.9.0 reprex_2.0.1 glue_1.6.2 mitools_2.4 latticeExtra_0.6-30 modelr_0.1.8
[64] png_0.1-7 vctrs_0.6.1 tzdb_0.3.0 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1 xfun_0.38
[71] broom_1.0.4 survey_4.1-1 filehash_2.4-3 cluster_2.1.2
Add an argument level_order
(or similar) that regulates whether feature levels are presented in final output in ascending or descending order within each feature. That will be reflected graphically later on with the base level either at the top or bottom of each feature.
Moved from @m-jankowski at #13:
This is also a problem when using mm_diffs().
In my case, I wanted to conduct a subgroup analysis conditional on the gender of the respondents (labeled as "Male" or "Female"). The conjoint experiment, however, also contained levels with these labels ("Male" and "Female"). Particularly problematic is that mm_diffs() did not throw an error message, but returned wrong estimates without any warnings.
Here is an artificial example using the immigration data:
# Data
data("immigration")
# Create subgroups
immigration$ethnosplit <- cut(immigration$ethnocentrism, 2)
# Rename subgroup levels
immigration$subgroup <- as.factor(ifelse(as.numeric(immigration$ethnosplit) == 1,
"Female",
"Male"))
# Estimate correct MMs by subgroup
mm_correct <- cj(na.omit(immigration),
ChosenImmigrant ~ Gender + Education + LanguageSkills,
estimate = "mm",
id = ~ CaseID,
by = ~ ethnosplit)
plot(mm_correct,
group = "ethnosplit",
vline = 0.5)
# Differences between subgroups
mmdiff_correct <- mm_diffs(na.omit(immigration),
ChosenImmigrant ~ Gender + Education + LanguageSkills,
id = ~ CaseID,
by = ~ ethnosplit)
plot(mmdiff_correct)
# Using subgroups with identical level names returns wrong estimates
mmdiff_problem <- mm_diffs(na.omit(immigration),
ChosenImmigrant ~ Gender + Education + LanguageSkills,
id = ~ CaseID,
by = ~ subgroup)
plot(mmdiff_problem)
Hi Thomas,
I encountered a small problem with cj(): in case you have two levels across any attributes that have the same name, cj() will not work. While there are tweaks around this (of course), it is not very convenient. In my example, we simply have a candidate experiment in which the attributes are priorities and the attribute levels are whether the candidates have or not have these priorities. Hence, the attribute levels are "Is a priority" vs. "Is not a priority" and this is repeated for several attributes.
Could potentially be fixed.
Thanks so much for providing this great package!
Chris
Dear Thomas,
I have a question about how to formally test whether estimates for two attribute levels within one dimension are significantly different from each other. E.g., in the example below, is the estimated effect for a person from Iraq significantly different from the effect size for a person coming from China?
Thank you!
## load package
library("cregg")
## code goes here
data("immigration")
# estimating AMCEs with constraints
x <- amce(immigration, ChosenImmigrant ~ Gender + ReasonForApplication * CountryOfOrigin,
id = ~CaseID)
plot(x)
## session info for your system
sessionInfo()
Please specify whether your issue is about:
Add a function to reshape a "wide" conjoint dataset into a "long" conjoint dataset, with some flexibility over how that works.
In my conjoint example for Qualtrics, features levels are stored in a pipe-separated character vector. Would be nice to be able to auto-parse this and add to the original data.
Please specify whether your issue is about:
Power analysis and sample-size determination could be useful. Some draft code from Hainmueller et al. is here: https://www.dropbox.com/s/xdamgzwl60h1b6w/conjointpower.R?dl=0
Cannot get past this error code and not sure what the issue is. Tried lots of things to match some of the formatting of the 'immigration' data set, but cant get anything to work. I stacked 4 data.frames exported from read.qualtrics where I added a group variable based off race/ethnic group.
Happy to send the data directly, but don't to post here.
Put your code here:
## load package
library("cregg")
model <- selected ~ Convenience + Crime + Housing.Cost + Neighborhood.Composition +
Neighborhood.Quality + School.Quality + Type.of.Place
out <- cj(df, model, id = ~ Response.ID, estimate = "mm", by ~ group)
## session info for your system
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
Please specify whether your issue is about:
It would be nice to demonstrate that if you want higher-order marginal means, you need to combine features together. So if you have two features with levels (A,B), (C,D), you need to create a new factor variable with levels (AC, AD, BC, BD). This is easily done with interaction()
.
Rather than having to loop over respondent characteristics or profile constraints, it would be nice to have a function (or feature of existing functions) to do the analysis on a set of specified subgroups.
Please specify whether your issue is about:
Is there a way to get an idea of "variable importance" using the cregg
package? For example, one might look at the plotted marginal means and ask me, "Well, which variable had the biggest effect?" I was thinking maybe looking at the variance of the marginal means? By way of an example, in the code below, I added a noise variable to immigration
. I would want noise
to perform very poorly on this variable importance metric.
Another ludicrous example might be testing profiles of political candidates where we say their party and whether or not they like macaroni and cheese. We'd want some metric that says "Political party matters X times more than whether or not they like macaroni and cheese." Is that possible to derive from the output of cj()
? I wasn't sure if there is some canonical way of approaching this.
library(cregg)
library(tidyverse)
data("immigration")
set.seed(1839)
immigration$noise <- factor(sample(letters[1:5], nrow(immigration), TRUE))
f <- ChosenImmigrant ~ Gender + Education +
LanguageSkills + CountryOfOrigin + Job + JobExperience +
JobPlans + ReasonForApplication + PriorEntry + noise
res <- cj(immigration, f, id = ~ CaseID)
# maybe?
res %>%
group_by(feature) %>%
summarise(est_var = var(estimate)) %>%
arrange(desc(est_var))
Need to update docs
I was using the package, and I had an issue with cj_tidy. It took me some time but I found that the problem was with the reshaping function in line 27 of the code. My suggestion is to have a test to see if the dataset is a data.frame or not. Once I find the problem, it worked great!
Please specify whether your issue is about:
The test suite currently only tests point estimate accuracy but it should also test variances.
Please specify whether your issue is about:
If you are reporting (1) a bug or (2) a question about code, please supply:
Right now package doesn't validate if features
are factors
. It leads to rather errors like this
## load package
library("cregg")
df <- tibble::tibble(
choice = c(0, 1), f1 = c("a", "b"), f2 = c("d", "e"), id = c("1", "2")
)
mm(df, choice ~ f1 + f2, id = ~id)
## Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
## In addition: Warning messages:
## 1: Setting row names on a tibble is deprecated.
## 2: Setting row names on a tibble is deprecated.
## Called from: fix.by(by.y, y)
That's because make_term_levels_df
blindly assumes that features are factors
and ends up with a list of NULLs
lapply(df[c("f1", "f2")], levels)
## $f1
## NULL
##
## $f2
## NULL
The problem affects at least amce
, cj
and mm
.
A quick fix would be to simply test if that's the case:
if(!all(sapply(data[feature_names], is.factor))) stop("All features should be factors")
Alternatively one could attempt to fix the problem by explicitly casting to factors (maybe with a warning).
Please specify whether your issue is about:
See below for these things. I just copied and pasted the example for MM's from the package README.
I was curious if you could change the colors to gray scale, for example, instead of rainbow colors for journal publication? Couldn't figure out how to do this. Thanks!
Example here:
## load package
library("cregg")
data("immigration")
# taken from README
f1 <- ChosenImmigrant ~ Gender + Education + LanguageSkills + CountryOfOrigin + Job + JobExperience + JobPlans + ReasonForApplication +
PriorEntry
plot(mm(immigration, f1, id = ~CaseID), vline = 0.5) # change colors here?
## session info for your system
sessionInfo()
# OUTPUT:
### R version 3.5.2 (2018-12-20)
### Platform: x86_64-apple-darwin15.6.0 (64-bit)
### Running under: macOS Mojave 10.14.3
Please specify whether your issue is about:
Something weird going on here:
> amce(taxes, chose_plan ~ taxrate2 * taxrev)
Waiting for profiling to be done...
Error in coefs_tmp[!aliased, ] <- coef_summary :
number of items to replace is not a multiple of replacement length
> traceback()
2: get_coef_summary(mod = mod, data = data, id = id, alpha = alpha) at amce.R#135
1: amce(taxes, chose_plan ~ taxrate2 * taxrev)
> cj(taxes, chose_plan ~ taxrate2 * taxrev, estimate = "amce")
Waiting for profiling to be done...
Error in coefs_tmp[!aliased, ] <- coef_summary :
number of items to replace is not a multiple of replacement length
> traceback()
3: get_coef_summary(mod = mod, data = data, id = id, alpha = alpha) at amce.R#135
2: amce(data = data, formula = formula, id = id, weights = weights,
feature_order = feature_order, feature_labels = feature_labels,
level_order = level_order, ...) at cj.R#89
1: cj(taxes, chose_plan ~ taxrate2 * taxrev, estimate = "amce")
Seems to only occur with this feature combination, but not without constraint:
> amce(taxes, chose_plan ~ taxrate2 + taxrev)
Waiting for profiling to be done...
outcome statistic feature level estimate std.error z p lower upper
1 chose_plan amce Tax rate for $10,000-$35,000 10-35k: 5% 0.00000000 NA NA NA NA NA
2 chose_plan amce Tax rate for $10,000-$35,000 10-35k: 15% 0.02907463 0.008612538 3.375849 7.358823e-04 0.012194365 0.04595489
3 chose_plan amce Tax rate for $10,000-$35,000 10-35k: 25% 0.01679525 0.010284125 1.633124 1.024430e-01 -0.003361265 0.03695176
4 chose_plan amce Tax rate for $10,000-$35,000 10-35k: 35% -0.04091056 0.011919199 -3.432324 5.984314e-04 -0.064271757 -0.01754936
5 chose_plan amce Tax revenue <75% 0.00000000 NA NA NA NA NA
6 chose_plan amce Tax revenue 75-95% -0.04588052 0.008989190 -5.103965 3.326091e-07 -0.063499005 -0.02826203
7 chose_plan amce Tax revenue 95-105% -0.05998096 0.011307260 -5.304641 1.128948e-07 -0.082142777 -0.03781913
8 chose_plan amce Tax revenue 105-125% -0.07391775 0.011029219 -6.701994 2.055952e-11 -0.095534623 -0.05230088
9 chose_plan amce Tax revenue >125% -0.12287030 0.012610874 -9.743202 1.972386e-22 -0.147587158 -0.09815344
Doesn't happen with unconstrained features:
> cj(taxes, chose_plan ~ taxrate1 * taxrev, estimate = "amce")
Waiting for profiling to be done...
outcome statistic feature level estimate std.error z p lower upper
1 chose_plan amce Tax rate for <$10,000 <10k: 0% 0.000000000 NA NA NA NA NA
2 chose_plan amce Tax rate for <$10,000 <10k: 5% 0.006844784 0.008657404 0.7906278 4.291612e-01 -0.007395378 0.021084946
3 chose_plan amce Tax rate for <$10,000 <10k: 15% -0.050383031 0.008578737 -5.8730128 4.279453e-09 -0.064493798 -0.036272265
4 chose_plan amce Tax rate for <$10,000 <10k: 25% -0.167008614 0.009084111 -18.3846949 1.742172e-75 -0.181950648 -0.152066581
5 chose_plan amce Tax revenue <75% 0.000000000 NA NA NA NA NA
6 chose_plan amce Tax revenue 75-95% -0.002781705 0.009200863 -0.3023309 7.623998e-01 -0.017915778 0.012352367
7 chose_plan amce Tax revenue 95-105% -0.011051298 0.010595316 -1.0430362 2.969315e-01 -0.028479042 0.006376446
8 chose_plan amce Tax revenue 105-125% -0.034789897 0.009172065 -3.7930278 1.488215e-04 -0.049876602 -0.019703193
9 chose_plan amce Tax revenue >125% -0.072428072 0.010124390 -7.1538211 8.439503e-13 -0.089081211 -0.055774933
Doesn't occur with immigration
:
> cj(immigration, ChosenImmigrant ~ Education * Job, id = ~ CaseID)
outcome statistic feature level estimate std.error z p lower upper
1 ChosenImmigrant amce Educational Attainment No Formal 0.000000000 NA NA NA NA NA
2 ChosenImmigrant amce Educational Attainment 4th Grade 0.032242795 0.01564257 2.0612208 3.928198e-02 0.006513054 0.05797254
3 ChosenImmigrant amce Educational Attainment 8th Grade 0.055971117 0.01568247 3.5690243 3.583132e-04 0.030175749 0.08176649
4 ChosenImmigrant amce Educational Attainment High School 0.117940436 0.01569985 7.5122007 5.814154e-14 0.092116478 0.14376439
5 ChosenImmigrant amce Educational Attainment Two-Year College 0.164346505 0.02408703 6.8230296 8.914028e-12 0.124726870 0.20396614
6 ChosenImmigrant amce Educational Attainment College Degree 0.191558995 0.02414755 7.9328533 2.141674e-15 0.151839805 0.23127818
7 ChosenImmigrant amce Educational Attainment Graduate Degree 0.178682719 0.01774300 10.0706030 7.451979e-24 0.149498080 0.20786736
8 ChosenImmigrant amce Job Janitor 0.000000000 NA NA NA NA NA
9 ChosenImmigrant amce Job Waiter -0.004555776 0.01742414 -0.2614635 7.937351e-01 -0.033215933 0.02410438
10 ChosenImmigrant amce Job Child Care Provider 0.013684131 0.01745927 0.7837744 4.331725e-01 -0.015033819 0.04240208
11 ChosenImmigrant amce Job Gardener 0.015863618 0.01760330 0.9011731 3.674963e-01 -0.013091228 0.04481846
12 ChosenImmigrant amce Job Financial Analyst 0.065638780 0.03093098 2.1221049 3.382893e-02 0.014761848 0.11651571
13 ChosenImmigrant amce Job Construction Worker 0.033689581 0.01758459 1.9158585 5.538311e-02 0.004765508 0.06261365
14 ChosenImmigrant amce Job Teacher 0.074107007 0.01752927 4.2276146 2.361819e-05 0.045273920 0.10294009
15 ChosenImmigrant amce Job Computer Programmer 0.078108991 0.03011503 2.5936881 9.495257e-03 0.028574178 0.12764380
16 ChosenImmigrant amce Job Nurse 0.089129556 0.01748420 5.0977211 3.437667e-07 0.060370613 0.11788850
17 ChosenImmigrant amce Job Research Scientist 0.135076380 0.03014113 4.4814640 7.413274e-06 0.085498637 0.18465412
18 ChosenImmigrant amce Job Doctor 0.166308291 0.03004119 5.5360091 3.094415e-08 0.116894933 0.21572165
What's going on here?
Please specify whether your issue is about:
It seems that there is an issue with calculation of confidence intervals in mm_diffs. Instead of returning a 1 - alpha confidence interval, a 1 - 2*alpha confidence interval is currently returned.
In lines 71-72 of mm_diffs(), the confidence intervals are calculated so:
# CIs
mm_split[[i]][["lower"]] <- mm_split[[i]][["estimate"]] - (stats::qnorm(1-alpha) * mm_split[[i]][["std.error"]])
mm_split[[i]][["upper"]] <- mm_split[[i]][["estimate"]] + (stats::qnorm(1-alpha) * mm_split[[i]][["std.error"]])
But the function documentation specifies that alpha is a "A numeric value indicating the significance level at which to calculate confidence intervals for the MMs (by default 0.95, meaning 95-percent CIs are returned)." The default of alpha = 0.05 gives a 90-percent confidence interval as of right now. I'm sure the solution is obvious to you; changing alpha to alpha/2 in both cases above would fix the issue. Or adding + (alpha/2) within qnorm(), which seems to be what is done in mm() and amce_diffs()
This is mainly a problem for subsequent plotting, I think.
Thank you for this package, it's awesome!
Put your code here:
## load package
library("cregg")
data("immigration")
alpha <- .05
# Differences in MMs by Gender feature
temp <- mm_diffs(immigration,
ChosenImmigrant ~ LanguageSkills + Education,
~ Gender, id = ~ CaseID,
alpha = alpha)
#confidence interval from mm_diffs()
temp[, c("lower", "upper")]
#95% confidence interval calculated "by hand"
data.frame(lower = temp$estimate - qnorm(1 - alpha/2) * temp$std.error,
upper = temp$estimate + qnorm(1 - alpha/2) * temp$std.error)
## session info for your system
sessionInfo()
Please specify whether your issue is about:
It could be cool to implement tidy AME (of features not just AMCEs of levels) and AMIE (interactions again between features) estimation via FindIt
(see https://amstat.tandfonline.com/doi/full/10.1080/01621459.2018.1476246#.XXJKGi-ZMWo and https://CRAN.R-project.org/package=FindIt). I am going to fork the repo and attempt to write these functions so they're consistent with the rest of cregg
, but I understand this might be beyond the intended scope of the package and therefore not merged into the master. Also understand that these aren't yet widely used or recognised estimands. Might take me a while.
Please specify whether your issue is about:
Currently one would have to add a geom_text()
with some messy sprintf()
to get numeric values added. This could be an option or the default.
Please specify whether your issue is about:
I'd like to be able to run cregg
on multiple datasets in mice's mira
object. OR, I would appreciate help in combining multiple cregg
results datasets into a mira
object. Ultimately, I'd like to use the pool
function on multiple cregg
supplied results.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.