leeper / cregg Goto Github PK

View Code? Open in Web Editor NEW

49.0 6.0 7.0 5.44 MB

Simple Conjoint Analyses, Tidying, and Visualization

License: Other

Makefile 0.35% R 99.65%

conjoint-analysis amce conjoint r

cregg's People

Contributors

Stargazers

Watchers

Forkers

jasonyaopku mbarnfield emitrokhina mahdlab alejandrofernandezroldan ayumu-tanaka tuantx7110

cregg's Issues

Feature header style

This is not a bug report - but rather a clarification question. Any advice as to how to remove parentheses around feature headers in plots? Thanks a lot for your work on this!

Clarify legend title in amce_by_reference() plotting

Plot defaults to saying "BY" - this should be "Reference" or something

Add examples for partitioning constrained designs

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

When designs are constrained, AMCEs for the constrained features are only defined over subsets of the other features against which they are constrained. It would be nice to have an example (e.g., using the immigration dataset) that shows how to clearly partition based on the constraints.

Error with weights and marginal means

Thanks for a great package.

I'm hitting an error when using weights. The traceback info is below. The MWE runs without a weights argument. Any suggestions would be welcome.

8: getElement(x, "call")
7: getCall.default(object)
6: getCall(object)
5: update.default(weights, 0 ~ .)
4: update(weights, 0 ~ .)
3: all.vars(update(weights, 0 ~ .))
2: mm(data = data, formula = formula, id = id, weights = weights, 
       feature_order = feature_order, feature_labels = feature_labels, 
       level_order = level_order, ...)
1: cj(data = test, formula = dv ~ IV1 + IV2, weights = "wts", estimate = "mm")

## load package
library("cregg")
  test.formula <- dv ~ IV1 + IV2
  test <- data.frame(dv = sample(c(0, 1), 1000, replace = TRUE), 
                        ID = rep(1:500, 2), 
                        IV1 = sample(letters[1:3], 1000, replace = TRUE), 
                        IV2 = sample(letters[4:6], 1000, replace = TRUE),
                        wts = runif(1000, 0, 2))
  mmtest <- cj(test, test.formula, id = ~ID, estimate = "mm", 
                  weights = "wts")




## session info for your system
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cregg_0.3.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       survey_3.35-1    rstudioapi_0.10  magrittr_1.5     splines_3.5.3    tidyselect_0.2.5
 [7] munsell_0.5.0    colorspace_1.4-1 lattice_0.20-38  R6_2.4.0         rlang_0.4.0      plyr_1.8.4      
[13] dplyr_0.8.0.1    tools_3.5.3      grid_3.5.3       ggstance_0.3.1   gtable_0.3.0     survival_2.43-3 
[19] yaml_2.2.0       lazyeval_0.2.2   assertthat_0.2.1 lmtest_0.9-36    tibble_2.1.1     crayon_1.3.4    
[25] Matrix_1.2-15    purrr_0.3.2      ggplot2_3.1.0    glue_1.3.1       sandwich_2.5-0   compiler_3.5.3  
[31] pillar_1.3.1     scales_1.0.0     pkgconfig_2.0.2  zoo_1.8-5

feature_headers conflicts with facetting

The feature_headers code in plot.cj_...() methods modifies x filling in all other variables in the dataset with NA. This messes with facetting by adding an (undesirable) NA facet.

mm() variance clarity

Is it sufficiently clear that mm() returns domain estimates rather than SEs based on subsetting the data?

x <- structure(list(level = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,  2L), .Label = c("John", "Kate"), class = "factor"), outcome = c(0L,  0L, 1L, 1L, 0L, 0L, 1L, 1L), weight = c(1L, 1L, 1L, 1L, 0L, 0L,  0L, 0L)), row.names = 1:8, class = "data.frame")

# what people might be expecting
with(subset(x, level == "John"), sqrt(sum((outcome - mean(outcome))^2)/3/4))
svymean(~outcome, svydesign(ids = ~1, weights = ~ 1, data = subset(x, level == "John")))

# what is actually returned (all are equivalent)
## mm()
mm(x, outcome ~ level)

## unweighted data, subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ 1, data = x), level == "John"))

## weighted data (Kate weight == 0), subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ weight, data = x), level == "John"))

## weighted data (Kate weight == 0), full data frame
svymean(~outcome, svydesign(ids = ~1, weights = ~ weight, data = x))

[ ] Document this better, pointing to vignette: https://cran.r-project.org/web/packages/survey/vignettes/domain.pdf
[ ] Add option to not calculate variances as if subsets are random samples of population?

AMCEs in constrained designs

In fully randomized, unconstrained designs, the AMCE = AME of each factor level. In constrained designs, the AMCE is the weighted average of AMEs for each possible combination (in their examples, weights are intended display probabilities).

Really, this statistic doesn't make any sense but it is what Hainmueller et al. propose. It's probably better to just present multiple AMEs for each feature across the levels of other features against which it is constrained. But, that's also difficult.

Add support for svyciprop() CIs for MMs

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Currently, CIs are from svymean() but might be better from svyciprop() when the outcome is binary. We can make this optional but it doesn't seem to work with svyby() so may require refactoring.

`facet_wrap` produces a NA panel

My issue is about package functionality: when using facet_wrap I get a panel for NA values. However, the gg object (which the function plot() plots does not contain any NAs. I can see you run into the same issue. From your website, and using the same coded provided (see below), a NA panel is produced.

## load package
library("cregg")

## plot with facetting
plot(stacked) + ggplot2::facet_wrap(~ contest_no, nrow = 1L)


## session info for your system
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Helsinki
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gtable_0.3.4     ggplot2_3.4.3    cregg_0.4.0      vtable_1.4.6     kableExtra_1.3.4
[6] dplyr_1.1.3      pacman_0.5.1    

loaded via a namespace (and not attached):
 [1] sandwich_3.0-2    utf8_1.2.3        generics_0.1.3    xml2_1.3.5        stringi_1.7.12   
 [6] lattice_0.21-9    digest_0.6.33     magrittr_2.0.3    evaluate_0.22     fastmap_1.1.1    
[11] plyr_1.8.9        Matrix_1.6-1.1    DBI_1.1.3         survival_3.5-7    httr_1.4.7       
[16] rvest_1.0.3       fansi_1.0.5       viridisLite_0.4.2 scales_1.2.1      cli_3.6.1        
[21] crayon_1.5.2      mitools_2.4       rlang_1.1.1       cowplot_1.1.1     munsell_0.5.0    
[26] splines_4.3.2     withr_2.5.1       tools_4.3.2       colorspace_2.1-0  webshot_0.5.5    
[31] vctrs_0.6.3       R6_2.5.1          zoo_1.8-12        lifecycle_1.0.3   stringr_1.5.0    
[36] ggstance_0.3.6    pkgconfig_2.0.3   pillar_1.9.0      Rcpp_1.0.11       glue_1.6.2       
[41] systemfonts_1.0.4 lmtest_0.9-40     xfun_0.40         tibble_3.2.1      tidyselect_1.2.0 
[46] rstudioapi_0.15.0 knitr_1.44        farver_2.1.1      htmltools_0.5.6.1 survey_4.2-1     
[51] labeling_0.4.3    rmarkdown_2.25    svglite_2.1.2     compiler_4.3.2

Why is it possible to get MMs but not AMCEs (same data set), "Error in svydesign.default(ids = id, weights = ~1, data = data) : Design has only one primary sampling unit"

I wonder if this is a possible bug, or just an error message that is not 100% user friendly.

Basically the MM is all fine, and the AMCE is not produced at all.

Do you need the reproducible example?

Put your code here:

## load package
library("cregg")

> all_mm <- mm(
+   datacj,
+   choice0 ~ location + reimburse + purchase + price + provider,
+   id = "userid",
+   weights = NULL,
+   feature_order = NULL,
+   feature_labels = my_labels,
+   level_order = c("descending"),
+ )
Warning messages:
1: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.
2: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.
3: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.
4: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.
5: In logLik.svyglm(x) : svyglm not fitted by maximum likelihood.


> amce(
+   datacj,
+   choice0 ~ location + reimburse + purchase + price + provider,
+   id = "userid",
+   weights = NULL,
+   feature_order = NULL,
+   feature_labels = my_labels,
+   level_order = c("descending")
+ )
Error in svydesign.default(ids = id, weights = ~1, data = data) : 
  Design has only one primary sampling unit
> traceback
function (x = NULL, max.lines = getOption("traceback.max.lines", 
    getOption("deparse.max.lines", -1L))) 
{
    n <- length(x <- .traceback(x, max.lines = max.lines))
    if (n == 0L) 
        cat(gettext("No traceback available"), "\n")
    else {
        for (i in 1L:n) {
            xi <- x[[i]]
            label <- paste0(n - i + 1L, ": ")
            m <- length(xi)
            srcloc <- if (!is.null(srcref <- attr(xi, "srcref"))) {
                srcfile <- attr(srcref, "srcfile")
                paste0(" at ", basename(srcfile$filename), "#", 
                  srcref[1L])
            }
            if (isTRUE(attr(xi, "truncated"))) {
                xi <- c(xi, " ...")
                m <- length(xi)
            }
            if (!is.null(srcloc)) {
                xi[m] <- paste0(xi[m], srcloc)
            }
            if (m > 1) 
                label <- c(label, rep(substr("          ", 1L, 
                  nchar(label, type = "w")), m - 1L))
            cat(paste0(label, xi), sep = "\n")
        }
    }
    invisible(x)
}
<bytecode: 0x12a605e78>
<environment: namespace:base>


## session info for your system

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.1       Hmisc_4.7-1         Formula_1.2-4       lattice_0.20-44     labelled_2.10.0     tikzDevice_0.12.3.1 gridExtra_2.3      
 [8] readstata13_0.10.0  miceadds_3.15-21    mice_3.15.0         haven_2.4.3         dplyr_1.0.7         cregg_0.4.0         cjoint_2.1.0       
[15] survey_4.1-1        survival_3.2-11     Matrix_1.5-1        ggplot2_3.3.5       lmtest_0.9-38       zoo_1.8-9           sandwich_3.0-1     
[22] tidyr_1.1.4         pacman_0.5.1       

loaded via a namespace (and not attached):
 [1] splines_4.1.1       shiny_1.7.1         assertthat_0.2.1    latticeExtra_0.6-30 ggstance_0.3.5      pillar_1.6.3        backports_1.2.1    
 [8] glue_1.4.2          digest_0.6.28       RColorBrewer_1.1-2  promises_1.2.0.1    checkmate_2.1.0     colorspace_2.0-2    cowplot_1.1.1      
[15] htmltools_0.5.2     httpuv_1.6.3        plyr_1.8.6          pkgconfig_2.0.3     broom_0.7.9         purrr_0.3.4         xtable_1.8-4       
[22] scales_1.1.1        jpeg_0.1-9          later_1.3.0         tibble_3.1.5        htmlTable_2.4.1     generics_0.1.0      farver_2.1.0       
[29] ellipsis_0.3.2      withr_2.4.2         nnet_7.3-16         cli_3.0.1           magrittr_2.0.1      crayon_1.4.1        mime_0.12          
[36] deldir_1.0-6        fansi_0.5.0         foreign_0.8-81      data.table_1.14.2   tools_4.1.1         hms_1.1.1           mitools_2.4        
[43] lifecycle_1.0.1     stringr_1.4.0       interp_1.1-3        munsell_0.5.0       cluster_2.1.2       compiler_4.1.1      rlang_0.4.11       
[50] rstudioapi_0.13     htmlwidgets_1.5.4   filehash_2.4-3      base64enc_0.1-3     labeling_0.4.2      gtable_0.3.0        DBI_1.1.1          
[57] R6_2.5.1            knitr_1.36          fastmap_1.1.0       utf8_1.2.2          stringi_1.7.5       Rcpp_1.0.7          vctrs_0.3.8        
[64] rpart_4.1-15        png_0.1-7           xfun_0.26           tidyselect_1.1.1

Support ACIE

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

We could do this. It's low priority for me but would gladly accept help.

Change point shapes in plot of AMCEs or MMs

Please specify whether your issue is about:

[x ] a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Hi Thomas: As mentioned, I'm having trouble manually changing the shapes of points when plotting marginal means or AMCEs. Note that using scale_colour_manual() works but scale_shape_manual() is ignored.

library("cregg")

data("immigration")

mm_by <- cj(immigration, ChosenImmigrant ~ Gender + Education + LanguageSkills, id = ~CaseID, estimate = "mm", by = ~contest_no)

plot(mm_by, group = "contest_no", vline = 0.5) + scale_shape_manual(values=c(1, 2, 3, 4, 5)) + scale_colour_manual(values=c("red", "blue", "green", "purple", "black"))

Additional diagnostics

Possibilities:

Sparsity or combinations of feature levels
Carryover assumption test (subset analysis by task round)
Balance testing (Hainmueller et al. 2014, 25)
Left/right preference test
Reference category sensitivity (a72d8c0)

Add test suite

cj(..., by = ) operations produce faulty results when by factor has blank levels

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Reported by Farsan Ghassim via email:

## load package
library("cregg")

## code goes here
iris$foo <- factor(c("a", "b", ""))
cregg::cj(iris, Sepal.Length ~ Species, estimate = "mm", by = ~ foo)
##   BY      outcome statistic feature      level estimate  std.error        z p    lower    upper foo
## 1    Sepal.Length        mm Species versicolor 6.023529 0.13406704 44.92923 0 5.760763 6.286296   a
## 2    Sepal.Length        mm Species     setosa 4.950000 0.09657727 51.25430 0 4.760712 5.139288   a
## 3    Sepal.Length        mm Species  virginica 6.570588 0.15190449 43.25473 0 6.272861 6.868316   a
## 4  a Sepal.Length        mm Species  virginica 6.756250 0.14368309 47.02189 0 6.474636 7.037864   b
## 5  a Sepal.Length        mm Species     setosa 5.052941 0.09286290 54.41291 0 4.870933 5.234949   b
## 6  a Sepal.Length        mm Species versicolor 5.770588 0.12793126 45.10694 0 5.519848 6.021329   b

## session info for your system
sessionInfo()

amce_diffs() needs to account for clustering

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Axis in mm-diff plots

This is a question about package functionality.

I successfully reproduced the immigration example of conditional marginal means. However, also in your documentation, I do not like the x-axis dimensions of the figure.
Does the cregg package allow to change the x-axis range per plot? I would like the ranges to have the same distances, but not the ranges. In this case: 0.3 to 0.7 for the High and Low Ethnocentrism ones and -0.2 to 0.2 for the difference plot.
Options like range free don't seem to work.

Any help on this package is highly appreciated.

Put your code here:

## load package
library("cregg")
data("immigration")
## code goes here
# calculate conditional MMs
mms <- cj(na.omit(immigration), ChosenImmigrant ~ ReasonForApplication + LanguageSkills, id = ~CaseID, estimate = "mm", by = ~ethnosplit)
diff_mms <- cj(na.omit(immigration), ChosenImmigrant ~ ReasonForApplication + LanguageSkills, id = ~CaseID, estimate = "mm_diff", 
    by = ~ethnosplit)
plot(rbind(mms, diff_mms)) + ggplot2::facet_wrap(~BY, ncol = 3L)

## session info for your system
sessionInfo()
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 13.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] stargazer_5.2.3     data.table_1.14.8   gridExtra_2.3       xtable_1.8-4        Hmisc_4.7-1         Formula_1.2-4       survival_3.2-11    
 [8] lattice_0.20-44     cregg_0.4.0         tikzDevice_0.12.3.1 forcats_0.5.1       stringr_1.5.0       dplyr_1.1.1         purrr_1.0.1        
[15] readr_2.1.4         tidyr_1.3.0         tibble_3.2.1        ggplot2_3.3.5       tidyverse_1.3.1     pacman_0.5.1       

loaded via a namespace (and not attached):
 [1] fs_1.6.1            lubridate_1.7.10    RColorBrewer_1.1-2  httr_1.4.2          tools_4.1.1         backports_1.2.1     utf8_1.2.3         
 [8] R6_2.5.1            rpart_4.1-15        DBI_1.1.1           colorspace_2.0-2    nnet_7.3-16         withr_2.5.0         tidyselect_1.2.0   
[15] compiler_4.1.1      cli_3.6.1           rvest_1.0.1         htmlTable_2.4.1     xml2_1.3.3          sandwich_3.0-1      labeling_0.4.2     
[22] scales_1.2.1        checkmate_2.1.0     lmtest_0.9-38       digest_0.6.28       foreign_0.8-81      base64enc_0.1-3     jpeg_0.1-9         
[29] pkgconfig_2.0.3     htmltools_0.5.5     dbplyr_2.1.1        fastmap_1.1.0       htmlwidgets_1.6.2   rlang_1.1.0         readxl_1.4.1       
[36] rstudioapi_0.13     farver_2.1.0        generics_0.1.3      zoo_1.8-9           jsonlite_1.7.2      magrittr_2.0.3      interp_1.1-3       
[43] Matrix_1.5-1        Rcpp_1.0.7          munsell_0.5.0       fansi_1.0.4         lifecycle_1.0.3     stringi_1.7.12      plyr_1.8.6         
[50] ggstance_0.3.5      grid_4.1.1          crayon_1.5.2        deldir_1.0-6        haven_2.5.2         splines_4.1.1       hms_1.1.3          
[57] knitr_1.42          pillar_1.9.0        reprex_2.0.1        glue_1.6.2          mitools_2.4         latticeExtra_0.6-30 modelr_0.1.8       
[64] png_0.1-7           vctrs_0.6.1         tzdb_0.3.0          cellranger_1.1.0    gtable_0.3.0        assertthat_0.2.1    xfun_0.38          
[71] broom_1.0.4         survey_4.1-1        filehash_2.4-3      cluster_2.1.2

Add argument to control level order within features

Add an argument level_order (or similar) that regulates whether feature levels are presented in final output in ascending or descending order within each feature. That will be reflected graphically later on with the base level either at the top or bottom of each feature.

Subgroup analysis on a factor with levels identical to feature levels produces wrong estimates

Moved from @m-jankowski at #13:

This is also a problem when using mm_diffs().

In my case, I wanted to conduct a subgroup analysis conditional on the gender of the respondents (labeled as "Male" or "Female"). The conjoint experiment, however, also contained levels with these labels ("Male" and "Female"). Particularly problematic is that mm_diffs() did not throw an error message, but returned wrong estimates without any warnings.

Here is an artificial example using the immigration data:

# Data
data("immigration")

# Create subgroups
immigration$ethnosplit <- cut(immigration$ethnocentrism, 2)

# Rename subgroup levels
immigration$subgroup <- as.factor(ifelse(as.numeric(immigration$ethnosplit) == 1, 
                                          "Female", 
                                          "Male"))

# Estimate correct MMs by subgroup
mm_correct <- cj(na.omit(immigration),
                 ChosenImmigrant ~ Gender + Education + LanguageSkills,
                 estimate = "mm",
                 id = ~ CaseID, 
                 by = ~ ethnosplit)

plot(mm_correct,
     group = "ethnosplit",
     vline = 0.5)

# Differences between subgroups

mmdiff_correct <- mm_diffs(na.omit(immigration), 
                   ChosenImmigrant ~ Gender + Education + LanguageSkills,
                   id = ~ CaseID, 
                   by = ~ ethnosplit)

plot(mmdiff_correct)

# Using subgroups with identical level names returns wrong estimates

mmdiff_problem <- mm_diffs(na.omit(immigration),
                   ChosenImmigrant ~ Gender + Education + LanguageSkills,
                   id = ~ CaseID, 
                   by = ~ subgroup)

plot(mmdiff_problem)

Same factor levels in different attributes

Hi Thomas,
I encountered a small problem with cj(): in case you have two levels across any attributes that have the same name, cj() will not work. While there are tweaks around this (of course), it is not very convenient. In my example, we simply have a candidate experiment in which the attributes are priorities and the attribute levels are whether the candidates have or not have these priorities. Hence, the attribute levels are "Is a priority" vs. "Is not a priority" and this is repeated for several attributes.
Could potentially be fixed.
Thanks so much for providing this great package!
Chris

Formally test for difference in effect sizes between attribute levels

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Dear Thomas,

I have a question about how to formally test whether estimates for two attribute levels within one dimension are significantly different from each other. E.g., in the example below, is the estimated effect for a person from Iraq significantly different from the effect size for a person coming from China?

Thank you!

## load package
library("cregg")

## code goes here
data("immigration")
# estimating AMCEs with constraints
x <- amce(immigration, ChosenImmigrant ~ Gender + ReasonForApplication * CountryOfOrigin,
     id = ~CaseID)
plot(x)

## session info for your system
sessionInfo()

Reshaping functionality

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Add a function to reshape a "wide" conjoint dataset into a "long" conjoint dataset, with some flexibility over how that works.

Function to parse feature strings

In my conjoint example for Qualtrics, features levels are stored in a pipe-separated character vector. Would be nice to be able to auto-parse this and add to the original data.

Add functions for power analysis

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Power analysis and sample-size determination could be useful. Some draft code from Hainmueller et al. is here: https://www.dropbox.com/s/xdamgzwl60h1b6w/conjointpower.R?dl=0

Error in model.frame.default(weights, data = list(OUTCOME = c(1, 1, 1, : object is not a matrix

Cannot get past this error code and not sure what the issue is. Tried lots of things to match some of the formatting of the 'immigration' data set, but cant get anything to work. I stacked 4 data.frames exported from read.qualtrics where I added a group variable based off race/ethnic group.

Happy to send the data directly, but don't to post here.

Put your code here:

## load package
library("cregg")

model <- selected ~ Convenience + Crime + Housing.Cost + Neighborhood.Composition +
  Neighborhood.Quality + School.Quality + Type.of.Place

out <- cj(df, model, id = ~ Response.ID, estimate = "mm", by ~ group)


## session info for your system
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Add example of combining features

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

It would be nice to demonstrate that if you want higher-order marginal means, you need to combine features together. So if you have two features with levels (A,B), (C,D), you need to create a new factor variable with levels (AC, AD, BC, BD). This is easily done with interaction().

Subgroup analyses

Rather than having to loop over respondent characteristics or profile constraints, it would be nice to have a function (or feature of existing functions) to do the analysis on a set of specified subgroups.

Add vignettes

Introduction

Variable importance metric?

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Is there a way to get an idea of "variable importance" using the cregg package? For example, one might look at the plotted marginal means and ask me, "Well, which variable had the biggest effect?" I was thinking maybe looking at the variance of the marginal means? By way of an example, in the code below, I added a noise variable to immigration. I would want noise to perform very poorly on this variable importance metric.

Another ludicrous example might be testing profiles of political candidates where we say their party and whether or not they like macaroni and cheese. We'd want some metric that says "Political party matters X times more than whether or not they like macaroni and cheese." Is that possible to derive from the output of cj()? I wasn't sure if there is some canonical way of approaching this.

library(cregg)
library(tidyverse)
data("immigration")

set.seed(1839)
immigration$noise <- factor(sample(letters[1:5], nrow(immigration), TRUE))

f <- ChosenImmigrant ~ Gender + Education +
  LanguageSkills + CountryOfOrigin + Job + JobExperience +
  JobPlans + ReasonForApplication + PriorEntry + noise

res <- cj(immigration, f, id = ~ CaseID)

# maybe?
res %>% 
  group_by(feature) %>% 
  summarise(est_var = var(estimate)) %>% 
  arrange(desc(est_var))

mm_diffs() works with with BY variable having > 2 levels

Need to update docs

cj_tidy does not work with tibbles

I was using the package, and I had an issue with cj_tidy. It took me some time but I found that the problem was with the reshaping function in line 27 of the code. My suggestion is to have a test to see if the dataset is a data.frame or not. Once I find the problem, it worked great!

Add variance tests

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

The test suite currently only tests point estimate accuracy but it should also test variances.

Types of features should be validated.

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

If you are reporting (1) a bug or (2) a question about code, please supply:

Right now package doesn't validate if features are factors. It leads to rather errors like this

## load package
library("cregg")

df <- tibble::tibble(
  choice = c(0, 1), f1 = c("a", "b"), f2 = c("d", "e"), id = c("1", "2")
)

mm(df, choice ~ f1 + f2, id = ~id)

## Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
## In addition: Warning messages:
## 1: Setting row names on a tibble is deprecated. 
## 2: Setting row names on a tibble is deprecated. 
## Called from: fix.by(by.y, y)

That's because make_term_levels_df blindly assumes that features are factors and ends up with a list of NULLs

lapply(df[c("f1", "f2")], levels)
## $f1
## NULL
## 
## $f2
## NULL

The problem affects at least amce, cj and mm.

A quick fix would be to simply test if that's the case:

if(!all(sapply(data[feature_names], is.factor))) stop("All features should be factors")

Alternatively one could attempt to fix the problem by explicitly casting to factors (maybe with a warning).

Color change

Please specify whether your issue is about:

a question about package functionality

See below for these things. I just copied and pasted the example for MM's from the package README.

I was curious if you could change the colors to gray scale, for example, instead of rainbow colors for journal publication? Couldn't figure out how to do this. Thanks!

Example here:

## load package
library("cregg")

data("immigration")

# taken from README
f1 <- ChosenImmigrant ~ Gender + Education + LanguageSkills + CountryOfOrigin + Job + JobExperience + JobPlans + ReasonForApplication + 
    PriorEntry
plot(mm(immigration, f1, id = ~CaseID), vline = 0.5) # change colors here?

## session info for your system
sessionInfo()

# OUTPUT:
### R version 3.5.2 (2018-12-20)
### Platform: x86_64-apple-darwin15.6.0 (64-bit)
### Running under: macOS Mojave 10.14.3

Weird issue with constraint in 'taxes' data

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Something weird going on here:

> amce(taxes, chose_plan ~ taxrate2 * taxrev)
Waiting for profiling to be done...
Error in coefs_tmp[!aliased, ] <- coef_summary : 
  number of items to replace is not a multiple of replacement length
> traceback()
2: get_coef_summary(mod = mod, data = data, id = id, alpha = alpha) at amce.R#135
1: amce(taxes, chose_plan ~ taxrate2 * taxrev)
> cj(taxes, chose_plan ~ taxrate2 * taxrev, estimate = "amce")
Waiting for profiling to be done...
Error in coefs_tmp[!aliased, ] <- coef_summary : 
  number of items to replace is not a multiple of replacement length
> traceback()
3: get_coef_summary(mod = mod, data = data, id = id, alpha = alpha) at amce.R#135
2: amce(data = data, formula = formula, id = id, weights = weights, 
       feature_order = feature_order, feature_labels = feature_labels, 
       level_order = level_order, ...) at cj.R#89
1: cj(taxes, chose_plan ~ taxrate2 * taxrev, estimate = "amce")

Seems to only occur with this feature combination, but not without constraint:

> amce(taxes, chose_plan ~ taxrate2 + taxrev)
Waiting for profiling to be done...
     outcome statistic                      feature       level    estimate   std.error         z            p        lower       upper
1 chose_plan      amce Tax rate for $10,000-$35,000  10-35k: 5%  0.00000000          NA        NA           NA           NA          NA
2 chose_plan      amce Tax rate for $10,000-$35,000 10-35k: 15%  0.02907463 0.008612538  3.375849 7.358823e-04  0.012194365  0.04595489
3 chose_plan      amce Tax rate for $10,000-$35,000 10-35k: 25%  0.01679525 0.010284125  1.633124 1.024430e-01 -0.003361265  0.03695176
4 chose_plan      amce Tax rate for $10,000-$35,000 10-35k: 35% -0.04091056 0.011919199 -3.432324 5.984314e-04 -0.064271757 -0.01754936
5 chose_plan      amce                  Tax revenue        <75%  0.00000000          NA        NA           NA           NA          NA
6 chose_plan      amce                  Tax revenue      75-95% -0.04588052 0.008989190 -5.103965 3.326091e-07 -0.063499005 -0.02826203
7 chose_plan      amce                  Tax revenue     95-105% -0.05998096 0.011307260 -5.304641 1.128948e-07 -0.082142777 -0.03781913
8 chose_plan      amce                  Tax revenue    105-125% -0.07391775 0.011029219 -6.701994 2.055952e-11 -0.095534623 -0.05230088
9 chose_plan      amce                  Tax revenue       >125% -0.12287030 0.012610874 -9.743202 1.972386e-22 -0.147587158 -0.09815344

Doesn't happen with unconstrained features:

> cj(taxes, chose_plan ~ taxrate1 * taxrev, estimate = "amce")
Waiting for profiling to be done...
     outcome statistic               feature     level     estimate   std.error           z            p        lower        upper
1 chose_plan      amce Tax rate for <$10,000  <10k: 0%  0.000000000          NA          NA           NA           NA           NA
2 chose_plan      amce Tax rate for <$10,000  <10k: 5%  0.006844784 0.008657404   0.7906278 4.291612e-01 -0.007395378  0.021084946
3 chose_plan      amce Tax rate for <$10,000 <10k: 15% -0.050383031 0.008578737  -5.8730128 4.279453e-09 -0.064493798 -0.036272265
4 chose_plan      amce Tax rate for <$10,000 <10k: 25% -0.167008614 0.009084111 -18.3846949 1.742172e-75 -0.181950648 -0.152066581
5 chose_plan      amce           Tax revenue      <75%  0.000000000          NA          NA           NA           NA           NA
6 chose_plan      amce           Tax revenue    75-95% -0.002781705 0.009200863  -0.3023309 7.623998e-01 -0.017915778  0.012352367
7 chose_plan      amce           Tax revenue   95-105% -0.011051298 0.010595316  -1.0430362 2.969315e-01 -0.028479042  0.006376446
8 chose_plan      amce           Tax revenue  105-125% -0.034789897 0.009172065  -3.7930278 1.488215e-04 -0.049876602 -0.019703193
9 chose_plan      amce           Tax revenue     >125% -0.072428072 0.010124390  -7.1538211 8.439503e-13 -0.089081211 -0.055774933

Doesn't occur with immigration:

> cj(immigration, ChosenImmigrant ~ Education * Job, id = ~ CaseID)
           outcome statistic                feature               level     estimate  std.error          z            p        lower      upper
1  ChosenImmigrant      amce Educational Attainment           No Formal  0.000000000         NA         NA           NA           NA         NA
2  ChosenImmigrant      amce Educational Attainment           4th Grade  0.032242795 0.01564257  2.0612208 3.928198e-02  0.006513054 0.05797254
3  ChosenImmigrant      amce Educational Attainment           8th Grade  0.055971117 0.01568247  3.5690243 3.583132e-04  0.030175749 0.08176649
4  ChosenImmigrant      amce Educational Attainment         High School  0.117940436 0.01569985  7.5122007 5.814154e-14  0.092116478 0.14376439
5  ChosenImmigrant      amce Educational Attainment    Two-Year College  0.164346505 0.02408703  6.8230296 8.914028e-12  0.124726870 0.20396614
6  ChosenImmigrant      amce Educational Attainment      College Degree  0.191558995 0.02414755  7.9328533 2.141674e-15  0.151839805 0.23127818
7  ChosenImmigrant      amce Educational Attainment     Graduate Degree  0.178682719 0.01774300 10.0706030 7.451979e-24  0.149498080 0.20786736
8  ChosenImmigrant      amce                    Job             Janitor  0.000000000         NA         NA           NA           NA         NA
9  ChosenImmigrant      amce                    Job              Waiter -0.004555776 0.01742414 -0.2614635 7.937351e-01 -0.033215933 0.02410438
10 ChosenImmigrant      amce                    Job Child Care Provider  0.013684131 0.01745927  0.7837744 4.331725e-01 -0.015033819 0.04240208
11 ChosenImmigrant      amce                    Job            Gardener  0.015863618 0.01760330  0.9011731 3.674963e-01 -0.013091228 0.04481846
12 ChosenImmigrant      amce                    Job   Financial Analyst  0.065638780 0.03093098  2.1221049 3.382893e-02  0.014761848 0.11651571
13 ChosenImmigrant      amce                    Job Construction Worker  0.033689581 0.01758459  1.9158585 5.538311e-02  0.004765508 0.06261365
14 ChosenImmigrant      amce                    Job             Teacher  0.074107007 0.01752927  4.2276146 2.361819e-05  0.045273920 0.10294009
15 ChosenImmigrant      amce                    Job Computer Programmer  0.078108991 0.03011503  2.5936881 9.495257e-03  0.028574178 0.12764380
16 ChosenImmigrant      amce                    Job               Nurse  0.089129556 0.01748420  5.0977211 3.437667e-07  0.060370613 0.11788850
17 ChosenImmigrant      amce                    Job  Research Scientist  0.135076380 0.03014113  4.4814640 7.413274e-06  0.085498637 0.18465412
18 ChosenImmigrant      amce                    Job              Doctor  0.166308291 0.03004119  5.5360091 3.094415e-08  0.116894933 0.21572165

What's going on here?

Fix quotes in 'taxes' documentation

dQuote isn't escaped properly in the documentation for the taxes dataset.

1 - 2*alpha confidence intervals returned by mm_diffs()

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

It seems that there is an issue with calculation of confidence intervals in mm_diffs. Instead of returning a 1 - alpha confidence interval, a 1 - 2*alpha confidence interval is currently returned.
In lines 71-72 of mm_diffs(), the confidence intervals are calculated so:

        # CIs
        mm_split[[i]][["lower"]] <- mm_split[[i]][["estimate"]] - (stats::qnorm(1-alpha) * mm_split[[i]][["std.error"]])
        mm_split[[i]][["upper"]] <- mm_split[[i]][["estimate"]] + (stats::qnorm(1-alpha) * mm_split[[i]][["std.error"]])

But the function documentation specifies that alpha is a "A numeric value indicating the significance level at which to calculate confidence intervals for the MMs (by default 0.95, meaning 95-percent CIs are returned)." The default of alpha = 0.05 gives a 90-percent confidence interval as of right now. I'm sure the solution is obvious to you; changing alpha to alpha/2 in both cases above would fix the issue. Or adding + (alpha/2) within qnorm(), which seems to be what is done in mm() and amce_diffs()

This is mainly a problem for subsequent plotting, I think.

Thank you for this package, it's awesome!

Put your code here:

## load package
library("cregg")

data("immigration")
alpha <- .05
# Differences in MMs by Gender feature
temp <- mm_diffs(immigration, 
                 ChosenImmigrant ~ LanguageSkills + Education,
                 ~ Gender, id = ~ CaseID,
                 alpha = alpha)

#confidence interval from mm_diffs()
temp[, c("lower", "upper")]

#95% confidence interval calculated "by hand"
data.frame(lower = temp$estimate - qnorm(1 - alpha/2) * temp$std.error,
           upper = temp$estimate + qnorm(1 - alpha/2) * temp$std.error)


## session info for your system
sessionInfo()

AMEs and AMIEs

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

It could be cool to implement tidy AME (of features not just AMCEs of levels) and AMIE (interactions again between features) estimation via FindIt (see https://amstat.tandfonline.com/doi/full/10.1080/01621459.2018.1476246#.XXJKGi-ZMWo and https://CRAN.R-project.org/package=FindIt). I am going to fork the repo and attempt to write these functions so they're consistent with the rest of cregg, but I understand this might be beyond the intended scope of the package and therefore not merged into the master. Also understand that these aren't yet widely used or recognised estimands. Might take me a while.

Add option to print estimates on figures

Please specify whether your issue is about:

a possible bug
a question about package functionality
a suggested code or documentation change, improvement to the code, or feature request

Currently one would have to add a geom_text() with some messy sprintf() to get numeric values added. This could be an option or the default.

Please make CREGG Compatibility with MICE

Please specify whether your issue is about:

a possible bug
a question about package functionality
[YES] a suggested code or documentation change, improvement to the code, or feature request

I'd like to be able to run cregg on multiple datasets in mice's mira object. OR, I would appreciate help in combining multiple cregg results datasets into a mira object. Ultimately, I'd like to use the pool function on multiple cregg supplied results.

leeper / cregg Goto Github PK

cregg's People

Contributors

Stargazers

Watchers

Forkers

cregg's Issues

Recommend Projects

Recommend Topics

Recommend Org