
manyecoevo's Introduction

DOI License: GPL v3 Contributor Covenant

ManyEcoEvo

The ManyEcoEvo package provides a suite of functions for:

  • Summarising, analysing and visualising the ManyEcoEvo dataset from Gould et al.1
  • Tidying and cleaning many-analyst style data for further analysis
  • Reproducing the analysis in Gould et al. using your own many-analyst style data

Note that the manuscript source-code for Gould et al. is located in a separate repository at https://github.com/egouldo/ManyAnalysts, which can be viewed at https://egouldo.github.io/ManyAnalysts/.

Installation

ManyEcoEvo can be installed from GitHub using devtools with:

devtools::install_github("egouldo/ManyEcoEvo")

Regenerating the ManyEcoEvo dataset

The data processing and analysis can be freely reproduced with the help of the targets:: package. Please see the documentation at https://docs.ropensci.org/targets/ for further detail.

Should you wish to completely reproduce the dataset generation and analysis in Gould et al., complete the following steps:

  1. Clone or download this repository (https://github.com/egouldo/ManyEcoEvo)
  2. Run renv::restore() to load the packages used in the analysis pipeline locally on your machine (see renv:: for details)
  3. Run targets::tar_destroy() to remove any record and caches of existing targets
  4. Run targets::tar_make() in your console. Depending on the power of your machine, the analysis pipeline will take roughly 2 to 7 minutes to execute (plus or minus some!)
  5. You can view a table of all targets in the pipeline by running targets::tar_meta()
  6. To interact with objects or ‘targets’ within the analysis pipeline, call targets::tar_load() or targets::tar_read():
targets::tar_read("ManyEcoEvo")
# A tibble: 2 × 4
  dataset    data                diversity_data      estimate_type
  <chr>      <list>              <named list>        <chr>        
1 blue tit   <tibble [174 × 38]> <tibble [174 × 54]> Zr           
2 eucalyptus <tibble [128 × 38]> <tibble [128 × 61]> Zr           

The script that generates datasets used in the ManyEcoEvo:: package is located in ManyEcoEvo/data-raw/tar_make.R.

License

This software is licensed under the GNU GPL v3 license.

Contributing

This package is released with a Contributor Code of Conduct.

Citation

To cite package ‘ManyEcoEvo’ in publications use:

  Gould E, Fraser H, Nakagawa S, Parker T (2023). _ManyEcoEvo: Meta-analyse
  data from 'Many-Analysts' style studies_. R package version 1.0.0,
  <https://github.com/egouldo/ManyEcoEvo>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {ManyEcoEvo: Meta-analyse data from 'Many-Analysts' style studies},
    author = {Elliot Gould and Hannah S. Fraser and Shinichi Nakagawa and Timothy H. Parker},
    year = {2023},
    note = {R package version 1.0.0},
    url = {https://github.com/egouldo/ManyEcoEvo},
  }

Footnotes

  1. Gould, E., Fraser, H., Parker, T. et al. (2023). Same data, different analysts: Variation in effect sizes due to analytical decisions in ecology and evolutionary biology [Preprint]. Ecology and Evolutionary Biology. https://doi.org/10.32942/X2GG62

manyecoevo's People

Contributors

egouldo, hannahsfraser, parkerth


manyecoevo's Issues

roxygen example code error fit_metafor_mv

  • checking examples ... ERROR
    Running examples in ‘ManyEcoEvo-Ex.R’ failed
    The error most likely occurred in:

Name: fit_metafor_mv

Title: Fit Multivariate Metaregression using metafoR

Aliases: fit_metafor_mv

** Examples

#TODO -- is this the best way of setting up this fun?? (i.e. to take numeric vectors)?

Example Usage:

library(tidyverse);library(targets);library(metafor) # NOT RUN, TODO: remove after create pkg

source("R/functions.R") #NOT RUN, TODO: remove after create pkg

tar_read(round_2_survey_meta_analysis) %>%
  filter(dataset == "eucalyptus") %>%
  filter(!is.na(Zr),
         !is.na(VZr),
         !is.infinite(Zr),
         !is.infinite(VZr)) %>%
  fit_metafor_mv(estimate = .$Zr, variance = .$VZr, estimate_type = "Zr", data = .)

── Fitting multivariate metaregression ──

Error in match.arg(method) : object 'VZr' not found
Calls: %>% ... fit_metafor_mv -> -> filter -> match.arg
Execution halted

Change processing of prediction data to allow csv files without reported CIs

Previously we required reported CIs from analysts; however, we recently agreed to use the CIs from the simulations. Currently, after addressing egouldo/ManyAnalysts#163, there is one analysis without reported CIs that is throwing an error inside rename_prediction_cols(), which is called inside convert_predictions() within standardise_response(), and then in prepare_response_variables() during the creation of target ManyEcoEvo_yi_results.


Commit ae8be9a fixed the previous error, but this analysis is still causing issues further downstream:

! Error running targets::tar_make()
  Target errors: targets::tar_meta(fields = error, complete_only = TRUE)
  Tips: https://books.ropensci.org/targets/debugging.html
  Last error: Problem while computing `data = purrr::map2(...)`.The error occurred in group 24: response_id = "R_2V7qaLEfdbgUGg3",
  submission_id = 1, analysis_id = 1, split_id = 1.
Caused by error in `dplyr::mutate()`:
! Problem while computing `back_transformed_data = purrr::pmap(...)`.The error occurred in group 24: response_id = "R_2V7qaLEfdbgUGg3",
  submission_id = 1, analysis_id = 1, split_id = 1.
Caused by error in `data.table::setnames()`:
! 'old' is length 4 but 'new' is length 3
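The setnames() failure at the bottom of the trace can be reproduced in isolation. A minimal sketch with made-up column names, not code from the pipeline:

```r
library(data.table)

# Only three of the four expected prediction columns are present:
dt <- data.table(fit = 1:3, se.fit = runif(3), ci.low = 1:3)

# If rename_prediction_cols() builds `old` from the four expected columns
# but derives only three `new` names from the data, setnames() errors:
data.table::setnames(dt,
                     old = c("fit", "se.fit", "ci.low", "ci.hi"),
                     new = c("fit", "std.error", "conf.low"))
# Error: 'old' is length 4 but 'new' is length 3
```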

I suspect this is because its augmented_data is missing the confidence intervals:

Local .Rprofile detected at /Users/egould/Documents/code/ManyAnalysts/.Rprofile

library(targets)
library(tidyverse)
library(ManyAnalysts)

tar_load(ManyEcoEvo_yi)

ManyEcoEvo_yi %>% 
  filter(dataset=="eucalyptus") %>% 
  pluck("data", 1) %>% 
  filter(response_id == "R_2V7qaLEfdbgUGg3") %>% 
  pull("augmented_data")
#> $augmented_data_73780bf7
#> # A tibble: 3 × 4
#>      V1 SurveyID   fit se.fit
#>   <int> <chr>    <dbl>  <dbl>
#> 1     1 Q1       0.451  0.204
#> 2     2 Q2       0.768  0.210
#> 3     3 Q3       0.881  0.265

Created on 2023-03-22 by the reprex package (v2.0.1)

Originally posted by @egouldo in https://github.com/egouldo/ManyAnalysts/issues/163#issuecomment-1478982292

6 missing analyses in master_data cf. list_of_new_prediction_files

Hi Hannah, I found some missing analyses in master_data. Could you take a look and help me figure out what's going on? The response_ids seem to be present, but the exact submission, analysis, and split ids in list_of_new_csv_files don't appear in master_data.

PS you'll need to be on pipeline-anon branch!

missing_analyses <- c("R_CfUlQXg434XEESd", "R_3rIdpCqsQtsmgqT", "R_1LRqq2WHrQaENtM")

NAMESPACEs in Imports field not imported

See https://stackoverflow.com/questions/54039992/namespaces-in-imports-field-not-imported-from-all-declared-imports-should-be-us

In R CMD CHECK log:

* checking dependencies in R code ... WARNING
'::' or ':::' imports not declared from:
  ‘broom.mixed’ ‘cli’ ‘data.table’ ‘forcats’ ‘fs’ ‘ggeffects’
  ‘ggforestplot’ ‘glue’ ‘here’ ‘lme4’ ‘magick’ ‘metafor’ ‘metaviz’
  ‘naniar’ ‘parameters’ ‘parsnip’ ‘performance’ ‘pracma’ ‘progress’
  ‘readr’ ‘readxl’ ‘recipes’ ‘sae’ ‘stringr’ ‘timetk’ ‘workflows’
Namespace in Imports field not imported from: ‘tidyselect’
  All declared Imports should be used.

Code analyses still coded as 'CHECK'

Hi Hannah or @parkerth,

Could you please check the master_data file (at "data-raw/anonymised_data/master_data.csv") and code as necessary, or mark as NA if you can't resolve them? There are a few analyses remaining with the value 'CHECK':

response_id submission_id analysis_id split_id test_variable Bayesian linear_model model_subclass exclusions_effect_analysis Conclusion data_cleaning_preprocessing_tool data_cleaning_preprocessing_version data_analysis_tool data_analysis_version
R_11787O3NmejXKAH 1 2 2 net_rearing_manipulation 0 generalised standard exclude_all CHECK R NA R NA
R_11787O3NmejXKAH 1 2 3 net_rearing_manipulation 0 generalised standard exclude_all CHECK R NA R NA
R_11787O3NmejXKAH 1 2 1 net_rearing_manipulation 0 generalised standard exclude_all CHECK R NA R NA
R_1eXlFKlQdiD2F59 2 2 1 rear_Cs_at_start_of_rearing 0 linear standard retain CHECK CHECK CHECK CHECK CHECK
R_1GJlffAgZv6SY4y 1 1 1 NA CHECK CHECK standard exclude_all none_c R 4.0.0 R 4.0.0
R_1M0cMZL2IPYWHoi 1 1 1 NA CHECK CHECK CHECK exclude_all CHECK CHECK CHECK CHECK CHECK
R_1M0cMZL2IPYWHoi 1 1 1 NA 1 generalised CHECK exclude_all NA R 3.6.3 R 3.6.3
R_1QlnXdW5tKuUQIr 1 1 1 CHECK CHECK CHECK CHECK CHECK neg_q R 3.6.1 R 3.6.1
R_21gmMa0uclrNoTP 2 1 1 NA CHECK linear CHECK exclude_all CHECK CHECK CHECK CHECK CHECK
R_2Pjoz1X4q5XRClO 2 1 1 NA CHECK CHECK standard exclude_all CHECK CHECK CHECK CHECK CHECK
R_2zNKAmJcWbM4QtY 1 1 1 NA 0 CHECK standard retain none_q R 3.6.1 R 3.6.1
R_3EbbZxcQ3gctVZu 1 1 1 NA 0 CHECK hurdle exclude_all neg_q R 3.6.1 R 3.6.1
R_3Kvy0h01LXHWniT 2 2 1 NA 0 generalised standard retain CHECK R NA R NA
R_3nBCE4hMLh5s3qt 3 1 2 NA CHECK CHECK standard NA CHECK CHECK CHECK CHECK CHECK
R_3nBCE4hMLh5s3qt 3 1 1 NA CHECK CHECK standard NA CHECK CHECK CHECK CHECK CHECK
R_AzL6RdNTHtPjxzX 1 1 2 NA CHECK CHECK standard exclude_all CHECK CHECK CHECK CHECK CHECK
R_es2jrrN9CTGwl5D 1 1 1 NA CHECK CHECK CHECK exclude_all neg_q R 4.0.0 R 4.0.0

Constructed Variables are ignored when standardising (no average values in param table)

  • Merge HF's code into the data-raw script osf_load_analyst_datasets.R (this is the data-raw script that creates euc_data and blue_tit_data)
  • Rerun osf_load_analyst_datasets.R so the sysdata objects are re-written with use_data() calls
  • Re-run analysis_data_param_tables.R, which creates the internal object analysis_data_param_tables containing the mean and sd of both BT and EUC datasets for all variables used by all analysts
  • Rebuild the package
  • Rerun targets to incorporate the changes

Update manuscript: summary section

  • Take the first paragraph of the summary section and insert it into the first paragraph of the current 'statement of need' section, before the word "moreover" (and tie it in)
  • Take the third paragraph of the summary section and insert it into paragraph 2 of the 'statement of need', after the phrase "computationally reproduce our analysis"
  • Get rid of the rest of the summary section
  • Rename the 'statement of need' section 'introduction'

Write vignettes

Write vignettes (these don't seem to contain any content: the HTML files say they don't exist, and on GitHub there's only placeholder content).

Ensure proper use of targets within pkg

  • Remove ManyEcoEvo from the pkgs vector
  • Remove all namespacing in targets code
  • May also need to remove any namespacing within functions in R/
  • Declare ManyEcoEvo in the tar_option_set() imports argument
  • Run targets::tar_destroy() and rerun data-raw.R
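The option changes above can be sketched in _targets.R roughly as follows; the argument values are assumptions based on the checklist, not the actual pipeline configuration:

```r
# _targets.R (sketch)
library(targets)

tar_option_set(
  # Track ManyEcoEvo's own functions for invalidation, rather than
  # listing the package as an ordinary dependency:
  imports = "ManyEcoEvo",
  # ManyEcoEvo removed from this vector; remaining packages assumed:
  packages = c("dplyr", "purrr")
)
```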

duplicated id_col

I found an incorrectly duplicated id_col in master_data.csv for two separate analyses from the same submission by the same team, one blue tit and one eucalyptus. One will need to be recoded in the response_id, submission_id, analysis_id, and split_id columns.

See details in reprex below:

library(tidyverse)
library(here)
#> here() starts at /Users/elliotgould/Documents/GitHub/ManyAnalysts
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test
library(ManyEcoEvo)

prepare_df_for_summarising <- function(data){
  data %>% mutate(across(.cols = c(num_fixed_variables,
                                   num_random_variables,
                                   sample_size,
                                   num_interactions,
                                   Bayesian, #NA's coming from CHECK values
                                   mixed_model,
                                   num_fixed_effects,
                                   num_random_effects), 
                         as.numeric),
                  lm = ifelse(linear_model == "linear", 1, 0),
                  glm = ifelse(linear_model == "generalised", 1, 0))
}

Master <- ManyEcoEvo %>% 
  select(data) %>% unnest(everything()) %>% 
  prepare_df_for_summarising() #NAs ok, caused by CHECK vals, not yet using THP's fixes
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(...)`.
#> Caused by warning:
#> ! NAs introduced by coercion

Note that we are getting an unexpected many-to-many relationship here, as per the warning above.

predictions <- read_csv(here::here("ms/predictions_Ids.csv")) %>% #TODO ask HF source
  distinct() %>% 
  left_join(Master, by = c("id_col")) %>% 
  prepare_df_for_summarising()
#> Rows: 258 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): id_col
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

There are duplicate entries for one id_col; let’s identify these analyses:

predictions %>% janitor::get_dupes("id_col") %>% 
  select(id_col, ends_with("_id"), TeamIdentifier) %>% 
  knitr::kable()
id_col response_id submission_id analysis_id split_id TeamIdentifier
Byrock-1-8-1 R_3qfD5ZHHdBbTgk3 1 8 1 Byrock
Byrock-1-8-1 R_3HzSBqQTAmJJ9ye 1 8 1 Byrock

It seems that there are two separate response_id entries for this team; however, they are both coded with the same id_col. Let’s see which columns have duplicated values:

duplicated_variables <- 
  predictions %>% select(-review_data) %>% 
  janitor::get_dupes("id_col") %>% 
  summarise(id_col = unique(id_col), across(-all_of("id_col"), 
                   ~ first(.x) == last(.x))) %>% 
  select(id_col, where(isFALSE))

predictions %>% 
  semi_join(duplicated_variables, by = join_by("id_col")) %>% 
  select(id_col, colnames(duplicated_variables)) %>% 
  knitr::kable()
id_col response_id beta_estimate adjusted_df beta_SE transformation link_function_reported dataset mixed_model response_variable_name response_id_S2 sample_size linear_model exclusions_all Conclusion lm glm
Byrock-1-8-1 R_3qfD5ZHHdBbTgk3 -0.065490 458.3576 0.014100 identity identity blue tit 1 day_14_weight R_3qfD5ZHHdBbTgk3 3720 linear exclude neg_c 1 0
Byrock-1-8-1 R_3HzSBqQTAmJJ9ye -0.028464 345.0000 0.025721 log log eucalyptus 0 euc_sdlgs0_50cm R_3HzSBqQTAmJJ9ye 350 generalised retain neg_q 0 1

OK, there is one analysis each for Eucalyptus and blue tit, so the split_id is coded incorrectly: these are clearly separate analyses.
I can see that this id is also assigned to different response_ids, i.e. from different submissions.
I note that in the file prediction_IDs.csv there are three duplicated entries for this id_col.
We should make sure that there isn’t a third analysis somewhere that is also duplicated in id_col.
It would be helpful to know how Hannah created this dataset.
I also note that for response_id R_3HzSBqQTAmJJ9ye there are three entries in predictions_validations_worksheet.csv belonging to this response_id. So that’s why there are multiple entries in predictions_IDs.csv.
The submission, analysis and split ID columns in that data file are:

  • 1-8-1
  • 2-9-1
  • 3-10-1

The predictions object here is also created from the Master object, or ManyEcoEvo::ManyEcoEvo, which comes from the master_data.csv file. Let’s look at that to see if it’s potentially the source of the problem:
Master %>%  
  filter(TeamIdentifier == "Byrock") %>% 
  select(id_col, dataset, all_of(ends_with("_id"))) %>% 
  distinct() %>% 
  janitor::get_dupes("id_col")
#> # A tibble: 2 × 7
#>   id_col       dupe_count dataset response_id submission_id analysis_id split_id
#>   <chr>             <int> <chr>   <chr>               <dbl>       <dbl>    <dbl>
#> 1 Byrock-1-8-1          2 blue t… R_3qfD5ZHH…             1           8        1
#> 2 Byrock-1-8-1          2 eucaly… R_3HzSBqQT…             1           8        1

Yes, different response_ids for the same id_col for analyses of different datasets.
Let’s check the raw data file. Here’s the reprex output I ran over at ManyEcoEvo:

Local .Rprofile detected at /Users/elliotgould/Documents/GitHub/ManyEcoEvo/.Rprofile

library(targets)
library(tidyverse)

There are no extra prediction file submissions for these analyses, so that’s not a problem.

tar_read(list_of_new_prediction_files) %>% 
  filter(response_id == "R_3qfD5ZHHdBbTgk3" | response_id == "R_3HzSBqQTAmJJ9ye") %>% 
  select(dataset, ends_with("_id"), csv_number) 
#> # A tibble: 0 × 6
#> # ℹ 6 variables: dataset <chr>, response_id <chr>, submission_id <dbl>,
#> #   analysis_id <dbl>, split_id <dbl>, csv_number <dbl>

Let’s check the underlying master_data:

readr::read_csv("data-raw/anonymised_data/master_data.csv") %>% 
  filter(TeamIdentifier == "Byrock") %>% 
  select(id_col, dataset, all_of(ends_with("_id"))) %>% 
  knitr::kable()
#> Rows: 302 Columns: 154
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (135): response_id, id_col, contrast, transformation, link_function_repo...
#> dbl  (16): submission_id, analysis_id, split_id, beta_estimate, adjusted_df,...
#> lgl   (3): Extra-pair_dad_ring, rear_Cs_out, rear_Cs_in
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
id_col dataset response_id submission_id analysis_id split_id hatch_nest_breed_ID rear_nest_breed_ID
Byrock-1-3-1 eucalyptus R_23UKvhBc7D608VO 1 3 1 NA NA
Byrock-5-7-1 eucalyptus R_23UKvhBc7D608VO 5 7 1 NA NA
Byrock-4-6-1 eucalyptus R_23UKvhBc7D608VO 4 6 1 NA NA
Byrock-2-4-1 eucalyptus R_23UKvhBc7D608VO 2 4 1 NA NA
Byrock-3-5-1 eucalyptus R_23UKvhBc7D608VO 3 5 1 NA NA
Byrock-3-10-1 eucalyptus R_3HzSBqQTAmJJ9ye 3 10 1 NA NA
Byrock-1-8-1 eucalyptus R_3HzSBqQTAmJJ9ye 1 8 1 NA NA
Byrock-2-9-1 eucalyptus R_3HzSBqQTAmJJ9ye 2 9 1 NA NA
Byrock-1-1-1 blue tit R_3iKJrflQwwxsps0 1 1 1 NA rear_nest_breed_ID
Byrock-2-2-1 blue tit R_3iKJrflQwwxsps0 2 2 1 NA rear_nest_breed_ID
Byrock-1-8-1 blue tit R_3qfD5ZHHdBbTgk3 1 8 1 NA rear_nest_breed_ID

Yes, this must be the source of the issue. Two 1-8-1 entries.

Created on 2024-06-18 with reprex v2.1.0



erroneous validation for submission data with predicted values as 0

Local .Rprofile detected at /Users/egould/code/ManyAnalysts/.Rprofile

library(targets)
library(pointblank)
library(tidyverse)
#> Warning: package 'tidyr' was built under R version 4.0.5
#> Warning: package 'dplyr' was built under R version 4.0.5
source("R/functions.R")
tar_load(all_prediction_data)

all_prediction_data %>% 
  pull("checks") %>% 
  discard(rlang::is_na) %>% 
  map_dfr(filter, f_pass < 1, stringr::str_detect(columns, "estimate|fit"))
#> # A tibble: 2 × 14
#>       i type  columns values precon active eval  units n_pass f_pass W     S    
#>   <int> <chr> <chr>   <chr>  <chr>  <lgl>  <chr> <dbl>  <dbl>  <dbl> <lgl> <lgl>
#> 1     7 col_… fit     <NA>   <NA>   TRUE   OK        1      0      0 TRUE  NA   
#> 2     3 col_… se.fit  <NA>   <NA>   TRUE   OK        1      0      0 TRUE  NA   
#> # … with 2 more variables: N <lgl>, extract <int>

all_prediction_data %>% 
  mutate(validation_fail = modify_if(.x = checks, 
                                    .p = negate(rlang::is_na),
                                    .f = ~ filter(.x, f_pass < 1, 
                                                  stringr::str_detect(columns, "estimate|fit")) %>% 
                                      nrow(.)) %>% 
           flatten_dbl(.) %>% 
           as.logical(.)) %>% 
  filter(validation_fail == TRUE) %>% pull(augmented_data,name = response_id)
#> $R_3EoS37IOWKLRxDV
#> # A tibble: 3 × 5
#>   SurveyID   fit se.fit ci.low ci.hi
#>   <chr>    <int>  <dbl>  <int> <int>
#> 1 Q1           0   1.41      0     2
#> 2 Q2           0   5.30      0     6
#> 3 Q3           0  24.3       0    26
#> 
#> $R_3EYx3CfuTZ7YNmF
#> # A tibble: 3 × 4
#>   SurveyID   fit ci.low ci.hi
#>   <chr>    <dbl>  <dbl> <dbl>
#> 1 Q1       0.968  0.654 1.28 
#> 2 Q2       0.382  0.281 0.482
#> 3 Q3       1.18   0.715 1.64

Created on 2022-06-03 by the reprex package (v2.0.1)

Validation has correctly identified R_3EYx3CfuTZ7YNmF as having failed, but R_3EoS37IOWKLRxDV should have passed validation. It seems the 0s are being parsed as col_integer() rather than col_double() during read_submission_data().
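One possible fix is to stop relying on type guessing and force the prediction columns to doubles when reading. A sketch only: read_submission_data() is the package's own reader, and the file name and exact call shape here are assumptions:

```r
library(readr)

# An all-zero column is guessed as integer; declaring the prediction
# columns up front keeps them as col_double() regardless of their values:
predictions <- readr::read_csv(
  "submission_predictions.csv",  # hypothetical file
  col_types = readr::cols(
    fit      = readr::col_double(),
    se.fit   = readr::col_double(),
    ci.low   = readr::col_double(),
    ci.hi    = readr::col_double(),
    .default = readr::col_guess()
  )
)
```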

Add error handling to convert_predictions() when input df fails validation

So now what... what's the correct behaviour?

  • Probably we should add error handling for when the required columns/types aren't present in the augmented_data argument of convert_predictions()...

Originally posted by @egouldo in https://github.com/egouldo/ManyAnalysts/issues/117#issuecomment-1145595743

Currently, this is being managed by dropping any datasets that have failed validation. This is OK for now, but it makes it difficult to figure out what's causing the error when it is triggered.
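A minimal sketch of such a guard; check_augmented_data() and the required-column set are hypothetical, not existing package API:

```r
# Hypothetical guard to run at the top of convert_predictions():
check_augmented_data <- function(augmented_data,
                                 required = c("fit", "se.fit",
                                              "ci.low", "ci.hi")) {
  missing_cols <- setdiff(required, colnames(augmented_data))
  if (length(missing_cols) > 0) {
    # Fail early with an informative message instead of a downstream
    # setnames() length error:
    cli::cli_abort(
      "augmented_data is missing required column{?s}: {.val {missing_cols}}"
    )
  }
  invisible(augmented_data)
}
```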

Missing variables to be coded in mean/sd parameter lookup table

  • Missing variables from ManyAnalysts:::analysis_data_param_tables
  response_id       submission_id analysis_id split_id response_variable_name      
  <chr>                     <dbl>       <dbl>    <dbl> <chr>                       
1 R_3dWjjdkDbTRCUqs             1           1        1 SMI                         
2 R_3eemYAT4onCCt0N             1           1        1 residual_day14_weight       
3 R_PvSAhRE12PvmEJb             1           1        1 SMI                         
4 R_SDyCfmAPuL4gcKZ             5           5        1 residual_day_14_weight_males

And also: euc_sdlgs50cm_2m

Originally posted by @egouldo in https://github.com/egouldo/ManyAnalysts/issues/162#issuecomment-1423526812

  • @hannahsfraser to edit script /data-raw/analysis_datasets/osf_load_analyst_dataset.R, calculating the variables SMI, residual_day14_weight, and residual_day_14_weight_males

  • @egouldo ManyAnalysts::blue_tit_data & ManyAnalysts::euc_data to be regenerated by running /data-raw/analysis_datasets/osf_load_analyst_dataset.R

  • @egouldo Rebuild package (will update ManyAnalysts::blue_tit_data & ManyAnalysts::euc_data)

  • @egouldo In /data-raw/ run the script analysis_data_param_tables (which will run make_param_table on both ManyAnalysts::blue_tit_data and ManyAnalysts::euc_data)

  • @egouldo Rebuild package (will update analysis_data_param_tables)

  • @egouldo Rerun pipeline and check whether the missing back_transformed_data tables for the above cases are resolved egouldo/ManyAnalysts#162

  • 🗒️ Related to #82

warning messages on loading data after attaching the package

Warning messages:
1: Unknown or uninitialised column: anonymise_teams.
2: Unknown or uninitialised column: assign_transformation_type.
3: Unknown or uninitialised column: est_to_zr.
4: Unknown or uninitialised column: %nin%.
5: Unknown or uninitialised column: count_binary_coded_features.
6: Unknown or uninitialised column: square_back.
7: Unknown or uninitialised column: plot_model_means_box_cox_cat.
8: Unknown or uninitialised column: standardise_response.
9: Unknown or uninitialised column: prepare_diversity_summary_data.
10: Unknown or uninitialised column: summarise_study.

See this thread https://stackoverflow.com/questions/39041115/fixing-a-multiple-warning-unknown-column for examples where other users have encountered this problem.

I suspect it's either mapped mutate() calls or readr:: causing this warning.
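Notably, the column names in the warnings are all function names, which is consistent with something `$`-accessing a tibble by a name that doesn't exist; tibbles warn rather than silently returning NULL. A minimal reproduction, unrelated to the package itself:

```r
library(tibble)

df <- tibble(x = 1)

# `$` access of a nonexistent column on a tibble emits exactly this
# style of warning and returns NULL:
df$anonymise_teams
#> Warning: Unknown or uninitialised column: `anonymise_teams`.
#> NULL
```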

Collinearity subset analysis does not subset correct list-column of df's

The list-column effects_analysis is not being subset; data is. The function is applied after other pre-processing to make ManyEcoEvo::ManyEcoEvo_results. Downstream analyses, however, use effects_analysis as the input list-column of data frames.

library(ManyEcoEvo)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

pull_df <- function(x,y){
  x %>% 
    filter(dataset == "blue tit", 
           publishable_subset == "All", 
           expertise_subset == "All", 
           exclusion_set == "complete") %>% 
    pull({{y}})
}

ManyEcoEvo::ManyEcoEvo_results %>% pull_df(data) %>% map(dim)
#> $subset_complete
#> [1] 131  40
#> 
#> $subset_complete
#> [1] 119  40
ManyEcoEvo::ManyEcoEvo_results %>% pull_df(effects_analysis) %>% map(dim)
#> [[1]]
#> [1] 131  48
#> 
#> [[2]]
#> [1] 131  48

Created on 2024-06-14 with reprex v2.1.0

The offending call subsets data instead of effects_analysis:

mutate(data = map(.x = data,
                  .f = dplyr::anti_join, collinearity_subset,
                  by = join_by(response_id, id_col, dataset))) %>%

rm analyses marked as 'exclude' from analysis pipeline prior to prepare_ManyEcoEvo_*()

  • Missing response_variable_name means the data can't be found in the param lookup table; all such analyses are marked with exclude,
  • which leads me to reconsider: we should withdraw all excluded analyses prior to running them through the pipeline.
  • Update both the Zr and yi pipelines to ensure that all excluded analyses do not go through the analysis pipeline, but are removed. We still need to be able to see these in either master_data_raw or in ManyEcoEvo_*, however, because we will want to report how many analyses were excluded in each dataset for the manuscript!
response_id       submission_id analysis_id split_id response_variable_name exclusions_all
  <chr>                     <dbl>       <dbl>    <dbl> <chr>                  <chr>         
1 R_1E6WYqro4x6Nnz8             1           1        1 NA                     exclude       
2 R_1GEw0p99dpeSW3E             3           3        1 NA                     exclude       
3 R_1GEw0p99dpeSW3E             4           4        1 NA                     exclude       
4 R_CfUlQXg434XEESd             1           1        1 NA                     exclude       
5 R_PvSAhRE12PvmEJb             3           3        1 NA                     exclude       
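A sketch of the early filter this implies; master_data and the exclusions_all coding are taken from the table above, and the exact placement in the pipeline is still to be decided:

```r
library(dplyr)

# Split excluded analyses out before prepare_ManyEcoEvo_*(), keeping
# both halves so exclusion counts can still be reported in the manuscript:
excluded <- master_data %>%
  filter(exclusions_all == "exclude")

analyses_to_run <- master_data %>%
  filter(is.na(exclusions_all) | exclusions_all != "exclude")
```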

Originally posted by @egouldo in https://github.com/egouldo/ManyAnalysts/issues/162#issuecomment-1423526812

functionalise manuscript code in SM3

  • fit_MA_mv
  • plot_forest
  • plot_forest_2 (and what's the difference?)
  • consider adding create_model_workflow, could wish to leave in manuscript
  • possibly_check_convergence_glm
  • plot_model_means_RE
  • walk_plot_effects_diversity
  • logged Euc analysis
  • Any calls to parameters::parameters to be added into make_viz()
  • add multivariate model fun and execution egouldo/ManyAnalysts#44

query analyses with NULL review_data

Some rows of analyst data in ManyEcoEvo (the targets target) have NULL for review_data. Need to check that this is OK/expected, and not the result of an error where the review data is lost or goes missing at some point.
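A quick way to locate the affected rows (a sketch; assumes review_data is a list-column nested inside the target's data column, as in the reprexes above):

```r
library(targets)
library(dplyr)
library(purrr)
library(tidyr)

targets::tar_load(ManyEcoEvo)

# Rows whose review_data entry is NULL:
ManyEcoEvo %>%
  select(dataset, data) %>%
  unnest(data) %>%
  filter(map_lgl(review_data, is.null)) %>%
  select(dataset, ends_with("_id"))
```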
