
manyecoevo's Introduction

DOI License: GPL v3 Contributor Covenant

ManyEcoEvo

The ManyEcoEvo package provides a suite of functions for:

  • Summarising, analysing and visualising the ManyEcoEvo dataset from Gould et al.1
  • Tidying and cleaning many-analyst style data for further analysis
  • Reproducing the analysis in Gould et al. using your own many-analyst style data

Note that the manuscript source-code for Gould et al. is located in a separate repository at https://github.com/egouldo/ManyAnalysts, which can be viewed at https://egouldo.github.io/ManyAnalysts/.

Installation

ManyEcoEvo can be installed from GitHub using devtools with:

devtools::install_github("egouldo/ManyEcoEvo")

Regenerating the ManyEcoEvo dataset

The data processing and analysis can be freely reproduced with the help of the targets:: package. Please see the documentation at https://docs.ropensci.org/targets/ for further detail.

Should you wish to completely reproduce the dataset generation and analysis in Gould et al., complete the following steps:

  1. Clone or download this repository (https://github.com/egouldo/ManyEcoEvo)
  2. Run renv::restore() to load the packages used in the analysis pipeline locally on your machine (see renv:: for details)
  3. Run targets::tar_destroy() to remove any record and caches of existing targets
  4. Run targets::tar_make() in your console. Depending on the power of your machine, the analysis pipeline will take roughly 2 to 7 minutes to execute (plus or minus some!)
  5. You can view a table of all targets in the pipeline by running targets::tar_meta()
  6. To interact with objects or ‘targets’ within the analysis pipeline, call targets::tar_load() or targets::tar_read():
targets::tar_read("ManyEcoEvo")
# A tibble: 2 × 4
  dataset    data                diversity_data      estimate_type
  <chr>      <list>              <named list>        <chr>        
1 blue tit   <tibble [174 × 38]> <tibble [174 × 54]> Zr           
2 eucalyptus <tibble [128 × 38]> <tibble [128 × 61]> Zr           

The script that generates datasets used in the ManyEcoEvo:: package is located in ManyEcoEvo/data-raw/tar_make.R.

License

This software is licensed under the GNU GPL v3 license.

Contributing

This package is released with a Contributor Code of Conduct.

Citation

To cite package ‘ManyEcoEvo’ in publications use:

  Gould E, Fraser H, Nakagawa S, Parker T (2023). _ManyEcoEvo: Meta-analyse
  data from 'Many-Analysts' style studies_. R package version 1.0.0,
  <https://github.com/egouldo/ManyEcoEvo>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {ManyEcoEvo: Meta-analyse data from 'Many-Analysts' style studies},
    author = {Elliot Gould and Hannah S. Fraser and Shinichi Nakagawa and Timothy H. Parker},
    year = {2023},
    note = {R package version 1.0.0},
    url = {https://github.com/egouldo/ManyEcoEvo},
  }

Footnotes

  1. Gould, E., Fraser, H., Parker, T. et al. (2023). Same data, different analysts: Variation in effect sizes due to analytical decisions in ecology and evolutionary biology [Preprint]. Ecology and Evolutionary Biology. https://doi.org/10.32942/X2GG62

manyecoevo's People

Contributors

egouldo, hannahsfraser, parkerth


manyecoevo's Issues

roxygen example code error fit_metafor_mv

  • checking examples ... ERROR
    Running examples in ‘ManyEcoEvo-Ex.R’ failed
    The error most likely occurred in:

Name: fit_metafor_mv

Title: Fit Multivariate Metaregression using metafoR

Aliases: fit_metafor_mv

** Examples

#TODO -- is this the best way of setting up this fun?? (i.e. to take numeric vectors)?

Example Usage:

library(tidyverse);library(targets);library(metafor) # NOT RUN, TODO: remove after create pkg

source("R/functions.R") #NOT RUN, TODO: remove after create pkg

tar_read(round_2_survey_meta_analysis) %>%
  filter(dataset == "eucalyptus") %>%
  filter(!is.na(Zr),
         !is.na(VZr),
         !is.infinite(Zr),
         !is.infinite(VZr)) %>%
  fit_metafor_mv(estimate = .$Zr, variance = .$VZr, estimate_type = "Zr", data = .)

── Fitting multivariate metaregression ──

Error in match.arg(method) : object 'VZr' not found
Calls: %>% ... fit_metafor_mv -> -> filter -> match.arg
Execution halted

Change processing of prediction data to allow csv files without reported CIs

Previously we required reported CIs from analysts; however, we recently agreed to use the CIs from the simulations. Currently, after addressing egouldo/ManyAnalysts#163, there is one analysis without reported CIs that is throwing an error inside rename_prediction_cols(), which is called inside convert_predictions() within standardise_response(), and then in prepare_response_variables() during the creation of target ManyEcoEvo_yi_results.


Commit ae8be9a fixed the previous error, but this analysis is still causing issues further downstream:

! Error running targets::tar_make()
  Target errors: targets::tar_meta(fields = error, complete_only = TRUE)
  Tips: https://books.ropensci.org/targets/debugging.html
  Last error: Problem while computing `data = purrr::map2(...)`.The error occurred in group 24: response_id = "R_2V7qaLEfdbgUGg3",
  submission_id = 1, analysis_id = 1, split_id = 1.
Caused by error in `dplyr::mutate()`:
! Problem while computing `back_transformed_data = purrr::pmap(...)`.The error occurred in group 24: response_id = "R_2V7qaLEfdbgUGg3",
  submission_id = 1, analysis_id = 1, split_id = 1.
Caused by error in `data.table::setnames()`:
! 'old' is length 4 but 'new' is length 3
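The setnames() failure at the bottom of the trace can be reproduced in isolation. A minimal sketch with made-up column names, not code from the pipeline:

```r
library(data.table)

# Only three of the four expected prediction columns are present:
dt <- data.table(fit = 1:3, se.fit = runif(3), ci.low = 1:3)

# If rename_prediction_cols() builds `old` from the four expected columns
# but derives only three `new` names from the data, setnames() errors:
data.table::setnames(dt,
                     old = c("fit", "se.fit", "ci.low", "ci.hi"),
                     new = c("fit", "std.error", "conf.low"))
# Error: 'old' is length 4 but 'new' is length 3
```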

I suspect this is because its augmented_data is missing the confidence intervals:

Local .Rprofile detected at /Users/egould/Documents/code/ManyAnalysts/.Rprofile

library(targets)
library(tidyverse)
library(ManyAnalysts)

tar_load(ManyEcoEvo_yi)

ManyEcoEvo_yi %>% 
  filter(dataset=="eucalyptus") %>% 
  pluck("data", 1) %>% 
  filter(response_id == "R_2V7qaLEfdbgUGg3") %>% 
  pull("augmented_data")
#> $augmented_data_73780bf7
#> # A tibble: 3 × 4
#>      V1 SurveyID   fit se.fit
#>   <int> <chr>    <dbl>  <dbl>
#> 1     1 Q1       0.451  0.204
#> 2     2 Q2       0.768  0.210
#> 3     3 Q3       0.881  0.265

Created on 2023-03-22 by the reprex package (v2.0.1)

Originally posted by @egouldo in https://github.com/egouldo/ManyAnalysts/issues/163#issuecomment-1478982292

6 missing analyses in master_data cf. list_of_new_prediction_files

Hi Hannah, I found some missing analyses in master_data. Could you take a look and help me figure out what's going on? The response_ids seem to be present, but the exact submission, analysis, and split ids in list_of_new_csv_files don't appear in master_data.

PS you'll need to be on pipeline-anon branch!

missing_analyses <- c("R_CfUlQXg434XEESd", "R_3rIdpCqsQtsmgqT", "R_1LRqq2WHrQaENtM")

NAMESPACEs in Imports field not imported

See https://stackoverflow.com/questions/54039992/namespaces-in-imports-field-not-imported-from-all-declared-imports-should-be-us

In R CMD CHECK log:

* checking dependencies in R code ... WARNING
'::' or ':::' imports not declared from:
  ‘broom.mixed’ ‘cli’ ‘data.table’ ‘forcats’ ‘fs’ ‘ggeffects’
  ‘ggforestplot’ ‘glue’ ‘here’ ‘lme4’ ‘magick’ ‘metafor’ ‘metaviz’
  ‘naniar’ ‘parameters’ ‘parsnip’ ‘performance’ ‘pracma’ ‘progress’
  ‘readr’ ‘readxl’ ‘recipes’ ‘sae’ ‘stringr’ ‘timetk’ ‘workflows’
Namespace in Imports field not imported from: ‘tidyselect’
  All declared Imports should be used.

Code analyses still coded as 'CHECK'

Hi Hannah or @parkerth,

Could you please check the master_data file (at "data-raw/anonymised_data/master_data.csv") and code as necessary, or mark as NA if you can't resolve them? There are a few analyses remaining with the value 'CHECK':

response_id submission_id analysis_id split_id test_variable Bayesian linear_model model_subclass exclusions_effect_analysis Conclusion data_cleaning_preprocessing_tool data_cleaning_preprocessing_version data_analysis_tool data_analysis_version
R_11787O3NmejXKAH 1 2 2 net_rearing_manipulation 0 generalised standard exclude_all CHECK R NA R NA
R_11787O3NmejXKAH 1 2 3 net_rearing_manipulation 0 generalised standard exclude_all CHECK R NA R NA
R_11787O3NmejXKAH 1 2 1 net_rearing_manipulation 0 generalised standard exclude_all CHECK R NA R NA
R_1eXlFKlQdiD2F59 2 2 1 rear_Cs_at_start_of_rearing 0 linear standard retain CHECK CHECK CHECK CHECK CHECK
R_1GJlffAgZv6SY4y 1 1 1 NA CHECK CHECK standard exclude_all none_c R 4.0.0 R 4.0.0
R_1M0cMZL2IPYWHoi 1 1 1 NA CHECK CHECK CHECK exclude_all CHECK CHECK CHECK CHECK CHECK
R_1M0cMZL2IPYWHoi 1 1 1 NA 1 generalised CHECK exclude_all NA R 3.6.3 R 3.6.3
R_1QlnXdW5tKuUQIr 1 1 1 CHECK CHECK CHECK CHECK CHECK neg_q R 3.6.1 R 3.6.1
R_21gmMa0uclrNoTP 2 1 1 NA CHECK linear CHECK exclude_all CHECK CHECK CHECK CHECK CHECK
R_2Pjoz1X4q5XRClO 2 1 1 NA CHECK CHECK standard exclude_all CHECK CHECK CHECK CHECK CHECK
R_2zNKAmJcWbM4QtY 1 1 1 NA 0 CHECK standard retain none_q R 3.6.1 R 3.6.1
R_3EbbZxcQ3gctVZu 1 1 1 NA 0 CHECK hurdle exclude_all neg_q R 3.6.1 R 3.6.1
R_3Kvy0h01LXHWniT 2 2 1 NA 0 generalised standard retain CHECK R NA R NA
R_3nBCE4hMLh5s3qt 3 1 2 NA CHECK CHECK standard NA CHECK CHECK CHECK CHECK CHECK
R_3nBCE4hMLh5s3qt 3 1 1 NA CHECK CHECK standard NA CHECK CHECK CHECK CHECK CHECK
R_AzL6RdNTHtPjxzX 1 1 2 NA CHECK CHECK standard exclude_all CHECK CHECK CHECK CHECK CHECK
R_es2jrrN9CTGwl5D 1 1 1 NA CHECK CHECK CHECK exclude_all neg_q R 4.0.0 R 4.0.0

Constructed Variables are ignored when standardising (no average values in param table)

  • Merge HF's code into the data-raw script osf_load_analyst_datasets.R (this is the data-raw script that creates euc_data and blue_tit_data)
  • Rerun osf_load_analyst_datasets.R so the sysdata objects are re-written with use_data() calls
  • Re-run analysis_data_param_tables.R, which creates the internal object analysis_data_param_tables containing the mean and sd of both BT and EUC datasets for all variables used by all analysts
  • Rebuild the package
  • Rerun targets to incorporate the changes

Update manuscript: summary section

  • Take the first paragraph of the summary section and insert it into the first paragraph of the current 'statement of need' section, before the word "moreover" (and tie it in)
  • Take the third paragraph of the summary section and insert it into paragraph 2 of the 'statement of need', after the phrase "computationally reproduce our analysis"
  • Get rid of the rest of the summary section
  • Rename the 'statement of need' section 'introduction'

Write vignettes

Write vignettes (these don't seem to contain any content: the HTML files say they don't exist, and on GitHub there's only placeholder content).

Ensure proper use of targets within pkg

  • Remove ManyEcoEvo from the pkgs vector
  • Remove all namespacing in targets code
  • May also need to remove any namespacing within functions in R/
  • Declare ManyEcoEvo in the tar_option_set() imports argument
  • Run targets::tar_destroy() and rerun data-raw.R
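The option changes above can be sketched in _targets.R roughly as follows; the argument values are assumptions based on the checklist, not the actual pipeline configuration:

```r
# _targets.R (sketch)
library(targets)

tar_option_set(
  # Track ManyEcoEvo's own functions for invalidation, rather than
  # listing the package as an ordinary dependency:
  imports = "ManyEcoEvo",
  # ManyEcoEvo removed from this vector; remaining packages assumed:
  packages = c("dplyr", "purrr")
)
```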

duplicated id_col

I found an incorrectly duplicated id_col in master_data.csv for two separate analyses from the same submission by the same team, one blue tit and one eucalyptus. One will need to be recoded in the response_id, submission_id, analysis_id, and split_id columns.

See details in reprex below:

library(tidyverse)
library(here)
#> here() starts at /Users/elliotgould/Documents/GitHub/ManyAnalysts
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test
library(ManyEcoEvo)

prepare_df_for_summarising <- function(data){
  data %>% mutate(across(.cols = c(num_fixed_variables,
                                   num_random_variables,
                                   sample_size,
                                   num_interactions,
                                   Bayesian, #NA's coming from CHECK values
                                   mixed_model,
                                   num_fixed_effects,
                                   num_random_effects), 
                         as.numeric),
                  lm = ifelse(linear_model == "linear", 1, 0),
                  glm = ifelse(linear_model == "generalised", 1, 0))
}

Master <- ManyEcoEvo %>% 
  select(data) %>% unnest(everything()) %>% 
  prepare_df_for_summarising() #NAs ok, caused by CHECK vals, not yet using THP's fixes
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(...)`.
#> Caused by warning:
#> ! NAs introduced by coercion

Note that we are getting an unexpected many-to-many relationship here, as per the warning above.

predictions <- read_csv(here::here("ms/predictions_Ids.csv")) %>% #TODO ask HF source
  distinct() %>% 
  left_join(Master, by = c("id_col")) %>% 
  prepare_df_for_summarising()
#> Rows: 258 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): id_col
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

There are duplicate entries for one id_col; let’s identify these analyses:

predictions %>% janitor::get_dupes("id_col") %>% 
  select(id_col, ends_with("_id"), TeamIdentifier) %>% 
  knitr::kable()
id_col response_id submission_id analysis_id split_id TeamIdentifier
Byrock-1-8-1 R_3qfD5ZHHdBbTgk3 1 8 1 Byrock
Byrock-1-8-1 R_3HzSBqQTAmJJ9ye 1 8 1 Byrock

It seems that there are two separate response_id entries for this team; however, they are both coded with the same id_col. Let’s see which columns have duplicated values:

duplicated_variables <- 
  predictions %>% select(-review_data) %>% 
  janitor::get_dupes("id_col") %>% 
  summarise(id_col = unique(id_col), across(-all_of("id_col"), 
                   ~ first(.x) == last(.x))) %>% 
  select(id_col, where(isFALSE))

predictions %>% 
  semi_join(duplicated_variables, by = join_by("id_col")) %>% 
  select(id_col, colnames(duplicated_variables)) %>% 
  knitr::kable()
id_col response_id beta_estimate adjusted_df beta_SE transformation link_function_reported dataset mixed_model response_variable_name response_id_S2 sample_size linear_model exclusions_all Conclusion lm glm
Byrock-1-8-1 R_3qfD5ZHHdBbTgk3 -0.065490 458.3576 0.014100 identity identity blue tit 1 day_14_weight R_3qfD5ZHHdBbTgk3 3720 linear exclude neg_c 1 0
Byrock-1-8-1 R_3HzSBqQTAmJJ9ye -0.028464 345.0000 0.025721 log log eucalyptus 0 euc_sdlgs0_50cm R_3HzSBqQTAmJJ9ye 350 generalised retain neg_q 0 1

OK, there is one analysis each for Eucalyptus and blue tit, so the split_id is coded incorrectly: these are clearly separate analyses.
I can see that this id is also assigned to different response_ids, i.e. from different submissions.
I note that in the file prediction_IDs.csv there are three duplicated entries for this id_col.
We should make sure that there isn’t a third analysis somewhere that is also duplicated in id_col.
It would be helpful to know how Hannah created this dataset.
I also note that for response_id R_3HzSBqQTAmJJ9ye there are three entries in predictions_validations_worksheet.csv belonging to this response_id. So that’s why there are multiple entries in predictions_IDs.csv.
The submission, analysis and split ID columns in that data file are:

  • 1-8-1
  • 2-9-1
  • 3-10-1

The predictions object here is also created from the Master object, or ManyEcoEvo::ManyEcoEvo, which comes from the master_data.csv file. Let’s look at that to see if it’s potentially the source of the problem:
Master %>%  
  filter(TeamIdentifier == "Byrock") %>% 
  select(id_col, dataset, all_of(ends_with("_id"))) %>% 
  distinct() %>% 
  janitor::get_dupes("id_col")
#> # A tibble: 2 × 7
#>   id_col       dupe_count dataset response_id submission_id analysis_id split_id
#>   <chr>             <int> <chr>   <chr>               <dbl>       <dbl>    <dbl>
#> 1 Byrock-1-8-1          2 blue t… R_3qfD5ZHH…             1           8        1
#> 2 Byrock-1-8-1          2 eucaly… R_3HzSBqQT…             1           8        1

Yes, different response_ids for the same id_col for analyses of different datasets.
Let’s check the raw data file. Here’s the reprex output I ran over at ManyEcoEvo:

Local .Rprofile detected at /Users/elliotgould/Documents/GitHub/ManyEcoEvo/.Rprofile

library(targets)
library(tidyverse)

There are no extra prediction file submissions for these analyses, so that’s not a problem.

tar_read(list_of_new_prediction_files) %>% 
  filter(response_id == "R_3qfD5ZHHdBbTgk3" | response_id == "R_3HzSBqQTAmJJ9ye") %>% 
  select(dataset, ends_with("_id"), csv_number) 
#> # A tibble: 0 × 6
#> # ℹ 6 variables: dataset <chr>, response_id <chr>, submission_id <dbl>,
#> #   analysis_id <dbl>, split_id <dbl>, csv_number <dbl>

Let’s check the underlying master_data:

readr::read_csv("data-raw/anonymised_data/master_data.csv") %>% 
  filter(TeamIdentifier == "Byrock") %>% 
  select(id_col, dataset, all_of(ends_with("_id"))) %>% 
  knitr::kable()
#> Rows: 302 Columns: 154
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (135): response_id, id_col, contrast, transformation, link_function_repo...
#> dbl  (16): submission_id, analysis_id, split_id, beta_estimate, adjusted_df,...
#> lgl   (3): Extra-pair_dad_ring, rear_Cs_out, rear_Cs_in
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
id_col dataset response_id submission_id analysis_id split_id hatch_nest_breed_ID rear_nest_breed_ID
Byrock-1-3-1 eucalyptus R_23UKvhBc7D608VO 1 3 1 NA NA
Byrock-5-7-1 eucalyptus R_23UKvhBc7D608VO 5 7 1 NA NA
Byrock-4-6-1 eucalyptus R_23UKvhBc7D608VO 4 6 1 NA NA
Byrock-2-4-1 eucalyptus R_23UKvhBc7D608VO 2 4 1 NA NA
Byrock-3-5-1 eucalyptus R_23UKvhBc7D608VO 3 5 1 NA NA
Byrock-3-10-1 eucalyptus R_3HzSBqQTAmJJ9ye 3 10 1 NA NA
Byrock-1-8-1 eucalyptus R_3HzSBqQTAmJJ9ye 1 8 1 NA NA
Byrock-2-9-1 eucalyptus R_3HzSBqQTAmJJ9ye 2 9 1 NA NA
Byrock-1-1-1 blue tit R_3iKJrflQwwxsps0 1 1 1 NA rear_nest_breed_ID
Byrock-2-2-1 blue tit R_3iKJrflQwwxsps0 2 2 1 NA rear_nest_breed_ID
Byrock-1-8-1 blue tit R_3qfD5ZHHdBbTgk3 1 8 1 NA rear_nest_breed_ID

Yes, this must be the source of the issue. Two 1-8-1 entries.

Created on 2024-06-18 with reprex v2.1.0



erroneous validation for submission data with predicted values as 0

Local .Rprofile detected at /Users/egould/code/ManyAnalysts/.Rprofile

library(targets)
library(pointblank)
library(tidyverse)
#> Warning: package 'tidyr' was built under R version 4.0.5
#> Warning: package 'dplyr' was built under R version 4.0.5
source("R/functions.R")
tar_load(all_prediction_data)

all_prediction_data %>% 
  pull("checks") %>% 
  discard(rlang::is_na) %>% 
  map_dfr(filter, f_pass < 1, stringr::str_detect(columns, "estimate|fit"))
#> # A tibble: 2 × 14
#>       i type  columns values precon active eval  units n_pass f_pass W     S    
#>   <int> <chr> <chr>   <chr>  <chr>  <lgl>  <chr> <dbl>  <dbl>  <dbl> <lgl> <lgl>
#> 1     7 col_… fit     <NA>   <NA>   TRUE   OK        1      0      0 TRUE  NA   
#> 2     3 col_… se.fit  <NA>   <NA>   TRUE   OK        1      0      0 TRUE  NA   
#> # … with 2 more variables: N <lgl>, extract <int>

all_prediction_data %>% 
  mutate(validation_fail = modify_if(.x = checks, 
                                    .p = negate(rlang::is_na),
                                    .f = ~ filter(.x, f_pass < 1, 
                                                  stringr::str_detect(columns, "estimate|fit")) %>% 
                                      nrow(.)) %>% 
           flatten_dbl(.) %>% 
           as.logical(.)) %>% 
  filter(validation_fail == TRUE) %>% pull(augmented_data,name = response_id)
#> $R_3EoS37IOWKLRxDV
#> # A tibble: 3 × 5
#>   SurveyID   fit se.fit ci.low ci.hi
#>   <chr>    <int>  <dbl>  <int> <int>
#> 1 Q1           0   1.41      0     2
#> 2 Q2           0   5.30      0     6
#> 3 Q3           0  24.3       0    26
#> 
#> $R_3EYx3CfuTZ7YNmF
#> # A tibble: 3 × 4
#>   SurveyID   fit ci.low ci.hi
#>   <chr>    <dbl>  <dbl> <dbl>
#> 1 Q1       0.968  0.654 1.28 
#> 2 Q2       0.382  0.281 0.482
#> 3 Q3       1.18   0.715 1.64

Created on 2022-06-03 by the reprex package (v2.0.1)

Validation has correctly identified R_3EYx3CfuTZ7YNmF as having failed, but R_3EoS37IOWKLRxDV should have passed validation. It seems the 0s are being parsed as col_integer() rather than col_double() during read_submission_data().
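One possible fix is to stop relying on type guessing and force the prediction columns to doubles when reading. A sketch only: read_submission_data() is the package's own reader, and the file name and exact call shape here are assumptions:

```r
library(readr)

# An all-zero column is guessed as integer; declaring the prediction
# columns up front keeps them as col_double() regardless of their values:
predictions <- readr::read_csv(
  "submission_predictions.csv",  # hypothetical file
  col_types = readr::cols(
    fit      = readr::col_double(),
    se.fit   = readr::col_double(),
    ci.low   = readr::col_double(),
    ci.hi    = readr::col_double(),
    .default = readr::col_guess()
  )
)
```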

Add error handling to convert_predictions() when input df fails validation

So now what... what's the correct behaviour?

  • Probably we should add error handling for when the required columns/types aren't present in the augmented_data argument of convert_predictions()...

Originally posted by @egouldo in https://github.com/egouldo/ManyAnalysts/issues/117#issuecomment-1145595743

Currently, this is being managed by dropping any datasets that have failed validation. This is OK for now, but it makes it difficult to figure out what's causing the error when it is triggered.
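A minimal sketch of such a guard; check_augmented_data() and the required-column set are hypothetical, not existing package API:

```r
# Hypothetical guard to run at the top of convert_predictions():
check_augmented_data <- function(augmented_data,
                                 required = c("fit", "se.fit",
                                              "ci.low", "ci.hi")) {
  missing_cols <- setdiff(required, colnames(augmented_data))
  if (length(missing_cols) > 0) {
    # Fail early with an informative message instead of a downstream
    # setnames() length error:
    cli::cli_abort(
      "augmented_data is missing required column{?s}: {.val {missing_cols}}"
    )
  }
  invisible(augmented_data)
}
```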

Missing variables to be coded in mean/sd parameter lookup table

  • Missing variables from ManyAnalysts:::analysis_data_param_tables
  response_id       submission_id analysis_id split_id response_variable_name      
  <chr>                     <dbl>       <dbl>    <dbl> <chr>                       
1 R_3dWjjdkDbTRCUqs             1           1        1 SMI                         
2 R_3eemYAT4onCCt0N             1           1        1 residual_day14_weight       
3 R_PvSAhRE12PvmEJb             1           1        1 SMI                         
4 R_SDyCfmAPuL4gcKZ             5           5        1 residual_day_14_weight_males

And also: euc_sdlgs50cm_2m

Originally posted by @egouldo in https://github.com/egouldo/ManyAnalysts/issues/162#issuecomment-1423526812

  • @hannahsfraser to edit script /data-raw/analysis_datasets/osf_load_analyst_dataset.R, calculating the variables SMI, residual_day14_weight, and residual_day_14_weight_males

  • @egouldo ManyAnalysts::blue_tit_data & ManyAnalysts::euc_data to be regenerated by running /data-raw/analysis_datasets/osf_load_analyst_dataset.R

  • @egouldo Rebuild package (will update ManyAnalysts::blue_tit_data & ManyAnalysts::euc_data)

  • @egouldo In /data-raw/ run the script analysis_data_param_tables (which will run make_param_table on both ManyAnalysts::blue_tit_data and ManyAnalysts::euc_data)

  • @egouldo Rebuild package (will update analysis_data_param_tables)

  • @egouldo Rerun pipeline and check whether the missing back_transformed_data tables for the above cases are resolved egouldo/ManyAnalysts#162

  • 🗒️ Related to #82

warning messages on loading data after attaching the package

Warning messages:
1: Unknown or uninitialised column: anonymise_teams.
2: Unknown or uninitialised column: assign_transformation_type.
3: Unknown or uninitialised column: est_to_zr.
4: Unknown or uninitialised column: %nin%.
5: Unknown or uninitialised column: count_binary_coded_features.
6: Unknown or uninitialised column: square_back.
7: Unknown or uninitialised column: plot_model_means_box_cox_cat.
8: Unknown or uninitialised column: standardise_response.
9: Unknown or uninitialised column: prepare_diversity_summary_data.
10: Unknown or uninitialised column: summarise_study.

See this thread https://stackoverflow.com/questions/39041115/fixing-a-multiple-warning-unknown-column for examples where other users have encountered this problem.

I suspect it's either mapped mutate() calls or readr:: causing this warning.
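Notably, the column names in the warnings are all function names, which is consistent with something `$`-accessing a tibble by a name that doesn't exist; tibbles warn rather than silently returning NULL. A minimal reproduction, unrelated to the package itself:

```r
library(tibble)

df <- tibble(x = 1)

# `$` access of a nonexistent column on a tibble emits exactly this
# style of warning and returns NULL:
df$anonymise_teams
#> Warning: Unknown or uninitialised column: `anonymise_teams`.
#> NULL
```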

Collinearity subset analysis does not subset correct list-column of df's

The list-column effects_analysis is not being subset; data is. The function is applied after other pre-processing to make ManyEcoEvo::ManyEcoEvo_results. Downstream analyses, however, use effects_analysis as the input list-column of data frames.

library(ManyEcoEvo)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

pull_df <- function(x,y){
  x %>% 
    filter(dataset == "blue tit", 
           publishable_subset == "All", 
           expertise_subset == "All", 
           exclusion_set == "complete") %>% 
    pull({{y}})
}

ManyEcoEvo::ManyEcoEvo_results %>% pull_df(data) %>% map(dim)
#> $subset_complete
#> [1] 131  40
#> 
#> $subset_complete
#> [1] 119  40
ManyEcoEvo::ManyEcoEvo_results %>% pull_df(effects_analysis) %>% map(dim)
#> [[1]]
#> [1] 131  48
#> 
#> [[2]]
#> [1] 131  48

Created on 2024-06-14 with reprex v2.1.0

The offending call subsets data instead of effects_analysis:

mutate(data = map(.x = data,
                  .f = dplyr::anti_join, collinearity_subset,
                  by = join_by(response_id, id_col, dataset))) %>%

rm analyses marked as 'exclude' from analysis pipeline prior to prepare_ManyEcoEvo_*()

  • Missing response_variable_name means the data can't be found in the param lookup table; all such analyses are marked with exclude,
  • which leads me to reconsider: we should withdraw all excluded analyses prior to running them through the pipeline.
  • Update both the Zr and yi pipelines to ensure that all excluded analyses do not go through the analysis pipeline, but are removed. We still need to be able to see these in either master_data_raw or in ManyEcoEvo_*, however, because we will want to report how many analyses were excluded in each dataset for the manuscript!
response_id       submission_id analysis_id split_id response_variable_name exclusions_all
  <chr>                     <dbl>       <dbl>    <dbl> <chr>                  <chr>         
1 R_1E6WYqro4x6Nnz8             1           1        1 NA                     exclude       
2 R_1GEw0p99dpeSW3E             3           3        1 NA                     exclude       
3 R_1GEw0p99dpeSW3E             4           4        1 NA                     exclude       
4 R_CfUlQXg434XEESd             1           1        1 NA                     exclude       
5 R_PvSAhRE12PvmEJb             3           3        1 NA                     exclude       
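A sketch of the early filter this implies; master_data and the exclusions_all coding are taken from the table above, and the exact placement in the pipeline is still to be decided:

```r
library(dplyr)

# Split excluded analyses out before prepare_ManyEcoEvo_*(), keeping
# both halves so exclusion counts can still be reported in the manuscript:
excluded <- master_data %>%
  filter(exclusions_all == "exclude")

analyses_to_run <- master_data %>%
  filter(is.na(exclusions_all) | exclusions_all != "exclude")
```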

Originally posted by @egouldo in https://github.com/egouldo/ManyAnalysts/issues/162#issuecomment-1423526812

functionalise manuscript code in SM3

  • fit_MA_mv
  • plot_forest
  • plot_forest_2 (and what's the difference?)
  • consider adding create_model_workflow, could wish to leave in manuscript
  • possibly_check_convergence_glm
  • plot_model_means_RE
  • walk_plot_effects_diversity
  • logged Euc analysis
  • Any calls to parameters::parameters to be added into make_viz()
  • add multivariate model fun and execution egouldo/ManyAnalysts#44

query analyses with NULL review_data

Some rows of analyst data in ManyEcoEvo (the targets target) have NULL for review_data. Need to check that this is OK/expected, and not the result of an error where the review data is lost or goes missing at some point.
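A quick way to locate the affected rows (a sketch; assumes review_data is a list-column nested inside the target's data column, as in the reprexes above):

```r
library(targets)
library(dplyr)
library(purrr)
library(tidyr)

targets::tar_load(ManyEcoEvo)

# Rows whose review_data entry is NULL:
ManyEcoEvo %>%
  select(dataset, data) %>%
  unnest(data) %>%
  filter(map_lgl(review_data, is.null)) %>%
  select(dataset, ends_with("_id"))
```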
