openplantpathology / reproducibility_in_plant_pathology

A systematic/quantitative review of articles that provides a basis for identifying what has been done so far in plant pathology research reproducibility, with suggestions for ways to improve it.

Home Page: https://openplantpathology.github.io/Reproducibility_in_Plant_Pathology

License: Other

TeX 66.05% R 16.80% Dockerfile 0.58% Lua 16.52% Shell 0.05%
plant-pathology reproducible-research reproducible-science open-science r rstats research-compendium

reproducibility_in_plant_pathology's Introduction

Reproducibility in Plant Pathology

Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

This repository contains the data and code for our article:

Sparks, A.H., Del Ponte, E.M., Alves, K. S., Foster, Z., Grünwald, N. J. (2023). Openness and computational reproducibility in plant pathology: where do we stand and a way forward. Phytopathology. https://doi.org/10.1094/PHYTO-10-21-0430-PER

Our pre-print is online on the agriRxiv preprint server:

Sparks, A.H., Del Ponte, E.M., Alves, K. S., Foster, Z., Grünwald, N. J. (2023). Openness and computational reproducibility in plant pathology: where do we stand and a way forward. agriRxiv, Accessed 07 Aug 2024. Online at https://doi.org/10.31220/agriRxiv.2021.00082

The paper is a systematic and quantitative review of articles published in 21 plant pathology journals, spanning five years of publications. It provides a basis for identifying what has been done so far in plant pathology's published research to ensure computational reproducibility. The results show that, as a discipline, plant pathologists are not widely sharing data or code openly, making the work largely unreproducible. Based on these results and our own experiences, we offer suggestions for improving reproducibility, which are not unique to the discipline: following them would allow reviewers to make better suggestions, let readers learn more from the work, and earn authors more citations for their work.

How to cite

Please cite this compendium as:

Sparks, A.H., Del Ponte, E.M., Alves, K. S., Foster, Z., Grünwald, N. J. (2024). Compendium of R code and data for ‘Status and Best Practices for Reproducible Research In Plant Pathology’. Accessed 07 Aug 2024. Online at https://doi.org/10.5281/zenodo.1250664

How to download or install

The R package

This repository is organized as an R package. There is one R function, import_notes(), that is used to build the paper's figures and tables when the file analysis/paper/paper.Rmd is knit. Additionally, a bibliography file of the articles that were examined, "references.bib", and the notes from their evaluation, "Reproducibility_in_plant_pathology_notes.ods", are both located in the inst/extdata directory. We have used the R package structure to help manage dependencies, to take advantage of continuous integration for automated code testing and for file organisation.
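
For orientation, here is a minimal sketch of how a function like import_notes() might locate and read the notes file once the package is installed. This is an illustration only, not the packaged implementation, and the readODS dependency is an assumption.

library(readODS)

import_notes_sketch <- function() {
  # system.file() resolves paths to files shipped in inst/extdata
  # once the package is installed
  notes_file <- system.file(
    "extdata",
    "Reproducibility_in_plant_pathology_notes.ods",
    package = "Reproducibility.in.Plant.Pathology"
  )
  read_ods(notes_file, sheet = 1)
}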

You can download the compendium as a zip from this URL: https://github.com/openplantpathology/Reproducibility_in_Plant_Pathology/archive/main.zip

Or you can install this compendium as an R package, Reproducibility.in.Plant.Pathology, from GitHub with:

if (!require("remotes"))
  install.packages("remotes")
remotes::install_github("openplantpathology/Reproducibility_in_Plant_Pathology"
)

Once the download is complete, open the Reproducibility_in_Plant_Pathology.Rproj in RStudio to begin working with the package and compendium files.

The Docker Instance

Get the latest image from Docker Hub, launch it, and go to localhost:8787 in your browser. Log in with the username rstudio; the password is rstudio.

docker pull adamhsparks/reproducibility_in_plant_pathology
docker run -d -p 8787:8787 adamhsparks/reproducibility_in_plant_pathology

The Paper

The file structure follows a normal R package with one exception: the top-level "/analysis" directory contains the directories and files necessary to re-knit the MS Word document of the paper from an Rmd file, "/analysis/paper/paper.Rmd".

A script, knit_paper.R, located in analysis/paper/, will knit the manuscript and the supplementary materials in a Docker session.
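
As a rough sketch, the knitting step amounts to a call like the one below; knit_paper.R itself is the authoritative version, and this assumes rmarkdown and pandoc are available in the Docker session.

# Hedged sketch of the knitting step; see analysis/paper/knit_paper.R
# for the real script.
rmarkdown::render(
  input = "analysis/paper/paper.Rmd",
  output_format = "all" # render every format declared in the YAML header
)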

Meta

Licensing

Code: MIT. Year: 2024. Copyright holder: Adam H. Sparks.

Data: CC-0; attribution requested in reuse.

Adam H. Sparks, Senior Research Scientist, Farming Systems Innovation, Primary Industries Development, Department of Primary Industries and Regional Development, Level 6.34, 1 Nash St., Perth WA 6000

https://adamhsparks.netlify.app

Code of Conduct

Please note that the Reproducibility.in.Plant.Pathology project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

reproducibility_in_plant_pathology's People

Contributors

adamhsparks, emdelponte, grunwald, zachary-foster


reproducibility_in_plant_pathology's Issues

Getting the references into our tidy data frame in R

I did some checking; we can save ourselves some typing.

Zotero will export a .bib file as a CSV file. We can import that into R and select the columns that we need. That should greatly simplify typing the data in and cut down on possible data entry errors.

JabRef will do this too, but doesn't appear to export the DOI field.
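
The import step could look something like the sketch below, using base R; the file name zotero_export.csv and the exact column names are hypothetical, since Zotero's CSV headers may vary.

# Hedged sketch; the file name and column names are hypothetical.
refs <- read.csv("zotero_export.csv", stringsAsFactors = FALSE)

# Keep only the columns we need for the tidy data frame
refs <- refs[, c("Key", "Author", "Title", "Publication.Title", "DOI")]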

Not building the pkgdown site locally

I get this error when I try to build the site. The website does not build the new Rmd file in the DOC folder.

build_site()
Error in yaml::yaml.load(string, ...) :
Scanner error: while scanning a simple key at line 2, column 1 could not find expected ':' at line 3, column 1
In addition: Warning message:
In strptime(x, fmt, tz = "GMT") :
unknown timezone 'zone/tz/2017c.1.0/zoneinfo/America/Sao_Paulo'
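
That scanner error usually points to a malformed YAML header (e.g., a missing ':') near the top of one of the Rmd files. A hedged sketch for locating the offending file, assuming the rmarkdown package and the doc folder mentioned above:

# Check each Rmd's YAML front matter; report any file that fails to parse.
for (f in list.files("doc", pattern = "[.]Rmd$", full.names = TRUE)) {
  ok <- tryCatch({
    rmarkdown::yaml_front_matter(f)
    TRUE
  }, error = function(e) FALSE)
  if (!ok) message("Malformed YAML header in: ", f)
}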

Removed "page_fees" from evaluation

I removed the page fees field. There was some discussion in the vignette around it, and it was deemed difficult to determine. I'm also not sure how useful it is for our purposes, so I've removed it.

Figures

Figure 2 is cited before Figure 1, and this needs to be changed before submission; figures should be cited in order of appearance. Also, Phytopathology would have figures labeled with letters, such as Figure 1A.

Rmd vs md files

@emdelponte, I've set up the Rmd files to knit GitHub md files; there is no need to delete them.

Rather, we should knit README.Rmd to create the final output, README.md, in the /src directory, showing the lists and other results of our methodology. This way, when you go into the /src directory, GitHub will automatically display that file.

Decide on attributes of papers to record

Here is a start based on our previous discussions, but we should put this in an Rmd.

Paper attributes

  • Raw data accessibility:
    • online
    • publicly accessible (i.e., you don't have to email anyone, have an account anywhere, or pay anything)
    • well annotated so it's understandable independently of the methods/paper
  • Computational methods:
    • publicly accessible (i.e., you don't have to email anyone, have an account anywhere, or pay anything)
    • using open-source, free software
    • all scripted; no manual editing or point-and-click
    • version controlled from the start of the project
    • well annotated so it's understandable independently of the methods/paper

Journal attributes

  • Impact factor
  • Country
  • Page charges
  • Open/restricted access
  • Issues per year
  • Presence/absence of instructions encouraging reproducibility
  • Presence/absence of a supplementary material section

Article writing

@adamhsparks @grunwald @zachary-foster

I did some work on the manuscript Rmd file and committed the changes:

  • Inserted portions of text for the abstract and introduction sections
  • Made and included a diagram in the text using the DiagrammeR package. Anyone can make further changes and check them by running the chunk. The diagram is programmed with Graphviz and is quite easy (see the sketch after this list).
  • Described three levels of reproducibility associated with the diagram
  • Changed the style of the HTML file. The export to a .doc file is not working with the diagram code, so that format was eliminated for now. The export to HTML works fine but is not accessible on GitHub because the docs folder is set to pages.
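
For anyone who hasn't used it, here is a minimal, hypothetical example of the DiagrammeR/Graphviz approach; the actual diagram in the manuscript is more elaborate.

library(DiagrammeR)

# Hypothetical three-node diagram written in Graphviz DOT syntax;
# the manuscript's diagram differs.
grViz("
digraph reproducibility {
  rankdir = LR
  node [shape = box]
  raw_data -> scripted_analysis -> manuscript
}
")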

Branches

I've created branches for everyone following @zachary-foster's lead.

In the three I've created (I've not made any changes to yours, @zachary-foster), I have rearranged the files so that the package passes the build tests. To do that, I moved the article_data.csv file to inst/extdata.

I've also added the missing DOI information to the doi field in this file in the branches I created and subset the files to be looked at by the assignee, e.g. adams-changes contains an article_data.csv that only has my 50 assigned articles in it.

I'll complete my changes here and then merge back with the master branch.

cheers!

Article classifications

Articles will be classified by the evaluator as to whether they are:

  • Fundamental research
  • Applied research
  • Molecular research
  • Combined

Is there another classification that you would like to use, or are these three (or a combination of them) enough?

Paper categorization?

@adamhsparks @grunwald @zachary-foster

I went through my sample of 50 articles again and came up with this categorization specific to papers in plant disease-related journals. You may want to propose new ones, split or combine categories. I started with applied versus fundamental and would like your input also to check whether the subcategories are in the right category.

The idea is to provide specific guidelines with regard to the reproducibility of studies in plant pathology. Is this useful?

Fundamental

  • Basic pathogen biology
  • Plant-pathogen interaction
  • Pathogen population genetics
  • Molecular evolution and ecology
  • Functional genomics
  • Quantitative epidemiology and modeling
  • New taxa (species)

Applied

  • Molecular diagnosis/survey of plant pathogens
  • Disease/pathogen/metabolites survey
  • Pathogen/disease report (morphology/molecular/pathogenicity)
  • Experimental research (treatment effects)
  • Research methods and toolbox (e.g., primer development, selective media, software, etc.)
  • Screening germplasm/cultivar for resistance

Assigned articles, bibliography and notes

I've made a good start on this but am not quite done; I hope to finish this tomorrow (Wed. my time).

All references are in the /data/Journals_for_review.bib file

Our assignments (complete but not in repository yet) will be in Article_notes.csv

I'll left join on DOI for a complete list of notes and assignments. However, a few of the journals do not have DOI values, so I'll have to do some work by hand to join them completely.
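
A hedged sketch of that join, assuming the dplyr and bib2df packages (bib2df is one way to read a .bib file into a data frame) and that both tables share a DOI column:

library(dplyr)
library(bib2df)

# Read the bibliography and the evaluation notes
refs  <- bib2df("data/Journals_for_review.bib")
notes <- read.csv("Article_notes.csv", stringsAsFactors = FALSE)

# A left join keeps every reference and attaches notes where a DOI matches;
# rows without DOIs will still need matching by hand.
combined <- left_join(refs, notes, by = "DOI")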

Computational tool availability and data availability ratings

Thinking back to a thread on Twitter recently, GitHub and the like are not proper repositories. Really, we should rank the availability of scripts and data highest when they are available from libraries (e.g., K-State's KREX, USQ's e-prints, etc.). Zenodo, being run by CERN, might be top tier, but GitHub and FigShare are probably second tier.
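
One hedged way to encode such a ranking in the analysis would be an ordered factor; the tier labels below are illustrative only, not an agreed scheme.

# Illustrative tiers only; the labels and their ordering are open to discussion.
tiers <- c("none", "on request", "GitHub/FigShare", "library/Zenodo")
availability <- factor(
  c("GitHub/FigShare", "none", "library/Zenodo"),
  levels  = tiers,
  ordered = TRUE
)
availability > "on request" # ordered factors support such comparisons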

Review sections

I've made a nascent attempt at outlining the document.

Are the sections I've proposed acceptable or do we need more or less?

Document content

I've started filling in some content in the outline. I'm aware that I'm probably too R-centric; reproducible research does not revolve around R or scientific computing. However, these tools do make it much easier.

Best Practices

We need some good best practices for research from field level to in silico.

Examples/the State of RR in Plant Pathology

I've started filling this in with my own work, where I've made everything available. We should attempt to quantify efforts to make research reproducible/replicable in plant pathology somehow.

From our e-mail string:

The systematic/quantitative review of articles will provide a nice basis for identifying what has been done so far in our field. We may be able to see which fields are more “reproducible” than others and what are the trends. Then, we could provide guidelines for the best practices (tools available, format, etc) with examples and case studies. I could work on something related to meta-analysis, for example.

By the way, it is interesting that meta-analysis (which uses published or unpublished data) has been used in plant pathology during the last 10 years, but the data and code are not being shared as far as I know. An open database that allows others to keep adding data would be very useful.

Ideas for analysis

A few ideas for things to look at in our paper:

  • Changes over time (it is only five years, but there could be a reasonable trend)
  • Reproducibility index by journal; this may be trickier since we may not have enough articles for each journal (see the sketch after this list)
  • Reproducibility index by article type. I think this could work. See #10 for the classification discussion
  • Journal publisher type? Society vs commercial publisher, or some classification of this sort
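
A hedged sketch of the per-journal index, assuming dplyr and a hypothetical score column (e.g., a 0-5 reproducibility rating per article); the toy data are illustrative only.

library(dplyr)

# Toy data; the real table is article_data.csv, with different columns.
article_data <- data.frame(
  journal = c("Phytopathology", "Plant Disease", "Phytopathology"),
  score   = c(2, 0, 3)
)

index_by_journal <- article_data %>%
  group_by(journal) %>%
  summarise(
    n_articles = n(),
    mean_score = mean(score, na.rm = TRUE)
  ) %>%
  arrange(desc(mean_score))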

Fields in `article_data.csv`

There are some fields in the article_data.csv file that are already described elsewhere or can be easily derived from the citation.

However, fields like "country" or "Open_or_Restricted" or "Reproducibility_Instructions" can be interpreted and filled in various ways.

I'd propose the following (see the sketch after this list); tell me if I'm off here, @zachary-foster, since you set them up.

  • country - country of the publisher, e.g. Springer Netherlands, entered as the 3-letter ISO code, NLD
  • Open_or_Restricted - entered as "Open"/"Restricted"
  • Reproducibility_Instructions - entered as "yes"/"no"
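
A hedged sketch of standardising those entries, assuming the countrycode package; the values shown are illustrative.

library(countrycode)

# Convert a publisher's country name to its 3-letter ISO code
countrycode("Netherlands", origin = "country.name", destination = "iso3c")
# [1] "NLD"

# Constrain the categorical fields to the agreed values
open_or_restricted <- match.arg("Open", c("Open", "Restricted"))
reproducibility_instructions <- match.arg("yes", c("yes", "no"))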

Paper title needs to be decided on

An early working title we had was:
"What Does Reproducible Research Mean for Plant Pathology?"

We also had:
"How Can We Open Plant Pathology?"

Currently, it is:
"Insights Into Computational Reproducibility in Plant Pathology and a Way Forward"

The current version is appropriate for a paper, but we're leaning toward a letter to the editor, so this needs to be revisited with an eye toward that format.
