openplantpathology / reproducibility_in_plant_pathology

A systematic/quantitative review of articles that provides a basis for identifying what has been done so far in plant pathology research reproducibility, with suggestions for ways to improve it.

Home Page: https://openplantpathology.github.io/Reproducibility_in_Plant_Pathology

License: Other

TeX 66.05% R 16.80% Dockerfile 0.58% Lua 16.52% Shell 0.05%
plant-pathology reproducible-research reproducible-science open-science r rstats research-compendium

reproducibility_in_plant_pathology's Introduction

Reproducibility in Plant Pathology

Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

This repository contains the data and code for our article:

Sparks, A.H., Del Ponte, E.M., Alves, K. S., Foster, Z., Grünwald, N. J. (2023). Openness and computational reproducibility in plant pathology: where do we stand and a way forward. Phytopathology. https://doi.org/10.1094/PHYTO-10-21-0430-PER

Our pre-print is online on the agriRxiv preprint server:

Sparks, A.H., Del Ponte, E.M., Alves, K. S., Foster, Z., Grünwald, N. J. (2023). Openness and computational reproducibility in plant pathology: where do we stand and a way forward. agriRxiv, Accessed 07 Aug 2024. Online at https://doi.org/10.31220/agriRxiv.2021.00082

The paper is a systematic and quantitative review of articles published in 21 plant pathology journals, spanning five years of publications. It provides a basis for identifying what has been done so far in plant pathology's published research to ensure computational reproducibility. The results show that, as a discipline, plant pathologists are not widely sharing data or code openly, making the work largely unreproducible. Based on these results and our own experiences, we offer suggestions for improving reproducibility, which are not unique to the discipline: following them would allow reviewers to make better suggestions, let readers learn more from the work, and earn authors more citations for their work.

How to cite

Please cite this compendium as:

Sparks, A.H., Del Ponte, E.M., Alves, K. S., Foster, Z., Grünwald, N. J. (2024). Compendium of R code and data for ‘Status and Best Practices for Reproducible Research In Plant Pathology’. Accessed 07 Aug 2024. Online at https://doi.org/10.5281/zenodo.1250664

How to download or install

The R package

This repository is organized as an R package. There is one R function, import_notes(), that is used to build the paper's figures and tables when the file analysis/paper/paper.Rmd is knit. Additionally, a bibliography file of the articles that were examined, "references.bib", and the notes from their evaluation, "Reproducibility_in_plant_pathology_notes.ods", are both located in the inst/extdata directory. We have used the R package structure to help manage dependencies, to take advantage of continuous integration for automated code testing and for file organisation.
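
For orientation, here is a minimal sketch of how a function like import_notes() might locate and read the notes file once the package is installed. This is an illustration only, not the packaged implementation, and the readODS dependency is an assumption.

library(readODS)

import_notes_sketch <- function() {
  # system.file() resolves paths to files shipped in inst/extdata
  # once the package is installed
  notes_file <- system.file(
    "extdata",
    "Reproducibility_in_plant_pathology_notes.ods",
    package = "Reproducibility.in.Plant.Pathology"
  )
  read_ods(notes_file, sheet = 1)
}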

You can download the compendium as a zip from this URL: https://github.com/openplantpathology/Reproducibility_in_Plant_Pathology/archive/main.zip

Or you can install this compendium as an R package, Reproducibility.in.Plant.Pathology, from GitHub with:

if (!require("remotes"))
  install.packages("remotes")
remotes::install_github("openplantpathology/Reproducibility_in_Plant_Pathology"
)

Once the download is complete, open the Reproducibility_in_Plant_Pathology.Rproj in RStudio to begin working with the package and compendium files.

The Docker Instance

Get the latest image from Docker Hub, launch it, and go to localhost:8787 in your browser. Log in with the username rstudio; the password is rstudio.

docker pull adamhsparks/reproducibility_in_plant_pathology
docker run -d -p 8787:8787 adamhsparks/reproducibility_in_plant_pathology

The Paper

The file structure follows a normal R package with one exception: the top-level "/analysis" directory contains the directories and files necessary to re-knit the MS Word document of the paper from an Rmd file, "/analysis/paper/paper.Rmd".

A script, knit_paper.R, located in analysis/paper/, will knit the manuscript and the supplementary materials in a Docker session.
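
As a rough sketch, the knitting step amounts to a call like the one below; knit_paper.R itself is the authoritative version, and this assumes rmarkdown and pandoc are available in the Docker session.

# Hedged sketch of the knitting step; see analysis/paper/knit_paper.R
# for the real script.
rmarkdown::render(
  input = "analysis/paper/paper.Rmd",
  output_format = "all" # render every format declared in the YAML header
)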

Meta

Licensing

Code: MIT. Year: 2024. Copyright holder: Adam H. Sparks.

Data: CC-0; attribution requested in reuse.

Adam H. Sparks, Senior Research Scientist, Farming Systems Innovation, Primary Industries Development, Department of Primary Industries and Regional Development, Level 6.34, 1 Nash St., Perth WA 6000

https://adamhsparks.netlify.app

Code of Conduct

Please note that the Reproducibility.in.Plant.Pathology project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

reproducibility_in_plant_pathology's People

Contributors

adamhsparks, emdelponte, grunwald, zachary-foster


reproducibility_in_plant_pathology's Issues

Getting the references into our tidy data frame in R

I did some checking; we can save ourselves some typing.

Zotero will export a .bib file as a CSV file. We can import that into R and select the columns that we need. That should greatly simplify typing the data in and cut down on possible data entry errors.

JabRef will do this too, but doesn't appear to export the DOI field.
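
The import step could look something like the sketch below, using base R; the file name zotero_export.csv and the exact column names are hypothetical, since Zotero's CSV headers may vary.

# Hedged sketch; the file name and column names are hypothetical.
refs <- read.csv("zotero_export.csv", stringsAsFactors = FALSE)

# Keep only the columns we need for the tidy data frame
refs <- refs[, c("Key", "Author", "Title", "Publication.Title", "DOI")]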

Not building the pkgdown site locally

I get this error when I try to build the site. The website does not build the new Rmd file in the DOC folder.

build_site()
Error in yaml::yaml.load(string, ...) :
Scanner error: while scanning a simple key at line 2, column 1 could not find expected ':' at line 3, column 1
In addition: Warning message:
In strptime(x, fmt, tz = "GMT") :
unknown timezone 'zone/tz/2017c.1.0/zoneinfo/America/Sao_Paulo'
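
That scanner error usually points to a malformed YAML header (e.g., a missing ':') near the top of one of the Rmd files. A hedged sketch for locating the offending file, assuming the rmarkdown package and the doc folder mentioned above:

# Check each Rmd's YAML front matter; report any file that fails to parse.
for (f in list.files("doc", pattern = "[.]Rmd$", full.names = TRUE)) {
  ok <- tryCatch({
    rmarkdown::yaml_front_matter(f)
    TRUE
  }, error = function(e) FALSE)
  if (!ok) message("Malformed YAML header in: ", f)
}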

Removed "page_fees" from evaluation

I removed the page fees field. There was some discussion in the vignette around it, and it was deemed difficult to determine. I'm also not sure how useful it is for our purposes, so I've removed it.

Figures

Figure 2 is cited before Figure 1, and this needs to be changed before submission; figures should be cited in order of appearance. Also, Phytopathology would have figures labeled with letters, such as Figure 1A.

Rmd vs md files

@emdelponte, I've set up the Rmd files to knit GitHub md files; there is no need to delete them.

Rather, we should knit README.Rmd to create the final output, README.md, in the /src directory, showing the lists and other results of our methodology. This way, when you go into the /src directory, GitHub will automatically display that file.

Decide on attributes of papers to record

Here is a start based on our previous discussions, but we should put this in an Rmd.

Paper attributes

  • Raw data accessibility:
    • online
    • publicly accessible (i.e., you don't have to email anyone, have an account anywhere, or pay anything)
    • well annotated so it's understandable independently of the methods/paper
  • Computational methods:
    • publicly accessible (i.e., you don't have to email anyone, have an account anywhere, or pay anything)
    • using open-source, free software
    • all scripted; no manual editing or point-and-click
    • version controlled from the start of the project
    • well annotated so it's understandable independently of the methods/paper

Journal attributes

  • Impact factor
  • Country
  • Page charges
  • Open/restricted access
  • Issues per year
  • Presence/absence of instructions encouraging reproducibility
  • Presence/absence of a supplementary material section

Article writing

@adamhsparks @grunwald @zachary-foster

I did some work on the manuscript Rmd file and committed the changes:

  • Inserted portions of text for the abstract and introduction sections
  • Made and included a diagram in the text using the DiagrammeR package. Anyone can make further changes and check them by running the chunk. The diagram is programmed with Graphviz and is quite easy (see the sketch after this list).
  • Described three levels of reproducibility associated with the diagram
  • Changed the style of the HTML file. The export to a .doc file is not working with the diagram code, so that format was eliminated for now. The export to HTML works fine but is not accessible on GitHub because the docs folder is set to pages.
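
For anyone who hasn't used it, here is a minimal, hypothetical example of the DiagrammeR/Graphviz approach; the actual diagram in the manuscript is more elaborate.

library(DiagrammeR)

# Hypothetical three-node diagram written in Graphviz DOT syntax;
# the manuscript's diagram differs.
grViz("
digraph reproducibility {
  rankdir = LR
  node [shape = box]
  raw_data -> scripted_analysis -> manuscript
}
")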

Branches

I've created branches for everyone following @zachary-foster's lead.

In the three I've created (I've not made any changes to yours, @zachary-foster), I have rearranged the files so that the package passes the build tests. To do that, I moved the article_data.csv file to inst/extdata.

I've also added the missing DOI information to the doi field in this file in the branches I created and subset the files to be looked at by the assignee, e.g. adams-changes contains an article_data.csv that only has my 50 assigned articles in it.

I'll complete my changes here and then merge back with the master branch.

cheers!

Article classifications

Articles will be classified by the evaluator as to whether they are:

  • Fundamental research
  • Applied research
  • Molecular research
  • Combined

Is there another classification that you would like to use, or are these three (or a combination of them) enough?

Paper categorization?

@adamhsparks @grunwald @zachary-foster

I went through my sample of 50 articles again and came up with this categorization specific to papers in plant disease-related journals. You may want to propose new ones, split or combine categories. I started with applied versus fundamental and would like your input also to check whether the subcategories are in the right category.

The idea is to provide specific guidelines with regard to the reproducibility of studies in plant pathology. Is this useful?

Fundamental

  • Basic pathogen biology
  • Plant-pathogen interaction
  • Pathogen population genetics
  • Molecular evolution and ecology
  • Functional genomics
  • Quantitative epidemiology and modeling
  • New taxa (species)

Applied

  • Molecular diagnosis/survey of plant pathogens
  • Disease/pathogen/metabolites survey
  • Pathogen/disease report (morphology/molecular/pathogenicity)
  • Experimental research (treatment effects)
  • Research methods and toolbox (e.g., primer development, selective media, software, etc.)
  • Screening germplasm/cultivar for resistance

Assigned articles, bibliography and notes

I've made a good start on this but am not quite done; I hope to finish this tomorrow (Wed. my time).

All references are in the /data/Journals_for_review.bib file

Our assignments (complete but not in repository yet) will be in Article_notes.csv

I'll left join on DOI for a complete list of notes and assignments. However, a few of the journals do not have DOI values, so I'll have to do some work by hand to join them completely.
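
A hedged sketch of that join, assuming the dplyr and bib2df packages (bib2df is one way to read a .bib file into a data frame) and that both tables share a DOI column:

library(dplyr)
library(bib2df)

# Read the bibliography and the evaluation notes
refs  <- bib2df("data/Journals_for_review.bib")
notes <- read.csv("Article_notes.csv", stringsAsFactors = FALSE)

# A left join keeps every reference and attaches notes where a DOI matches;
# rows without DOIs will still need matching by hand.
combined <- left_join(refs, notes, by = "DOI")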

Computational tool availability and data availability ratings

Thinking back to a thread on Twitter recently, GitHub and the like are not proper repositories. Really, we should rank the availability of scripts and data highest when they are available from libraries (e.g., K-State's KREX, USQ's e-prints, etc.). Zenodo, being run by CERN, might be top tier, but GitHub and FigShare are probably second tier.
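
One hedged way to encode such a ranking in the analysis would be an ordered factor; the tier labels below are illustrative only, not an agreed scheme.

# Illustrative tiers only; the labels and their ordering are open to discussion.
tiers <- c("none", "on request", "GitHub/FigShare", "library/Zenodo")
availability <- factor(
  c("GitHub/FigShare", "none", "library/Zenodo"),
  levels  = tiers,
  ordered = TRUE
)
availability > "on request" # ordered factors support such comparisons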

Review sections

I've made a nascent attempt at outlining the document.

Are the sections I've proposed acceptable or do we need more or less?

Document content

I've started filling in some content in the outline. I'm aware that I'm probably too R-centric; reproducible research does not revolve around R or scientific computing. However, these tools do make it much easier.

Best Practices

We need some good best practices for research from field level to in silico.

Examples/the State of RR in Plant Pathology

I've started filling this in with my own work, where I've made everything available. We should attempt to quantify efforts to make research reproducible/replicable in plant pathology somehow.

From our e-mail string:

The systematic/quantitative review of articles will provide a nice basis for identifying what has been done so far in our field. We may be able to see which fields are more “reproducible” than others and what are the trends. Then, we could provide guidelines for the best practices (tools available, format, etc) with examples and case studies. I could work on something related to meta-analysis, for example.

By the way, it is interesting that meta-analysis (which uses published or unpublished data) has been used in plant pathology during the last 10 years, but the data and code are not being shared as far as I know. An open database that allows others to keep adding data would be very useful.

Ideas for analysis

A few ideas for things to look at in our paper:

  • Changes over time (it is only five years, but there could be a reasonable trend)
  • Reproducibility index by journal; this may be trickier since we may not have enough articles for each journal (see the sketch after this list)
  • Reproducibility index by article type. I think this could work. See #10 for the classification discussion
  • Journal publisher type? Society vs commercial publisher, or some classification of this sort
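
A hedged sketch of the per-journal index, assuming dplyr and a hypothetical score column (e.g., a 0-5 reproducibility rating per article); the toy data are illustrative only.

library(dplyr)

# Toy data; the real table is article_data.csv, with different columns.
article_data <- data.frame(
  journal = c("Phytopathology", "Plant Disease", "Phytopathology"),
  score   = c(2, 0, 3)
)

index_by_journal <- article_data %>%
  group_by(journal) %>%
  summarise(
    n_articles = n(),
    mean_score = mean(score, na.rm = TRUE)
  ) %>%
  arrange(desc(mean_score))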

Fields in `article_data.csv`

There are some fields in the article_data.csv file that are already described elsewhere or can be easily derived from the citation.

However, fields like "country" or "Open_or_Restricted" or "Reproducibility_Instructions" can be interpreted and filled in various ways.

I'd propose the following (see the sketch after this list); tell me if I'm off here, @zachary-foster, since you set them up.

  • country - country of the publisher, e.g. Springer Netherlands, entered as the 3-letter ISO code, NLD
  • Open_or_Restricted - entered as "Open"/"Restricted"
  • Reproducibility_Instructions - entered as "yes"/"no"
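
A hedged sketch of standardising those entries, assuming the countrycode package; the values shown are illustrative.

library(countrycode)

# Convert a publisher's country name to its 3-letter ISO code
countrycode("Netherlands", origin = "country.name", destination = "iso3c")
# [1] "NLD"

# Constrain the categorical fields to the agreed values
open_or_restricted <- match.arg("Open", c("Open", "Restricted"))
reproducibility_instructions <- match.arg("yes", c("yes", "no"))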

Paper title needs to be decided on

An early working title we had was:
"What Does Reproducible Research Mean for Plant Pathology?"

We also had:
"How Can We Open Plant Pathology?"

Currently, it is:
"Insights Into Computational Reproducibility in Plant Pathology and a Way Forward"

The current version is appropriate for a paper, but we're leaning toward a letter to the editor, so this needs to be revisited with an eye toward that format.
