tibhannover / bacdiver

Unofficial R client for the DSMZ's Bacterial Diversity Metadatabase (former contact: @katrinleinweber). https://api.bacdive.dsmz.de/client_examples appears to list the official alternatives.

Home Page: https://TIBHannover.GitHub.io/BacDiveR/

License: MIT License

R 98.05% Makefile 1.95%
r microorganism bacterial-database bacteriology webservice-client microbiology biobank r-package rstats bacterial-samples

bacdiver's Introduction

BacDiveR

This R package provided a programmatic interface to the Bacterial Diversity Metadatabase of the DSMZ (German Collection of Microorganisms and Cell Cultures).

As of June 2021, BacDive's "redesign" has rendered this R package inoperable. Apparently, they want you to use one of the clients listed at https://api.bacdive.dsmz.de/client_examples instead.

Old README below

BacDiveR helps you improve your research on bacteria and archaea by providing access to "structured information on [...] their taxonomy, morphology, physiology, cultivation, geographic origin, application, interaction" and more (Söhngen et al. 2016). Specifically, you can:

  • download the BacDive data you need for offline investigation, and

  • document your searches and downloads in .R scripts, .Rmd files, etc.

Thus, BacDiveR can be the basis for a reproducible data analysis pipeline. See TIBHannover.GitHub.io/BacDiveR for more details, /news there for the changelog, and GitHub.com/TIBHannover/BacDiveR for the latest source code.

It was also built to serve as a demonstration object during TIB's "FAIR Data & Software" workshop.

Installation

  1. Because the BacDive Web Service requires registration, please register first and wait for DSMZ staff to grant you access.

  2. Once you have your login credentials, install the latest BacDiveR release from GitHub with: if(!require('devtools')) install.packages('devtools'); devtools::install_github('TIBHannover/BacDiveR').

  3. After installing, follow the instructions on the console to save your login credentials locally and restart R(Studio), or run usethis::edit_r_environ() and ensure it contains the following:

[email protected]
BacDive_password=YOUR_20_char_password
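The environment variables above can be read back in R with Sys.getenv(). Here is a minimal sketch of a credential check; the helper name get_bacdive_credentials() is hypothetical, not part of the package, and only the variable names are taken from the lines above:

```r
# Read the BacDive credentials stored in ~/.Renviron.
# Sys.getenv() returns "" for unset variables, so we can fail
# early with a clear message instead of a cryptic API error.
get_bacdive_credentials <- function() {
  email <- Sys.getenv("BacDive_email")
  password <- Sys.getenv("BacDive_password")
  if (email == "" || password == "") {
    stop("Please set BacDive_email and BacDive_password in ~/.Renviron")
  }
  list(email = email, password = password)
}
```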

In the examples and vignettes, data retrieval will only work if your login credentials are correct in themselves (no typos) and were saved correctly. Console output like "{\"detail\": \"Invalid username/password\"}" or Error: $ operator is invalid for atomic vectors indicates that either the login credentials or the .Renviron file is incorrect.

How to use

There are two main functions: retrieve_data() and retrieve_search_results(). Please click on their names to read their docu, and find real-life examples in the vignettes "BacDive-ing in" and "Pre-Configuring Advanced Searches".

You can also run citation('BacDiveR') in the R console and use its output, because that ensures you cite exactly the version you have installed.

If you want to import this repo's metadata into a reference manager directly, I recommend Zotero and its GitHub translator. Please double-check that the citation refers to the same version number that you ran your analysis with.

When using BibTeX, you may want to try changing the item type to @Software ;-) Support for that is being worked on.

Don't forget to also cite BacDive itself whenever you used their data, regardless of access method.

How to contribute: See CONTRIBUTING.md file.

Known issues: See bugs and ADRs.

Similar tools

These seem to scrape all data instead of retrieving specific datasets.

References

  • Söhngen, Bunk, Podstawka, Gleim, Overmann. 2014. “BacDive — the Bacterial Diversity Metadatabase.” Nucleic Acids Research 42 (D1): D592–D599. doi:10.1093/nar/gkt1058.

  • Söhngen, Podstawka, Bunk, Gleim, Vetcininova, Reimer, Ebeling, Pendarovski, Overmann. 2016. “BacDive – the Bacterial Diversity Metadatabase in 2016.” Nucleic Acids Research 44 (D1): D581–D585. doi:10.1093/nar/gkv983.

  • Reimer, Vetcininova, Carbasse, Söhngen, Gleim, Ebeling, Overmann. 2018. “BacDive in 2019: Bacterial Phenotypic Data for High-Throughput Biodiversity Analysis” Nucleic Acids Research doi:10.1093/nar/gky879.

bacdiver's People

Contributors: axel-klinger, katrinleinweber

bacdiver's Issues

randomise test searches

https://bacdive.dsmz.de/api/bacdive/example uses some specific search terms. If automatic tests use these as well, their internal statistics about popular datasets might be skewed. Maybe they are already, and this fact is accounted for by the DSMZ.

  • ask whether any such statistics are collected

The test search terms could be randomised to avoid this problem: int <- sample(seq(100000, 999999), size = 1); acc <- paste0(paste(sample(LETTERS, size = 2), collapse = ""), int); paste("DSM", round(int / 1000)) or similar.

  • ask for max ranges

This would spread out the "popularity" inflation, but might require fine-tuning the seq ranges. Plus, it would assume continuous numbering on their end.

  • ask whether this is the case
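The randomisation idea above can be sketched as a runnable helper. The function name random_search_terms() and the numeric range are assumptions; the range would need the fine-tuning mentioned above:

```r
# Generate randomised test search terms so automated tests don't
# always hit the same example datasets. The 100000-999999 range is
# a guess and assumes continuous numbering on BacDive's side.
random_search_terms <- function() {
  int <- sample(seq(100000, 999999), size = 1)
  acc <- paste0(paste(sample(LETTERS, size = 2), collapse = ""), int)
  dsm <- paste("DSM", round(int / 1000))
  list(id = int, accession = acc, dsm = dsm)
}
```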

Remove invalid \n in JSON

While implementing #31 and switching from rjson to jsonlite, I noticed that some fields contain insufficiently escaped \n characters. This results in lexical error: invalid character inside string.

@ceb15: Please consider ensuring that those are escaped as \\n, either already in BacDive or (I presume) during JSON serialisation.

(screenshot of the error from 2018-03-20 omitted)

I'll parse them away for now.
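"Parsing them away" could look like the following sketch, which escapes the bare newlines before handing the text to a JSON parser. The helper name sanitise_json() is hypothetical, and the approach assumes the payload contains no structural (between-token) newlines:

```r
# Work around insufficiently escaped newlines inside JSON string
# values: replace each literal line break with the two-character
# escape sequence "\n" before parsing, e.g. with jsonlite::fromJSON().
sanitise_json <- function(json) {
  gsub("\n", "\\n", json, fixed = TRUE)
}
```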

Compare temp data to https://zenodo.org/record/1175609

  • check whether that dataset has different source
    • partially from BacDive => reproduce results
  • write vignette about extracting growth temp from that dataset & through BacDiveR & mention @mengqvist then
    • parse his dataset, try to retrieve same species from BacDive

split retrieve_IDs off from retrieve_data()

Extract to different function? If yes, by scraping IDs from paged URL returns (official examples), or by storing the URLs as intermediate result, plus providing helper functions to narrow that result down to the IDs?

Or, implement as an internal loop-back in retrieve_data(…, searchType = "taxon") based on new parameter taxon_data = TRUE?

  • ask whether only taxon search can return multiple IDs

Write management plans

https://figshare.com/articles/Managing_Research_Software_Development_better_software_better_research/5930662 p24f & http://www.software.ac.uk/software-management-plans

What software will you write?
What will your software do? 
Will your software have a name?
Who are the intended users of your software?
Is it for one type of user or for many?
What expertise is required?
How will you make your software available?
How will your software contribute to research and how will you measure its contribution?

aggregate datasets into useful structure before returning

noticed while working on #16

retrieve_data() currently appends multiple downloads into one continuous list in which the individual datasets can no longer be addressed. We need a data structure that lets the user $-address the datasets and their fields. Ideally, each dataset is referred to by index = bacdive_id. Something like a sparse list-of-lists?

ideas:

  • aggregate JSON strings in character vector, then rjson::fromJSON() them "in-place" or somehow that creates the nested lists "below / as lower hierarchies" of that vector
  • write-out each dataset to a file (kind of a local cache), then maybe concatenate files & re-import as a useful data structure
  • use jsonlite to create 1 dataframe per bacdive_ID, then add those to a list
  • keep on c()ombining downloads, but aggregate into a higher-level list and use an apply variant to extract a field/element from the resulting "megastructure"
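One minimal version of the $-addressable structure asked for above is a named list keyed by BacDive ID. A sketch with made-up data (the field name bacdive_id is taken from the issue text, the helper name aggregate_datasets() is hypothetical):

```r
# Aggregate downloaded datasets into a named list so each one stays
# addressable via its BacDive ID, e.g. datasets[["717"]], instead of
# being flattened into one long, unaddressable list.
aggregate_datasets <- function(downloads) {
  ids <- vapply(downloads, function(d) as.character(d$bacdive_id), character(1))
  stats::setNames(downloads, ids)
}
```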

Make taxon search more prominent?

Assuming the vast majority ("90%") of BacDive users looks up data about a strain, bacdive_ID as the default search may not be as useful.

Maybe rather a retrieve_taxon_data("…", filter_by = c("property_A", "prop_B", "C")) function?
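The filtering part of such a retrieve_taxon_data(..., filter_by = ...) function could work as in this local sketch; the helper name filter_properties() and the dataset structure are assumptions, not BacDive's actual schema:

```r
# Keep only the requested properties from each downloaded dataset,
# as a filter_by parameter might do after retrieval. Requested
# properties that are absent from the dataset are silently dropped.
filter_properties <- function(dataset, filter_by) {
  dataset[intersect(names(dataset), filter_by)]
}
```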
