
tpt-siphonaptera's Introduction

tpt-siphonaptera

Code for cleaning and merging Siphonaptera taxonomy for the Terrestrial Parasite Tracker

Taxonomy Cleaning for Terrestrial Parasite Tracker Taxonomy

The R scripts in this repository were designed to clean taxonomic classifications received from various sources for the Terrestrial Parasite Tracker Thematic Collections Network (TPT) Taxonomy Resource. Separate scripts transform the data from each source, and a final script merges the transformed sources for review.

lib_func.r

Loads all libraries and functions needed by the other scripts. Run it before any other script.

Lewis_transform.r

Transforms the BYU Lewis list, as updated and provided by Mike Hastriter, to Darwin Core

Input

File Name | Description
Lewis World Species List MMM DD YYYY.xlsx | Lewis database as provided by Mike Hastriter at BYU
Lewis_reviewed.xlsx | Names from the Lewis_name_review output that have been corrected and are to be returned to the working file
Lewis_removed.xlsx | Names from the Lewis_name_review output that have been removed from the working file
tpt_dwc_template.xlsx | Template (no data) for the Darwin Core file

Output

File Name | Description
Lewis_duplicates.csv | Names removed from the original data because they were duplicates
Lewis_name_review.csv | Names removed from the original data that need review before being added back or removed (see inputs above)
Lewis_non_DwC.csv | Name ID plus all non-Darwin Core fields from the original file
Lewis_DwC.csv | Name ID plus all applicable Darwin Core fields
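To make the four outputs concrete, here is a minimal sketch in Python (the repository's scripts are R) of how one source table could be split. It assumes, purely for illustration, that two rows with the same scientificName and scientificNameAuthorship are duplicates, that the Darwin Core columns are the ones listed in DWC_FIELDS, and that name IDs look like Lewis-1; none of these details come from Lewis_transform.r.

```python
# Hypothetical sketch of the Lewis split. Field names, the duplicate rule,
# and the ID format are illustrative assumptions, not Lewis_transform.r.
DWC_FIELDS = ["scientificName", "scientificNameAuthorship", "taxonRank",
              "taxonomicStatus"]


def split_lewis(rows):
    """Return (duplicates, dwc, non_dwc) lists of dicts.

    A row counts as a duplicate when its name and authorship were already
    seen. Every kept row gets a name ID that links its Darwin Core part to
    its non-Darwin Core part.
    """
    seen = set()
    duplicates, dwc, non_dwc = [], [], []
    for row in rows:
        key = (row.get("scientificName", "").strip(),
               row.get("scientificNameAuthorship", "").strip())
        if key in seen:
            duplicates.append(row)
            continue
        seen.add(key)
        name_id = f"Lewis-{len(dwc) + 1}"  # hypothetical ID scheme
        dwc.append({"taxonID": name_id,
                    **{f: row.get(f, "") for f in DWC_FIELDS}})
        non_dwc.append({"taxonID": name_id,
                        **{k: v for k, v in row.items() if k not in DWC_FIELDS}})
    return duplicates, dwc, non_dwc


rows = [
    {"scientificName": "Hoplopsyllus tenuidigitus",
     "scientificNameAuthorship": "Stewart, 1940", "Comment": "example row"},
    {"scientificName": "Hoplopsyllus tenuidigitus",
     "scientificNameAuthorship": "Stewart, 1940", "Comment": "example row"},
    {"scientificName": "Euhoplopsyllus glacialis",
     "scientificNameAuthorship": "(Taschenberg, 1880)", "Comment": "example row"},
]
duplicates, dwc, non_dwc = split_lewis(rows)
print(len(duplicates), [r["taxonID"] for r in dwc])  # 1 ['Lewis-1', 'Lewis-2']
```

Keeping the shared name ID in both the Darwin Core and non-Darwin Core outputs is what lets the two files be rejoined later.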

NMNH_transform.r

Transforms Smithsonian (NMNH) list of taxa to Darwin Core

Input

File Name | Description
NMNH_Siphonaptera.xlsx | Catalog of fleas from the Smithsonian
NMNH_reviewed.xlsx | Names from the NMNH_need_review output that have been corrected and are to be returned to the working file
tpt_dwc_template.xlsx | Template (no data) for the Darwin Core file

Output

File Name | Description
NMNH_need_review.csv | Names removed from the original data that need review before being added back or removed (see inputs above)
NMNH_non_DwC.csv | Name ID plus all non-Darwin Core fields from the original file
NMNH_DwC.csv | Name ID plus all applicable Darwin Core fields
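The review round trip (names pulled out to NMNH_need_review.csv, corrected, and returned through NMNH_reviewed.xlsx) can be sketched as follows. This is a hypothetical Python illustration, not the logic of NMNH_transform.r: the rule that anything other than a plain binomial needs review, the taxonID join key, and the corrected name are all assumptions.

```python
import re

# Hypothetical review rule: a name needs review unless it is a plain
# binomial ("Genus species"). Not the rule NMNH_transform.r applies.
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z]+")


def split_for_review(rows):
    """Split rows into (working, need_review) by the hypothetical rule."""
    working, need_review = [], []
    for row in rows:
        if BINOMIAL.fullmatch(row["scientificName"]):
            working.append(row)
        else:
            need_review.append(row)
    return working, need_review


def return_reviewed(working, reviewed):
    """Append corrected rows to the working list, refusing taxonID clashes."""
    ids = {row["taxonID"] for row in working}
    clashes = sorted(row["taxonID"] for row in reviewed if row["taxonID"] in ids)
    if clashes:
        raise ValueError(f"reviewed taxonIDs already present: {clashes}")
    return working + reviewed


rows = [
    {"taxonID": "NMNH-1", "scientificName": "Hoplopsyllus foxi"},
    {"taxonID": "NMNH-2", "scientificName": "Hoplopsyllus sp."},
]
working, need_review = split_for_review(rows)
# After an expert corrects the flagged name (value here is illustrative):
reviewed = [{"taxonID": "NMNH-2", "scientificName": "Hoplopsyllus glacialis"}]
working = return_reviewed(working, reviewed)
print([r["taxonID"] for r in working])  # ['NMNH-1', 'NMNH-2']
```

Refusing ID clashes guards against a reviewed name being added back twice when the script is re-run.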

FMNH_transform.r

Transforms Field Museum (FMNH) list of taxa to Darwin Core

Input

File Name | Description
FMNH_Siphonaptera.xlsx | List of flea names from the Field Museum
FMNH_reviewed.xlsx | Names from the FMNH_need_review output that have been corrected and are to be returned to the working file
tpt_dwc_template.xlsx | Template (no data) for the Darwin Core file

Output

File Name | Description
FMNH_need_review.csv | Names removed from the original data that need review before being added back or removed (see inputs above)
FMNH_non_DwC.csv | Name ID plus all non-Darwin Core fields from the original file
FMNH_DwC.csv | Name ID plus all applicable Darwin Core fields

CoL_transform.r

Transforms Catalogue of Life (CoL) download to Darwin Core

Input

File Name | Description
CoL_DwC.xlsx | Flea names from the Catalogue of Life download
tpt_dwc_template.xlsx | Template (no data) for the Darwin Core file

Output

File Name | Description
CoL_DwC.csv | Name ID plus all applicable Darwin Core fields
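Conforming a download to the empty tpt_dwc_template.xlsx amounts to reindexing each row onto the template's columns. A minimal Python sketch, assuming a hypothetical six-column template and CSV output; the real template's columns are not listed in this README.

```python
import csv
import io

# Hypothetical template header; the real columns come from tpt_dwc_template.xlsx.
TEMPLATE = ["taxonID", "scientificName", "scientificNameAuthorship",
            "taxonRank", "taxonomicStatus", "acceptedNameUsageID"]


def to_template(rows, template=TEMPLATE):
    """Return CSV text with exactly the template's columns, in order.

    Columns missing from a row are written empty (restval=""); columns
    not in the template are dropped (extrasaction="ignore").
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=template, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_template([{"taxonID": "CoL-1",
                    "scientificName": "Euhoplopsyllus glacialis",
                    "sourceExtra": "dropped"}]))
```

Driving the column list from the template, rather than from each source, keeps every *_DwC.csv file aligned for the merge step.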

merge_taxotools.r

Transforms the Global Biodiversity Information Facility (GBIF) download and all other Darwin Core files to taxotools format, then merges them and generates a checklist for expert review

Input

File Name | Description
Lewis_DwC.csv | Output of Lewis_transform.r
NMNH_DwC.csv | Output of NMNH_transform.r
FMNH_DwC.csv | Output of FMNH_transform.r
CoL_DwC.csv | Output of CoL_transform.r
GBIF_Siphonaptera.xlsx | Flea names from the GBIF download (already in Darwin Core format, but still lightly transformed in this script)

Output

File Name | Description
problems.csv | Names that could not be merged and need review
taxo_siphonaptera.csv | Merged list of names
Flea_taxolist.html | Checklist of merged names for expert review
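One way to picture the merge and the problems.csv split is to group names from every source and flag groups whose sources disagree. The Python sketch below is an illustration under stated assumptions, not the taxotools algorithm: it matches on whitespace- and case-normalized scientificName only, ignores authorship, treats disagreement about the accepted name as the only kind of "problem", and uses the hypothetical source labels source_a and source_b.

```python
from collections import defaultdict


def normalize(name):
    """Collapse whitespace and case so trivially different spellings match."""
    return " ".join(name.split()).lower()


def merge_sources(sources):
    """Merge {source_label: [row, ...]} into (merged, problems).

    Rows are grouped by normalized scientificName. A group is a problem
    when its sources disagree about the accepted name (an assumed rule).
    """
    groups = defaultdict(list)
    for source, rows in sources.items():
        for row in rows:
            groups[normalize(row["scientificName"])].append((source, row))

    merged, problems = [], []
    for key in sorted(groups):
        entries = groups[key]
        # A row without acceptedName is treated as accepted under its own name.
        accepted = {normalize(r.get("acceptedName") or r["scientificName"])
                    for _, r in entries}
        record = {"scientificName": entries[0][1]["scientificName"],
                  "sources": sorted({s for s, _ in entries})}
        if len(accepted) == 1:
            merged.append(record)
        else:
            problems.append(record)
    return merged, problems


sources = {
    "source_a": [{"scientificName": "Euhoplopsyllus glacialis"},
                 {"scientificName": "Hoplopsyllus tenuidigitus",
                  "acceptedName": "Euhoplopsyllus glacialis"}],
    "source_b": [{"scientificName": "Euhoplopsyllus  glacialis"},
                 {"scientificName": "Hoplopsyllus tenuidigitus",
                  "acceptedName": "Euhoplopsyllus foxi"}],
}
merged, problems = merge_sources(sources)
print([p["scientificName"] for p in problems])  # ['Hoplopsyllus tenuidigitus']
```

The conflicting row mirrors the kind of disagreement traced in the GBIF issue below, where the same name has been placed under both E. foxi and E. glacialis.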

arctos_upload_transform.r

Transforms Darwin Core files to the Arctos hierarchical tool upload format (the transform will be written once the final list is available)

Input

File Name | Description
Arctos_upload.csv | Template (no data) for the Arctos upload

Output

File Name Description

Usage

Data from a source may need to be run through the appropriate script several times. Any change to a primary source requires re-running the appropriate script and then re-running the merge.
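The re-run rule can be written down as a small plan builder. This Python sketch uses the script names from this README; the ordering (lib_func.r first, the changed sources' transforms next, merge_taxotools.r last) is inferred from the script descriptions above and is not a script in the repository.

```python
# Script names come from this README; the plan builder itself is a sketch.
TRANSFORMS = {
    "Lewis": "Lewis_transform.r",
    "NMNH": "NMNH_transform.r",
    "FMNH": "FMNH_transform.r",
    "CoL": "CoL_transform.r",
    "GBIF": None,  # GBIF is transformed inside merge_taxotools.r
}


def rerun_plan(changed_sources):
    """Return the scripts to run, in order, after the given sources change."""
    plan = ["lib_func.r"]  # libraries and functions load first
    for source in changed_sources:
        script = TRANSFORMS[source]  # KeyError flags an unknown source
        if script is not None and script not in plan:
            plan.append(script)
    plan.append("merge_taxotools.r")  # any change requires a new merge
    return plan


print(rerun_plan(["NMNH", "GBIF"]))
# ['lib_func.r', 'NMNH_transform.r', 'merge_taxotools.r']
```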

tpt-siphonaptera's People

Contributors

jegelewicz

Forkers

vijaybarve

tpt-siphonaptera's Issues

Names in GBIF

I wanted to give a concrete example of what removing the type checklist from the GBIF backbone has done.

The taxon in question is Hoplopsyllus tenuidigitus Stewart, 1940

When this taxon was part of the backbone, everything worked as expected. Occurrences identified as Hoplopsyllus tenuidigitus could be found under that name, and because the name was recorded as a synonym of Euhoplopsyllus glacialis (Taschenberg, 1880), they were interpreted as that species and could also be found using the currently accepted name. Everyone was happy.

Now that Hoplopsyllus tenuidigitus Stewart, 1940 has been deleted, occurrences with this identification are interpreted as the genus Hoplopsyllus Baker, 1905, which has no association with the currently accepted name Euhoplopsyllus glacialis (Taschenberg, 1880), so they will not be discovered by anyone searching under that name.
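The effect of the deletion can be reproduced with a toy name matcher. This Python sketch is hypothetical: it is not GBIF's matching algorithm, the three-entry backbone is made up for illustration, and authorships are omitted. It only shows why falling back to the genus breaks discovery under the accepted name.

```python
def interpret(verbatim, backbone):
    """Return (matched_name, accepted_name) for a verbatim identification.

    Try the full name first; if it is absent, fall back to the genus
    (first word). backbone maps name -> accepted name, or None when the
    name is itself accepted.
    """
    for candidate in (verbatim, verbatim.split()[0]):
        if candidate in backbone:
            return candidate, backbone[candidate] or candidate
    return None, None


def search(accepted_name, identifications, backbone):
    """Identifications that a search on accepted_name would discover."""
    return [v for v in identifications
            if interpret(v, backbone)[1] == accepted_name]


# Toy backbones before and after the synonym was deleted (illustrative only).
before = {
    "Hoplopsyllus tenuidigitus": "Euhoplopsyllus glacialis",
    "Euhoplopsyllus glacialis": None,
    "Hoplopsyllus": None,
}
after = {k: v for k, v in before.items() if k != "Hoplopsyllus tenuidigitus"}

ids = ["Hoplopsyllus tenuidigitus", "Euhoplopsyllus glacialis"]
print(search("Euhoplopsyllus glacialis", ids, before))
# ['Hoplopsyllus tenuidigitus', 'Euhoplopsyllus glacialis']
print(search("Euhoplopsyllus glacialis", ids, after))
# ['Euhoplopsyllus glacialis']
```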

Searching occurrences for Hoplopsyllus tenuidigitus Stewart, 1940 with "Search all fields" does bring up the record, but it is buried among many other records also interpreted as Hoplopsyllus Baker, 1905.

I thought that "limiting my search to this taxon only" might help, but it just removes the Hoplopsyllus tenuidigitus Stewart, 1940 records, and a scientific name search for Hoplopsyllus tenuidigitus Stewart, 1940 finds nothing. It seems the only way to reach records with a verbatim identification of Hoplopsyllus tenuidigitus Stewart, 1940 is to know the GBIF taxon ID beforehand and put it in the URL.

The end result is that records that would previously have been discovered now are not.

EXCEPT

Tracing the history of the name Hoplopsyllus tenuidigitus Stewart, 1940 with Google reveals a tangled web:

1940 - Described by Stewart (place of publication unknown)

1940-41 - Evidence of the name found in Ten papers on WESTERN FLEAS in which are erected two new genera and fourteen new species

1953 - Listed as a synonym of Hoplopsyllus foxi Ewing, 1924 in A Synopsis of North American Fleas, North of Mexico, and Notice of a Supplementary Index

1967 - Listed as a synonym of Hoplopsyllus (Euhoplopsyllus) glacialis foxi Ewing, 1924 in an NIH Bulletin

1996-97 - Listed as a synonym of Euhoplopsyllus foxi Ewing, 1924 in Nomina Insecta Nearctica

2021 - Listed as a synonym of Euhoplopsyllus glacialis (Taschenberg, 1880) in GBIF

So I cannot really trace what happened to this name, or why, without much more time and access to the publications, and I cannot say for certain that Hoplopsyllus tenuidigitus Stewart, 1940 is indeed currently represented by Euhoplopsyllus glacialis (Taschenberg, 1880). BUT I do believe these published associations should be recorded if anyone is going to discover specimens of interest using any of the names that might be associated with a name used as an identification.
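Recording the published associations from the timeline as data would let a search expand from any one name to all the names linked to it. A minimal Python sketch, with authorships omitted and an assumed undirected, transitive expansion rule. Transitive expansion can over-reach when later works split names that earlier works lumped, so it is better suited to widening a search than to asserting synonymy.

```python
from collections import defaultdict, deque

# Published associations for Hoplopsyllus tenuidigitus Stewart, 1940, as
# listed in the timeline above: (year, name, associated name).
ASSOCIATIONS = [
    ("1953", "Hoplopsyllus tenuidigitus", "Hoplopsyllus foxi"),
    ("1967", "Hoplopsyllus tenuidigitus",
     "Hoplopsyllus (Euhoplopsyllus) glacialis foxi"),
    ("1996-97", "Hoplopsyllus tenuidigitus", "Euhoplopsyllus foxi"),
    ("2021", "Hoplopsyllus tenuidigitus", "Euhoplopsyllus glacialis"),
]


def build_graph(associations):
    """Undirected name graph: each association links two names both ways."""
    graph = defaultdict(set)
    for _year, a, b in associations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def expand(name, graph):
    """All names reachable from name (breadth-first), including itself."""
    seen, queue = {name}, deque([name])
    while queue:
        for other in graph.get(queue.popleft(), ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


graph = build_graph(ASSOCIATIONS)
print(sorted(expand("Euhoplopsyllus glacialis", graph)))
```

A search for the accepted name would then also match occurrences identified as Hoplopsyllus tenuidigitus, which is the discovery path the backbone deletion removed.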
