digitization's Introduction

Scripts for the FLMNH

This repository serves as a collection of scripts, library classes, and other projects developed to help improve the digitization workflow at the Florida Museum of Natural History at UF.

Installing Dependencies:

$ pip3 install --user -r requirements.txt

Scripts

There is a selection of scripts available in the scripts directory. Each has its own CLI structure, so run whichever you need with the --help flag to get started. The table below gives a brief overview of each script:

| Script | Description |
| --- | --- |
| dynaiello.py | A version of the Aiello script with fewer column restrictions. Copies and renames entries based on a CSV file. |
| gene_copy.py | Removes divergent consensus sequences (IBA pipeline) from .fas/.fasta files |
| gene_parser.py | Parses .fa/.fasta files to extract accession numbers and gene names |
| mgcl_tracker.py | Tracks the used catalog numbers in the filesystem against a range/CSV of numbers |
| protein_combine.py | Combines separated protein/nucleotide files into one combined file |
| relocate.py | (deprecated) Relocates 'troublesome' images based on the log output of other scripts |
| suspect_numbers.py | Aggregates 'suspect' catalog numbers in a filesystem |
| unique_values.py | Outputs all the unique values in the columns of a CSV or XLSX file |
| wls.py | (deprecated) Generates a CSV of the specimens in the current working directory |
| wrangler.py | Assigns BOMBID numbers to collection specimens |
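To make the unique_values.py description more concrete, here is a minimal sketch of collecting the unique values in each column of a CSV file. It is not the script's actual implementation: the function name `unique_values_by_column` and the sample data are made up for illustration, and XLSX input is not covered.

```python
import csv
import io


def unique_values_by_column(csv_file):
    """Return {column name: unique values in first-seen order}."""
    reader = csv.DictReader(csv_file)
    uniques = {name: [] for name in (reader.fieldnames or [])}
    seen = {name: set() for name in uniques}
    for row in reader:
        for name in uniques:
            value = row.get(name)
            if value is None:
                # Short rows leave trailing columns unset; skip them.
                continue
            if value not in seen[name]:
                seen[name].add(value)
                uniques[name].append(value)
    return uniques


# Made-up sample data, standing in for a real specimen sheet.
sample = io.StringIO("genus,sex\nPapilio,M\nPapilio,F\nDanaus,M\n")
result = unique_values_by_column(sample)
for column, values in result.items():
    print(f"{column}: {', '.join(values)}")
```

For the real command-line options, run the script itself with the --help flag.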

Digitization Program

Although a little dated, a few major workflow tasks were implemented as libraries used by the wrapper digitization.py file.

Usage:

A text-based list of programs to load is shown on launch. Each program has its own help prompt, which becomes available once the program is selected.
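The launch flow described above could look roughly like the sketch below. This is only an illustration of a text menu with per-program help, not the code in digitization.py: the `PROGRAMS` registry, the help strings, and both function names are hypothetical.

```python
# Hypothetical registry: program name -> help text. The names come from this
# README; the help strings are placeholders, not the real prompts.
PROGRAMS = {
    "Rename": "Rename images that were named by the barcode scanner.",
    "Legacy Upgrade": "Upgrade older file names to the new standard.",
    "Rescale": "Rescale images by a user-provided proportion.",
}


def render_menu(programs):
    """Build the numbered list shown on launch."""
    return "\n".join(
        f"[{index}] {name}" for index, name in enumerate(programs, start=1)
    )


def help_for(programs, choice):
    """Return the help text for a menu choice, or None if it is invalid."""
    names = list(programs)
    try:
        index = int(choice)
    except ValueError:
        return None
    if not 1 <= index <= len(names):
        return None
    return programs[names[index - 1]]


menu = render_menu(PROGRAMS)
print(menu)
print(help_for(PROGRAMS, "2"))
```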

Programs

This is not an exhaustive list:

| Name | Description | Example(s) |
| --- | --- | --- |
| Rename | Replaces the bash scripts previously used for renaming images that were named via the physical barcode scanner. | MGCL 0123456 (2).CR2 --> MGCL 0123456_V.CR2 |
| Legacy Upgrade | Upgrades data to the new naming standards. Legacy Upgrade has already been run on the server data, so current volunteers most likely do not need to run it. | MGCL__0123456__V__M.CR2 --> MGCL_0123456_V.CR2 |
| Aiello | Designed for a specific task and should only be used for it. Parses a specifically formatted Excel sheet to locate file references on the museum server, then copies those files to another location under a different naming scheme for use in a particular project. | |
| Rescale | Rescales images in a given directory (and its subdirectories, if the user asks for that) by a user-provided proportion. This aids the upload of images, since the hi-res images previously took up a large amount of space: file sizes shrink without a noticeable loss of image quality. | |
| Zipper | Analyzes a directory (and its subdirectories, if the user asks for that) and determines how many 1GB zip archives should be created. This also aids the upload of images. Important: the script only looks inside folders named LOW-RES, at all levels, because only the downscaled images are grouped together for upload, as a means of saving cloud storage. | |
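The Legacy Upgrade example and the Zipper description above can be sketched in a few lines. This is an illustration under stated assumptions, not the programs' actual code: the regular expression only covers the single documented legacy name, "1GB" is taken to mean 1 GiB, only files directly inside a LOW-RES folder are counted, and every function name here is hypothetical.

```python
import os
import re
import tempfile

ARCHIVE_BYTES = 1024 ** 3  # assumption: "1GB" means 1 GiB

# Matches only the documented legacy shape, e.g. MGCL__0123456__V__M.CR2.
# The real upgrade rules may cover more cases than this.
LEGACY_NAME = re.compile(r"^(MGCL)_+(\d+)_+([A-Za-z])(?:_+[A-Za-z])?(\.\w+)$")


def upgrade_legacy_name(filename):
    """Collapse repeated underscores and drop the trailing letter field."""
    match = LEGACY_NAME.match(filename)
    if match is None:
        return filename  # leave unrecognized names untouched
    prefix, number, view, extension = match.groups()
    return f"{prefix}_{number}_{view}{extension}"


def low_res_bytes(root):
    """Sum the sizes of files that sit directly in a folder named LOW-RES."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) != "LOW-RES":
            continue
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def archives_needed(total_bytes, archive_bytes=ARCHIVE_BYTES):
    """Smallest number of archives whose combined capacity fits the total."""
    return -(-total_bytes // archive_bytes)  # integer ceiling division


print(upgrade_legacy_name("MGCL__0123456__V__M.CR2"))

# Throwaway directory tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as root:
    low_res = os.path.join(root, "drawer1", "LOW-RES")
    os.makedirs(low_res)
    with open(os.path.join(low_res, "a.jpg"), "wb") as handle:
        handle.write(b"x" * 10)
    with open(os.path.join(root, "drawer1", "b.CR2"), "wb") as handle:
        handle.write(b"x" * 99)  # outside LOW-RES, so it is ignored
    demo_total = low_res_bytes(root)

print(demo_total, archives_needed(demo_total))
```

The archive count here is a lower bound based on total size alone; a real packer that never splits a file across archives may need more.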

Other Projects

digitization's People

Contributors

aaronleopold, dependabot[bot], notoriouseng
