

Giraffe Mapper Evaluation and Application Scripts

This repository contains scripts used to reproduce our work with the new Giraffe short read mapper in vg.

Workflow Overview

The scripts expect to be run in roughly this order:

Finding Files Used

If you do not have access to UCSC's internal AWS systems, you will probably not be able to access many of the files the scripts use at their given paths. Public archived copies of the data should be available via UCSC and via Zenodo with preregistered DOI 10.5281/zenodo.4721495.
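If you are replicating without UCSC access, the Zenodo DOI is the main entry point. Below is a minimal Python sketch, using only the standard library, that turns that DOI into the DOI resolver URL and the Zenodo REST API record URL. It assumes the number embedded in a Zenodo-minted DOI is the Zenodo record ID, which may name a specific version rather than the latest one. The helper names are hypothetical, and the sketch makes no assumptions about the layout of the JSON the API returns.

```python
import json
import re
import urllib.request

# DOI given in this README for the public archive.
DOI = "10.5281/zenodo.4721495"


def zenodo_urls(doi: str) -> dict:
    """Derive the DOI resolver URL and the Zenodo REST API record URL (hypothetical helper)."""
    match = re.fullmatch(r"10\.5281/zenodo\.(\d+)", doi)
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    record_id = match.group(1)
    return {
        "doi_url": f"https://doi.org/{doi}",
        "api_url": f"https://zenodo.org/api/records/{record_id}",
    }


def fetch_record(doi: str = DOI, timeout: float = 30.0) -> dict:
    """Download the record's metadata as parsed JSON (requires network access)."""
    with urllib.request.urlopen(zenodo_urls(doi)["api_url"], timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for label, url in zenodo_urls(DOI).items():
        print(label, url)
```

Opening the `doi_url` in a browser is usually the simplest way to browse the files. `fetch_record` is only useful if you want to script the download yourself.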

Replication Considerations

Note that the top-level workflows are not automated. Within each section, you will have to manually prepare the environment for each script and then run it. Some scripts expect to run locally with vg or snakemake installed and with sufficient memory and scratch space. Some expect access to a Kubernetes cluster, and some expect to be launched on a Toil-managed autoscaling Mesos or Kubernetes cluster. We provide hints on how to set up such environments, but not a full tutorial. Additionally, scripts that launch asynchronous Kubernetes jobs do not include code to wait for the jobs to complete; you must provide that monitoring yourself.
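The job-launching scripts return as soon as their Kubernetes Jobs are submitted, so something else has to watch those Jobs. Below is one possible Python sketch. It assumes `kubectl` is installed and configured for your cluster, and that you know the names of the Jobs a script created. The Job names, namespace, poll interval, and timeout are hypothetical. The sketch reads each Job's standard `Complete` and `Failed` status conditions and relies on nothing specific to these scripts.

```python
import json
import subprocess
import time
from typing import Callable, Dict, Iterable, Optional


def kubectl_job_state(name: str, namespace: Optional[str] = None) -> str:
    """Return 'succeeded', 'failed', or 'running' for one Kubernetes Job, via kubectl."""
    cmd = ["kubectl", "get", "job", name, "-o", "json"]
    if namespace:
        cmd += ["--namespace", namespace]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    job = json.loads(out)
    # A finished Job carries a Complete or Failed condition whose status is "True".
    for cond in job.get("status", {}).get("conditions", []) or []:
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return "succeeded"
        if cond.get("type") == "Failed":
            return "failed"
    return "running"


def wait_for_jobs(
    names: Iterable[str],
    get_state: Callable[[str], str] = kubectl_job_state,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 48 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until every named Job has finished; return each Job's final state."""
    pending = set(names)
    final: Dict[str, str] = {}
    waited = 0.0
    while pending:
        for name in sorted(pending):
            state = get_state(name)
            if state != "running":
                final[name] = state
        pending.difference_update(final)  # drop Jobs that have finished
        if not pending:
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"jobs still running: {sorted(pending)}")
        sleep(poll_seconds)
        waited += poll_seconds
    return final
```

To target a specific namespace, pass `functools.partial(kubectl_job_state, namespace="your-namespace")` as `get_state`. Running `kubectl wait --for=condition=complete` with a single condition would be shorter. However, it waits only for success, so a failed Job would block until the timeout. Polling for both conditions avoids that.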

We provide scripts as close as possible to what we actually ran; they will not be fully portable to your environment without modification. If you do not have access to UCSC's AWS storage buckets (such as s3://vg-k8s or s3://vg-data), or if you want to avoid overwriting the original analysis artifacts, you will have to adapt some scripts to point at where you intend to keep the artifacts for your replication of the analysis. Additionally, scripts that kick off Kubernetes jobs may need to be adapted to reference your Kubernetes environment's AWS credential secrets or namespace names.
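One way to repoint the scripts at your own storage is to rewrite the UCSC bucket prefixes named above directly in the script text. Below is a minimal Python sketch of that approach. The destination buckets in `BUCKET_MAP` are hypothetical placeholders. The sketch only handles literal `s3://` prefixes, so paths assembled from variables still need manual editing, as do credential secrets and namespace names in Kubernetes job specs.

```python
import re

# Hypothetical mapping from the original UCSC buckets to buckets you control.
BUCKET_MAP = {
    "s3://vg-k8s": "s3://my-replication-bucket/vg-k8s",
    "s3://vg-data": "s3://my-replication-bucket/vg-data",
}


def rewrite_buckets(text: str, bucket_map: dict) -> str:
    """Replace whole-bucket S3 prefixes, leaving similarly named buckets untouched."""
    # Longest prefixes first; the lookahead rejects matches that continue with a
    # bucket-name character (e.g. s3://vg-data-other is a different bucket).
    alternation = "|".join(
        re.escape(prefix) for prefix in sorted(bucket_map, key=len, reverse=True)
    )
    pattern = re.compile(f"({alternation})(?![a-z0-9.\\-])")
    return pattern.sub(lambda m: bucket_map[m.group(1)], text)


if __name__ == "__main__":
    line = "aws s3 cp s3://vg-k8s/users/example/input.gam ."
    print(rewrite_buckets(line, BUCKET_MAP))
```

Running this over copies of the scripts, rather than editing them in place, keeps the originals available for comparison.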

The scripts provided here access the Internet, and invoke other software that accesses the Internet, to download code and container images. Our code and data archive includes snapshots of code and container images. However, we have not done the engineering work in our software stack that would let those snapshots substitute for the Internet locations where our scripts, and the software they invoke, expect to find things. Consequently, if, say, quay.io stops hosting the container images we used for free, the scripts are likely to stop working as written. Additionally, while we provide snapshots of the containers and software we produced for this work, we have not provided snapshots of other containers that our workflows use (for example, aslethalfang/tabix:1.7). If these containers cease to be retrievable (for example, if they become old and Docker Hub deletes them due to inactivity, or if new authentication requirements apply to accessing them), then these scripts will stop working as written.
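Third-party images such as aslethalfang/tabix:1.7 are not in the archive, so you may want to snapshot them yourself while they are still retrievable. Below is a minimal Python sketch that builds the standard `docker pull` and `docker save -o` commands for a list of image names you supply, and runs them when `dry_run=False`. The output directory and helper names are hypothetical, and the list of images to save is yours to compile.

```python
import re
import subprocess
from pathlib import Path
from typing import Iterable, List


def archive_commands(image: str, out_dir: Path) -> List[List[str]]:
    """Build the commands that pull one image and save it to a tarball (hypothetical helper)."""
    # Turn "user/name:tag" into a filesystem-safe file name.
    safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", image)
    tarball = out_dir / f"{safe_name}.tar"
    return [
        ["docker", "pull", image],
        ["docker", "save", "-o", str(tarball), image],
    ]


def archive_images(images: Iterable[str], out_dir: Path, dry_run: bool = True) -> List[List[str]]:
    """Return all commands; execute them only when dry_run is False."""
    commands = [cmd for image in images for cmd in archive_commands(image, out_dir)]
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in archive_images(["aslethalfang/tabix:1.7"], Path("image-archive")):
        print(" ".join(cmd))
```

The tarballs can later be restored with `docker load -i <file>`. As noted above, the scripts will still reference the original image names and registries. Whether a loaded copy is used instead of a fresh pull therefore depends on the image pull policy of whatever runs the container.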


Contributors

adamnovak, xchang1, jmonlong, cmarkello, jonassibbesen
