

NGS_docker

Playing with Docker and trying to build a variant calling pipeline inspired by Omics Pipe and NGSeasy.

NGS_docker is a dockerized NGS pipeline for variant calling. This program includes GATK, which is only free for academic and non-profit use. For details see: https://www.broadinstitute.org/gatk/about/#licensing

TODO: build gatk from https://hub.docker.com/r/biodckrdev/gatk/~/dockerfile/ instead of using ADD

design notes

  • NGS_docker is designed to be a readable, lightweight variant calling pipeline. wgs_pipe.py is the main entry point of the pipeline. Tasks are executed and logged by the run_task decorator, and additional parameters can be added in prog_cfg.py. Task functions all follow the same format, def foo(args, param_dict): ... return cmd, output_file, where cmd is the command to be executed by run_task and output_file is the file whose existence is checked afterwards.
  • All programs, such as BWA and GATK, and the related reference data are dockerized.
  • The structure of the output directory is:
rootdir/
   samplename/
       /samplename.vcf #final result
       /log/{samplename}.std.txt # put all program output here
       /log/{samplename}.err.txt # put error msg here
       /log/{samplename}.run.txt # put pipeline running status here
       /tmp # all intermediate files
       /tmp/cache_dict.pkl 
       /report 
           /variantanno # variant annotation result by snpEff
           /fastqc # quality control report by fastqc
  • cache_dict.pkl is a pickled dictionary that maps each command to its output file (command: output_file) and is updated after each task finishes successfully. NGS_docker checks the return status of each task and deletes the output_file if the task fails. A task is therefore skipped when its output file exists and its command is the same as the one stored in cache_dict.pkl.
  • All reference data is named by its version (e.g. hg19), stored in /ref, and mounted by tasks with --volumes-from {name}.
  • rootdir is a host directory; its subdirectory rootdir/samplename is used to store the output. This directory is mounted as /out_dir and set as the working directory for all tasks.
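
The notes above can be sketched roughly as follows. This is a minimal illustration written for this README, not the code in wgs_pipe.py: the status strings returned by the wrapper, the "tmp_dir" key in args, and the touch_output and docker_cmd helpers are all assumptions made for the example. Only the (args, param_dict) -> (cmd, output_file) format, the cache_dict.pkl behaviour, the /out_dir mount and the --volumes-from option come from the notes.

```python
import functools
import os
import pickle
import shlex
import subprocess
import sys
import tempfile


def run_task(func):
    """Run the command built by a task function, skipping it when cached."""
    @functools.wraps(func)
    def wrapper(args, param_dict):
        cmd, output_file = func(args, param_dict)
        # Assumption: args is a dict carrying the sample's tmp directory.
        cache_path = os.path.join(args["tmp_dir"], "cache_dict.pkl")
        cache = {}
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as fh:
                cache = pickle.load(fh)
        # Skip when the same command already produced this output file.
        if cache.get(cmd) == output_file and os.path.exists(output_file):
            return "skipped"
        status = subprocess.call(shlex.split(cmd))
        if status != 0:
            # A failed task must not leave a partial output file behind.
            if os.path.exists(output_file):
                os.remove(output_file)
            return "failed"
        # Record the command only after it finished successfully.
        cache[cmd] = output_file
        with open(cache_path, "wb") as fh:
            pickle.dump(cache, fh)
        return "done"
    return wrapper


@run_task
def touch_output(args, param_dict):
    """Hypothetical task: write a small file with the Python interpreter."""
    output_file = os.path.join(args["tmp_dir"], "example.txt")
    code = "open({!r}, 'w').write({!r})".format(output_file, param_dict["text"])
    cmd = "{} -c {}".format(shlex.quote(sys.executable), shlex.quote(code))
    return cmd, output_file


def docker_cmd(image, tool_cmd, sample_dir, ref_container):
    """Hypothetical helper: wrap a tool command in `docker run`.

    sample_dir (rootdir/samplename on the host) is mounted as /out_dir and
    used as the working directory; reference data comes from ref_container.
    """
    return "docker run --rm --volumes-from {} -v {}:/out_dir -w /out_dir {} {}".format(
        shlex.quote(ref_container), shlex.quote(sample_dir),
        shlex.quote(image), tool_cmd)
```

Calling touch_output twice with the same param_dict runs the command once and skips it the second time; changing param_dict changes the command string, so the task runs again. A real task would return a string built by something like docker_cmd instead of a local Python command.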

Contributors

luyitian
