Minor Allele Detection Pipeline

Identifying minor-allele single nucleotide variants (SNVs) in SARS-CoV-2 is essential for understanding the evolution of the virus and has implications for public health, vaccine development, and treatment. An important question about the emergence of new mutations is whether new variants that arise in an individual can be identified before they become fixed in the population. To that end, we developed the MAD (Minor Allele Detection) pipeline in Nextflow to evaluate how well several variant callers (iVar, VarScan, HaplotypeCaller, Mutect2, lofreq, freebayes, and our in-house pipeline timo) identify minority variants and their allele frequencies.

The MAD pipeline is a fully containerized, out-of-the-box solution for evaluating variant callers for any organism. The user provides a set of SNPs for building a mutation model (in VCF format), an aligned BAM file (for modeling), paired-end reads (in FASTQ format), and a reference genome. Given this organism- and sequencer-specific input, MAD builds the appropriate models, simulates SNVs and FASTQ reads, downsamples the reads, performs alignment, deduplication, and variant calling, and produces plots summarizing the results in an HTML report.

Requirements:

Nextflow v20.07.1

Docker:

All required software is packaged in a docker container available at: https://hub.docker.com/r/gencorefacility/mad

Running the pipeline:

Clone the git repository and edit the following parameters in the nextflow.config file:

params.ref: Path to the reference genome FASTA. The BWA index, FASTA index, and Picard reference dictionary must exist in the same directory.

params.fcid: Unique name for this analysis (alphanumeric, no spaces)

params.outdir: Output path

params.mut_model_vcf: Path to VCF file to build mutation model

params.error_model_fq_read1: Read 1 of the paired-end FASTQ reads for the error model

params.error_model_fq_read2: Read 2 of the paired-end FASTQ reads for the error model

params.readsim_model_bam: BAM file for modeling

params.mut_rate: Mutation rate (between 0 and 1)

params.readsim_cov: Simulation coverage

params.readsim_downsample_fracs: Simulation downsampling fractions [random seed, fraction]

params.readsim_allele_fracs: Simulation allele frequencies

process: Scheduler configuration. Replace 'slurm' with your execution scheduler. Resource requirements vary greatly with the input, but reasonable defaults are provided.
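Several of the parameters above carry constraints that are easy to check before launching a run: params.ref needs its index files alongside it, params.fcid must be alphanumeric, and params.mut_rate must lie between 0 and 1. The following is a minimal sketch of such pre-flight checks in POSIX shell. The function names are hypothetical and not part of the MAD repository. It assumes the index was built with `bwa index` and `samtools faidx` (which append their suffixes to the FASTA file name), that the dictionary follows the Picard default of replacing the FASTA extension with `.dict`, and that the bounds 0 and 1 are themselves allowed for the mutation rate.

```shell
#!/bin/sh
# Illustrative pre-flight checks for nextflow.config values.
# These helpers are hypothetical; they are not shipped with MAD.

# check_ref FASTA: succeed only if the index files sit next to the reference.
check_ref() {
    ref="$1"
    status=0
    [ -f "$ref" ] || { echo "missing reference: $ref" >&2; return 1; }
    # Suffixes written by `bwa index` (amb ann bwt pac sa) and `samtools faidx` (fai).
    for ext in amb ann bwt pac sa fai; do
        if [ ! -f "$ref.$ext" ]; then
            echo "missing index file: $ref.$ext" >&2
            status=1
        fi
    done
    # Assumption: dictionary is <basename>.dict (Picard default); <ref>.dict is also accepted.
    dict="${ref%.*}.dict"
    if [ ! -f "$dict" ] && [ ! -f "$ref.dict" ]; then
        echo "missing sequence dictionary: $dict" >&2
        status=1
    fi
    return "$status"
}

# check_fcid NAME: alphanumeric, no spaces, not empty.
check_fcid() {
    case "$1" in
        ''|*[!A-Za-z0-9]*)
            echo "fcid must be alphanumeric with no spaces: '$1'" >&2
            return 1 ;;
    esac
    return 0
}

# check_mut_rate VALUE: plain decimal between 0 and 1 (bounds included by assumption).
check_mut_rate() {
    case "$1" in
        ''|.|*[!0-9.]*|*.*.*)
            echo "mut_rate must be a plain decimal number: '$1'" >&2
            return 1 ;;
    esac
    awk -v r="$1" 'BEGIN { exit !(r >= 0 && r <= 1) }' || {
        echo "mut_rate must be between 0 and 1: $1" >&2
        return 1
    }
}
```

For example, `check_ref /path/to/ref.fa && check_fcid run01 && check_mut_rate 0.05` returns success only when all three checks pass. Scientific notation such as `1e-3` is rejected by this sketch, so write such values out as decimals.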

Run the main.nf script, providing the path to the config and specifying either the -with-docker or the -with-singularity option along with the Docker repository:

nextflow run main.nf -c <path_to_config> -with-docker gencorefacility/mad:1
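The command above shows the Docker form. As a sketch of how the two container options differ, the hypothetical helper below prints the command line for either engine so it can be inspected before running. The `docker://` prefix on the Singularity image is an assumption on our part: it is the usual way to point Singularity at a Docker Hub image, but this README does not state which form MAD expects.

```shell
#!/bin/sh
# mad_cmd ENGINE CONFIG: print the nextflow command for the chosen container engine.
# Hypothetical helper for illustration; not part of the MAD repository.
mad_cmd() {
    engine="$1"
    config="$2"
    case "$engine" in
        docker)
            image="gencorefacility/mad:1" ;;
        singularity)
            # Assumption: the Docker Hub image is referenced via the docker:// scheme.
            image="docker://gencorefacility/mad:1" ;;
        *)
            echo "engine must be 'docker' or 'singularity'" >&2
            return 1 ;;
    esac
    echo "nextflow run main.nf -c $config -with-$engine $image"
}
```

Calling `mad_cmd singularity nextflow.config` from the cloned repository prints the Singularity variant; the printed line can then be pasted into the shell, provided the config path contains no spaces.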

Output:

An HTML report summarizing the results will be stored in params.outdir/out/reports/MAD_report.html
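A quick way to confirm that a run finished is to check that the report exists and is not empty. The helper below is a minimal, hypothetical sketch; it takes the same directory that was set as params.outdir and builds the report path given above.

```shell
#!/bin/sh
# mad_report OUTDIR: print the report path if the file exists and is non-empty.
# Hypothetical helper for illustration; not part of the MAD repository.
mad_report() {
    # Strip one trailing slash so OUTDIR and OUTDIR/ give the same path.
    report="${1%/}/out/reports/MAD_report.html"
    if [ -s "$report" ]; then
        echo "$report"
    else
        echo "report not found (or empty): $report" >&2
        return 1
    fi
}
```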

Contributors:

genericdata, mohammedkhalfan
