strelkaSomatic

Strelka variant caller in somatic mode

Overview

Dependencies

Usage

Cromwell

    java -jar cromwell.jar run strelkaSomatic.wdl --inputs inputs.json

Inputs

Required workflow parameters:

| Parameter | Value | Description |
|---|---|---|
| tumorBam | File | Input BAM file with tumor data |
| tumorBai | File | BAM index file for tumor data |
| normalBam | File | Input BAM file with normal data |
| normalBai | File | BAM index file for normal data |
| reference | String | Reference assembly id |

Optional workflow parameters:

| Parameter | Value | Default | Description |
|---|---|---|---|
| bedFile | String? | None | BED file designating regions to process |
| numChunk | Int? | None | If BED file given, number of chunks in which to split each chromosome |
| outputFileNamePrefix | String | "strelkaSomatic" | Prefix for output files |
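
As an illustration of how the workflow parameters above map onto the `inputs.json` file passed to Cromwell, here is a minimal sketch that builds one in Python. It assumes the workflow is named `strelkaSomatic`, so that each key carries that prefix. Every path, the reference id `hg38`, and the helper name `build_inputs` are hypothetical placeholders, not values taken from this project.

```python
import json


def build_inputs(tumor_bam, normal_bam, reference, bed_file=None, num_chunk=None):
    """Return a dict of Cromwell inputs keyed by '<workflow>.<parameter>'."""
    prefix = "strelkaSomatic"  # assumed workflow name
    inputs = {
        f"{prefix}.tumorBam": tumor_bam,
        f"{prefix}.tumorBai": tumor_bam + ".bai",    # assumed index naming
        f"{prefix}.normalBam": normal_bam,
        f"{prefix}.normalBai": normal_bam + ".bai",  # assumed index naming
        f"{prefix}.reference": reference,
    }
    # Optional parameters are only written when a value is supplied.
    if bed_file is not None:
        inputs[f"{prefix}.bedFile"] = bed_file
    if num_chunk is not None:
        inputs[f"{prefix}.numChunk"] = num_chunk
    return inputs


# Hypothetical example values.
example_inputs = build_inputs(
    "/data/tumor.bam", "/data/normal.bam", "hg38",
    bed_file="/data/targets.bed", num_chunk=4,
)
print(json.dumps(example_inputs, indent=2))
```

Saving the printed JSON as `inputs.json` gives a file in the shape the Cromwell command in the Usage section expects.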

Optional task parameters:

| Parameter | Value | Default | Description |
|---|---|---|---|
| splitIntervals.gatk | String | "$GATK_ROOT/bin/gatk" | GATK executable path |
| splitIntervals.splitIntervalsExtraArgs | String? | None | Additional arguments for the 'gatk SplitIntervals' command |
| splitIntervals.nonRefModules | String | "gatk/4.1.2.0" | Environment modules other than the genome reference |
| splitIntervals.memory | Int | 32 | Memory allocated for job |
| splitIntervals.overhead | Int | 8 | Memory overhead for running on a node |
| splitIntervals.timeout | Int | 72 | Hours before task timeout |
| convertIntervalsToBed.modules | String | "python/3.7" | Environment modules |
| convertIntervalsToBed.memory | Int | 16 | Memory allocated for job |
| convertIntervalsToBed.timeout | Int | 4 | Hours before task timeout |
| configureAndRunParallel.nonRefModules | String | "python/2.7 samtools/1.9 strelka/2.9.10" | Environment module names other than genome reference |
| configureAndRunParallel.jobMemory | Int | 16 | Memory allocated for job |
| configureAndRunParallel.threads | Int | 4 | Number of threads for processing |
| configureAndRunParallel.timeout | Int | 4 | Hours before task timeout |
| snvsVcfGather.modules | String | "gatk/4.1.2.0" | Environment module names and version to load (space separated) before command execution |
| snvsVcfGather.gatk | String | "$GATK_ROOT/bin/gatk" | GATK to use |
| snvsVcfGather.memory | Int | 16 | Memory allocated for job |
| snvsVcfGather.timeout | Int | 12 | Hours before task timeout |
| indelsVcfGather.modules | String | "gatk/4.1.2.0" | Environment module names and version to load (space separated) before command execution |
| indelsVcfGather.gatk | String | "$GATK_ROOT/bin/gatk" | GATK to use |
| indelsVcfGather.memory | Int | 16 | Memory allocated for job |
| indelsVcfGather.timeout | Int | 12 | Hours before task timeout |

Outputs

| Output | Type | Description | Labels |
|---|---|---|---|
| snvsVcf | File | VCF file with SNVs, .gz compressed | vidarr_label: snvsVcf |
| indelsVcf | File | VCF file with indels, .gz compressed | vidarr_label: indelsVcf |
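
For a quick sanity check of the compressed outputs, the following sketch counts the variant records in a gzip-compressed VCF. The function name `count_records` and the demo file name are hypothetical. The small file written here only stands in for a real `snvsVcf` or `indelsVcf` output so that the example runs on its own.

```python
import gzip
import os
import tempfile


def count_records(vcf_gz_path):
    """Count data lines (non-header, non-empty) in a gzip-compressed VCF."""
    with gzip.open(vcf_gz_path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


# Tiny stand-in file so the example runs without real workflow output.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.snvs.vcf.gz")  # hypothetical file name
with gzip.open(demo_path, "wt") as handle:
    handle.write("##fileformat=VCFv4.1\n")
    handle.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    handle.write("chr1\t100\t.\tA\tT\t.\tPASS\t.\n")
    handle.write("chr1\t250\t.\tG\tC\t.\tPASS\t.\n")

print(count_records(demo_path))
```

Pointing `count_records` at an actual output file instead of the demo path gives the number of calls in that file.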

Commands

This section lists the command(s) run by the strelkaSomatic workflow.

  • Running strelkaSomatic

Main task, configuring and running Strelka2

 	set -eo pipefail
 
 	~{writeBed}
 	~{indexFeatureFile}
 
 	configureStrelkaSomaticWorkflow.py \
 	--normalBam ~{normalBam} \
 	--tumorBam ~{tumorBam} \
 	--referenceFasta ~{refFasta} \
 	~{regionsBedArg} \
 	--runDir .
 
 	./runWorkflow.py -m local -j ~{threads}

Converting interval files to .bed

BED files use 0-based, half-open coordinates, while GATK interval files are 1-based and inclusive, so the start coordinate of each interval is decremented by one during conversion.

         python3 <<CODE
         import os, re
         intervalFiles = re.split(",", "~{sep=',' intervalFiles}")
         for intervalFile in intervalFiles:
             items = re.split("\.", os.path.basename(intervalFile))
             items.pop() # remove .interval_list suffix
             bedName = ".".join(items)+".bed"
             with open(intervalFile, 'r') as inFile, open(bedName, 'w') as outFile:
                 for line in inFile:
                     if not re.match("@", line): # omit the GATK header
                         fields = re.split("\t", line.strip())
                         fields[1] = str(int(fields[1]) - 1)
                         outFile.write("\t".join(fields)+"\n")
         CODE
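
To make the coordinate shift concrete, here is a small standalone sketch of the same per-line conversion. The function name `interval_to_bed` and the example interval are made up for illustration. Only the start column changes, and the interval length stays the same in both conventions.

```python
def interval_to_bed(line):
    """Convert one tab-separated interval_list record to a BED record.

    Only the start coordinate (second column) is shifted from 1-based
    to 0-based; the end coordinate and any extra columns are kept as-is.
    """
    fields = line.rstrip("\n").split("\t")
    fields[1] = str(int(fields[1]) - 1)
    return "\t".join(fields)


# Hypothetical interval covering bases 101..200 (1-based, inclusive).
interval_line = "chr1\t101\t200\t+\ttarget_1"
bed_line = interval_to_bed(interval_line)
print(bed_line)

# Both representations describe the same 100 bases.
_, start, end = bed_line.split("\t")[:3]
print(int(end) - int(start))
```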

Splitting intervals

SplitIntervals produces the requested number of interval files so that the analysis can run in parallel, which lowers the resource demand per chunk and improves speed.

 	set -eo pipefail
 
 	mkdir interval-files
 	ln -s ~{refFai}
 	ln -s ~{refDict}
 	~{gatk} --java-options "-Xmx~{memory-8}g" SplitIntervals \
 	-R ~{refFasta} \
 	~{intervalsArg} \
 	~{scatterArg} \
 	--subdivision-mode BALANCING_WITHOUT_INTERVAL_SUBDIVISION \
 	-O interval-files \
 	~{splitIntervalsExtraArgs}
 
 	cp interval-files/*.interval_list .
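
The idea behind splitting without subdividing intervals can be sketched as follows. This is not GATK's implementation, only a simplified illustration under the assumption that whole intervals are assigned, in order, to at most a requested number of groups of roughly similar total length. The function name `split_without_subdivision` and the example intervals are hypothetical.

```python
def split_without_subdivision(intervals, n_chunks):
    """Group whole intervals, in order, into at most n_chunks groups.

    intervals: list of (contig, start, end) tuples, 1-based inclusive.
    No interval is ever cut; a group is closed once it reaches the
    average size, unless it is the last group allowed.
    """
    total = sum(end - start + 1 for _, start, end in intervals)
    target = total / n_chunks
    chunks, current, size = [], [], 0
    for interval in intervals:
        _, start, end = interval
        current.append(interval)
        size += end - start + 1
        if size >= target and len(chunks) < n_chunks - 1:
            chunks.append(current)
            current, size = [], 0
    if current:
        chunks.append(current)
    return chunks


# Hypothetical intervals, 100 bases each.
example_intervals = [
    ("chr1", 1, 100),
    ("chr1", 201, 300),
    ("chr2", 1, 100),
    ("chr2", 201, 300),
]
for index, chunk in enumerate(split_without_subdivision(example_intervals, 2)):
    print(index, chunk)
```

Because intervals are never cut, the groups can differ in size when individual intervals are large relative to the average.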

Combining vcf files after scatter

This task uses the GatherVcfs tool from the GATK suite, which requires the input files to be supplied in genomic order. The regions they cover must not overlap.

    set -eo pipefail
 
    ~{gatk} GatherVcfs \
    -I ~{sep=" -I " vcfs} \
    -R ~{refFasta} \
    -O ~{outputName}
 
    gzip ~{outputName}
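
The ordering and no-overlap requirement can be checked before gathering with a sketch like the one below. It assumes each scattered VCF covers a single region on one contig and that the contig order of the reference is known. The function name `check_gather_order` and the example regions are hypothetical, and this check is not part of the workflow itself.

```python
def check_gather_order(regions, contig_order):
    """Return True if regions are sorted and non-overlapping, else raise ValueError.

    regions: list of (contig, start, end), one per VCF, in the order
             the files would be passed with -I (1-based, inclusive).
    contig_order: contig names in reference order.
    """
    rank = {name: index for index, name in enumerate(contig_order)}
    previous = None
    for contig, start, end in regions:
        if previous is not None:
            prev_rank, prev_end = previous
            if rank[contig] < prev_rank:
                raise ValueError(f"{contig}:{start}-{end} is out of contig order")
            if rank[contig] == prev_rank and start <= prev_end:
                raise ValueError(f"{contig}:{start}-{end} overlaps or precedes the previous region")
        previous = (rank[contig], end)
    return True


# Hypothetical regions in the order the VCFs would be gathered.
ordered = [("chr1", 1, 100), ("chr1", 101, 200), ("chr2", 1, 50)]
print(check_gather_order(ordered, ["chr1", "chr2"]))
```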

Support

For support, please file an issue on the GitHub project or send an email to [email protected].

Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)
