
wes-filters

Modular filters for flagging putative false-positive mutation calls in whole-exome sequencing

Use

These scripts annotate a MAF with flags indicating whether a given variant is a possible false positive. All of them read from stdin, can write to stdout, and are standalone, with two exceptions that require a fillout operation to be run first. Filter flags are added to the FILTER column as a comma-separated list. These filters operate almost exclusively on SNVs. Additionally, this repository contains a wrapper for running a VCF-based false-positive filter that populates the FILTER field of a VCF file; these flags can be retained if the VCF is converted to MAF with vcf2maf.
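
Because every filter appends to the same comma-separated FILTER column, flags accumulate as filters are chained. The Python sketch below shows one way to append a flag without duplicating it. It is not code from this repository: the function name and flag names are hypothetical, and treating an empty value, "." or "PASS" as "no flags yet" is an assumption.

```python
def add_filter_flag(filter_value: str, flag: str) -> str:
    """Append `flag` to a MAF FILTER value as a comma-separated list, without duplicates.

    Assumption (not taken from the repository): "", "." and "PASS" are placeholders
    meaning "no flags yet" and are replaced rather than extended.
    """
    placeholders = {"", ".", "PASS"}
    flags = [f for f in filter_value.split(",") if f not in placeholders]
    if flag not in flags:
        flags.append(flag)
    return ",".join(flags)


# Hypothetical flag names, for illustration only.
print(add_filter_flag("PASS", "common_variant"))             # common_variant
print(add_filter_flag("common_variant", "low_confidence"))   # common_variant,low_confidence
print(add_filter_flag("common_variant", "common_variant"))   # common_variant
```

Keeping the check idempotent means that running the same filter twice leaves the column unchanged.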

Generic script to apply filters: applyFilter.sh

This script is a wrapper that runs any of the R-based filters in this repository. The output MAF is annotated with header lines indicating which filter was used and which version of the repository produced it.

Usage:

	applyFilter.sh FILTER_NAME INPUT_MAF OUTPUT_MAF [Additional Parameters]

Example:

	applyFilter.sh filter_blacklist_regions.R \
		Proj_1234_CMO_MAF.txt filteredMAF.txt

The first lines of the output MAF will look as follows:

	#version 2.4
	#wes-filters/applyFilter.sh VERSION=v1.0.1-2-g4d3694b FILTER=filter_blacklist_regions.R
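
To check afterwards which filters were applied to a MAF, the provenance lines written by applyFilter.sh can be read back. The Python sketch below assumes only the header layout shown above (a `#wes-filters/applyFilter.sh` line followed by KEY=VALUE tokens); the function name is hypothetical and the snippet is not part of the repository.

```python
def read_filter_provenance(lines):
    """Collect KEY=VALUE fields from '#wes-filters/applyFilter.sh' header lines."""
    records = []
    for line in lines:
        if not line.startswith("#"):
            break  # header comments end at the first non-comment line
        if line.startswith("#wes-filters/applyFilter.sh"):
            # Tokens after the script name look like VERSION=... and FILTER=...
            fields = dict(tok.split("=", 1) for tok in line.split()[1:] if "=" in tok)
            records.append(fields)
    return records


header = [
    "#version 2.4",
    "#wes-filters/applyFilter.sh VERSION=v1.0.1-2-g4d3694b FILTER=filter_blacklist_regions.R",
    "Hugo_Symbol\tChromosome\tStart_Position",
]
print(read_filter_provenance(header))
# [{'VERSION': 'v1.0.1-2-g4d3694b', 'FILTER': 'filter_blacklist_regions.R'}]
```

The function returns a list so that a MAF passed through several filters, if each run adds its own header line, yields one record per run.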

Filters

  • Common variants: A variant is considered common if its minor allele frequency in ExAC exceeds 0.0004. This filter requires an ExAC_AF column, which is most easily added to a MAF by running maf2maf. Because maf2maf now also annotates the FILTER column, this filter script will hopefully become obsolete. With the -f flag, this filter annotates a MAF with information from another MAF.
./filter_common_variants.R -m input.maf -o output.maf
  • Low-confidence calls: A variant is considered a low-confidence call if any of the following holds: n_alt_count > 1, t_depth < 20, or t_alt_count <= 3. How this filter should be interpreted and used depends on the nature of the sequencing experiment.
./filter_low_conf.R -m input.maf -o output.maf
  • Presence in study normals: Flags a variant if it is supported by 3 or more reads in any of the normals sequenced in the same study. The cut-off for supporting reads can be set with the -n flag. See the instructions below for how to generate a fillout file.
./filter_cohort_normals.R -m input.maf -o output.maf -f study.fillout
  • Presence in pool of normals: Similarly to the previous filter, flags a variant if it is supported by 3 or more reads in at least 3 samples in a pool of normals. See the instructions below for how to generate a fillout file.
./filter_normal_panel.R -m input.maf -o output.maf -f pon.fillout
  • Presence in FFPE pool: Flags a variant if it is supported by 3 or more reads in a fillout against an FFPE pool. The cut-off for supporting reads can be set with the -n flag. See the instructions below for how to generate a fillout file.
./filter_ffpe_pool.R -m input.maf -o output.maf -f ffpe.fillout
  • FFPE artifact: Flags a variant if it looks like an FFPE artifact, i.e. it occurs at low VAF and is a C>T substitution. With the -i flag, this script can also help identify samples suffering from FFPE artifacts.
./filter_ffpe.R -m input.maf -o output.maf
# or
./filter_ffpe.R -m input.maf -i
  • Low-mappability ("blacklisted") regions: Flags variants located in regions to which sequencing reads are hard to map, as defined by ENCODE and RepeatMasker. See data/source.txt for details on the files used for this annotation.
./filter_blacklist_regions.R -m input.maf -o output.maf
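
The flagging rules listed above can be restated as plain predicates. This is a minimal Python sketch, not the repository's R code: the `Variant` class and all function names are hypothetical, fillouts are reduced to lists of alt-read counts per sample, blacklist regions are assumed to be 1-based end-inclusive intervals, and the FFPE check takes its VAF cut-off as an argument because none is given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """The few MAF fields these sketches need (hypothetical container)."""
    t_depth: int
    t_alt_count: int
    n_alt_count: int
    exac_af: Optional[float] = None  # ExAC_AF; None if the site is absent from ExAC


def is_common(v: Variant, max_af: float = 0.0004) -> bool:
    # Common: ExAC allele frequency above 0.0004.
    return v.exac_af is not None and v.exac_af > max_af


def is_low_confidence(v: Variant) -> bool:
    # Any one of the three conditions is enough to flag the call.
    return v.n_alt_count > 1 or v.t_depth < 20 or v.t_alt_count <= 3


def in_study_normals(normal_alt_counts, min_reads: int = 3) -> bool:
    # Flag if any single study normal has >= min_reads supporting reads.
    return any(c >= min_reads for c in normal_alt_counts)


def in_pool_of_normals(pon_alt_counts, min_reads: int = 3, min_samples: int = 3) -> bool:
    # Flag if at least min_samples pool-of-normals samples each have >= min_reads.
    return sum(c >= min_reads for c in pon_alt_counts) >= min_samples


def looks_like_ffpe(ref: str, alt: str, vaf: float, max_vaf: float) -> bool:
    # C>T at low VAF. Also counting G>A (C>T on the opposite strand) is an assumption.
    return (ref, alt) in {("C", "T"), ("G", "A")} and vaf < max_vaf


def in_blacklist(chrom: str, pos: int, regions) -> bool:
    # regions: (chrom, start, end) tuples, assumed 1-based and end-inclusive.
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


v = Variant(t_depth=45, t_alt_count=12, n_alt_count=0, exac_af=0.001)
print(is_common(v), is_low_confidence(v))   # True False
print(in_pool_of_normals([5, 0, 4, 3, 1]))  # True: three samples have >= 3 reads
```

Note the difference between the two normal-based rules: the study-normals check fires on a single supporting normal, while the pool-of-normals check needs support in several samples.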

Fillout wrapper

This script wraps GetBaseCountsMultiSample on luna and can be used to generate fillout files (i.e. allele counts for the variants in an input MAF) across a set of BAM files. The -n flag runs it multithreaded. The MAF and the BAMs must use the same genome assembly, which is specified with the -g flag; the script knows where the GRCh37, hg19, b37, and b37_dmp assemblies are located on luna. The script convert-maf-to-hg19.sh can be used to fake an hg19 MAF.

./maf_fillout.py -m input.maf -b file1.bam file2.bam [..] -g genome -n threads -o output.fillout
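
A common source of MAF/BAM genome mismatches is chromosome naming: hg19 uses a "chr" prefix, while GRCh37/b37 names are unprefixed. The Python sketch below shows a naming check and a rename of the kind a conversion to hg19 might involve. It is an assumption about the approach, not the contents of convert-maf-to-hg19.sh, and both function names are hypothetical.

```python
def detect_chrom_style(chromosomes):
    """Classify chromosome names as 'chr'-prefixed (hg19-style) or unprefixed (GRCh37/b37-style)."""
    names = list(chromosomes)
    if not names:
        raise ValueError("no chromosome names given")
    prefixed = [name.startswith("chr") for name in names]
    if all(prefixed):
        return "hg19-style"
    if not any(prefixed):
        return "b37-style"
    raise ValueError("mixed chromosome naming")


def to_hg19_name(chrom):
    """Rename a GRCh37/b37 chromosome to its hg19 name.

    The mitochondrion is 'MT' in b37 but 'chrM' in hg19. Renaming alone does not
    make the MT and chrM sequences identical, and other contigs (e.g. GL000192.1)
    are not handled.
    """
    if chrom.startswith("chr"):
        return chrom
    return "chrM" if chrom == "MT" else "chr" + chrom


print(detect_chrom_style(["1", "X", "MT"]))         # b37-style
print([to_hg19_name(c) for c in ["1", "X", "MT"]])  # ['chr1', 'chrX', 'chrM']
```

Raising on mixed naming, rather than guessing, surfaces a half-converted MAF before any fillout is run against it.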

fpfilter.pl wrapper

This script wraps fpfilter.pl from variant-filter. Filter parameters in fpfilter.pl may need to be adjusted according to the nature of the sequencing experiment. Temporary files are removed upon completion. Like the fillout wrapper, this script knows where GRCh37, hg19, b37, and b37_dmp are located on luna.

./run-fpfilter.py -v input.vcf -b tumor.bam -g genome -f path/to/fpfilter.pl
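
The README only states that temporary files are removed upon completion. One common way to get that guarantee in a Python wrapper is a `tempfile.TemporaryDirectory` context, sketched below. This is an assumption about the approach, not the contents of run-fpfilter.py; `run_in_scratch_dir` is a hypothetical helper, and the placeholder command stands in for the real fpfilter.pl invocation, whose arguments are not documented here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch_dir(cmd):
    """Run `cmd` inside a temporary directory that is deleted afterwards.

    Returns the command's stdout and the scratch path (which no longer exists).
    """
    with tempfile.TemporaryDirectory(prefix="fpfilter_") as scratch:
        # check=True raises on failure; the directory is still removed on the way out.
        result = subprocess.run(cmd, cwd=scratch, check=True,
                                capture_output=True, text=True)
        scratch_path = Path(scratch)
    return result.stdout, scratch_path


# Placeholder command that writes an intermediate file into the scratch directory.
out, scratch = run_in_scratch_dir(
    [sys.executable, "-c", "open('scratch.tmp', 'w').write('x'); print('done')"]
)
print(out.strip(), scratch.exists())  # done False
```

Cleanup is tied to leaving the `with` block, so intermediate files disappear even when the wrapped command fails.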


Contributors

kpjonsson, alexpenson, soccin, md09
