
Basic-Variant-Filter: Perform basic somatic variant filtration using a filtration string.

Introduction

Somatic variant manual review usually starts with some basic filtration to weed out obviously incorrect calls made by variant caller(s). This filtration is usually done with ad-hoc bash scripts using awk and/or sed, which leads to great variation between analyses. This script, implemented in Haskell, provides a standardized method for basic filtration of somatic variant data. The desired filtration is described in a small language called a filtration string.

Prerequisites

bvf.hs assumes that you have the GHC compiler installed, along with the packages it imports. The easiest way to get these is to download the Haskell Platform.

Installing required packages

To install the additional packages that bvf.hs requires, run the following command for each package. This assumes that you have cabal, a package manager and build system for Haskell, installed on your system (it comes with the Haskell Platform).

$ cabal install [packagename]

Required packages

  • Text.PrettyPrint.Boxes (package: boxes)
  • System.Process (package: process)
  • Data.List.Split (package: split)
  • System.Temporary (package: temporary)

Input

To get useful output from this script, the input file must have the correct structure. Currently, the script assumes that the header is the first line of the file and that every header line in the file is identical.

For example:

TUMOR.AD NORMAL.AD TUMOR.AF MUTATION
23,32 43,45 3.2 missense
TUMOR.AD NORMAL.AD TUMOR.AF MUTATION
53,23 12,13 2.1 noncoding
32,13 32,34 5.1 missense
TUMOR.AD NORMAL.AD TUMOR.AF MUTATION
34,53 42,23 4.4 missense
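The input assumptions above can be illustrated with a small Haskell sketch. It assumes that columns are separated by tab characters (as the .tsv extension suggests), treats the first line as the header, and drops every later line that exactly matches it. The names splitTabs, stripRepeatedHeaders and toRecords are hypothetical and are not taken from bvf.hs.

```haskell
-- Hypothetical sketch of the input assumptions; not the bvf.hs implementation.
module Main (main) where

-- Split a line on tab characters (base only, no Data.List.Split needed).
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitTabs rest

-- Treat the first line as the header and drop every later line that is an
-- exact copy of it, mirroring the "all headers are the same" assumption.
stripRepeatedHeaders :: [String] -> [String]
stripRepeatedHeaders []              = []
stripRepeatedHeaders (header : rows) = header : filter (/= header) rows

-- Pair each data row's fields with the column names from the header.
toRecords :: [String] -> [[(String, String)]]
toRecords ls = case stripRepeatedHeaders ls of
  []              -> []
  (header : rows) -> map (zip (splitTabs header) . splitTabs) rows

main :: IO ()
main = do
  let input =
        [ "TUMOR.AD\tNORMAL.AD\tTUMOR.AF\tMUTATION"
        , "23,32\t43,45\t3.2\tmissense"
        , "TUMOR.AD\tNORMAL.AD\tTUMOR.AF\tMUTATION"
        , "53,23\t12,13\t2.1\tnoncoding"
        ]
  mapM_ putStrLn (stripRepeatedHeaders input)
  print (length (toRecords input))
```

Run on these four lines, the repeated header is dropped and two data records remain.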

Usage

bvf.hs is easy to use.

You can run it with the runghc command, which ships with GHC:
$ runghc bvf.hs inputfile.tsv

For better performance, compile the source code and run the resulting binary:
$ ghc -O2 -o BVF bvf.hs
$ ./BVF inputfile.tsv

Arguments

bvf.hs accepts the following command line arguments:

Usage: bvf [-vV?ioF] [file]
  -v          --verbose              Output on stderr.
  -V, -?      --version              Show version number.
  -o OUTFILE  --outputfile=OUTFILE   The output file.
  -F FIELDS   --filterfields=FIELDS  The fields to filter on.
  -E          --stripheaderexact     Strip the headers in the file (exact).
  -H          --stripheadersanshead  Strip the headers in the file (without head).
  -T          --stripheadersanstail  Strip the headers in the file (without tail).
              --help                 Print this help message.

The -v (--verbose) option prints full error messages to stderr.
The -V (--version) option shows the version of bvf in use.
The -o (--outputfile) option writes the result of the operation on the input .tsv file (or the unmodified input, if no filtration is requested) to an output file whose name is specified by the user, for example filteredinput.tsv.
The -F (--filterfields) option is where the user specifies the filtration string that bvf uses to filter the input .tsv file.
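The header-stripping options, -E (--stripheaderexact), -H (--stripheadersanshead) and -T (--stripheadersanstail), tell bvf to strip header lines (lines that match the first line of the input .tsv file) from the input .tsv file; the help text above summarizes how the three variants differ.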
Finally, the --help option outputs the help message seen above.
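The help text above has the layout produced by usageInfo from base's System.Console.GetOpt module, so an option table like the following sketch could describe the same flags. The Flag type and the parseArgs function are hypothetical illustrations, not the actual bvf.hs source.

```haskell
-- Hypothetical option table mirroring the help text; not copied from bvf.hs.
module Main (main) where

import System.Console.GetOpt
  (ArgDescr (..), ArgOrder (..), OptDescr (..), getOpt, usageInfo)
import System.Environment (getArgs)

data Flag
  = Verbose
  | Version
  | OutputFile String
  | FilterFields String
  | StripHeaderExact
  | StripHeaderSansHead
  | StripHeaderSansTail
  | Help
  deriving (Eq, Show)

options :: [OptDescr Flag]
options =
  [ Option ['v'] ["verbose"] (NoArg Verbose) "Output on stderr."
  , Option ['V', '?'] ["version"] (NoArg Version) "Show version number."
  , Option ['o'] ["outputfile"] (ReqArg OutputFile "OUTFILE") "The output file."
  , Option ['F'] ["filterfields"] (ReqArg FilterFields "FIELDS") "The fields to filter on."
  , Option ['E'] ["stripheaderexact"] (NoArg StripHeaderExact)
      "Strip the headers in the file (exact)."
  , Option ['H'] ["stripheadersanshead"] (NoArg StripHeaderSansHead)
      "Strip the headers in the file (without head)."
  , Option ['T'] ["stripheadersanstail"] (NoArg StripHeaderSansTail)
      "Strip the headers in the file (without tail)."
  , Option [] ["help"] (NoArg Help) "Print this help message."
  ]

-- Return the recognised flags plus the remaining positional arguments
-- (the input file), or the error messages followed by the usage text.
parseArgs :: [String] -> Either String ([Flag], [String])
parseArgs argv = case getOpt Permute options argv of
  (flags, files, []) -> Right (flags, files)
  (_, _, errs)       -> Left (concat errs ++ usageInfo header options)
  where
    header = "Usage: bvf [OPTION...] [file]"

main :: IO ()
main = getArgs >>= either putStr print . parseArgs
```

For example, parseArgs ["-F", ";TUMOR.AF:x:~|~>=3.0;", "inputfile.tsv"] returns the FilterFields flag together with inputfile.tsv as the remaining positional argument.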

Filtration String

The default behavior of running bvf.hs on an input .tsv file without a filtration string is to print the file back to the user unchanged, as cat would.

This behavior is useful because the user can quickly compare filtration schemes by piping the output into wc -l, or any other Unix tool for that matter.

This also allows the user to apply filtration schemes only to specific rows of the input .tsv file based on the value of any given column.
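One way to see why the cat-like default falls out naturally is to model filtration as a list of row predicates: with an empty list, every row passes. The sketch below assumes a row is a list of (column, value) pairs and also shows a predicate applied only to rows whose MUTATION column has a given value. The names filterRows, onlyWhere, columnIs and afAtLeast are hypothetical, and the sample rows reuse the TUMOR.AF and MUTATION values from the input example above.

```haskell
-- Hypothetical model of row filtration; not the bvf.hs implementation.
module Main (main) where

import Text.Read (readMaybe)

-- Assumed row representation: (column name, value) pairs.
type Row = [(String, String)]

-- Keep rows that satisfy every predicate. With no predicates, `all` is
-- vacuously True, so the input passes through unchanged (cat-like).
filterRows :: [Row -> Bool] -> [Row] -> [Row]
filterRows preds = filter (\row -> all ($ row) preds)

-- True when the named column holds exactly the given value.
columnIs :: String -> String -> Row -> Bool
columnIs col val row = lookup col row == Just val

-- Apply predicate p only to rows matching cond; other rows pass through.
onlyWhere :: (Row -> Bool) -> (Row -> Bool) -> Row -> Bool
onlyWhere cond p row = not (cond row) || p row

-- True when TUMOR.AF parses as a number that is at least t.
afAtLeast :: Double -> Row -> Bool
afAtLeast t row = maybe False (>= t) (lookup "TUMOR.AF" row >>= readMaybe)

-- TUMOR.AF and MUTATION values from the input example.
sampleRows :: [Row]
sampleRows =
  [ [("TUMOR.AF", af), ("MUTATION", m)]
  | (af, m) <- [ ("3.2", "missense"), ("2.1", "noncoding")
               , ("5.1", "missense"), ("4.4", "missense") ]
  ]

main :: IO ()
main = do
  -- Default: no predicates, so all 4 rows are kept.
  print (length (filterRows [] sampleRows))
  -- TUMOR.AF >= 3.0 enforced only on noncoding rows: 3 rows are kept.
  print (length (filterRows
    [onlyWhere (columnIs "MUTATION" "noncoding") (afAtLeast 3.0)] sampleRows))
```

The conditional predicate removes only the noncoding row (TUMOR.AF 2.1); missense rows are never tested against the threshold.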

The filtration string describes the filtration applied to the input file. It is a simple, standardized, string-based command line argument with the following structure:
;COLUMNOFFILTRATION:STRUCTURE~OPERATION~COMPARISON;

Multiple filtration strings can be defined:
;TUMOR.AD:x,y~+~>=40;TUMOR.AF:x:~|~>=3.0;
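A filtration string can be split into its four parts using only base Haskell. The sketch below assumes that clauses are delimited by ';', that the column name ends at the first ':', and that the remainder holds exactly three '~'-separated fields. The Filter type and parseFiltrationString are hypothetical and are not the bvf.hs parser.

```haskell
-- Hypothetical filtration-string parser; not the bvf.hs implementation.
module Main (main) where

-- One clause, e.g. "TUMOR.AD:x,y~+~>=40", split into its four named parts.
data Filter = Filter
  { column     :: String  -- COLUMNOFFILTRATION
  , structure  :: String  -- STRUCTURE
  , operation  :: String  -- OPERATION
  , comparison :: String  -- COMPARISON
  } deriving (Eq, Show)

-- Split a string on every occurrence of a delimiter character.
splitOnChar :: Char -> String -> [String]
splitOnChar d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar d rest

-- The column name runs up to the first ':'; the rest must contain
-- exactly three '~'-separated parts.
parseClause :: String -> Either String Filter
parseClause clause = case break (== ':') clause of
  (col, ':' : rest) | not (null col) ->
    case splitOnChar '~' rest of
      [st, op, cmp] -> Right (Filter col st op cmp)
      _ -> Left ("expected STRUCTURE~OPERATION~COMPARISON in " ++ show clause)
  _ -> Left ("missing COLUMNOFFILTRATION: in " ++ show clause)

-- Clauses are delimited by ';'; the empty pieces produced by the
-- leading and trailing ';' are ignored.
parseFiltrationString :: String -> Either String [Filter]
parseFiltrationString = mapM parseClause . filter (not . null) . splitOnChar ';'

main :: IO ()
main = print (parseFiltrationString ";TUMOR.AD:x,y~+~>=40;TUMOR.AF:x:~|~>=3.0;")
```

Run on the multi-clause example above, it yields two Filter values; the second has the structure "x:", exactly as written in that example.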

Please see the wiki for more in-depth usage examples and explanation.
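As an illustration only, the sketch below assumes one plausible reading of ;TUMOR.AD:x,y~+~>=40;: the cell holds two comma-separated numbers (x,y), + sums them, and >=40 keeps the row when the sum is at least 40. This reading is an assumption rather than documented behavior, only the + operation is modeled, and keepCell is a hypothetical name.

```haskell
-- Hypothetical evaluation of one clause; the semantics are assumed.
module Main (main) where

import Control.Monad (guard)
import Text.Read (readMaybe)

-- Split a cell such as "23,32" on commas.
splitComma :: String -> [String]
splitComma s = case break (== ',') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitComma rest

-- Assumed meaning of OPERATION; only "+" (sum) is modelled here.
combine :: String -> [Double] -> Maybe Double
combine "+" xs = Just (sum xs)
combine _   _  = Nothing

-- Turn a COMPARISON such as ">=40" into a predicate on the combined value.
-- Two-character operators are matched before their one-character prefixes.
parseComparison :: String -> Maybe (Double -> Bool)
parseComparison ('>' : '=' : n) = (\t v -> v >= t) <$> readMaybe n
parseComparison ('<' : '=' : n) = (\t v -> v <= t) <$> readMaybe n
parseComparison ('>' : n)       = (\t v -> v > t) <$> readMaybe n
parseComparison ('<' : n)       = (\t v -> v < t) <$> readMaybe n
parseComparison ('=' : n)       = (\t v -> v == t) <$> readMaybe n
parseComparison _               = Nothing

-- Decide whether a cell passes one clause. Nothing means the cell or the
-- clause could not be interpreted under these assumptions.
keepCell :: String -> String -> String -> String -> Maybe Bool
keepCell struct op cmp cell = do
  values <- mapM readMaybe (splitComma cell)
  -- The STRUCTURE (e.g. "x,y") fixes how many values the cell must hold.
  guard (length values == length (splitComma struct))
  total <- combine op values
  test  <- parseComparison cmp
  pure (test total)

main :: IO ()
main = mapM_ (print . keepCell "x,y" "+" ">=40") ["23,32", "53,23", "12,13"]
```

Run as written, it prints Just True for 23,32 and 53,23 (sums 55 and 76) and Just False for 12,13 (sum 25).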

Docker

A Docker-based solution (Dockerfile) is available in the corresponding repository. Currently, this Dockerfile assumes that you run Docker interactively.

Credits

Documentation was added in March 2019.
Author : Matthew Mosior
