VF

This is an open-source prototype implementation of our algorithms for variant filtering in variation graphs. Its purpose is to reduce graph size while preserving subsequent mapping accuracy; a smaller graph directly benefits read-to-graph mapping algorithms. The intuition behind the proposed graph reduction framework is as follows. Consider a complete variation graph constructed from a set of given haplotypes. Any substring of a haplotype has a corresponding path in the complete variation graph. Leaving out some variants introduces errors in the corresponding paths. If the number of such errors stays within the error tolerance built into sequence-to-graph mapping algorithms, the same paths can still be found. Mathematically, the problem is to minimize variation graph size subject to preserving every path of length α with at most δ differences. See our preprint for more details.
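
For SNPs, the α/δ constraint above can be checked mechanically. The sketch below is illustrative only and is not part of VF. It assumes that each dropped SNP position contributes at most one difference to any path through it, so a set of dropped positions is acceptable when no window of α consecutive bases contains more than δ of them. The function name max_dropped_in_window and the one-position-per-line input format are assumptions made for this example.

```shell
# Illustrative helper, not part of VF.
# Reads sorted positions of dropped SNPs (one per line) on stdin and prints
# the largest number of them that fall in any window of <alpha> bases.
max_dropped_in_window() {
    awk -v alpha="$1" '
        BEGIN { lo = 1; best = 0 }
        {
            pos[NR] = $1
            # Advance the left edge until the window spans fewer than alpha bases.
            while (pos[NR] - pos[lo] >= alpha) lo++
            if (NR - lo + 1 > best) best = NR - lo + 1
        }
        END { print best }
    '
}

# Example: five dropped SNPs, alpha = 500, delta = 10.
worst=$(printf '%s\n' 100 150 200 700 1200 | max_dropped_in_window 500)
if [ "$worst" -le 10 ]; then
    echo "ok: at most $worst dropped SNPs per window"
else
    echo "violation: $worst dropped SNPs in one window"
fi
```

The sliding window makes a single pass over the sorted positions, so the check stays cheap even for chromosome-scale inputs. Indels and structural variants change path lengths and would need a different accounting than this SNP-only sketch.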

Dependencies

Installation

The above dependencies can be handled by running the script dependencies.sh.

git clone https://github.com/NedaTavakoli/VF.git
cd VF
./dependencies.sh
make

After a successful compilation, expect executables named greedy_snp, lp_snp, greedy_snp_indels, ilp_snp_indels, greedy_sv and ilp_sv in a directory named build.

Usage

The executables implement different algorithms for variation graph size reduction, but they all share a similar interface.

SYNOPSIS
        greedy_snp        -a <alpha> -d <delta> -vcf <file1> -chr <id> [-prefix <file2>]
        lp_snp            -a <alpha> -d <delta> -vcf <file1> -chr <id> [-prefix <file2>]
        greedy_snp_indels -a <alpha> -d <delta> -vcf <file1> -chr <id> [-prefix <file2>]
        ilp_snp_indels    -a <alpha> -d <delta> -vcf <file1> -chr <id> [-prefix <file2>] [--pos]
        greedy_sv         -a <alpha> -d <delta> -vcf <file1> -chr <id> [-prefix <file2>]
        ilp_sv            -a <alpha> -d <delta> -vcf <file1> -chr <id> [-prefix <file2>] [--pos]


OPTIONS
        <alpha>     path length in variation graph (e.g., 500)
        <delta>     differences allowed (e.g., 10)
        <file1>     uncompressed vcf file (something.vcf)
        <file2>     filename to optionally save input and output variants
        <id>        chromosome id (e.g., 1 or chr1), make it consistent with vcf file
        --pos       set objective to minimize variation positions rather than variant count
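
As a rough illustration of the interface above, the following sketch assembles command lines from the documented flags and prints them without running anything. The helper vf_cmd and the names chr1.vcf and chr1_lp are hypothetical placeholders, not files or tools shipped with VF. The build/ path assumes the commands are issued from the repository root.

```shell
# Hypothetical helper around the documented interface; the input and prefix
# names used below are placeholders, not files shipped with VF.
vf_cmd() {
    # usage: vf_cmd <tool> <alpha> <delta> <vcf> <chr> [prefix]
    tool=$1; alpha=$2; delta=$3; vcf=$4; chr=$5; prefix=${6:-}
    cmd="build/$tool -a $alpha -d $delta -vcf $vcf -chr $chr"
    if [ -n "$prefix" ]; then
        cmd="$cmd -prefix $prefix"
    fi
    printf '%s\n' "$cmd"
}

# Print command lines using the example values alpha=500 and delta=10.
vf_cmd greedy_snp 500 10 chr1.vcf 1
vf_cmd lp_snp 500 10 chr1.vcf 1 chr1_lp
```

The printed lines can be run as-is once an uncompressed VCF with a matching chromosome id is in place. For ilp_snp_indels and ilp_sv, --pos can be appended to switch the objective as described in the options.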

A few example runs are made available for the user's reference. In practice, α should be a function of read length, whereas δ is determined by sequencing error rates and the error tolerance of the read-to-graph mapping algorithm. NOTE: At runtime, lp_snp and the ILP executables (ilp_snp_indels, ilp_sv) might complain if you don't have a valid Gurobi license file. It is straightforward and free to get one for academic use. If you are using a shared HPC cluster, Gurobi may be available as a module.

Benchmark

We evaluated the magnitude of graph reduction achieved in human chromosome variation graphs using VF (v1.0) with multiple α and δ parameter values corresponding to short- and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% of SNPs and 73% of indel structural variants could be safely excluded from the human chromosome 1 variation graph.

Publications
