
BMFtools

BMFtools (Barcoded Molecular Families tools) is a suite of tools for barcoded reads that takes advantage of PCR redundancy to reduce or eliminate errors. The core functionality is molecular demultiplexing at the FASTQ stage, which produces a unique observation for each sequenced founding template molecule. Accessory tools provide postprocessing, filtering, quality control, and summary statistics.

Installation

Requirements: gcc 4.9+, samtools 1.2+

git clone https://github.com/ARUP-NGS/BMFtools --recursive
cd BMFtools
make

Use

bmftools
bmftools <--help/-h>
bmftools <subcommand> <-h>

Tools

Name                Use
bmftools cap        Postprocess a tagged BAM for BMF-agnostic tools.
bmftools depth      Calculate depth of coverage over a set of BED intervals.
bmftools collapse   Collapse initial FASTQ records by barcode.
bmftools err        Calculate error rates based on cycle, base call, and quality score.
bmftools famstats   Calculate family size statistics for a BAM alignment file.
bmftools filter     Filter or split a BAM file by a set of filters.
bmftools mark       Add tags for rsq.
bmftools stack      A maximally-permissive variant caller using molecular barcode metadata, analogous to samtools mpileup.
bmftools rsq        Rescue reads using positional inference to collapse them to unique observations in spite of errors in the barcode sequence.
bmftools sort       Sort a BAM for rescue with rsq.
bmftools target     Calculate on-target rate.
bmftools vet        Curate variant calls from another variant caller (.bcf) using a BAM alignment.

These tools are divided into three categories:

  1. Core functionality
  2. Manipulation
  3. Analysis

Core Functionality

bmftools collapse

bmftools collapse combines reads that share a barcode into a single observation per barcode.

First, the barcodes are added to the comment fields of the FASTQ records, and the records are split into subsets based on the first characters of the barcode. Then, reads with exactly matching barcodes are collapsed, with a meta-analysis performed on each base call.
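
The two steps above can be illustrated with a minimal Python sketch. It makes several assumptions that are not BMFtools code:

- Barcodes have already been extracted into (barcode, sequence, qualities) tuples.
- Binning uses a hypothetical two-character prefix.
- The per-base meta-analysis is modeled with Fisher's method for combining p-values.
- The consensus base is the one with the highest summed quality.

All function names are illustrative.

```python
import math
from collections import defaultdict

def bin_by_prefix(records, prefix_len=2):
    """Split (barcode, seq, quals) records into subsets by barcode prefix."""
    bins = defaultdict(list)
    for rec in records:
        bins[rec[0][:prefix_len]].append(rec)
    return bins

def fisher_phred(quals):
    """Combine Phred scores for one base with Fisher's method (an assumption).

    X = -2 * sum(ln p_i) is chi-squared with 2k degrees of freedom; for even
    degrees of freedom the survival function is exp(-X/2) * sum_{i<k} (X/2)**i / i!.
    """
    half = sum(q * math.log(10) / 10 for q in quals)  # X / 2, since -ln p = q*ln(10)/10
    term = total = 1.0
    for i in range(1, len(quals)):
        term *= half / i
        total += term
    p = max(math.exp(-half) * total, 1e-300)  # guard against log10(0)
    return -10 * math.log10(p)

def collapse_family(family):
    """Build a consensus read from records sharing one exact barcode."""
    seq, pv, fa = [], [], []
    for i in range(len(family[0][1])):
        support = defaultdict(list)  # base -> qualities of reads calling it
        for _, s, q in family:
            support[s[i]].append(q[i])
        base = max(support, key=lambda b: sum(support[b]))
        seq.append(base)
        pv.append(round(fisher_phred(support[base])))  # analogous to the PV tag
        fa.append(len(support[base]))                    # analogous to the FA tag
    return "".join(seq), pv, fa

def collapse(records, prefix_len=2):
    """Bin by barcode prefix, then collapse exact-barcode families per bin."""
    out = {}
    for subset in bin_by_prefix(records, prefix_len).values():
        families = defaultdict(list)
        for rec in subset:
            families[rec[0]].append(rec)  # exact barcode match only
        for barcode, family in families.items():
            out[barcode] = collapse_family(family)
    return out
```

For two agreeing Q30 reads, Fisher's method yields roughly Q48. This combined value exceeds either input, which is the kind of gain PCR redundancy is meant to provide.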

bmftools collapse inline collapses templates for which both strands were sequenced, whereas bmftools collapse secondary collapses reads without strand information.

bmftools rsq

bmftools rsq uses positional information to collapse reads that share an alignment signature and have similar barcodes. It assumes that such reads came from the same founding molecule and that their barcodes differ only because of errors in reading them.
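
A minimal Python sketch of the rescue idea follows. It treats an alignment signature as any hashable key (for example contig, position, and strand). Barcodes within a hypothetical Hamming-distance threshold of a larger family at the same signature are treated as barcode errors. The greedy, largest-family-first merge is an illustrative choice, not necessarily what bmftools rsq does.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of mismatching positions between equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))

def rescue(reads, max_mismatches=1):
    """Merge families with near-identical barcodes at the same signature.

    reads: list of (signature, barcode, family_size) tuples; the signature
    layout is hypothetical. Returns {(signature, barcode): merged_size}.
    """
    by_sig = defaultdict(list)
    for sig, bc, fm in reads:
        by_sig[sig].append((bc, fm))
    merged = {}
    for sig, fams in by_sig.items():
        # Larger families first, so they absorb smaller error-containing ones.
        fams.sort(key=lambda f: -f[1])
        kept = []  # [barcode, size] pairs that survive as distinct molecules
        for bc, fm in fams:
            for entry in kept:
                if hamming(bc, entry[0]) <= max_mismatches:
                    entry[1] += fm
                    break
            else:
                kept.append([bc, fm])
        for bc, fm in kept:
            merged[(sig, bc)] = fm
    return merged
```

Visiting larger families first lets a well-supported barcode absorb its one-off misreads, rather than the reverse.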

Manipulation

bmftools cap

Caps quality scores using barcode metadata to facilitate working with barcode-agnostic tools.
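
The exact capping rule is not described here, so the following Python sketch rests on assumptions. It clamps each PV value to 93, the largest Phred+33 score, which prints as '~' in a SAM quality string. It masks bases where too few family members agreed (FA/FM below a hypothetical min_frac) with a hypothetical low quality of 2. The parameter names are illustrative, not bmftools cap options.

```python
def cap_qualities(pv, fa, fm, max_qual=93, min_frac=0.5, masked_qual=2):
    """Turn per-base PV/FA values into ordinary Phred qualities.

    All parameters are hypothetical: max_qual is the ceiling, min_frac the
    minimum share of family members agreeing with the consensus, and
    masked_qual the score given to weakly supported bases.
    """
    out = []
    for p, a in zip(pv, fa):
        if a / fm < min_frac:
            out.append(masked_qual)       # weak agreement: distrust the base
        else:
            out.append(min(p, max_qual))  # clamp to a value plain tools can store
    return out

def to_sam_qual(quals):
    """Encode Phred scores as a Phred+33 SAM quality string."""
    return "".join(chr(q + 33) for q in quals)
```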

bmftools filter

Filters or splits a BAM file based on a set of filters. The filters can be inverted with -v (analogous to grep -v).

Filters:

  • Fail reads with insufficient mapping quality.

  • Fail reads with insufficient family size.

  • Fail read pairs with an insufficient aligned fraction.

  • Fail reads outside a BED region.

  • Fail reads that do not have all of the given bits set in the SAM flag field.

  • Fail reads that have any of the given bits set in the SAM flag field.
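
These filters can be sketched as one Python predicate over a simplified read record. The sketch makes these simplifications:

- The Read fields and keyword arguments are hypothetical stand-ins for values a real tool would read from BAM records.
- Aligned fraction is checked per read rather than per pair.
- The region test uses only the leftmost position.

The two flag tests show the standard bitmask idioms for "all bits set" and "any bit set".

```python
from dataclasses import dataclass

@dataclass
class Read:
    """Minimal stand-in for an aligned read (hypothetical fields)."""
    mapq: int
    family_size: int         # FM tag
    aligned_fraction: float
    contig: str
    pos: int                 # 0-based leftmost position
    flag: int                # SAM flag field

def passes(read, min_mapq=0, min_fm=0, min_af=0.0, regions=None,
           require_flags=0, skip_flags=0, invert=False):
    """Return True if the read passes every filter (names are hypothetical)."""
    ok = (
        read.mapq >= min_mapq
        and read.family_size >= min_fm
        and read.aligned_fraction >= min_af
        # BED intervals are 0-based, half-open: [start, end).
        and (regions is None or any(c == read.contig and s <= read.pos < e
                                    for c, s, e in regions))
        # Every bit in require_flags must be set ...
        and (read.flag & require_flags) == require_flags
        # ... and no bit in skip_flags may be set.
        and (read.flag & skip_flags) == 0
    )
    return ok != invert  # invert flips the result, like grep -v
```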

bmftools vet

Curates SNV calls from a tumor/normal variant call file using barcode metadata from the BAMs used to produce the variant call file.
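
Here is a minimal Python sketch of barcode-aware curation for a single SNV. It assumes each collapsed read covering the site carries FM, FP, and DR values. The thresholds (min_fm, min_support, require_duplex) are hypothetical, and the tumor/normal comparison is omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One collapsed read covering the variant site (hypothetical fields)."""
    base: str
    fm: int  # FM tag: family size
    fp: int  # FP tag: 1 if the read passed barcode-related QC
    dr: int  # DR tag: 1 if both strands were sequenced (inline chemistry)

def vet_snv(alt, observations, min_fm=2, min_support=2, require_duplex=False):
    """Keep an SNV call only if enough well-supported families carry the allele."""
    support = [o for o in observations
               if o.base == alt and o.fp == 1 and o.fm >= min_fm]
    if require_duplex and not any(o.dr == 1 for o in support):
        return False
    return len(support) >= min_support
```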

Analysis

bmftools depth

Calculates depth of coverage across a BED file using barcode metadata.
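
This minimal Python sketch shows how barcode metadata changes a depth calculation. Each collapsed read can be counted once (unique observations) or weighted by its FM tag (raw reads). Reads are simplified to (start, end, fm) tuples on one contig. The function name and return layout are assumptions.

```python
def interval_depth(interval, reads):
    """Mean depth over one BED interval, counted two ways.

    interval: (start, end), 0-based half-open as in BED.
    reads: iterable of (start, end, fm) for collapsed reads.
    Returns (unique_observation_depth, raw_read_depth).
    """
    start, end = interval
    unique = raw = 0
    for r_start, r_end, fm in reads:
        overlap = max(0, min(end, r_end) - max(start, r_start))
        unique += overlap        # each collapsed read counts once
        raw += overlap * fm      # each family member counts once
    length = end - start
    return unique / length, raw / length
```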

bmftools target

Calculates the on-target fraction for a BED file using barcode metadata.
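
The same two counting modes apply to on-target fraction. In this Python sketch, a read is on target if it overlaps any interval on its contig. The names are hypothetical, and intervals are BED-style (0-based, half-open).

```python
def overlaps(contig, start, end, intervals):
    """True if [start, end) overlaps any (contig, s, e) interval."""
    return any(c == contig and s < end and start < e for c, s, e in intervals)

def on_target_fraction(reads, intervals):
    """Return (unique_fraction, raw_fraction) of reads that hit a target.

    reads: list of (contig, start, end, fm); the raw fraction weights each
    collapsed read by its family size (FM tag).
    """
    hit_unique = hit_raw = total_raw = 0
    for contig, start, end, fm in reads:
        total_raw += fm
        if overlaps(contig, start, end, intervals):
            hit_unique += 1
            hit_raw += fm
    return hit_unique / len(reads), hit_raw / total_raw
```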

bmftools err

Calculates error rates by a variety of parameters, such as cycle, base call, and quality score. It also pre-computes the quality score recalibration for the optional collapse recalibration step.
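
One common way to compute such error rates is to tabulate mismatches against the reference, keyed by covariates, and convert each cell into an empirical quality score. This Python sketch keys only by cycle and reported quality. It assumes indel-free alignments and uses add-one smoothing. It illustrates the idea rather than the output format of bmftools err or the recalibration it feeds into collapse.

```python
import math
from collections import defaultdict

def error_table(alignments):
    """Count errors and observations per (cycle, reported quality) cell.

    alignments: iterable of (read_seq, quals, ref_seq), with the read aligned
    base-for-base to the reference (no indels), a simplifying assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # (cycle, qual) -> [errors, total]
    for seq, quals, ref in alignments:
        for cycle, (base, qual, ref_base) in enumerate(zip(seq, quals, ref)):
            cell = counts[(cycle, qual)]
            cell[0] += base != ref_base
            cell[1] += 1
    return counts

def empirical_phred(errors, total):
    """Observed quality with add-one smoothing so empty cells stay finite."""
    return -10 * math.log10((errors + 1) / (total + 2))

def recalibration_table(alignments):
    """Map each (cycle, reported quality) to a rounded empirical quality."""
    return {key: round(empirical_phred(e, t))
            for key, (e, t) in error_table(alignments).items()}
```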

bmftools famstats

Calculates summary statistics related to family size and demultiplexing.
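
The following is a minimal Python sketch of family-size summary statistics computed from FM tags. The chosen statistics and key names are illustrative, not the actual bmftools famstats output.

```python
from collections import Counter

def family_stats(family_sizes):
    """Summarize FM tags from collapsed reads (one value per family)."""
    hist = Counter(family_sizes)
    n = len(family_sizes)
    raw = sum(family_sizes)
    return {
        "families": n,                       # unique observations
        "raw_reads": raw,                    # reads before collapsing
        "mean_family_size": raw / n,
        "singleton_fraction": hist[1] / n,   # families seen only once
        "histogram": dict(sorted(hist.items())),
    }
```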

bmftools stack

A maximally-permissive variant caller using molecular barcode metadata, analogous to samtools mpileup.
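
As a rough illustration of "maximally permissive", this Python sketch tallies every allele at one position without filtering. It counts unique observations, raw reads (via FM), and duplex observations (via DR). The input layout and output keys are hypothetical.

```python
from collections import defaultdict

def stack_position(observations):
    """Tally every allele seen at one position, with no filtering at all.

    observations: iterable of (base, fm, dr) for collapsed reads covering
    the position; fm and dr mirror the FM and DR tags.
    """
    tally = defaultdict(lambda: {"unique": 0, "raw": 0, "duplex": 0})
    for base, fm, dr in observations:
        t = tally[base]
        t["unique"] += 1   # one collapsed read
        t["raw"] += fm     # reads in its family
        t["duplex"] += dr  # both strands observed
    return dict(tally)
```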

BMF Tags

Tag  Format          Content
DR   Integer [0, 1]  Whether the read was sequenced from both strands. Only valid for inline chemistry.
FA   uint32_t array  Number of reads in the Family that Agreed with the final sequence at each base.
FM   Integer         Size of the family ("Family Members"): the number of reads sharing the barcode.
FP   Integer [0, 1]  Read Passes Filter related to barcoding. Determines QC fail flag in bmftools mark (without -q).
NF   Float           Mean number of differences between each read in the family and the consensus.
PV   uint32_t array  Phred Values for each base call after meta-analysis.
RV   Integer         Number of reversed reads in the consensus. Only for inline chemistry.
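
To make the FM, FA, and NF definitions concrete, this Python sketch derives them for one family of equal-length reads and its consensus. It follows the tag descriptions literally; the exact BMFtools arithmetic may differ.

```python
def family_tags(reads, consensus):
    """Derive FM, FA and NF for one family of equal-length reads.

    reads: sequences sharing a barcode; consensus: the collapsed sequence.
    """
    fm = len(reads)  # FM: number of reads in the family
    # FA: at each position, how many reads agree with the consensus base
    fa = [sum(r[i] == c for r in reads) for i, c in enumerate(consensus)]
    # NF: mean number of mismatches to the consensus, per read
    nf = sum(sum(a != b for a, b in zip(r, consensus)) for r in reads) / fm
    return {"FM": fm, "FA": fa, "NF": nf}
```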
