mrbam's Introduction

MrBam

For each mutation in a VCF file, MrBam queries the reads that carry the mutation from a BAM file, merges reads that share the same positions, and reports the count of unique reads.
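The merge-by-position idea can be illustrated with a small sketch. This is not MrBam's actual implementation: the `(name, start, end, base)` tuple layout and the `count_unique_fragments` helper are hypothetical, and real input would come from a BAM file rather than a hard-coded list.

```python
from collections import defaultdict


def count_unique_fragments(reads):
    """Group reads by (start, end) and count the distinct groups.

    `reads` is an iterable of (name, start, end, base) tuples. This layout
    is hypothetical and is not MrBam's internal representation.
    """
    groups = defaultdict(list)
    for name, start, end, base in reads:
        # Reads with identical start and end are treated as one fragment.
        groups[(start, end)].append((name, base))
    return len(groups), dict(groups)


reads = [
    ("r1", 100, 250, "T"),
    ("r2", 100, 250, "T"),  # same positions as r1, so merged with it
    ("r3", 120, 270, "T"),
]
unique_count, groups = count_unique_fragments(reads)
print(unique_count)  # three reads collapse into two position groups
```

Keying on the position pair is the simplest possible grouping; a real tool would also need to decide what to do when reads in the same group disagree on the base.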

Prerequisites

  • Python 3.4+
  • pysam ($ pip install pysam)

Usage

$ python -m MrBam.main --help
usage: main.py [-h] [-c CFDNA] [-g GDNA] [-o OUTPUT] [-i INFO] [-q QUAL] [-s]
               [-f] [-v] query

example:
  $ MrBam sample.vcf --cfdna sample_cfdna.bam -o sample_MrBam.vcf --simple

positional arguments:
  query                vcf file contains mutations to query

optional arguments:
  -h, --help            show this help message and exit
  -c, --cfdna CFDNA     bam file contains cfdna reads info. There must be a
                        corresponding .bai file in the same directory
  -g, --gdna GDNA       bam file contains gdna reads info. There must be a
                        corresponding .bai file in the same directory
  -o, --output OUTPUT   output vcf file. Will be overwritten if already exists
  --skip SKIP           skip the first N lines
  -q, --qual QUAL       drop bases whose qulity is less than this (default:
                        25)
  -s, --simple          annotate less infomations into vcf output
  -f, --fast            do not infer origin read size by CIGAR, it can be
                        faster and consume less memory.
  --drop-inconsist      drop different reads stack at the same position. This
                        decreases sensitivity.
  --dropXA              drop reads that has XA tag (multiple alignment)
  -m, --mismatch-limit MISMATCH_LIMIT
                        if set, drop reads that has more mismatches than the
                        limit. requires a 'MD' or a 'NM' tag to be present.
  -v, --verbos          output debug info
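The help text says each BAM file needs a corresponding .bai file in the same directory. A quick pre-flight check might look like the sketch below. The `find_bam_index` helper is hypothetical and not part of MrBam; it only assumes the two common index naming schemes, `sample.bam.bai` and `sample.bai`.

```python
import os


def find_bam_index(bam_path):
    """Return the path of an index file next to `bam_path`, or None.

    Hypothetical helper, not part of MrBam. It checks the two common
    naming schemes: sample.bam.bai and sample.bai.
    """
    candidates = [
        bam_path + ".bai",
        os.path.splitext(bam_path)[0] + ".bai",
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # "sample_cfdna.bam" is the file name used in the example above.
    index = find_bam_index("sample_cfdna.bam")
    print(index if index else "no .bai index found next to the BAM file")
```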

Performance

#sample  option    bam_size(mb)  vcf_lines  CPU_time(s)  Memory(mb)
Sam3     (default) 194           14978      147          1116
Sam3     --fast    194           14978      129          27
Sam2     (default) 655           33702      500          3162
Sam2     --fast    655           33702      417          28
Sam1     (default) 1620          113066     5952         8377
Sam1     --fast    1620          113066     5785         34
Sam4     (default) 2338          648336     49067        9912
Sam4     --fast    2338          648336     60393        36
  • CPU_time is user + sys
  • Memory may vary according to system memory pressure
  • Tested on Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
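To make the trade-off easier to read, the ratios can be computed directly from the table. The snippet below only restates the numbers above; the `summarize` helper is illustrative. It shows that --fast cuts memory by roughly 41x to 275x, while CPU time drops for Sam3, Sam2 and Sam1 but rises for Sam4.

```python
# (sample, default CPU s, --fast CPU s, default memory MB, --fast memory MB)
ROWS = [
    ("Sam3", 147, 129, 1116, 27),
    ("Sam2", 500, 417, 3162, 28),
    ("Sam1", 5952, 5785, 8377, 34),
    ("Sam4", 49067, 60393, 9912, 36),
]


def summarize(rows):
    """Map sample -> (memory reduction factor, fast/default CPU time ratio)."""
    summary = {}
    for sample, cpu_default, cpu_fast, mem_default, mem_fast in rows:
        summary[sample] = (
            round(mem_default / mem_fast, 1),
            round(cpu_fast / cpu_default, 2),
        )
    return summary


summary = summarize(ROWS)
for sample, (mem_factor, cpu_ratio) in summary.items():
    print(sample, mem_factor, cpu_ratio)
```

A CPU ratio below 1 means --fast was quicker; Sam4 is the one case in the table where it was slower.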

mrbam's People

Contributors

sfchen, ylxdzsw, young-matthew

mrbam's Issues

Handle the issue when run with python2

Since the default Python version on many systems is 2.7, this tool should catch the error caused by running under Python 2.7 and suggest that the user switch to Python 3.
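One way to handle this is a version guard that runs before anything else is imported. The sketch below is only a suggestion, not code from MrBam: `check_python_version` is a hypothetical helper, and the minimum version is taken from the Prerequisites section. It uses `%` formatting on purpose so that the guard itself still parses under Python 2.

```python
import sys

MIN_VERSION = (3, 4)  # taken from the Prerequisites section


def check_python_version(version_info=None):
    """Return an error message if the interpreter is too old, else None.

    Hypothetical helper. `version_info` defaults to the running interpreter
    and can be overridden for testing.
    """
    current = tuple((version_info or sys.version_info)[:2])
    if current < MIN_VERSION:
        return (
            "MrBam requires Python %d.%d+, but this is Python %d.%d; "
            "try running it with python3." % (MIN_VERSION + current)
        )
    return None


message = check_python_version()
if message is not None:
    # Exit with a readable hint instead of a syntax or import error.
    sys.exit(message)
```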

Error while performing MrBam with Python 3.6.8

I get the following error when running MrBam:

line 31, in anno
    pos =int(line[1]) - 1
ValueError: invalid literal for int() with base 10: 'Start'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Mrbam.py", line 86, in <module>
    anno(o)
  File "/home/rico/Downloads/MrBam-master/MrBam/anno.py", line 33, in anno
    raise Exception("vcf header should be expluded", o.query)
Exception: ('vcf header should be expluded', '/home/rico/Downloads/TagSequencing.snp.hg19_multianno.txt')
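The traceback shows `int(line[1])` failing on the string 'Start', which suggests the input file begins with a column header row rather than a variant. The help text lists a `--skip` option for skipping the first N lines, which may be relevant here. The sketch below shows the general idea of skipping header lines before parsing positions. It is an illustration only: `iter_variant_rows` is hypothetical, and every column name in the sample header other than 'Start' is an assumption.

```python
def iter_variant_rows(lines, skip=0):
    """Yield (chrom, zero_based_pos, fields) for tab-separated data rows.

    Hypothetical helper, not MrBam's parser. It skips the first `skip`
    lines, blank lines, and lines starting with '#', and raises ValueError
    naming the offending line if the position column is not an integer.
    """
    for number, raw in enumerate(lines, start=1):
        if number <= skip or raw.startswith("#") or not raw.strip():
            continue
        fields = raw.rstrip("\n").split("\t")
        try:
            pos = int(fields[1]) - 1  # same conversion as in the traceback
        except (IndexError, ValueError):
            raise ValueError(
                "line %d does not look like a variant row "
                "(is it a header?): %r" % (number, raw)
            )
        yield fields[0], pos, fields


# Sample input; column names other than 'Start' are assumed.
rows = [
    "Chr\tStart\tEnd\tRef\tAlt\n",
    "chr1\t12345\t12345\tA\tT\n",
]
parsed = list(iter_variant_rows(rows, skip=1))
print(parsed)
```

Without `skip=1`, the same input raises a ValueError that points at line 1, which is more direct than the generic "vcf header" message.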
