Git Product home page Git Product logo

arrowsam's Introduction

ArrowSAM

ArrowSAM is an in-memory Sequence Alignment/Map (SAM) representation which uses Apache Arrow framework (A cross-language development platform for in-memory data) and Plasma (Shared-Memory) Object Store to store and process SAM columnar data in-memory.

Citing ArrowSAM

The following paper describes the ArrowSAM format and its usage to speedup genomics pipelines. If you use ArrowSAM in your work, please cite the following paper.

Ahmad et al., (2020). "ArrowSAM: In-Memory Genomics Data Processing Using Apache Arrow", ICCAIS. doi.org/10.1109/ICCAIS48893.2020.9096725

Ahmad et al., "Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework", BMC Genomics, presented at APBC2020. https://doi.org/10.1186/s12864-020-07013-y

This repo contains following three components:

  1. ArrowSAM (In-memory SAM data representation) integrated BWA-MEM, Picard and GATK tools.

  2. A Singularity container def file (To create an environment to use all Apache Arrow related tools and libraries for ArrowSAM).

  3. Scripts to run different GATK best practices recommended workflows (using different in-memory data placement techniques like ArrowSAM, ramDisk and pipes for fast processing) to run complete DNA analysis pipeline efficiently.

Note: ArrowSAM and all other workflows are based on single node, multi-core machines.

How to run

  1. Install Singularity container

  2. Download our Singularity script and generate singularity image (this image contains all Arrow related packges necessary for building/compiling BWA-MEM, Picard and GATK)

  3. Now enter into generated image using command:

     sudo singularity shell <image_name>.simg
    
  4. Download BWA-MEM inside image

     git clone https://github.com/tahashmi/bwa.git
    
  5. Go into bwa dir and compile BWA-MEM:

     cd bwa
     make
    
  6. Now you can run BWA-MEM.

arrowsam's People

Contributors

tahashmi avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.