jpleyte/genomicsdb

This project forked from genomicsdb/genomicsdb

Highly performant data storage in C++ for importing, querying and transforming variant data with C/C++/Java/Spark bindings. Used in gatk4.

Home Page: https://www.genomicsdb.org

License: Other

Languages: CMake 2.54%, C 0.04%, Dockerfile 0.08%, Shell 1.66%, Python 5.67%, Java 20.52%, C++ 68.80%, Scala 0.70%

GenomicsDB, originally from Intel Health and Lifesciences, is built on top of a fork of htslib and a tile-based array storage system for importing, querying and transforming variant data. Variant data is sparse by nature (relative to the whole genome), so sparse array data stores are a natural fit for it. GenomicsDB is a highly performant, scalable data store written in C++ for importing, querying and transforming genomic variant data.
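To make the sparsity argument concrete, here is a minimal Python sketch of a toy two-dimensional sparse array. It assumes, as a simplification, that rows are samples and columns are genomic positions flattened across contigs. The class name, contig lengths and record strings are hypothetical; this is not GenomicsDB's storage format or API, and a real tile-based store keeps cells sorted in tiles rather than scanning a dictionary.

```python
# Hypothetical contig lengths; real tools take these from a reference or VCF header.
CONTIGS = [("chr1", 1000), ("chr2", 500)]


class SparseVariantArray:
    """Toy 2-D sparse array: rows are samples, columns are genomic
    positions flattened across contigs. Only non-empty cells are stored."""

    def __init__(self, contigs):
        # Each contig starts at the column where the previous one ended.
        self.offsets, start = {}, 0
        for name, length in contigs:
            self.offsets[name] = start
            start += length
        self.n_columns = start
        self.cells = {}  # (row, column) -> record

    def column(self, contig, pos):
        # 1-based VCF position -> 0-based flattened column.
        return self.offsets[contig] + pos - 1

    def add(self, row, contig, pos, record):
        self.cells[(row, self.column(contig, pos))] = record

    def query(self, contig, start, end):
        """Return sorted (row, column, record) tuples for positions start..end on contig."""
        lo, hi = self.column(contig, start), self.column(contig, end)
        # Linear scan for brevity; a tiled store would read only overlapping tiles.
        return sorted((r, c, rec) for (r, c), rec in self.cells.items() if lo <= c <= hi)


arr = SparseVariantArray(CONTIGS)
arr.add(0, "chr1", 100, "A>G")
arr.add(2, "chr2", 10, "C>T")
print(arr.query("chr2", 1, 50))  # [(2, 1009, 'C>T')]
```

For three samples over 1,500 flattened positions, a dense matrix would hold 4,500 cells, while this sparse layout stores only the two that carry variants.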

  • Supported platforms: Linux and macOS.
  • Supported filesystems: POSIX, HDFS, EMRFS (S3), GCS and Azure Blob.

Included are:

  • JVM/Spark wrappers that, among other functions, allow streaming VariantContext buffers to and from the C++ layer. GenomicsDB JARs that bundle the native libraries, with zlib as their only dependency, are published regularly on Maven Central.
  • Native tools for high-performance, incremental ingestion of variants from VCF, BCF or CSV files into GenomicsDB.
  • MPI and Spark support for parallel querying of GenomicsDB.
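One common way to parallelize a range query is to split the region into disjoint pieces and let each worker (for example, an MPI rank or a Spark task) scan one piece. The Python sketch below illustrates that idea with a thread pool over a plain dictionary of cells keyed by (row, column). The function names and sample cells are hypothetical, and this is not how GenomicsDB's MPI or Spark layers are actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cells: (sample row, flattened column) -> variant record.
CELLS = {(0, 2): "a", (1, 5): "b", (0, 8): "c", (2, 12): "d"}


def partition_columns(start, end, n_parts):
    """Split the inclusive range [start, end] into at most n_parts
    contiguous, non-overlapping (lo, hi) ranges of near-equal size."""
    base, extra = divmod(end - start + 1, n_parts)
    parts, lo = [], start
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        if size == 0:  # more parts requested than columns available
            break
        parts.append((lo, lo + size - 1))
        lo += size
    return parts


def query_partition(cells, lo, hi):
    """Scan one partition, standing in for the work of a single worker."""
    hits = [(r, c, rec) for (r, c), rec in cells.items() if lo <= c <= hi]
    return sorted(hits, key=lambda t: (t[1], t[0]))  # by column, then row


def parallel_query(cells, start, end, n_workers):
    parts = partition_columns(start, end, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in partition order, so concatenation stays sorted.
        results = pool.map(lambda p: query_partition(cells, *p), parts)
        return [hit for part in results for hit in part]


print(partition_columns(0, 9, 3))  # [(0, 3), (4, 6), (7, 9)]
print(parallel_query(CELLS, 0, 9, 3))
```

Contiguous column ranges keep each worker's reads local to one region of the genome, and because the partitions do not overlap, concatenating the per-partition results in order gives a globally sorted result without a separate merge step.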

GenomicsDB is packaged into GATK4 and benefits from its large user base.

The GenomicsDB user documentation is hosted as a GitHub wiki: https://github.com/GenomicsDB/GenomicsDB/wiki

External Contributions

GenomicsDB is open source and all participation is welcome. GenomicsDB is released under the MIT License and all external contributors are expected to grant an MIT License for their contributions.

Checklist before creating a pull request

Please ensure that Java/Scala code is well documented in Javadoc style and that C/C++ code roughly adheres to the Google C++ Style Guide.

Contributors

kgururaj, stavrospapadopoulos, nalinigans, kdatta, mlathara, jeffhammond, francares, jakebolewski, cpjagan, psfoley, joshblum, gitmach, danking, jackgoldsmith4, lbergelson, mingrutar, paolonarvaez, luszczek, raonyguimaraes, hillsd, mishalinaik
