Nucleus

Nucleus is a library of Python and C++ code designed to make it easy to read, write and analyze data in common genomics file formats like SAM and VCF. In addition, Nucleus enables painless integration with the TensorFlow machine learning framework, as anywhere a genomics file is consumed or produced, a TensorFlow tfrecords file may be used instead.
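As a rough illustration of the reading workflow described above, the sketch below counts variants per contig. It assumes the `nucleus.io.vcf.VcfReader` class and the `reference_name` field on the records it yields; the helper names `count_by_contig` and `count_variants_in_vcf` are hypothetical and not part of Nucleus.

```python
from collections import Counter


def count_by_contig(records):
    """Count records per contig.

    Works on any iterable of objects that expose a `reference_name`
    attribute, which is assumed here to match Nucleus variant records.
    """
    return Counter(record.reference_name for record in records)


def count_variants_in_vcf(path):
    """Count variants per contig in a VCF file (requires google-nucleus)."""
    # Imported lazily so count_by_contig stays usable without Nucleus installed.
    from nucleus.io import vcf

    # The reader is used as a context manager and iterated record by record.
    with vcf.VcfReader(path) as reader:
        return count_by_contig(reader)
```

Keeping the counting logic separate from the file handling means the same helper can be applied to records from any reader, or to stand-in objects in tests.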

Installation

Nucleus currently only works on modern Linux systems. To install it, just run

pip install --user google-nucleus

Documentation

Building from source

For Ubuntu 14, Ubuntu 16 and Debian 9 systems, building from source is easy. Simply type

source install.sh

For all other systems, you will need to first install CLIF by following the instructions at https://github.com/google/clif#installation before running install.sh.

Note that install.sh depends heavily on apt-get, so it is unlikely to run on non-Debian-based systems without extensive modifications.

Nucleus depends on TensorFlow. By default, install.sh will install a CPU-only version of a stable TensorFlow release (currently 1.11). If that isn't what you want, there are several other options that can be enabled with a simple edit to install.sh.

Running install.sh will build all of Nucleus's programs and libraries. You can find the generated binaries under bazel-bin/nucleus. If, in addition to building Nucleus, you would like to run its tests, execute

bazel test -c opt $COPT_FLAGS nucleus/...

Version

This is Nucleus 0.2.2. Nucleus follows semantic versioning.
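Because versions follow the MAJOR.MINOR.PATCH scheme, they should be compared numerically rather than as strings (a plain string comparison would rank a hypothetical "0.10.0" below "0.2.2"). A minimal sketch, with the helper names `parse_version` and `at_least` being illustrative and not part of Nucleus:

```python
def parse_version(text):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def at_least(installed, required):
    """Return True if `installed` is the same as or newer than `required`."""
    # Tuples compare element by element, so (0, 10, 0) > (0, 2, 2).
    return parse_version(installed) >= parse_version(required)
```

For example, `at_least("0.2.2", "0.2.1")` holds, while `at_least("0.2.2", "0.10.0")` does not. This sketch handles only plain three-part versions, not pre-release or build suffixes.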

New in 0.2.2:

  • Faster SAM file querying and read overlap calculations.
  • Writing protocol buffers to files uses less memory.
  • Smaller pip package.
  • nucleus/util:io_utils refactored into nucleus/io:tfrecord and nucleus/io:sharded_file_utils.
  • Alleles coming from VCF files are now always normalized as uppercase.

New in 0.2.1:

  • Upgrades htslib dependency from 1.6 to 1.9.
  • Minor VCF parsing fixes.
  • Added new example program, apply_genotyping_prior.
  • Slightly more robust pip package.

New in 0.2.0:

  • Support for reading and writing BedGraph files.
  • Support for reading and writing GFF files.
  • Support for reading and writing CRAM files.
  • Support for writing SAM/BAM files.
  • Support for reading unindexed FASTA files.
  • Iteration support for indexed FASTA files.
  • Ability to read VCF files from memory.
  • Python API documentation.
  • Python 3 compatibility.
  • Added universal file converter example program.

License

Nucleus is licensed under the terms of the Apache 2 license.

Support

The Genomics team in Google Brain actively supports Nucleus and is always interested in improving its quality. If you run into an issue, please report the problem on our issue tracker. Be sure to add enough detail to your report that we can reproduce the problem and fix it. We encourage including links to snippets of BAM/VCF/etc. files that provoke the bug, if possible. Depending on the severity of the issue, we may patch Nucleus immediately with the fix or roll it into the next release.

Contributing

Interested in contributing? See CONTRIBUTING.

History

Nucleus grew out of the DeepVariant project.

Disclaimer

This is not an official Google product.
