
janggu's Introduction

Janggu - Deep learning for Genomics



Janggu is a Python package that facilitates deep learning in the context of genomics. The package is freely available under a GPL-3.0 license.


In particular, the package provides easy access to typical genomics data formats and out-of-the-box evaluation, so that you can concentrate on designing the neural network architecture and quickly test biological hypotheses. Comprehensive documentation is available.

Hallmarks of Janggu:

  1. Janggu provides special Genomics datasets that allow you to access raw data in FASTA, BAM, BIGWIG, BED and GFF file format.
  2. Various normalization procedures are supported for the genomics datasets, including 'TPM', 'zscore' or custom normalizers.
  3. Biological features can be represented in terms of higher-order sequence features, e.g. di-nucleotide based features.
  4. The dataset objects can be consumed directly by neural networks, for example ones implemented using keras, or by scikit-learn (see src/examples in this repository).
  5. Numpy format output of a keras model can be converted to represent genomic coverage tracks, which allows exporting the predictions as BIGWIG files and visualization of genome browser-like plots.
  6. Genomic datasets can be stored in various ways, including as numpy arrays, as sparse datasets or in hdf5 format.
  7. Caching of genomic datasets avoids time-consuming preprocessing steps and facilitates fast reloading.
  8. Janggu provides a wrapper for keras models with built-in logging functionality and automated result evaluation.
  9. Janggu supports input feature importance attribution using the integrated gradients method and variant effect prediction assessment.
  10. Janggu provides utilities such as a keras layer for scanning both DNA strands for motif occurrences.
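As a rough illustration of what the 'zscore' and 'TPM' normalizers in the list above refer to, the following sketch applies the textbook definitions to a small count vector with plain numpy. The function names and the toy numbers are assumptions made for this example; this is not Janggu's own implementation, which may differ in detail.

```python
import numpy as np


def zscore(values):
    """Center values to zero mean and scale them to unit standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # Constant input: there is no spread to scale by.
        return np.zeros_like(values)
    return (values - values.mean()) / std


def tpm(counts, lengths_bp):
    """Scale per-region counts to 'transcripts per million' style values."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1000.0
    rate = counts / lengths_kb  # counts per kilobase
    total = rate.sum()
    if total == 0:
        return np.zeros_like(rate)
    return rate / total * 1e6  # rescale so that all values sum to one million


# Toy read counts for three regions of 1 kb each (made-up numbers).
counts = np.array([10, 20, 70])
lengths = np.array([1000, 1000, 1000])
print(tpm(counts, lengths))
print(zscore(counts))
```

A custom normalizer would follow the same pattern: a function that takes the raw coverage values and returns an array of the same shape.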

Why the name Janggu?

Janggu is a Korean percussion instrument that looks like an hourglass.

Like the two ends of the instrument, the philosophy of the Janggu package is to help with the two ends of a deep learning application in genomics, namely data acquisition and evaluation.

Installation

A list of Python dependencies is defined in setup.py. Additionally, bedtools is required for pybedtools, which janggu depends on.

The simplest way to install janggu is via the conda package management system. Assuming you have already installed conda, create a new environment and type

pip install janggu

The janggu neural network model depends on tensorflow, which you have to install separately depending on whether you want GPU support or CPU only. To install tensorflow, type

conda install tensorflow  # or tensorflow-gpu

Further information regarding the installation of tensorflow can be found on the official tensorflow webpage.
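Before running the examples, a quick check along the following lines can confirm that the packages mentioned in this section are visible in the active environment. This is only a sketch: the helper name check_installed is hypothetical, and it assumes that the import names (janggu, tensorflow, pybedtools) match the package names used above.

```python
import importlib.util


def check_installed(modules=("janggu", "tensorflow", "pybedtools")):
    """Return a dict mapping each module name to True if it can be located."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in modules
    }


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it does not verify that tensorflow can actually use a GPU.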

To verify that the installation works, run the example contained in the janggu package as follows:

git clone https://github.com/BIMSBbioinfo/janggu
cd janggu
python ./src/examples/classify_fasta.py single

A model is then trained to predict the class labels of two sets of toy sequences by scanning the forward strand for sequence patterns and using an ordinary mono-nucleotide one-hot sequence encoding. The entire training process takes a few minutes on a CPU backend. At the end, some example prediction scores are shown for Oct4 and Mafk sequences. The accuracy should be around 85%, and individual example prediction scores should tend to be higher for Oct4 than for Mafk.

You may also rerun the training so that sequence features are evaluated on both strands and a higher-order sequence encoding is used, e.g. with the command-line arguments: dnaconv -order 2. Accuracies and prediction scores for the individual example sequences should improve compared to the previous example.
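To make the difference between the mono-nucleotide and the higher-order encoding more concrete, the sketch below one-hot encodes a short DNA string at order 1 and order 2 and builds the reverse complement that scanning both strands relies on. It uses plain numpy only; the helper names one_hot and reverse_complement are made up for this illustration and are not part of the Janggu API.

```python
import itertools

import numpy as np

ALPHABET = "ACGT"


def one_hot(sequence, order=1):
    """One-hot encode overlapping k-mers of length `order`.

    Returns an array of shape (len(sequence) - order + 1, 4 ** order).
    """
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=order)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    n_positions = max(len(sequence) - order + 1, 0)
    encoded = np.zeros((n_positions, len(kmers)), dtype=np.int8)
    for pos in range(n_positions):
        kmer = sequence[pos:pos + order].upper()
        if kmer in index:  # k-mers containing e.g. 'N' stay all-zero
            encoded[pos, index[kmer]] = 1
    return encoded


def reverse_complement(sequence):
    """Return the sequence as read from the opposite DNA strand."""
    complement = str.maketrans("ACGT", "TGCA")
    return sequence.upper().translate(complement)[::-1]


seq = "ACGTTG"
print(one_hot(seq, order=1).shape)  # one column per nucleotide
print(one_hot(seq, order=2).shape)  # one column per di-nucleotide
print(reverse_complement(seq))
```

At order 2 each position carries information about a pair of neighbouring nucleotides, which is why the feature dimension grows from 4 to 16 while the number of positions shrinks by one.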

Additional examples, including some Jupyter notebooks, can be found in './src/examples' or by following the tutorial.

janggu's People

Contributors

wkopp, annalaura94
