Git Product home page Git Product logo

pheniqs's Introduction

Pheniqs

Pheniqs is a flexible generic barcode classifier for high-throughput next-gen sequencing that caters to a wide variety of experimental designs and has been designed for efficient data processing.

Qestions? lior [dot] galanti [ at sign ] nyu.edu or just open a ticket.

Citing Pheniqs: Pheniqs 2.0: accurate, high-performance Bayesian decoding and confidence estimation for combinatorial barcode indexing

Please visit the Pheniqs website for more information. You might also want to check the intro talk given by Lior Galanti on April 29, 2021 for the NYU gencore.

Powerful and intuitive syntax

  • Classifies standard barcode types: Sample, Cellular, and Molecular Index
  • Directly writes barcodes to standard or custom BAM fields
  • Addresses index tags in arbitrary locations along reads
  • Easily accommodates custom barcode types, eliminating the need for pre- or post-processing
  • Easily handles any number of combinatorial barcode tags

Noise and quality aware probabilistic classifier

  • Increased accuracy over standard edit distance methods
  • Reports classification error probabilities in SAM auxiliary tags
  • Modular design allows addition of new classifiers

Robust engineering

  • Multithreaded C++ implementation optimized for speed
  • POSIX standard stream integration
  • Directly interfaces with low level HTSLib C API
  • Performance scales linearly with the number of available processing cores

Easy to install or build

  • Stable releases available from Bioconda
  • Custom package manager can build dependencies and binaries from scratch
  • Easily installed on clusters or cloud without elevated permissions
  • Portable compiled binaries available
  • Available in a Docker container

Easy to use

  • Simple command line syntax with autocomplete
  • Reusable, inheritence enabled, JSON encoded configuration
  • Preconfigured barcode library sets
  • Reads and writes multiple file formats: FASTQ, SAM/BAM/CRAM
  • Fast standalone file format interconversion
  • Helper scripts to assist in configuration file bootstraping
  • Facilitates more robust and reproducible downstream analysis

Pheniqs runs on all modern POSIX systems and provides an easy to learn command line interface with autocomplete and an extensible reusable configuration syntax. Pheniqs is an ideal utility to pre- and post-process sequence reads for other bioinformatics tools, and it may also be used simply to rapidly and efficiently interconvert a variety of standard sequence file formats without invoking any of its barcode processing features.

For more advanced users and sequencing core managers, we provide detailed build instructions and a custom package manager to easily build portable, statically linked, Pheniqs binaries for deployment on computing clusters. Developers can find code examples and API documention that enable them to expand Pheniqs with new classification algorithms and take advantage of the optimized multithreaded pipeline.

Pheniqs is open sourced and free for academic use under the terms of the NYU license agreement.

Build Status

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.