DeeNuRP

16S rRNA gene sequence curation and phylogenetic reference set creation

Installation

The Easy Way

  • Confirm that the libraries needed to compile the dependencies are available (on Ubuntu: sudo apt-get install gfortran libopenblas-dev liblapack-dev)
  • Install Python 2.7
  • Run bin/bootstrap.sh
  • Run source deenurp-env/bin/activate

The deenurp executable should now be on your $PATH.

The Hard Way

See required system libraries above.

First, install binary dependencies:

Optional (for filter-outliers and pairwise-distances):

Finally, install:

python setup.py install

De-novo reference set creation

Similarity-search based reference sequence selection

Running

The deenurp package under the current directory provides several subcommands, accessed via the script deenurp.py, or via the command deenurp if installed.

Subcommands fall into two general categories:

  • Building a set of reference sequences for use in refpkg building
  • Selecting sequences for a specific reference package

Creating a sequence set for refpkg building

deenurp filter-outliers

Removes outlier sequences from a reference database

deenurp expand-named

Expands poorly-represented names in a sequence file by similarity search

deenurp cluster-refs

Cluster reference sequences, first by tax-id at a specified rank (default: species), then by similarity for unnamed sequences or sequences not classified to the desired rank. Serves as input to search-sequences.
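The first stage of that grouping can be sketched in a few lines of Python. This is only an illustration of the idea, not the code deenurp runs: the function name group_by_rank, the two input dictionaries, and the placeholder IDs are all assumptions made for the example. Sequences that cannot be placed at the target rank are returned separately, as candidates for the similarity-based step.

```python
from collections import defaultdict


def group_by_rank(seq_taxa, rank_ancestor):
    """Group sequence IDs by their tax-id at a target rank.

    seq_taxa: sequence ID -> tax_id, or None for unnamed sequences.
    rank_ancestor: tax_id -> tax_id of its ancestor at the target rank,
        or None when the taxon is not classified to that rank.
    Both structures are hypothetical stand-ins for a real taxonomy.
    """
    clusters = defaultdict(list)
    leftovers = []  # would be clustered by similarity in a later step
    for seq_id, tax_id in sorted(seq_taxa.items()):
        rank_id = rank_ancestor.get(tax_id)
        if rank_id is None:
            leftovers.append(seq_id)
        else:
            clusters[rank_id].append(seq_id)
    return dict(clusters), leftovers


# Placeholder IDs, not real NCBI tax-ids.
seq_taxa = {"s1": "sp1", "s2": "sp1", "s3": "genus1", "s4": None}
rank_ancestor = {"sp1": "sp1", "genus1": None}

clusters, leftovers = group_by_rank(seq_taxa, rank_ancestor)
print(clusters)   # {'sp1': ['s1', 's2']}
print(leftovers)  # ['s3', 's4']
```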

Selecting sequences for a reference package

deenurp hrefpkg-build

Builds a set of hierarchical reference packages.

deenurp search-sequences

Searches a set of sequences against a FASTA file containing possible reference sequences.

This subcommand searches sequences against a reference FASTA file, saving the results and some metadata to a sqlite database for use in select-references.
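To give a feel for what storing search results in sqlite can look like, here is a small sketch using Python's standard sqlite3 module. The table name best_hits and its columns are invented for this example; the schema deenurp actually writes is not described here and may differ.

```python
import sqlite3


def save_hits(conn, hits):
    """Store (query, reference, percent identity) tuples.

    The best_hits table and its columns are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS best_hits ("
        "query TEXT PRIMARY KEY, ref TEXT NOT NULL, pct_id REAL NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO best_hits VALUES (?, ?, ?)", hits)
    conn.commit()


conn = sqlite3.connect(":memory:")  # an on-disk path would persist results
save_hits(conn, [("q1", "r1", 99.2), ("q2", "r1", 97.5), ("q3", "r2", 98.0)])

# Count how many queries had each reference as their best hit.
hit_counts = conn.execute(
    "SELECT ref, COUNT(*) FROM best_hits GROUP BY ref ORDER BY ref"
).fetchall()
print(hit_counts)  # [('r1', 2), ('r2', 1)]
```

Keeping the hits in a database lets a later step query them repeatedly with different thresholds without rerunning the search.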

deenurp select-references

Given the output of search-sequences, select-references attempts to find a good set of reference sequences.

For each reference cluster (see cluster-refs) that has at least a minimum number of sequences with best hits to it, selects a set number of sequences to serve as references.
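The selection rule described above can be sketched as follows. This is a simplified illustration under stated assumptions: the names select_references, min_hits, and per_cluster are hypothetical, and ranking candidates by how many queries hit them is a stand-in criterion, not necessarily the one deenurp applies.

```python
from collections import Counter


def select_references(best_hits, clusters, min_hits=2, per_cluster=1):
    """Pick up to per_cluster references from each well-supported cluster.

    best_hits: query ID -> reference sequence ID of its best hit.
    clusters: cluster ID -> list of reference sequence IDs.
    Clusters receiving fewer than min_hits best hits are skipped.
    """
    hits_per_ref = Counter(best_hits.values())
    selected = {}
    for cluster_id, refs in clusters.items():
        total = sum(hits_per_ref[r] for r in refs)
        if total < min_hits:
            continue
        # Assumed criterion: most-hit references first, ties broken by ID.
        ranked = sorted(refs, key=lambda r: (-hits_per_ref[r], r))
        selected[cluster_id] = ranked[:per_cluster]
    return selected


best_hits = {"q1": "r1", "q2": "r1", "q3": "r2", "q4": "r3"}
clusters = {"c1": ["r1", "r2"], "c2": ["r3"]}

selected = select_references(best_hits, clusters)
print(selected)  # {'c1': ['r1']}
```

Cluster c2 is dropped because only one query had its best hit there, which is below the assumed threshold of two.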

deenurp fill-lonely

Taxa that are the sole descendant of their parent can complicate taxonomic classification.

The fill-lonely subcommand finds some company for these lonely taxa.
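As an illustration of what "lonely" means here, the sketch below finds taxa that are the only child of their parent in a small taxonomy. The function name find_lonely and the parent_of mapping are assumptions made for the example; it covers only the detection step, not how fill-lonely then chooses sequences to add.

```python
from collections import defaultdict


def find_lonely(parent_of):
    """Return taxa that are the only child of their parent.

    parent_of: tax_id -> parent tax_id (None for the root).
    """
    children = defaultdict(list)
    for tax_id, parent in parent_of.items():
        if parent is not None:
            children[parent].append(tax_id)
    return sorted(kids[0] for kids in children.values() if len(kids) == 1)


# Placeholder taxonomy: genusA has one species, genusB has two.
parent_of = {
    "root": None,
    "genusA": "root",
    "genusB": "root",
    "sp1": "genusA",
    "sp2": "genusB",
    "sp3": "genusB",
}

lonely = find_lonely(parent_of)
print(lonely)  # ['sp1']
```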

deenurp add-reps

Fetches sequences from a sequence file which match the taxtable for a reference set at a given rank. Useful for adding type strains.

deenurp tax2tree

Runs the tax2tree program on a reference package, updating the seq_info file.

Sequences whose lineage changes are relabeled. The prior tax_id is added to the seq_info file in the reference package.
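The relabeling bookkeeping can be pictured with a short sketch. It does not run tax2tree; it only shows how a changed tax_id might be recorded alongside the previous one. The column names seqname, tax_id, and prior_tax_id, and the function relabel, are assumptions for this example and may not match the columns deenurp writes.

```python
def relabel(seq_info_rows, new_tax_ids):
    """Apply new tax_ids, keeping the previous value for changed rows.

    seq_info_rows: list of dicts with (assumed) keys 'seqname' and 'tax_id'.
    new_tax_ids: seqname -> tax_id proposed by the relabeling step.
    """
    updated = []
    for row in seq_info_rows:
        row = dict(row)  # copy so the input rows are left untouched
        new_id = new_tax_ids.get(row["seqname"], row["tax_id"])
        if new_id != row["tax_id"]:
            row["prior_tax_id"] = row["tax_id"]
            row["tax_id"] = new_id
        else:
            row.setdefault("prior_tax_id", "")
        updated.append(row)
    return updated


rows = [
    {"seqname": "s1", "tax_id": "A"},
    {"seqname": "s2", "tax_id": "B"},
]
relabeled = relabel(rows, {"s1": "C"})
print(relabeled[0])  # {'seqname': 's1', 'tax_id': 'C', 'prior_tax_id': 'A'}
```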

Contributors

nhoffman, crosenth, cmccoy, dhoogest
