Lederhosen

Cluster raw Illumina 16S rRNA amplicon data to generate OTUs.

Who can use Lederhosen?

Lederhosen is free and open source, released under the MIT license.

How do I get Lederhosen?

  1. Obtain & Install UCLUST

  2. Obtain & Install BLAT

  3. Get a copy of TaxCollector

  4. Install Lederhosen by typing:

    sudo gem install lederhosen

  5. Check installation by typing lederhosen. You should see some help text.

Features

  • Sequence trimming (paired-end Illumina).
  • K-mer filtering.
  • Clustering w/ UCLUST.
  • UCLUST output filtering.
  • Separation of representative reads.
  • Separation of all reads belonging to each cluster.
  • Identification of clusters using TaxCollector.
  • Generation of OTU abundance matrices.

How do I use Lederhosen?

Lederhosen is a convenience wrapper around UCLUST and BLAT, with additional scripts for quality filtering, de-noising, and building summary tables. It is similar to QIIME but is meant for paired-end Illumina data rather than single-end 454 data. The basic Lederhosen pipeline consists of trimming, joining, sorting, filtering, clustering, more filtering, and output generation (OTU tables, representative reads, reads by cluster, and taxonomic descriptions for clusters). See the example pipeline in pipeline.sh.
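
As a rough illustration of how the steps chain together, the following Python sketch runs a sequence of Lederhosen tasks in order. It is not the contents of pipeline.sh. The command lines are copied verbatim from the task examples in this README, so file names may need adjusting to feed one step's output into the next. The names `PIPELINE` and `run_pipeline` are hypothetical, and the driver defaults to a dry run that only prints each command.

```python
import shlex
import subprocess

# Command lines copied from the task examples in this README.
PIPELINE = [
    "lederhosen trim --reads_dir=reads/*.txt --out_dir=trimmed/",
    "lederhosen join --trimmed=trimmed/*.fasta --output=joined.fasta",
    "lederhosen sort --input=joined.fasta --output=sorted.fasta",
    "lederhosen cluster --input=sorted.fasta --identity=0.80 --output=clusters.uc",
    "lederhosen uc_filter --input=clusters.uc --output=clusters.uc.filtered --reads=50 --samples=10",
    "lederhosen otu_table --clusters=clusters.uc --output=otu_prefix.csv",
    "lederhosen rep_reads --clusters=clusters.uc --joined=joined.fasta --output=representative_reads.fasta",
]


def run_pipeline(commands, dry_run=True):
    """Run each command in order; stop at the first failure.

    No shell is involved, so glob patterns reach lederhosen exactly as written.
    Returns the parsed argument lists so callers can inspect the plan.
    """
    argv_list = [shlex.split(cmd) for cmd in commands]
    for argv in argv_list:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return argv_list


plan = run_pipeline(PIPELINE)  # dry run: prints the commands only
```

Calling `run_pipeline(PIPELINE, dry_run=False)` would execute the tasks, which requires Lederhosen and its dependencies to be installed.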

Tasks

Lederhosen is invoked by typing lederhosen [TASK].

trim

Trim (Illumina) reads using quality scores. Output will be a directory of fasta files. Reads can optionally be gzipped.

lederhosen trim --reads_dir=reads/*.txt --out_dir=trimmed/
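
This README does not describe the trimming rule Lederhosen applies. As a generic illustration of quality-based trimming, the sketch below keeps the longest contiguous stretch of a read whose Phred scores all meet a cutoff. The function name `longest_good_run`, the cutoff of 20, and the Phred+33 encoding are all assumptions made for the example, not Lederhosen's documented behavior.

```python
def longest_good_run(seq, qual, min_q=20, offset=33):
    """Return the longest contiguous part of seq whose Phred scores are all >= min_q.

    offset=33 assumes Phred+33 quality encoding; some older Illumina
    pipelines used an offset of 64 instead.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    best_start, best_len = 0, 0
    run_start = None
    for i, ch in enumerate(qual):
        if ord(ch) - offset >= min_q:
            if run_start is None:
                run_start = i  # a new high-quality run begins here
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # a low-quality base ends the current run
    return seq[best_start:best_start + best_len]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed = longest_good_run("ACGTACGT", "II##IIII")
```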

join

Join paired reads from all samples end-to-end. This method enables the use of uclust with paired-end data. Output will be a single fasta file.

lederhosen join --trimmed=trimmed/*.fasta --output=joined.fasta

If your reads are not paired, then you do not need to do this step. Instead, concatenate all of the trimmed reads files.

cat trimmed/*.fasta > joined.fasta

sort

Sort reads by length. This is a requirement of uclust's single-linkage clustering algorithm.

lederhosen sort --input=joined.fasta --output=sorted.fasta
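
To make the effect of this step concrete, here is a minimal Python sketch that sorts FASTA records by sequence length. It is not Lederhosen's implementation. The helper names `read_fasta` and `sort_by_length` are hypothetical, and the longest-first ordering is an assumption.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def sort_by_length(records):
    """Return records ordered longest first (assumed); ties keep input order."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)


example = [">a", "ACGT", ">b", "ACGTACGT", ">c", "AC", "GTAC"]
records = list(read_fasta(example))
ordered = sort_by_length(records)
```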

k_filter

K-mer abundance noise filtering. This step is experimental and optional. It may reduce the time it takes to perform the clustering.

lederhosen k_filter --input=joined.fasta --output=filtered.fasta --k=10 --cutoff=50
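
The exact filtering rule behind `--k` and `--cutoff` is not spelled out here. One common form of k-mer abundance filtering, shown below as an assumption rather than as Lederhosen's method, counts every k-mer across all reads and keeps a read only if each of its k-mers occurs at least `cutoff` times. The function names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def k_filter(reads, k, cutoff):
    """Keep reads whose k-mers all occur at least `cutoff` times across the data set.

    reads is a list of (name, sequence) pairs. Reads shorter than k are dropped.
    """
    counts = Counter(km for _, seq in reads for km in kmers(seq, k))
    kept = []
    for name, seq in reads:
        read_kmers = kmers(seq, k)
        if read_kmers and all(counts[km] >= cutoff for km in read_kmers):
            kept.append((name, seq))
    return kept


reads = [("r1", "ACGTAC"), ("r2", "ACGTAC"), ("r3", "ACGTTT")]
kept = k_filter(reads, k=4, cutoff=2)  # r3 contains k-mers seen only once
```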

cluster

Cluster reads using UCLUST. Output is a uc file.

lederhosen cluster --input=sorted.fasta --identity=0.80 --output=clusters.uc

uc_filter

Filter the UC file, removing singleton clusters and clusters that are present in only a few samples. This greatly reduces the noise in the data without removing many of the reads.

lederhosen uc_filter --input=clusters.uc --output=clusters.uc.filtered --reads=50 --samples=10
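
The filtering logic can be sketched in Python as follows. The sketch skips UC file parsing and works on a list of (cluster, sample) pairs, one per read. It assumes that `--reads` and `--samples` are minimum thresholds that a cluster must both meet, which this README does not state explicitly. The name `filter_clusters` is hypothetical.

```python
from collections import defaultdict


def filter_clusters(assignments, min_reads, min_samples):
    """Drop reads whose cluster has too few reads or appears in too few samples.

    assignments is a list of (cluster, sample) pairs, one per read.
    """
    reads_per_cluster = defaultdict(int)
    samples_per_cluster = defaultdict(set)
    for cluster, sample in assignments:
        reads_per_cluster[cluster] += 1
        samples_per_cluster[cluster].add(sample)
    keep = {
        cluster
        for cluster in reads_per_cluster
        if reads_per_cluster[cluster] >= min_reads
        and len(samples_per_cluster[cluster]) >= min_samples
    }
    return [(c, s) for c, s in assignments if c in keep]


assignments = [
    (0, "s1"), (0, "s2"), (0, "s1"),  # 3 reads in 2 samples
    (1, "s1"),                        # singleton
    (2, "s1"), (2, "s1"), (2, "s1"),  # 3 reads but only 1 sample
]
filtered = filter_clusters(assignments, min_reads=2, min_samples=2)
```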

otu_table

Create an OTU abundance table where rows are samples and columns are clusters. The entries are the number of reads for that cluster in a sample.

lederhosen otu_table --clusters=clusters.uc --output=otu_prefix.csv
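
The layout described above (rows are samples, columns are clusters, entries are read counts) can be sketched in Python like this. The input is a list of (cluster, sample) pairs, one per read, rather than a real UC file. The `cluster-N` column labels and the function names are hypothetical and may not match Lederhosen's actual output.

```python
import csv
import io
from collections import Counter


def otu_table(assignments):
    """Build a table with one row per sample and one column per cluster."""
    counts = Counter(assignments)  # (cluster, sample) -> number of reads
    clusters = sorted({c for c, _ in assignments})
    samples = sorted({s for _, s in assignments})
    rows = [["sample"] + [f"cluster-{c}" for c in clusters]]
    for sample in samples:
        rows.append([sample] + [counts[(c, sample)] for c in clusters])
    return rows


def to_csv(rows):
    """Serialize the table as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


assignments = [(0, "s1"), (0, "s1"), (1, "s1"), (0, "s2")]
table = otu_table(assignments)
text = to_csv(table)
```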

rep_reads

Get representative reads for each cluster. Output is a single fasta file.

lederhosen rep_reads --clusters=clusters.uc --joined=joined.fasta --output=representative_reads.fasta

split

Get all reads belonging to each cluster. Output is a directory containing a fasta file for each cluster. The fasta file contains the joined reads.

lederhosen split --clusters=clusters.uc --reads=joined.fasta --min-clst-size=100

name

Identify clusters in a database using the representative reads. This is a simple wrapper for BLAT. The output is a tab-delimited file similar to a BLAST output file. For this step you need to have BLAT installed and also a TaxCollector database.

lederhosen name --reps=representative_reads.fasta --database taxcollector.fa --output blast_like_output.txt

add_names

Add the phylogenetic classification of each cluster to the OTU abundance file.

lederhosen add_names --blat=blat_output.txt --level=taxonomic_level --table=otu_file.csv --output=named_out_file.csv

Where taxonomic_level can be: kingdom, domain, phylum, class, order, family, genus or species. This method only works with a TaxCollector database.

squish

Squish an OTU abundance file by column name (phylogenetic description).

lederhosen squish --csv-file=named_out_file.csv --output=squished_named_out_file.csv
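
A plausible reading of "squishing by column name" is that columns sharing the same name are merged by summing their counts. The Python sketch below shows that interpretation. It is an assumption about the behavior, not Lederhosen's code, and the function name `squish` and the genus names in the example are only illustrative.

```python
def squish(header, rows):
    """Merge columns that share a name by summing their values.

    header is a list of column names whose first entry labels the sample column.
    rows is a list of lists laid out like header.
    """
    names = []
    for name in header[1:]:
        if name not in names:
            names.append(name)  # keep first-seen column order
    squished = []
    for row in rows:
        totals = dict.fromkeys(names, 0)
        for name, value in zip(header[1:], row[1:]):
            totals[name] += value
        squished.append([row[0]] + [totals[name] for name in names])
    return [header[0]] + names, squished


header = ["sample", "Bacteroides", "Prevotella", "Bacteroides"]
rows = [["s1", 2, 1, 3], ["s2", 0, 4, 1]]
new_header, new_rows = squish(header, rows)
```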

