any2fasta

Convert various sequence formats to FASTA

Motivation

You may wonder why this tool even exists. Well, I tried to do the right thing and use established tools like readseq and seqret from EMBOSS, but they both mangled IDs containing | or . characters, and there is no way to fix this behaviour. This resulted in inconsistencies between the .gbk and .fna versions of files in my pipelines.

Then you may wonder why I didn't use BioPerl or Biopython. Well, they are heavyweight libraries, and actually very slow at parsing GenBank files. This script uses only core Perl modules, has no other dependencies, and runs very quickly.

It supports the following input formats:

  1. GenBank flat file, typically .gb, .gbk, .gbff (starts with LOCUS)
  2. EMBL flat file, typically .embl (starts with ID)
  3. GFF with sequence, typically .gff, .gff3 (starts with ##gff)
  4. FASTA DNA, typically .fasta, .fa, .fna, .ffn (starts with >)
  5. FASTQ DNA, typically .fastq, .fq (starts with @)
  6. CLUSTAL alignments, typically .clw, .clu (starts with CLUSTAL or MUSCLE)
  7. STOCKHOLM alignments, typically .sth (starts with # STOCKHOLM)
  8. GFA assembly graph, typically .gfa (starts with ^[A-Z]\t)
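
The "starts with" hints in the list above suggest that the format can be guessed from the first line of the input. The following is a minimal Python sketch of that idea, not the tool's actual Perl code; the names `SIGNATURES` and `guess_format` are hypothetical, and the patterns are copied from the list rather than from the script itself.

```python
import re

# Hypothetical lookup table built from the "starts with" hints listed above.
# This is an illustration only; the real script may test things differently.
SIGNATURES = [
    ("genbank",   re.compile(r"^LOCUS")),
    ("embl",      re.compile(r"^ID")),
    ("gff",       re.compile(r"^##gff")),
    ("fasta",     re.compile(r"^>")),
    ("fastq",     re.compile(r"^@")),
    ("clustal",   re.compile(r"^(CLUSTAL|MUSCLE)")),
    ("stockholm", re.compile(r"^# STOCKHOLM")),
    ("gfa",       re.compile(r"^[A-Z]\t")),
]

def guess_format(first_line):
    """Return a format name for the first line of a file, or None if unknown."""
    for name, pattern in SIGNATURES:
        if pattern.search(first_line):
            return name
    return None
```

For example, `guess_format(">contig1")` returns `"fasta"`, and a line matching none of the patterns returns `None`.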

Files may be compressed with:

  1. gzip, typically .gz
  2. bzip2, typically .bz2
  3. zip, typically .zip
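
One common way to handle compressed input transparently is to look at the first few bytes of the file rather than trusting its extension. The sketch below shows that approach in Python using the well-known magic numbers for gzip, bzip2 and zip; it is an assumption about how such detection could work, not a description of what any2fasta does internally, and `open_maybe_compressed` is a hypothetical name.

```python
import bz2
import gzip
import io
import zipfile

def open_maybe_compressed(path):
    """Return a text-mode handle, decompressing if the magic bytes say so."""
    with open(path, "rb") as fh:
        magic = fh.read(4)
    if magic.startswith(b"\x1f\x8b"):        # gzip
        return gzip.open(path, "rt")
    if magic.startswith(b"BZh"):             # bzip2
        return bz2.open(path, "rt")
    if magic.startswith(b"PK\x03\x04"):      # zip
        # Assumption: only the first member of the archive is read.
        archive = zipfile.ZipFile(path)
        first = archive.namelist()[0]
        return io.TextIOWrapper(archive.open(first))
    return open(path, "rt")                  # treat as uncompressed text
```

Checking content instead of the file name means a gzipped file without a .gz suffix is still decompressed.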

Installation

any2fasta has no dependencies except Perl 5.10 or higher. It only uses core modules, so no CPAN modules are needed.

Direct script download

% cd /usr/local/bin  # choose a folder in your $PATH
% wget https://raw.githubusercontent.com/tseemann/any2fasta/master/any2fasta
% chmod +x any2fasta

Homebrew

% brew install brewsci/bio/any2fasta

Conda

% conda install -c bioconda any2fasta

GitHub

% git clone https://github.com/tseemann/any2fasta.git
% cp any2fasta/any2fasta /usr/local/bin # choose a folder in your $PATH

Test Installation

% ./any2fasta -v
any2fasta 0.4.2

% ./any2fasta -h
NAME
  any2fasta 0.4.2
SYNOPSIS
  Convert various sequence formats into FASTA
USAGE
  any2fasta [options] file.{gb,fa,fq,gff,gfa,clw,sth}[.gz,bz2,zip] > output.fasta
OPTIONS
  -h       Print this help
  -v       Print version and exit
  -q       No output while running, only errors
  -n       Replace ambiguous IUPAC letters with 'N'
  -l       Lowercase the sequence
  -u       Uppercase the sequence
END

Examples

% any2fasta ref.gbk > ref.fna

% any2fasta in.fasta > out.fasta  # should behave like "cat"

% any2fasta prokka.gff > prokka.fna  # only if GFF has FASTA appended

% any2fasta - < file.gb > file.fasta  # '-' means stdin

% any2fasta genes.gff.gz > genes.ffn  # automatically decompresses

% any2fasta 1.gb 2.fa.gz 3.gff.bz2 - > out.fa  # multiple files and stdin

% any2fasta R1.fq.gz | bzip2 > R1.fa.bz2  # 'seqtk seq -A' is much faster

% any2fasta -q 23S.clw > 23S.aln  # gaps '-' will be preserved

% any2fasta pfam4321.sth > pfam4321.aln  # '.' gaps will become '-'
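
Two of the conversions in the examples above are simple enough to sketch in a few lines of Python. This is an illustration only, not the tool's Perl code: `fastq_to_fasta` and `normalise_gaps` are hypothetical names, and the FASTQ function assumes strict four-line records and keeps the whole header line, which may differ from what any2fasta emits.

```python
def fastq_to_fasta(lines):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumption: every record is exactly four lines
    (header, sequence, '+', qualities) with no line wrapping.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            yield ">" + line[1:]   # '@id' header becomes '>id'
        elif i % 4 == 1:
            yield line             # sequence line is kept unchanged
        # the '+' separator and the quality line are dropped

def normalise_gaps(aligned_seq):
    """Turn STOCKHOLM-style '.' gaps into '-' gaps."""
    return aligned_seq.replace(".", "-")
```

For real FASTQ data the examples above already point to `seqtk seq -A` as the much faster option.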

Options

  • -n replaces any characters that aren't A,C,G,T with N (gaps preserved)
  • -l will lowercase all the letters
  • -u will uppercase all the letters
  • -q will prevent logging messages being printed

Issues

Submit feedback to the Issue Tracker

License

GPL v3

Author

Torsten Seemann
