Fasta Utilities

A collection of scripts for working with FASTA, FASTQ and SAM files. All the scripts use the ReadFastx module I wrote, which reads either a FASTA or FASTQ file record by record. They also use the FileBar module, which displays a terminal progress bar as a file is processed. ReadSam is the SAM equivalent of ReadFastx.
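
To illustrate what reading "by record" means, here is a minimal Python sketch. It is not the ReadFastx implementation (which is a Perl module); the function name `read_fastx` is hypothetical, and the sketch assumes FASTQ records occupy exactly four lines each.

```python
import io


def read_fastx(handle):
    """Yield (header, sequence, quality) tuples from a FASTA or FASTQ handle.

    Quality is None for FASTA records. Assumes four-line FASTQ records.
    """
    line = handle.readline()
    while line:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # FASTA: the sequence may span several lines until the next header
            header = line[1:]
            parts = []
            line = handle.readline()
            while line and not line.startswith(">"):
                parts.append(line.strip())
                line = handle.readline()
            yield header, "".join(parts), None
        elif line.startswith("@"):
            # FASTQ: header, sequence, '+' separator, quality
            header = line[1:]
            seq = handle.readline().strip()
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().strip()
            yield header, seq, qual
            line = handle.readline()
        else:
            # Skip blank or unrecognised lines
            line = handle.readline()


if __name__ == "__main__":
    demo = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGG\n")
    for record in read_fastx(demo):
        print(record)
```

Yielding one record at a time keeps memory use flat regardless of file size, which is also what makes a progress bar over the file straightforward.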

Conversion

  • 2big.pl - takes a BED, SAM, or wiggle file and creates a big version of it to upload to UCSC
  • fastq2fasta.pl - converts FASTQ to FASTA
  • sam2fastq.pl - converts SAM format to FASTQ format
  • mate_pair2paired_end.pl - converts mate pair reads to paired end orientation
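
As an illustration of the simplest conversion in this list, FASTQ to FASTA, the following Python sketch keeps the header and sequence of each record and drops the separator and quality lines. It is not the code of fastq2fasta.pl; the function name `fastq_to_fasta` is hypothetical, and four-line FASTQ records are assumed.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA text."""
    lines = fastq_text.splitlines()
    out = []
    # Step through the input one four-line record at a time
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header at line {i + 1}")
        out.append(">" + header[1:])  # '@name' becomes '>name'
        out.append(seq)               # '+' and quality lines are dropped
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    print(fastq_to_fasta("@read1\nACGT\n+\nIIII\n"), end="")
```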

Reformatting

  • fix_headers.pl - fixes the FASTQ header by removing spaces and optionally appending a suffix
  • remap_file.pl - takes a file with tab delimited mappings and substitutes each of the first terms for each of the second terms
  • standardize_names.pl - renames FASTA files from NCBI into UCSC nomenclature chr##
  • unique_headers.pl - reads a FASTA file and ensures all of the names are unique
  • wrap.pl - limits FASTA lines to 80 characters
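
The line wrapping described for wrap.pl can be sketched in Python as follows. This is an illustrative sketch rather than the script's own code; `wrap_sequence` is a hypothetical name, and the 80-character default is taken from the description above.

```python
def wrap_sequence(seq, width=80):
    """Split a sequence string into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


if __name__ == "__main__":
    print(">example")
    for line in wrap_sequence("ACGT" * 50):
        print(line)
```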

Modification

  • bisulfite_convert.pl - bisulfite converts the sequences given to it
  • merge_records.pl - merges all the input records into one record; by default it uses the name of the first record, which can be changed with -name
  • reverse_complement.pl - takes sequences and reverse complements them
  • trim_fasta.pl - trims a fastx file to x bp
  • pairs_sorted.pl - takes two files of reads sorted by header, and outputs two files containing those reads which have pairs
  • pairs_unsorted.pl - gets the pairs of the reads in the first file from the second file; pairs are matched by header name
  • regex_fasta.pl - applies the given regex to the FASTA headers or sequence
  • remove_ambiguous.pl - removes ambiguity codes from FASTA files
  • splice.pl - splices a FASTA file given a GFF file
  • split_fasta.pl - splits a multi FASTA file into multiple files, can split in different ways
  • subset_fasta.pl - subsets a FASTA file
  • trans_fasta.pl - translates a FASTA cDNA to protein
  • generate_fasta.pl - creates a random FASTA file
  • consensus.pl - generates a consensus FASTA file from a BAM file
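
As an example of one modification in this list, reverse complementing can be sketched in Python with a translation table. This is not the code of reverse_complement.pl; the name `reverse_complement` and the handling of IUPAC ambiguity codes and lowercase bases are assumptions made for the illustration.

```python
# Complement table covering the standard bases and IUPAC ambiguity codes
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    # Complement every base, then reverse the string
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

Characters that are not in the table, such as gaps, pass through unchanged.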

Filtering

  • filter_align.pl - filters alignments from a BAM or SAM file
  • filter_reads.pl - filters aligned reads from a file by mapping with bowtie2
  • get_fasta.pl - selects FASTA records which match or don't match a pattern
  • in_list.pl - reads a list of headers and a fastx file, and outputs the records which are in the list
  • size_select.pl - returns the sequences with lengths between the values specified by -low and -high
  • sort.pl - sorts a FASTA file using GNU sort; can sort by header, sequence, length, etc.
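
Length-based filtering of the kind size_select.pl describes can be sketched in Python as below. This is an illustrative sketch only: `size_select` is a hypothetical function, its `low` and `high` parameters mirror the -low and -high options named above, and treating both bounds as inclusive is an assumption.

```python
def size_select(records, low=0, high=None):
    """Yield (header, sequence) pairs whose length lies in [low, high].

    Both bounds are inclusive here (an assumption); high=None means no
    upper limit.
    """
    for header, seq in records:
        n = len(seq)
        if n >= low and (high is None or n <= high):
            yield header, seq


if __name__ == "__main__":
    reads = [("a", "AC"), ("b", "ACGTA"), ("c", "ACGTACGTAC")]
    for header, seq in size_select(reads, low=3, high=5):
        print(header, len(seq))
```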

Information

  • avg_coverage.pl - gets the average coverage per sequence from a BAM file
  • lengths.pl - reports the length of each record
  • calcN.pl - takes a file of FASTA lengths, or a FASTA or FASTQ file directly, and calculates the NX of the file (N50 by default)
  • CpG_count.pl - counts the number of CpGs in a FASTA file
  • distances.pl - gets the within-group bitscore distance of all the records in a FASTA file using BLAST
  • fasta_head.pl - emulates Unix head for FASTA and FASTQ files
  • fasta_tail.pl - emulates Unix tail for FASTA and FASTQ files
  • percent_GC.pl - calculates percent GC for each FASTA record in a file, as well as the total GC content
  • sam_lengths.pl - gets the sequence lengths from a SAM file
  • size.pl - gets the total size of a FASTA file and the number of sequences
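
The NX statistic reported by calcN.pl can be illustrated with a short Python sketch using the usual definition: sort the lengths from longest to shortest and return the length at which the running total first reaches X percent of the total. This is not the script's own code, and the function name `n_x` is hypothetical.

```python
def n_x(lengths, x=50):
    """Return the NX value (N50 by default) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    threshold = sum(lengths) * x / 100
    running = 0
    # Walk from the longest sequence down until X% of all bases are covered
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print("N50:", n_x(contigs))
    print("N90:", n_x(contigs, 90))
```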

BED scripts

  • absolute_coordinates.pl - takes a file with the chromosome and location and a file of chromosome sizes, and converts the coordinates to an absolute scale for plotting
  • bed2igv.pl - converts a BED file to an IGV snapshot script
  • combine_bed.pl - combines BED files
  • gff2bed.pl - converts a GFF file to a BED file
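
The conversion to an absolute scale described for absolute_coordinates.pl can be sketched in Python as a cumulative offset per chromosome. This is a sketch under assumptions, not the script's code: the function names are hypothetical, and chromosomes are assumed to be laid end to end in the order given in the sizes file.

```python
def chromosome_offsets(sizes):
    """Map each chromosome name to the sum of the lengths before it.

    `sizes` is a list of (name, length) pairs in plotting order.
    """
    offsets = {}
    total = 0
    for name, length in sizes:
        offsets[name] = total
        total += length
    return offsets


def to_absolute(chrom, pos, offsets):
    """Convert a per-chromosome position to a genome-wide coordinate."""
    return offsets[chrom] + pos


if __name__ == "__main__":
    sizes = [("chr1", 100), ("chr2", 50), ("chr3", 25)]
    offsets = chromosome_offsets(sizes)
    print(to_absolute("chr2", 10, offsets))
```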

Miscellaneous

  • align_progress.pl - given the input and output filenames, finds the last line of the output using tail, greps for that header in the input, and works out the percentage completed from that
  • blast_information.pl - gets sequence information for the GI numbers in a BLAST results file
  • fetch_entrez.pl - downloads a number of sequences from an Entrez query
  • fetch_gi.pl - downloads FASTA files from NCBI and outputs a FASTA file
  • fetch_sra.pl - downloads the SRA sequences from NCBI using Aspera and outputs a FASTQ file
  • generate_map.pl - remaps FASTA sequences from the first file to FASTA sequences from the second file, matches by hashing the sequence
  • mpileup_counts.pl - parses an mpileup file and gets the base counts
  • rename_script.pl - renames a file, changing any references to the old name in the file to the new name

Installation

Install with an optional prefix; omit the prefix to install system-wide.

perl Makefile.PL PREFIX=$HOME
make
make install
