Git Product home page Git Product logo

blast-based-pairwise-genome-comparer's Introduction

blast-based-pairwise-genome-comparer

A blast-based programme to compare two similar genomes, and retrieve functional data (paralog families with network representations, id-to-GO/Pfam/COG associations, reciprocal best matches list...) It has been developed to study two ecotypes of a single bacterial species. Therefore, it is adapted to the comparison of two highly similar genomes.

Installation

Download the .py file and make it executable using a "chmod +x .py" command. Requires the following softwares to be installed:

Input files

This programme has been developed to analyse genomes annotated by PROKKA. Therefore, it accepts two types of input:

  • either two PROKKA folders (one per genome) including .faa, .ffn, .fna and .gff files.
  • or separated predicted transcripts and proteins multi-fasta files, genomes sequences and .gff-type annotation files.

How to run

Required arguments

  • -folders Two comma-separated folders including transcripts (.ffn), proteins (.faa) and genome (.fna) fasta files

OR

  • -genomes Two comma-separated genomes (fasta-files)
  • -prot Two comma-separated predicted proteins multi-fasta files
  • -rna Two comma-separated predicted transcripts multi-fasta files
  • -gff Two comma-separated .gff annotation files

NB: please make sure to follow the same order organism1,organism2 in each of these arguments

$ app.py -folders "data/PROKKA1,data/PROKKA2

OR

$ app.py -genomes "data/genome1.fna,data/genome2.fna" -prot "data/p1.faa,data/p2.faa" 
    -rna "data/t1.ffn,data/t2.ffn" -gff "data/1.gff,data/2.gff"

Optional arguments

  • -o Output folder, default: creates a folder in the programme's directory
  • -minLrap minLrap significance threshold for blast hits, default=0.7
  • -pident Identity percentage significance threshold for blast hits, default=70
  • -graph Ouput type for the plot showing genomes similarity. 'img' (default) or 'show' to display using Python, default='img'
  • -paralog_pid Identity percentage significance threshold to consider proteins as homologous, default=60

To use these two options, you need to uncomment the last lines of the script and to have COG and Pfam rpsBlast databases.

  • -cog Perform COG rpsblasts ? 'yes' or 'no' (default)
  • -pfam Perform Pfam rpsblasts and get protein-GO associations ? 'yes'(default) or 'no' (default)

Data it retrieves

  • all-against-all blastp original outputs + "hitsTables" giving supplementary informations (reciprocity, minLrap, maxLrap)
  • all-against-all tblastx outputs + "hitsTables" giving supplementary informations (reciprocity, minLrap, maxLrap)
  • "CDStables", showing for each gene the result of the blastp and tblastx (cds start and end, %id, minLrap and reciprocity)
  • a genomes similarity graph (.png) showing the %id for each cds along the genome
  • a reciprocal best-matches list, based on chosen %id and minLrap thresholds
  • a "paralog" folder containing a list of numbered paralog families and a network weighted graph for each one (.png)

If you wish to get functional data using COG, Pfam and GO annotation systems, uncomment the last lines of the script. You would then obtain:

  • rpsblast on Pfam database output
  • rpsblast on COG database output
  • a list of Pfam numbers per protein id for each organism
  • a list of GO numbers per protein id for each organism
  • a list of COG numbers per protein id for each organism

blast-based-pairwise-genome-comparer's People

Contributors

fantin-mesny avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.