MGCplotter: Microbial Genome Circular plotter

Overview

MGCplotter is an easy-to-use command-line tool for plotting microbial genomes in a circular layout using Circos. It requires a GenBank-format genome file and implements the following three main functions for plotting figures.

  1. Plot Basic Features of Microbial Genome
    Basic features are forward/reverse CDS, rRNA, tRNA, GC content, and GC skew.
    The color, size, and visibility of each feature in the plot can be controlled with command options.

  2. Assign & Plot COG Functional Classification
    Assign COG functional classifications to the reference genome CDSs using COGclassifier. The COG functional classification colors are then used for the forward/reverse CDS tracks in the plot.

    List of COG Functional Classification Color

    COG_definition_fig

  3. Search & Plot Conserved CDS between reference and query species
    CDSs in each query genome that are conserved relative to the reference genome are searched with the MMseqs2 RBH (reciprocal best hit) method. Each conserved query CDS is plotted with a gradient color based on the identity of its RBH result.
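
To illustrate the reciprocal best hit idea, here is a minimal sketch in plain Python. It is not MGCplotter's or MMseqs2's code: the hit tuples, the use of bitscore to pick the best hit, and the function names `best_hits` and `reciprocal_best_hits` are assumptions made for this example. MMseqs2 performs the real search.

```python
from typing import Dict, List, Tuple

# Assumed hit layout for this sketch: (query_id, target_id, identity, bitscore)
Hit = Tuple[str, str, float, float]


def best_hits(hits: List[Hit]) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit for each query ID."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        query_id, bitscore = hit[0], hit[3]
        if query_id not in best or bitscore > best[query_id][3]:
            best[query_id] = hit
    return best


def reciprocal_best_hits(
    ref_vs_query: List[Hit], query_vs_ref: List[Hit]
) -> Dict[str, Tuple[str, float]]:
    """Return {ref_cds: (query_cds, identity)} for mutual best hits only."""
    forward = best_hits(ref_vs_query)
    backward = best_hits(query_vs_ref)
    rbh: Dict[str, Tuple[str, float]] = {}
    for ref_id, (_, query_id, identity, _) in forward.items():
        back = backward.get(query_id)
        # A pair counts only if the query's best hit points back to the same reference CDS
        if back is not None and back[1] == ref_id:
            rbh[ref_id] = (query_id, identity)
    return rbh


# Toy data (hypothetical IDs and scores)
ref_vs_query = [
    ("refA", "qry1", 95.0, 300.0),
    ("refA", "qry2", 60.0, 120.0),
    ("refB", "qry2", 70.0, 150.0),
    ("refC", "qry1", 80.0, 200.0),
]
query_vs_ref = [
    ("qry1", "refA", 95.0, 300.0),
    ("qry2", "refB", 70.0, 150.0),
]
rbh = reciprocal_best_hits(ref_vs_query, query_vs_ref)
print(rbh)
```

In the toy data, refC is not reported as conserved because its best hit qry1 pairs with refA in the reverse direction.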

MGCplotter_example_fig
Fig.1: Plot result of the Mycoplasma gallisepticum genome
From outer to inner, the tracks are (1) forward CDS, (2) reverse CDS, (3) rRNA, (4) tRNA, (5) GC content, and (6) GC skew. COG functional classification colors are assigned to the forward/reverse CDS tracks.
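
For readers unfamiliar with the two GC tracks, the sketch below shows how such values are commonly computed: GC content as the percentage of G and C bases in a window (compared against the genome average), and GC skew as (G - C) / (G + C). The window size, step size, and function names are assumptions for illustration, not MGCplotter's actual settings.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int) -> List[Tuple[int, str]]:
    """Return (start, subsequence) for each full window along seq."""
    return [(i, seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


genome = "GGGCATATGCCCATGG"  # toy sequence, not real data
average = gc_content(genome)
for start, win in sliding_windows(genome, window=8, step=4):
    # Positive deviation: window is more GC-rich than the genome average
    deviation = gc_content(win) - average
    print(start, round(deviation, 1), round(gc_skew(win), 2))
```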

MGCplotter_example_fig
Fig.2: Conserved CDS tracks of 3 query species added to Fig.1
Conserved CDSs of the query genomes relative to the reference genome are shown.
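
The conserved CDS tracks use a gradient color based on RBH identity. The exact mapping MGCplotter uses is not described here, so the following is only a sketch under an assumed rule: a linear blend from white (0% identity) to the base color (100% identity). The function names are hypothetical, and the default `#d2691e` is the hex code of matplotlib's 'chocolate'.

```python
from typing import Tuple


def hex_to_rgb(hexcolor: str) -> Tuple[int, int, int]:
    """Convert '#rrggbb' to an (r, g, b) tuple of 0-255 integers."""
    h = hexcolor.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def identity_to_color(identity: float, base_hex: str = "#d2691e") -> str:
    """Blend white -> base color as identity goes 0 -> 100 (assumed linear rule)."""
    frac = max(0.0, min(identity, 100.0)) / 100.0
    blended = [round(255 + (c - 255) * frac) for c in hex_to_rgb(base_hex)]
    return "#{:02x}{:02x}{:02x}".format(*blended)


print(identity_to_color(100.0))  # base color itself
print(identity_to_color(0.0))    # white
```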

Installation

MGCplotter is implemented in Python3.

Install bioconda package:

conda install -c conda-forge -c bioconda mgcplotter

Install PyPI package:

pip install mgcplotter

Use Docker (Docker Image):

docker pull moshi4/mgcplotter:latest
docker run moshi4/mgcplotter:latest MGCplotter -h

Dependencies

  • Circos
    Software package for visualizing data and information in circular layout
  • COGclassifier
    A tool for classifying prokaryote protein sequences into COG functional category
  • MMseqs2
    Ultra fast and sensitive sequence search and clustering suite

Usage

Basic Command

MGCplotter -r [genome genbank file] -o [output directory] --assign_cog_color
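
When MGCplotter is driven from a script, the basic command can be assembled as an argument list. The helper below is hypothetical (`build_mgcplotter_cmd` is not part of MGCplotter). It uses only options documented in this README and builds the command without running it.

```python
from typing import List, Optional, Sequence


def build_mgcplotter_cmd(
    ref_file: str,
    outdir: str,
    query_files: Optional[Sequence[str]] = None,
    assign_cog_color: bool = False,
) -> List[str]:
    """Assemble an MGCplotter argument list (hypothetical helper, not part of the tool)."""
    cmd = ["MGCplotter", "-r", ref_file, "-o", outdir]
    if assign_cog_color:
        cmd.append("--assign_cog_color")
    if query_files:
        # Track order follows the order of the query files given here
        cmd.append("--query_files")
        cmd.extend(query_files)
    return cmd


cmd = build_mgcplotter_cmd("Mgallisepticum.gbff", "example_result01", assign_cog_color=True)
print(" ".join(cmd))
```

If MGCplotter is installed, the list can be passed to `subprocess.run(cmd, check=True)`.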

Options

General Options:
  -r R, --ref_file R      Reference genome genbank file (*.gb|*.gbk|*.gbff)
  -o O, --outdir O        Output directory
  --query_files  [ ...]   Query CDS fasta or genome genbank files (*.fa|*.faa|*.fasta|*.gb|*.gbk|*.gbff)
  --cog_evalue            COGclassifier e-value parameter (Default: 1e-02)
  --mmseqs_evalue         MMseqs RBH search e-value parameter (Default: 1e-03)
  -t , --thread_num       Threads number parameter (Default: MaxThread - 1)
  -f, --force             Forcibly overwrite previous calculation result (Default: OFF)
  -v, --version           Print version information
  -h, --help              Show this help message and exit

Graph Size Options:
  --ticks_labelsize       Ticks label size (Default: 35)
  --forward_cds_r         Forward CDS track radius size (Default: 0.07)
  --reverse_cds_r         Reverse CDS track radius size (Default: 0.07)
  --rrna_r                rRNA track radius size (Default: 0.07)
  --trna_r                tRNA track radius size (Default: 0.07)
  --conserved_cds_r       Conserved CDS track radius size (Default: 0.04)
  --gc_content_r          GC content track radius size (Default: 0.15)
  --gc_skew_r             GC skew track radius size (Default: 0.15)

Graph Color Options:
  --assign_cog_color      Assign COG classification color to reference CDSs (Default: OFF)
  --cog_color_json        User-defined COG classification color json file
  --forward_cds_color     Forward CDS color (Default: 'red')
  --reverse_cds_color     Reverse CDS color (Default: 'blue')
  --rrna_color            rRNA color (Default: 'green')
  --trna_color            tRNA color (Default: 'magenta')
  --conserved_cds_color   Conserved CDS color (Default: 'chocolate')
  --gc_content_p_color    GC content color for positive value from average (Default: 'black')
  --gc_content_n_color    GC content color for negative value from average (Default: 'grey')
  --gc_skew_p_color       GC skew color for positive value (Default: 'olive')
  --gc_skew_n_color       GC skew color for negative value (Default: 'purple')

For graph color options, a matplotlib named color (e.g. 'red') or a hex color code (e.g. '#ff0000') can be used.

Matplotlib named color list

Example Command

1. M. gallisepticum genome simple plot (= Fig.1)

Reference: Mgallisepticum.gbff (0.63 MB)

MGCplotter -r Mgallisepticum.gbff -o ./example_result01 --assign_cog_color

2. M. gallisepticum genome plot with 3 query conserved CDS (= Fig.2)

Reference: Mgallisepticum.gbff (0.63 MB), Query: example02 (2.0 MB)

MGCplotter -r Mgallisepticum.gbff -o ./example_result02 --assign_cog_color \
           --query_files ./example02/*.gbff

Output Contents

  • circos[.png|.svg]
    Plot result figure file

  • reference_cds.faa
    Reference genome CDS fasta file (extracted from the genbank file)

  • circos_config/
    Circos config files directory

  • circos_legend/
    Circos legend files directory

  • cogclassifier/
    COGclassifier result files directory

  • rbh_search/
    MMseqs RBH result files directory

Example Gallery

1. E. coli genome simple plot (No COG assignment)

Reference: ecoli.gbk (3.5 MB)

MGCplotter -r ./ecoli.gbk -o ./gallery_result01 --rrna_color blue --trna_color red \
           --gc_content_p_color orange --gc_content_n_color blue \
           --gc_skew_p_color pink --gc_skew_n_color green 

MGCplotter_gallery_fig

2. E. coli genome plot with 3 query conserved CDS

Reference: ecoli.gbk (3.5 MB), Query: gallery02 (10.7 MB)

MGCplotter -r ./ecoli.gbk -o ./gallery_result02 --assign_cog_color \
           --query_files ./gallery02/NC_011751.gbk ./gallery02/NC_017634.gbk ./gallery02/NC_018658.gbk \
           --ticks_labelsize 50

Conserved CDS tracks are arranged from outside to inside in the order given to --query_files. In this case, NC_011751, NC_017634, and NC_018658 are arranged from outside to inside.

MGCplotter_gallery_fig

3. M. gallisepticum genome plot with 30 query conserved CDS

Reference: Mgallisepticum.gbff (0.63 MB), Query: gallery03 (19.6 MB)

MGCplotter -r ./Mgallisepticum.gbff -o ./gallery_result03 --assign_cog_color \
           --query_files ./gallery03/*.gbff --conserved_cds_color '#dc143c' \
           --rrna_r 0 --trna_r 0 --conserved_cds_r 0.01

MGCplotter_gallery_fig

4. M. alvi genome contigs plot with 6 query conserved CDS

Reference: Malvi.gbk (0.57 MB), Query: gallery04 (1.0 MB)

MGCplotter -r ./Malvi.gbk -o ./gallery_result04 --assign_cog_color \
           --query_files ./gallery04/*.faa --conserved_cds_r 0.05 \
           --gc_content_r 0 --gc_skew_r 0

Malvi.gbk is a multi-record (multi-contig) GenBank-format genome file. MGCplotter simply concatenates the contigs and marks each contig boundary with the colors of the outermost circle (lightgrey/darkgrey).
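
The concatenation of contigs can be pictured as placing each contig on one shared coordinate axis. The sketch below computes the start and end of each contig on that axis and alternates the two boundary colors. It is an illustration only: the contig lengths are made up, the function names are hypothetical, and the strict alternation of lightgrey/darkgrey is an assumption.

```python
from itertools import accumulate
from typing import List, Tuple


def contig_ranges(lengths: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) of each contig on the concatenated axis (0-based, end-exclusive)."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def to_global(contig_index: int, local_pos: int, ranges: List[Tuple[int, int]]) -> int:
    """Convert a position within one contig to a position on the concatenated axis."""
    start, end = ranges[contig_index]
    if not 0 <= local_pos < end - start:
        raise ValueError("position is outside the contig")
    return start + local_pos


lengths = [1200, 800, 500]  # hypothetical contig lengths
ranges = contig_ranges(lengths)
colors = ["lightgrey", "darkgrey"]
for i, (start, end) in enumerate(ranges):
    # Assumed: neighbouring contigs get alternating colors so boundaries are visible
    print(start, end, colors[i % 2])
```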

MGCplotter_gallery_fig

5. M. gallisepticum genome plot (User-defined COG classification color)

Reference: Mgallisepticum.gbff (0.63 MB), COG Color Json: cog_color.json (0.5 KB)

MGCplotter -r ./Mgallisepticum.gbff -o ./gallery_result05 --assign_cog_color \
           --cog_color_json ./cog_color.json

The COG functional classification colors can be changed with a user-defined color JSON file. A template JSON file can be obtained with the generate_cog_color_template command.

COG functional classification color template json
{
  "J": "#f43cf3",
  "A": "#f04ff0",
  "K": "#f04fa0",
  "L": "#f04f4f",
  "B": "#f4793c",
  "D": "#f0f04f",
  "Y": "#f3f43c",
  "V": "#f5f52a",
  "T": "#f7f718",
  "M": "#caf718",
  "N": "#9ef718",
  "Z": "#71f718",
  "W": "#45f718",
  "U": "#18f718",
  "O": "#07f830",
  "X": "#07f807",
  "C": "#2af5f5",
  "G": "#3cf3f4",
  "E": "#4ff0f0",
  "F": "#4f9ff0",
  "H": "#4f4ff0",
  "I": "#793cf4",
  "P": "#3c3cf4",
  "Q": "#2a5df5",
  "R": "#939393",
  "S": "#808080",
  "-": "#6c6c6c"
}
COG color json used in this gallery (cog_color.json)
{
  "J": "red",
  "A": "red",
  "K": "red",
  "L": "red",
  "B": "red",
  "D": "limegreen",
  "Y": "limegreen",
  "V": "limegreen",
  "T": "limegreen",
  "M": "limegreen",
  "N": "limegreen",
  "Z": "limegreen",
  "W": "limegreen",
  "U": "limegreen",
  "O": "limegreen",
  "X": "limegreen",
  "C": "deepskyblue",
  "G": "deepskyblue",
  "E": "deepskyblue",
  "F": "deepskyblue",
  "H": "deepskyblue",
  "I": "deepskyblue",
  "P": "deepskyblue",
  "Q": "deepskyblue",
  "R": "lightgrey",
  "S": "lightgrey",
  "-": "darkgrey"
}

MGCplotter_gallery_fig

In this gallery, the color classification is based on the following five COG major categories.

  1. Information Storage and Processing (J,A,K,L,B) => red
  2. Cellular Processes and Signaling (D,Y,V,T,M,N,Z,W,U,O,X) => limegreen
  3. Metabolism (C,G,E,F,H,I,P,Q) => deepskyblue
  4. Poorly Characterized (R,S) => lightgrey
  5. No COG Classified (-) => darkgrey
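
A color JSON like the one used in this gallery can also be generated from the category grouping above instead of being written by hand. This is a small sketch using only the Python standard library. The variable names are illustrative, and the letters and colors are taken from the list above.

```python
import json

# Color for each major category, keyed to the COG letters listed above
major_categories = {
    "red": "JAKLB",               # Information Storage and Processing
    "limegreen": "DYVTMNZWUOX",   # Cellular Processes and Signaling
    "deepskyblue": "CGEFHIPQ",    # Metabolism
    "lightgrey": "RS",            # Poorly Characterized
    "darkgrey": "-",              # No COG Classified
}

# Expand to one entry per COG letter, as expected by --cog_color_json
cog_colors = {
    letter: color
    for color, letters in major_categories.items()
    for letter in letters
}

print(json.dumps(cog_colors, indent=2))
```

Redirecting the printed output to a file such as cog_color.json gives a file that can be passed to --cog_color_json.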
