Dot

Dot is an interactive dot plot viewer for genome-genome alignments.

This is an extended version of Maria Nattestad's original software, which is available at github.com/MariaNattestad/dot

Dot is publicly available at dot.sandbox.bio. It can also be used locally by cloning this repository and opening the index.html file in a web browser.

My modifications

  • Create all files required for Dot in one command
  • In particular, this script can create Dot-compatible annotation files
  • This library provides Python bindings for Mummer4's Nucmer, available here: github.com/mummer4/mummer
  • Easy installation via pip

To do:

  • Add optional title string to html
  • Create fna from gbk if possible!

Installation

This package requires Python 3.9 or newer.

MUMmer4 must be installed, because dot run calls its nucmer aligner.

pip install git+https://github.com/MrTomRod/dot.git

Usage

Create all files required for Dot in one command

# just the dotplot
dot run --html-out /path/to/dotplot.html --fasta-ref ref.fna --fasta-qry qry.fna
# dotplot + genes
dot run --html-out /path/to/dotplot.html --fasta-ref ref.fna --fasta-qry qry.fna --gbk-ref ref.gbk --gbk-qry qry.gbk

This will:

  1. run nucmer -> out.delta
  2. run DotPrep -> out.coords, out.coords.idx, out.uniqueAnchorFiltered_l10000.delta.gz
  3. create a reference Dot annotation file -> out.ref.annotations
  4. create a query Dot annotation file -> out.qry.annotations
  5. create a standalone html file that can be opened in a browser
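
A minimal Python sketch of the steps above, assuming the fixed "out" prefix they show: it lists which files each step should leave in the output directory, and which of them are still missing. The helpers expected_files and missing_files are hypothetical and not part of dot; the file names are copied from the list.

```python
from pathlib import Path

# Hypothetical mapping from each step above to the file(s) it writes.
# The "out" prefix is assumed to be fixed, as in the list of steps.
STEP_OUTPUTS = {
    "nucmer": ["out.delta"],
    "DotPrep": ["out.coords", "out.coords.idx",
                "out.uniqueAnchorFiltered_l10000.delta.gz"],
    "reference annotations": ["out.ref.annotations"],
    "query annotations": ["out.qry.annotations"],
}


def expected_files(outdir: str, with_genes: bool) -> list[Path]:
    """List the intermediate files expected in outdir.

    Without GenBank input only the nucmer and DotPrep steps produce files.
    """
    steps = ["nucmer", "DotPrep"]
    if with_genes:
        steps += ["reference annotations", "query annotations"]
    return [Path(outdir) / name for step in steps for name in STEP_OUTPUTS[step]]


def missing_files(outdir: str, with_genes: bool) -> list[Path]:
    """Return the expected files that do not exist (yet)."""
    return [p for p in expected_files(outdir, with_genes) if not p.exists()]
```

Checking missing_files after a run with --outdir is one way to confirm that every step completed.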

To keep the intermediate files, add the argument --outdir /path/to/outdir.

It is also possible to run only steps 1 and 2, which produces the files for a dot plot without genes:

dot run --outdir /path/to/outdir --fasta-ref ref.fna --fasta-qry qry.fna

And to add genes in a second step:

dot create-annotations --gbk ref.gbk --is-ref True > out.ref.annotations
dot create-annotations --gbk qry.gbk --is-ref False --index out.coords.idx > out.qry.annotations
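
The same two-step annotation run can be driven from Python with the standard subprocess module. This is a sketch that uses only the flags shown in the two commands above; the helper names annotation_commands and write_annotations are hypothetical.

```python
import os
import subprocess


def annotation_commands(gbk_ref: str, gbk_qry: str, index: str) -> list[list[str]]:
    """Build the two `dot create-annotations` calls shown above (hypothetical helper)."""
    ref_cmd = ["dot", "create-annotations", "--gbk", gbk_ref, "--is-ref", "True"]
    qry_cmd = ["dot", "create-annotations", "--gbk", gbk_qry, "--is-ref", "False",
               "--index", index]
    return [ref_cmd, qry_cmd]


def write_annotations(gbk_ref: str, gbk_qry: str, index: str, outdir: str) -> None:
    """Run both commands, redirecting stdout to the annotation files like `>` does."""
    targets = ["out.ref.annotations", "out.qry.annotations"]
    for cmd, name in zip(annotation_commands(gbk_ref, gbk_qry, index), targets):
        with open(os.path.join(outdir, name), "w") as fh:
            subprocess.run(cmd, stdout=fh, check=True)
```

Only the query command needs --index, since only query contigs may be inverted or dropped.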

Dot-compatible annotation files

The reference annotation file is easy to produce: simply parse the fields ref (contig ID), ref_start, ref_end, name and strand (+ / -) from the GenBank file.
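
A minimal sketch of that conversion, assuming the genes have already been extracted from the GenBank file (for example with a GenBank parser such as Biopython) into a hypothetical Feature tuple, and assuming a comma-separated table whose header uses the field names above. It is an illustration, not dot's own implementation.

```python
import csv
import io
from typing import NamedTuple


class Feature(NamedTuple):
    """A gene already extracted from a GenBank record (hypothetical input type)."""
    contig: str
    start: int
    end: int
    name: str
    strand: int  # +1 or -1, as GenBank parsers commonly report it


def reference_annotations(features: list[Feature]) -> str:
    """Render features as a CSV table with the columns named in the text (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ref", "ref_start", "ref_end", "name", "strand"])
    for f in features:
        writer.writerow([f.contig, f.start, f.end, f.name, "+" if f.strand > 0 else "-"])
    return buf.getvalue()
```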

The query annotation file is a bit trickier: I noticed that nucmer sometimes inverts contigs on the y-axis. As a result, the annotations on such a contig are inverted relative to the dot plot. For this reason, my script reads out.coords.idx to learn which contigs are inverted and inverts the locations of their genes as well. In addition, my script discards contigs that are not shown in the dot plot (yes, that can happen).
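
The two adjustments described above can be sketched as follows. This assumes that the contig lengths and orientations ("+" or "-") have already been read from out.coords.idx into a plain dictionary (the parsing is not shown, and the dictionary shape is hypothetical), and that gene coordinates are 1-based and inclusive. Flipping the strand of genes on inverted contigs is also an assumption of this sketch.

```python
from typing import NamedTuple


class QueryGene(NamedTuple):
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    name: str
    strand: str  # "+" or "-"


def place_query_genes(genes: list[QueryGene],
                      contigs: dict[str, tuple[int, str]]) -> list[QueryGene]:
    """contigs maps contig ID to (length, orientation), assumed to come from out.coords.idx.

    Genes on contigs missing from the dot plot are dropped; genes on contigs
    shown inverted ("-") are mirrored so that they line up with the plot.
    """
    placed = []
    for g in genes:
        if g.contig not in contigs:
            continue  # contig is not shown in the dot plot
        length, orientation = contigs[g.contig]
        if orientation == "-":
            # Mirror 1-based inclusive coordinates: position p maps to length - p + 1.
            start, end = length - g.end + 1, length - g.start + 1
            strand = "-" if g.strand == "+" else "+"
            placed.append(QueryGene(g.contig, start, end, g.name, strand))
        else:
            placed.append(g)
    return placed
```

Mirroring both ends keeps start <= end, so the inverted gene still occupies the same number of bases.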

Python bindings

This library provides Python bindings for nucmer:

from dot import Nucmer

nucmer = Nucmer()

outfile = nucmer.align(
    fasta_ref='ref.fasta', fasta_qry='qry.fasta',
    workdir='/path/to/existing/dir',
    arguments={'--mincluster': 100}
)

print(outfile)  # /path/to/existing/dir/out.delta
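
To illustrate how an arguments dictionary like the one above could map onto a nucmer command line, here is a hypothetical helper (nucmer_command, not part of dot's API). It assumes the binding passes each key as a long option in --flag=value form, writes with nucmer's --prefix option using the prefix "out" inside workdir, and treats a value of True as a bare switch; dot's actual implementation may differ.

```python
import os


def nucmer_command(fasta_ref: str, fasta_qry: str, workdir: str,
                   arguments: dict[str, object]) -> tuple[list[str], str]:
    """Build a nucmer argv and the path of the delta file it would write.

    Hypothetical helper: illustrates the idea, not dot's own code.
    """
    prefix = os.path.join(workdir, "out")
    cmd = ["nucmer", f"--prefix={prefix}"]
    for flag, value in arguments.items():
        # True means a bare switch; anything else becomes --flag=value.
        cmd.append(flag if value is True else f"{flag}={value}")
    cmd += [fasta_ref, fasta_qry]
    return cmd, prefix + ".delta"
```

With the example call above, this yields a delta path ending in out.delta inside workdir, matching the printed result.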

It also exposes all of the Dot functions described above:

from dot import DotPrep

dotprep = DotPrep()

coords, index = dotprep.run_python(fasta_ref='ref.fasta', fasta_qry='qry.fasta', mincluster=100)
# coords: Dot's out.coords file as string
# index: Dot's out.coords.idx file as string

ref_annos = dotprep.gbk_to_annotation_file(gbk='ref.gbk', is_ref=True)
qry_annos = dotprep.gbk_to_annotation_file(gbk='qry.gbk', is_ref=False, index=index)
# Dot-compatible annotation files as string

ref_annos = dotprep.gbk_to_annotation_dict(gbk='ref.gbk', is_ref=True)
qry_annos = dotprep.gbk_to_annotation_dict(gbk='qry.gbk', is_ref=False, index=index)
# Dot-compatible annotations, as Python dictionaries
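
Because run_python and gbk_to_annotation_file return strings, they still need to be written to disk before they can be loaded into Dot. A minimal sketch, using the file names from the dot run steps; the helper save_dot_files is hypothetical.

```python
from pathlib import Path
from typing import Optional


def save_dot_files(outdir: str, coords: str, index: str,
                   ref_annos: Optional[str] = None,
                   qry_annos: Optional[str] = None) -> list[Path]:
    """Write the strings returned above under the file names used by `dot run`."""
    contents = {
        "out.coords": coords,
        "out.coords.idx": index,
        "out.ref.annotations": ref_annos,
        "out.qry.annotations": qry_annos,
    }
    target = Path(outdir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in contents.items():
        if text is None:
            continue  # annotation files are optional
        path = target / name
        path.write_text(text)
        written.append(path)
    return written
```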

Contributors

csbao, marianattestad, mrtomrod, robertaboukhalil