cov-browser's Introduction

CoV-SARS-2 Genome Tracker

Abstract: Genome sequences constitute the primary evidence on the origin and spread of the 2019-2020 Covid-19 pandemic. Rapid comparative analysis of coronavirus CoV-SARS-2 genomes is critical for disease control, outbreak forecasting, and developing clinical interventions. With CoV Genome Tracker we trace viral genomic changes in real time using a haplotype network, an accurate and scalable representation of evolutionary changes at the micro-evolutionary time scale. We resolve the direction of mutations by using a bat-associated genome as the outgroup. At the macro-evolutionary time scale, the Genome Tracker provides gene-by-gene and codon-by-codon evolutionary rates to facilitate the search for molecular targets of clinical interventions.

This is the repository for the CoV-SARS-2 Genome Tracker: http://cov.genometracker.org

Outline

  1. Download hCoV-19 genomic sequences from GISAID
  2. Parse sequences and metadata using parse-metadata.ipynb
    • a. Align each hCoV-19 genome to the NCBI reference genome Wuhan-Hu-1 (accession ID NC_045512) with align-genome.sh
    • b. Identify variation sites with samtools and bcftools, and create a haplotype alignment with bcftools, using infer-genome.sh
  3. Discard haplotypes with 10% or more missing bases, then identify haplotypes by imputing interior missing bases from the closest haplotype
  4. Build a minimum spanning tree of the unique haplotypes using hapnet.pl
  5. Create bootstrap alignments and infer networks using hapnet-boot.pl
  6. Get the consensus network using net-consense.pl
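
The bootstrap and consensus steps (5 and 6) can be illustrated with a minimal Python sketch. It is not the code in hapnet-boot.pl or net-consense.pl; the function names `bootstrap_alignment` and `edge_support` are hypothetical, and it assumes the usual approach of resampling alignment columns with replacement and then counting how often each edge appears across replicate networks.

```python
import random
from collections import Counter


def bootstrap_alignment(seqs, rng):
    """Build one bootstrap replicate by resampling columns with replacement.

    seqs maps a haplotype name to its aligned sequence (all equal length).
    """
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(s[c] for c in cols) for name, s in seqs.items()}


def edge_support(networks):
    """Return the fraction of replicate networks containing each undirected edge.

    networks is a list of edge lists; each edge is a (name, name) pair.
    """
    counts = Counter()
    for edges in networks:
        # A set ensures an edge is counted once per replicate, in either order.
        counts.update({frozenset(e) for e in edges})
    return {tuple(sorted(e)): n / len(networks) for e, n in counts.items()}


if __name__ == "__main__":
    seqs = {"h1": "ACGT", "h2": "ACGA"}
    replicate = bootstrap_alignment(seqs, random.Random(0))
    print(replicate)
    print(edge_support([[("a", "b"), ("b", "c")], [("b", "a")]]))
```

Edges with high support across replicates would be the ones kept in a consensus network; the support threshold is a design choice this sketch leaves open.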

Workflow

Parse metadata

Reads the GISAID hCoV-19 sequence metadata and adds geo-location.

parse-metadata.ipynb

It writes the following output files:

  • covid-19-[current_date].tsv
  • covid-[current_date].fasta

align-genome.sh

Aligns each CoV-SARS-2 isolate from GISAID to the NCBI reference genome Wuhan-Hu-1 (GenBank accession ID: NC_045512).

input: FASTA file containing hCoV-19 genomes, reference sequence Wuhan-Hu-1, ploidy file, name of the folder containing BAM files for non-human sequences, outgroup (sorted BAM file).

infer-genome.bash [covid-[file_date].fasta] [Wuhan-Hu-1 reference sequence] [vcf ploidy file] [folder name that contains non-human cov genome bam files] [outgroup bam file]
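
The pipeline itself calls variants with samtools and bcftools. As a conceptual illustration only, the following Python sketch shows what "identify variation sites and create a haplotype alignment" means for sequences that are already aligned to the reference. The function names `variable_sites` and `haplotype_alignment` are hypothetical and do not appear in the repository.

```python
def variable_sites(seqs, valid="ACGT"):
    """Return 0-based columns where at least two distinct valid bases occur.

    Ambiguous or missing characters (anything outside `valid`) are ignored,
    so a column with only "C" and "N" is not treated as variable.
    """
    length = len(next(iter(seqs.values())))
    sites = []
    for i in range(length):
        bases = {s[i].upper() for s in seqs.values()} & set(valid)
        if len(bases) > 1:
            sites.append(i)
    return sites


def haplotype_alignment(seqs, sites):
    """Keep only the variable columns of each sequence."""
    return {name: "".join(s[i] for i in sites) for name, s in seqs.items()}


if __name__ == "__main__":
    aligned = {"ref": "ACGTAC", "s1": "ACGTAT", "s2": "ANGAAC"}
    sites = variable_sites(aligned)
    print(sites)
    print(haplotype_alignment(aligned, sites))
```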

impute-hap.pl

Estimates missing genotypes from a haplotype or genotype reference panel.

  • input:
    • SAM file containing haplotypes with each sequence < 10% gaps (any non-ATGC: "n", "." or "-")
impute-hap.pl [input] []
  • output:
    • log file listing the removed sequences (those with 10% or more gaps)
    • sequence type (ST) alignment
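
A minimal Python sketch of the filter-then-impute idea follows. It is not a port of impute-hap.pl: the function names are hypothetical, the donor is assumed to be the complete haplotype with the fewest mismatches at non-missing positions, and for simplicity every missing base is filled, not only interior ones.

```python
def gap_fraction(seq):
    """Fraction of characters that are not A, C, G or T."""
    return sum(ch not in "ACGT" for ch in seq.upper()) / len(seq)


def filter_and_impute(seqs, max_gap=0.10):
    """Drop sequences with >= max_gap missing bases, then fill the rest.

    Returns (imputed, removed): a dict of gap-free sequences and a sorted
    list of the names that were discarded.
    """
    upper = {name: s.upper() for name, s in seqs.items()}
    kept = {n: s for n, s in upper.items() if gap_fraction(s) < max_gap}
    removed = sorted(set(upper) - set(kept))
    complete = [s for s in kept.values() if gap_fraction(s) == 0]

    def mismatches(query, donor):
        # Compare only at positions where the query has a called base.
        return sum(a != b for a, b in zip(query, donor) if a in "ACGT")

    imputed = {}
    for name, s in kept.items():
        if gap_fraction(s) == 0 or not complete:
            imputed[name] = s  # nothing to fill, or no donor available
            continue
        donor = min(complete, key=lambda d: mismatches(s, d))
        imputed[name] = "".join(a if a in "ACGT" else b for a, b in zip(s, donor))
    return imputed, removed


if __name__ == "__main__":
    seqs = {
        "a": "ACGTACGTACGTACGTACGT",
        "b": "ACGTACGTACGTACGTACGA",
        "c": "ACGTACGTNCGTACGTACGA",  # 5% missing: kept and imputed
        "d": "NNNTACGTACGTACGTACGT",  # 15% missing: removed
    }
    print(filter_and_impute(seqs))
```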

hapnet.pl

Reconstructs a network of unique haplotypes. This program calculates the minimum spanning tree of a set of haplotypes.

hapnet.pl
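
To make the minimum spanning tree idea concrete, here is a small Python sketch. It assumes edge weights are Hamming distances between equal-length haplotypes and uses Prim's algorithm; whether hapnet.pl uses the same distance or algorithm is not stated here, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haps):
    """Prim's algorithm over the complete graph of haplotypes.

    haps maps a haplotype name to its sequence. Returns a list of
    (from_name, to_name, distance) edges; ties are broken alphabetically.
    """
    names = sorted(haps)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that connects the current tree to a new haplotype.
        dist, u, v = min(
            (hamming(haps[u], haps[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


if __name__ == "__main__":
    haps = {"H1": "AAAA", "H2": "AAAT", "H3": "AATT", "H4": "CAAA"}
    for edge in minimum_spanning_tree(haps):
        print(edge)
```

This version recomputes distances on every pass, which is fine for illustration but quadratic per step; a distance matrix computed once would scale better for thousands of haplotypes.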

Cite

Akther, Bezrucenkovas, Sulkow, Panlasigui, Li, Qiu, and Di. "CoV Genome Tracker: tracing genomic footprints of Covid-19 pandemic". bioRxiv. https://www.biorxiv.org/content/biorxiv/early/2020/04/14/2020.04.10.036343.full.pdf
