Enterovirus A71 Nextstrain Analysis

This build performs a full Nextstrain analysis of Enterovirus A71. You can run either a VP1 build, which uses sequences of at least 600 base pairs, or a whole genome build, which uses sequences of at least 6400 base pairs.

If you are unfamiliar with Nextstrain or haven't installed it yet, you can find an introduction and full documentation here.

You can read the master's thesis by Simon Grimm, based on this build, here.

This build could be extended in the future in several ways:

  1. Including additional metadata such as patient age, granular spatial data, or clinical outcomes.
  2. Automating updates of the build with the newest available sequences. See Emma Hodcroft's Enterovirus D68 build for some efforts to implement this for a closely related virus.

Data used for this build can be downloaded from viprbrc.org. I've added instructions for how to download sequences manually at the end of this README.

To learn more about Enterovirus A71, I recommend this very well written review article by Solomon et al.

Organization of repository:

This repo contains the following folders and files:

scripts contains custom Python scripts that are called from the snakefile.

snakefile contains the entire computational pipeline. This file uses the Snakemake workflow management system, which allows elegant, reproducible biocomputational analyses. You can find snakemake's documentation here. If you want to change some part of the analysis or call your own scripts, you need to edit this file.

ev_a71/vp1 contains sequences and config files used for the >=600 bp VP1 run.

ev_a71/whole_genome contains sequences and config files used for the >=6400 bp whole genome run.

In the folders ev_a71/vp1/config and ev_a71/whole_genome/config, you can find the configuration files required for running Nextstrain:

  • coloring scheme colors.tsv
  • geographical locations geo_regions.tsv
  • latitude and longitude data lat_longs.tsv
  • dropped strains dropped_strains.txt
  • virus clade assignments clades_genome.tsv
  • reference sequence reference_sequence.gb
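Before starting a run, it can help to confirm that these files are in place. The following is a minimal sketch, not part of the repository: the helper name missing_config_files is hypothetical, the file names are copied from the list above, and it assumes that every listed file is expected in both config folders (the VP1 run may, for example, use a differently named clade file).

```python
from pathlib import Path

# File names copied from the list above. Assumption of this sketch: the same
# set of files is expected in both config folders.
CONFIG_FILES = [
    "colors.tsv",
    "geo_regions.tsv",
    "lat_longs.tsv",
    "dropped_strains.txt",
    "clades_genome.tsv",
    "reference_sequence.gb",
]


def missing_config_files(config_dir, expected=CONFIG_FILES):
    """Return the expected file names that are absent from config_dir."""
    config_dir = Path(config_dir)
    return [name for name in expected if not (config_dir / name).is_file()]


# Report the state of both config folders, relative to the repository root.
for run in ("vp1", "whole_genome"):
    folder = Path("ev_a71") / run / "config"
    missing = missing_config_files(folder)
    status = "all files present" if not missing else "missing " + ", ".join(missing)
    print(f"{folder}: {status}")
```

Run it from the repository root; an empty list for a folder means nothing from the list above is missing there.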

The reference sequence used for this build is called BrCr and was sequenced in 1970. Its accession number is U22521, and it can be found online.

Quickstart

Setup

Nextstrain environment

To run this repository you need to install the Nextstrain environment. You can find detailed install instructions here.

Running build

Before running a build, you need to activate the Nextstrain conda environment by executing

conda activate nextstrain

You can then create both the VP1 build and the whole genome build by executing

snakemake --cores 1

If you only want one of those builds, you can either create a vp1 build by executing

snakemake ev_a71/vp1/auspice/ev_a71_vp1.json --cores 1

or you can create a whole genome build by executing

snakemake ev_a71/whole_genome/auspice/ev_a71_whole_genome.json --cores 1
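The two target paths above follow the same pattern, so the commands can also be assembled programmatically, for instance from a wrapper script. This is only a sketch under that assumption: the helper names auspice_json, snakemake_command and run_build are hypothetical and do not exist in the repository, and run_build requires snakemake to be available in the active environment.

```python
import subprocess

# The two runs described in this README.
RUNS = ("vp1", "whole_genome")


def auspice_json(run):
    """Return the auspice JSON target for a run, following the paths shown above."""
    if run not in RUNS:
        raise ValueError(f"unknown run {run!r}, expected one of {RUNS}")
    return f"ev_a71/{run}/auspice/ev_a71_{run}.json"


def snakemake_command(run=None, cores=1):
    """Build the snakemake command line; run=None requests both builds."""
    command = ["snakemake"]
    if run is not None:
        command.append(auspice_json(run))
    command += ["--cores", str(cores)]
    return command


def run_build(run=None, cores=1):
    """Execute the build and raise CalledProcessError if snakemake fails."""
    subprocess.run(snakemake_command(run, cores), check=True)
```

Calling run_build("vp1") from the repository root is then equivalent to the VP1 command shown above, and run_build() to the plain snakemake --cores 1 call.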

Visualizing build

If everything worked out, you can now visualize your build using auspice (which is included with Nextstrain).

For the vp1 build do this via

auspice view --datasetDir ev_a71/vp1/auspice

For the whole genome build do this via

auspice view --datasetDir ev_a71/whole_genome/auspice

If you want to run two auspice visualizations simultaneously, run export PORT=4001 before starting the second one, so that it does not collide with the first instance on auspice's default port 4000.
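If you prefer to start both viewers from a single script, the port can be set per process instead of exporting it in the shell. The sketch below assumes that auspice reads the PORT environment variable, as the export command above implies; the helper names are hypothetical, and start_viewer only works where the auspice command is installed.

```python
import os
import subprocess


def auspice_command(run):
    """Return the auspice call for a run, matching the commands shown above."""
    return ["auspice", "view", "--datasetDir", f"ev_a71/{run}/auspice"]


def environment_with_port(port, base_env=None):
    """Copy the environment and set PORT, leaving the caller's environment untouched."""
    env = dict(os.environ if base_env is None else base_env)
    env["PORT"] = str(port)
    return env


def start_viewer(run, port):
    """Start auspice in the background on the given port and return the process."""
    return subprocess.Popen(auspice_command(run), env=environment_with_port(port))
```

For example, start_viewer("vp1", 4000) followed by start_viewer("whole_genome", 4001) would serve the two builds side by side; call terminate() on the returned processes to stop them.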

Sequences

You can download up-to-date sequences from viprbrc.org. On the landing page, pick Enterovirus (you should find it under the header "Featured Viruses").

Within the Enterovirus Taxonomy Browser, pick Enterovirus A. On the Genome Search page, click on "Search Criteria". There you can select Enterovirus A71 sequences. As of January 2022, there should be ~13,000 sequences. You do NOT need to specify a sequence length, as subsampling by length is included in this build.
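To illustrate what selecting by length means, here is a minimal sketch using the two thresholds mentioned at the top of this README. It is not the code the build uses (that lives in the snakefile and the scripts folder); the function names are hypothetical, and the sketch simply counts the total length of each record, which may differ from how the build measures the VP1 region.

```python
# Minimum lengths in base pairs, taken from the description of the two runs.
MIN_LENGTH = {"vp1": 600, "whole_genome": 6400}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, run):
    """Keep only the records that reach the minimum length for the given run."""
    minimum = MIN_LENGTH[run]
    return [(header, seq) for header, seq in records if len(seq) >= minimum]
```

A typical call would be filter_by_length(read_fasta(open("vipr.fasta")), "vp1"), which returns the records that are at least 600 bases long.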

Sequences should be downloaded in "Genome FASTA" format. Under "Format for FASTA file definition line", pick "Custom format" and add ALL metadata fields. You can now download the sequences.

Save the resulting file as vipr.fasta in both data folders, ev_a71/whole_genome/data and ev_a71/vp1/data.
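Copying the download into both folders can be scripted. The following sketch assumes it is run from the repository root; the function name install_sequences and the example download path are hypothetical.

```python
import shutil
from pathlib import Path

# Data folders named in this README, relative to the repository root.
DATA_DIRS = ("ev_a71/whole_genome/data", "ev_a71/vp1/data")


def install_sequences(downloaded_file, repo_root="."):
    """Copy the downloaded FASTA file to vipr.fasta in both data folders."""
    written = []
    for relative_dir in DATA_DIRS:
        target_dir = Path(repo_root) / relative_dir
        # Create the folder if it does not exist yet.
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / "vipr.fasta"
        shutil.copyfile(downloaded_file, target)
        written.append(target)
    return written
```

For example, install_sequences("downloads/my_vipr_download.fasta") would place a copy in each folder and return the two paths it wrote.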

Feedback

If you have any questions or comments, feel free to reach out via GitHub, Twitter (@Simon__Grimm), or email at simon(dot)grimm(at)unibas(dot)ch.
