Git Product home page Git Product logo

covid19's Introduction

covid19

Software requirements

R and R packages

We ran R scripts with R v3.6.2. We used the following packages installable from CRAN:

  • dplyr (v0.85)
  • rvest (v0.3.5)
  • rworldmap (v1.3-6)
  • ggplot2 (v3.3.0)

Python and python packages:

We ran python scripts with Python 3.7, installed using Anaconda. See docstrings of *.py files for further information on reproducing results. Packages were installed with pip3 and include:

  • matplotlib
  • mmh3

Data

SARS-CoV2 and other coronavirus protein sequences

We manually downloaded the full polyprotein 1ab (ORF1ab), spike (S) protein, membrane (M) protein, envelope (E) protein, and nucleocapsid (N) protein sequence FASTA files for the following 34 coronavirus species from the National Center of Biotechnology Information (NCBI) Reference Sequence Database. Note that for SARS-CoV-2 and SARS-CoV we downloaded FASTA files for all additional annotated protein sequences comprising their full viral proteomes:

Name ORF1ab Spike Envelope Membrane Nucleocapsid
SARS-CoV-2* YP_009724389.1 YP_009724390.1 YP_009724392.1 YP_009724393.1 YP_009724397.2
SARS-CoV* NP_828849.2 NP_828851.1 NP_828854.1 NP_828855.1 NP_828858.1
OC43 YP_009555238.1 YP_009555241.1 YP_009555243.1 YP_009555244.1 YP_009555245.1
Bovine-CoV NP_150073.2 NP_150077.1 NP_150081.1 NP_150082.1 NP_150083.1
HKU24 YP_009113022.1 YP_009113025.1 YP_009113028.1 YP_009113029.1 YP_009113031.1
HKU1 YP_173236.1 YP_173238.1 YP_173240.1 YP_173241.1 YP_173242.1
MHV AAU06353.1 AAU06356.1 AAU06359.1 AAU06360.1 NP_045302.1
Rat-CoV YP_003029844.1 YP_003029848.1 YP_003029850.1 YP_003029851.1 YP_003029852.1
Bat-BCoV YP_009072438.1 YP_009072440.1 YP_009072442.1 YP_009072443.1 YP_009072446.1
Hedgehog-CoV YP_009513008.1 YP_009513010.1 YP_009513016.1 YP_009513017.1 YP_009513018.1
MERS-CoV YP_009047202.1 YP_009047204.1 YP_009047209.1 YP_009047210.1 YP_009047211.1
HKU4 YP_001039952.1 YP_001039953.1 YP_001039958.1 YP_001039959.1 YP_001039960.1
HKU5 YP_001039961.1 YP_001039962.1 YP_001039967.1 YP_001039968.1 YP_001039969.1
GCCDC1 YP_009273004.1 YP_009273005.1 YP_009273007.1 YP_009273008.1 YP_009273009.1
HKU9 YP_001039970.1 YP_001039971.1 YP_001039973.1 YP_001039974.1 YP_001039975.1
HKU14 YP_005454239.1 YP_005454245.1 YP_005454247.1 YP_005454248.1 YP_005454249.1
CDPHE15 YP_008439200.1 YP_008439202.1 YP_008439204.1 YP_008439205.1 YP_008439206.1
HKU10 YP_006908641.2 YP_006908642.1 YP_006908644.1 YP_006908645.1 YP_006908646.1
BtRf-AlphaCoV YP_009199789.1 YP_009199790.1 YP_009199792.1 YP_009199793.1 YP_009199794.1
229E ARU07599.1 ARU07601.1 ARU07603.1 ARU07604.1 ARU07605.1
LuchengRn-CoV YP_009336483.1 YP_009336484.1 YP_009336485.1 YP_009336486.1 YP_009336487.1
Ferret-CoV YP_009256195.1 YP_009256197.1 YP_009256199.1 YP_009256200.1 YP_009256201.1
Mink-CoV YP_009019180.1 YP_009019182.1 YP_009019184.1 YP_009019185.1 YP_009019186.1
Bat-CoV-1A YP_001718603.1 YP_001718605.1 YP_001718607.1 YP_001718608.1 YP_001718609.1
HKU8 YP_001718610.1 YP_001718612.1 YP_001718614.1 YP_001718615.1 YP_001718616.1
BtMr-AlphaCoV YP_009199608.1 YP_009199609.1 YP_009199611.1 YP_009199612.1 YP_009199613.1
BtNv-AlphaCoV YP_009201729.1 YP_009201730.1 YP_009201732.1 YP_009201733.1 YP_009201734.1
Porcine-EDV NP_598309.2 NP_598310.1 NP_598312.1 NP_598313.1 NP_598314.1
BtCoV512 YP_001351683.1 YP_001351684.1 YP_001351686.1 YP_001351687.1 YP_001351688.1
HKU2 YP_001552234.1 YP_001552236.1 YP_001552238.1 YP_001552239.1 YP_001552240.1
NL63 YP_003766.2 YP_003767.1 YP_003769.1 YP_003770.1 YP_003771.1
NL63-related YP_009328933.1 YP_009328935.1 YP_009328937.1 YP_009328938.1 YP_009328939.1
FCoV YP_004070193.2 YP_004070194.1 YP_004070197.1 YP_004070198.1 YP_004070199.1
TGE NP_058422.1 NP_058424.1 NP_058426.1 NP_058427.2 NP_058428.1

HLA allele frequency data

We designed and executed a custom R script (HLA_frequencies.R) to scrape all population and human leukocyte antigen (HLA) allele frequency from the Allele Frequency Net Database.

Data analysis

Protein sequence alignments

For each protein class (i.e. ORF1ab, S, M, E, N), all 34 coronavirus sequences were aligned using the web-based Clustal Omega multisequence aligner tool employing default parameters: sequence type [Protein], output alignment format [clustal_num], dealign [false], mBed-like clustering guide-tree [true], mBed-like clustering iteration [true], number of combined iterations [0], maximum guide tree iterations [-1], and maximum HMM iterations [-1].

covid19's People

Contributors

reidt03 avatar boeinco avatar nellore avatar metamaden avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.