arnaud00013 / sars-cov-2-hpda-evolutionary-analysis

License: MIT License

SARS-CoV-2-HPDA-evolutionary-analysis

@Author: Arnaud N'Guessan

Overview

This repository contains a script for analyzing SARS-CoV-2 evolution in epitopes during the first two waves of the COVID-19 pandemic. The immunological data come from a high protein density array analysis of SARS-CoV-2 epitopes in 15 patients (N'Guessan A. et al., 2022). This script and the related data can be updated manually to integrate data from other waves or other sets of epitopes.

Dependencies

R (version 3.5.2+) with the following packages: "ggplot2", "seqinr", "grid", "RColorBrewer", "randomcoloR", "gplots", "lmPerm", "ggpubr", "gridExtra", "indicspecies", "tidyr", "Cairo", "parallel", "foreach", "doParallel", "infotheo", "VennDiagram", "Biostrings", "session" (grid and parallel ship with base R; Biostrings is installed from Bioconductor).
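Before a first run, it can help to list which of these packages are still missing. The sketch below uses a hypothetical helper, missing_packages(), which is not part of the repository; it only checks whether each package is installed, using base R's system.file(), and leaves the installation commands commented out because they need network access.

```r
# Packages listed in the Dependencies section (duplicates removed).
required_packages <- c(
  "ggplot2", "seqinr", "grid", "RColorBrewer", "randomcoloR", "gplots",
  "lmPerm", "ggpubr", "gridExtra", "indicspecies", "tidyr", "Cairo",
  "parallel", "foreach", "doParallel", "infotheo", "VennDiagram",
  "Biostrings", "session"
)

# Hypothetical helper: return the packages that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  paths <- vapply(pkgs, function(p) system.file(package = p), character(1))
  pkgs[!nzchar(paths)]
}

not_installed <- missing_packages(required_packages)
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  # Possible installation commands (network access required):
  # install.packages(setdiff(not_installed, "Biostrings"))   # CRAN packages
  # BiocManager::install("Biostrings")                       # Bioconductor
}
```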

The script

a) Inputs:

-->OUTPUT_WORKSPACE: The absolute path of the "SARS-CoV-2-HPDA-evolutionary-analysis/" directory on your system. This directory must contain a sub-directory named "depth_report_NCBI_SRA_amplicon/" holding one depth coverage file per sample (a .csv file generated by "samtools depth", or a .csv file with 3 columns/fields giving the sample, the position of the site in the reference genome MN908947_3, and the depth at that site). An example depth report file is included in "SARS-CoV-2-HPDA-evolutionary-analysis/depth_report_NCBI_SRA_amplicon/" to show the expected format. The "SARS-CoV-2-HPDA-evolutionary-analysis/" directory must also contain the script (high_confidence_epitopes_analysis.r) and the related data (Epitopes_mapped.csv, MN908947_3.fasta, Table_signature_mutations.csv, df_high_confidence_epitope_metrics.rds, df_sars_cov_2_epitopes.rds, df_variants_SRA_amplicon_first_wave.rds and df_variants_SRA_amplicon_second_wave.rds).
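A misconfigured workspace is cheaper to catch before a long run than during it. The sketch below checks the layout described above; check_workspace() and read_depth_report() are hypothetical helpers, not functions of the repository. The reader assumes header-less, comma-separated files with exactly the three columns sample, position and depth; files written directly by samtools depth are tab-separated by default, so such files would need sep = "\t" (and possibly other column names).

```r
# Files the README lists as required in OUTPUT_WORKSPACE.
required_files <- c(
  "high_confidence_epitopes_analysis.r", "Epitopes_mapped.csv",
  "MN908947_3.fasta", "Table_signature_mutations.csv",
  "df_high_confidence_epitope_metrics.rds", "df_sars_cov_2_epitopes.rds",
  "df_variants_SRA_amplicon_first_wave.rds",
  "df_variants_SRA_amplicon_second_wave.rds"
)
depth_dir_name <- "depth_report_NCBI_SRA_amplicon"

# Hypothetical helper: describe every problem found in the workspace
# (returns an empty character vector when the layout looks complete).
check_workspace <- function(output_workspace) {
  present <- file.exists(file.path(output_workspace, required_files))
  # sprintf() returns character(0) when nothing is missing.
  problems <- sprintf("missing file: %s", required_files[!present])
  depth_dir <- file.path(output_workspace, depth_dir_name)
  if (!dir.exists(depth_dir)) {
    problems <- c(problems, paste0("missing directory: ", depth_dir_name, "/"))
  } else if (length(list.files(depth_dir, pattern = "\\.csv$")) == 0) {
    problems <- c(problems, paste0("no .csv depth report in ", depth_dir_name, "/"))
  }
  problems
}

# Hypothetical reader, assuming a header-less comma-separated file with the
# columns sample, position on MN908947_3, and depth.
read_depth_report <- function(path) {
  read.csv(path, header = FALSE,
           col.names = c("sample", "position", "depth"),
           colClasses = c("character", "integer", "integer"))
}
```

Calling check_workspace() on your OUTPUT_WORKSPACE path and printing the result lists anything that still needs to be copied into place.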

-->NB_CPUS: The number of CPUs to use for the analyses that run in parallel (through the R package doParallel).
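doParallel registers a cluster from R's base parallel package as the backend of foreach, so NB_CPUS effectively sets how many worker processes share a %dopar% loop. The sketch below shows the same idea with base parallel only, so it runs without extra packages. effective_cpus() and parallel_map() are hypothetical helpers, and capping NB_CPUS at detectCores() is an assumption, not something the README says the script does.

```r
library(parallel)

# Hypothetical helper: never start more workers than detected cores,
# and always start at least one.
effective_cpus <- function(nb_cpus) {
  available <- detectCores()
  if (is.na(available)) available <- 1L
  max(1L, min(as.integer(nb_cpus), available))
}

# Hypothetical helper: apply f to each element of x on a local cluster.
# With doParallel, the equivalent is registerDoParallel(cl) followed by
# foreach(...) %dopar% { ... }.
parallel_map <- function(x, f, nb_cpus) {
  cl <- makeCluster(effective_cpus(nb_cpus))
  on.exit(stopCluster(cl), add = TRUE)  # always release the workers
  parLapply(cl, x, f)
}
```

For example, parallel_map(1:4, function(i) i^2, nb_cpus = 2) returns a list of the four squares computed on up to two workers.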

b) Outputs: Various plots showing the evolutionary profile of SARS-CoV-2 epitopes during waves 1 and 2, as well as comparisons between lineages/variants.

c) Running the script: To run the script from a terminal (command line), R (version 3.5.2+) must be installed, or loaded as an environment module (e.g. on an HPC cluster managed by Slurm). Then run the command:

Rscript high_confidence_epitopes_analysis.r $OUTPUT_WORKSPACE $NB_CPUS
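Rscript passes the two values after the script name as trailing command-line arguments. The sketch below shows one way such arguments could be read and validated; parse_cli_args() is a hypothetical helper, and the actual script may parse its arguments differently.

```r
# Hypothetical sketch: read OUTPUT_WORKSPACE and NB_CPUS from the command line.
parse_cli_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  usage <- "usage: Rscript high_confidence_epitopes_analysis.r OUTPUT_WORKSPACE NB_CPUS"
  if (length(args) != 2) stop(usage, call. = FALSE)
  # Accept only a positive whole number for NB_CPUS (e.g. "4", not "2.5").
  if (!grepl("^[1-9][0-9]*$", args[2])) {
    stop("NB_CPUS must be a positive integer, got: ", args[2], call. = FALSE)
  }
  list(output_workspace = args[1], nb_cpus = as.integer(args[2]))
}
```

Inside a script, opts <- parse_cli_args() then exposes opts$output_workspace and opts$nb_cpus; with a placeholder path, the call would be Rscript high_confidence_epitopes_analysis.r /home/user/SARS-CoV-2-HPDA-evolutionary-analysis/ 4.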

d) Updating lineage signature mutation data: To update or edit the lineage signature mutation data, open the file "Table_signature_mutations.csv" in Excel and add the new signature mutation and its lineage as a new entry in the table. Only these 2 columns are mandatory; you can set the other fields/columns to "NA" or leave them empty. Don't forget to save the table as a .csv file. You can also make the edits in your favorite text editor: the new line for a signature mutation X of lineage Z would be X,Z,NA,NA,NA or X,Z,"","","" or X,Z,,,. The signature mutation name must follow the format ORF_name:Old_amino_acidResidue_positionNew_amino_acid, with no spaces, where the residue position refers to the ORF protein sequence (e.g. ORF8:L84S, meaning old amino acid L, position 84 in the ORF8 protein, new amino acid S).
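Malformed names are easy to introduce when the table is edited by hand. The sketch below uses two hypothetical helpers (not part of the repository): one checks names against the ORF_name:old amino acid, position, new amino acid format and splits them into parts, the other appends a row with the five fields used in the examples above. Accepting "*" for a stop codon and assuming exactly five columns are both assumptions.

```r
# Pattern for ORF_name:<old aa><position><new aa>, e.g. ORF8:L84S.
# Assumption: "*" is accepted as a stop codon (nonsense mutation).
signature_mutation_pattern <- "^([A-Za-z0-9_]+):([A-Z*])([0-9]+)([A-Z*])$"

# Hypothetical helper: split names into their parts, failing on bad names.
parse_signature_mutation <- function(x) {
  ok <- grepl(signature_mutation_pattern, x)
  if (!all(ok)) {
    stop("invalid signature mutation name(s): ", paste(x[!ok], collapse = ", "))
  }
  data.frame(
    orf      = sub(signature_mutation_pattern, "\\1", x),
    old_aa   = sub(signature_mutation_pattern, "\\2", x),
    position = as.integer(sub(signature_mutation_pattern, "\\3", x)),
    new_aa   = sub(signature_mutation_pattern, "\\4", x),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: append "mutation,lineage,NA,NA,NA" to the table.
# Rewriting the lines avoids merging rows when the file lacks a final newline.
append_signature_mutation <- function(csv_path, mutation, lineage) {
  parse_signature_mutation(mutation)  # stops if the name is malformed
  existing <- readLines(csv_path, warn = FALSE)
  new_line <- paste(c(mutation, lineage, "NA", "NA", "NA"), collapse = ",")
  writeLines(c(existing, new_line), csv_path)
}
```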

References

We defined the signature mutations of each variant (see "Table_signature_mutations.csv") as substitutions present in >=90% of the sequences assigned to that lineage. We calculated the prevalence of substitutions in thousands of publicly available consensus sequences collected from NCBI during 2020, and added data from CoV-Spectrum for lineages that are under-represented in the database or that emerged during 2021 (Chen et al., 2021). The signature mutation dataset therefore mixes mutation prevalence data from our own NCBI consensus sequence database (for the earlier lineages) and GISAID data obtained from CoV-Spectrum (for more recent lineages such as Omicron). As a result, multiple PANGO versions are involved (v.2.1.7 for the earliest 2020 lineages and v.3.1.20 for recent variants like Omicron). The signature mutation prevalence dataset is provided here as a JSON file named "Database_Missense_and_Nonsense_signature_mutations_prevalence_in_SC2_lineages_consensus_sequences_as_of_2021_01_16_plus_VOCs.json".

Chen, C., Nadeau, S., Yared, M., Voinov, P., Ning, X., Roemer, C. & Stadler, T. "CoV-Spectrum: Analysis of globally shared SARS-CoV-2 data to Identify and Characterize New Variants" Bioinformatics (2021); doi: 10.1093/bioinformatics/btab856.
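The >=90% rule used to define signature mutations can be stated compactly in code. The sketch below is a hypothetical illustration, not the authors' pipeline: it assumes the consensus sequences and their substitutions are already tabulated as two data frames (columns sequence_id/lineage and sequence_id/mutation, names chosen here for illustration) and keeps every substitution found in at least 90% of the sequences of a lineage.

```r
# Hypothetical sketch of the >= 90% signature mutation rule.
# seqs: one row per consensus sequence (sequence_id, lineage).
# muts: one row per substitution observed in a sequence (sequence_id, mutation).
signature_mutations <- function(seqs, muts, threshold = 0.9) {
  seqs <- unique(seqs[, c("sequence_id", "lineage")])
  n_per_lineage <- table(seqs$lineage)
  # Count each substitution at most once per sequence, then attach lineages.
  m <- merge(unique(muts[, c("sequence_id", "mutation")]), seqs, by = "sequence_id")
  counts <- as.data.frame(table(lineage = m$lineage, mutation = m$mutation),
                          stringsAsFactors = FALSE)
  # Prevalence = sequences carrying the mutation / all sequences of the lineage.
  counts$prevalence <- counts$Freq / as.numeric(n_per_lineage[counts$lineage])
  keep <- counts[counts$prevalence >= threshold, c("mutation", "lineage", "prevalence")]
  keep <- keep[order(keep$lineage, keep$mutation), ]
  rownames(keep) <- NULL
  keep
}
```

Dividing by all sequences of the lineage, not only those carrying at least one substitution, is what makes the 90% a share of the lineage as a whole.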
