Covid-seq at OUS, NSC node of NorSeq

SARS-CoV-2 whole-genome sequencing based on a multiplexed amplicon method, using short-read Illumina sequencers

Library prep

Swift Normalase® Amplicon SARS-CoV-2 Panels (SNAP) with Additional Genome Coverage.

150 bp paired-end sequencing on Illumina NovaSeq 6000, with NextSeq, MiSeq and HiSeq as backups, in that order.

Bioinformatics analysis

This is a snapshot of the production code

Execution:

nextflow run main.nf --outpath <Output_folder> --samplelist <SampleList.csv>  --align_tool "bowtie2" -resume

References used for the analysis can be found in the folder util. See below for details on the sample list.

The pipeline uses singularity or docker containers. To use docker, specify: --use_docker. The containers are here: https://github.com/nsc-norway/covid-seq-containers/ . See the file README_script_covid for how to generate the singularity images.
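For scripted runs, the command line above can be assembled programmatically. The following is a minimal Python sketch and not part of the pipeline: the helper name `build_nextflow_command` and the placeholder paths are hypothetical, and only the options mentioned in this README are used.

```python
import shlex


def build_nextflow_command(outpath, samplelist, use_docker=False, resume=True):
    """Assemble the nextflow command line described in this README.

    Hypothetical helper: only the options named in the text are used.
    """
    cmd = [
        "nextflow", "run", "main.nf",
        "--outpath", outpath,
        "--samplelist", samplelist,
        "--align_tool", "bowtie2",
    ]
    if use_docker:
        # The README says to pass --use_docker to run with Docker containers.
        cmd.append("--use_docker")
    if resume:
        cmd.append("-resume")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace with your own output folder and sample list.
    print(shlex.join(build_nextflow_command("results", "SampleList.csv", use_docker=True)))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting.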

In brief:

Primers used in the library prep are trimmed from raw reads using NSCtrim
Low quality reads and adapter sequences are trimmed/removed using fastp
Clean reads are aligned to the genome using bowtie2
Variants and consensus sequences are identified using samtools mpileup and iVar
Secondary analysis is based on Pangolin and Nextclade

Nextflow + Singularity (through Docker) + SLURM, executed on a Linux cluster with 1000+ cores and 5 TB+ of memory

Singularity image build files will be uploaded soon.

Sample list

The sample list is a CSV file with one row per sample. Most of the pipeline only uses three columns -- sample, fastq_1 and fastq_2. The report generator and the check_variants.py script use additional information such as the well position.

See here for an example file.

Column name	Description
`sample`	Sample name, must be unique. See below for pos/neg controls.
`Well`	Well position, for reports & variants file. Must be in format A1, B1, ... H12.
`OrigCtValue`	Ct value for report, or 'NA' if no Ct value is available.
`ProjectName`	Project name, for report and QC plots only.
`SeqRunId`	Run ID, for report.
`SequencerType`	Sequencer type, for report.
`fastq_1`	Path to read 1 fastq file, relative to where nextflow is started.
`fastq_2`	Path to read 2 fastq file, relative to where nextflow is started.
`MIKInputCols`	Optional additional columns to include in the report. Key=Value pairs separated by semicolons.
`ControlName`	Control sample metadata. This has to be the tenth column to be used by report generator, so MIKInputCols is required if this column is used.
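The constraints in the table can be checked before starting a run. The sketch below is an illustration, not the pipeline's own validation: the function names `check_sample_list` and `parse_mik_input_cols` are hypothetical, the file is assumed to be comma-delimited with a header row, and the well check is skipped when the `Well` column is absent.

```python
import csv
import io
import re

REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2")
WELL_PATTERN = re.compile(r"^[A-H](?:[1-9]|1[0-2])$")  # A1 ... H12


def parse_mik_input_cols(value):
    """Split 'Key=Value;Key=Value' into a dict; empty input gives {}."""
    pairs = {}
    for item in value.split(";"):
        if not item.strip():
            continue
        key, _, val = item.partition("=")
        pairs[key.strip()] = val.strip()
    return pairs


def check_sample_list(handle):
    """Return a list of problems found in a sample list CSV (empty if none)."""
    reader = csv.DictReader(handle)
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    seen = set()
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        name = row["sample"]
        if name in seen:
            problems.append(f"row {row_number}: duplicate sample name {name!r}")
        seen.add(name)
        well = row.get("Well")
        if well is not None and not WELL_PATTERN.match(well):
            problems.append(f"row {row_number}: bad well position {well!r}")
    return problems


if __name__ == "__main__":
    # Made-up rows: the second one repeats a sample name and has an invalid well.
    example = io.StringIO(
        "sample,Well,fastq_1,fastq_2\n"
        "S1,A1,S1_R1.fastq.gz,S1_R2.fastq.gz\n"
        "S1,H13,S1b_R1.fastq.gz,S1b_R2.fastq.gz\n"
    )
    for problem in check_sample_list(example):
        print(problem)
```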

If the sample name includes 'neg' / 'NEG', the report generator and QC plots treat the sample as a negative control. Likewise, 'pos' / 'POS' marks a positive control. If the string 'NEG' or 'POS' is found in the ControlName column, that sets the negative / positive control status used for the QC plots.
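The naming rule can be sketched in Python as follows. This illustrates the rule as described here and is not the report generator's code: the function name `control_status` is hypothetical, and the precedence (ControlName before sample name, negative before positive) is an assumption.

```python
def control_status(sample_name, control_name=""):
    """Return 'negative', 'positive' or None for a sample list row.

    Assumptions (not confirmed by the pipeline code): ControlName is checked
    first, and the negative match is checked before the positive one.
    """
    if "NEG" in control_name:
        return "negative"
    if "POS" in control_name:
        return "positive"
    if "neg" in sample_name or "NEG" in sample_name:
        return "negative"
    if "pos" in sample_name or "POS" in sample_name:
        return "positive"
    return None


if __name__ == "__main__":
    # Made-up sample names for illustration.
    for name in ("NEG_ctrl_1", "pos1", "S123"):
        print(name, control_status(name))
```

Because this is plain substring matching, any ordinary sample whose name happens to contain one of these strings would also be classified as a control, so such names are best avoided.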

At least two samples must pass the alignment QC (BOWTIE2_ALIGN) step. If you only have a single sample, you can disable the process NSC4FHI_NOISE_NEXTCLADE in main.nf by commenting out lines 142 to 145 (add //), as shown below. With that process disabled, the pipeline will run successfully with a single sample.

//NSC4FHI_NOISE_NEXTCLADE(
//    NOISE_EXTRACTOR.out.NOISE_SUMMARY_FILES_out.collect(),
//    NEXTCLADE_FOR_FHI.out.NEXTCLADE_FOR_FHI_out
//    )
