Covid-seq at OUS, NSC node of NorSeq
SARS-CoV-2 whole genome sequencing based on a multiplexed amplicon method using short-read Illumina sequencers
Swift Normalase® Amplicon SARS-CoV-2 Panels (SNAP) with Additional Genome Coverage
150 bp paired-end sequencing using Illumina NovaSeq 6000, with NextSeq, MiSeq and HiSeq as backups, in that order
This is a snapshot of the production code
Execution:
nextflow run main.nf --outpath <Output_folder> --samplelist <SampleList.csv> --align_tool "bowtie2" -resume
References used for the analysis can be found in the folder util. See below for details on the sample list.
The pipeline uses Singularity or Docker containers. To use Docker, specify `--use_docker`. The containers are available at https://github.com/nsc-norway/covid-seq-containers/. See the file README_script_covid for how to generate the Singularity images.
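When the pipeline is launched from a wrapper script, the command line shown under Execution can be assembled programmatically. The following is a minimal sketch, not part of the production code: the function name `build_nextflow_command` and its parameters are hypothetical, and only the flags documented above are used.

```python
import shlex


def build_nextflow_command(outpath, samplelist, align_tool="bowtie2",
                           use_docker=False, resume=True):
    """Build the nextflow argument list from the options documented above."""
    cmd = [
        "nextflow", "run", "main.nf",
        "--outpath", outpath,
        "--samplelist", samplelist,
        "--align_tool", align_tool,
    ]
    if use_docker:
        # Switch to Docker containers, as described above.
        cmd.append("--use_docker")
    if resume:
        # Nextflow's own option for reusing cached results.
        cmd.append("-resume")
    return cmd


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(build_nextflow_command("results", "SampleList.csv", use_docker=True)))
```

Returning a list instead of a single string avoids shell quoting problems if the output folder or sample list path contains spaces.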
Primers used in the library prep are trimmed from raw reads using NSCtrim
Low quality reads and adapter sequences are trimmed/removed using fastp
Clean reads are aligned to the genome using bowtie2
Variants and consensus sequences are identified using samtools mpileup and iVar
Secondary analysis is based on Pangolin and Nextclade
Nextflow + Singularity (through Docker) + SLURM, executed on a Linux cluster with 1000+ cores and 5 TB+ memory
Singularity image build files will be uploaded soon.
The sample list is a CSV file with one row per sample. Most of the pipeline only uses three columns: `sample`, `fastq_1` and `fastq_2`.
The report generator and the check_variants.py script use additional information such as the well position.
See here for an example file.
| Column name | Description |
|---|---|
| sample | Sample name, must be unique. See below for pos/neg controls. |
| Well | Well position, for reports & variants file. Must be in format A1, B1, ... H12. |
| OrigCtValue | Ct value for report, or 'NA' if no Ct value is available. |
| ProjectName | Project name, for report and QC plots only. |
| SeqRunId | Run ID, for report. |
| SequencerType | Sequencer type, for report. |
| fastq_1 | Path to read 1 fastq file, relative to where nextflow is started. |
| fastq_2 | Path to read 2 fastq file, relative to where nextflow is started. |
| MIKInputCols | Optional additional columns to include in the report. Key=Value pairs separated by semicolons. |
| ControlName | Control sample metadata. This has to be the tenth column to be used by the report generator, so MIKInputCols is required if this column is used. |
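The constraints in the table can be checked before starting a run. The sketch below is not part of the pipeline; `validate_sample_list` is a hypothetical helper. It assumes a comma-separated file with a header row, and it reads "A1, B1, ... H12" as rows A to H and columns 1 to 12 without zero padding.

```python
import csv
import io
import re

# The three columns used by most of the pipeline.
REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2")
# Assumption: wells are rows A-H and columns 1-12, no zero padding.
WELL_PATTERN = re.compile(r"[A-H](?:[1-9]|1[0-2])")


def validate_sample_list(csv_text):
    """Return a list of human-readable problems found in the sample list."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing required column: {column}"
                for column in REQUIRED_COLUMNS if column not in header]
    # ControlName is only used by the report generator as the tenth column.
    if "ControlName" in header and header.index("ControlName") != 9:
        problems.append("ControlName must be the tenth column")
    seen = set()
    # Data rows start on line 2, after the header.
    for row_number, row in enumerate(reader, start=2):
        name = row.get("sample") or ""
        if not name:
            problems.append(f"row {row_number}: empty sample name")
        elif name in seen:
            problems.append(f"row {row_number}: duplicate sample name {name!r}")
        seen.add(name)
        well = row.get("Well")
        # Only check the well position when the column is present.
        if well is not None and not WELL_PATTERN.fullmatch(well):
            problems.append(f"row {row_number}: invalid well position {well!r}")
    return problems


if __name__ == "__main__":
    with open("SampleList.csv", newline="") as handle:
        for problem in validate_sample_list(handle.read()):
            print(problem)
```

The function takes the file contents as text, so it can be exercised without touching the file system. It does not check that the fastq paths exist.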
If the sample name includes 'neg' / 'NEG', it is treated as a negative control by the report generator and QC plots. Likewise, a sample name that includes 'pos' / 'POS' is treated as a positive control.
If the strings 'NEG' or 'POS' are found in the `ControlName` column, that sets the negative / positive control status used for the QC plots.
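The control rules above can be sketched as a small function. This is an illustration, not the report generator's code: `control_status` is a hypothetical name, and two details are assumptions because the text does not specify them. A marker in `ControlName` is given priority over the sample name, and the negative marker is checked before the positive one.

```python
def control_status(sample_name, control_name=""):
    """Return 'negative', 'positive' or None for one sample list row."""
    # Assumption: a marker in ControlName takes priority over the sample name.
    if "NEG" in control_name:
        return "negative"
    if "POS" in control_name:
        return "positive"
    # Only the two spellings named above are matched (all lower or all upper case).
    if "neg" in sample_name or "NEG" in sample_name:
        return "negative"
    if "pos" in sample_name or "POS" in sample_name:
        return "positive"
    return None


if __name__ == "__main__":
    for name in ("NEG_ctrl_1", "run5-pos", "Sample42"):
        print(name, control_status(name))
```

Because this is plain substring matching, an ordinary sample whose name happens to contain 'pos' or 'neg' would also be classified as a control under this sketch.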
More than one sample must pass the alignment QC (BOWTIE2_ALIGN) step. If you only have a single sample, you can disable the process `NSC4FHI_NOISE_NEXTCLADE` in `main.nf` by commenting out lines 142 to 145 (add //). With that process disabled, the pipeline will run successfully with a single sample.
//NSC4FHI_NOISE_NEXTCLADE(
// NOISE_EXTRACTOR.out.NOISE_SUMMARY_FILES_out.collect(),
// NEXTCLADE_FOR_FHI.out.NEXTCLADE_FOR_FHI_out
// )