covid-seq's Introduction

Covid-seq at OUS, NSC node of NorSeq

SARS-CoV-2 whole-genome sequencing based on a multiplexed amplicon method, using short-read Illumina sequencers.

Library prep

Swift Normalase® Amplicon SARS-CoV-2 Panels (SNAP) with Additional Genome Coverage.

150 bp paired-end sequencing on an Illumina NovaSeq 6000, with NextSeq, MiSeq and HiSeq as backups, in that order.

Bioinformatics analysis

This is a snapshot of the production code.

Execution:

nextflow run main.nf --outpath <Output_folder> --samplelist <SampleList.csv> --align_tool "bowtie2" -resume

References used for the analysis can be found in the folder util. See below for details on the sample list.

The pipeline uses singularity or docker containers. To use docker, specify: --use_docker. The containers are here: https://github.com/nsc-norway/covid-seq-containers/ . See the file README_script_covid for how to generate the singularity images.
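When the pipeline is launched from a script rather than typed by hand, the command line can be assembled programmatically. The following is a minimal sketch that uses only the options named above (--outpath, --samplelist, --align_tool, --use_docker, -resume). The function name build_nextflow_command and the example paths are hypothetical and not part of the repository.

```python
from typing import List


def build_nextflow_command(
    outpath: str,
    samplelist: str,
    align_tool: str = "bowtie2",
    use_docker: bool = False,
    resume: bool = True,
) -> List[str]:
    """Assemble the nextflow invocation as an argument list (hypothetical helper)."""
    cmd = [
        "nextflow", "run", "main.nf",
        "--outpath", outpath,
        "--samplelist", samplelist,
        "--align_tool", align_tool,
    ]
    if use_docker:
        # Pipeline option: run with docker containers instead of singularity.
        cmd.append("--use_docker")
    if resume:
        # Nextflow option (single dash): reuse cached results of earlier runs.
        cmd.append("-resume")
    return cmd


# Example paths are placeholders, not files shipped with the repository.
print(" ".join(build_nextflow_command("results", "SampleList.csv", use_docker=True)))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains main.nf. Passing a list instead of a single string avoids shell quoting problems with paths.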

In brief:

Primers used in the library prep are trimmed from raw reads using NSCtrim
Low quality reads and adapter sequences are trimmed/removed using fastp
Clean reads are aligned to the genome using bowtie2
Variants and consensus sequences are identified using samtools mpileup and iVar
Secondary analysis is based on Pangolin and Nextclade

Production runs use Nextflow + Singularity (through Docker) + SLURM on a Linux cluster with 1000+ cores and 5 TB+ of memory.

Singularity image build files will be uploaded soon.

Sample list

The sample list is a CSV file with one row per sample. Most of the pipeline only uses three columns -- sample, fastq_1 and fastq_2. The report generator and the check_variants.py script use additional information such as the well position.

See here for an example file.

Column name     Description
sample          Sample name, must be unique. See below for pos/neg controls.
Well            Well position, for reports & variants file. Must be in format A1, B1, ... H12.
OrigCtValue     Ct value for report, or 'NA' if no Ct value is available.
ProjectName     Project name, for report and QC plots only.
SeqRunId        Run ID, for report.
SequencerType   Sequencer type, for report.
fastq_1         Path to read 1 fastq file, relative to where nextflow is started.
fastq_2         Path to read 2 fastq file, relative to where nextflow is started.
MIKInputCols    Optional additional columns to include in the report. Key=Value pairs separated by semicolons.
ControlName     Control sample metadata. This has to be the tenth column to be used by the report generator, so MIKInputCols is required if this column is used.
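The constraints in the table can be checked before starting a run. The sketch below is not part of the pipeline. It assumes a comma-separated file whose header uses the column names listed above, and it assumes that a Ct value other than 'NA' is a number that Python's float() accepts. The function name validate_sample_list and all sample values are made up for illustration.

```python
import csv
import io
import re

# Column order as listed in the table; ControlName is the tenth column.
COLUMNS = [
    "sample", "Well", "OrigCtValue", "ProjectName", "SeqRunId",
    "SequencerType", "fastq_1", "fastq_2", "MIKInputCols", "ControlName",
]

# Rows A-H and columns 1-12 of a 96-well plate: A1 ... H12.
WELL_RE = re.compile(r"^[A-H](?:[1-9]|1[0-2])$")


def validate_sample_list(handle):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    seen = set()
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    if "ControlName" in header and header.index("ControlName") != 9:
        problems.append("header: ControlName is not the tenth column")
    # The header is line 1, so data rows start at line 2.
    for lineno, row in enumerate(reader, start=2):
        name = row.get("sample") or ""
        if not name:
            problems.append(f"line {lineno}: missing sample name")
        elif name in seen:
            problems.append(f"line {lineno}: duplicate sample name {name!r}")
        seen.add(name)
        well = row.get("Well") or ""
        if not WELL_RE.match(well):
            problems.append(f"line {lineno}: invalid well {well!r}")
        ct = row.get("OrigCtValue") or ""
        if ct != "NA":
            try:
                float(ct)  # assumption: numeric Ct values use a decimal point
            except ValueError:
                problems.append(f"line {lineno}: invalid Ct value {ct!r}")
    return problems


# Made-up rows: the second one repeats the sample name and uses a bad well.
example = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "S1,A1,23.5,ProjA,Run1,NovaSeq,fastq/S1_R1.fastq.gz,fastq/S1_R2.fastq.gz,,\n"
    "S1,H13,NA,ProjA,Run1,NovaSeq,fastq/S2_R1.fastq.gz,fastq/S2_R2.fastq.gz,,\n"
)
for problem in validate_sample_list(example):
    print(problem)
```

To check a real file, open it with open(path, newline="") and pass the file object to validate_sample_list.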

If the sample name contains 'neg' or 'NEG', the report generator and QC plots treat the sample as a negative control. Likewise, 'pos' or 'POS' in the sample name marks a positive control. If the string 'NEG' or 'POS' is found in the ControlName column, that sets the negative / positive control status used for the QC plots.
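The control rules can be illustrated with a small function. This is a sketch of the rules as described here, not the code used by the report generator or the QC plots. It assumes plain substring matching, that only the all-lowercase and all-uppercase spellings count, and that ControlName takes precedence over the sample name. None of these details is confirmed by the README, and the name control_status is hypothetical.

```python
from typing import Optional


def control_status(sample_name: str, control_name: str = "") -> Optional[str]:
    """Return 'negative', 'positive' or None for a sample (illustrative only)."""
    # Assumption: the ControlName column is checked first and only in uppercase.
    if "NEG" in control_name:
        return "negative"
    if "POS" in control_name:
        return "positive"
    # Sample names: 'neg' / 'NEG' and 'pos' / 'POS' as substrings.
    if "neg" in sample_name or "NEG" in sample_name:
        return "negative"
    if "pos" in sample_name or "POS" in sample_name:
        return "positive"
    return None


# Made-up sample names.
for name in ["NEG_ctrl_1", "run5_pos", "Patient_0042"]:
    print(name, control_status(name))
```

If the matching really is by substring, an ordinary sample whose name happens to contain 'pos' or 'neg' would be classed as a control, so it is worth keeping those letter sequences out of regular sample names.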

At least two samples must pass the alignment QC (BOWTIE2_ALIGN) step. If you only have a single sample, you can disable the process NSC4FHI_NOISE_NEXTCLADE in main.nf by commenting out lines 142 to 145 (prefix each line with //). With that process disabled, the pipeline runs successfully with a single sample.

//NSC4FHI_NOISE_NEXTCLADE(
//    NOISE_EXTRACTOR.out.NOISE_SUMMARY_FILES_out.collect(),
//    NEXTCLADE_FOR_FHI.out.NEXTCLADE_FOR_FHI_out
//    )
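For scripted single-sample runs, the same edit can be applied to a copy of main.nf. The sketch below is a hypothetical helper, not part of the repository. It assumes that the NSC4FHI_NOISE_NEXTCLADE call still occupies lines 142 to 145 in your copy, which should be verified first because this is a snapshot and line numbers may shift.

```python
def comment_out_lines(text: str, first: int, last: int, prefix: str = "//") -> str:
    """Prefix lines first..last (1-based, inclusive) with a comment marker."""
    lines = text.splitlines(keepends=True)
    for i in range(first - 1, min(last, len(lines))):
        # Skip lines that are already commented so the edit can be repeated safely.
        if not lines[i].lstrip().startswith(prefix):
            lines[i] = prefix + lines[i]
    return "".join(lines)


# Small stand-in text; a real run would read the contents of main.nf instead.
demo = "keep\nNSC4FHI_NOISE_NEXTCLADE(\n    args\n    )\nkeep\n"
print(comment_out_lines(demo, 2, 4))
```

For the real file, read it with pathlib.Path("main.nf").read_text(), call comment_out_lines(text, 142, 145), and write the result to a separate copy so the original stays untouched.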

covid-seq's Issues

Container images

Would it be possible for you to provide reference to the docker container images used for generating your singularity images?

samplesheet format

Would it be possible for you to provide an example or a format description of the sampleListFile?

An example with header and one sample line would suffice :)
