quality_control's Introduction

QC pipeline

Run this pipeline to perform initial quality control on new sequencing lanes.

To Run

Download the reference genome.

This can be done by running

wget ftp://ftp.ensemblgenomes.org/pub/plants/release-27/fasta/zea_mays/dna//Zea_mays.AGPv3.27.dna.genome.fa.gz

on most Unix systems. On OS X, which does not ship with wget, you may need to use curl instead.
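A minimal bash sketch of the download step, assuming only that wget or curl is on the PATH: it uses wget when available and otherwise falls back to curl. The helper names downloader and fetch are hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: report which download tool is available.
downloader() {
    if command -v wget >/dev/null 2>&1; then
        echo wget
    elif command -v curl >/dev/null 2>&1; then
        echo curl
    else
        echo "error: neither wget nor curl found" >&2
        return 1
    fi
}

# Hypothetical helper: download a URL into the current directory,
# keeping the remote file name.
fetch() {
    local url tool
    url=$1
    tool=$(downloader) || return 1
    if [ "$tool" = wget ]; then
        wget "$url"
    else
        # -L follows redirects, -O saves under the remote file name
        curl -L -O "$url"
    fi
}
```

Usage: fetch ftp://ftp.ensemblgenomes.org/pub/plants/release-27/fasta/zea_mays/dna//Zea_mays.AGPv3.27.dna.genome.fa.gz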

Move the reference into the data directory and rename it to ref.fasta.gz (keep the reference gzipped). Then run:

sbatch -p <queue> ref.sh

This will index the reference for you.
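The move-and-rename step can be sketched as follows. This is a hedged example, not part of the repository: the function stage_reference is hypothetical, and it assumes the data directory path is passed as the second argument (defaulting to "data"). It checks the download with gzip -t before moving it, so a truncated transfer is caught before ref.sh runs.

```shell
# Hypothetical helper: verify a gzipped reference and stage it as
# <datadir>/ref.fasta.gz, the name the pipeline expects.
stage_reference() {
    local src datadir
    src=$1
    datadir=${2:-data}
    # gzip -t tests archive integrity without writing decompressed output
    if ! gzip -t "$src" 2>/dev/null; then
        echo "error: $src is not a valid gzip file" >&2
        return 1
    fi
    mkdir -p "$datadir"
    # Only the name changes; the file stays gzipped.
    mv "$src" "$datadir/ref.fasta.gz"
}
```

Usage: stage_reference Zea_mays.AGPv3.27.dna.genome.fa.gz data, then submit ref.sh with sbatch as above.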

Quality Control of fastq

To run quality control, use:

while read -r r1 r2; do sbatch -p <queue> qc.sh "$r1" "$r2"; done < list.txt

where list.txt contains the full paths to read 1 and read 2 of a lane on a single line, separated by whitespace. To run multiple lanes simultaneously, put one lane per line; the loop submits one job per line.
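One way to build list.txt is sketched below. The function make_list is hypothetical, and it assumes mates are named with Illumina-style "_R1"/"_R2" tokens (for example lane1_R1.fastq.gz and lane1_R2.fastq.gz); adjust the pattern to your file names. Because list.txt is whitespace-separated, the paths must not contain spaces.

```shell
# Hypothetical helper: print "read1 read2" lines with absolute paths for
# every *_R1*.fastq.gz file in a directory that has a matching _R2 mate.
make_list() {
    local dir r1 r2 base prefix suffix
    dir=$(cd "$1" && pwd) || return 1   # list.txt needs full paths
    for r1 in "$dir"/*_R1*.fastq.gz; do
        [ -e "$r1" ] || continue        # glob matched nothing
        base=${r1##*/}                  # file name only, not the directory
        prefix=${base%%_R1*}            # text before the first _R1
        suffix=${base#*_R1}             # text after the first _R1
        r2=$dir/${prefix}_R2${suffix}
        if [ -e "$r2" ]; then
            printf '%s %s\n' "$r1" "$r2"
        else
            echo "warning: no mate found for $r1" >&2
        fi
    done
}
```

Usage: make_list /path/to/fastq_dir > list.txt, then run the sbatch loop above.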

Ideally, the input should be gzipped FASTQ files.
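If some lanes arrive uncompressed, a small bash sketch like the one below can gzip them first. The function compress_fastq is hypothetical and assumes uncompressed files end in .fastq or .fq.

```shell
# Hypothetical helper: gzip every uncompressed FASTQ file in a directory.
compress_fastq() {
    local f
    for f in "$1"/*.fastq "$1"/*.fq; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        gzip "$f"                 # replaces f with f.gz
    done
}
```

Usage: compress_fastq /path/to/fastq_dir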

The script creates, in the results directory, one summary directory named after each input file, each containing an HTML report called fastqc_report.html. The initial quality report also lists over-represented sequences, which should include adapter sequences if there is considerable adapter contamination.
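To review many lanes at once, a sketch like the following lists every fastqc_report.html under the results directory. It assumes, without confirmation from this README, that each report directory also holds a FastQC-style summary.txt (tab-separated STATUS, module, file lines); where that file exists, it prints the modules marked FAIL, such as "Overrepresented sequences". The function report_overview is hypothetical.

```shell
# Hypothetical helper: list reports and, if a summary.txt sits next to a
# report, print each module whose status is FAIL.
report_overview() {
    find "$1" -name fastqc_report.html | sort | while read -r report; do
        dir=$(dirname "$report")
        echo "$report"
        if [ -f "$dir/summary.txt" ]; then
            # column 1 is the status, column 2 the module name
            awk -F'\t' '$1 == "FAIL" { print "  FAIL: " $2 }' "$dir/summary.txt"
        fi
    done
}
```

Usage: report_overview results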

To test

Run the pipeline on test_r1.fastq.gz and test_r2.fastq.gz.
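A hedged sketch of preparing the test run: the hypothetical function make_test_list checks that both test files exist in a given directory (assumed here to be wherever the test files live, defaulting to the current directory) and prints a one-line list in the same format as list.txt.

```shell
# Hypothetical helper: emit a list line for the two test files.
make_test_list() {
    local dir f
    dir=$(cd "${1:-.}" && pwd) || return 1
    for f in test_r1.fastq.gz test_r2.fastq.gz; do
        if [ ! -f "$dir/$f" ]; then
            echo "error: $dir/$f not found" >&2
            return 1
        fi
    done
    printf '%s %s\n' "$dir/test_r1.fastq.gz" "$dir/test_r2.fastq.gz"
}
```

Usage: make_test_list > test_list.txt, then run the same sbatch loop with test_list.txt in place of list.txt.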

rossibarra / quality_control