fastqc's Introduction

JOSS (journal of open source software) DOI

Python 3.8 | 3.9 | 3.10

This is the fastqc pipeline from the Sequana project.

Overview

Runs fastqc and multiqc on a set of sequencing data to produce quality control reports

Input

A set of FastQ files (paired or single-end) compressed or not

Output

An HTML file summary.html (individual fastqc reports, multi-sample report)

Status

Production

Wiki

https://github.com/sequana/fastqc/wiki

Documentation

This README file, the Wiki from the github repository (link above) and https://sequana.readthedocs.io

Citation

Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352

Installation

sequana_fastqc is based on Python 3. Install the package as follows:

pip install sequana_fastqc --upgrade

You will need third-party software such as fastqc. Please see below for details.

Usage

If you have a set of FastQ files in a data/ directory, type:

sequana_fastqc --input-directory data

To learn more about the options (e.g., adding a pattern to restrict execution to a subset of the input files, or changing the output/working directory), type:

sequana_fastqc --help

The call to sequana_fastqc creates a directory named fastqc. Go to this working directory and execute the pipeline as follows:

cd fastqc
sh fastqc.sh  # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can retrieve the fastqc.rules and config.yaml files and then execute the pipeline yourself with specific parameters:

snakemake -s fastqc.rules --cores 4 --stats stats.txt

Alternatively, use the Sequanix interface.

Please see the Wiki for more examples and features.

Tutorial

You can retrieve test data from sequana_fastqc (https://github.com/sequana/fastqc) or type:

wget https://raw.githubusercontent.com/sequana/fastqc/master/sequana_pipelines/fastqc/data/data_R1_001.fastq.gz
wget https://raw.githubusercontent.com/sequana/fastqc/master/sequana_pipelines/fastqc/data/data_R2_001.fastq.gz

Then prepare and run the pipeline:

sequana_fastqc --input-directory .
cd fastqc
sh fastqc.sh

# once done, remove temporary files (snakemake and others)
make clean

Open the HTML entry point, summary.html. A multiqc report is also available. The reports include images such as the following one:

image

Please see the Wiki for more examples and features.

Requirements

This pipeline requires the following executable(s):

  • fastqc
  • falco (optional)
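Before launching the pipeline, it can help to confirm that these executables are visible on your PATH. The snippet below is a minimal sketch using only the Python standard library; the helper names `find_executables` and `missing` are hypothetical and are not part of sequana_fastqc.

```python
import shutil

# Executables named in the list above; falco is optional.
REQUIRED = ["fastqc"]
OPTIONAL = ["falco"]


def find_executables(names):
    """Map each executable name to its full path, or None if not on the PATH."""
    return {name: shutil.which(name) for name in names}


def missing(names):
    """Return the names that cannot be found on the PATH."""
    return [name for name, path in find_executables(names).items() if path is None]


if __name__ == "__main__":
    for name in missing(REQUIRED):
        print(f"missing required executable: {name}")
    for name in missing(OPTIONAL):
        print(f"optional executable not found: {name}")
```

This check only covers a local installation. It is not needed when the tools are provided through containers.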

For Linux users, we provide apptainer/singularity images available through the damona project (https://damona.readthedocs.io).

To make use of them, initialise the pipeline with the --use-apptainer option. The images should then be downloaded automatically for you, which also helps reproducibility:

sequana_fastqc --input-directory data --use-apptainer --apptainer-prefix ~/images

Details

This pipeline runs fastqc in parallel on the input FastQ files (paired or not) and then executes multiqc. A brief sequana summary report is also produced. You may use falco instead of fastqc. This is experimental but seems to work for Illumina/FastQ files.
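To illustrate what "paired or not" means in practice, here is a minimal sketch of grouping FastQ file names into samples. It assumes that mates differ only by an `_R1_` / `_R2_` tag, as in the tutorial file names; the pattern and the helper names `group_by_sample` and `is_paired` are illustrative assumptions, not the pipeline's actual implementation.

```python
import re
from collections import defaultdict

# Assumed naming convention: paired files differ only by an _R1_ / _R2_ tag.
READ_TAG = re.compile(r"_R([12])_")


def group_by_sample(filenames):
    """Group FastQ file names by sample, using the read tag to find mates."""
    samples = defaultdict(list)
    for name in sorted(filenames):
        match = READ_TAG.search(name)
        # The sample name is whatever precedes the tag; untagged files are
        # treated as single-end and named after the part before the first dot.
        sample = name[: match.start()] if match else name.split(".")[0]
        samples[sample].append(name)
    return dict(samples)


def is_paired(files):
    """A sample is paired when it has exactly one R1 and one R2 file."""
    tags = sorted(m.group(1) for m in map(READ_TAG.search, files) if m)
    return tags == ["1", "2"]


if __name__ == "__main__":
    names = ["data_R1_001.fastq.gz", "data_R2_001.fastq.gz", "solo.fastq.gz"]
    for sample, files in group_by_sample(names).items():
        print(sample, "paired" if is_paired(files) else "single-end", files)
```

In the paired case the number of samples is half the number of files, which is why a file count alone does not give the sample count.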

This pipeline has been tested on several hundred MiSeq, NextSeq, MiniSeq, iSeq 100, and PacBio runs.

It produces an md5sum of your data, copes with empty samples, and produces ready-to-use HTML reports, among other features.
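The two checks mentioned above can be sketched with the Python standard library. This is an illustration only, not the pipeline's code: the helper names `md5sum` and `count_reads` are hypothetical, and the read count assumes the common four-lines-per-record FastQ layout.

```python
import gzip
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_reads(path):
    """Count reads in a FastQ file (4 lines per record), gzipped or not."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

A sample for which `count_reads` returns 0 is an empty sample. The digest is computed on the file as stored, so a compressed file and its uncompressed version give different checksums.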

Rules and configuration details

Here is the latest documented configuration file to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog

1.8.2

  • Fix the onerror typo in the pipeline, fix CI

1.8.1

  • Update __init__ (version)

1.8.0

  • Uses pyproject instead of setuptools
  • Uses click instead of argparse and newest sequana_pipetools (0.16.0)

1.7.1

  • Set wrapper version in the config based on new sequana_pipetools feature

1.7.0

  • Use new rulegraph wrapper and new graphviz apptainer

1.6.2

  • Slight refactoring to use rulegraph wrapper

1.6.1

  • Pin sequana version to 1.4.4 to force usage of new fastqc module to fix falco. Updated config documentation.

1.6.0

  • Fixed falco output error and use singularity containers

1.5.0

  • Removed modules completely

1.4.2

  • Simplified pipeline (suppress setup and use existing wrapper)

1.4.1

  • Simplified pipeline with wrappers/rules

1.4.0

  • This version uses sequana 0.12.0 and the new sequana-wrappers mechanism. Functionality is unchanged. Also based on sequana_pipetools 0.6.X

1.3.0

  • Add option --skip-multiqc (in case of memory trouble)
  • Fix typo in the link towards fastqc reports in the summary.html table
  • Fix number of samples in the paired case (divide by 2)

1.2.0

  • Compatibility with Sequanix
  • Fix pipeline to cope with new snakemake API

1.1.0

  • Add new rule to allow users to choose falco software instead of fastqc. Note that falco is 4 times faster but still a work in progress (version 0.1 as of Nov 2020).
  • Allows the pipeline to process PacBio files (in fact any files accepted by fastqc, i.e. SAM and BAM files)
  • More doc, test and info on the wiki

1.0.1

  • Add md5sum of input files as md5.txt file

1.0.0

  • A stable version. Added a wiki on github as well as a singularity recipe

0.9.15

  • For the HTML reports, takes into account samples with zero reads

0.9.14

  • Round up some statistics in the main table

0.9.13

  • Improve the summary HTML report

0.9.12

  • Implemented new --from-project option

0.9.11

  • Now depends on sequana_pipetools instead of sequana.pipelines to speed up --help calls
  • New summary.html report created with pipeline summary
  • New rule (plotting)

0.9.10

  • Simplify the onsuccess section

0.9.9

  • Add missing png and pipeline (regression bug)

0.9.8

  • Add missing multi_config file

0.9.7

  • Check existence of input directory in main.py
  • Add a logo
  • Fix schema
  • Add multiqc_config
  • Add sequana + sequana_fastqc version

0.9.6

  • Add the readtag option

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
