BABS-MNASeqPE

Introduction

A Nextflow pipeline for processing paired-end Illumina MNase-seq data.

The pipeline was written by The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.

Pipeline summary

  1. Raw read QC (FastQC, FastQ Screen)
  2. Adapter trimming (cutadapt)
  3. Alignment (BWA)
  4. Mark duplicates (picard)
  5. Filtering to remove:
    • reads that are marked as duplicates (SAMtools)
    • reads that aren't marked as primary alignments (SAMtools)
    • reads that are unmapped (SAMtools)
    • reads that map to multiple locations (SAMtools)
    • reads containing > 3 mismatches in either read of the pair (BAMTools)
    • reads that have a user-defined insert size (BAMTools)
    • reads that are soft-clipped (BAMTools)
    • reads that map to different chromosomes (Pysam)
    • reads that aren't in FR orientation (Pysam)
    • reads where only one read of the pair fails the above criteria (Pysam)
  6. Merge alignments at replicate-level (picard)
    • Re-mark duplicates (picard)
    • Remove duplicate reads (optional; SAMtools)
    • Create normalised bigWig files scaled to 1 million mapped read pairs (BEDTools, wigToBigWig)
  7. Call nucleosome positions and generate smoothed, normalised coverage wig files that can be used to generate occupancy profile plots between samples across features of interest (DANPOS2)
  8. Create IGV session file containing bigWig tracks for data visualisation (IGV)
  9. Collect and present QC at the raw read and alignment-level (MultiQC)
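
The pair-level filters in step 5 and the bigWig scaling in step 6 can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's own code: the `Read` class, `is_fr_pair`, `keep_complete_pairs` and `scale_factor` are hypothetical names, the FR test is a simplified one (the forward mate starts at or before the reverse mate), and plain Python objects stand in for the Pysam records the pipeline actually works with.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for one aligned read (not a Pysam object)."""
    name: str         # read-pair name shared by both mates
    chrom: str        # reference sequence the read aligned to
    start: int        # 0-based leftmost aligned position
    is_reverse: bool  # True if aligned to the reverse strand


def is_fr_pair(r1: Read, r2: Read) -> bool:
    """Return True if both mates are on one chromosome in FR orientation.

    Simplified assumption: FR means one forward and one reverse mate,
    with the forward mate starting at or before the reverse mate.
    """
    if r1.chrom != r2.chrom:
        return False
    if r1.is_reverse == r2.is_reverse:
        return False
    fwd, rev = (r2, r1) if r1.is_reverse else (r1, r2)
    return fwd.start <= rev.start


def keep_complete_pairs(reads: Iterable[Read]) -> List[Read]:
    """Keep only reads whose mate is also present and forms an FR pair.

    A read whose mate was removed by an earlier filter is an orphan
    and is dropped, so both mates are always kept or removed together.
    """
    by_name: Dict[str, List[Read]] = {}
    for read in reads:
        by_name.setdefault(read.name, []).append(read)

    kept: List[Read] = []
    for mates in by_name.values():
        if len(mates) == 2 and is_fr_pair(mates[0], mates[1]):
            kept.extend(mates)
    return kept


def scale_factor(mapped_read_pairs: int) -> float:
    """Factor that scales coverage to 1 million mapped read pairs."""
    if mapped_read_pairs <= 0:
        raise ValueError("mapped_read_pairs must be positive")
    return 1_000_000 / mapped_read_pairs
```

Under these assumptions, a library with 2,000,000 mapped read pairs would have its coverage multiplied by 0.5 before conversion to bigWig.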

Documentation

The documentation for the pipeline can be found in the docs/ directory:

  1. Installation
  2. Pipeline configuration
  3. Reference genome
  4. Design file
  5. Running the pipeline
  6. Output and interpretation of results
  7. Troubleshooting

Pipeline DAG

BABS-MNASeqPE directed acyclic graph

Credits

The pipeline was written by The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.

The pipeline was developed by Harshil Patel.

The NGI-RNAseq pipeline developed by Phil Ewels was used as a template for this pipeline. Many thanks to Phil and the team at SciLifeLab.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
