mosaicatcher-pipeline's Introduction

Snakemake pipeline for structural variant (SV) calling from single-cell Strand-seq data.

Overview of this workflow

This workflow uses Snakemake to execute all steps of MosaiCatcher in order. The starting point is a set of single-cell BAM files from Strand-seq experiments, and the final output is a set of SV predictions in tabular format together with a graphical representation. To get there, the workflow goes through the following steps:

  1. Binning of sequencing reads in genomic windows of 200 kb via mosaic
  2. Strand state detection
  3. [Optional] Normalization of coverage with respect to a reference sample
  4. Multi-variate segmentation of cells (mosaic)
  5. Haplotype resolution via StrandPhaseR
  6. Bayesian classification of segmentation to find SVs using MosaiClassifier
  7. Visualization of results using custom R plots
[Figure: MosaiCatcher Snakemake pipeline and visualisation examples]
[Figure: ashleys-qc-pipeline]
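
Step 1 of the overview (binning reads into 200 kb windows) can be illustrated with a minimal Python sketch. This is not the mosaic implementation: the input format (tuples of chromosome, position and strand) and the function name `bin_reads` are assumptions made for the example. Strand-seq distinguishes Watson ("W") and Crick ("C") reads, so the sketch counts them separately per window.

```python
from collections import Counter

BIN_SIZE = 200_000  # 200 kb windows, as in step 1 of the overview


def bin_reads(reads, bin_size=BIN_SIZE):
    """Count reads per (chromosome, window start, strand).

    `reads` is an iterable of (chrom, position, strand) tuples, where strand
    is "W" (Watson) or "C" (Crick). This input format is hypothetical.
    """
    counts = Counter()
    for chrom, pos, strand in reads:
        # Integer division maps a position to the start of its window.
        window_start = (pos // bin_size) * bin_size
        counts[(chrom, window_start, strand)] += 1
    return counts


# Toy input: four reads on chr17 falling into three different windows.
reads = [
    ("chr17", 150_000, "W"),
    ("chr17", 199_999, "C"),
    ("chr17", 200_000, "W"),
    ("chr17", 410_500, "W"),
]
counts = bin_reads(reads)
```

In the real pipeline the reads come from the single-cell BAM files; the toy list above only stands in for them.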

📘 Documentation

📆 Roadmap

Technical-related features

  • Zenodo automatic download of external files + indexes (1.2.1)
  • Multiple samples in the parent folder (1.2.2)
  • Automatic testing of BAM SM tag compared to sample folder name (1.2.3)
  • On-error/success e-mail (1.3)
  • HPC execution (slurm profile for the moment) (1.3)
  • Full singularity image with preinstalled conda envs (1.5.1)
  • Single BAM folder with side config file (1.6.1)
  • (EMBL) GeneCore mode of execution: allow selection and execution directly by specifying genecore run folder (2022-11-02-H372MAFX5 for instance) (1.8.2)
  • Version synchronisation between ashleys-qc-pipeline and mosaicatcher-pipeline (1.8.3)
  • Report captions update (1.8.5)
  • Clustering plot (heatmap) & SV calls plot update (1.8.6)
  • ashleys_pipeline_only parameter: when set, mosaicatcher-pipeline triggers ashleys-qc-pipeline only and stops after generating the counts, ashleys predictions & plots, so that the user can manually review and select the cells to be processed (2.2.0)
  • Plotting options (enable/disable segmentation back colors)
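
The check of the BAM SM tag against the sample folder name (1.2.3) can be sketched as follows. The roadmap mentions a pysam-based script for this; the version below is only an illustration that parses the `@RG` lines of a SAM header given as plain text, so that it runs without extra dependencies. The function names and the example header are hypothetical.

```python
def sm_tags(sam_header_text):
    """Return the set of SM (sample) values found in @RG header lines."""
    tags = set()
    for line in sam_header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                tags.add(field[3:])
    return tags


def sm_matches_folder(sam_header_text, sample_folder_name):
    """True when the header declares exactly one SM tag equal to the folder name."""
    return sm_tags(sam_header_text) == {sample_folder_name}


# Hypothetical header of one cell belonging to sample folder "SAMPLE_A".
header = "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:cell01\tSM:SAMPLE_A\tPL:ILLUMINA\n"
ok = sm_matches_folder(header, "SAMPLE_A")
bad = sm_matches_folder(header, "SAMPLE_B")
```

Requiring exactly one SM value is a design choice of this sketch; a header with several read groups naming different samples would be reported as a mismatch.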

Bioinformatic-related features

  • Self-handling of low-coverage cells (1.6.1)
  • Upstream ashleys-qc-pipeline and FASTQ handling (1.6.1)
  • Change of reference genome (currently only GRCh38) (1.7.0)
  • Ploidy detection at the segment and the chromosome level: used to bypass StrandPhaseR if more than half of a chromosome is haploid (1.7.0)
  • input_bam_legacy mode (bam/selected folders) (1.8.4)
  • Blacklist regions files for T2T & hg19 (1.8.5)
  • ArbiGent integration: Strand-Seq based genotyper to study SVs containing at least 500 bp of uniquely mappable sequence (1.9.0)
  • scNOVA integration: Strand-Seq Single-Cell Nucleosome Occupancy and genetic Variation Analysis (1.9.2)
  • multistep_normalisation and multistep_normalisation_for_SV_calling parameters to replace GC analysis module (library size normalisation, GC correction, Variance Stabilising Transformation) (2.1.1)
  • Strand-Seq processing based on mm10 assembly (2.1.2)
  • UCSC ready to use file generation including counts & SV calls (2.1.2)
  • blacklist_regions parameter (2.2.0)
  • IGV ready to use XML session generation (2.2.2)
  • Pooled samples
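
The ploidy-based bypass listed above (skip StrandPhaseR if more than half of a chromosome is haploid) amounts to a simple length-weighted rule. The sketch below assumes segments are given as (start, end, ploidy) tuples for one chromosome; this representation and the function names are hypothetical and do not reflect the pipeline's actual data structures.

```python
def haploid_fraction(segments):
    """Fraction of the covered length whose estimated ploidy is 1.

    `segments` is a list of (start, end, ploidy) tuples for one chromosome.
    """
    total = sum(end - start for start, end, _ in segments)
    if total == 0:
        return 0.0
    haploid = sum(end - start for start, end, ploidy in segments if ploidy == 1)
    return haploid / total


def should_bypass_phasing(segments, threshold=0.5):
    """True when more than `threshold` of the chromosome is haploid."""
    return haploid_fraction(segments) > threshold


# Toy chromosomes of 80 Mb: one is 75% haploid, the other 25% haploid.
mostly_haploid = [(0, 60_000_000, 1), (60_000_000, 80_000_000, 2)]
mostly_diploid = [(0, 20_000_000, 1), (20_000_000, 80_000_000, 2)]
```

With these toy inputs, `should_bypass_phasing(mostly_haploid)` is true and `should_bypass_phasing(mostly_diploid)` is false.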

Small issues to fix

  • Replace input_bam_location with data_location (harmonization with ashleys-qc-pipeline)
  • List of commands available through list_commands parameter (1.8.6)
  • Move pysam / SM tag comparison script to snakemake rule (2.2.0)

🛑 Troubleshooting & Current limitations

  • Do not change the structure of your input folder after running the pipeline: the first execution builds a config dataframe file (OUTPUT_DIRECTORY/config/config.tsv) that contains the list of cells and their associated paths
  • Do not change the list of chromosomes after a first execution (e.g. a first execution on chr17 followed by a second execution on all chromosomes)
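
To see why changing the input folder after a first run is a problem, consider a table of cells and paths that is recorded once and then compared against the folder on later runs. The sketch below is an illustration only: the column names (`cell`, `path`), the folder layout and the helper functions are assumptions, not the pipeline's actual config.tsv format.

```python
import csv
import tempfile
from pathlib import Path


def list_cells(sample_dir):
    """Return sorted (cell name, path) pairs for every .bam file under sample_dir."""
    return sorted((p.stem, str(p)) for p in Path(sample_dir).rglob("*.bam"))


def write_table(rows, tsv_path):
    """Write the cell table as TSV (hypothetical column names)."""
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["cell", "path"])
        writer.writerows(rows)


def folder_changed(sample_dir, tsv_path):
    """True when the folder content no longer matches the recorded table."""
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader)  # skip the header row
        recorded = [tuple(row) for row in reader]
    return recorded != list_cells(sample_dir)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "SAMPLE_A"
    sample.mkdir()
    (sample / "cell01.bam").touch()
    table = Path(tmp) / "config.tsv"
    write_table(list_cells(sample), table)       # "first execution"
    unchanged = folder_changed(sample, table)    # folder still matches
    (sample / "cell02.bam").touch()              # folder modified afterwards
    changed = folder_changed(sample, table)      # table is now stale
```

A stale table of this kind would point later steps at an outdated list of cells, which is the situation the first limitation warns against.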

๐Ÿ’‚โ€โ™‚๏ธ Authors (alphabetical order)

  • Ashraf Hufsah
  • Cosenza Marco
  • Ebert Peter
  • Ghareghani Maryam
  • Grimes Karen
  • Gros Christina
  • Höps Wolfram
  • Jeong Hyobin
  • Kinanen Venla
  • Korbel Jan
  • Marschall Tobias
  • Meiers Sasha
  • Porubsky David
  • Rausch Tobias
  • Sanders Ashley
  • Van Vliet Alex
  • Weber Thomas (maintainer and current developer)

📕 References

Strand-seq publication: Falconer, E., Hills, M., Naumann, U. et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat Methods 9, 1107–1112 (2012). https://doi.org/10.1038/nmeth.2206

scTRIP/MosaiCatcher original publication: Sanders, A.D., Meiers, S., Ghareghani, M. et al. Single-cell analysis of structural variations and complex rearrangements with tri-channel processing. Nat Biotechnol 38, 343–354 (2020). https://doi.org/10.1038/s41587-019-0366-x

ArbiGent publication: Porubsky, David, Wolfram Höps, Hufsah Ashraf, PingHsun Hsieh, Bernardo Rodriguez-Martin, Feyza Yilmaz, Jana Ebler, et al. 2022. “Recurrent Inversion Polymorphisms in Humans Associate with Genetic Instability and Genomic Disorders.” Cell 185 (11): 1986-2005.e26. https://doi.org/10.1016/j.cell.2022.04.017.

scNOVA publication: Jeong, Hyobin, Karen Grimes, Kerstin K. Rauwolf, Peter-Martin Bruch, Tobias Rausch, Patrick Hasenfeld, Eva Benito, et al. 2022. “Functional Analysis of Structural Variants in Single Cells Using Strand-Seq.” Nature Biotechnology, November, 1–13. https://doi.org/10.1038/s41587-022-01551-4.
