snp-dmx-cancer

This repository contains reproducible code for the Snakemake workflow, benchmark evaluations, and supplementary analyses in our paper, which evaluates genetic variation-based demultiplexing tools (in particular Vireo) for pooled single-cell RNA sequencing samples in two cancer types: high-grade serous ovarian cancer (HGSOC) and lung adenocarcinoma.

Paper

For details on the analyses, see our paper: Weber et al. (2021), "Genetic demultiplexing of pooled single-cell RNA-sequencing samples in cancer facilitates effective experimental design", bioRxiv

Contents

Workflow

The Snakemake workflow implements a complete pipeline for one dataset and doublet simulation scenario (HGSOC dataset, 20% doublets), using the best-performing combination of tools from the benchmark: genotyping from bulk RNA-seq samples with bcftools, followed by demultiplexing with cellSNP/Vireo. The workflow is modular and can be adapted to substitute alternative tools.
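To illustrate what a doublet simulation scenario can look like, here is a minimal Python sketch that builds a barcode lookup table in which a chosen fraction of cell barcodes is reassigned to other barcodes, so that reads from two cells end up under a single barcode. This is an illustration only, not the code used in the workflow: the function name `simulate_doublet_lookup`, the barcode format, and the way the 20% figure is interpreted (as the fraction of original barcodes merged into another barcode) are all assumptions made for this example.

```python
import random


def simulate_doublet_lookup(barcodes, doublet_fraction, seed=0):
    """Return a dict mapping each original barcode to its (possibly new) barcode.

    Hypothetical helper: a fraction `doublet_fraction` of the barcodes is
    selected at random, and each selected barcode is reassigned to a distinct,
    non-selected barcode. Reads relabelled with this lookup would then contain
    simulated doublets (two cells sharing one barcode).
    """
    barcodes = list(barcodes)
    n_merge = int(round(doublet_fraction * len(barcodes)))
    if 2 * n_merge > len(barcodes):
        raise ValueError("doublet_fraction too large for one-to-one merging")

    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = barcodes[:]
    rng.shuffle(shuffled)

    merged_away = shuffled[:n_merge]            # barcodes that disappear
    recipients = shuffled[n_merge:2 * n_merge]  # barcodes that become doublets

    lookup = {bc: bc for bc in barcodes}        # default: barcode unchanged
    for old, new in zip(merged_away, recipients):
        lookup[old] = new
    return lookup


# Example with 100 made-up barcodes and the 20% scenario
example_barcodes = [f"BC{i:03d}" for i in range(100)]
example_lookup = simulate_doublet_lookup(example_barcodes, 0.2)
n_reassigned = sum(1 for old, new in example_lookup.items() if old != new)
n_remaining = len(set(example_lookup.values()))
```

Under this assumed definition, 20 of the 100 barcodes are reassigned and 80 distinct barcodes remain, of which 20 represent simulated doublets. Other definitions of the doublet percentage are possible, so check the workflow scripts and the paper for the one actually used.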

Scripts for the Snakemake workflow are saved in workflow/:

Benchmark evaluations

Scripts for the benchmark evaluations are saved in benchmarking/:

Supplementary analyses

Scripts for the supplementary analyses are saved in the following directories:

Additional scripts

Scripts for additional steps outside the main workflow and benchmark evaluations:

  • alternative/: scripts for alternative tools that were not used in the final workflow but may be useful in the future (e.g. salmon alevin instead of Cell Ranger)
  • download_EGA/: script to download data files for the lung adenocarcinoma dataset (Kim et al. 2020) from the European Genome-phenome Archive (EGA); requires access to the controlled-access data repository
  • download_souporcell/: scripts to download data files for the healthy (non-cancer) iPSC cell line dataset used in the souporcell paper (Heaton et al. 2020) from the European Nucleotide Archive (ENA)
  • filter_vcf/: script to filter the 1000 Genomes Project genotype VCF file to retain only SNPs in 3' untranslated regions (UTRs), for faster runtime
  • genotype/: scripts to run different tool options for generating a custom genotype VCF file, either from matched bulk RNA-seq samples (using bcftools or cellSNP) or directly from single-cell RNA-seq samples (using cellSNP)
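The idea behind the filter_vcf/ step (keeping only SNPs that fall within 3' UTRs) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the script in this repository: the function names, the in-memory representation of UTR regions (0-based, half-open intervals as in BED), and the simple single-base SNP test are all hypothetical choices made for the example.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def is_snp(ref, alt):
    """True if REF and every ALT allele are single bases."""
    bases = {"A", "C", "G", "T"}
    return ref in bases and all(a in bases for a in alt.split(","))


def filter_vcf_lines(vcf_lines, utr_regions):
    """Yield header lines plus SNP records located inside a UTR interval.

    utr_regions: dict of chromosome -> list of 0-based half-open (start, end)
    intervals. VCF POS is 1-based, so it is shifted by one before comparison.
    """
    merged = {chrom: merge_intervals(iv) for chrom, iv in utr_regions.items()}
    starts = {chrom: [s for s, _ in iv] for chrom, iv in merged.items()}
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if chrom not in merged or not is_snp(ref, alt):
            continue
        pos0 = pos - 1
        # Index of the last interval starting at or before this position
        i = bisect_right(starts[chrom], pos0) - 1
        if i >= 0 and pos0 < merged[chrom][i][1]:
            yield line


# Made-up records: one SNP inside a UTR, one SNP outside, one indel inside
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t150\trs1\tA\tG\t.\tPASS\t.",
    "1\t250\trs2\tC\tT\t.\tPASS\t.",
    "1\t160\trs3\tAT\tA\t.\tPASS\t.",
]
example_utrs = {"1": [(100, 200), (180, 220)]}
kept = list(filter_vcf_lines(example_vcf, example_utrs))
```

In this example only the two header lines and the record rs1 are kept. Restricting the genotype file in this way reduces the number of sites that downstream tools need to process, which is the runtime benefit mentioned above. For files the size of the 1000 Genomes Project VCF, a dedicated tool would be more practical than pure Python.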

Links
