scATAC-seq: Comparing chromatin accessibility in hematopoietic stem cells (HSCs) of young vs aged mice

I wrote this R/Bioconductor analysis pipeline for single-cell ATAC-seq (scATAC-seq) data to examine differential chromatin accessibility in hematopoietic stem cells (HSCs) from 10-week-old 'young' mice vs 20-month-old 'aged' mice. The data come from the NCBI Gene Expression Omnibus (GEO) repository and were produced by running the Cell Ranger ATAC pipeline, which carries out the following steps:

  • demultiplexing of raw BCL base call files into FASTQ files
  • read alignment
  • barcode counting
  • peak calling with the reference “refdata-cellranger-arc-mm10-2020-A-2.0.0”
  • outputs: a BED file of peaks, a CSV file of cell barcode metadata, a TSV/BED file of unique fragments with their associated cell barcodes, etc.
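
The per-barcode metadata CSV listed above is what the later quality-control step filters on. As a minimal sketch, assuming the table exposes `passed_filters` and `peak_region_fragments` columns, and using made-up values with illustrative cutoffs (not the ones used in this pipeline), that filtering might look like:

```r
# Toy stand-in for a per-barcode metadata table (all values are made up).
metadata <- data.frame(
  barcode = c("AAAC-1", "AAAG-1", "AACT-1", "AAGA-1"),
  passed_filters = c(12000, 300, 8000, 45000),
  peak_region_fragments = c(6000, 90, 1200, 30000),
  stringsAsFactors = FALSE
)

# Percentage of usable fragments that fall inside called peaks.
metadata$pct_reads_in_peaks <-
  metadata$peak_region_fragments / metadata$passed_filters * 100

# Illustrative cutoffs only (assumptions, not the pipeline's actual values).
min_fragments <- 500
min_pct_in_peaks <- 20

# Keep barcodes that pass both cutoffs.
keep <- metadata$passed_filters > min_fragments &
  metadata$pct_reads_in_peaks > min_pct_in_peaks
filtered <- metadata[keep, ]
```

In practice the cutoffs are usually chosen after inspecting the distribution of each metric per dataset.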

Overall Functionality:

  1. Data Loading and Preprocessing: Loads the necessary R packages (Signac, Seurat, tidyverse, etc.) and reads in the BED peak files, CSV cell barcode metadata, and TSV/BED fragment files for the young and aged single-cell ATAC-seq datasets.
  2. Peak Filtering and Common Set Creation: Builds a common peak set by merging (reducing) the peaks from the individual datasets, then filters out peaks based on width criteria.
  3. Initial Quality Control: Filters out low-quality cells based on specific cutoffs for various quality control metrics.
  4. Count Matrix Generation: Generates count matrices for both young and aged datasets based on the common peak set.
  5. Seurat Object Creation: Constructs Seurat objects for each dataset.
  6. Integration and Dimensionality Reduction: Integrates datasets, performs TF-IDF normalization followed by SVD, and conducts UMAP-based visualization and clustering.
  7. Gene Annotation and Analysis: Extracts gene annotations, adds them to the Seurat object, and conducts gene activity analysis.
  8. Normalization and Visualization of "RNA" Data: Normalizes gene activity "RNA" data, visualizes canonical marker genes, and identifies differential peaks between young and aged datasets.
  9. Visualization of Peaks: Visualizes coverage plots for selected peaks and their closest genomic features.
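
To make step 6 more concrete, here is a toy base-R illustration of TF-IDF normalization followed by SVD. It is a sketch of one common formulation of the idea, not the pipeline's code and not Signac's implementation; the count matrix, the scale factor, and all variable names are assumptions made for illustration.

```r
# Toy peak-by-cell count matrix (rows = peaks, columns = cells); values are made up.
counts <- matrix(
  c(2, 0, 1,
    0, 3, 1,
    1, 1, 0,
    4, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("peak", 1:4), paste0("cell", 1:3))
)

# Term frequency: each cell's counts divided by that cell's total count.
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency: number of cells divided by each peak's total count.
idf <- ncol(counts) / rowSums(counts)

# Weight each peak by its IDF, scale, and log-transform.
# The scale factor is an illustrative assumption.
scale_factor <- 1e4
tfidf <- log1p(sweep(tf, 1, idf, "*") * scale_factor)

# SVD of the normalized matrix; here the cell embedding is taken as the
# right singular vectors scaled by the singular values.
decomp <- svd(tfidf)
cell_embedding <- decomp$v %*% diag(decomp$d)
rownames(cell_embedding) <- colnames(counts)
```

The low-dimensional cell embedding is the kind of input that UMAP and clustering then operate on.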

Input Files Required:

  • Individual peak files for young and aged datasets (youngPeaks.txt, agedPeaks.txt)
  • Metadata files for young and aged datasets (GSM5723631_Young_HSC_singlecell.csv, GSM5723632_Aged_HSC_singlecell.csv)
  • Fragment files for young and aged datasets (GSM5723631_Young_HSC_fragments.tsv.gz, GSM5723632_Aged_HSC_fragments.tsv.gz)
  • Annotation package (EnsDb.Mmusculus.v79)
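
The two peak files feed the common peak set described in step 2. A minimal sketch of that merge-and-filter idea with GenomicRanges, using made-up coordinates in place of the real peak files and illustrative width cutoffs (assumptions, not the pipeline's actual values):

```r
library(GenomicRanges)

# Toy peak sets standing in for the young and aged peak files (coordinates are made up).
young_peaks <- GRanges("chr1", IRanges(start = c(100, 500, 2000), end = c(300, 900, 2010)))
aged_peaks  <- GRanges("chr1", IRanges(start = c(250, 5000), end = c(600, 5400)))

# Merge overlapping intervals from both datasets into one shared peak set.
combined_peaks <- reduce(c(young_peaks, aged_peaks))

# Drop peaks that are very narrow or very wide.
# These cutoffs are illustrative assumptions.
min_width <- 20
max_width <- 10000
peak_widths <- width(combined_peaks)
combined_peaks <- combined_peaks[peak_widths > min_width & peak_widths < max_width]
```

Counting fragments against one shared peak set is what makes the young and aged count matrices directly comparable.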

R/Bioconductor Packages and Bioinformatics Tools Required:

  • Signac
  • Seurat
  • tidyverse
  • patchwork
  • GenomicRanges
  • future
  • EnsDb.Mmusculus.v79 (annotation)

Outputs:

  • Seurat objects (combined, young, aged) containing integrated and processed single-cell ATAC-seq data.
  • Visualizations: plots for quality control, gene activity, differential peaks, UMAP embeddings, and peak coverage.

Contributors:

  • felixm3
