Comments (4)
Hi Sasi,
For differential peak accessibility, the peak size will not matter as you're comparing the accessibility of the same peak across different cells. For upstream normalization, you could potentially divide the counts by peak width before running LSI to normalize for different peak sizes. I have not tested this extensively, but in testing with the mouse brain dataset this had no effect that I could see on the resulting embeddings.
If you think this could be an issue for your dataset then I'd encourage you to try correcting for peak width and comparing with results without correction, and would be interested to hear your feedback.
Here is an example showing how to correct for peak width and comparing with/without this correction:
library(Signac)
library(Seurat)
library(ggplot2)
# using data from the mouse brain vignette as an example
brain <- readRDS("brain.rds")
# first compute the length of each peak
brain <- RegionStats(
object = brain,
genome = BSgenome.Mmusculus.UCSC.mm10,
sep = c(":", "-")
)
# extract the width of each peak
peak.width <- GetAssayData(brain, assay = 'peaks', slot = 'meta.features')$sequence.length
# extract the raw counts
counts <- GetAssayData(brain, assay = 'peaks', slot = 'counts')
# divide the counts for each peak by the width of the peak
norm.counts <- counts / peak.width
# create a new assay with the normalized counts and perform the same processing
brain[['normPeaks']] <- CreateAssayObject(counts = norm.counts)
DefaultAssay(brain) <- 'normPeaks'
brain <- RunTFIDF(brain)
brain <- FindTopFeatures(brain, min.cutoff = 'q5')
brain <- RunSVD(brain, n = 50, reduction.key = 'LSI_', reduction.name = 'lsiNorm')
brain <- RunUMAP(brain, dims = 1:50, reduction = 'lsiNorm', reduction.name = 'umapNorm')
# compare UMAP embeddings for normalized vs raw counts
p1 <- DimPlot(brain, reduction = 'umapNorm', label = TRUE) + NoLegend() + ggtitle('Length-normalized peak counts')
p2 <- DimPlot(brain, reduction = 'umap', label = TRUE) + NoLegend() + ggtitle('Raw peak counts')
cowplot::plot_grid(p1, p2)
Above shows a UMAP computed using length-normalized counts (left) and raw counts (right), with cells colored the same in both plots. I can't see much of a difference, but this is just one case. If you can show an example where this makes a difference I'd can add a function to perform this normalization in Signac.
Tim
from signac.
Hi Tim,
Thank you for the detailed explanation,will try the peak width correction and see how it impacts our data.
We found that peaks with more width are significant than the smaller peaks.
The distribution of widths of differential peaks is skewed towards right.
Best regards
Sasi
from signac.
That's interesting, I think this can be a complicated issue because there are likely technical and biological factors involved. Longer peaks will tend to have higher counts (since there are more chances to overlap a read), and so potentially you have higher power to detect differential accessibility. I will have to think more about this, but thanks for bringing it up.
from signac.
Yes ,will check and update if the peak width correction is helping .
from signac.
Related Issues (20)
- Question for obtaining nucleosome free region
- QC about TSS.enrichment and nucleosome_signal
- 10X multiome subsetting cancer cell line with blacklist_ratio ~ 1 for both hg38_blacklist and hg38_blacklist_unified HOT 4
- RegionHeatmap order
- GeneActivity on pseudo-bulk data HOT 1
- RunSLSI error HOT 1
- Error in Links(cds) <- links HOT 1
- Error in fixupDN.if.valid(value, x@Dim) when using ReadMGATK HOT 1
- Motif "positions" slot is NULL HOT 1
- Coverage plot HOT 1
- Number of rows in provided matrix does not match the number of rows in the object HOT 2
- Dimensions of feature matrix do not match length of peaks granges HOT 3
- Integrating ATAC-seq data from multiple species with a consensus peak set
- An error about "RunChromVAR" HOT 2
- Significant Discrepancy in Peak Counts Between Cell Ranger ATAC and Signac Peak Calling HOT 1
- SNP infomation
- **Error in slice_sample(): ! n must be a round number, not the number 1146.4. HOT 10
- how to do monocle3 by standard 10x output files HOT 1
- Seqinfo object HOT 1
- FindTransferAnchors replacement error HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from signac.