stuart-lab / signac Goto Github PK
View Code? Open in Web Editor NEWR toolkit for the analysis of single-cell chromatin data
Home Page: https://stuartlab.org/signac/
License: Other
R toolkit for the analysis of single-cell chromatin data
Home Page: https://stuartlab.org/signac/
License: Other
Hi Tim,
People would want to analyze the aggregated scATAC-seq data for their actual samples by Signac. 10xCellrangerATAC offers "aggr" function, and it normalizes the data by subsampling fragments from higher-depth GEM wells until they all have an equal number of unique fragments per cell by default. Signac also normalizes the data by TF-IDF. Do you think we should omit the normalization done by CellrangerATAC "aggr"?
Thanks!
Yoshi
Is it possible to run signac on cellranger aggr 10x file?
Hi,
I was wondering if there was a simple way to put two coverage plots from two conditions and compare them side by side by scaling them the same way. I'm not really aware of how cowplot works or how I'd be able to get the CoveragePlot function with the same y axis limit as another one to compare them.
Thanks!
Hello!
I have been having troubles using the function ClosestFeature. I input a set of peak regions found to be differentially activated in cell clusters but i get the following error:
peaks
[1] "chr9-25677250-25678875" "chr2-219252833-2192552861"
[3] "chr1-201538135-201540645" "chr6-33570673-33571808"
[5] "chr2-96536348-965394011" "chr3-194002751-194004173"
[7] "chr12-55846923-55849266" "chr11-111734956-111736614"
[9] "chr9-130845807-130848326" "chr10-3172019-31735621"
[11] "chr7-108001168-108004822"
ClosestFeature (region= peaks, annotation = genes(TxDb.Hsapiens.UCSC.hg38.knownGene), sep=c('-','-'))
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") :
In range 2: at least two out of 'start', 'end', and 'width', must
be supplied.
In addition: Warning message:
In .normargSEW0(end, "end") : NAs introduced by coercion to integer range
If i remove peaks region two then it works fine, but i still would like what is happening there.
peaks <- peaks[-2]
ClosestFeature (region= peaks, annotation = genes(TxDb.Hsapiens.UCSC.hg38.knownGene), sep=c('-',' gene_id region distance
286319 286319 chr9-25676389-25678440 0
107985246 107985246 chr1-201507241-201534784 3350
449520 449520 chr6-33540046-33589026 0
9936 9936 chr2-159768628-159798255 0
647323 647323 chr3-193957372-194003760 0
4327 4327 chr12-55835433-55842966 3956
5519 5519 chr11-111726908-111766389 0
25 25 chr9-130713016-130887675 0
9712 9712 chr10-11453946-11611754 0
3912 3912 chr7-107923799-108003187 0
However, why the peak regions are all different from the ones i input?
Thanks a lot!
In using CoveragePlot to visualize pseudobulk profiles by cluster, the function is currently unable to find cells in requested regions that are on any double digit chromosomes ( those > chr 9) for mouse mm10 aligned Cell Ranger Processed data. The regions I am trying to plot come from a differential accessibility analysis where I know cells have signal there (can be visualized via VlnPlot or FeaturePlot). The plotting works fine for any regions on chromosomes 1:9, but not the others, so I don't know if its an issue with CoveragePlot or the annotation.
The function call looks like this:
CoveragePlot(
object = seurat.signac.object,
region = region1,
sep = c(":","-"),
annotation = EnsDb.Mmusculus.v79,
extend.upstream = 5000,
extend.downstream = 5000,
ncol = 1,
idents=c(0,1,2,3)
)
My attached packages are as follows:
[1] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.8.0 AnnotationFilter_1.8.0
[4] GenomicFeatures_1.36.1 AnnotationDbi_1.46.0 Signac_0.1.0
[7] SummarizedExperiment_1.14.0 DelayedArray_0.10.0 BiocParallel_1.17.18
[10] matrixStats_0.54.0 Biobase_2.44.0 GenomicRanges_1.36.0
[13] GenomeInfoDb_1.20.0 IRanges_2.18.1 S4Vectors_0.22.0
[16] BiocGenerics_0.30.0 chromVAR_1.6.0 data.table_1.12.2
[19] dplyr_0.8.1 ggplot2_3.2.0 Seurat_3.0.2
Hi,
I have pooled hg19 + mm10 scATAC-seq data, 10X, CellRanger v3 output. Wanted to run signac MergeWithRegions for two different samples, made with same method. Since the data is mixed-species, I have genome version appended to peak names where "" is replaced with "-" during CreateSeuratObject (peak names e.g. hg19-chr1:10384-10479 & mm10-chr1:10384-10479). So now the "hg19-" and "mm10-" in peak names do not match the "hg19" and "mm10_" in the fragments file (where chromosomes as hg19_chr1 and mm10_chr1).
a <- SubsetData(CN.atac, cells.use = [email protected][which([email protected]$is_mm10_cell_barcode == "1"), ]) - NO SUBSET PRODUCED
Do you have a solution for using such mixed-species samples? And subsetting them easier for integration?
unintegrated <- MergeWithRegions(
object.1 = TR.atac,
object.2 = CN.atac,
assay.1 = "peaks",
assay.2 = "peaks",
sep.1 = c(":", "-"),
sep.2 = c(":", "-")
)
Error in .get_data_frame_col_as_numeric(df, granges_cols[["start"]]) :
some values in the "start" column cannot be turned into numeric values
In addition: Warning message:
Expected 3 pieces. Additional pieces discarded in 224211 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
System configuration
Running on:
Operating System: CentOS Linux 7 (Core)
Version of Python: python3.7
version: R version 3.6.0 (2019-04-26)
Signac * 0.1.5 2019-10-27 [1] Github (timoast/signac@241ad37)
Thank you in advance,
Maria
Hi,
I am confused by NucleosomeSignal function. In the documentation it says: "Computes the ratio of fragments < 147 bp to fragments between 147 bp and 294 bp." When I check the code, I see the following:
mn_ratio <- function(x) { mononucleosome = sum(x[x > min.threshold & x < max.threshold]) nucleosome_free <- sum(x[x <= min.threshold]) return(mononucleosome / nucleosome_free) }
So it calculates ratio of fragments between 147 bp and 294 bp to fragments < 147 bp instead. Is it wrong or am I missing something?
Hi,
How do I go about using signac
with custom genome, for which the EnsDb, and BSgenome is not yet available. Do I need to start with those two first?
Sameet
Hello,
I would like to understand the concepts and functions for scATAC-seq integration vs merging in signac a bit better. My data is very low-dimensional, same cell type, couple of treatments, run on same 10X Chromium run on different lanes (one lane per treatment), sequenced separately and then run through cellranger v3 separately. I would like to 1) Verify that my top peaks (peaks represented in e.g. 50% cells) overlap some existing DNase footprints (little quality control),
2) Compare top peaks between treatment and control. For this I would need to first identify peak regions that were discovered across the treatment and control samples, so let's say I want to get peaks intersect between 2 samples for simplicity here. I have followed the tutorial "scATAC-seq data integration". Integration I think of as finding peaks that intersect at least x bp, taking the coordinates of these peaks, and introducing a new peak coordinate based on the intersecting peaks (after e.g. centering).
MergeWithRegions - merge datasets in a coordinate-aware manner. Does this mean that no shifts in peak coordinates were introduced (meaning no integration)? I first tried this but when looking at the UMAP the two samples clearly separated in 2 clusters, which I do not expect from the biology. I was a bit unexpected, though (since I should have very low tech variation). When running FindTopFeatures (min.cutoff = "q75") on this "unintegrated" object, I do get some 24K peaks - can I expect these peaks to contribute to the technical variance then (since UMAP generates 2 different clusters)?
So I continued with GetIntersectingFeatures which gives me some 97K peaks out of a total of 267K across 2 samples. Peaks are with original coordinates. Here I have a uniform peak set that I could also use for my objective 1) - see how well they agree with DNase footprints. Correct?
I can now continue with subsampling some peaks and running FeatureMatrix, and add a new assay with these subsampled peaks to one of my samples (seurat objects) "# create a new assay and add it to the 10x dataset". Will that seurat object subset be used by FindIntegrationAnchors or IntegrateData, in any other way than the peaks.use subset directly that I specify for the FindIntegrationAnchors? After making the integrated object, I have there 2 samples with subsampled as well as all peaks across both samples.
integrated
An object of class Seurat
267444 features across 5039 samples within 2 assays
Active assay: integrated (10000 features)
1 other assay present: peaks
Can I now use FindTopFeatures to compare the top peaks between the two samples in the integrated object (objective 2)? If so, is this correct:
integrated <- RunTFIDF(integrated, assay = 'peaks')
integrated <- FindTopFeatures(integrated, assay = 'peaks', min.cutoff = "q75")
Since I am not sure it's comparing the peaks between my two samples (two sets of cells with different treatments) and rather across all the cells.
Alternatively, I guess I could use FeatureMatrix -> CreateGeneActivityMatrix for each sample separately using the same set of peaks and compare the peak-associated genes (quantify gene expression ‘activity’ from ATAC-seq reads)?
I hope you can understand my questions and thank you in advance,
Maria
Arabidopsis does not have a ensemblDB library like mouse and human do. Do you have a grasp on how to make such a thing, it is not obvious to me.
stuck at this point:
ene.coords <- genes(EnsDb.Hsapiens.v75, filter = ~ gene_biotype == "protein_coding")
or if you have an easy bypass using a gtf
Dear Dr. Stuart,
I want to integrate Cicero gene activity matrix with the normalized gene expression matrix. Normalization of the gene expression matrix is done by the function (NormalizeData()) in Seurat pipeline.
The gene activity matrix is normalized by Cicero pipeline. Can I integrate both of them (one is normalized by Seurat pipeline and the other is normalized in a different way by Cicero pipeline)? Or I should use the unnormalized gene activity matrix and normalize it by Seurat function (NormalizeData()) and then integrate them by CCA (as in Seurat)?
Thanks
Hi,
I am asking about the function: findMarkers() that detects the differentially expressed genes in Seurat. Does it consider all the genes or the most variable 2000 genes only for selecting the differentially expressed genes?
Thanks
Hi,
To integrate scRNA-seq data and scATAC-seq data, I followed this tutorial: https://satijalab.org/seurat/v3.1/atacseq_integration_vignette.html
However, when I read the methods section of your paper, I found that the gene expression matrix (scRNA-seq) and gene activity matrix (obtained from scATAC-seq) should have the same number of features (genes) and probably in the same order. This information was not mentioned in the tutorial I mentioned above. Does that mean that I need to repeat all the work again but with both matrices having the same number of genes? Is there in Seurat an easy function to perform this step and produce an output compatible with the Seurat objects?
Thanks
while reading in fragments information with command line:
fragment.path <- "/Users/outs/fragments.tsv.bgz"
scatac12wpcsignac <- SetFragments(
object = scatac12wpcsignac,
file = fragment.path
)
got this error:
Error in SetFragments(object = scatac12wpcsignac, file = fragment.path) :
Requested file does not exist or is not indexed
Which fragment file should we use, there is a tsv.bgz and a fragments.tsv.gz.tbi from 10x Cell ranger outputs.
Hi timoast,
Signac is a great tool. Thanks for that!
I would like to extract the normalized accessibility data for all identities as plotted with CoveragePlot. With this data, I would like to know all regions of open chromatin (above a certain threshold) in order to look for cis-acting elements. I tried looking at the functions buildup and came up with the idea to use the cutmat matrix. But I am stuck, because I dont know how to do it for all identities and multiple regions in an efficient way.
Thanks in advance!
Ps. Is a command list of the signac package avialable? Something similar to Seurat's one would be nice https://satijalab.org/seurat/essential_commands.html
I'm trying to align 2 10X datasets that should be very similar.
sci <- brain03
tenx <- brain
At this code, I get an error unless I change the code to the following:
intersecting.regions <- GetIntersectingFeatures(
object.1 = sci,
object.2 = tenx,
sep.1 = c(":", "-"), ## from c("-", "-")
sep.2 = c(":", "-")
)
The error reads as follows:
Error in .get_data_frame_col_as_numeric(df, granges_cols[["end"]]) :
some values in the "end" column cannot be turned into numeric values
In addition: Warning message:
Expected 3 pieces. Missing pieces filled withNA
in 96883 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
By changing the hypen to a colon I can get past that but then I get the following error running this code:
sci_peaks_tenx <- FeatureMatrix(
fragments = GetFragments(object = tenx, assay = 'peaks'),
features = StringToGRanges(peaks.use),
cells = colnames(tenx)
)
Error in .get_data_frame_col_as_numeric(df, granges_cols[["end"]]) :
some values in the "end" column cannot be turned into numeric values
In addition: Warning message:
Expected 3 pieces. Missing pieces filled withNA
in 10000 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
Hi,
This is more of documentation request than issue.
I see that chromVar function is added in latest Signac release.
Can you please provide the workflow to run chromVar through Signac?
Best regards
Sasi
Hi,
Great package for analysing scATAC-seq. I was wondering, is there is a way to export the per cluster coverage as wiggle tracks or bigwig file ? While the CoveragePlot function does the job, this is if we need to view it in UCSC genome browser.
Thanks
Dear all,
I am asking about FindTransferAnchors function,
does it look for the nearest match between scRNA-seq cell and scATAC-seq cells? Or it looks for the nearest match between scRNA-seq clusters and scATAC-seq clusters?
From the output of the function, I believe that it looks for the nearest match between cells, right?
But I noticed that one cell in scRNA-seq may have more than one match in scATAC-seq cells, is that true?
So, to take the two most matching cells from the two datasets together, I should take the cells with the highest score? Is there a threshold for the score to determine the confidence of matching? For example, if the score is > 0.5, can I consider this a confident match?
and for the function TransferData, how does it determine the score of cell type prediction in scATAC-seq cells? I found that the scores here are totally different from the matching scores in FindTransferAnchors function.
Thank you
Hi,
I sometimes find cellular doublets in scATAC-seq dataset like scRNA-seq data. Do you know the way to efficiently remove them like doublet-finder? If the Signac had such kind of function, it would be really useful.
Yoshi
Thanks a lot for developing the package!
I am trying to visualize the motif activity from chromVAR with FeaturePlot using a red (high)-white (0)-blue (low) color scheme as in the last plot in: https://bioconductor.org/packages/release/bioc/vignettes/chromVAR/inst/doc/Introduction.html but when I specify 3 colors, the plot appears with a very different scale (removing values below 1).
How can I keep the full scale but specify 3 colors?
Many thanks,
Elena
When installing signac
devtools::install_github("timoast/signac")
Reported error:
Downloading GitHub repo timoast/signac@master
Skipping 13 packages not available: AnnotationFilter, BiocGenerics, GenomeInfoDb, GenomicFeatures, GenomicRanges, IRanges, Rsamtools, S4Vectors, TFBSTools, ggbio, motifmatchr, patchwork, AnnotationDbi
Installing 3 packages: ggbio, ggseqlogo, patchwork
Error: Failed to install 'Signac' from GitHub:
(converted from warning) packages 'ggbio', 'patchwork' are not available (for R version 3.5.3)
Hi Tim,
Is there any way in Signac where the peak width can be taken as a factor while calculating the differential peaks or in the upstream normalization so that we can minimize the effect of peak width.
Best regards
Sasi
Hey Tim – excited to begin using the package! Thanks for releasing this.
I downloaded the data directly from the vignette, but there was some issues in processing, first encountered in the QC plotting and during filtering (only < 100 samples left using your filters). Downloading the data directly from the 10X website and there was no issues.
The data links are just off a bit? Not sure what the underlying differences are.
Links from Signac
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_10k/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz.tbi
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_singlecell.csv
Links for 10X
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_filtered_peak_bc_matrix.h5
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_singlecell.csv
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_fragments.tsv.gz.tbi
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_fragments.tsv.gz
Hi,
I'm trying to currently cluster the scATAC data in Signac based on scRNA clustered data from Seurat. I'm currently following the mouse brain vignette. When I get to this step of the code,
transfer.anchors <- FindTransferAnchors(reference = KO.rna, query = KO, reduction = 'cca', dims = 1:40)
R encounters a fatal error and exits.
Session Info:
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Hi,
this might be a stupid question, but I am confused by the RunSVD implementation. Shouldn't we center the data first before irlba function? Or is it already done implicitly by TF-IDF normalization? And why do you normalize principal components after computing them (cell.embeddings to norm.embeddings)?
Hi,
I'm interested in extracting the peak matrix after QC and preprocessing through Signac.
Is there any way to directly output the normalized peak matrix from:
pbmc <- RunTFIDF(pbmc)
Hi,
My understanding is there are only two states of a chromatin region, either open or closed (1 or 0) in most areas. The different levels of read counts might largely reflect amplification bias rather than true biology. Do you think we need binalize the count matrix before TF-IDF /SVD processing?
Thanks !
Yoshi
Hi, dears
I encountered an error while executing the FindMotifs command.
enriched.motifs <- FindMotifs(object = pbmc,features = head(rownames(da_peaks), 1000))
Selecting background regions to match input sequence characteristics
Matching GC.percent distribution
Error in density.default(x = mf.query[[i]], kernel = "gaussian", bw = 1) :
'x' contains missing values
How to fix this issue?
Best,
Furong
Hi timoast,
Thank you for developing this nice tool. I encountered some problems when doing the motif analysis. Can you give me some ideas about why this happened? Thanks.
Jason
"Features do not match in Assay and Motif object. Subsetting the Motif object."
Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE): invalid character indexing
Traceback:
1. RunChromVAR(object = pbmc, genome = BSgenome.Mmusculus.UCSC.mm10,
. motif.matrix = motif)
2. motif.matrix[rownames(x = peak.matrix), ]
3. `[.Motif`(motif.matrix, rownames(x = peak.matrix), )
4. subset.Motif(x = x, features = i, motifs = j, ...)
5. GetMotifData(object = x, slot = "data")[features, motifs]
6. GetMotifData(object = x, slot = "data")[features, motifs]
7. callGeneric(x, i = i, j = j, drop = TRUE)
8. eval(call, parent.frame())
9. eval(call, parent.frame())
10. x[i = i, j = j, drop = TRUE]
11. x[i = i, j = j, drop = TRUE]
12. subCsp_ij(x, i, j, drop = drop)
13. intI(i, n = d[1], dn[[1]], give.dn = FALSE)
14. stop("invalid character indexing")
Hi,timoast
Thanks for your provide such a useful tool for analysing single cell ATAC data. Recently I used it to analyze mouse data and had problem with the function CoveragePlot():
Error in plot_to_gtable(x) :
Argument needs to be of class "ggplot", "gtable", "grob", "recordedplot", or a function that plots to an R graphicsdevice when called, but is a logical
In addition: Warning message:
`quo_expr()` is deprecated as of rlang 0.2.0.
Please use `quo_squash()` instead.
I also tried the data from the vignette mouse brain, but the problem still exists. I guess maybe something wrong with the version of R or package.
Here is my code for CoveragePlot():
CoveragePlot(
object = FRC.ATAC,
region = rownames(da_peaks)[2],
sep = c(":", "-"),
annotation = EnsDb.Mmusculus.v79,
extend.upstream = 5000 ,
extend.downstream = 5000)
Session info():
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.7 (Maipo)
Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome_1.52.0
[3] rtracklayer_1.44.0 Biostrings_2.52.0
[5] XVector_0.24.0 TFBSTools_1.22.0
[7] JASPAR2018_1.1.1 cowplot_0.9.4
[9] reticulate_1.13 ggplot2_3.2.1
[11] RColorBrewer_1.1-2 dplyr_0.8.3
[13] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.8.0
[15] AnnotationFilter_1.8.0 GenomicFeatures_1.36.2
[17] AnnotationDbi_1.46.0 Biobase_2.44.0
[19] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0
[21] IRanges_2.18.1 S4Vectors_0.22.0
[23] BiocGenerics_0.30.0 Seurat_3.0.2
[25] Signac_0.1.5
Any suggestions will be helpful.
Hello Tim,
I've tried to integrate two 10X scATAC data sets using Seurat/Signac, however when I run the "FindTransferAnchor" function, R session has always been automatically stopped. I tried other computers, but same problem always popped up.
transfer.anchors.WT <- FindTransferAnchors(reference = WT.RNA, query = WT, reduction = 'cca', dims = 1:50)
Looking forward to utilizing Signac!
When I was running through the mouse brain tutorial, I got:
"Error in is(file, "TabixFile") :
object 'fragment_file_filtered' not found"
after running
gene.activities <- FeatureMatrix(
should it be:
gene.activities <- FeatureMatrix(
Also, sorry if I missed this, but where can I download the allen_brain.rds file?
Thanks in advance!
Josh
how do we select cutoff for number of dimensions for RunUMAP?
Hi
I tried to merge two scATAC samples the way we can merge Seurat object using merge function in scRNA-seq. However, as the scATAC is different and we need to merge fragments files as well, can you suggest me the way to merge two Signac object?
Great work so far, really like where this going.
I like to run chromVar and/or findMotifs but seem be stuck at creating the MotifObject and perhaps a few more. Any help in what steps I need to perform to get there?
Hi there,
I wanted to let you know that we've got an unfortunate naming conflict with your new package over at glotzerlab/signac. We just received an issue from one of your users, and I imagine that this problem will crop up again. Do you have any plans to roll your package into your Seurat package at any point? That might alleviate this difficulty in the long term.
Dear Tim,
Thanks for developing very useful tool for scATAC analysis.
In the vignette of "Human PBMCs" and "Adult mouse brain", latent vars = 'peak_region_fragments' was used in finding the differential peaks with FindMarkers function.
da_peaks <- FindMarkers( .......
test.use = 'LR',
latent.vars = 'peak_region_fragments' )
"Motif analysis with Signac" vignette, latent vars = 'nCount_peaks' was adopted in da_peaks selections.
da_peaks <- FindMarkers(.......
test.use = 'LR',
latent.vars = 'nCount_peaks')
I am wondering there are some reasons to select "nCount_peaks" in this situation?
Best,
Yoshi
Hi, I am having a recurring problem with CCA when I try to integrate scATAC and scRNAseq data. This occurs both with my own data and the data provided in the vignette here: https://satijalab.org/signac/articles/pbmc_vignette.html
All goes well until the "FindTransferAnchors" command, when CCA throws an error as such:
> transfer.anchors <- FindTransferAnchors(
+ reference = pbmc_rna,
+ query = pbmc,
+ reduction = 'cca'
+ )
Running CCA
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: Standardize(mat = object1, display_progress = FALSE)
2: RunCCA.default(object1 = data1, object2 = data2, standardize = TRUE, num.cc = num.cc, verbose = verbose, )
3: RunCCA(object1 = data1, object2 = data2, standardize = TRUE, num.cc = num.cc, verbose = verbose, )
4: RunCCA.Seurat(object1 = reference, object2 = query, features = features, num.cc = max(dims), renormalize = FALSE, rescale = FALSE, verbose = verbose)
5: RunCCA(object1 = reference, object2 = query, features = features, num.cc = max(dims), renormalize = FALSE, rescale = FALSE, verbose = verbose)
6: FindTransferAnchors(reference = pbmc_rna, query = pbmc, reduction = "cca")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
I have also tried using pcaproject instead of CCA for reduction, but get the following error.
Error in if (is.numeric(x = feature.mean) && feature.mean != "SCT") { :
missing value where TRUE/FALSE needed
Any suggestions would be appreciated! I am running R version 3.6.1 but have also tried in R version 3.5.2. Please let me know if any other information would be useful.
Many thanks,
Nathan
Hi,
Thank you for developing and providing an excellent tool for epigenetic research. I am really impressed by your developing this useful package.
I am wondering how to integrate several scATAC-seq datasets when there are batch effects (like different mice or human samples). Integration of gene activity matrices in Seurat standard workflow might be possible but I feel it is not so good idea. Could you have any idea or recommendation to decrease the batch effects in TF-IDF or SVD steps?
Hello,
it's a powerful tool for analysing single cell ATAC data. And recently I used it to analyse mouse data by imporing Mus references. Unfortunately, when I was converting the chromosome names to gene names, there is an error info:
Warning: Non-unique features (rownames) present in the input matrix, making unique
Error: Feature names of counts matrix cannot be empty
And my code is as following:
gene.coords <- genes(EnsDb.Mmusculus.v79, filter = ~ gene_biotype == "protein_coding")
seqlevelsStyle(gene.coords) <- 'UCSC'
genebody.coords <- keepStandardChromosomes(gene.coords, pruning.mode = 'coarse')
genebodyandpromoter.coords <- Extend(x = gene.coords, upstream = 2000, downstream = 0)
# create a gene by cell matrix
gene.activities <- FeatureMatrix(fragments = fragment.path,features = genebodyandpromoter.coords,cells = colnames(ON_A1),chunk = 10)
# convert rownames from chromsomal coordinates into gene names
gene.key <- genebodyandpromoter.coords$gene_name
names(gene.key) <- GRangesToString(grange = genebodyandpromoter.coords)
rownames(gene.activities) <- gene.key[rownames(gene.activities)]
# add the gene activity matrix to the Seurat object as a new assay, and normalize it
ON_A1[['RNA']] <- CreateAssayObject(counts = gene.activities)
Warning: Non-unique features (rownames) present in the input matrix, making unique
Error: Feature names of counts matrix cannot be empty
And I guess maybe I need to filter duplicated genes firstly, so I check the duplicated genes:
rownames(gene.activities)[duplicated(rownames(gene.activities))]
chr1-55300069-55342589 chr1-133655879-133662885 chr11-88971949-88990742
"" "Zbed6" "Coil"
chr11-114924893-114936386 chr14-8088078-8090554 chr15-10998817-11027939
"Cd300lb" "Rpp14" "Slc45a2"
chr15-58028243-58078425 chr15-81000628-81011736 chr15-100578273-100586075
"9130401M01Rik" "Sgsm3" "Pou6f1"
chr18-36937205-37187657 chr18-37008712-37187657 chr18-37018149-37187657
"PCDHA9" "PCDHA9" "PCDHA9"
chr18-37088013-37187657 chr18-37141503-37187657 chr18-37723706-37841873
"PCDHA9" "PCDHA9" "PCDHGB6"
chr18-37753731-37841873 chr18-37804364-37841873 chr18-78755215-79089015
"PCDHGB6" "PCDHGB6" "Setbp1"
chr2-60420013-60555130 chr2-152224839-152281852 chr3-34077454-34083222
"Pla2r1" "Csnk2a1" "Dnajc19"
chr3-35895620-35934787 chr3-54691105-54716837 chr3-105868907-105924036
"Dcun1d1" "Supt20" "Adora3"
chr3-135283566-135304221 chr4-88643816-88646385 chr5-138259658-138266046
"Bdh2" "Ifna13" "BC037034"
chr6-121007241-121083609 chr6-131541784-131553705 chr7-23982065-23985048
"Mical3" "5430401F13Rik" "Vmn1r90"
chr7-24860278-24862412 chr7-29906223-29916813 chr7-43425670-43435991
"Gm9844" "" "Lim2"
chr7-43428088-43435991 chr7-59982495-60142219 chr7-139638778-139685817
"Lim2" "Snrpn" "Cfap46"
chr8-58427494-58913627 chr9-109976869-110081742 chr9-110084393-110103026
"Galntl6" "Map4" "DHX30"
chrX-75410444-75418588
"Mtcp1"
All of above are duplicated gene names. But I don't know exactly how to filter them. And Also, I find the NULL Feature names chr1-55300069-55342589 ""
in this rownames, shuold I filter them dierectly or replace it by others?
And I continue the workflow after filtering the NULL Feature names. But I was thinking if the key gene was improtant and I wanted to project it or visualise the gene expression by violin plot, which gene would be used or just calculated its average ?
Sincerely,
Thanks
Hi,
First of all ,congratulations on the new package for scATAC.
I was trying to calculate the following on my own dataset
brain$pct_reads_in_peaks <- brain$peak_region_fragments / brain$total * 100
And i got the following message
Error in
[[.Seurat(x, i, drop = TRUE) : Cannot find 'total' in this Seurat object
Is the total calculated when CreateSeuratObject is run?
I was not able to find how it is calculated and it was missing for my dataset.
Best regards
Sasi
Under QC for your tutorial (for both mouse brain and PBMCs) you compute percent reads in peaks with the following
pbmc$pct_reads_in_peaks <- pbmc$peak_region_fragments / pbmc$total * 100
Based on reading about the singlecell.csv file output by cell ranger (https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/output/singlecell) I believe that this line should instead be
pbmc$pct_reads_in_peaks <- pbmc$peak_region_fragments / pbmc$passed_filters * 100
The total refers to all read pairs, which includes duplicates and other non-usable reads. The passed_filters should be more accurate as these don't include duplicate and low-quality reads. Using the total will give artificially low values for the percent of reads in peaks.
Overall, thanks for making this package! I've been impressed with Seurat for single cell RNA and am looking forward to using tools from your group for single cell ATAC as well.
Hi I am merging the two object with MergeWithRegions,
and get this error:
Error in .get_data_frame_col_as_numeric(df, granges_cols[["end"]]) :
some values in the "end" column cannot be turned into numeric values
In addition: Warning message:
Expected 3 pieces. Missing pieces filled with `NA` in 218594 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
How could I fix this? Thank you!
Hi Tim,
I have 2 samples of different treatment conditions merged. I would like to do differential peak analysis on this merged object. Using instructions from the FAQ (https://satijalab.org/signac/articles/faq.html), I merged fragments from my 2 samples and then utilized FeatureMatrix to generate a feature x cell matrix, with unified peaks. Weirdly the number of cells decreases after I generate the matrix. Here is what I have done so far:
library(Signac)
library(Seurat)
library(GenomeInfoDb)
library(EnsDb.Mmusculus.v79)
library(ggplot2)
set.seed(1234)
#read Sham files and create Sham object
counts_sham <- Read10X_h5("/Documents/Sham/outs/filtered_peak_bc_matrix.h5")
metadata_sham <- read.csv(
file = "/Documents/Sham/outs/singlecell.csv",header = TRUE,
row.names = 1
)
spinalcord_sham <- CreateSeuratObject(
counts = counts_sham,
assay = 'peaks',
project = 'sham',
min.cells = 1,
meta.data = metadata_sham
)
"Warning message:
In CreateSeuratObject(counts = counts_sham, assay = "peaks", project = "sham", :
Some cells in meta.data not present in provided counts matrix."
#read SNI files and create SNI object
counts_SNI <- Read10X_h5("/Documents/SNI/outs/filtered_peak_bc_matrix.h5")
metadata_SNI <- read.csv(
file = "/Documents/SNI/outs/singlecell.csv",header = TRUE,
row.names = 1
)
spinalcord_SNI <- CreateSeuratObject(
counts = counts_SNI,
assay = 'peaks',
project = 'SNI',
min.cells = 1,
meta.data = metadata_SNI
)
"Warning message:
In CreateSeuratObject(counts = counts_sham, assay = "peaks", project = "sham", :
Some cells in meta.data not present in provided counts matrix."
spinalcord_sham
"An object of class Seurat
72986 features across 1930 samples within 1 assay
Active assay: peaks (72986 features)"
spinalcord_SNI
"An object of class Seurat
75997 features across 5665 samples within 1 assay
Active assay: peaks (75997 features)"
#merge objects
merged.obj <- MergeWithRegions(
object.1 = spinalcord_sham,
object.2 = spinalcord_SNI,
sep.1 = c(":", "-"),
sep.2 = c(":", "-")
)
merged.obj
"An object of class Seurat
66013 features across 7595 samples within 1 assay
Active assay: peaks (66013 features)"
#set fragment path for merged fragment file (generated with bgzip and tabix)
fragment.path <- "fragments_merged.tsv.gz"
merged.obj <- SetFragments(object = merged.obj, file = fragment.path)
#extract gene coordinates from Ensembl, and ensure name formatting is consistent with Seurat object
gene.coords <- genes(EnsDb.Mmusculus.v79, filter = ~ gene_biotype == "protein_coding")
seqlevelsStyle(gene.coords) <- 'UCSC'
genebody.coords <- keepStandardChromosomes(gene.coords, pruning.mode = 'coarse')
genebodyandpromoter.coords <- Extend(x = gene.coords, upstream = 2000, downstream = 0)
#create a gene by cell matrix
gene.activities <- FeatureMatrix(
fragments = fragment.path,
features = genebodyandpromoter.coords,
cells = colnames(merged.obj),
chunk = 10
)
"Extracting reads overlapping genomic regions
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13m 20s
Constructing matrix"
#qc
merged.obj <- NucleosomeSignal(object = merged.obj)
merged.obj$pct_reads_in_peaks <- merged.obj$peak_region_fragments / merged.obj$passed_filters * 100
"Error in names(data.return) <- rep.int(x = colnames(x = x), times = length(x = i)) :
'names' attribute [7595] must be the same length as the vector [7583]"
#I checked the dimensions for gene.activities and the number of cells is less than the merged object
dim(gene.activities)
[1] 21532 7583
It looks like I'm losing cells after the gene coordinate extraction step, decreasing from 7595 to 7583. Please advise.
Thanks,
Lipin
The function CreateMotifMatrix
has an option "pwm" that expects:
"A PFMatrixList object containing position weight matrices to use".
Since "pwm" means position weight matrix and it is different than "pfm" - which is position frequency matrix -, what does the function exactly expects? A pfm or a pwm?
Also in the vignette, you wrote that you are getting the pwm's but you are actually getting pfm's.
I am confused.
Hi there,
I am trying the function PeriodPlot but get the following error:
pbmc$nucleosome_group <- ifelse(pbmc$nucleosome_signal > 10, 'NS > 10', 'NS < 10')
PeriodPlot(object = pbmc, group.by = 'nucleosome_group')
Error: At least one layer must contain all faceting variables:group
.
group
group
what should I do to solve this problem?
Thanks in advance
Hi Tim,
The enrichment of Tn5 integration events at a specific motif, like TSS enrichment visualization in the vignette, would be very visually effective to show the motif activity in each cell-type (like Nat Biotechnol. 2019 Aug;37(8):925-936, Fig.2F). I guess it can be done using or modifying TSSenrichment / TSSPlot function...How do you think we can do this? I would be grateful if you could add it to the function or vignette when you have time.
Thanks!
Yoshi
Hi Tim,
Thanks for developing very useful tool for scATAC analysis.
In the vignette of "Motif analysis", enriched motifs are evaluated on the differential peaks of Pvalb compared with Sst.
da_peaks <- FindMarkers(
object = mouse_brain,
ident.1 = 'Pvalb',
ident.2 = 'Sst',
only.pos = TRUE,
test.use = 'LR',
latent.vars = 'nCount_peaks')
enriched.motifs <- FindMotifs(
object = mouse_brain,
features = rownames(da_peaks[da_peaks$p_val < 0.01, ]))
When I tried similarly motif analysis on these of Sst (that is, closed motifs in Pvalb compared to Sst) I saw many overlapping motifs on the both differentially active motif lists for Pvalb and Sst (MEF2s, SP1/2, PLAG2 etc...).
da_peaks <- FindMarkers(
object = mouse_brain,
ident.1 = 'Sst',
ident.2 = 'Pvalb',
only.pos = TRUE,
test.use = 'LR',
latent.vars = 'nCount_peaks')
enriched.motifs <- FindMotifs(
object = mouse_brain,
features = rownames(da_peaks[da_peaks$p_val < 0.01, ]))
These results makes identifying differentially-active motifs between cell types difficult - Do you have any suggestions for dealing with this problem? Using chromvar motif analysis might be simplest solution.
Thanks,
Yoshi
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.