stuart-lab / signac Goto Github PK

View Code? Open in Web Editor NEW

307.0 307.0 84.0 11.01 MB

R toolkit for the analysis of single-cell chromatin data

Home Page: https://stuartlab.org/signac/

License: Other

R 96.01% Dockerfile 0.06% C++ 3.93%

atac bioinformatics single-cell

signac's People

Contributors

Stargazers

Watchers

Forkers

galib36 becavin-lab zorrodong amz965 mengchengyao austin-s-h puriney nirhad gbloeb k3yavi lyuguol dragonmasterx87 fl-yu garrobocel hoondy saketkc yueyuxiaoyang tkisss matthieurouland bit-vs-it paulasoler f6v sqsun logstar jing-xinxing kmwinkley fmjlab maxim-h jingxinguo pacomito aanderson54 amsszlh jclinchai sqjin minghao2016 mluciarr wwjiaxuan tobinbv17 beginner984 iandriver 2015pp shiwei23 standardgalactic leifwigge leonrunning magcurly jmacdon alsmith151 wisren alexander-7 orions014 xxbb79 geneticresources jeffmgranja mdmorelli joschif wangyong98 aminnaghdloo rnaimehaom simrit1 marzamki liangdp1984 vivischuch qindan2008 yamajackr kur1sutaru colin986 gesmira nrockweiler paplomatasp tbouderlique baboon61 twmccart helianfeixing chancen-king ohoelske federicomarini bschilder tliu76 zm19911213 preddykolan19 liz-is settylab

signac's Issues

Normalization of aggregated dataset in CellrangerATAC

Hi Tim,
People would want to analyze the aggregated scATAC-seq data for their actual samples by Signac. 10xCellrangerATAC offers "aggr" function, and it normalizes the data by subsampling fragments from higher-depth GEM wells until they all have an equal number of unique fragments per cell by default. Signac also normalizes the data by TF-IDF. Do you think we should omit the normalization done by CellrangerATAC "aggr"?
Thanks!
Yoshi

running signac on a cell ranger aggr object

Is it possible to run signac on cellranger aggr 10x file?

Combining Coverage Plots

Hi,

I was wondering if there was a simple way to put two coverage plots from two conditions and compare them side by side by scaling them the same way. I'm not really aware of how cowplot works or how I'd be able to get the CoveragePlot function with the same y axis limit as another one to compare them.

Thanks!

ClosestFeature

Hello!
I have been having troubles using the function ClosestFeature. I input a set of peak regions found to be differentially activated in cell clusters but i get the following error:

peaks
[1] "chr9-25677250-25678875" "chr2-219252833-2192552861"
[3] "chr1-201538135-201540645" "chr6-33570673-33571808"
[5] "chr2-96536348-965394011" "chr3-194002751-194004173"
[7] "chr12-55846923-55849266" "chr11-111734956-111736614"
[9] "chr9-130845807-130848326" "chr10-3172019-31735621"
[11] "chr7-108001168-108004822"
ClosestFeature (region= peaks, annotation = genes(TxDb.Hsapiens.UCSC.hg38.knownGene), sep=c('-','-'))
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") :
In range 2: at least two out of 'start', 'end', and 'width', must
be supplied.
In addition: Warning message:
In .normargSEW0(end, "end") : NAs introduced by coercion to integer range

If i remove peaks region two then it works fine, but i still would like what is happening there.

peaks <- peaks[-2]
ClosestFeature (region= peaks, annotation = genes(TxDb.Hsapiens.UCSC.hg38.knownGene), sep=c('-',' gene_id region distance
286319 286319 chr9-25676389-25678440 0
107985246 107985246 chr1-201507241-201534784 3350
449520 449520 chr6-33540046-33589026 0
9936 9936 chr2-159768628-159798255 0
647323 647323 chr3-193957372-194003760 0
4327 4327 chr12-55835433-55842966 3956
5519 5519 chr11-111726908-111766389 0
25 25 chr9-130713016-130887675 0
9712 9712 chr10-11453946-11611754 0
3912 3912 chr7-107923799-108003187 0

However, why the peak regions are all different from the ones i input?

Thanks a lot!

CoveragePlot() No cells present in the requested region error

In using CoveragePlot to visualize pseudobulk profiles by cluster, the function is currently unable to find cells in requested regions that are on any double digit chromosomes ( those > chr 9) for mouse mm10 aligned Cell Ranger Processed data. The regions I am trying to plot come from a differential accessibility analysis where I know cells have signal there (can be visualized via VlnPlot or FeaturePlot). The plotting works fine for any regions on chromosomes 1:9, but not the others, so I don't know if its an issue with CoveragePlot or the annotation.

The function call looks like this:

CoveragePlot(
object = seurat.signac.object,
region = region1,
sep = c(":","-"),
annotation = EnsDb.Mmusculus.v79,
extend.upstream = 5000,
extend.downstream = 5000,
ncol = 1,
idents=c(0,1,2,3)
)

My attached packages are as follows:

[1] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.8.0 AnnotationFilter_1.8.0
[4] GenomicFeatures_1.36.1 AnnotationDbi_1.46.0 Signac_0.1.0
[7] SummarizedExperiment_1.14.0 DelayedArray_0.10.0 BiocParallel_1.17.18
[10] matrixStats_0.54.0 Biobase_2.44.0 GenomicRanges_1.36.0
[13] GenomeInfoDb_1.20.0 IRanges_2.18.1 S4Vectors_0.22.0
[16] BiocGenerics_0.30.0 chromVAR_1.6.0 data.table_1.12.2
[19] dplyr_0.8.1 ggplot2_3.2.0 Seurat_3.0.2

Subsetting data based on metadata & mixed species samples

Hi,

I have pooled hg19 + mm10 scATAC-seq data, 10X, CellRanger v3 output. Wanted to run signac MergeWithRegions for two different samples, made with same method. Since the data is mixed-species, I have genome version appended to peak names where "" is replaced with "-" during CreateSeuratObject (peak names e.g. hg19-chr1:10384-10479 & mm10-chr1:10384-10479). So now the "hg19-" and "mm10-" in peak names do not match the "hg19" and "mm10_" in the fragments file (where chromosomes as hg19_chr1 and mm10_chr1).

I made new species-specific fragment files without genome version appended to chrom names. Then changed the Dimnames to also not have "hg19-" and "mm10-" in chrom names: siC.D3.atac$peaks@counts@Dimnames[[1]] <- gsub("mm10-","", siC.D3.atac$peaks@counts@Dimnames[[1]])
So now all I would need to do is subset the mouse data from the seurat object. Tried to use the metadata is_mm10_cell_barcode for that (since it's nicely incorporated when creating the object) - no error but also no subset taken:

a <- SubsetData(CN.atac, cells.use = [email protected][which([email protected]$is_mm10_cell_barcode == "1"), ]) - NO SUBSET PRODUCED

Tried to merge with MergeWithRegions the two samples without separating the mouse and human samples (both samples have hg19+mm10). I think there is a problem here with
sep.1 = c(":", "-"),
sep.2 = c(":", "-")
since I would need sep to also include "hg19_" and "mm10_" .
Also, the fragment file has "hg19_" whereas seurat object was changed to "hg19-" so that's also a problem, I guess.

Do you have a solution for using such mixed-species samples? And subsetting them easier for integration?

unintegrated <- MergeWithRegions(
object.1 = TR.atac,
object.2 = CN.atac,
assay.1 = "peaks",
assay.2 = "peaks",
sep.1 = c(":", "-"),
sep.2 = c(":", "-")
)

Error in .get_data_frame_col_as_numeric(df, granges_cols[["start"]]) :
some values in the "start" column cannot be turned into numeric values
In addition: Warning message:
Expected 3 pieces. Additional pieces discarded in 224211 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
System configuration

Running on:

Operating System: CentOS Linux 7 (Core)
Version of Python: python3.7
version: R version 3.6.0 (2019-04-26)
Signac * 0.1.5 2019-10-27 [1] Github (timoast/signac@241ad37)

Thank you in advance,
Maria

NucleosomeSignal clarification

Hi,

I am confused by NucleosomeSignal function. In the documentation it says: "Computes the ratio of fragments < 147 bp to fragments between 147 bp and 294 bp." When I check the code, I see the following:

mn_ratio <- function(x) { mononucleosome = sum(x[x > min.threshold & x < max.threshold]) nucleosome_free <- sum(x[x <= min.threshold]) return(mononucleosome / nucleosome_free) }

So it calculates ratio of fragments between 147 bp and 294 bp to fragments < 147 bp instead. Is it wrong or am I missing something?

Custom genome

Hi,

How do I go about using signac with custom genome, for which the EnsDb, and BSgenome is not yet available. Do I need to start with those two first?

Sameet

scATAC-seq integration vs merging

Hello,

I would like to understand the concepts and functions for scATAC-seq integration vs merging in signac a bit better. My data is very low-dimensional, same cell type, couple of treatments, run on same 10X Chromium run on different lanes (one lane per treatment), sequenced separately and then run through cellranger v3 separately. I would like to 1) Verify that my top peaks (peaks represented in e.g. 50% cells) overlap some existing DNase footprints (little quality control),
2) Compare top peaks between treatment and control. For this I would need to first identify peak regions that were discovered across the treatment and control samples, so let's say I want to get peaks intersect between 2 samples for simplicity here. I have followed the tutorial "scATAC-seq data integration". Integration I think of as finding peaks that intersect at least x bp, taking the coordinates of these peaks, and introducing a new peak coordinate based on the intersecting peaks (after e.g. centering).

MergeWithRegions - merge datasets in a coordinate-aware manner. Does this mean that no shifts in peak coordinates were introduced (meaning no integration)? I first tried this but when looking at the UMAP the two samples clearly separated in 2 clusters, which I do not expect from the biology. I was a bit unexpected, though (since I should have very low tech variation). When running FindTopFeatures (min.cutoff = "q75") on this "unintegrated" object, I do get some 24K peaks - can I expect these peaks to contribute to the technical variance then (since UMAP generates 2 different clusters)?

So I continued with GetIntersectingFeatures which gives me some 97K peaks out of a total of 267K across 2 samples. Peaks are with original coordinates. Here I have a uniform peak set that I could also use for my objective 1) - see how well they agree with DNase footprints. Correct?

I can now continue with subsampling some peaks and running FeatureMatrix, and add a new assay with these subsampled peaks to one of my samples (seurat objects) "# create a new assay and add it to the 10x dataset". Will that seurat object subset be used by FindIntegrationAnchors or IntegrateData, in any other way than the peaks.use subset directly that I specify for the FindIntegrationAnchors? After making the integrated object, I have there 2 samples with subsampled as well as all peaks across both samples.

integrated
An object of class Seurat
267444 features across 5039 samples within 2 assays
Active assay: integrated (10000 features)
1 other assay present: peaks

Can I now use FindTopFeatures to compare the top peaks between the two samples in the integrated object (objective 2)? If so, is this correct:
integrated <- RunTFIDF(integrated, assay = 'peaks')
integrated <- FindTopFeatures(integrated, assay = 'peaks', min.cutoff = "q75")

Since I am not sure it's comparing the peaks between my two samples (two sets of cells with different treatments) and rather across all the cells.

Alternatively, I guess I could use FeatureMatrix -> CreateGeneActivityMatrix for each sample separately using the same set of peaks and compare the peak-associated genes (quantify gene expression ‘activity’ from ATAC-seq reads)?

I hope you can understand my questions and thank you in advance,
Maria

Non Ensdb genomes

Arabidopsis does not have a ensemblDB library like mouse and human do. Do you have a grasp on how to make such a thing, it is not obvious to me.

stuck at this point:
ene.coords <- genes(EnsDb.Hsapiens.v75, filter = ~ gene_biotype == "protein_coding")

or if you have an easy bypass using a gtf

Integrating Cicero gene activity with gene expression data

Dear Dr. Stuart,

I want to integrate Cicero gene activity matrix with the normalized gene expression matrix. Normalization of the gene expression matrix is done by the function (NormalizeData()) in Seurat pipeline.

The gene activity matrix is normalized by Cicero pipeline. Can I integrate both of them (one is normalized by Seurat pipeline and the other is normalized in a different way by Cicero pipeline)? Or I should use the unnormalized gene activity matrix and normalize it by Seurat function (NormalizeData()) and then integrate them by CCA (as in Seurat)?

Thanks

About the function findMarkers

Hi,

I am asking about the function: findMarkers() that detects the differentially expressed genes in Seurat. Does it consider all the genes or the most variable 2000 genes only for selecting the differentially expressed genes?

Thanks

About integration of scRNA-seq and scATAC-seq data

Hi,

To integrate scRNA-seq data and scATAC-seq data, I followed this tutorial: https://satijalab.org/seurat/v3.1/atacseq_integration_vignette.html

However, when I read the methods section of your paper, I found that the gene expression matrix (scRNA-seq) and gene activity matrix (obtained from scATAC-seq) should have the same number of features (genes) and probably in the same order. This information was not mentioned in the tutorial I mentioned above. Does that mean that I need to repeat all the work again but with both matrices having the same number of genes? Is there in Seurat an easy function to perform this step and produce an output compatible with the Seurat objects?

Thanks

Issue of fragments reading in

while reading in fragments information with command line:

fragment.path <- "/Users/outs/fragments.tsv.bgz"

scatac12wpcsignac <- SetFragments(
object = scatac12wpcsignac,
file = fragment.path
)

got this error:

Error in SetFragments(object = scatac12wpcsignac, file = fragment.path) :
Requested file does not exist or is not indexed

Which fragment file should we use, there is a tsv.bgz and a fragments.tsv.gz.tbi from 10x Cell ranger outputs.

How to excess data plotted with CoveragePlot

Hi timoast,
Signac is a great tool. Thanks for that!
I would like to extract the normalized accessibility data for all identities as plotted with CoveragePlot. With this data, I would like to know all regions of open chromatin (above a certain threshold) in order to look for cis-acting elements. I tried looking at the functions buildup and came up with the idea to use the cutmat matrix. But I am stuck, because I dont know how to do it for all identities and multiple regions in an efficient way.

Thanks in advance!

Ps. Is a command list of the signac package avialable? Something similar to Seurat's one would be nice https://satijalab.org/seurat/essential_commands.html

10X sample alignment

I'm trying to align 2 10X datasets that should be very similar.
sci <- brain03
tenx <- brain

At this code, I get an error unless I change the code to the following:

intersecting.regions <- GetIntersectingFeatures(
object.1 = sci,
object.2 = tenx,
sep.1 = c(":", "-"), ## from c("-", "-")
sep.2 = c(":", "-")
)

The error reads as follows:

Error in .get_data_frame_col_as_numeric(df, granges_cols[["end"]]) :
some values in the "end" column cannot be turned into numeric values
In addition: Warning message:
Expected 3 pieces. Missing pieces filled with NA in 96883 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

By changing the hypen to a colon I can get past that but then I get the following error running this code:

sci_peaks_tenx <- FeatureMatrix(
fragments = GetFragments(object = tenx, assay = 'peaks'),
features = StringToGRanges(peaks.use),
cells = colnames(tenx)
)
Error in .get_data_frame_col_as_numeric(df, granges_cols[["end"]]) :
some values in the "end" column cannot be turned into numeric values
In addition: Warning message:
Expected 3 pieces. Missing pieces filled with NA in 10000 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

Help with chromvar

Hi,
This is more of documentation request than issue.
I see that chromVar function is added in latest Signac release.

Can you please provide the workflow to run chromVar through Signac?

Best regards
Sasi

Error in SetFragments(object = pbmc, file = fragment.path) : could not find function "SetFragments"

Exporting data as bigwigs

Hi,

Great package for analysing scATAC-seq. I was wondering, is there is a way to export the per cluster coverage as wiggle tracks or bigwig file ? While the CoveragePlot function does the job, this is if we need to view it in UCSC genome browser.

Thanks

About FindTransferAnchors function

Dear all,

I am asking about FindTransferAnchors function,

does it look for the nearest match between scRNA-seq cell and scATAC-seq cells? Or it looks for the nearest match between scRNA-seq clusters and scATAC-seq clusters?

From the output of the function, I believe that it looks for the nearest match between cells, right?

But I noticed that one cell in scRNA-seq may have more than one match in scATAC-seq cells, is that true?

So, to take the two most matching cells from the two datasets together, I should take the cells with the highest score? Is there a threshold for the score to determine the confidence of matching? For example, if the score is > 0.5, can I consider this a confident match?

and for the function TransferData, how does it determine the score of cell type prediction in scATAC-seq cells? I found that the scores here are totally different from the matching scores in FindTransferAnchors function.

Thank you

Doublet in snATAC-seq

Hi,
I sometimes find cellular doublets in scATAC-seq dataset like scRNA-seq data. Do you know the way to efficiently remove them like doublet-finder? If the Signac had such kind of function, it would be really useful.
Yoshi

Changing color scheme (3 colors) when visualizing ChromVAR activity

Thanks a lot for developing the package!

I am trying to visualize the motif activity from chromVAR with FeaturePlot using a red (high)-white (0)-blue (low) color scheme as in the last plot in: https://bioconductor.org/packages/release/bioc/vignettes/chromVAR/inst/doc/Introduction.html but when I specify 3 colors, the plot appears with a very different scale (removing values below 1).
How can I keep the full scale but specify 3 colors?

Many thanks,

Elena

could not install signac

When installing signac

devtools::install_github("timoast/signac")

Reported error:

Downloading GitHub repo timoast/signac@master
Skipping 13 packages not available: AnnotationFilter, BiocGenerics, GenomeInfoDb, GenomicFeatures, GenomicRanges, IRanges, Rsamtools, S4Vectors, TFBSTools, ggbio, motifmatchr, patchwork, AnnotationDbi
Installing 3 packages: ggbio, ggseqlogo, patchwork
Error: Failed to install 'Signac' from GitHub:
(converted from warning) packages 'ggbio', 'patchwork' are not available (for R version 3.5.3)

Dependency of Peak Significance on Peak width

Hi Tim,
Is there any way in Signac where the peak width can be taken as a factor while calculating the differential peaks or in the upstream normalization so that we can minimize the effect of peak width.

Best regards
Sasi

Data link discrepancy in PBMC vignette

Hey Tim – excited to begin using the package! Thanks for releasing this.

I downloaded the data directly from the vignette, but there was some issues in processing, first encountered in the QC plotting and during filtering (only < 100 samples left using your filters). Downloading the data directly from the 10X website and there was no issues.
The data links are just off a bit? Not sure what the underlying differences are.

Links from Signac
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_10k/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz.tbi
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_singlecell.csv

Links for 10X
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_filtered_peak_bc_matrix.h5
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_singlecell.csv
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_fragments.tsv.gz.tbi
http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_pbmc_10k_v1/atac_pbmc_10k_v1_fragments.tsv.gz

FindTransferAnchors causes R fatal error

Hi,

I'm trying to currently cluster the scATAC data in Signac based on scRNA clustered data from Seurat. I'm currently following the mouse brain vignette. When I get to this step of the code,

transfer.anchors <- FindTransferAnchors(reference = KO.rna, query = KO, reduction = 'cca', dims = 1:40)

R encounters a fatal error and exits.

Session Info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

RunSVD. Centering data before doing SVD?

Hi,

this might be a stupid question, but I am confused by the RunSVD implementation. Shouldn't we center the data first before irlba function? Or is it already done implicitly by TF-IDF normalization? And why do you normalize principal components after computing them (cell.embeddings to norm.embeddings)?

How to extract the normalized peak matrix?

Hi,

I'm interested in extracting the peak matrix after QC and preprocessing through Signac.
Is there any way to directly output the normalized peak matrix from:
pbmc <- RunTFIDF(pbmc)

Binalization needed for the count matrix?

Hi,
My understanding is there are only two states of a chromatin region, either open or closed (1 or 0) in most areas. The different levels of read counts might largely reflect amplification bias rather than true biology. Do you think we need binalize the count matrix before TF-IDF /SVD processing?
Thanks !
Yoshi

'x' contains missing values---FindMotifs

Hi, dears
I encountered an error while executing the FindMotifs command.

enriched.motifs <- FindMotifs(object = pbmc,features = head(rownames(da_peaks), 1000))
Selecting background regions to match input sequence characteristics
Matching GC.percent distribution
Error in density.default(x = mf.query[[i]], kernel = "gaussian", bw = 1) :
  'x' contains missing values

How to fix this issue?

Best,
Furong

Error when running RunChromVAR

Hi timoast,

Thank you for developing this nice tool. I encountered some problems when doing the motif analysis. Can you give me some ideas about why this happened? Thanks.

Jason

There is this warning when I run AddMotifObject:
"Features do not match in Assay and Motif object. Subsetting the Motif object."
I saw this error next when I run runChromVAR:

Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE): invalid character indexing
Traceback:

1. RunChromVAR(object = pbmc, genome = BSgenome.Mmusculus.UCSC.mm10, 
 .     motif.matrix = motif)
2. motif.matrix[rownames(x = peak.matrix), ]
3. `[.Motif`(motif.matrix, rownames(x = peak.matrix), )
4. subset.Motif(x = x, features = i, motifs = j, ...)
5. GetMotifData(object = x, slot = "data")[features, motifs]
6. GetMotifData(object = x, slot = "data")[features, motifs]
7. callGeneric(x, i = i, j = j, drop = TRUE)
8. eval(call, parent.frame())
9. eval(call, parent.frame())
10. x[i = i, j = j, drop = TRUE]
11. x[i = i, j = j, drop = TRUE]
12. subCsp_ij(x, i, j, drop = drop)
13. intI(i, n = d[1], dn[[1]], give.dn = FALSE)
14. stop("invalid character indexing")

Error in function CoveragePlot()

Hi,timoast
Thanks for your provide such a useful tool for analysing single cell ATAC data. Recently I used it to analyze mouse data and had problem with the function CoveragePlot():

Error in plot_to_gtable(x) : 
  Argument needs to be of class "ggplot", "gtable", "grob", "recordedplot", or a function that plots to an R graphicsdevice when called, but is a logical
In addition: Warning message:
`quo_expr()` is deprecated as of rlang 0.2.0.
Please use `quo_squash()` instead.

I also tried the data from the vignette mouse brain, but the problem still exists. I guess maybe something wrong with the version of R or package.
Here is my code for CoveragePlot():

CoveragePlot(
  object = FRC.ATAC,
  region = rownames(da_peaks)[2],
  sep = c(":", "-"),
  annotation = EnsDb.Mmusculus.v79,
  extend.upstream = 5000 ,
  extend.downstream = 5000)

Session info():
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.7 (Maipo)

Matrix products: default
BLAS:   /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome_1.52.0                   
 [3] rtracklayer_1.44.0                 Biostrings_2.52.0                 
 [5] XVector_0.24.0                     TFBSTools_1.22.0                  
 [7] JASPAR2018_1.1.1                   cowplot_0.9.4                     
 [9] reticulate_1.13                    ggplot2_3.2.1                     
[11] RColorBrewer_1.1-2                 dplyr_0.8.3                       
[13] EnsDb.Mmusculus.v79_2.99.0         ensembldb_2.8.0                   
[15] AnnotationFilter_1.8.0             GenomicFeatures_1.36.2            
[17] AnnotationDbi_1.46.0               Biobase_2.44.0                    
[19] GenomicRanges_1.36.0               GenomeInfoDb_1.20.0               
[21] IRanges_2.18.1                     S4Vectors_0.22.0                  
[23] BiocGenerics_0.30.0                Seurat_3.0.2                      
[25] Signac_0.1.5

Any suggestions will be helpful.

R session aborted while running FindTransferAnchors

Hello Tim,

I've tried to integrate two 10X scATAC data sets using Seurat/Signac, however when I run the "FindTransferAnchor" function, R session has always been automatically stopped. I tried other computers, but same problem always popped up.

transfer.anchors.WT <- FindTransferAnchors(reference = WT.RNA, query = WT, reduction = 'cca', dims = 1:50)

mouse brain tutorial issues

Looking forward to utilizing Signac!

When I was running through the mouse brain tutorial, I got:
"Error in is(file, "TabixFile") :
object 'fragment_file_filtered' not found"
after running

gene.activities <- FeatureMatrix(

fragments = fragment_file_filtered,
features = genebodyandpromoter.coords,
cells = colnames(brain),
chunk = 10
)

should it be:
gene.activities <- FeatureMatrix(

fragments = fragment.path,
features = genebodyandpromoter.coords,
cells = colnames(brain),
chunk = 10
)

Also, sorry if I missed this, but where can I download the allen_brain.rds file?
Thanks in advance!
Josh

general question...

how do we select cutoff for number of dimensions for RunUMAP?

merge two Signac Objects

Hi
I tried to merge two scATAC samples the way we can merge Seurat object using merge function in scRNA-seq. However, as the scATAC is different and we need to merge fragments files as well, can you suggest me the way to merge two Signac object?

executing runChromVar() and FindMotifs()

Great work so far, really like where this going.

I like to run chromVar and/or findMotifs but seem be stuck at creating the MotifObject and perhaps a few more. Any help in what steps I need to perform to get there?

Name conflict

Hi there,
I wanted to let you know that we've got an unfortunate naming conflict with your new package over at glotzerlab/signac. We just received an issue from one of your users, and I imagine that this problem will crop up again. Do you have any plans to roll your package into your Seurat package at any point? That might alleviate this difficulty in the long term.

latent.vars in finding DA peaks

Dear Tim,

Thanks for developing very useful tool for scATAC analysis.
In the vignette of "Human PBMCs" and "Adult mouse brain", latent vars = 'peak_region_fragments' was used in finding the differential peaks with FindMarkers function.

da_peaks <- FindMarkers( .......
test.use = 'LR',
latent.vars = 'peak_region_fragments' )

"Motif analysis with Signac" vignette, latent vars = 'nCount_peaks' was adopted in da_peaks selections.

da_peaks <- FindMarkers(.......
test.use = 'LR',
latent.vars = 'nCount_peaks')

I am wondering there are some reasons to select "nCount_peaks" in this situation?

Best,
Yoshi

Recurring problem with CCA when integrating datasets

Hi, I am having a recurring problem with CCA when I try to integrate scATAC and scRNAseq data. This occurs both with my own data and the data provided in the vignette here: https://satijalab.org/signac/articles/pbmc_vignette.html

All goes well until the "FindTransferAnchors" command, when CCA throws an error as such:

> transfer.anchors <- FindTransferAnchors(
+   reference = pbmc_rna,
+   query = pbmc,
+   reduction = 'cca'
+ )
Running CCA

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: Standardize(mat = object1, display_progress = FALSE)
 2: RunCCA.default(object1 = data1, object2 = data2, standardize = TRUE,     num.cc = num.cc, verbose = verbose, )
 3: RunCCA(object1 = data1, object2 = data2, standardize = TRUE,     num.cc = num.cc, verbose = verbose, )
 4: RunCCA.Seurat(object1 = reference, object2 = query, features = features,     num.cc = max(dims), renormalize = FALSE, rescale = FALSE,     verbose = verbose)
 5: RunCCA(object1 = reference, object2 = query, features = features,     num.cc = max(dims), renormalize = FALSE, rescale = FALSE,     verbose = verbose)
 6: FindTransferAnchors(reference = pbmc_rna, query = pbmc, reduction = "cca")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

I have also tried using pcaproject instead of CCA for reduction, but get the following error.

Error in if (is.numeric(x = feature.mean) && feature.mean != "SCT") { :
   missing value where TRUE/FALSE needed

Any suggestions would be appreciated! I am running R version 3.6.1 but have also tried in R version 3.5.2. Please let me know if any other information would be useful.

Many thanks,

Nathan

Integration of multiple scATAC-seq datasets

Hi,
Thank you for developing and providing an excellent tool for epigenetic research. I am really impressed by your developing this useful package.
I am wondering how to integrate several scATAC-seq datasets when there are batch effects (like different mice or human samples). Integration of gene activity matrices in Seurat standard workflow might be possible but I feel it is not so good idea. Could you have any idea or recommendation to decrease the batch effects in TF-IDF or SVD steps?

Duplicated gene names occured in converting chromosome to genes

Hello,
it's a powerful tool for analysing single cell ATAC data. And recently I used it to analyse mouse data by imporing Mus references. Unfortunately, when I was converting the chromosome names to gene names, there is an error info:

Warning: Non-unique features (rownames) present in the input matrix, making unique
Error: Feature names of counts matrix cannot be empty

And my code is as following:

gene.coords <- genes(EnsDb.Mmusculus.v79, filter = ~ gene_biotype == "protein_coding")
seqlevelsStyle(gene.coords) <- 'UCSC'
genebody.coords <- keepStandardChromosomes(gene.coords, pruning.mode = 'coarse')
genebodyandpromoter.coords <- Extend(x = gene.coords, upstream = 2000, downstream = 0)
# create a gene by cell matrix
gene.activities <- FeatureMatrix(fragments = fragment.path,features = genebodyandpromoter.coords,cells = colnames(ON_A1),chunk = 10)

# convert rownames from chromsomal coordinates into gene names
gene.key <- genebodyandpromoter.coords$gene_name
names(gene.key) <- GRangesToString(grange = genebodyandpromoter.coords)
rownames(gene.activities) <- gene.key[rownames(gene.activities)]
# add the gene activity matrix to the Seurat object as a new assay, and normalize it
ON_A1[['RNA']] <- CreateAssayObject(counts = gene.activities)
Warning: Non-unique features (rownames) present in the input matrix, making unique
Error: Feature names of counts matrix cannot be empty

And I guess maybe I need to filter duplicated genes firstly, so I check the duplicated genes:

rownames(gene.activities)[duplicated(rownames(gene.activities))]
   chr1-55300069-55342589  chr1-133655879-133662885   chr11-88971949-88990742
                       ""                   "Zbed6"                    "Coil"
chr11-114924893-114936386     chr14-8088078-8090554   chr15-10998817-11027939
                "Cd300lb"                   "Rpp14"                 "Slc45a2"
  chr15-58028243-58078425   chr15-81000628-81011736 chr15-100578273-100586075
          "9130401M01Rik"                   "Sgsm3"                  "Pou6f1"
  chr18-36937205-37187657   chr18-37008712-37187657   chr18-37018149-37187657
                 "PCDHA9"                  "PCDHA9"                  "PCDHA9"
  chr18-37088013-37187657   chr18-37141503-37187657   chr18-37723706-37841873
                 "PCDHA9"                  "PCDHA9"                 "PCDHGB6"
  chr18-37753731-37841873   chr18-37804364-37841873   chr18-78755215-79089015
                "PCDHGB6"                 "PCDHGB6"                  "Setbp1"
   chr2-60420013-60555130  chr2-152224839-152281852    chr3-34077454-34083222
                 "Pla2r1"                 "Csnk2a1"                 "Dnajc19"
   chr3-35895620-35934787    chr3-54691105-54716837  chr3-105868907-105924036
                "Dcun1d1"                  "Supt20"                  "Adora3"
 chr3-135283566-135304221    chr4-88643816-88646385  chr5-138259658-138266046
                   "Bdh2"                  "Ifna13"                "BC037034"
 chr6-121007241-121083609  chr6-131541784-131553705    chr7-23982065-23985048
                 "Mical3"           "5430401F13Rik"                 "Vmn1r90"
   chr7-24860278-24862412    chr7-29906223-29916813    chr7-43425670-43435991
                 "Gm9844"                        ""                    "Lim2"
   chr7-43428088-43435991    chr7-59982495-60142219  chr7-139638778-139685817
                   "Lim2"                   "Snrpn"                  "Cfap46"
   chr8-58427494-58913627  chr9-109976869-110081742  chr9-110084393-110103026
                "Galntl6"                    "Map4"                   "DHX30"
   chrX-75410444-75418588
                  "Mtcp1"

All of above are duplicated gene names. But I don't know exactly how to filter them. And Also, I find the NULL Feature names chr1-55300069-55342589 ""in this rownames, shuold I filter them dierectly or replace it by others?
And I continue the workflow after filtering the NULL Feature names. But I was thinking if the key gene was improtant and I wanted to project it or visualise the gene expression by violin plot, which gene would be used or just calculated its average ?

Sincerely,
Thanks

pct_reads_in_peaks calculation

Hi,
First of all ,congratulations on the new package for scATAC.

I was trying to calculate the following on my own dataset

brain$pct_reads_in_peaks <- brain$peak_region_fragments / brain$total * 100

And i got the following message
Error in [[.Seurat(x, i, drop = TRUE) : Cannot find 'total' in this Seurat object

Is the total calculated when CreateSeuratObject is run?

I was not able to find how it is calculated and it was missing for my dataset.

Best regards
Sasi

Tutorial suggestion

Under QC for your tutorial (for both mouse brain and PBMCs) you compute percent reads in peaks with the following

pbmc$pct_reads_in_peaks <- pbmc$peak_region_fragments / pbmc$total * 100

Based on reading about the singlecell.csv file output by cell ranger (https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/output/singlecell) I believe that this line should instead be

pbmc$pct_reads_in_peaks <- pbmc$peak_region_fragments / pbmc$passed_filters * 100

The total refers to all read pairs, which includes duplicates and other non-usable reads. The passed_filters should be more accurate as these don't include duplicate and low-quality reads. Using the total will give artificially low values for the percent of reads in peaks.

Overall, thanks for making this package! I've been impressed with Seurat for single cell RNA and am looking forward to using tools from your group for single cell ATAC as well.

NucleosomalSignal to extract reads in requested region

Hi I am merging the two object with MergeWithRegions,
and get this error:

Error in .get_data_frame_col_as_numeric(df, granges_cols[["end"]]) : 
  some values in the "end" column cannot be turned into numeric values
In addition: Warning message:
Expected 3 pieces. Missing pieces filled with `NA` in 218594 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...]. 
How could I fix this? Thank you!

Merging of 2 sample datasets

Hi Tim,

I have 2 samples of different treatment conditions merged. I would like to do differential peak analysis on this merged object. Using instructions from the FAQ (https://satijalab.org/signac/articles/faq.html), I merged fragments from my 2 samples and then utilized FeatureMatrix to generate a feature x cell matrix, with unified peaks. Weirdly the number of cells decreases after I generate the matrix. Here is what I have done so far:

library(Signac)
library(Seurat)
library(GenomeInfoDb)
library(EnsDb.Mmusculus.v79)
library(ggplot2)
set.seed(1234)

#read Sham files and create Sham object
counts_sham <- Read10X_h5("/Documents/Sham/outs/filtered_peak_bc_matrix.h5")
metadata_sham <- read.csv(
     file = "/Documents/Sham/outs/singlecell.csv",header = TRUE,
     row.names = 1
     )

spinalcord_sham <- CreateSeuratObject(
    counts = counts_sham,
	assay = 'peaks',
     project = 'sham',
     min.cells = 1,
     meta.data = metadata_sham
 )

"Warning message:
In CreateSeuratObject(counts = counts_sham, assay = "peaks", project = "sham",  :
  Some cells in meta.data not present in provided counts matrix."
 
#read SNI files and create SNI object
counts_SNI <- Read10X_h5("/Documents/SNI/outs/filtered_peak_bc_matrix.h5")
metadata_SNI <- read.csv(
     file = "/Documents/SNI/outs/singlecell.csv",header = TRUE,
     row.names = 1
     )

spinalcord_SNI <- CreateSeuratObject(
    counts = counts_SNI,
	assay = 'peaks',
     project = 'SNI',
     min.cells = 1,
     meta.data = metadata_SNI
 )

"Warning message:
In CreateSeuratObject(counts = counts_sham, assay = "peaks", project = "sham",  :
  Some cells in meta.data not present in provided counts matrix."

spinalcord_sham
"An object of class Seurat 
72986 features across 1930 samples within 1 assay 
Active assay: peaks (72986 features)"

spinalcord_SNI

"An object of class Seurat 
75997 features across 5665 samples within 1 assay 
Active assay: peaks (75997 features)"

#merge objects
merged.obj <- MergeWithRegions(
  object.1 = spinalcord_sham,
  object.2 = spinalcord_SNI,
  sep.1 = c(":", "-"),
  sep.2 = c(":", "-")
)

merged.obj
"An object of class Seurat 
66013 features across 7595 samples within 1 assay 
Active assay: peaks (66013 features)"

#set fragment path for merged fragment file (generated with bgzip and tabix)
fragment.path <- "fragments_merged.tsv.gz"
merged.obj <- SetFragments(object = merged.obj, file = fragment.path)

#extract gene coordinates from Ensembl, and ensure name formatting is consistent with Seurat object
gene.coords <- genes(EnsDb.Mmusculus.v79, filter = ~ gene_biotype == "protein_coding")
seqlevelsStyle(gene.coords) <- 'UCSC'
genebody.coords <- keepStandardChromosomes(gene.coords, pruning.mode = 'coarse')
genebodyandpromoter.coords <- Extend(x = gene.coords, upstream = 2000, downstream = 0)

#create a gene by cell matrix
gene.activities <- FeatureMatrix(
fragments = fragment.path,
features = genebodyandpromoter.coords,
cells = colnames(merged.obj),
chunk = 10
)

"Extracting reads overlapping genomic regions
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13m 20s
Constructing matrix"

#qc
merged.obj <- NucleosomeSignal(object = merged.obj)
merged.obj$pct_reads_in_peaks <- merged.obj$peak_region_fragments / merged.obj$passed_filters * 100

"Error in names(data.return) <- rep.int(x = colnames(x = x), times = length(x = i)) : 
  'names' attribute [7595] must be the same length as the vector [7583]"

#I checked the dimensions for gene.activities and the number of cells is less than the merged object
dim(gene.activities)
[1] 21532  7583

It looks like I'm losing cells after the gene coordinate extraction step, decreasing from 7595 to 7583. Please advise.

Thanks,
Lipin

PWM vs PFM in CreateMotifMatrix

The function CreateMotifMatrix has an option "pwm" that expects:

"A PFMatrixList object containing position weight matrices to use".

Since "pwm" means position weight matrix and it is different than "pfm" - which is position frequency matrix -, what does the function exactly expects? A pfm or a pwm?

Also in the vignette, you wrote that you are getting the pwm's but you are actually getting pfm's.

I am confused.

Function PeriodPlot error

Hi there,
I am trying the function PeriodPlot but get the following error:

pbmc$nucleosome_group <- ifelse(pbmc$nucleosome_signal > 10, 'NS > 10', 'NS < 10')
PeriodPlot(object = pbmc, group.by = 'nucleosome_group')
Error: At least one layer must contain all faceting variables: group.

Plot is missing group
Layer 1 is missing group

what should I do to solve this problem?

Thanks in advance

Tn5 integration enrichment at a specific motif

Hi Tim,
The enrichment of Tn5 integration events at a specific motif, like TSS enrichment visualization in the vignette, would be very visually effective to show the motif activity in each cell-type (like Nat Biotechnol. 2019 Aug;37(8):925-936, Fig.2F). I guess it can be done using or modifying TSSenrichment / TSSPlot function...How do you think we can do this? I would be grateful if you could add it to the function or vignette when you have time.
Thanks!
Yoshi

Overlapping enriched motifs on positively/negativelydifferential peaks

Hi Tim,

Thanks for developing very useful tool for scATAC analysis.
In the vignette of "Motif analysis", enriched motifs are evaluated on the differential peaks of Pvalb compared with Sst.

da_peaks <- FindMarkers(
object = mouse_brain,
ident.1 = 'Pvalb',
ident.2 = 'Sst',
only.pos = TRUE,
test.use = 'LR',
latent.vars = 'nCount_peaks')

enriched.motifs <- FindMotifs(
object = mouse_brain,
features = rownames(da_peaks[da_peaks$p_val < 0.01, ]))

When I tried similarly motif analysis on these of Sst (that is, closed motifs in Pvalb compared to Sst) I saw many overlapping motifs on the both differentially active motif lists for Pvalb and Sst (MEF2s, SP1/2, PLAG2 etc...).

da_peaks <- FindMarkers(
object = mouse_brain,
ident.1 = 'Sst',
ident.2 = 'Pvalb',
only.pos = TRUE,
test.use = 'LR',
latent.vars = 'nCount_peaks')

enriched.motifs <- FindMotifs(
object = mouse_brain,
features = rownames(da_peaks[da_peaks$p_val < 0.01, ]))

These results makes identifying differentially-active motifs between cell types difficult - Do you have any suggestions for dealing with this problem? Using chromvar motif analysis might be simplest solution.

Thanks,
Yoshi

stuart-lab / signac Goto Github PK

signac's People

Contributors

Stargazers

Watchers

Forkers

signac's Issues

Recommend Projects

Recommend Topics

Recommend Org