Trying to reproduce some figures
Dr. Yang (@PYangLab),
I was presenting on the CiteFuse paper for a journal club, and had a question about some of the figures. In particular, the UMAP embeddings from the joint graph.
In the paper, there is a comparison of the effects of doublet detection on the UMAP embeddings and clustering of the dataset. What caught my attention was how different the UMAP plots generated after CiteFuse doublet detection looked compared with those from every other method. One of the most striking differences was that a group of CD4+ cells was completely separate in the CiteFuse plots, but not for any other method (other than no filtering). Additionally, the plots for all the other methods look roughly equivalent. This appears in Supp. Fig. 4:
In my experience, removing a few more cells shouldn't have such a strong effect on the embedding. So, I tried to reproduce these figures using the code in this repo. I couldn't find the exact code that was used to make the joint graphs for each filtered dataset, so I used code from your vignette. I've included what I ran in the collapsed section below. Due to processing time, I only generated fused graphs for 3 of the datasets.
Code to reproduce
# Setup
load("data/doublet_labels.RData")
load("data/Unfiltered.RData")
library(CiteFuse)
library(scater)
library(SingleCellExperiment)
library(DT)
library(gridExtra)
library(purrr)
# Make SingleCellExperiment object
datasets = list(
  RNA = as.matrix(rna_mat_control),
  ADT = as.matrix(adt_control),
  HTO = as.matrix(hto_control)
)
sce = preprocessing(datasets)
# DoubletFinder has its values stored as strings, so I convert all labels to logical
doublet.labels = lapply(doublet.labels, as.logical)
# Make subsets, only used some since the fusion takes a while
sce_subsets = lapply(
  doublet.labels[c("Unfiltered", "CiteFuse", "DoubletFinder")],
  function(labels) sce[, !labels]
)
# Fuse graphs and compute UMAP embeddings
compute_umap = function(sce) {
  sce <- scater::logNormCounts(sce)
  sce <- normaliseExprs(sce, altExp_name = "ADT", transform = "log")
  print("Starting to compute citefuse...")
  print(system.time(sce <- CiteFuse(sce)))
  print("starting to compute umap")
  sce = reducedDimSNF(sce, method = "UMAP", dimNames = "UMAP")
  sce
}
sce_processed = lapply(sce_subsets, compute_umap)
Output from that process
[1] "Starting to compute citefuse..."
Calculating affinity matrix
Performing SNF
user system elapsed
5605.562 22.790 5638.442
[1] "starting to compute umap"
[1] "Starting to compute citefuse..."
Calculating affinity matrix
Performing SNF
user system elapsed
3762.252 16.996 3787.311
[1] "starting to compute umap"
[1] "Starting to compute citefuse..."
Calculating affinity matrix
Performing SNF
user system elapsed
3658.019 17.448 3683.157
[1] "starting to compute umap"
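Because the question hinges on how many cells each method removes, a quick count per label set can put the three subsets in context before comparing embeddings. The following is only a minimal sketch on toy data: `toy_labels` is a hypothetical stand-in for `doublet.labels`, and the flag rates are made up rather than taken from the real dataset.

```r
# Toy stand-in for doublet.labels: TRUE marks a cell flagged as a doublet.
# The 10% and 8% rates below are arbitrary assumptions for illustration.
set.seed(1)
n_cells <- 1000
toy_labels <- list(
  Unfiltered    = rep(FALSE, n_cells),
  CiteFuse      = runif(n_cells) < 0.10,
  DoubletFinder = runif(n_cells) < 0.08
)

# Number of cells removed and retained by each method
removed  <- vapply(toy_labels, sum, integer(1))
retained <- n_cells - removed
print(rbind(removed, retained))

# Cells flagged by one method but not the other
only_citefuse      <- sum(toy_labels$CiteFuse & !toy_labels$DoubletFinder)
only_doubletfinder <- sum(toy_labels$DoubletFinder & !toy_labels$CiteFuse)
print(c(only_citefuse = only_citefuse, only_doubletfinder = only_doubletfinder))
```

With the real labels, the same counts would show whether the CiteFuse and DoubletFinder subsets differ by only a few cells or by a sizeable fraction of the dataset.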
Now generating plots:
plots = imap(
  sce_processed,
  function(sce, k) {
    # Colour each embedding by CD4 ADT expression and title it with the subset name
    visualiseDim(
      sce,
      dimNames = "UMAP",
      colour_by = "CD4",
      data_from = "altExp",
      altExp_assay_name = "logcounts"
    ) +
      labs(title = k)
  }
)
grid.arrange(plots$Unfiltered, plots$DoubletFinder, plots$CiteFuse, ncol=3)
Here are the plots I get:
Since it's been a while since I've used R heavily, I couldn't figure out how to deal with the overplotting, so some points are hidden here. However, I think I can say these plots show different results from what's in the paper. They show pretty similar plots being generated for CiteFuse and DoubletFinder, with no clear difference in the separation of CD4+ cells. I'd also note that the embedding which changed the most is the CiteFuse one, which is important since it features heavily in the rest of the paper.
I would like to see your comments on this. How did our plots turn out so different? Was there a significant part of the process I changed? In addition, do you think this would have implications for any of the other analyses in the paper, e.g. the clustering analysis?
In case these help with diagnosis on your end:
Here's a link to the SingleCellExperiment objects as an .rds file. And here's some info about the environment I used:
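On the overplotting: one common way to reduce it in ggplot2 is to sort the cells so that high-expression points are drawn last, and to use small, semi-transparent points. The sketch below uses synthetic data only. The `toy` data frame and its column names are hypothetical, and it assumes the UMAP coordinates and CD4 values can be pulled out of the object into a plain data frame first.

```r
library(ggplot2)

# Synthetic stand-in for a UMAP embedding plus one CD4 value per cell.
set.seed(42)
n_cells <- 5000
toy <- data.frame(
  UMAP1 = rnorm(n_cells),
  UMAP2 = rnorm(n_cells),
  CD4   = rexp(n_cells)
)

# ggplot2 draws points in row order, so sorting by CD4 puts
# high-expression cells on top of low-expression ones.
toy_sorted <- toy[order(toy$CD4), ]

p <- ggplot(toy_sorted, aes(x = UMAP1, y = UMAP2, colour = CD4)) +
  geom_point(size = 0.3, alpha = 0.6) +  # small, semi-transparent points
  scale_colour_viridis_c() +
  labs(title = "Toy embedding")
```

Sorting by expression favours the high-CD4 cells visually, so it makes those cells easier to see but can hide low-expression cells underneath them; shuffling the rows instead is an alternative when neither group should be favoured.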
sessionInfo
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.4.0 purrr_0.3.4 gridExtra_2.3 DT_0.13
[5] scater_1.16.1 ggplot2_3.3.0 SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1
[9] DelayedArray_0.14.0 matrixStats_0.56.0 Biobase_2.48.0 GenomicRanges_1.40.0
[13] GenomeInfoDb_1.24.0 IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0
[17] CiteFuse_1.0.0
loaded via a namespace (and not attached):
[1] ggbeeswarm_0.6.0 Rtsne_0.15 colorspace_1.4-1 ggsignif_0.6.0 ellipsis_0.3.1
[6] rio_0.5.16 ggridges_0.5.2 XVector_0.28.0 BiocNeighbors_1.6.0 rstudioapi_0.11
[11] ggpubr_0.3.0 farver_2.0.3 graphlayouts_0.7.0 ggrepel_0.8.2 RSpectra_0.16-0
[16] splines_4.0.0 knitr_1.28 heatmap.plus_1.3 polyclip_1.10-0 alluvial_0.1-2
[21] broom_0.5.6 kernlab_0.9-29 pheatmap_1.0.12 uwot_0.1.8 ggforce_0.3.1
[26] ExPosition_2.8.23 compiler_4.0.0 dqrng_0.2.1 prettyGraphs_2.1.6 backports_1.1.7
[31] assertthat_0.2.1 Matrix_1.2-18 limma_3.44.1 tweenr_1.0.1 BiocSingular_1.4.0
[36] htmltools_0.4.0 tools_4.0.0 rsvd_1.0.3 igraph_1.2.5 gtable_0.3.0
[41] glue_1.4.1 GenomeInfoDbData_1.2.3 reshape2_1.4.4 dplyr_0.8.5 Rcpp_1.0.4.6
[46] carData_3.0-4 cellranger_1.1.0 vctrs_0.3.0 nlme_3.1-147 DelayedMatrixStats_1.10.0
[51] ggraph_2.0.3 xfun_0.14 openxlsx_4.1.5 lifecycle_0.2.0 irlba_2.3.3
[56] statmod_1.4.34 rstatix_0.5.0 edgeR_3.30.0 zlibbioc_1.34.0 MASS_7.3-51.6
[61] scales_1.1.1 tidygraph_1.2.0 hms_0.5.3 rhdf5_2.32.0 RColorBrewer_1.1-2
[66] SNFtool_2.3.0 yaml_2.2.1 curl_4.3 segmented_1.1-0 stringi_1.4.6
[71] randomForest_4.6-14 scran_1.16.0 zip_2.0.4 BiocParallel_1.22.0 rlang_0.4.6
[76] pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-41 Rhdf5lib_1.10.0 labeling_0.3
[81] htmlwidgets_1.5.1 cowplot_1.0.0 tidyselect_1.1.0 plyr_1.8.6 magrittr_1.5
[86] R6_2.4.1 generics_0.0.2 withr_2.2.0 pillar_1.4.4 haven_2.3.0
[91] foreign_0.8-80 mixtools_1.2.0 survival_3.1-12 abind_1.4-5 RCurl_1.98-1.2
[96] tibble_3.0.1 crayon_1.3.4 car_3.0-8 viridis_0.5.1 locfit_1.5-9.4
[101] grid_4.0.0 readxl_1.3.1 data.table_1.12.8 propr_4.2.6 forcats_0.5.0
[106] digest_0.6.25 tidyr_1.1.0 dbscan_1.1-5 munsell_0.5.0 beeswarm_0.2.3
[111] viridisLite_0.3.0 vipor_0.4.5
Please let me know if you'd like any more information from my end.
Best,
–Isaac