The citefuse2020kim from sydneybiox

Trying to reproduce some figures

I was presenting the CiteFuse paper at a journal club and had a question about some of the figures, in particular the UMAP embeddings from the joint graph.

In the paper, there is a comparison of the effects of doublet detection on the UMAP embeddings and clustering of the dataset. What caught my attention was how different the UMAP plots generated following CiteFuse doublet detection looked compared to those from any other method. One of the most striking differences was that a group of CD4+ cells was completely separate in the CiteFuse plots, but not for any other method (other than no filtering). Additionally, all the other methods' plots look pretty much equivalent. This appears in Supp. Fig. 4:

In my experience, removing a few more cells shouldn't have this strong an effect on the embedding, so I tried to reproduce these figures using the code in this repo. I couldn't find the exact code that was used to make the joint graphs for each filtered dataset, so I used code from your vignette. I've included what I ran in the collapsed section below. Due to processing time, I only generated fused graphs for 3 of the datasets.

Code to reproduce

# Setup
load("data/doublet_labels.RData")
load("data/Unfiltered.RData")

library(CiteFuse)
library(scater)
library(SingleCellExperiment)
library(DT)
library(gridExtra)
library(purrr)


# Make SingleCellExperiment object
datasets = list(
    RNA=as.matrix(rna_mat_control),
    ADT=as.matrix(adt_control),
    HTO=as.matrix(hto_control)
)
sce = preprocessing(datasets)

# DoubletFinder has its values stored as strings, so I convert them to logical
doublet.labels = lapply(doublet.labels, as.logical)
# Make subsets, only used some since the fusion takes a while
sce_subsets = lapply(
    doublet.labels[c("Unfiltered", "CiteFuse", "DoubletFinder")],
    function (labels) {sce[, !labels]}
)

# Fuse graphs and compute UMAP embeddings
compute_umap = function(sce) {
    sce <- scater::logNormCounts(sce)
    sce <- normaliseExprs(sce, altExp_name = "ADT", transform = "log")
    print("Starting to compute citefuse...")
    print(system.time(sce <- CiteFuse(sce)))
    print("starting to compute umap")
    sce = reducedDimSNF(sce, method = "UMAP", dimNames = "UMAP")
    sce
}

sce_processed = lapply(sce_subsets, compute_umap)

Output from that process

[1] "Starting to compute citefuse..."
Calculating affinity matrix 
Performing SNF  
    user   system  elapsed 
5605.562   22.790 5638.442 
[1] "starting to compute umap"
[1] "Starting to compute citefuse..."
Calculating affinity matrix 
Performing SNF  
    user   system  elapsed 
3762.252   16.996 3787.311 
[1] "starting to compute umap"
[1] "Starting to compute citefuse..."
Calculating affinity matrix 
Performing SNF  
    user   system  elapsed 
3658.019   17.448 3683.157 
[1] "starting to compute umap"

Now generating plots:

plots = imap(
    sce_processed,
    function (sce, k) {
        # Colour each UMAP by CD4 ADT expression and title it with the dataset name
        visualiseDim(
            sce,
            dimNames = "UMAP",
            colour_by = "CD4",
            data_from = "altExp",
            altExp_assay_name = "logcounts"
        ) +
        labs(title = k)
    }
)

grid.arrange(plots$Unfiltered, plots$DoubletFinder, plots$CiteFuse, ncol=3)

Here are the plots I get:

It's been a while since I've used R heavily, and I couldn't figure out how to deal with the overplotting, so some points are hidden here. However, I think I can say these plots show different results from what's in the paper. CiteFuse and DoubletFinder produce pretty similar plots, with no clear difference in the separation of CD4+ cells. I'd also note that the embedding which changed the most is the CiteFuse one, which is important since it features heavily in the rest of the paper.
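For what it's worth, one common way to reduce overplotting in this kind of plot is to draw small, semi-transparent points in order of increasing expression, so that high-CD4 cells end up on top. The following is only a minimal sketch under stated assumptions: `umap_df` is a hypothetical data frame of simulated values standing in for the real UMAP coordinates and CD4 ADT log-counts, and it uses plain ggplot2 rather than `visualiseDim`.

```r
library(ggplot2)

# Hypothetical stand-in data: in the real analysis the coordinates would
# come from the UMAP reduced dimension and CD4 from the ADT log-counts.
set.seed(1)
n <- 2000
umap_df <- data.frame(
    UMAP1 = rnorm(n),
    UMAP2 = rnorm(n),
    CD4 = rexp(n)
)

# Sort rows so that cells with the highest CD4 values are drawn last (on top)
umap_df <- umap_df[order(umap_df$CD4), ]

# Small, semi-transparent points keep dense regions readable
p <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2, colour = CD4)) +
    geom_point(size = 0.5, alpha = 0.5) +
    scale_colour_viridis_c() +
    labs(title = "Example")
```

Printing `p` draws the plot. The point size and alpha values here are arbitrary and would need tuning for the real dataset.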

I would like to hear your comments on this. How did our plots turn out so different? Did I change a significant part of the process? In addition, do you think this would have implications for any of the other analyses in the paper, for example the clustering analysis?

In case these help with diagnosis on your end:

Here's a link to the SingleCellExperiment objects as an .rds file. And here's some info about the environment I used:

sessionInfo

R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] stringr_1.4.0               purrr_0.3.4                 gridExtra_2.3               DT_0.13                    
 [5] scater_1.16.1               ggplot2_3.3.0               SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1
 [9] DelayedArray_0.14.0         matrixStats_0.56.0          Biobase_2.48.0              GenomicRanges_1.40.0       
[13] GenomeInfoDb_1.24.0         IRanges_2.22.2              S4Vectors_0.26.1            BiocGenerics_0.34.0        
[17] CiteFuse_1.0.0             

loaded via a namespace (and not attached):
  [1] ggbeeswarm_0.6.0          Rtsne_0.15                colorspace_1.4-1          ggsignif_0.6.0            ellipsis_0.3.1           
  [6] rio_0.5.16                ggridges_0.5.2            XVector_0.28.0            BiocNeighbors_1.6.0       rstudioapi_0.11          
 [11] ggpubr_0.3.0              farver_2.0.3              graphlayouts_0.7.0        ggrepel_0.8.2             RSpectra_0.16-0          
 [16] splines_4.0.0             knitr_1.28                heatmap.plus_1.3          polyclip_1.10-0           alluvial_0.1-2           
 [21] broom_0.5.6               kernlab_0.9-29            pheatmap_1.0.12           uwot_0.1.8                ggforce_0.3.1            
 [26] ExPosition_2.8.23         compiler_4.0.0            dqrng_0.2.1               prettyGraphs_2.1.6        backports_1.1.7          
 [31] assertthat_0.2.1          Matrix_1.2-18             limma_3.44.1              tweenr_1.0.1              BiocSingular_1.4.0       
 [36] htmltools_0.4.0           tools_4.0.0               rsvd_1.0.3                igraph_1.2.5              gtable_0.3.0             
 [41] glue_1.4.1                GenomeInfoDbData_1.2.3    reshape2_1.4.4            dplyr_0.8.5               Rcpp_1.0.4.6             
 [46] carData_3.0-4             cellranger_1.1.0          vctrs_0.3.0               nlme_3.1-147              DelayedMatrixStats_1.10.0
 [51] ggraph_2.0.3              xfun_0.14                 openxlsx_4.1.5            lifecycle_0.2.0           irlba_2.3.3              
 [56] statmod_1.4.34            rstatix_0.5.0             edgeR_3.30.0              zlibbioc_1.34.0           MASS_7.3-51.6            
 [61] scales_1.1.1              tidygraph_1.2.0           hms_0.5.3                 rhdf5_2.32.0              RColorBrewer_1.1-2       
 [66] SNFtool_2.3.0             yaml_2.2.1                curl_4.3                  segmented_1.1-0           stringi_1.4.6            
 [71] randomForest_4.6-14       scran_1.16.0              zip_2.0.4                 BiocParallel_1.22.0       rlang_0.4.6              
 [76] pkgconfig_2.0.3           bitops_1.0-6              lattice_0.20-41           Rhdf5lib_1.10.0           labeling_0.3             
 [81] htmlwidgets_1.5.1         cowplot_1.0.0             tidyselect_1.1.0          plyr_1.8.6                magrittr_1.5             
 [86] R6_2.4.1                  generics_0.0.2            withr_2.2.0               pillar_1.4.4              haven_2.3.0              
 [91] foreign_0.8-80            mixtools_1.2.0            survival_3.1-12           abind_1.4-5               RCurl_1.98-1.2           
 [96] tibble_3.0.1              crayon_1.3.4              car_3.0-8                 viridis_0.5.1             locfit_1.5-9.4           
[101] grid_4.0.0                readxl_1.3.1              data.table_1.12.8         propr_4.2.6               forcats_0.5.0            
[106] digest_0.6.25             tidyr_1.1.0               dbscan_1.1-5              munsell_0.5.0             beeswarm_0.2.3           
[111] viridisLite_0.3.0         vipor_0.4.5

Please let me know if you'd like any more information from my end.

Best,

–Isaac
