Git Product home page Git Product logo

citefuse2020kim's People

Contributors

hanijieunkim avatar nick-robo avatar yingxinlin avatar

Stargazers

 avatar

Watchers

 avatar

citefuse2020kim's Issues

Trying to reproduce some figures

Dr. Yang (@PYangLab),

I was presenting on the CiteFuse paper for a journal club, and had a question about some of the figures. In particular, the UMAP embeddings from the joint graph.

In the paper, there is a comparison of the effects of doublet detection on the UMAP embeddings and clustering of the dataset. Something that caught my attention here was how different the UMAP plots generated following CiteFuse doublet detection looked compared to those from any other method. One of the most striking differences here was how a group of CD4+ cells was completley seperate in the CiteFuse plots, but not for any other method (other than no-filtering). Additionly, all the other methods' plots look pretty equivalent. This appears in Supp. fig 4:

In my experience, removing a few more cells shouldn't have this strong of an effect on the embedding. So, I tried to reproduce these figures using the code in this repo. I couldn't find the exact code that was used to make the joint graphs for each filtered dataset, so I used code from your vignette. I've included what I ran in the collapsed section below. Due to processing time, I only ran generated fused graphs for 3 of the datasets.

Code to reproduce
# Setup
load("data/doublet_labels.RData")
load("data/Unfiltered.RData")

library(CiteFuse)
library(scater)
library(SingleCellExperiment)
library(DT)
library(gridExtra)
library(purrr)


# Make SingleCellExperiment object
datasets = list(
    RNA=as.matrix(rna_mat_control),
    ADT=as.matrix(adt_control),
    HTO=as.matrix(hto_control)
)
sce = preprocessing(datasets)

# DoubletFinder has it's values stored as strings, so I convert it to bool
doublet.labels = lapply(doublet.labels, as.logical)
# Make subsets, only used some since the fusion takes a while
sce_subsets = lapply(
    doublet.labels[c("Unfiltered", "CiteFuse", "DoubletFinder")],
    function (labels) {sce[, !labels]}
)
# Fuse graphs and compute UMAP embeddings
compute_umap = function(sce) {
    sce <- scater::logNormCounts(sce)
    sce <- normaliseExprs(sce, altExp_name = "ADT", transform = "log")
    print("Starting to compute citefuse...")
    print(system.time(sce <- CiteFuse(sce)))
    print("starting to compute umap")
    sce = reducedDimSNF(sce, method = "UMAP", dimNames = "UMAP")
    sce
}

sce_processed = lapply(sce_subsets, compute_umap)
Output from that process
[1] "Starting to compute citefuse..."
Calculating affinity matrix 
Performing SNF  
    user   system  elapsed 
5605.562   22.790 5638.442 
[1] "starting to compute umap"
[1] "Starting to compute citefuse..."
Calculating affinity matrix 
Performing SNF  
    user   system  elapsed 
3762.252   16.996 3787.311 
[1] "starting to compute umap"
[1] "Starting to compute citefuse..."
Calculating affinity matrix 
Performing SNF  
    user   system  elapsed 
3658.019   17.448 3683.157 
[1] "starting to compute umap"

Now generating plots:

plots = imap(
    sce_processed,
    function (sce, k) {
        p = visualiseDim(
            sce,
            dimNames = "UMAP",
            colour_by = "CD4",
            data_from = "altExp",
            altExp_assay_name = "logcounts",
        ) +
        labs(title=k)
    }
)

grid.arrange(plots$Unfiltered, plots$DoubletFinder, plots$CiteFuse, ncol=3)

Here are the plots I get:

Since it's been a while since I've used R heavily, I couldn't figure out how to deal with the overplotting, so some points are hidden here. Hovever, I think I can say these plots show different results to what's in the paper. These show a pretty similar plots being generated by CiteFuse and DoubletFinder, with no clear differences in the seperation of CD4+ cells. I'd also note that the embedding which changed the most is the CiteFuse one, which is important since it features heavily in the rest of the paper.

I would like to see your comments on this. How did our plots turn out so different? Was there a signifigant part of the process I changed? In addition, do you think this would have implications for any of the other analyses in the paper - e.g. the clustering analysis?

In case these help with diagnoses on your end:

Here's a link to the SingleCellExperiment objects as an .rds file. And here's some info about the environment I used:

sessionInfo
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] stringr_1.4.0               purrr_0.3.4                 gridExtra_2.3               DT_0.13                    
 [5] scater_1.16.1               ggplot2_3.3.0               SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1
 [9] DelayedArray_0.14.0         matrixStats_0.56.0          Biobase_2.48.0              GenomicRanges_1.40.0       
[13] GenomeInfoDb_1.24.0         IRanges_2.22.2              S4Vectors_0.26.1            BiocGenerics_0.34.0        
[17] CiteFuse_1.0.0             

loaded via a namespace (and not attached):
  [1] ggbeeswarm_0.6.0          Rtsne_0.15                colorspace_1.4-1          ggsignif_0.6.0            ellipsis_0.3.1           
  [6] rio_0.5.16                ggridges_0.5.2            XVector_0.28.0            BiocNeighbors_1.6.0       rstudioapi_0.11          
 [11] ggpubr_0.3.0              farver_2.0.3              graphlayouts_0.7.0        ggrepel_0.8.2             RSpectra_0.16-0          
 [16] splines_4.0.0             knitr_1.28                heatmap.plus_1.3          polyclip_1.10-0           alluvial_0.1-2           
 [21] broom_0.5.6               kernlab_0.9-29            pheatmap_1.0.12           uwot_0.1.8                ggforce_0.3.1            
 [26] ExPosition_2.8.23         compiler_4.0.0            dqrng_0.2.1               prettyGraphs_2.1.6        backports_1.1.7          
 [31] assertthat_0.2.1          Matrix_1.2-18             limma_3.44.1              tweenr_1.0.1              BiocSingular_1.4.0       
 [36] htmltools_0.4.0           tools_4.0.0               rsvd_1.0.3                igraph_1.2.5              gtable_0.3.0             
 [41] glue_1.4.1                GenomeInfoDbData_1.2.3    reshape2_1.4.4            dplyr_0.8.5               Rcpp_1.0.4.6             
 [46] carData_3.0-4             cellranger_1.1.0          vctrs_0.3.0               nlme_3.1-147              DelayedMatrixStats_1.10.0
 [51] ggraph_2.0.3              xfun_0.14                 openxlsx_4.1.5            lifecycle_0.2.0           irlba_2.3.3              
 [56] statmod_1.4.34            rstatix_0.5.0             edgeR_3.30.0              zlibbioc_1.34.0           MASS_7.3-51.6            
 [61] scales_1.1.1              tidygraph_1.2.0           hms_0.5.3                 rhdf5_2.32.0              RColorBrewer_1.1-2       
 [66] SNFtool_2.3.0             yaml_2.2.1                curl_4.3                  segmented_1.1-0           stringi_1.4.6            
 [71] randomForest_4.6-14       scran_1.16.0              zip_2.0.4                 BiocParallel_1.22.0       rlang_0.4.6              
 [76] pkgconfig_2.0.3           bitops_1.0-6              lattice_0.20-41           Rhdf5lib_1.10.0           labeling_0.3             
 [81] htmlwidgets_1.5.1         cowplot_1.0.0             tidyselect_1.1.0          plyr_1.8.6                magrittr_1.5             
 [86] R6_2.4.1                  generics_0.0.2            withr_2.2.0               pillar_1.4.4              haven_2.3.0              
 [91] foreign_0.8-80            mixtools_1.2.0            survival_3.1-12           abind_1.4-5               RCurl_1.98-1.2           
 [96] tibble_3.0.1              crayon_1.3.4              car_3.0-8                 viridis_0.5.1             locfit_1.5-9.4           
[101] grid_4.0.0                readxl_1.3.1              data.table_1.12.8         propr_4.2.6               forcats_0.5.0            
[106] digest_0.6.25             tidyr_1.1.0               dbscan_1.1-5              munsell_0.5.0             beeswarm_0.2.3           
[111] viridisLite_0.3.0         vipor_0.4.5      

Please let me know if you'd like any more information from my end.

Best,

–Isaac

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.