kstreet13 commented on July 21, 2024

Hi @janinemelsen

I just want to start by saying that I don't think there is any "correct" answer for these sorts of questions, but I'll try to help out!

I think some of the weirder results you showed may be due to a quirk of Gaussian mixture modeling, particularly in the "4 diffusion components, clusters calculated by mclust (=9)" and "4 diffusion components, clusters calculated by mclust (=10)" plots. Both of these seem to include one very large, highly dispersed cluster (purple and orange, respectively) that tends to distort the minimum spanning tree, because many other clusters end up connected to it. I haven't played around with mclust too much, but you might be able to avoid these sorts of clusters by setting the modelNames argument (maybe to something like "EVV", for "ellipsoidal, equal volume"). Alternatively, you could try other clustering methods, though I have generally had good results with mclust.

I'm not very familiar with the details of how diffusion maps work, but I have used them occasionally, via the destiny package. My understanding (largely informed by the destiny documentation) is that they are based on an eigendecomposition, where the diffusion components are the eigenvectors and the corresponding eigenvalues behave similarly to the variances in PCA (i.e., strictly positive and decreasing). So my guess is that selecting a particular number of diffusion components is similar to selecting a number of PCs, for which there are a lot of existing methods (qualitatively, I think looking for the "elbow" in the plot of eigenvalues would be a good starting point).
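
To make the "elbow" idea concrete, here is a minimal sketch in Python (standard library only, chosen just for illustration; it does not use destiny). It picks the eigenvalue that lies farthest from the straight line joining the first and last eigenvalues, which is one common heuristic among many and not something the destiny documentation prescribes. The function name `elbow_index` and the eigenvalues are made up for the example.

```python
import math

def elbow_index(eigenvalues):
    """Return the 0-based index of the point farthest from the chord
    joining the first and last eigenvalue (a simple 'elbow' heuristic)."""
    n = len(eigenvalues)
    if n < 3:
        raise ValueError("need at least three eigenvalues")
    x0, y0 = 0.0, float(eigenvalues[0])
    x1, y1 = float(n - 1), float(eigenvalues[-1])
    chord = math.hypot(x1 - x0, y1 - y0)
    best_i, best_d = 0, -1.0
    for i, y in enumerate(eigenvalues):
        # Perpendicular distance from the point (i, y) to the chord.
        d = abs((y1 - y0) * (i - x0) - (x1 - x0) * (y - y0)) / chord
        if d > best_d:
            best_i, best_d = i, d
    return best_i

# Hypothetical eigenvalues (positive and decreasing), not taken from real data.
eigenvalues = [0.95, 0.80, 0.50, 0.20, 0.15, 0.12, 0.10, 0.09]
n_components = elbow_index(eigenvalues) + 1  # number of components to keep
print(n_components)
```

If the eigenvalues decay almost linearly, the farthest point is barely off the chord and the choice is not meaningful, so it is still worth looking at the plot itself.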

Finally, I should mention that you don't strictly need to perform clustering before running Slingshot if you believe that the data contain only one lineage (with no branching). I bring this up because, at least in two dimensions, that seems reasonable for your data. In this case, Slingshot will just fit a single principal curve, and you may be better off using the princurve package directly.

Hope this helps!

from slingshot.

janinemelsen commented on July 21, 2024

Hi,

Thank you for the quick response! I adjusted the mclust model, and the clusters look much better; however, they are not reproducible. Each time I run mclust on the same number of diffusion components, the clusters are different. For instance (model is EVV, number of components is 5, number of clusters is 8):

image

image

My guess is that this could be explained by the cells in the center of the plot (which seem to be outliers).

Unfortunately, the elbow plot is not very informative, since there is no elbow. According to the destiny paper (figure 1B), this can be explained by a large intrinsic dimensionality.
image

Best,
Janine

kstreet13 commented on July 21, 2024

It's interesting that you don't get the same clusters every time. My best guess is that it's caused by some sort of random initialization. You could probably make a particular set of results reproducible by setting the random seed, but that wouldn't actually make the algorithm any more stable. If you want to try out other methods, I know clusterExperiment::RSEC is specifically designed for stability (and leaves some cells unclustered, which may prevent those points in the middle from causing issues). There's also the graph-based Louvain clustering, which is fairly popular (available via scran::buildSNNGraph + igraph::cluster_louvain or Seurat::FindNeighbors + Seurat::FindClusters).
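
To illustrate the difference between "reproducible" and "stable", here is a small Python sketch (standard library only). It uses a toy one-dimensional k-means as a stand-in for any clustering method with a random initialization; it is not how mclust works, and the data, `kmeans_1d`, and `partition` are all made up for the example. Fixing the seed repeats one particular run exactly, while counting the distinct partitions across many seeds gives a rough sense of how much the result depends on the initialization.

```python
import random

def kmeans_1d(points, k, seed, n_iter=20):
    """Toy 1-D k-means; the seed controls only the random initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        # Move each center to the mean of its members (if it has any).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def partition(labels):
    """Describe a clustering as a set of index sets, ignoring label names."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

# Hypothetical data, not taken from the plots in this thread.
points = [0.0, 0.2, 0.4, 2.4, 2.5, 2.6, 4.6, 4.8, 5.0, 5.1]

# Same seed twice: the run is repeated exactly.
same_seed = partition(kmeans_1d(points, 3, seed=1)) == partition(kmeans_1d(points, 3, seed=1))

# Many seeds: more than one distinct partition means the result depends on the start.
distinct = {partition(kmeans_1d(points, 3, seed=s)) for s in range(20)}
print(same_seed, len(distinct))
```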

And yeah, I agree that that plot seems to indicate a high intrinsic dimensionality. Fortunately, most if not all of the methods I've mentioned can work in 10 dimensions without issue. I think it may be best to do the analysis on the full data and only use dimensionality reduction for visualization purposes.

janinemelsen commented on July 21, 2024

I was not able to set the resolution parameter with the igraph package (without it, the number of clusters is way too high), so I used the Seurat package. And... it looks much better and is reproducible!

image

The SNN graph and clusters were based on the full data, and Slingshot was run on the diffusion components. I have one question left: is it possible to run Slingshot on the full data (and clusters), and then plot the result on the diffusion map?

kstreet13 commented on July 21, 2024

That's great, glad you found something that works!

And yes, it is totally possible (and in this case, recommended) to run Slingshot on the full dataset. This makes plotting the results a bit more tricky, since there's no straightforward way to map the smooth curves onto the 2D diffusion map. However, you can get around this by plotting multiple versions of the diffusion map (or tSNE, UMAP, etc.) with cells colored by the pseudotime values along the different lineages (see example here).

janinemelsen commented on July 21, 2024

Thanks!

Unfortunately, I guess Slingshot on the full dataset was influenced by some 'noise'. Especially in the third lineage, red cells are visible in the blue zone, which is not correct in my opinion. According to the Louvain clustering, these cells do belong to the same cluster, so I don't understand how this can happen.

On the other hand, the color gradient is more gradual for Slingshot based on the full dataset than for Slingshot based on the diffusion components:

Slingshot based on full dataset (3 lineages)
image

Slingshot based on the diffusion components (3 lineages)
image

Clustering (based on full data)
image

kstreet13 commented on July 21, 2024

Hmm, that is unfortunate, and I agree that it's probably a result of the intermediate cells in the middle. My guess is that those cells are even more ambiguous in 9 dimensions. Some of them get assigned to the early stage and some to the later stage, and where we would draw that line on a 2-dimensional diffusion map isn't exactly where it gets drawn in the original space. Is it possible that some of these are doublets?

Otherwise, I haven't played around with this too much, but one thing you could try is defining a threshold for cells that are "well clustered" and only constructing the lineages based on those cells. For your purposes, I think something like silhouette width (via cluster::silhouette) would work. You could temporarily remove cells with silhouette scores below a certain threshold, such as 0, and then run Slingshot on the rest. Then you can assign the held-out cells to the lineages with the predict method. This would be analogous to how we handle unclustered cells from clustering methods that look to identify "stable" clusters, such as RSEC.
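
As a rough illustration of the filtering step only, here is a Python sketch (standard library only) that computes silhouette widths from scratch and splits cells into "well clustered" and held-out sets at a threshold of 0. In R you would get the widths from cluster::silhouette instead; the coordinates, labels, and the `silhouette_widths` helper are made up for the example, and the Slingshot and predict steps are not shown.

```python
import math

def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every point, where a is
    the mean distance to the point's own cluster and b is the smallest mean
    distance to any other cluster."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    n = len(points)
    widths = []
    for i in range(n):
        # Mean distance from point i to each cluster, excluding the point itself.
        mean_dist = {}
        for c in clusters:
            ds = [math.dist(points[i], points[j])
                  for j in range(n) if labels[j] == c and j != i]
            if ds:
                mean_dist[c] = sum(ds) / len(ds)
        own = labels[i]
        if own not in mean_dist:
            widths.append(0.0)  # singleton cluster: width is taken to be 0
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths

# Hypothetical 2-D coordinates; the point at index 4 is labeled 0 but sits near cluster 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (4.5, 4.5), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]

threshold = 0.0
widths = silhouette_widths(points, labels)
keep = [i for i, s in enumerate(widths) if s >= threshold]      # build lineages on these
held_out = [i for i, s in enumerate(widths) if s < threshold]   # assign afterwards
print(held_out)
```

The threshold is a judgment call: 0 only drops cells that are, on average, closer to another cluster than to their own.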

janinemelsen commented on July 21, 2024

I think I will leave it like this (it's not that bad, I think). I only adjusted the plot visualization a bit compared to the previous plots ;)

image

image

image

Thanks for the help!

Janine

kstreet13 commented on July 21, 2024

Cool, glad you were able to find something that works!
