Comments (9)
I just want to start by saying that I don't think there is any "correct" answer for these sorts of questions, but I'll try to help out!
I think some of the weirder results you showed may be due to a quirk of Gaussian mixture modeling, particularly in the "4 diffusion components, clusters calculated by mclust (=9)" and "4 diffusion components, clusters calculated by mclust (=10)" plots. Both of these seem to include one very large, highly dispersed cluster (purple and orange, respectively) that tends to mess up the minimum spanning tree, as many other clusters are connected to it. I haven't played around with `mclust` too much, but you might be able to avoid these sorts of clusters by setting the `modelNames` argument (maybe to something like "EVV", for "ellipsoidal, equal volume"). Alternatively, you could try other clustering methods, though I have generally had good results with `mclust`.
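As a minimal sketch of restricting the covariance model (assuming the `mclust` package is installed, and using simulated 2D data as a stand-in for the diffusion components), something along these lines should work. In `Mclust()` the argument is spelled `modelNames`. The data, the cluster centers and the choice of `G = 3` are made up for illustration.

```r
library(mclust)  # attach the package; Mclust() looks up helpers such as mclustBIC()

set.seed(1)
# Simulated stand-in for the diffusion components: three 2D blobs (hypothetical data)
centers <- rbind(c(0, 0), c(6, 0), c(3, 6))
X <- do.call(rbind, lapply(1:3, function(i) {
  sweep(matrix(rnorm(200), ncol = 2), 2, centers[i, ], "+")
}))

# Restrict the fit to the "EVV" model (ellipsoidal, equal volume) instead of
# letting Mclust choose among all covariance structures
fit <- Mclust(X, G = 3, modelNames = "EVV")

# Cluster sizes; classification holds one label per row of X
table(fit$classification)
```

The labels in `fit$classification` could then be passed on as the cluster labels for the lineage inference.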
And I'm not familiar with how diffusion maps work, but I have used them occasionally, via the `destiny` package. My understanding (largely informed by the `destiny` documentation) is that they are based on an eigendecomposition, where the diffusion components are the eigenvectors and the corresponding eigenvalues behave similarly to the variances in PCA (i.e., strictly positive and decreasing). So my guess is that selecting a particular number of diffusion components is similar to selecting a number of PCs, for which there are a lot of existing methods (qualitatively, I think looking for the "elbow" in the plot of eigenvalues would be a good starting point).
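To illustrate the "elbow" idea, here is a small base-R sketch that uses PCA on simulated data as a stand-in, since the reasoning above treats the eigenvalues of a diffusion map like PCA variances. The data and the scaling factors are invented. For a real diffusion map the eigenvalues would come from the `destiny` object instead (I believe via an accessor like `eigenvalues()`, but treat that as an assumption).

```r
set.seed(1)
# Simulated data: 10 dimensions, of which the first three carry extra variance
X <- matrix(rnorm(300 * 10), ncol = 10)
X <- sweep(X, 2, c(5, 3, 2, rep(1, 7)), "*")

# PCA variances, sorted in decreasing order (stand-in for diffusion map eigenvalues)
ev <- prcomp(X)$sdev^2
prop <- ev / sum(ev)

# Scree plot: look for the point where the curve flattens out
plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue")

# Drop between successive eigenvalues; large early drops followed by
# small ones suggest an elbow
round(-diff(ev), 2)
```

If the curve decays smoothly with no visible bend, this heuristic does not single out a number of components.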
Finally, I should mention that you don't strictly need to perform clustering before running Slingshot, if you believe that the data only contain one lineage (with no branching). I bring this up because, at least in two dimensions, that seems reasonable for your data. In this case, `slingshot` will just fit a principal curve and you may be better off using the `princurve` package directly.
Hope this helps!
from slingshot.
Hi,
Thank you for the quick response! I adjusted the model of mclust, and the clusters look much better; however, they are not reproducible. Each time I run mclust on the same number of diffusion components, the clusters are different. For instance (model is EVV, number of components is 5, number of clusters is 8):
My guess is that this could be explained by the cells in the center of the plot (which seem to be outliers).
Unfortunately, the elbow plot is not very informative, since there is no elbow. According to the destiny paper (figure 1B) this can be explained by the large intrinsic dimensionality.
Best,
Janine
from slingshot.
That's interesting that you don't get the same clusters every time. My best guess is that that's caused by some sort of random initialization. You could probably make a particular set of results reproducible by setting the random seed, but that wouldn't actually make the algorithm any more stable. If you want to try out other methods, I know `clusterExperiment::RSEC` is specifically designed for stability (and leaves some cells unclustered, which may prevent those points in the middle from causing issues). There's also the graph-based Louvain clustering, which is fairly popular (available via `scran::buildSNNGraph` + `igraph::cluster_louvain` or `Seurat::FindNeighbors` + `Seurat::FindClusters`).
And yeah, I agree that that plot seems to indicate a high intrinsic dimensionality. Fortunately, most if not all of the methods I've mentioned can work in 10 dimensions without issue. I think it may be best to do the analysis on the full data and only use dimensionality reduction for visualization purposes.
from slingshot.
I was not able to set the resolution parameter with the igraph package (without resolution the number of clusters is way too high) so I used the Seurat package. And... it looks much better and reproducible!
The SNN graph and clusters were based on the full data and the slingshot was based on the diffusion components. I have one question left: is it possible to calculate the slingshot on the full data (and clusters), and then to plot it on the diffusion map?
from slingshot.
That's great, glad you found something that works!
And yes, it is totally possible (and in this case, recommended) to run Slingshot on the full dataset. This makes plotting the results a bit more tricky, since there's no straightforward way to map the smooth curves onto the 2D diffusion map. However, you can get around this by plotting multiple versions of the diffusion map (or tSNE, UMAP, etc.) with cells colored by the pseudotime values along the different lineages (see example here).
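A rough base-R sketch of that kind of plot is below. It uses simulated stand-ins: `dm` plays the role of the 2D diffusion map coordinates and `pt` the cells-by-lineages pseudotime matrix (with a real fit this would come from something like `slingshot::slingPseudotime()`, where cells not assigned to a lineage are `NA`). The helper `pt_colors()` and the palette are hypothetical names chosen for illustration.

```r
set.seed(1)
n <- 200
# Stand-in for the 2D embedding used only for visualization
dm <- cbind(DC1 = rnorm(n), DC2 = rnorm(n))
# Stand-in for the pseudotime matrix: one column per lineage, NA = not on lineage
pt <- cbind(Lineage1 = runif(n), Lineage2 = runif(n))
pt[sample(n, 50), "Lineage2"] <- NA

pal <- colorRampPalette(c("blue", "yellow", "red"))(100)

# Map pseudotime values onto the palette; cells with NA pseudotime become grey
pt_colors <- function(p, pal, na_col = "grey80") {
  idx <- cut(p, breaks = length(pal), labels = FALSE)
  cols <- pal[idx]
  cols[is.na(cols)] <- na_col
  cols
}

# One panel per lineage, same embedding, colored by that lineage's pseudotime
op <- par(mfrow = c(1, ncol(pt)))
for (l in colnames(pt)) {
  plot(dm, col = pt_colors(pt[, l], pal), pch = 16, main = l)
}
par(op)
```

Because the curves themselves are never drawn, this sidesteps the problem of projecting them from the full space into two dimensions.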
from slingshot.
Thanks!
Unfortunately, slingshot based on the full dataset was influenced by some 'noise', I guess. Especially in the third lineage, red cells are visible in the blue zone, which is not correct in my opinion. According to the Louvain clustering these cells do belong to the same cluster, so I don't understand how this can happen.
On the other hand the color gradient is more gradual in the slingshot based on the full dataset compared to the diffusion components:
Slingshot based on full dataset (3 lineages)
Slingshot based on the diffusion components (3 lineages)
Clustering (based on full data)
from slingshot.
Hmm, that is unfortunate and I agree that it's probably a result of the intermediate cells in the middle. My guess is that those cells are even more ambiguous in 9 dimensions. Some of them get assigned to the early stage and some to the later stage, and where we would draw that line on a 2-dimensional diffusion map isn't exactly where it gets drawn in the original space. Is it possible that some of these are doublets?
Otherwise, I haven't played around with this too much, but one thing you could try is defining a threshold for cells that are "well clustered" and only constructing the lineages based on those cells. For your purposes, I think something like silhouette width (via `cluster::silhouette`) would work. You could temporarily remove cells with silhouette scores below a certain threshold, such as 0, and then run Slingshot on the rest. Then you can assign the held-out cells to the lineages with the `predict` method. This would be analogous to how we handle unclustered cells from clustering methods that look to identify "stable" clusters, such as RSEC.
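The filtering step might look roughly like this. It is a sketch on simulated data, with `X` standing in for the full-dimensional data and `cl` for the cluster labels; the threshold of 0 is the one suggested above. The Slingshot calls are left as comments because they depend on the real data and I have not run them here.

```r
library(cluster)  # provides silhouette(); shipped with standard R installations

set.seed(1)
# Simulated stand-in: two well-separated groups plus a few ambiguous cells between them
X <- rbind(
  matrix(rnorm(100 * 5, mean = 0), ncol = 5),
  matrix(rnorm(100 * 5, mean = 4), ncol = 5),
  matrix(rnorm(10 * 5, mean = 2), ncol = 5)
)
# Hypothetical cluster labels; the ambiguous cells are split between both clusters
cl <- c(rep(1L, 100), rep(2L, 100), rep(c(1L, 2L), 5))

# Silhouette width per cell, computed in the full space
sil <- silhouette(cl, dist(X))
sil_width <- sil[, "sil_width"]

# Keep "well clustered" cells, hold out the rest
keep <- sil_width > 0
X_core <- X[keep, , drop = FALSE]
cl_core <- cl[keep]
X_held <- X[!keep, , drop = FALSE]

# Untested outline of the remaining steps:
# sds  <- slingshot::slingshot(X_core, clusterLabels = cl_core)
# held <- predict(sds, X_held)  # place held-out cells on the fitted lineages
```

A stricter threshold removes more of the ambiguous cells, at the cost of fitting the curves to fewer points.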
from slingshot.
I think I will leave it like this (it's not that bad, I think). I only adjusted the plot visualization a bit, compared to the previous plots ;)
Thanks for the help!
Janine
from slingshot.
Cool, glad you were able to find something that works!
from slingshot.