Hi, My data set contains 155 clusters and 2 conditions (treatment vs

different results in different runs about sccoda HOT 8 CLOSED

theislab commented on June 1, 2024

different results in different runs

from sccoda.

Comments (8)

johannesostner commented on June 1, 2024 1

Hi!

This is a conceptional problem that arises when running scCODA without an explicit reference (or baseline) category. Since compositions are of constrained nature, it is only possible to make statements relative to a certain category. (see e.g. this paper on microbial data about it; you can directly translate this to single-cell data as well)

When running NoBaselineModel, it is implied that all cell types with Final parameter 0 are seen as the reference set. If you set one cell type as the baseline (by using BaselineModel), the results will become consistent, as we now always have the same reference.

As a reference cell type, you should choose one whose relative abundance is fairly consistent over all samples.

from sccoda.

johannesostner commented on June 1, 2024 1

Hmm, this is a very unique idea. So far, we have only ever reported single runs of scCODA with a fixed reference and analyzed everything in regards to this reference. For this, we usually show grouped boxplots (grouped by cell type, one plot per category) and indicate the ones with a credible effect somehow. But this does not work in your case since you have 155 clusters, which is far more than we ever used so far.

I like the idea of looking at how often a parameter was found to be credibly different from 0 (final parameter !=0) to gain a consistency estimate, and to use this as a starting point for further analysis. However, you should not use the numerical values of the final parameters like you proposed, in my opinion. They are dependent not only on the rate of change, but also on the dispersion of your data, and the values of the intercept parameters, which makes them very hard to interpret.

What you could use is the sign of the parameters, though. Negative parameters will always indicate a decrease, while positive ones indicate an increase in relative abundance. I would therefore report how often each cluster's parameter was different from 0 (maybe divided into increasing/decreasing), and then plot this as a barplot or heatmap of some sort.

Lastly, I just want to point out that scCODA is still unpublished, so please contact us (preferably @b-schubert) before including it in any reports

from sccoda.

bsierieb1 commented on June 1, 2024

thanks for the explanation! is there a rule of a thumb how to select the reference without any prior knowledge which cell type should remain unaffected by treatment? do you think it would make sense to select the cell type that has the proportion of treated cells which is the closest to the proportion of treated cells in the entire data set?

from sccoda.

johannesostner commented on June 1, 2024

If you have no prior knowledge, I would take a cell type that has low variation (in terms of relative abundance) over the entire dataset. This makes interpretation easier, since now the proportion of your reference category is neither changing drastically between conditions nor within conditions.

For example, in the dataset from the tutorial I would choose Goblet or Endocrine cells:

from sccoda.

bsierieb1 commented on June 1, 2024

it was a useful exercise to try and run it with a baseline but the issue persists - i still get either one bunch of negative clusters or a different bunch of positive clusters depending on the run. does it mean that it is essentially impossible to say if I had an expansion of some clusters or contraction of other clusters in treatment because they kind of "compensate" for each other? I know I may not be phrasing it correctly but I hope you understand what I mean...

from sccoda.

johannesostner commented on June 1, 2024

I think I know what you are trying to point out with clusters "compensating" for each other, and this is exactly what makes compositional data analysis sometimes difficult to interpret: Due to the total sum constraint on a sample (the number of cells that can be sequenced in one go is fixed), the counts are all inherently negatively correlated.
This means that if one cluster expands, all others seem to contract in order to, as you say, compensate.

Therefore, the two solutions you get are two equivalent (and equally likely) interpretations of the same data, and there is no way to tell which of these is "true" on a global, scale, at least not without any additional quantitative measurements.

from sccoda.

bsierieb1 commented on June 1, 2024

Got it, thanks for the explanation. In such a case, I am wondering what would be the most appropriate way to show what I observe in a figure?

When someone simply compares cell type ratios using something like Wald test instead of doing compositional analysis as implemented in scCODA, they sometimes show a volcano plot with cell type proportions on the y axis and -log(p) on the x axis. I really like this representation because it is very visual, and most biologists are used to volcano plots. So how about the following: I could run scCODA n times and then plot max/min value or the range of the "final parameter" values on the x axis and the number of runs in which the final parameter was !=0 on the y axis. Would there be anything wrong with this conceptually, as long as I explain in the text that it is impossible to distinguish between some clusters contracting and other clusters expanding?

My main goal is simply to identify clusters which may be changing and focus on them for the downstream analyses.

from sccoda.

bsierieb1 commented on June 1, 2024

this is a very good thought! a barplot with the number of runs in which the parameter was credibly different from 0 (separated by + and -) provides a nice summary and does not prompt questions about having some sort of "significance" threshold which one normally expects to see in a volcano plot. thanks for the discussion and I will make sure to reach out to you guys in case I want to present the data outside my lab.

from sccoda.

different results in different runs about sccoda HOT 8 CLOSED

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent