
Comments (8)

johannesostner commented on June 1, 2024

Hi!

This is a conceptual problem that arises when running scCODA without an explicit reference (or baseline) category. Because compositional data are constrained, it is only possible to make statements relative to a certain category (see e.g. this paper on microbial data; the argument translates directly to single-cell data).

When running NoBaselineModel, all cell types whose final parameter is 0 are implicitly treated as the reference set, and this set can differ from run to run. If you set one cell type as the baseline (by using BaselineModel), the results become consistent, because every run then uses the same reference.

As a reference cell type, you should choose one whose relative abundance is fairly consistent over all samples.
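To illustrate why statements about a composition only make sense relative to a reference, here is a minimal sketch using plain log-ratios. It is a generic illustration and not scCODA's model; the three cell types, their proportions, and the helper `log_ratio_change` are made up for the example.

```python
import numpy as np

# Hypothetical mean proportions of three cell types (A, B, C) in two conditions.
control = np.array([0.50, 0.30, 0.20])
treated = np.array([0.60, 0.25, 0.15])

def log_ratio_change(control, treated, ref):
    """Change in log(proportion / reference proportion) from control to treated."""
    return np.log(treated / treated[ref]) - np.log(control / control[ref])

change_vs_a = log_ratio_change(control, treated, ref=0)  # A as reference
change_vs_c = log_ratio_change(control, treated, ref=2)  # C as reference

print("relative to A:", np.round(change_vs_a, 3))
print("relative to C:", np.round(change_vs_c, 3))
```

With these numbers, cell type B decreases relative to A but increases relative to C, so the sign of its effect depends entirely on which reference is used. The two result vectors differ only by a constant shift.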

from sccoda.

johannesostner commented on June 1, 2024

Hmm, this is quite a unique idea. So far, we have only ever reported single runs of scCODA with a fixed reference and analyzed everything with regard to this reference. For this, we usually show grouped boxplots (grouped by cell type, one plot per category) and mark the ones with a credible effect in some way. But this does not work in your case, since you have 155 clusters, which is far more than we have ever used.

I like the idea of looking at how often a parameter was found to be credibly different from 0 (final parameter != 0) to gain a consistency estimate, and of using this as a starting point for further analysis. However, in my opinion, you should not use the numerical values of the final parameters as you proposed. They depend not only on the rate of change, but also on the dispersion of your data and on the values of the intercept parameters, which makes them very hard to interpret.

What you could use is the sign of the parameters, though. Negative parameters will always indicate a decrease, while positive ones indicate an increase in relative abundance. I would therefore report how often each cluster's parameter was different from 0 (maybe divided into increasing/decreasing), and then plot this as a barplot or heatmap of some sort.
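As a rough sketch of this tally, assume the final parameters from several runs have been collected into an array with one row per run and one column per cluster, where 0 means "not credible". The helper name `tally_credible_signs` and the numbers are hypothetical, and how the values are extracted from scCODA results is not shown here.

```python
import numpy as np

def tally_credible_signs(final_params):
    """Count, per cluster, the runs with a positive or a negative nonzero parameter.

    final_params: array of shape (n_runs, n_clusters); a value of 0 means
    the effect was not found credible in that run.
    """
    final_params = np.asarray(final_params, dtype=float)
    n_increase = (final_params > 0).sum(axis=0)
    n_decrease = (final_params < 0).sum(axis=0)
    return n_increase, n_decrease

# Made-up final parameters for 4 runs and 3 clusters.
runs = np.array([
    [0.8, 0.0, -0.3],
    [0.5, 0.0,  0.0],
    [0.0, 0.0, -0.4],
    [0.9, 0.2, -0.1],
])

n_increase, n_decrease = tally_credible_signs(runs)
for cluster, (up, down) in enumerate(zip(n_increase, n_decrease)):
    print(f"cluster {cluster}: increase in {up} runs, decrease in {down} runs")
```

Only the signs enter the counts, so the magnitudes of the parameters are never interpreted. The two count vectors can be passed directly to a barplot or heatmap.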

Lastly, I just want to point out that scCODA is still unpublished, so please contact us (preferably @b-schubert) before including it in any reports.


bsierieb1 commented on June 1, 2024

thanks for the explanation! is there a rule of thumb for selecting the reference without any prior knowledge of which cell type should remain unaffected by treatment? do you think it would make sense to select the cell type whose proportion of treated cells is closest to the proportion of treated cells in the entire data set?


johannesostner commented on June 1, 2024

If you have no prior knowledge, I would take a cell type that has low variation (in terms of relative abundance) over the entire dataset. This makes interpretation easier, since now the proportion of your reference category is neither changing drastically between conditions nor within conditions.

For example, in the dataset from the tutorial I would choose Goblet or Endocrine cells:
[image]
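One possible way to screen for such a low-variation cell type is sketched below. The choice of the coefficient of variation as the measure of "variation" is an assumption made for this example, as are the helper name `pick_low_variation_reference` and the toy counts; other dispersion measures may be equally reasonable.

```python
import numpy as np

def pick_low_variation_reference(counts, cell_types):
    """Return the cell type whose relative abundance varies least across samples.

    counts: array of shape (n_samples, n_cell_types) with raw cell counts.
    Assumes every cell type has a nonzero mean proportion.
    """
    counts = np.asarray(counts, dtype=float)
    # Convert each sample (row) to relative abundances.
    proportions = counts / counts.sum(axis=1, keepdims=True)
    # Coefficient of variation: spread relative to the mean proportion.
    cv = proportions.std(axis=0) / proportions.mean(axis=0)
    return cell_types[int(np.argmin(cv))], cv

# Toy data: 4 samples (rows) and 3 hypothetical cell types (columns).
cell_types = ["A", "B", "C"]
counts = np.array([
    [120, 40, 40],
    [ 80, 80, 40],
    [100, 58, 42],
    [140, 22, 38],
])

reference, cv = pick_low_variation_reference(counts, cell_types)
print("suggested reference:", reference)
print("coefficient of variation per cell type:", np.round(cv, 3))
```

In the toy data, cell type C stays close to 20% of every sample, so it is selected. The function pools all samples regardless of condition, which matches the idea of low variation both between and within conditions.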


bsierieb1 commented on June 1, 2024

it was a useful exercise to try running it with a baseline, but the issue persists: I still get either one bunch of negative clusters or a different bunch of positive clusters depending on the run. does it mean that it is essentially impossible to say whether I had an expansion of some clusters or a contraction of other clusters in treatment, because they kind of "compensate" for each other? I know I may not be phrasing it correctly but I hope you understand what I mean...


johannesostner commented on June 1, 2024

I think I know what you are trying to point out with clusters "compensating" for each other, and this is exactly what sometimes makes compositional data analysis difficult to interpret: due to the total sum constraint on a sample (the number of cells that can be sequenced in one go is fixed), the counts are all inherently negatively correlated.
This means that if one cluster expands, all others seem to contract in order to, as you say, compensate.

Therefore, the two solutions you get are two equivalent (and equally likely) interpretations of the same data, and there is no way to tell which of them is "true" on a global scale, at least not without additional quantitative measurements.
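A small numerical sketch of this equivalence, using made-up absolute counts for three clusters (the scenario names and the helper `to_proportions` are purely illustrative):

```python
import numpy as np

def to_proportions(counts):
    """Normalize absolute counts to a composition that sums to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

baseline = np.array([100, 100, 100])  # hypothetical absolute counts

# Scenario 1: only the first cluster expands (it doubles).
expansion = np.array([200, 100, 100])
# Scenario 2: the first cluster is unchanged and the other two halve.
contraction = np.array([100, 50, 50])

print("baseline:   ", to_proportions(baseline))
print("expansion:  ", to_proportions(expansion))
print("contraction:", to_proportions(contraction))
```

Both scenarios produce the same composition, so once only proportions are observed they cannot be told apart. In the expansion scenario, the two unchanged clusters also appear to shrink, which is the "compensation" effect described above.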


bsierieb1 commented on June 1, 2024

Got it, thanks for the explanation. In such a case, I am wondering what would be the most appropriate way to show what I observe in a figure?

When someone simply compares cell type ratios using something like a Wald test instead of doing compositional analysis as implemented in scCODA, they sometimes show a volcano plot with cell type proportions on the x axis and -log(p) on the y axis. I really like this representation because it is very visual, and most biologists are used to volcano plots. So how about the following: I could run scCODA n times and then plot the max/min value or the range of the "final parameter" values on the x axis and the number of runs in which the final parameter was != 0 on the y axis. Would there be anything wrong with this conceptually, as long as I explain in the text that it is impossible to distinguish between some clusters contracting and other clusters expanding?

My main goal is simply to identify clusters which may be changing and focus on them for the downstream analyses.


bsierieb1 commented on June 1, 2024

this is a very good thought! a barplot with the number of runs in which the parameter was credibly different from 0 (separated by + and -) provides a nice summary and does not prompt questions about having some sort of "significance" threshold which one normally expects to see in a volcano plot. thanks for the discussion and I will make sure to reach out to you guys in case I want to present the data outside my lab.

