Comments (6)

FelixTheStudent avatar FelixTheStudent commented on June 8, 2024

Thanks for the clear and precise question!

Yes, that is exactly how to express UMIs as CP10K. Dividing by the total UMI count of a cell gives tiny numbers (counts per one UMI, if you will), and multiplying these tiny numbers by ten thousand gives values in a more pleasant range (CP10K).
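To make the arithmetic concrete, here is a minimal base-R sketch on a made-up count matrix. The gene and cell names and all values are invented for illustration; a real count matrix will usually be sparse, in which case Matrix::colSums would take the place of colSums.

```r
# Toy UMI count matrix: rows are genes, columns are cells (made-up values).
counts <- matrix(
  c( 2,   0,   5,
     0,   1,   0,
    98, 199, 495),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD8A", "CD4", "OTHER"), c("cell1", "cell2", "cell3"))
)

# Total UMIs per cell: 100, 200, 500.
total_umi <- colSums(counts)

# Divide each column by its cell total, then scale to counts per 10,000 UMIs.
cp10k <- sweep(counts, 2, total_umi, "/") * 1e4

cp10k["CD8A", ]  # cell1 = 200, cell2 = 0, cell3 = 100
colSums(cp10k)   # every cell sums to 10000
```

On real data, one or two CD8A UMIs in a cell with a few thousand total UMIs land at a few CP10K, which is why thresholds near 1 are a natural starting point.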

CP10K values are useful in feature plots and for typing and communicating thresholds, because they are intuitive (see main page of this repo).

from cellpypes.

FelixTheStudent avatar FelixTheStudent commented on June 8, 2024

If that answers your question, consider closing this issue with the buttons below. People will still be able to see it afterwards.

min0609 avatar min0609 commented on June 8, 2024

Thank you for your help!

After converting the count matrix to CP10K units, I checked the distribution of CD8A expression in CP10K. I then set the threshold in 'rule' to 0.000175. However, unlike the CD8A expression seen in the 'featureplot', 'rule' appeared to classify almost all cells as expressing CD8A (please see attached).

Could you please tell me what the problem is and suggest how to set the threshold?

image

Thank you again!

FelixTheStudent avatar FelixTheStudent commented on June 8, 2024

Hi again, thanks for the plots, that's helpful. Let me start with this comment that might be useful to others, too:

General comment
I would not pick a threshold from a CP10K histogram of single-cell counts: because the data are sparse, such a histogram won't be bimodal except for the strongest of markers, so this is not a generally useful approach. Instead, you could use cellpypes' pool_across_neighbors() function to generate a histogram of pooled counts. Here's code to do it, simply plug in your seurat object:

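library(cellpypes)  # pool_across_neighbors()
library(ggplot2)    # histogram
library(magrittr)   # %>% pipe

# Raw UMI counts (genes x cells); with Seurat v5, use layer = "counts" instead of slot.
counts <- SeuratObject::GetAssayData(seurat, slot = "counts")
# Logical neighbor matrix from the SNN graph: TRUE where the edge weight exceeds 0.1.
neighbors <- as(seurat@graphs[["RNA_snn"]], "dgCMatrix") > .1
# Pool CD8A counts and total UMIs across each cell's neighbors.
pooled_cd8a <- pool_across_neighbors(counts["CD8A", ], neighbors)
pooled_totals <- pool_across_neighbors(seurat$nCount_RNA, neighbors)

data.frame(pooled_cd8a = 0.1 + pooled_cd8a, # 0.1 avoids log10(0) in scale_x_log10
           pooled_totals) %>%
  ggplot() +
  geom_histogram(aes(1e4 * pooled_cd8a / pooled_totals)) +  # pooled CP10K
  scale_x_log10()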

If you insist on using a histogram, use the code above or similar. I find it more useful, however, to explore your UMAP embedding with feature plots until you understand the cell types present in your data. By this point, you'll know which cells you want to classify when talking about "CD8 T cells" - they'll be those cells in UMAP that have high CD8A expression and low CD4 expression.
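To illustrate why pooling helps with sparse counts, here is a conceptual base-R sketch on made-up data. It assumes that pooling means summing each cell's value with those of its neighbors; the helper pool() is hypothetical and is not cellpypes' actual implementation of pool_across_neighbors().

```r
# Hypothetical toy data: CD8A UMI counts and total UMIs for four cells.
cd8a   <- c(0, 1, 0, 3)
totals <- c(1000, 2000, 1000, 2000)

# Hypothetical neighbor matrix: TRUE marks neighbors, the diagonal is the cell itself.
neighbors <- matrix(c(
  TRUE,  TRUE,  FALSE, FALSE,
  TRUE,  TRUE,  TRUE,  FALSE,
  FALSE, TRUE,  TRUE,  TRUE,
  FALSE, FALSE, TRUE,  TRUE),
  nrow = 4, byrow = TRUE)

# Assumption: pooling sums a cell's value with those of its neighbors.
# Multiplying by 1 converts the logical matrix to numeric before %*%.
pool <- function(x, nb) as.numeric((nb * 1) %*% x)

pooled_cd8a   <- pool(cd8a, neighbors)    # 1, 1, 4, 3
pooled_totals <- pool(totals, neighbors)  # 3000, 4000, 5000, 3000

single_cp10k <- 1e4 * cd8a / totals                # 0, 5, 0, 15
pooled_cp10k <- 1e4 * pooled_cd8a / pooled_totals  # about 3.3, 2.5, 8, 10
```

The single-cell values jump between zero and a few CP10K, while the pooled values vary more smoothly, which is what gives a pooled histogram a chance of showing separate modes.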

Now more specifically to your data. It seems to me you actually have two questions, let me try to phrase and answer them here:

1. How to pick the threshold? Look at the UMAP plot for areas with high CD8A expression. In your third plot above, we see three strong red areas (top-left, top-right, bottom) surrounding a grey area (in the middle). I recommend you start with a threshold of 1 in rule and inspect the result with plot_last, then play around with the threshold until these three areas get selected. CD8A is a strong marker, so your CD8A+ cells (red areas in UMAP) and your CD8A- cells (grey areas in UMAP) are pretty clear, and you should not have trouble classifying them.
2. Why does the histogram threshold not match the classification threshold? The easiest way for me to see what's going on here is if you could send me the output of the plot_last function after you applied the rule, including the feature plot. But let me speculate already. The CP10K values you computed are roughly ten thousand times lower than I usually see for CD8A. Are you certain you started from raw UMI counts? Have you checked that your colSums add up to 1 after dividing by totalUMI, and that you then multiplied by 10000?
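The checks suggested in the second point can be sketched in base R as follows. The helper check_cp10k() and the toy matrix are hypothetical and only illustrate the idea; for a sparse count matrix, Matrix::colSums would be needed instead of colSums.

```r
# Hypothetical helper: flags common reasons for CP10K values that look far too small.
check_cp10k <- function(counts, cp10k) {
  list(
    # Raw UMI counts should be non-negative whole numbers.
    counts_are_raw = all(counts >= 0) && all(counts == round(counts)),
    # Fractions (counts / total UMI) sum to 1 per cell, so CP10K must sum to 10000.
    cells_sum_to_1e4 = isTRUE(all.equal(unname(colSums(cp10k)),
                                        rep(1e4, ncol(cp10k))))
  )
}

# Made-up counts for two cells (matrix() fills column by column).
counts <- matrix(c(2, 0, 98, 0, 1, 199), nrow = 3,
                 dimnames = list(c("CD8A", "CD4", "OTHER"), c("cell1", "cell2")))
fractions <- sweep(counts, 2, colSums(counts), "/")

check_cp10k(counts, fractions * 1e4)$cells_sum_to_1e4  # TRUE
check_cp10k(counts, fractions)$cells_sum_to_1e4        # FALSE: the 1e4 factor is missing
```

If the second check fails on your data, the values passed on are probably fractions or otherwise normalized data rather than CP10K, which would explain thresholds that are off by a factor of about ten thousand.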

I hope this helps you classify your CD8A cells. It would be interesting to see the plots I mentioned here (the histogram of pooled counts generated with my code above, and the two plots generated by plot_last), so please share them!

Best,

Felix

FelixTheStudent avatar FelixTheStudent commented on June 8, 2024

Hi KwangminYoo,

Have you tried producing the plots I mentioned? Would be interested to see them on your data.

If I answered all your questions, consider closing this issue.

Best,

Felix

FelixTheStudent avatar FelixTheStudent commented on June 8, 2024

Closing -- thanks for the nice exchange!
