Hi, First of all, thank you for this wonderful tool. I have a question about h

threshold in rule function (cellpypes issue, closed)

min0609 commented on June 8, 2024

threshold in rule function

from cellpypes.

Comments (6)

FelixTheStudent commented on June 8, 2024

Thanks for the clear and precise question!

Yes, that is exactly how to express UMI counts as CP10K. Dividing by the total UMI count gives tiny numbers (counts per one UMI, if you will), and multiplying these tiny numbers by ten thousand gives values in a more pleasant range (CP10K).
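For readers following along, here is a minimal base-R sketch of the conversion described above, on a made-up toy matrix with genes in rows and cells in columns (the orientation Seurat uses). It divides each cell's counts by that cell's total UMI count and multiplies by ten thousand; the gene names and numbers are illustrative only.

```r
# Toy raw UMI count matrix: genes in rows, cells in columns (made-up numbers)
counts <- matrix(c(5, 0, 95,
                   2, 3, 195),
                 nrow = 3,
                 dimnames = list(c("CD8A", "CD4", "other"),
                                 c("cell1", "cell2")))

total_umi <- colSums(counts)                 # total UMIs per cell: 100 and 200
per_umi   <- sweep(counts, 2, total_umi, "/") # counts per one UMI; each column sums to 1
cp10k     <- 1e4 * per_umi                    # counts per ten thousand UMIs (CP10K)

cp10k["CD8A", ]  # cell1 = 500, cell2 = 100
```

Because every column of `per_umi` sums to 1, every column of `cp10k` sums to 10,000, which is a quick way to confirm the scaling was applied.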

CP10K values are useful in feature plots and for typing and communicating thresholds, because they have an intuitive interpretation (see the main page of this repo).


FelixTheStudent commented on June 8, 2024

If that answers your question, consider closing this issue with the buttons below. People will still be able to see it afterwards.


min0609 commented on June 8, 2024

Thank you for your help!

After converting the count matrix to CP10K units, I checked the distribution of CD8A expression in CP10K. Then I set the threshold in 'rule' to 0.000175. However, unlike the CD8A expression seen in the feature plot, 'rule' classified almost all cells as expressing CD8A (please see attached).

Could you please tell me what the problem is and suggest how to set the threshold?

Thank you again!


FelixTheStudent commented on June 8, 2024

Hi again, thanks for the plots, that's helpful. Let me start with this comment, which might be useful to others, too:

General comment
I would not start from a CP10K histogram of single-cell counts: because the data are sparse, the histogram won't be bimodal except for the strongest markers, so this is not a generally useful approach. Instead, you could use the cellpypes function pool_across_neighbors() to generate a histogram of pooled counts. Here's code to do it; simply plug in your Seurat object:

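```r
library(cellpypes)  # pool_across_neighbors()
library(ggplot2)
library(magrittr)   # %>%
library(Matrix)     # dgCMatrix class

# Raw UMI counts (with SeuratObject >= 5, use layer = "counts" instead of slot)
counts <- SeuratObject::GetAssayData(seurat, slot = "counts")
# Binarize the shared-nearest-neighbor graph: TRUE where the edge weight is > 0.1
neighbors <- as(seurat@graphs[["RNA_snn"]], "dgCMatrix") > .1
# Pool CD8A counts and total UMIs across each cell's neighbors
pooled_cd8a   <- pool_across_neighbors(counts["CD8A", ], neighbors)
pooled_totals <- pool_across_neighbors(seurat$nCount_RNA, neighbors)

data.frame(pooled_cd8a = 0.1 + pooled_cd8a,  # 0.1 avoids log10(0) in scale_x_log10
           pooled_totals) %>%
  ggplot() +
  geom_histogram(aes(1e4 * pooled_cd8a / pooled_totals)) +
  scale_x_log10()
```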
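To build intuition for what pooling does, here is a toy base-R sketch that is separate from the cellpypes code in this comment. It assumes that pooling means summing a cell's values over its neighborhood (a 0/1 neighbor matrix multiplied by a vector), with each cell counted as its own neighbor. This illustrates the idea only; it is not the cellpypes source, and `pool_sketch()`, the graph and the counts are all made up.

```r
# Made-up neighbor graph for 4 cells (TRUE = neighbors). The diagonal is TRUE
# so each cell pools its own counts too (an assumption of this sketch).
neighbors <- matrix(c(TRUE,  TRUE,  FALSE, FALSE,
                      TRUE,  TRUE,  FALSE, FALSE,
                      FALSE, FALSE, TRUE,  TRUE,
                      FALSE, FALSE, TRUE,  TRUE),
                    nrow = 4, byrow = TRUE)

cd8a   <- c(0, 1, 0, 0)             # raw CD8A UMIs per cell (sparse)
totals <- c(1000, 1200, 900, 1100)  # total UMIs per cell

# Hypothetical pooling helper: sum each cell's values over its neighborhood
pool_sketch <- function(x, nb) as.vector((nb * 1) %*% x)

pooled_cd8a   <- pool_sketch(cd8a, neighbors)    # 1 1 0 0
pooled_totals <- pool_sketch(totals, neighbors)  # 2200 2200 2000 2000

pooled_cp10k <- 1e4 * pooled_cd8a / pooled_totals
pooled_cp10k  # cells 1 and 2 about 4.55, cells 3 and 4 zero
```

Cell 1 has zero CD8A UMIs on its own but gets a non-zero pooled value from its neighbor, which is why histograms of pooled counts are less dominated by zeros than single-cell histograms.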

If you insist on using a histogram, use the code above or something similar. I find it more useful, however, to explore your UMAP embedding with feature plots until you understand the cell types present in your data. By then, you'll know which cells you mean when talking about "CD8 T cells": they'll be the cells in the UMAP that have high CD8A expression and low CD4 expression.
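The "high CD8A, low CD4" description can be written as a plain two-marker threshold. The sketch below uses made-up CP10K values and an illustrative starting threshold of 1 for both genes; it only shows the logic of combining two conditions and should not be read as a description of what rule() does internally.

```r
# Made-up CP10K values for five cells
cp10k_cd8a <- c(3.2, 0.1, 2.5, 0.0, 4.0)
cp10k_cd4  <- c(0.2, 2.8, 1.9, 0.1, 0.0)

cd8a_threshold <- 1  # illustrative starting values, to be tuned by eye on the UMAP
cd4_threshold  <- 1

# CD8A above its threshold AND CD4 at or below its threshold
is_cd8_t <- cp10k_cd8a > cd8a_threshold & cp10k_cd4 <= cd4_threshold
which(is_cd8_t)  # cells 1 and 5
```

Cell 3 is excluded despite high CD8A because its CD4 value is also above the threshold.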

Now, more specifically, to your data. It seems to me you actually have two questions; let me try to phrase and answer them here:

1. How to pick the threshold? Look at the UMAP plot for areas with high CD8A expression. In your third plot above, we see three strong red areas (top-left, top-right, bottom) surrounding a grey area (in the middle). I recommend you start with a threshold of 1 in rule, inspect the result with plot_last, and then adjust the threshold until these three areas are selected. CD8A is a strong marker, so I think your CD8A+ cells (red areas in UMAP) and your CD8A- cells (grey areas in UMAP) are pretty clear, and you should not have trouble classifying them.
2. Why does the histogram threshold not match the classification threshold? The easiest way for me to see what's going on here is if you could send me the output of the plot_last function after you applied the rule, including the feature plot. In the meantime, let me speculate. The CP10K values you computed are roughly ten thousand times lower than I usually see for CD8A. Are you certain you started from raw UMI counts? Have you checked that each cell's column sum is 1 after dividing by its total UMI count, and that you then multiplied by 10000?
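The two checks suggested in point 2 can be scripted. Below is a hypothetical helper, `check_cp10k()` (not part of cellpypes or Seurat), that tests whether a count matrix contains only whole numbers, as raw UMI counts should, and whether each cell's CP10K values sum to 10,000. It assumes the per-cell totals were computed from the same matrix; if genes were filtered afterwards, the sums will fall short. The second call reproduces a forgotten multiplication by 10,000, which leaves values ten thousand times too small, consistent with a threshold like 0.000175.

```r
# Hypothetical helper (not a cellpypes or Seurat function)
check_cp10k <- function(counts, cp10k, tol = 1e-8) {
  # Raw UMI counts should be whole numbers
  raw_integers <- all(counts == round(counts))
  # Each CP10K column should sum to 1e4 (if totals came from this same matrix)
  sums_to_10k <- all(abs(colSums(cp10k) / 1e4 - 1) < tol)
  c(raw_integers = raw_integers, sums_to_10k = sums_to_10k)
}

counts <- matrix(c(5, 0, 95,
                   2, 3, 195), nrow = 3)
per_umi    <- sweep(counts, 2, colSums(counts), "/")
good       <- 1e4 * per_umi  # correct CP10K
forgot_1e4 <- per_umi        # divided by totals but never multiplied by 1e4

check_cp10k(counts, good)        # both checks TRUE
check_cp10k(counts, forgot_1e4)  # sums_to_10k is FALSE
```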

I hope this helps you classify your CD8A+ cells. It would be interesting to see the plots I mentioned here (the histogram of pooled counts generated with my code above, and the two plots generated by plot_last); please share them!

Best,

Felix


FelixTheStudent commented on June 8, 2024

Hi KwangminYoo,

Have you tried producing the plots I mentioned? I would be interested to see them on your data.

If I answered all your questions, consider closing this issue.

Best,

Felix


FelixTheStudent commented on June 8, 2024

Closing -- thanks for the nice exchange!
