Comments (18)

jackhump commented on August 17, 2024

Hi Dalila,
Great idea! For 2. I'll implement covariate PCAs for LeafViz.

from leafcutter.

herber4 commented on August 17, 2024

Is the "leafcutter_quantify_psi.R" script available anywhere?

jackhump commented on August 17, 2024

The script is here: https://github.com/davidaknowles/leafcutter/blob/psi_2019/scripts/leafcutter_quantify_psi.R

davidaknowles commented on August 17, 2024
  1. This is true. What's your use case? Other downstream analysis like PCA or clustering? I'm assuming you want something like residuals (after removing technical covariates) rather than fitted values (since these would just be the group level PSIs). The Dirichlet-Multinomial model doesn't currently do this, but I can make a version that does if that would be helpful. The simpler and maybe more pragmatic solution would be to make a phenotype matrix like @goldenflaw's 'prepare_phenotype_table.py' (or the code below!) and then regress out technical confounders from that.
  2. @jackhump I've realized the PCA is currently on the raw counts whereas it should really be on the ratios (otherwise the PCA is confounded with total expression variation). I would suggest this for getting the ratios:
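library(dplyr)     # mutate, group_by, mutate_all, select
library(stringr)   # str_split_fixed
library(magrittr)  # set_rownames

# Convert intron counts to within-cluster ratios. The 4th ":"-separated
# field of each row name is taken as the cluster ID.
ratios = counts %>%
  mutate(clu = str_split_fixed(rownames(counts), ":", 4)[,4]) %>%
  group_by(clu) %>%
  mutate_all( ~ . / sum(.) ) %>%   # funs() is deprecated in recent dplyr
  ungroup() %>%
  as.data.frame() %>%
  set_rownames(rownames(counts)) %>%
  select(-clu)
# Keep introns whose ratio is missing (0/0) in at most 40% of samples
ratios = ratios[rowMeans(is.na(ratios)) <= 0.4, , drop=FALSE]
# Impute the remaining missing ratios with the intron's mean across samples
row_means = rowMeans(ratios, na.rm = TRUE)
row_means_outer = outer(row_means, rep(1, ncol(ratios)))
ratios[is.na(ratios)] = row_means_outer[is.na(ratios)]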

We'll need to add the 'stringr' dependency for 'str_split_fixed'.
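
The "regress out technical confounders" step from point 1 could look roughly like the sketch below. It is not LeafCutter code: `regress_out`, the toy `ratios` matrix and the `batch` covariate are all hypothetical, and it assumes a plain least-squares fit per intron rather than the Dirichlet-Multinomial model.

```r
# Hypothetical helper: remove confounder effects from an introns x samples
# ratio matrix with ordinary least squares, keeping each intron's mean.
regress_out <- function(ratios, confounders) {
  ratios <- as.matrix(ratios)
  # Design matrix with an intercept plus one column per confounder
  design <- model.matrix(~ ., data = as.data.frame(confounders))
  # lm.fit expects samples in rows, so fit on the transposed matrix
  fit <- lm.fit(design, t(ratios))
  # Residuals have mean zero per intron; add the original mean back
  adjusted <- t(fit$residuals) + rowMeans(ratios)
  dimnames(adjusted) <- dimnames(ratios)
  adjusted
}

# Toy data (made up): 3 introns, 6 samples, one binary batch covariate
set.seed(1)
batch <- rep(c(0, 1), each = 3)
ratios <- matrix(runif(18, 0.2, 0.4), nrow = 3,
                 dimnames = list(paste0("intron", 1:3), paste0("s", 1:6)))
ratios[1, ] <- ratios[1, ] + 0.3 * batch   # inject a batch effect into intron1

adjusted <- regress_out(ratios, data.frame(batch = batch))
```

Note that residuals from a linear fit are not constrained to [0, 1] and no longer sum to one within a cluster, so the adjusted values are best treated as a phenotype matrix for PCA or clustering rather than as true ratios.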

jackhump commented on August 17, 2024

Thanks for that, David. I think stringr should already be a listed dependency. The function set_rownames comes from magrittr, so you'll need to add that if it isn't already listed.

jackhump commented on August 17, 2024

That's now done. I've deployed the latest changes to https://leafcutter.shinyapps.io/leafviz/

davidaknowles commented on August 17, 2024

Great, thanks. I've added magrittr and stringr as new dependencies for the LeafCutter package.

ddpinto commented on August 17, 2024

Hello,
Just following up here as well (some of it went earlier by email but not sure if you got those).

Yes, I'm after the residuals after regressing out technical and biological covariates. I'm doing case/control analyses, and my covariate model was derived from the gene expression counts; it includes biological and technical covariates (e.g. sex, RIN, tissue, 10 sequencing PCs plus 7 SVs).

The idea would be to use the ratios (PSI values) at sample level adjusted for covariates in order to:

  1. examine the effect of regressing out covariates using PCA (i.e. PCA before and after regressing out covariates);
  2. show the adjusted PSI values per sample in the leafviz cluster plots for significantly differentially spliced (DS) intron clusters;
  3. produce heatmaps of adjusted PSI values for DS clusters in different groups, or simply plot the distribution of PSI values for differentially spliced intron clusters of a gene.

Thanks!
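
A before/after PCA check like the one in point 1 could be sketched as follows. This is a toy illustration with made-up data and a single hypothetical `batch` covariate, and it assumes ordinary least-squares residuals rather than the DM-GLM adjustment.

```r
# Toy data (made up): 50 introns x 12 samples with a strong batch effect
set.seed(42)
n_introns <- 50
batch <- rep(c(0, 1), times = 6)
ratios <- matrix(rnorm(n_introns * 12, mean = 0.5, sd = 0.02), nrow = n_introns)
batch_loading <- rnorm(n_introns, sd = 0.1)
ratios <- ratios + outer(batch_loading, batch)

# Residualise each intron on the covariate (ordinary least squares)
design <- cbind(intercept = 1, batch = batch)
adjusted <- t(lm.fit(design, t(ratios))$residuals)

# PCA with samples as observations, before and after adjustment
pca_before <- prcomp(t(ratios), center = TRUE, scale. = FALSE)
pca_after  <- prcomp(t(adjusted), center = TRUE, scale. = FALSE)

# How strongly does PC1 track the covariate in each case?
cor_before <- abs(cor(pca_before$x[, 1], batch))
cor_after  <- abs(cor(pca_after$x[, 1], batch))
```

Plotting `pca_before$x[, 1:2]` and `pca_after$x[, 1:2]` coloured by the covariate gives the visual version of the same comparison.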

davidaknowles commented on August 17, 2024

I see. So this is definitely possible with the existing DM-GLM but will require a little thought about exactly what the calculation should be. I'll prototype something and maybe you (Dalila) can see if it looks reasonable on your data.

davidaknowles commented on August 17, 2024

I've made a branch psi with PSI calculation (with or without adjusting for confounders) under the DM-GLM. This requires an extra optimization step for each cluster and is separate from the differential splicing code. You can install this branch using

devtools::install_github("davidaknowles/leafcutter/leafcutter",ref="psi")

There is a new script scripts/leafcutter_quantify_psi.R which takes the counts file and (optionally) a file with confounders to be removed. I've tested this on 10 v 10 samples from GTEx, pretending tissue is a confounder. Without removing tissue:
[image]
With tissue removed:
[image]

ddpinto commented on August 17, 2024

Great! I'll test this out on my data. Btw I've tested the smart init after regularization and it works for me as well.

davidaknowles commented on August 17, 2024

davidaknowles commented on August 17, 2024

Any update on whether this works for you?

ddpinto commented on August 17, 2024

Everything now runs to completion, although it is hard to test fully since it is hard to know what is correct or incorrect. To make a heatmap, would you suggest including only the most differentially spliced intron per intron cluster?
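
If one intron per cluster is used, the selection could be sketched as below. This is only an illustration on made-up data: the `effects` table and its column names (`intron`, `cluster`, `deltapsi`) are assumptions, not a documented LeafCutter output format, and "most differentially spliced" is taken here to mean the largest absolute change in PSI.

```r
# Toy effect-size table (made up); column names are assumptions
effects <- data.frame(
  intron   = c("chr1:10:20:clu_1", "chr1:10:30:clu_1",
               "chr2:5:9:clu_2", "chr2:5:15:clu_2"),
  cluster  = c("clu_1", "clu_1", "clu_2", "clu_2"),
  deltapsi = c(0.05, -0.20, 0.10, -0.02),
  stringsAsFactors = FALSE
)

# Within each cluster, keep the intron with the largest absolute change
by_cluster <- split(effects, effects$cluster)
top <- do.call(rbind, lapply(by_cluster, function(d) {
  d[which.max(abs(d$deltapsi)), ]
}))

# Toy adjusted PSI matrix (made up), subset to the selected introns
set.seed(7)
psi <- matrix(runif(16), nrow = 4,
              dimnames = list(effects$intron, paste0("s", 1:4)))
heat_input <- psi[top$intron, , drop = FALSE]

# heatmap(heat_input, scale = "row")  # base R heatmap; uncomment to draw
```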

When possible, could you perhaps move the PSI branch to main?

davidaknowles commented on August 17, 2024

ddpinto commented on August 17, 2024

Yes I'm on it, and will get back as soon as possible.

ddpinto commented on August 17, 2024

Hello,
I've had a chance to look at the correction and found it works well for removing confounders. The plot below shows the effect of removing site effects:
[image: correction_pca]

Another thing I played around with is the cluster plots in leafviz. By default it loads the uncorrected data, which means that the PSI values shown don't reflect the data after accounting for confounders. I did some crude hacking of the 'prepare_results' script to feed it the corrected ratio data rather than perind_numers.counts, and this shows that the post-correction data better reflect actual differences:
[image: correction_psi-cluster-plot]

Running the correction as a separate script works well, though it would be nice to have the option in leafviz to visualize the PCA and clusters before and after covariate correction. Not sure how much of a hassle that would be to implement though.

Thanks!

seyoun209 commented on August 17, 2024

Thank you!
