
sc_target_evidence's Introduction

Drug target evidence in single-cell data

Meta-analysis of drug target evidence in single-cell data

Contents

  • analysis - Notebooks and scripts for analysis
  • data - Metadata and output files (see Data Pointers)
  • src - Main workflow scripts and functions
    • sc_target_evidence_utils - Python package with utility functions
  • tests - Unit tests for utility functions

Set-up

# Make conda env (see also sc-target-evidence-env.yml)
conda create --name sc-target-evidence-env python pip
conda activate sc-target-evidence-env

# Install R dependencies (for DE analysis)
conda install conda-forge::r-base==4.0.5 
ENVPATH="${CONDA_PREFIX}" # path to the active conda environment (set by conda activate)
Rscript --vanilla -e "install.packages(c('BiocManager'), repos='http://cran.us.r-project.org', lib='${ENVPATH}/lib/R/library'); library('BiocManager'); BiocManager::install('glmGamPoi', lib='${ENVPATH}/lib/R/library')"
Rscript --vanilla -e "install.packages(c('tidyverse'), repos='http://cran.us.r-project.org', lib='${ENVPATH}/lib/R/library')"
Rscript --vanilla -e "library('BiocManager'); BiocManager::install('scater', lib='${ENVPATH}/lib/R/library')"

# Install the utils package (sc_target_evidence_utils)
pip install .

Data pointers

Additional processed data is available via Figshare (doi:10.6084/m9.figshare.25360129)

Metadata

scRNA-seq data

  • [see figshare] cxg_aggregated_scRNA.tar.gz - AnnData objects of aggregated scRNA-seq data used for DE analysis for each disease. Gene expression counts are aggregated by sample and cell type annotation.
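Aggregating counts by sample and cell type (pseudobulking) amounts to summing the count vectors of all cells that share the same (sample, cell type) pair. A minimal stdlib sketch, assuming a dense cells-by-genes count matrix held as a list of lists; the function name `pseudobulk` is hypothetical and is not the repository's `anndata2pseudobulk`, which works on AnnData objects.

```python
from collections import defaultdict


def pseudobulk(counts, samples, cell_types):
    """Sum per-cell gene counts within each (sample, cell type) group.

    counts: list of per-cell count vectors (cells x genes), all the same length.
    samples, cell_types: per-cell labels aligned with the rows of counts.
    Returns a dict mapping (sample, cell_type) -> (summed counts, number of cells).
    """
    if not (len(counts) == len(samples) == len(cell_types)):
        raise ValueError("counts, samples and cell_types must have the same length")
    n_genes = len(counts[0]) if counts else 0
    sums = defaultdict(lambda: [0] * n_genes)
    n_cells = defaultdict(int)
    for row, sample, cell_type in zip(counts, samples, cell_types):
        key = (sample, cell_type)
        acc = sums[key]
        for j, value in enumerate(row):
            acc[j] += value  # add this cell's counts to its group total
        n_cells[key] += 1
    return {key: (sums[key], n_cells[key]) for key in sums}


# Toy example: 4 cells, 3 genes, 2 donors
counts = [[1, 0, 2], [0, 3, 1], [4, 1, 0], [2, 2, 2]]
samples = ["donor1", "donor1", "donor1", "donor2"]
cell_types = ["T cell", "T cell", "B cell", "T cell"]
result = pseudobulk(counts, samples, cell_types)
print(result[("donor1", "T cell")])
```

Keeping the number of cells alongside each summed vector makes it easy to drop pseudobulks built from too few cells before DE analysis.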

Diagnostic plots

Plot folders: sc_target_evidence/data/plots/{disease_id}_{disease_name}

  • cellxgene_{disease_id}.celltype_harmonization.* - confusion table of original cell ontology annotations vs harmonized ontology annotations. The heatmap color and the number in each cell denote the number of cells in each category.
  • cellxgene_targets_{disease_id}.n_cells_boxplot.* - boxplot of the number of cells per sample and cell type in healthy and disease tissue
  • cellxgene_targets_{disease_id}.target_expression.* - heatmap of log-normalized expression of a sample of drug targets for the disease
  • cellxgene_targets_{disease_id}.celltype_distribution.* - confusion table of the harmonized cell type ontology terms assigned to each donor, used to check differences in cell type distribution across donors, datasets and diseases
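Both confusion tables are cross-tabulations of two per-cell labels. A minimal sketch, assuming the two labelings are available as aligned lists; the function name `confusion_table` and the example labels are hypothetical.

```python
from collections import Counter


def confusion_table(row_labels, col_labels):
    """Count cells for every (row label, column label) pair.

    Returns (rows, cols, matrix) where matrix[i][j] is the number of cells
    with row label rows[i] and column label cols[j].
    """
    if len(row_labels) != len(col_labels):
        raise ValueError("label lists must be aligned")
    pair_counts = Counter(zip(row_labels, col_labels))
    rows = sorted(set(row_labels))
    cols = sorted(set(col_labels))
    matrix = [[pair_counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix


original = ["CD4-positive T cell", "CD8-positive T cell", "naive B cell", "CD4-positive T cell"]
harmonized = ["T cell", "T cell", "B cell", "T cell"]
rows, cols, matrix = confusion_table(original, harmonized)
for label, row in zip(rows, matrix):
    print(label, row)
```

Passing donor IDs as the first argument instead of the original annotations gives the per-donor cell type distribution table.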

Analysis outputs

  • [see figshare] DEA_results.tar.gz - Results of differential expression analysis for each disease
  • suppl_table_disease_target_evidence.csv - merged table of target-disease pairs with clinical status from OpenTargets, genetic evidence from OpenTargets and single-cell evidence from DE analysis, for all tested diseases.
  • suppl_table_drugs.csv - merged table of drugs considered for analysis (investigational or approved drugs for analysed diseases).
  • suppl_table_odds_ratios.all.csv - Results of association analysis between omic support (evidence) and clinical success (clinical status) across diseases
  • suppl_table_odds_ratios.disease.csv - Results of association analysis between omic support (evidence) and clinical success (clinical status) by disease
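An odds ratio of this kind summarizes a 2x2 table of target-disease pairs: with or without omic support, and with or without clinical success. The README does not state how the odds ratios or their confidence intervals are computed, so the sketch below assumes the standard sample odds ratio with a Wald 95% interval on the log scale; the function name and counts are hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Sample odds ratio and Wald confidence interval for a 2x2 table.

                      success   no success
    supported            a          b
    not supported        c          d

    Assumes all four cells are > 0 (no continuity correction is applied).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio(a=30, b=70, c=100, d=800)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

For the per-disease table, the same computation would be repeated on each disease's own 2x2 table.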

sc_target_evidence's People

Contributors

emdann, erteeple


sc_target_evidence's Issues

Toxicology/safety evidence

The idea is to compare the target's expression in the disease-relevant tissue with its expression in the usual suspects for side effects (blood, liver, heart) or in surrounding organs.

The problem is defining what counts as a surrounding organ.
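One way to make the comparison concrete is a ratio between the target's expression in the disease-relevant tissue and its highest expression among the tissues flagged for safety. This is a hypothetical sketch, not something implemented in the repository: the function name, tissue names and values are illustrative, expression is assumed to be on a linear scale, and the list of safety tissues is left to the caller because defining the surrounding organs is the open problem.

```python
def safety_margin(expression, target_tissue, safety_tissues):
    """Ratio of target-tissue expression to the maximum across safety tissues.

    expression: dict mapping tissue -> mean expression of one gene (linear scale).
    Values above 1 mean the gene is expressed more in the target tissue than
    in any of the safety tissues considered.
    """
    others = [expression[t] for t in safety_tissues if t != target_tissue]
    if not others:
        raise ValueError("need at least one safety tissue other than the target tissue")
    max_other = max(others)
    if max_other == 0:
        return float("inf")  # not expressed in any safety tissue
    return expression[target_tissue] / max_other


expr = {"lung": 2.4, "blood": 0.3, "liver": 0.8, "heart": 0.6}
print(safety_margin(expr, "lung", ["blood", "liver", "heart"]))
```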

API structure

  • cellontology_utils - functions to handle cell ontologies
  • preprocessing_utils - e.g. anndata2pseudobulk, adding similarity between cell types to the pseudobulk objects, functions to save and read pseudobulk objects?
  • cxg_utils - functions to download data and metadata using the cxg (CELLxGENE) Census
  • plotting_utils - functions to make diagnostic plots
  • de_utils - functions for DE analysis (cleaning data, running DE analysis with different regimes, saving outputs)
  • sc_evidence - transform DE analysis results into evidence for drug targets
  • opentargets_utils - functions to clean OT datasets

Plot diagnostics for DE analysis

  • Volcano plots
  • ECDF of the number of significant cell types, for each lfcThreshold
  • Expression of genes that are significant in a large number of cell types for the cell type specificity evidence
  • Number of significant cell types
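The ECDF and the number of significant cell types both summarize, for each gene, how many cell types it is called significant in at a given log-fold-change threshold. A minimal sketch, assuming DE results as (gene, cell type, lfc, padj) tuples; the tuple layout, the 0.1 FDR cut-off and the function names are assumptions, not the pipeline's settings. Note that it filters on the estimated fold change after testing, which is not the same as re-testing against an lfcThreshold.

```python
from collections import Counter


def n_significant_celltypes(results, lfc_threshold, alpha=0.1):
    """Count, per gene, the cell types with padj < alpha and |lfc| >= lfc_threshold.

    results: iterable of (gene, cell_type, lfc, padj) tuples.
    Genes with no significant cell type are absent from the returned Counter.
    """
    counts = Counter()
    for gene, cell_type, lfc, padj in results:
        if padj < alpha and abs(lfc) >= lfc_threshold:
            counts[gene] += 1
    return counts


def ecdf(values):
    """Return distinct sorted values and the fraction of observations <= each."""
    counts = Counter(values)
    n = sum(counts.values())
    distinct, fractions, cum = sorted(counts), [], 0
    for v in distinct:
        cum += counts[v]
        fractions.append(cum / n)
    return distinct, fractions


results = [
    ("GENE1", "T cell", 1.5, 0.01),
    ("GENE1", "B cell", 0.4, 0.02),
    ("GENE2", "T cell", -2.0, 0.001),
    ("GENE2", "macrophage", 0.1, 0.5),
]
for thr in (0.0, 0.5, 1.0):
    per_gene = n_significant_celltypes(results, thr)
    print(thr, dict(per_gene), ecdf(per_gene.values()))
```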

Targeted analysis of lung diseases

  • Compare 2 diseases with and without genetic evidence (e.g. COPD and pulmonary fibrosis) - which successful targets are prioritized by single-cell evidence?
  • Why is there no genetic evidence for some of these diseases? Is it lack of GWAS studies or no significant hits?
  • Are targets with sc evidence associated with GWAS variants for lung function?

Fine vs coarse cell annotation on lung dataset

Compare targets with single-cell evidence obtained with coarse vs fine harmonized cell annotations on lung samples/diseases, comparing ontology-based coarse annotations with annotations from the Extended Lung cell atlas.

  • Are the same genes identified as cell type markers and disease cell type markers?
  • Are more successful targets identified when testing on fine grained annotation?
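Both questions reduce to comparing the gene sets obtained under the two annotations. A minimal sketch using counts of shared and annotation-specific genes plus the Jaccard index; the gene symbols are placeholders and the choice of Jaccard as the overlap measure is an assumption.

```python
def compare_gene_sets(coarse, fine):
    """Summarize the overlap between genes found with coarse vs fine annotations."""
    coarse, fine = set(coarse), set(fine)
    union = coarse | fine
    return {
        "shared": len(coarse & fine),
        "coarse_only": len(coarse - fine),
        "fine_only": len(fine - coarse),
        "jaccard": len(coarse & fine) / len(union) if union else 0.0,
    }


coarse_hits = ["GENE_A", "GENE_B", "GENE_C"]
fine_hits = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]
print(compare_gene_sets(coarse_hits, fine_hits))
```

For the second question, each set could first be intersected with the list of successful targets before comparing.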

Fix download of problematic stomach / brain files

Downloading the brain files fails with insufficient memory, even when requesting 700 GB of RAM. Downloading the stomach files from cxg hangs indefinitely; the problem appears to be on the Census side.

Try downloading the datasets directly from the cellxgene website or sfaira, then filtering and pseudobulking.

Low quality annotation diseases

Several diseases are excluded from the DE in disease analysis because no disease pseudobulk is left after filtering out low-quality annotations, although the previous version of the pipeline found cell types to compare:

To check

MONDO_0005575
MONDO_0006156
MONDO_0006249
MONDO_0024660
MONDO_0024661
MONDO_0024885
MONDO_0001056 # gastric cancer
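A quick way to list the affected diseases is to apply the quality filter and check, per disease, whether any pseudobulk from disease tissue survives. A minimal sketch; the record layout, the `low_quality` flag and the `min_cells` threshold are hypothetical stand-ins for the pipeline's actual filtering criteria.

```python
from collections import defaultdict


def diseases_without_disease_pseudobulks(pseudobulks, min_cells=10):
    """Return diseases with no disease-tissue pseudobulk left after filtering.

    pseudobulks: iterable of dicts with keys 'disease', 'condition'
    ('disease' or 'healthy'), 'n_cells' (int) and 'low_quality' (bool).
    """
    kept = defaultdict(int)
    seen = set()
    for pb in pseudobulks:
        seen.add(pb["disease"])
        passes = not pb["low_quality"] and pb["n_cells"] >= min_cells
        if passes and pb["condition"] == "disease":
            kept[pb["disease"]] += 1
    return sorted(d for d in seen if kept[d] == 0)


pbs = [
    {"disease": "disease_A", "condition": "disease", "n_cells": 50, "low_quality": True},
    {"disease": "disease_A", "condition": "healthy", "n_cells": 80, "low_quality": False},
    {"disease": "disease_B", "condition": "disease", "n_cells": 40, "low_quality": False},
]
print(diseases_without_disease_pseudobulks(pbs))
```

Running the same check with the old and the new filter settings would show which filtering step removes the disease pseudobulks.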

Checking the cell annotations

Check output plots for each disease in /nfs/team205/ed6/bin/sc_target_evidence/data/plots/ (or here). What each figure shows is briefly explained in DATA_INFO.

  • Check whether we need to blacklist more terms - e.g. lymphocyte
  • Check whether we need to exclude certain diseases
  • Check expression of correct marker genes - the full list of cell ontology terms used across diseases is here. See DATA_INFO for location of .h5ad files.
  • Flag anything else that looks odd

How to obtain the original unaggregated cxg dataset

Thanks for this amazing work. I was wondering how I can get the original cxg dataset. I tried to use "sc_target_evidence/src/process_sc_data.py", but it seems that "TargetDiseasePairs_OpenTargets_cellXgeneID_12072023.csv" is missing.
