nedaghohabi / colorectal_cancer_subsetting

This project forked from collaborativebioinformatics/disease_subsetting

Colorectal cancer CMSs subtyping and personalized medication recommendation based on RNA seq pathway analysis

License: MIT License

Shell 0.37% Jupyter Notebook 98.96% R 0.67%
pathway-analysis cancer-subtype rna-seq-analysis drug-discovery rna-seq-pipeline

Disease subsetting and personalized medication recommendation

Contributors

David Enoma

Jędrzej Kubica

Neda Ghohabi

Magdalena Molenda

Rissy M. Wesonga

Jesujoba Owolabi

Hyonyoung Shin

What kind of problem are we trying to solve?

The consensus molecular subtypes (CMS) system classifies colorectal cancer into four molecular subtypes with distinct biological characteristics, which may form the basis for clinical stratification and subtype-based targeted intervention. We aim to develop a tool that uses this classification to automate drug recommendation for patients with a specific molecular subtype of colorectal cancer.

What does our tool do?

Our tool performs RNA-seq pathway analysis on CMS-annotated colorectal cancer data, searching for differentially expressed (over- or under-expressed) pathways that are associated with the different subtypes of colorectal cancer. The final result of that analysis is visualized in an R Shiny dashboard.

The analysis result is then used to derive drug recommendations: the enzymes involved in those pathways are identified, and their inhibitors or activators are looked up. Overall, the workflow links the colorectal cancer subtype of a specific patient to a drug that could be used to treat the disease, which makes the drug recommendation personalized.

Overall pipeline

Flowchart

Pathway analysis

Pipeline steps

  1. Download CMS SRA data from NCBI using SRA toolkit: (https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=3&WebEnv=MCID_622b744a94b9b522d227f612&f=consensus_molecular_subtype_sam_s%3An%3Ac&o=acc_s%3Aa)

    Run Selector query:

    • Colorectal cancer
    • Homo sapiens
    • RNA

    In the Run Selector, select:

    • consensus_molecular_subtype
    • cms1, cms2, cms3 and cms4
  2. Extract .fastq files from .sra files using fastq-dump

  3. Read mapping using hisat2:

    • hisat2 index for GRCh38 reference genome (NCBI)
    • hisat2 mapper for fastq files
  4. Pathway analysis with pathway_visualisation.R

  5. TODO Parsing and radar-plotting of pathway_visualisation.R results (.xml files)
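
Steps 1 to 3 can be sketched as a small command builder. This is only an illustration, not the contents of run_pipeline.sh: the function name, the accession placeholder, the output layout, and the choice of paired-end reads are all assumptions, and the commands are only assembled here, not executed.

```python
from pathlib import Path


def build_pipeline_commands(accession, index_prefix, out_dir="data",
                            paired=True, threads=4):
    """Assemble the download, extraction and mapping commands for one SRA run.

    Hypothetical helper: names and directory layout are assumptions,
    not taken from the project's run_pipeline.sh.
    """
    out = Path(out_dir)
    # Assumption: prefetch stores the run as <out_dir>/<accession>/<accession>.sra
    sra_file = out / accession / f"{accession}.sra"

    # Step 1: download the .sra file with the SRA Toolkit
    prefetch = ["prefetch", accession, "--output-directory", str(out)]

    # Step 2: extract FASTQ; --split-files writes one file per mate for paired runs
    fastq_dump = ["fastq-dump", "--outdir", str(out)]
    if paired:
        fastq_dump.append("--split-files")
    fastq_dump.append(str(sra_file))

    # Step 3: map reads against a prebuilt HISAT2 index (e.g. GRCh38)
    hisat2 = ["hisat2", "-p", str(threads), "-x", index_prefix]
    if paired:
        hisat2 += ["-1", str(out / f"{accession}_1.fastq"),
                   "-2", str(out / f"{accession}_2.fastq")]
    else:
        hisat2 += ["-U", str(out / f"{accession}.fastq")]
    hisat2 += ["-S", str(out / f"{accession}.sam")]

    return [prefetch, fastq_dump, hisat2]


if __name__ == "__main__":
    # "SRR0000001" is a placeholder accession, not a run from this study
    for cmd in build_pipeline_commands("SRR0000001", "grch38/genome"):
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) once the tools are installed. Building the commands separately from running them makes it easy to review them before starting a long download.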

Visualisation of pathway analysis done with demo data

target_pathway_figure_example

TODO

Radar plot example:

radar_plot_ex

Treatment recommendation

Key features for clinicians and clinical researchers

  • KEGG and Drugmonizome API-based applet to fetch all drugs targeting an input gene/pathway in a given disease pathway diagram (e.g. colorectal cancer)
  • Drugs are ranked by associative strength, currently defined as the number of associated nodes; other definitions, such as the importance of nodes or their representation in the literature, could be used instead
  • Pathway visualization linked with the drug suggestions

Method

  1. Using the Drugmonizome API, the 'L1000FWD Upregulated KEGG Pathways' and 'L1000FWD Downregulated KEGG Pathways' datasets are downloaded to capture expression changes in both directions.

    df_upregulated = parse_gmt_to_df('L1000FWD Upregulated KEGG Pathways/L1000FWD_KEGG_Pathways_drugsetlibrary_up.gmt')
    df_downregulated = parse_gmt_to_df('L1000FWD Downregulated KEGG Pathways/L1000FWD_KEGG_Pathways_drugsetlibrary_down.gmt')
  2. Filter small molecules/drugs registered with a particular disease (in this case, 'colorectal cancer'):

    df_upregulated = df_upregulated.loc[df_upregulated['Colorectal cancer'] == 1]
    df_downregulated = df_downregulated.loc[df_downregulated['Colorectal cancer'] == 1]
  3. suggest_drugs(search_nodes) takes a list of target nodes in the colorectal cancer pathway (e.g. shown in the figure below) and outputs a list of all colorectal cancer drugs detected in the Drugmonizome database that match the input.
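
The body of parse_gmt_to_df is not shown in this README. A minimal sketch of what such a parser might look like follows, assuming the standard GMT layout (set name, description, then tab-separated members) with each set being a term and each member a drug. The helper parse_gmt_lines and the toy drug names are hypothetical and exist only for illustration.

```python
import pandas as pd


def parse_gmt_lines(lines):
    """Turn GMT lines into a binary drug x term membership matrix.

    Assumed layout per line: term <TAB> description <TAB> drug1 <TAB> drug2 ...
    """
    membership = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        term = fields[0]
        drugs = sorted({d for d in fields[2:] if d})
        # One column per term, with a 1 for every drug in that set
        membership[term] = pd.Series(1, index=drugs)
    # Columns are aligned on the union of drugs; absent pairs become 0
    return pd.DataFrame(membership).fillna(0).astype(int)


def parse_gmt_to_df(path):
    """File-based wrapper with the same call shape as used in step 1."""
    with open(path, encoding="utf-8") as handle:
        return parse_gmt_lines(handle)


if __name__ == "__main__":
    # Toy input, not real Drugmonizome data
    toy = [
        "Colorectal cancer\t\tdrug_a\tdrug_b\n",
        "MAPK signaling pathway\t\tdrug_b\tdrug_c\n",
    ]
    df = parse_gmt_lines(toy)
    print(df.loc[df["Colorectal cancer"] == 1])
```

With this layout, the filter in step 2 keeps exactly the rows (drugs) that belong to the 'Colorectal cancer' set.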

overexpression_targets

Example output:

Target: ["ERK", "EGF", "TGFA", "EREG", "AREG", "EGFR", "KRAS", "NRAS", "PI3K", "RalGDS"]
Your top 5 suggested drugs:
pazopanib
which targets:
['KRAS', 'NRAS', 'PERK-mediated unfolded protein response (GO:0036499)', 'regulation of ERK1 and ERK2 cascade (GO:0070372)', 'VEGFA', 'AREG', 'positive regulation of ERK1 and ERK2 cascade (GO:0070374)', 'EGF', 'VEGFB', 'VEGFR inhibitor', 'EGFR', 'TGFA']
ym-155
which targets:
['ERK1 and ERK2 cascade (GO:0070371)', 'HBEGF', 'negative regulation of ERK1 and ERK2 cascade (GO:0070373)', 'regulation of ERK1 and ERK2 cascade (GO:0070372)', 'AREG', 'VEGFA', 'PI3K-Akt signaling pathway', 'EREG', 'CERK', 'EGFR', 'TGFA']
saracatinib
which targets:
['HBEGF', 'VEGFC', 'KRAS', 'NRAS', 'AREG', 'VEGFA', 'PI3K-Akt signaling pathway', 'EGF', 'EREG', 'EGFR', 'TGFA']
prostratin
which targets:
['HBEGF', 'VEGFC', 'regulation of ERK1 and ERK2 cascade (GO:0070372)', 'AREG', 'VEGFA', 'PI3K-Akt signaling pathway', 'positive regulation of ERK1 and ERK2 cascade (GO:0070374)', 'EREG', 'MEGF9', 'EGFR', 'TGFA']
irinotecan
which targets:
['HBEGF', 'KRAS', 'NRAS', 'regulation of ERK1 and ERK2 cascade (GO:0070372)', 'AREG', 'VEGFA', 'PI3K-Akt signaling pathway', 'EREG', 'EGFR', 'TGFA']

drug_list
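
The ranking by associated node count could be sketched as below. This is an illustration under assumptions, not the project's suggest_drugs: it takes the drug matrix as an explicit argument, and it matches search nodes to attribute names by substring, which is inferred from the example output (for instance 'VEGFA' and 'HBEGF' appearing for the target "EGF"). The toy matrix is made up.

```python
import pandas as pd


def suggest_drugs_sketch(search_nodes, drug_matrix, top_n=5):
    """Rank drugs by how many matching attributes they are associated with.

    drug_matrix: binary DataFrame, rows = drugs, columns = attribute names.
    Matching rule (assumption): an attribute matches if any search node
    occurs in its name as a substring.
    """
    matching = [col for col in drug_matrix.columns
                if any(node in col for node in search_nodes)]
    hits = drug_matrix[matching]
    counts = hits.sum(axis=1)
    # Keep drugs with at least one hit, highest count first
    ranked = (counts[counts > 0]
              .sort_values(ascending=False, kind="stable")
              .head(top_n))
    return [(drug, [col for col in matching if hits.loc[drug, col] == 1])
            for drug in ranked.index]


if __name__ == "__main__":
    # Toy matrix, not real Drugmonizome data
    toy = pd.DataFrame(
        {"EGFR": [1, 1, 0], "KRAS": [1, 0, 0],
         "VEGFA": [1, 1, 0], "TP53": [0, 0, 1]},
        index=["drug_a", "drug_b", "drug_c"],
    )
    for drug, targets in suggest_drugs_sketch(["EGF", "KRAS"], toy):
        print(drug)
        print("which targets:")
        print(targets)
```

Substring matching is permissive: it pulls in unrelated names that merely contain the query, so an exact or alias-based match may be preferable where precision matters.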

Pipeline implementation on DNAnexus

All the code that was run on a DNAnexus VM has been moved to the scripts/ directory on GitHub.

SRA toolkit installation, fastq-dump

run_pipeline.sh

pathway_visualisation.R

It was run using RStudio. (How to run RStudio on DNAnexus?)

The scripts can be run using Jupyter Notebook (DXJupyterLab).

Future steps

  • An interface to easily access information about each drug

Installation & software requirements

To run the pathway analysis pipeline (run_pipeline.sh), the following tools need to be installed:

SRA-toolkit, fastq-dump, hisat2_index, hisat2_mapper

The pathway visualization tool (pathway_visualisation.R) depends on the following R packages:

pathview, gage, gageData, GenomicAlignments, TxDb.Hsapiens.UCSC.hg19.knownGene

To run the drug recommendation pipeline, the following Python modules are required: urllib, json, and logging (all part of the standard library), plus pandas, numpy, and scipy, which need to be installed.
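
A quick way to check these requirements before running anything is sketched below. The executable names are assumptions: the tool list above mentions hisat2_index and hisat2_mapper, which are taken here to correspond to the hisat2-build and hisat2 commands, and prefetch is assumed as the SRA Toolkit download command.

```python
import importlib.util
import shutil

# Assumed executable names; adjust to match the local installation
PIPELINE_TOOLS = ["prefetch", "fastq-dump", "hisat2-build", "hisat2"]
PYTHON_MODULES = ["urllib", "json", "logging", "pandas", "numpy", "scipy"]


def missing_tools(executables):
    """Return the executables that cannot be found on PATH."""
    return [exe for exe in executables if shutil.which(exe) is None]


def missing_modules(modules):
    """Return the top-level Python modules that cannot be imported."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing command-line tools:", missing_tools(PIPELINE_TOOLS))
    print("Missing Python modules:", missing_modules(PYTHON_MODULES))
```

The R packages are not covered by this check and would need to be verified from within R.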
