PathLink

This repository contains my third-year project research.

Video Explanation and Demonstration

Motivation

A significant amount of cancer data is available, but it is spread across different formats and sources.

This is workable for manual lookups, but even then, drawing conclusions from the data is time-consuming.

We need to be able to use data from the following sources in a single place:

  • KEGG
  • Pathway Commons
  • CIViCDB
  • ClinVar
  • UniProt
  • STRING

The purpose of this project is to provide an API and graphical interface to query cancer-related bioinformatic data.

Outcomes

The API and GUI provide many ways to get data on genes, drugs, pathways and their interactions.

Importantly, it can output the Cypher queries necessary to produce a comprehensive pathway graph in Neo4j. The outputs look something like this:

Neo4j Graph

These graphs can then be traversed with algorithms to reason about cancer systems.
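
One simple example of such a traversal is a breadth-first search for the shortest interaction chain between two genes. The sketch below runs on a plain in-memory adjacency map rather than on Neo4j; the edges are illustrative only, and `shortest_chain` is a hypothetical helper.

```python
from collections import deque


def shortest_chain(graph, start, goal):
    """Return the shortest path from start to goal as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Illustrative directed edges, not real pathway data from the system.
graph = {
    "EGFR": ["KRAS", "PIK3CA"],
    "KRAS": ["BRAF"],
    "BRAF": ["MAP2K1"],
    "PIK3CA": ["AKT1"],
}
print(shortest_chain(graph, "EGFR", "MAP2K1"))
```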

Structure

preliminaries

These are the files and programs I used for my preliminary research into the biological pathways domain, prior to the full system design.

src

This is the cumulative result of the work so far with a frontend Flask webserver and a REST API.

This frontend accepts structured queries fulfilling the criteria set out in the report.

For more information about running the system, consult src/README.md.

scripts

This contains any scripts used to test the system as a client.

Implementation Notes

Some raw datasets need to be pulled into an operational MongoDB database in advance:

  • KEGG
  • CIViCDB
  • ClinVar

This is because their APIs are either unacceptably slow or don't provide the functionality we need.

The rest of the data comes live from the API endpoints of the respective services.
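
The split between preloaded and live sources can be pictured as a small routing step. This is a minimal sketch under assumptions, not the project's code: the `lookup` helper, the use of source names as collection names, and the `FakeCollection` stand-in are all hypothetical. The only database call it relies on is `find_one(filter)`, which pymongo collections provide.

```python
PRELOADED_SOURCES = {"KEGG", "CIViCDB", "ClinVar"}


def lookup(source, query, mongo_db, live_clients):
    """Answer a query from MongoDB if the source is preloaded, else live.

    mongo_db is anything indexable by collection name whose collections
    offer find_one(filter), such as a pymongo Database.
    live_clients maps a source name to a callable that queries its API.
    """
    if source in PRELOADED_SOURCES:
        return mongo_db[source].find_one(query)
    return live_clients[source](query)


class FakeCollection:
    """In-memory stand-in for a MongoDB collection, for demonstration."""

    def __init__(self, documents):
        self.documents = documents

    def find_one(self, query):
        for document in self.documents:
            if all(document.get(k) == v for k, v in query.items()):
                return document
        return None


mongo_db = {"KEGG": FakeCollection([{"id": "hsa05200", "name": "Pathways in cancer"}])}
live_clients = {"UniProt": lambda query: {"source": "live", "query": query}}

print(lookup("KEGG", {"id": "hsa05200"}, mongo_db, live_clients))
print(lookup("UniProt", {"gene": "TP53"}, mongo_db, live_clients))
```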

Setup

There is a small amount of preliminary setup required.

  1. Install dependencies

    This project uses quite a few Python 3 modules, so install them as follows:

    pip3 install -r src/requirements.txt
  2. Install databases

    The project uses MongoDB and Neo4j, so ensure that accessible instances of both are available.

    Once these are running, retrieve the information referred to in src/server_config and paste it into the relevant locations.

  3. Populate cache database

    The project uses MongoDB to cache slow datasets. Fill the cache by running:

    cd scripts/db_management
    ./populate_mongo.sh
    
  4. Run

    The system can now run. To start it, make sure you are in the System directory and execute ./run.
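
Before starting the system, it can help to confirm that both databases accept connections. The sketch below only checks that a TCP port is open; it assumes local instances on the default ports (27017 for MongoDB, 7687 for Neo4j's Bolt protocol), which may differ from the values in src/server_config. `is_reachable` is a hypothetical helper, not part of the project.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports; adjust to match the values in src/server_config.
SERVICES = {"MongoDB": ("127.0.0.1", 27017), "Neo4j (Bolt)": ("127.0.0.1", 7687)}

for name, (host, port) in SERVICES.items():
    status = "reachable" if is_reachable(host, port) else "NOT reachable"
    print(f"{name} at {host}:{port} is {status}")
```

An open port does not prove that credentials are valid, so this is only a first sanity check.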

Usage

  • The system has a frontend available via localhost (usually http://127.0.0.1/). It offers a comprehensive interface for simple queries.

  • The API has a total of 7 endpoints (all GET):

    • /api/pathway - All data on a pathway identifier / name.
    • /api/gene - All data on a gene identifier / name.
    • /api/variant - All data on a gene variant.
    • /api/pathway_gene_interaction - All interaction data between a gene and pathway.
    • /api/variant_evidence - All CIViC/ClinVar evidence of a gene variant.
    • /api/functional_enrichment - Get the functional network of a protein.
    • /api/pathway_to_cypher - Get the Cypher queries to add a pathway to a Neo4j graph database.
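
A client for these endpoints could be sketched as follows, using only the standard library. The endpoint names come from the list above; the query parameter name (`name`), the base URL, and the assumption that responses are JSON are all placeholders, so consult src/README.md for the real interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINTS = {
    "pathway", "gene", "variant", "pathway_gene_interaction",
    "variant_evidence", "functional_enrichment", "pathway_to_cypher",
}


def build_url(endpoint, base_url="http://127.0.0.1", **params):
    """Build the GET URL for one of the seven endpoints listed above."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url}/api/{endpoint}{query}"


def fetch_json(endpoint, **params):
    """Send the GET request and decode the body, assuming it is JSON."""
    with urlopen(build_url(endpoint, **params), timeout=30) as response:
        return json.load(response)


# "name" is a placeholder parameter; check the real names before use.
print(build_url("gene", name="TP53"))
```

With the server running, `fetch_json("gene", name="TP53")` would send the request, provided the placeholder parameter matches what the API expects.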

Contributors

ol-th
