negativeexampleselection's Introduction

README

This repository contains notebooks that demonstrate how the strategy used to select negative examples affects the measured performance of graph representation learning. In this setting, the nodes of a knowledge graph are embedded into a low-dimensional vector space, the node embeddings are combined into edge representations by a transformation such as the Hadamard (element-wise) product, and the edges are then classified with methods such as Random Forest or a perceptron.
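As a rough illustration of the edge-representation step, the sketch below combines two node embeddings with the Hadamard (element-wise) product. It is not the GRAPE implementation: the function name `hadamard_edge_embedding`, the node names, and the three-dimensional toy vectors are assumptions made up for this example.

```python
from typing import Dict, List, Tuple


def hadamard_edge_embedding(src_vec: List[float], dst_vec: List[float]) -> List[float]:
    """Return the element-wise product of two node embeddings."""
    if len(src_vec) != len(dst_vec):
        raise ValueError("Node embeddings must have the same dimension.")
    return [a * b for a, b in zip(src_vec, dst_vec)]


# Toy node embeddings (hypothetical values, not taken from any real graph).
node_embedding: Dict[str, List[float]] = {
    "A": [0.5, -1.0, 2.0],
    "B": [2.0, 3.0, 0.5],
    "C": [-1.0, 0.0, 4.0],
}

edges: List[Tuple[str, str]] = [("A", "B"), ("B", "C")]

# One feature vector per edge; these vectors would be the classifier input.
edge_features = {
    (u, v): hadamard_edge_embedding(node_embedding[u], node_embedding[v])
    for u, v in edges
}
```

Because the element-wise product is commutative, the resulting edge representation does not depend on the direction of the edge, which suits undirected relations.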

Graph operations and machine learning are performed using GRAPE.

Setup

Python 3.8 or later is required. The easiest way to run the notebooks is to set up a virtual environment as follows.

python3 -m venv venv
source venv/bin/activate
pip install pandas numpy grape barplots
pip install networkx # required for the SliNetwork notebook only
pip install powerlaw # required for the SliNetwork notebook only
pip install plotnine # required for some graphics
pip install jupyter
python -m ipykernel install --user --name="venv"

Following this, start jupyter or jupyter-notebook and run any of the notebooks in this directory.

Biological networks

Biomedical knowledge graphs include edges representing various types of relations between biological entities, such as protein-protein interactions, synthetic lethality, or shared biological functions. Because our knowledge about explicit negative edges (pairs of nodes that were demonstrated not to have a specific relation) is limited, negative examples are generally sampled at random from pairs of nodes that are not positively labeled, under the assumption that the majority of unlabeled pairs are negative.

  • SliNetwork: Characterization of the Synthetic Lethality Interaction (SLI) network.
  • KG-IDG Network: Characterization of the KG-IDG (Illuminating the Druggable Genome) graph.
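The random sampling of negative examples described in this section can be sketched as follows. This is a minimal illustration, not the code used in the notebooks: the function name `sample_negative_edges`, the gene-like node names, and the assumption that edges are undirected without self-loops are all choices made for this example.

```python
import random
from itertools import combinations
from typing import Iterable, List, Tuple


def sample_negative_edges(
    nodes: Iterable[str],
    positive_edges: Iterable[Tuple[str, str]],
    n_samples: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Sample node pairs uniformly at random from pairs that are not positively labeled."""
    rng = random.Random(seed)
    # Treat edges as undirected: (u, v) and (v, u) are the same pair.
    positives = {frozenset(edge) for edge in positive_edges}
    # Enumerating all pairs is only feasible for small toy graphs.
    candidates = [
        pair
        for pair in combinations(sorted(nodes), 2)
        if frozenset(pair) not in positives
    ]
    if n_samples > len(candidates):
        raise ValueError("Not enough unlabeled pairs to sample from.")
    return rng.sample(candidates, n_samples)


# Hypothetical toy graph.
nodes = ["g1", "g2", "g3", "g4", "g5"]
positive_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g4")]
negative_edges = sample_negative_edges(nodes, positive_edges, n_samples=3)
```

Every sampled pair is only presumed negative. Some of them may be true but undiscovered relations, which is exactly the assumption stated above.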

Classification by Node Degree

In this experiment, we show that it is possible to obtain respectable classification performance by using solely the degree of both nodes of an edge as features.

  • degreeBias: Classification of the SLI network based on node degree.
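To make the idea of degree-only features concrete, the sketch below computes node degrees from a set of positive edges and turns each candidate pair into a two-element feature vector. It is an illustration under assumptions, not the degreeBias notebook itself: the helper names `node_degrees` and `degree_features` and the toy edges are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def node_degrees(edges: Iterable[Edge]) -> Counter:
    """Count how many edges touch each node."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def degree_features(pairs: Iterable[Edge], degree: Counter) -> List[Tuple[int, int]]:
    """Represent each node pair by the degrees of its two nodes."""
    # Counter returns 0 for nodes that appear in no positive edge.
    return [(degree[u], degree[v]) for u, v in pairs]


# Hypothetical positive edges and one randomly chosen unlabeled pair.
toy_positive_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
toy_negative_pairs = [("g4", "g5")]

degree = node_degrees(toy_positive_edges)
positive_features = degree_features(toy_positive_edges, degree)
negative_features = degree_features(toy_negative_pairs, degree)
```

In this toy graph the positive pairs involve higher-degree nodes than the randomly chosen pair, so a classifier could separate them on degree alone without learning anything about the underlying relation.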

Running classification with node-based and edge-based sampling

We ran one analysis on a SLURM cluster that compared different classifiers. The SLURM script and the Python script we used are provided in the scripts subdirectory. Running these scripts generates a folder called experiments that contains all results of the analysis. The extractingResults notebook can be used to ingest these files and calculate the mean and standard deviation for the various models and parameters.

We have copied the output of the extractingResults notebook into the results folder. R scripts are provided to generate several of the figures in the manuscript.
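The aggregation step (mean and standard deviation per model and parameter setting) might look roughly like the sketch below. It is not the extractingResults notebook: the record keys `model`, `sampling`, and `auroc`, the labels `model_a` and `model_b`, and the numeric values are placeholders invented for illustration, not results from the experiments.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize(records: List[dict], metric: str) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Group records by (model, sampling) and return (mean, sample std) of a metric."""
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for record in records:
        grouped[(record["model"], record["sampling"])].append(record[metric])
    return {
        # stdev needs at least two values; report 0.0 for a single run.
        key: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for key, values in grouped.items()
    }


# Placeholder records standing in for per-run result files.
records = [
    {"model": "model_a", "sampling": "node", "auroc": 0.5},
    {"model": "model_a", "sampling": "node", "auroc": 0.75},
    {"model": "model_b", "sampling": "edge", "auroc": 0.25},
]

summary = summarize(records, metric="auroc")
```

Grouping on a tuple key keeps the sketch dependency-free; with pandas installed, a `groupby` over the same columns followed by `agg(["mean", "std"])` would do the same job.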

Model parameters

The learning algorithms used for this analysis are implemented in GRAPE and were run with their default parameters. The following script creates a table with these parameters for convenience.

Contributors

pnrobinson, lucacappelletti94
