
ipfmultimodal

Multimodal feature learning framework for disease biomarker discovery


This repository contains the implementation of our feature learning algorithm, which is part of a manuscript submitted to IEEE BIBM 2022.

Requirements

All requirements are listed in requirements.txt and can be installed together with the following pip command:

pip install -r requirements.txt

Data

This repository provides all the raw data and a sample of the processed data used for training the model.

  • data/raw: Contains the raw (unprocessed) data, including all the nodes and edges of the heterogeneous knowledge graph (KG).
  • data/processed: Contains a sample of the processed train/val/test data that can be used to train our multimodal algorithm.

Training

The following scripts train the model on the IPF-specific data provided in the data directory.
They can also be used to train on data for other diseases by substituting the new input files.
Refer to the Jupyter notebook process_data.ipynb to generate the processed data files.

First, use the script pre_train_kg_attn.py for unsupervised training on the KG input, with the syntax below.

python pre_train_kg_attn.py \
        --dim_size 32 \
        --num_neg 10.0 \
        --num_epochs 100 \
        --prop_val 0.2 \
        --prop_test 0.2 \
        --patience 10 \
        --outdir gold
  • --dim_size: Hidden dimension of the KG encoder, i.e., the dimensionality of the final trained features.
  • --num_neg: Number of negative edges sampled per positive edge for the link-prediction objective.
  • --prop_val and --prop_test: Proportions of the input KG edges held out for validation and testing, respectively.
  • --num_epochs: Maximum number of training epochs.
  • --patience: Tolerance (patience) value used by the early-stopping mechanism.
  • --outdir: The IPF network type, indicating which set of PPI edges is used for the KG.
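
To make the flags above more concrete, here is a minimal sketch of what an edge split, negative sampling, and early stopping could look like. It is not taken from pre_train_kg_attn.py: the names split_edges, sample_negatives, and EarlyStopper are hypothetical, negatives are drawn uniformly at random, and patience is assumed to count consecutive epochs without an improvement in validation loss.

```python
import random


def split_edges(edges, prop_val=0.2, prop_test=0.2, seed=0):
    """Shuffle edges and split them into train/val/test lists (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * prop_val)
    n_test = int(len(shuffled) * prop_test)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def sample_negatives(pos_edges, num_nodes, num_neg=10, forbidden=None, seed=0):
    """Draw num_neg random non-edges per positive edge.

    Assumption: edges are undirected pairs of integer node ids. The loop
    relies on the graph being sparse; on a near-complete graph it may
    take very long to find enough non-edges.
    """
    rng = random.Random(seed)
    known = {frozenset(e) for e in (forbidden if forbidden is not None else pos_edges)}
    target = int(num_neg) * len(pos_edges)
    negatives = []
    while len(negatives) < target:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and frozenset((u, v)) not in known:
            negatives.append((u, v))
    return negatives


class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without improvement (assumed semantics)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy usage on a 50-node ring graph.
ring = [(i, (i + 1) % 50) for i in range(50)]
train_edges, val_edges, test_edges = split_edges(ring, prop_val=0.2, prop_test=0.2)
train_negatives = sample_negatives(train_edges, num_nodes=50, num_neg=10.0, forbidden=ring)
```

Passing the full edge list as forbidden keeps validation and test edges from being sampled as training negatives; whether the actual script does this is not stated here.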

Next, use the script save_pretrain_embeddings.py to generate and save the gene embeddings produced by the unsupervised pretraining on the KG.
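
As a rough illustration of working with a saved embedding file, the sketch below writes and reloads a .npy matrix with NumPy. The file-name pattern is inferred from the example path used in this README, and the (num_genes, dim_size) layout, the embedding_path helper, and the gene_to_index mapping are assumptions, not the documented output of save_pretrain_embeddings.py.

```python
import os
import tempfile

import numpy as np


def embedding_path(outdir, dim_size, num_neg, root="saved/pretrain"):
    """Build a path following the pattern seen in this README's example (assumption)."""
    return os.path.join(root, outdir, f"embedding_dim_{dim_size}_neg_{num_neg}.npy")


def lookup(embeddings, gene_to_index, gene):
    """Return the embedding row for one gene, given a hypothetical gene-to-row mapping."""
    return embeddings[gene_to_index[gene]]


# Round trip with random stand-in data: 5 genes, 32-dimensional features.
with tempfile.TemporaryDirectory() as tmp:
    path = embedding_path("gold", 32, 10.0, root=tmp)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    fake = np.random.default_rng(0).normal(size=(5, 32)).astype(np.float32)
    np.save(path, fake)
    loaded = np.load(path)

# Hypothetical mapping from gene identifiers to row indices.
gene_to_index = {f"GENE{i}": i for i in range(5)}
vector = lookup(loaded, gene_to_index, "GENE3")
```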

Finally, use the Python script train_IPF_mse_with_pre.py to train the final multimodal learner. The syntax below trains on the IPF-specific data.

python train_IPF_mse_with_pre.py \
        --model KGNetAttn \
        --save_path saved/training/gold \
        --indir data/processed/ \
        --emb_file saved/pretrain/gold/embedding_dim_32_neg_10.0.npy
  • --model: The feature aggregation model (mechanism) used to combine the individual unimodal gene features into the multimodal feature matrix.
  • --indir: Path to the folder containing the processed data files (train/val/test data).
  • --emb_file: Path to the file containing the pre-trained gene features from the heterogeneous KG input modality.
  • --save_path: Path to the output folder where the final model checkpoint and test metrics are saved.
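
For readers unfamiliar with feature aggregation, the following sketch shows one generic way to fuse per-modality gene features with attention weights. It is purely illustrative and is not the KGNetAttn model: the attention_aggregate function, the single scoring vector, and the assumption that every modality has already been projected to the same dimension are all hypothetical.

```python
import numpy as np


def attention_aggregate(modalities, scoring_vector):
    """Fuse modality features with softmax attention (illustrative only).

    modalities:     array of shape (num_modalities, num_genes, dim)
    scoring_vector: array of shape (dim,), a stand-in for learned parameters
    Returns the fused (num_genes, dim) matrix and the attention weights.
    """
    scores = modalities @ scoring_vector                # (num_modalities, num_genes)
    scores = scores - scores.max(axis=0, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over modalities
    fused = (weights[..., None] * modalities).sum(axis=0)   # weighted sum per gene
    return fused, weights


# Toy usage: 2 modalities (e.g. KG embeddings and one other input), 4 genes, 32 dims.
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 4, 32))
fused, weights = attention_aggregate(features, rng.normal(size=32))
```

Each gene gets its own set of weights across modalities, so a modality that is informative for one gene can be down-weighted for another.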
