DeepRank-GNN-esm

Graph Network for protein-protein interface including language model features

Installation

With Anaconda

  1. Clone the repository
$ git clone https://github.com/DeepRank/DeepRank-GNN-esm.git
$ cd DeepRank-GNN-esm
  2. Install either the CPU or GPU version of DeepRank-GNN-esm
$ conda env create -f environment-cpu.yml && conda activate deeprank-gnn-esm-cpu-env

OR

$ conda env create -f environment-gpu.yml && conda activate deeprank-gnn-esm-gpu-env
  3. Run the tests to make sure everything is working
$ pytest tests/

Generate ESM-2 embeddings for your protein

  • Step 1. Generate FASTA sequences in bulk with the script 'get_fasta.py'
usage: get_fasta.py [-h] pdb_dir output_fasta_name

positional arguments:
  pdb_dir            Path to the directory containing PDB files
  output_fasta_name  Name of the combined output FASTA file

options:
  -h, --help         show this help message and exit
  • Step 2. Generate embeddings in bulk from the combined FASTA file with the extract.py script provided in the ESM-2 package:
python esm_2_installation_location/scripts/extract.py esm2_t33_650M_UR50D all.fasta \
  tests/data/embedding/1ATN/ --repr_layers 0 32 33 --include mean per_tok

Replace 'esm_2_installation_location' with the path to your ESM-2 installation, 'all.fasta' with the FASTA file generated in Step 1, and 'tests/data/embedding/1ATN/' with the output folder for the ESM embeddings.
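
Before generating graphs, it can help to confirm that Step 2 produced an embedding for every sequence in the combined FASTA file. The sketch below is not part of DeepRank-GNN-esm; it assumes that extract.py writes one '<label>.pt' file per FASTA record into the output folder, where the label is the header text after '>'. The function names 'fasta_labels' and 'missing_embeddings' are hypothetical helpers introduced for illustration.

```python
from pathlib import Path


def fasta_labels(fasta_path):
    """Return the header labels (the text after '>') of a FASTA file."""
    labels = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                labels.append(line[1:].strip())
    return labels


def missing_embeddings(fasta_path, embedding_dir):
    """List FASTA labels that have no '<label>.pt' file in embedding_dir.

    Assumption: one embedding file per FASTA record, named after its label.
    """
    embedding_dir = Path(embedding_dir)
    return [
        label
        for label in fasta_labels(fasta_path)
        if not (embedding_dir / f"{label}.pt").is_file()
    ]
```

With the paths used above, missing_embeddings("all.fasta", "tests/data/embedding/1ATN/") would return an empty list if every sequence has an embedding file, and the labels of the missing ones otherwise.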

Generate graph

  • Example code to generate residue graphs in hdf5 format:
    from deeprank_gnn.GraphGenMP import GraphHDF5
    
    pdb_path = "tests/data/pdb/1ATN/"
    pssm_path = "tests/data/pssm/1ATN/"
    embedding_path = "tests/data/embedding/1ATN/"
    nproc = 20
    outfile = "1ATN_residue.hdf5"
    
    GraphHDF5(
        pdb_path = pdb_path,
        pssm_path = pssm_path,
        embedding_path = embedding_path,
        graph_type = "residue",
        outfile = outfile,
        nproc = nproc,    #number of cores to use
        tmpdir="./tmpdir")
    
  • Example code to add continuous or binary targets to the hdf5 file
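    import h5py
    import random
    
    # Open the graph file in read/write mode; the context manager closes it on exit
    with h5py.File("1ATN_residue.hdf5", "r+") as hdf5_file:
        for mol in hdf5_file.keys():
            # Placeholder target: replace the random value with the real fnat of each model
            fnat = random.random()
            # Binary class: 1 if fnat is above the 0.3 cutoff, otherwise 0
            bin_class = [1 if fnat > 0.3 else 0]
            hdf5_file.create_dataset(f"/{mol}/score/binclass", data=bin_class)
            hdf5_file.create_dataset(f"/{mol}/score/fnat", data=fnat)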
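
The example above draws fnat from random.random() purely as a placeholder. With real data, the targets would typically come from a table of precomputed values. The following is a minimal sketch of that step, assuming a hypothetical CSV layout with the columns 'model' and 'fnat'; the model names shown are illustrative, and 'read_targets' is not a DeepRank-GNN-esm function. It applies the same 0.3 cutoff as the example above.

```python
import csv
import io


def read_targets(csv_file, threshold=0.3):
    """Map each model name to its fnat value and a binary class.

    Assumption: csv_file has the columns 'model' and 'fnat'.
    """
    targets = {}
    for row in csv.DictReader(csv_file):
        fnat = float(row["fnat"])
        targets[row["model"]] = {
            "fnat": fnat,
            # Same rule as above: class 1 when fnat exceeds the cutoff
            "binclass": int(fnat > threshold),
        }
    return targets


# Hypothetical table; in practice pass an open file handle instead
example = io.StringIO("model,fnat\n1ATN_1w,0.72\n1ATN_2w,0.05\n")
targets = read_targets(example)
```

The resulting dictionary can then replace the random values inside the loop, for instance by looking up targets[mol]["fnat"] for each group in the HDF5 file, provided the group names match the 'model' column.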
    

Use pre-trained models to predict

  • Example code to use pre-trained DeepRank-GNN-esm model
from deeprank_gnn.ginet import GINet
from deeprank_gnn.NeuralNet import NeuralNet

database_test = "1ATN_residue.hdf5"
gnn = GINet
target = "fnat"
edge_attr = ["dist"]
threshold = 0.3
pretrained_model = "deeprank-GNN-esm/paper_pretrained_models/scoring_of_docking_models/gnn_esm/treg_yfnat_b64_e20_lr0.001_foldall_esm.pth.tar"
node_feature = ["type", "polarity", "bsa", "charge", "embedding"]
device_name = "cuda:0"
num_workers = 10

model = NeuralNet(
    database_test,
    gnn,
    device_name = device_name,
    edge_feature = edge_attr,
    node_feature = node_feature,
    target = target,
    num_workers = num_workers,
    pretrained_model = pretrained_model,
    threshold = threshold)

model.test(hdf5 = "tmpdir/GNN_esm_prediction.hdf5")

Contributors

ntxxt, rvhonorato, amjjbonvin
