
ppi_gnn's Introduction

PPI_GNN

To replicate the results reported in the paper, follow these steps:

  1. Download Pan's human feature files and place them at ../human_features/processed/; the download link is given in PPI_GNN/Human_features/README.md. For the S. cerevisiae PPI dataset, download the input feature file and place it at ../S. cerevisiae/processed/; the link is given in PPI_GNN/S. cerevisiae/README.md.
  2. Then train the model with the command: python train.py

The steps for predicting protein interactions on a new dataset are:

  1. First, extract node features from the protein sequences with the SeqVec method (seqvec_embedding.py) and then build the protein graphs (proteins_to_graphs.py); a minimal sketch of this graph-construction step is given after this list.
  2. Next, use the command "python data_prepare.py" to generate the input features for the model.
  3. Then, use the command "python train.py" to train the model.
  4. Finally, use the command "python test.py" to evaluate the trained model on unseen data (the test set).
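
The sketch below shows, under stated assumptions, how a residue-level protein graph can be assembled from SeqVec embeddings: node features are the per-residue 1024-dimensional SeqVec vectors, and edges connect residues whose C-alpha atoms lie within a distance threshold. The function name build_protein_graph, the 8 Å cut-off, and the use of torch_geometric.data.Data are illustrative choices, not necessarily what proteins_to_graphs.py does.

    # Minimal sketch, assuming per-residue SeqVec embeddings and C-alpha coordinates
    # are already available; the threshold and names are placeholders.
    import numpy as np
    import torch
    from torch_geometric.data import Data

    def build_protein_graph(embeddings: np.ndarray, ca_coords: np.ndarray,
                            contact_threshold: float = 8.0) -> Data:
        """embeddings: (num_residues, 1024) SeqVec features; ca_coords: (num_residues, 3)."""
        # Node features: one SeqVec vector per residue.
        x = torch.tensor(embeddings, dtype=torch.float)
        # Edges: residue pairs whose C-alpha atoms are closer than the threshold.
        dists = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
        src, dst = np.nonzero((dists < contact_threshold) & (dists > 0))
        edge_index = torch.tensor(np.stack([src, dst]), dtype=torch.long)
        return Data(x=x, edge_index=edge_index)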

To create the ppi_env environment, run: $ conda env create -f ppi_env.yml

ppi_gnn's People

Contributors

jhakanchan15


ppi_gnn's Issues

S. cerevisiae processed data access denied

Hi jha,

This is excellent work on PPI prediction. I want to reproduce the prediction results and compare against them in my own work. However, access to the S. cerevisiae processed data on Google Drive is denied. Could you share the data again?
Thanks very much!

Honchkrow

Unable to run

Hello there,

It would be greatly appreciated if you could also upload the environment file somewhere so that your results are reproducible.

Otherwise we have no way of knowing which versions of PyTorch and the other dependencies are needed to run your model.

Thanks!

IndexError: list index out of range for torch.load(glob.glob(prot_1)[0])

Dear Sir/Madam,

My update:
As I don't have the complete dataset, I suspect the original issue comes from the following:

  1. npy_file_new(human_dataset).npy contains 22,217 entries.
  2. The currently available human data covers only 4,444 + 1,111 = 5,555 proteins.
    This mismatch causes the problem below. Please feel free to correct me. Thanks.

Original issue:
I am running this project on Google Colab. This might not be a bug in the code, but I don't know how to solve it.
The run fails with: IndexError: list index out of range.
Part of the output:
    GCNN Loaded
    Training on 4444 samples.....
    15657
    first prot is /content/gdrive/MyDrive/PPI_GNN/PPI_GNN/human_features/processed/3AIH.pt
    []
    15657
    Second prot is /content/gdrive/MyDrive/PPI_GNN/PPI_GNN/human_features/processed/1DEV.pt
    Traceback (most recent call last):
      File "train.py", line 97, in <module>
        train(model, device, trainloader, optimizer, epoch+1)
      File "train.py", line 45, in train
        for count,(prot_1, prot_2, label) in enumerate(trainloader):
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
        data = self._next_data()
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 570, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataset.py", line 471, in __getitem__
        return self.dataset[self.indices[idx]]
      File "/content/gdrive/MyDrive/PPI_GNN/PPI_GNN/data_prepare.py", line 41, in __getitem__
        prot_1 = torch.load(glob.glob(prot_1)[0])
    IndexError: list index out of range

The error comes from the code:
    def __getitem__(self, index):
        prot_1 = os.path.join(self.processed_dir, self.protein_1[index] + ".pt")
        print(index)
        print(f'first prot is {prot_1}')
        print(glob.glob('prot_1'))
        prot_2 = os.path.join(self.processed_dir, self.protein_2[index] + ".pt")
        print(index)
        print(f'Second prot is {prot_2}')
        prot_1 = torch.load(glob.glob(prot_1)[0])
        print(f'Here lies {glob.glob(prot_2)}')
        prot_2 = torch.load(glob.glob(prot_2)[0])
        print(torch.tensor(self.label[index]))
        return prot_1, prot_2, torch.tensor(self.label[index])

It seems that glob.glob(prot_1) returns an empty list. How can I solve this problem?
Thanks in advance.
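
For anyone hitting the same IndexError: a small diagnostic along the lines below (the directory path is only an assumption based on the paths in the traceback) lists which protein IDs have no processed .pt graph, which is exactly the situation in which glob.glob(prot_1) returns an empty list and indexing [0] fails.

    # Sketch: report protein IDs whose processed graph file is missing.
    import glob
    import os

    processed_dir = "human_features/processed"   # adjust to your local path

    def find_missing(protein_ids):
        """Return the IDs whose <id>.pt file is absent from processed_dir."""
        return [pid for pid in protein_ids
                if not glob.glob(os.path.join(processed_dir, f"{pid}.pt"))]

    # In the log above, the failure occurs while resolving 3AIH.pt,
    # so an incomplete download would show it here as missing.
    print(find_missing(["3AIH", "1DEV"]))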

Restricted access to original processed folder

I would like to replicate the results mentioned in the paper. However, access to the original processed folder containing all 4,188 protein graphs is restricted and requires permission. I would greatly appreciate it if you could grant me access to the folder. Thank you very much.

Runtime error

I was trying to replicate your results. I followed README.md to set up this repo. However, when I run it I get a RuntimeError, as follows:

RuntimeError: The 'data' object was created by an older version of PyG. If this error occurred while loading an already existing dataset, remove the 'processed/' directory in the dataset's root folder and try again.

I believe this error occurs because of the DataLoader in data_prepare.py. Any suggestions on how to fix it? Much appreciated.
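
As a first check, it can help to print the installed versions before deciding whether to regenerate the processed/ files or to match the PyG version that created them; a minimal sketch:

    # Print the PyTorch and PyTorch Geometric versions in the current environment.
    import torch
    import torch_geometric

    print("torch:", torch.__version__)
    print("torch_geometric:", torch_geometric.__version__)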

proteins_to_graphs

I am trying to understand the pipeline for converting proteins to graphs, and I am stuck on this line:

    ftrs = np.load("../human_features/pdb_to_seqvec_dict.npy", allow_pickle=True)
Where did you get the file pdb_to_seqvec_dict.npy, and what does it contain?
Appreciate any answer.
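
For reference, here is a hedged sketch of how a {pdb_id: per-residue SeqVec embedding} dictionary of this kind could be produced (the option/weight file names and the example sequences are placeholders; seqvec_embedding.py in the repository is the authoritative version). SeqVec computes per-residue embeddings with allennlp's ElmoEmbedder and the weights released by the SeqVec authors:

    # Sketch: build and save a PDB-id -> (L, 1024) SeqVec embedding dictionary.
    import numpy as np
    import torch
    from allennlp.commands.elmo import ElmoEmbedder

    # Paths to the SeqVec model files are placeholders.
    embedder = ElmoEmbedder("options.json", "weights.hdf5", cuda_device=-1)

    sequences = {"3AIH": "MKT...", "1DEV": "GSSG..."}   # hypothetical id -> sequence map
    pdb_to_seqvec = {}
    for pdb_id, seq in sequences.items():
        emb = embedder.embed_sentence(list(seq))        # shape (3, L, 1024)
        pdb_to_seqvec[pdb_id] = torch.tensor(emb).sum(dim=0).numpy()  # per-residue (L, 1024)

    np.save("pdb_to_seqvec_dict.npy", pdb_to_seqvec, allow_pickle=True)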
