devalab / deeppocket Goto Github PK


Ligand Binding Site detection using Deep Learning

License: MIT License

Python 100.00%

deeppocket's Issues

Predicting a Binding Site (predict.py): TypeError: sequence item 0: expected str instance, numpy.float32 found

OS: Ubuntu 22.04
mamba: 1.5.5
conda: 23.11.0

Hi,

I am trying to use your program to predict binding sites in a protein PDB file. When I run

python ~/DeepPocket/predict.py -p 1A9N_frame_0.pdb -c ~/DeepPocket/first_model_fold1_best_test_auc_85001.pth.tar -s ~/DeepPocket/seg0_best_test_IOU_91.pth.tar -r 3

I get the following terminal message:

/home/luis/DeepPocket/rank_pockets.py:87: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  all_probs.append(F.softmax(output).detach().cpu())
Traceback (most recent call last):
  File "/home/luis/DeepPocket/predict.py", line 106, in <module>
    fout.write(''.join(confidence_types))
TypeError: sequence item 0: expected str instance, numpy.float32 found

I have tried to circumvent the TypeError by changing line 106 in predict.py from fout.write(''.join(confidence_types)) to fout.write(''.join(str(confidence_types))), but that only led to another, more cryptic error when I reran the command above:

Traceback (most recent call last):
  File "/home/luis/DeepPocket/predict.py", line 122, in <module>
    test(seg_model, seg_eptest, seg_gmaker,device,dx_name, args)
  File "/home/luis/DeepPocket/segment_pockets.py", line 145, in test
    output_pocket_pdb(dx_name+'_pocket'+str(count)+'.pdb',prot_prody,pred_aa)
  File "/home/luis/DeepPocket/segment_pockets.py", line 85, in output_pocket_pdb
    pocket=prot_prody.select(sel_str)
  File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/atomic.py", line 232, in select
    return SELECT.select(self, selstr, **kwargs)
  File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/select.py", line 894, in select
    indices = self.getIndices(atoms, selstr, **kwargs)
  File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/select.py", line 952, in getIndices
    torf = self.getBoolArray(atoms, selstr, **kwargs)
  File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/select.py", line 1003, in getBoolArray
    parser = self._getParser(selstr)
  File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/select.py", line 1102, in _getParser
    parser.enablePackrat()
  File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/pyparsing/util.py", line 265, in _inner
    return fn(*args, **kwargs)
  File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/pyparsing/core.py", line 1132, in enable_packrat
    ParserElement.packrat_cache = _FifoCache(cache_size_limit)  # type: ignore[assignment]
  File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/pyparsing/util.py", line 105, in __init__
    keyring = [object()] * size
TypeError: can't multiply sequence by non-int of type 'Forward'

Do you have an idea how to fix these issues?
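
A possible fix for the first traceback, sketched under the assumption that `confidence_types` is a list mixing `numpy.float32` scores with string fragments (the real contents of that list in predict.py are not shown in this report): convert each item to `str` before joining. Note that `''.join(str(confidence_types))` stringifies the whole list at once, so the file receives the list's repr rather than the intended line. The helper name `join_as_text` is hypothetical, and a plain float stands in for the numpy value.

```python
def join_as_text(items):
    """Join items into one string, converting each item to str first."""
    return "".join(str(item) for item in items)


# Stand-in data: a float plays the role of a numpy.float32 confidence score.
confidence_types = [0.97, " 1.5 -2.0 3.25 protein_nowat.gninatypes\n"]
line = join_as_text(confidence_types)
```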

Cheers Foly

molgrid version?

Which version of molgrid is this running? I am running molgrid=0.1.1 and getting this error:

Traceback (most recent call last):
  File "predict.py", line 89, in <module>
    protein_gninatype=gninatype(protein_nowat_file)
  File "/content/fpocket/DeepPocket/types_and_gninatyper.py", line 20, in gninatype
    dataloader=molgrid.ExampleProvider(atom_map,shuffle=False,default_batch_size=1)
ValueError: Unknown keyword argument default_batch_size

Any idea how to fix this?

lack of gninatypes files

Hi, I want to use train_segmentation.py, but it reports that the gninatypes files are missing. Can you provide these files,
or should I generate them myself?
Thanks

Training Classifier Dataset

Hi @RishalAggarwal,

Firstly, thank you for this repo.

python train.py -m model.py --train_types scPDB_train0.types --test_types scPDB_test0.types -i 200000 --train_recmolcache scPDB_new.molcache2 --test_recmolcache scPDB_new.molcache2 -r val0 -o /model_saves/val9 --base_lr 0.001 --solver Adam

As seen above, file scPDB_train0 is required for training classifier.

The sample content of the scPDB_train0 file is as follows;

0 -6.417309121621622 37.99337461018711 86.51209004677753 10mh_1/protein_0.gninatypes
0 -48.73792600326857 40.15845814418013 90.75518894134738 10mh_1/protein_0.gninatypes
0 -22.384561944279785 38.16762551867219 62.667952578541794 10mh_1/protein_0.gninatypes
0 4.418982018111255 43.43278783958602 81.18465174644241 10mh_1/protein_0.gninatypes
...

My first question is how you assigned the labels (0 or 1) that mark whether each set of coordinates is a pocket. Is this a public dataset? It is not mentioned in the paper either. How did you create this train file?

My second question: if you labeled this dataset yourselves, how can I do the same pocket / non-pocket (0 or 1) labeling from coordinates for my own protein files?

Note: None of the COACH420, HOLO4k, or scPDB datasets contain coordinates for non-druggable regions. How did you label the entries in your scPDB_train0 file as 0 (non-druggable) or 1 (druggable)?
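
One common way to derive such labels, shown here only as a hypothetical sketch and not confirmed as the authors' procedure: mark a candidate pocket center as positive (1) when it lies within some cutoff distance of any ligand atom, and negative (0) otherwise. The function name `label_center` and the 4.0 Å default cutoff are assumptions made for illustration.

```python
import math


def label_center(center, ligand_atoms, cutoff=4.0):
    """Return 1 if center is within cutoff (Å) of any ligand atom, else 0.

    cutoff=4.0 is an assumed value, not taken from the repository.
    """
    return int(any(math.dist(center, atom) <= cutoff for atom in ligand_atoms))


ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = label_center((2.0, 1.0, 0.0), ligand)   # close to the second atom
far = label_center((20.0, 0.0, 0.0), ligand)   # far from every atom
```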

How to prepare the inputs for training segmentation model?

Since I could not find any code related to this, I would like to know the details of the preprocessing.
I guess you use the protein and the binding site to build the ground-truth mask, but which files did you use? The scPDB dataset contains many files, such as protein.mol2, site.mol2, cavity6.mol2, ligand.mol2, etc., and I am confused.

Understanding the Train Dataset for Training Part

My question is simple, but I believe it will be useful for everyone to understand the paper better.

The following code block needs to be run to train the classification

Here is an example line of the train and test files as follows

1 50.69633356250253 -8.818796255105756 9.213237190116068 2bel_4/protein_0.gninatypes 2bel_4/cavity6.mol2

I have two questions.

First, what does the number 1 in the first part represent?

My second question: does the last part of each line need to be in the train and test files? (The last part means 2bel_4/cavity6.mol2.) If I delete 2bel_4/cavity6.mol2 from the end of the line, will training still work, or do I need the mol2 files too?
Isn't just the gninatype enough (2bel_4/protein_0.gninatypes)?
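
For readers following along, the line layout can be read programmatically. The sketch below assumes whitespace-separated fields in the order label, x, y, z, receptor file, and an optional trailing file (here `cavity6.mol2`); treating the first column as the label is consistent with the `labelpos=0` argument in the `ExampleProvider` call quoted in another issue on this page. The `TypesRecord` name and its field names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class TypesRecord(NamedTuple):
    label: int
    center: Tuple[float, float, float]
    receptor: str
    extra: Optional[str]  # e.g. the cavity .mol2 listed in segmentation files


def parse_types_line(line: str) -> TypesRecord:
    """Split one types-file line into its fields (5 or 6 columns expected)."""
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}")
    label = int(fields[0])
    center = (float(fields[1]), float(fields[2]), float(fields[3]))
    extra = fields[5] if len(fields) == 6 else None
    return TypesRecord(label, center, fields[4], extra)


record = parse_types_line(
    "1 50.69633356250253 -8.818796255105756 9.213237190116068 "
    "2bel_4/protein_0.gninatypes 2bel_4/cavity6.mol2"
)
```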

IndexError: list index out of range in output_pocket_pdb (segment_pockets.py)

Hello,

I'm trying to run the example from the Predicting Binding Site section:

python predict.py -p protein.pdb -c first_model_fold1_best_test_auc_85001.pth.tar -s seg0_best_test_IOU_91.pth.tar -r 3

But it crashes with the following errors:

***** POCKET HUNTING BEGINS *****
***** POCKET HUNTING ENDS *****
/usr/local/lib/python3.7/dist-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 1951.
  PDBConstructionWarning,
/usr/local/lib/python3.7/dist-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 2008.
  PDBConstructionWarning,
/content/DeepPocket/rank_pockets.py:87: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  all_probs.append(F.softmax(output).detach().cpu())
@> 1674 atoms and 1 coordinate set(s) were parsed in 0.01s.
Traceback (most recent call last):
  File "predict.py", line 116, in <module>
    test(seg_model, seg_eptest, seg_gmaker,device,dx_name, args)
  File "/content/DeepPocket/segment_pockets.py", line 142, in test
    output_pocket_pdb(dx_name+'_pocket'+str(count)+'.pdb',prot_prody,pred_aa)
  File "/content/DeepPocket/segment_pockets.py", line 82, in output_pocket_pdb
    pocket=prot_prody.select(sel_str)
  File "/usr/local/lib/python3.7/dist-packages/prody/atomic/atomic.py", line 232, in select
    return SELECT.select(self, selstr, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 885, in select
    indices = self.getIndices(atoms, selstr, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 943, in getIndices
    torf = self.getBoolArray(atoms, selstr, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 995, in getBoolArray
    tokens = parser(selstr, parseAll=True)
  File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 1100, in _noParser
    return [self._default(selstr, 0, selstr.split())]
  File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 1118, in _default
    torf, err = self._and2(sel, loc, tokens)
  File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 1319, in _and2
    firsttoken = tokens[0] if not isinstance(tokens[0], Iterable) else list(tokens[0])
IndexError: list index out of range

Ubuntu 18.04 on Google Colab (pytorch 1.9+cuda 10.2) with prody version 2.0.
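
The `IndexError` is raised while ProDy's selection code indexes into the tokens of the selection string, which is consistent with `sel_str` being empty (for instance, when no residues are predicted for a pocket). A defensive sketch, assuming the selection is built from a list of residue indices; the helper name `build_selection` and the use of the `resindex` keyword are illustrative, since the actual string format in segment_pockets.py is not shown here.

```python
from typing import Iterable, Optional


def build_selection(residue_indices: Iterable[int]) -> Optional[str]:
    """Return a ProDy-style selection string, or None when nothing was predicted."""
    unique = sorted(set(residue_indices))
    if not unique:
        return None  # the caller should skip writing this pocket
    return "resindex " + " ".join(str(i) for i in unique)


selection = build_selection([12, 7, 12])
skipped = build_selection([])
```

A caller would then select atoms and write the pocket file only when the result is not `None`.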

Pocket Probability

The content of bary_centers_ranked.types is as follows

3 -5.8872820891631195 2.4274225254193103 16.306759363865474 /content/xxx_nowat.gninatypes
1 -3.399881608822576 0.7210002432695428 11.260844794031787 /content/xxx_nowat.gninatypes
2 4.8732080737995735 -7.189318852006624 6.719965229046755 /content/xxxx_nowat.gninatypes
6 -8.189112907608695 8.49208722826087 8.578613858695654 /content/xxxx_nowat.gninatypes
5 -11.534677691417235 3.978282288738218 31.699375888870517 /content/xxxx_nowat.gninatype

How do you calculate the druggability score for each pocket centre? I'm trying to learn the Pocket Probability calculation

Data Preparation for HOLO4K

Hi, I have some questions about data preparation.

In your paper, you mentioned that "The proteins and ligands were separated from the corresponding structure files using the Biopython library". But I can't find corresponding codes in this repo, could you share those parts of codes?
A pdb file in HOLO4K may have several ligands. Do you keep all ligands or remove some? What are the criteria for choosing ligands in a pdb file?
When you use Fpocket to choose pocket candidates, do you run Fpocket on the original pdb file, on the pdb file without ligands, or on a single chain of the pdb file?

can not find seg0_best_test_IOU_91.pth.tar.

hello,
I have not found the file seg0_best_test_IOU_91.pth.tar.
Can you tell me where I can find it?

Thank you.

Issue with archived file

Hi,

I want to use DeepPocket to make some predictions. For that, I think I need the checkpoint files of the models. I tried downloading the archived zip file from OneDrive, but I cannot unzip it. Every attempt to unzip the file results in errors indicating that the zip file is corrupted or damaged.

Could you please assist in resolving this issue?

Thanks

What are .dx exactly?

I was wondering what the .dx files represent. Visualising them in PyMol I can only see cubes that contain the DeepPocket predicted top pockets. Is this right?

PDB: 7SUD (Chain A)

Could not open 'gninamap'

Hi,
Thank you for making the code public. I am getting an error however in the data preprocessing stage. When I try to convert a .pdb file to gninatypes, I get the error Could not open gninamap. I simply separated the data preprocessing stage (using your code without any modifications) to create a self contained example to show my error.

from Bio.PDB import PDBParser, PDBIO, Select
import Bio
import os
import sys
import molgrid
import struct
import numpy as np

class NonHetSelect(Select):
    def accept_residue(self, residue):
        return 1 if Bio.PDB.Polypeptide.is_aa(residue,standard=True) else 0

def clean_pdb(input_file,output_file):
    pdb = PDBParser().get_structure("protein", input_file)
    io = PDBIO()
    io.set_structure(pdb)
    io.save(output_file, NonHetSelect())

def gninatype(file):
    # creates gninatype file for model input
    f=open(file.replace('.pdb','.types'),'w')
    f.write(file)
    f.close()
    atom_map=molgrid.FileMappedGninaTyper('gninamap')
    dataloader=molgrid.ExampleProvider(atom_map,shuffle=False,default_batch_size=1)
    train_types=file.replace('.pdb','.types')
    dataloader.populate(train_types)
    example=dataloader.next()
    coords=example.coord_sets[0].coords.tonumpy()
    types=example.coord_sets[0].type_index.tonumpy()
    types=np.int_(types)
    print(coords)
    fout=open(file.replace('.pdb','.gninatypes'),'wb')
    for i in range(coords.shape[0]):
        fout.write(struct.pack('fffi',coords[i][0],coords[i][1],coords[i][2],types[i]))
        print(struct.pack('fffi',coords[i][0],coords[i][1],coords[i][2],types[i]))
    fout.close()
    os.remove(train_types)
    return file.replace('.pdb','.gninatypes')

def create_types(file,protein):
    # create types file for model predictions
    fout=open(file.replace('.txt','.types'),'w')
    fin =open(file,'r')
    for line in fin:
        fout.write(' '.join(line.split()) + ' ' + protein +'\n')
    return file.replace('.txt','.types')

protein_file="/home/ubuntu/Data/1a8o.pdb"
protein_nowat_file=protein_file.replace('.pdb','_nowat.pdb')
clean_pdb(protein_file,protein_nowat_file)
protein_gninatype=gninatype(protein_nowat_file)

The code ends with the error

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_13436/2408537986.py in <module>
      2 protein_nowat_file=protein_file.replace('.pdb','_nowat.pdb')
      3 clean_pdb(protein_file,protein_nowat_file)
----> 4 protein_gninatype=gninatype(protein_nowat_file)

/tmp/ipykernel_13436/3305498276.py in gninatype(file)
      4     f.write(file)
      5     f.close()
----> 6     atom_map=molgrid.FileMappedGninaTyper('gninamap')
      7     dataloader=molgrid.ExampleProvider(atom_map,shuffle=False,default_batch_size=1)
      8     train_types=file.replace('.pdb','.types')

ValueError: Could not open gninamap

Can you please help me with this issue? Thank you.
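
The message suggests that `'gninamap'` is being opened as a path relative to the current working directory, so the call fails whenever the script or notebook runs from a directory that does not contain that file. A minimal sketch of resolving it explicitly, assuming the file sits in the DeepPocket checkout; `find_gninamap` and the `repo_dir` argument are hypothetical names.

```python
from pathlib import Path


def find_gninamap(repo_dir):
    """Return the absolute path to the gninamap file inside repo_dir."""
    path = Path(repo_dir).expanduser().resolve() / "gninamap"
    if not path.is_file():
        raise FileNotFoundError(f"gninamap not found at {path}")
    return str(path)
```

The returned string could then be passed in place of the bare `'gninamap'` argument to `molgrid.FileMappedGninaTyper`.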

Question about classes

Hello and congrats on your repo!

As I see in the segmentation training, you have num_classes=1. Does that mean label=1 -> pocket and label=0 -> not pocket?

Do not have files for running make_types.py when preparing custom data for training a new classifier

I am trying to follow your instructions to prepare data for training a new classifier.
I am stuck at the make_types step because I can't find the train.txt and test.txt files.

Moreover, I have 4 questions:

If I want to add several pdb files to the available scPDB dataset, how can I do that?
Your instructions for preparing data only work for a single pdb file, don't they? If so, I need to write a pipeline to wrap them.
How do I prepare the train.txt and test.txt files needed to run make_types.py?
Could you please show me which files/folders from the previous steps are needed as input to each step?

I tried this on this pdb.

Thank you very much.

bary_centers.txt Issue

What is 'bary_centers.txt'? It is neither created automatically nor passed as an external argument, and I am getting an error. Could you explain what this txt file is? Thank you.
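
As general background, a barycenter is the mean position of a set of points. A sketch of how pocket centers could be computed from the points that a pocket finder reports for each pocket, writing one center per line; the input layout, the function names, and the output format here are assumptions for illustration, not a description of the repository's script.

```python
def barycenter(points):
    """Return the unweighted mean (x, y, z) of a non-empty list of 3D points."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def format_centers(pockets):
    """pockets maps a pocket id to its list of points; returns text lines."""
    lines = []
    for pocket_id, points in sorted(pockets.items()):
        x, y, z = barycenter(points)
        lines.append(f"{pocket_id} {x} {y} {z}")
    return "\n".join(lines) + "\n"


pockets = {1: [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)], 2: [(1.0, 1.0, 1.0)]}
text = format_centers(pockets)
```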

Channels in training script are different from those in your Supporting Information table s1

Thanks for your great work. I have one question: in your training code, you use the first 14 channels in the gninamap file, which are

Hydrogen, PolarHydrogen, AliphaticCarbonXSHydrophobe, AliphaticCarbonXSNonHydrophobe, AromaticCarbonXSHydrophobe, AromaticCarbonXSNonHydrophobe, Nitrogen, NitrogenXSDonor, NitrogenXSDonorAcceptor, NitrogenXSAcceptor, Oxygen, OxygenXSDonor, OxygenXSDonorAcceptor and OxygenXSAcceptor

However, in Supporting Information Table S1 (https://pubs.acs.org/doi/suppl/10.1021/acs.jcim.1c00799/suppl_file/ci1c00799_si_001.pdf), they are

AliphaticCarbonXSHydrophobe, AliphaticCarbonXSNonHydrophobe, AromaticCarbonXSHydrophobe, AromaticCarbonXSNonHydrophobe, Bromine Iodine Chlorine Fluorine, Nitrogen NitrogenXSAcceptor, NitrogenXSDonor NitrogenXSDonorAcceptor, Oxygen OxygenXSAcceptor, OxygenXSDonorAcceptor OxygenXSDonor, Sulfur SulfurAcceptor, Phosphorus, Calcium, Zinc and GenericMetal Boron Manganese Magnesium Iron

Why are they different? Am I misunderstanding something?

Yaml file or Colab Notebook?

Hi, I am trying to install DeepPocket using the listed dependencies but running into many conflicts between packages. Could you please provide an updated .yml file or perhaps a Google Colab Notebook that installs the dependencies and runs DeepPocket? Many thanks!

rank_pockets.py - UserWarning

I'm getting a userwarning from rank_pockets.py

DeepPocket/rank_pockets.py:88: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
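
The warning goes away once the axis is named explicitly, for example `F.softmax(output, dim=1)` if `output` has shape (batch, classes); that shape is an assumption about rank_pockets.py, not something shown in this report. The pure-Python sketch below illustrates what normalizing along the class axis means: each row is turned into probabilities that sum to 1.

```python
import math


def softmax_rows(logits):
    """Apply softmax to each row of a 2D list (the equivalent of dim=1)."""
    result = []
    for row in logits:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


probs = softmax_rows([[0.0, 0.0], [1.0, 3.0]])
```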

Full test scripts to reproduce the metrics results in the paper.

Thank you for open-sourcing this great work!

I am new to this topic, and I notice that many different metrics are used in the paper, such as accuracy, DCA, DCC, DVO, success rate of Top-N, Top-(N+2), and ratio.

Could you please provide the testing scripts that calculate these metrics on the four datasets, so that the results in your paper can be reproduced?

This would be a great help for citing and comparing against your paper. Thanks in advance.
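
Until official scripts are available, two of these metrics can be sketched from their usual definitions: DCC as the distance between the predicted pocket center and the ligand center, and DVO as the overlap of the predicted and true binding-site volumes divided by their union. Representing volumes as sets of grid-voxel indices and using a 4 Å success threshold are assumptions; the paper's exact settings should be checked before comparing numbers.

```python
import math


def dcc(pred_center, ligand_center):
    """Distance between predicted pocket center and ligand center (Å)."""
    return math.dist(pred_center, ligand_center)


def dvo(pred_voxels, true_voxels):
    """Intersection over union of two sets of voxel indices."""
    union = pred_voxels | true_voxels
    if not union:
        return 0.0
    return len(pred_voxels & true_voxels) / len(union)


def success_rate(distances, threshold=4.0):
    """Fraction of distances at or below threshold (assumed 4 Å)."""
    return sum(d <= threshold for d in distances) / len(distances)


d = dcc((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
overlap = dvo({(0, 0, 0), (0, 0, 1)}, {(0, 0, 1), (0, 1, 1)})
rate = success_rate([d, 2.0])
```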

segmentation fault

When predicting binding sites given a .pdb file of a protein using:
python predict.py -p pdb/1alb_A.pdb -c first_model_fold1_best_test_auc_85001.pth.tar -s seg0_best_test_IOU_91.pth.tar -r 3

I hit this error:
'segmentation fault'
after
***** POCKET HUNTING BEGINS *****
***** POCKET HUNTING ENDS *****

I used gdb to view the core file :
'Failed to read a valid object file image from memory.
Core was generated by `python3 predict.py -p pdb/1alb_A.pdb -c first_model_fold1_best_test_auc_85001.p'.
Program terminated with signal 11, Segmentation fault.'

Why does DeepPocket not predict residues for some fpocket pockets?

Hi, I was trying DeepPocket on chain B of 1a5h and tried to segment ALL pockets predicted by fpocket. I noticed that out of the 14 pockets predicted by fpocket, only 3 were segmented (I removed the lines that break the loop when count >= 3). In this example, these are pockets 1, 3, and 10, which correspond to fpocket pockets 2, 1, and 5, respectively. I checked, and the remaining pockets have no predicted residues:

Pocket 11 of 1a5h_B has [] predicted residues
Pocket 12 of 1a5h_B has [] predicted residues
Pocket 13 of 1a5h_B has [] predicted residues
Pocket 14 of 1a5h_B has [] predicted residues
Pocket 15 of 1a5h_B has [] predicted residues
Pocket 16 of 1a5h_B has [] predicted residues
Pocket 17 of 1a5h_B has [] predicted residues

And this is what masks_pred looks like:

        [[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]],

        [[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]],

        [[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]]])

Any idea why this is? I would expect a pocket could be segmented out of each fpocket prediction. Thanks!

How to avoid data leakage?

In the "Data sets and Preprocessing" section of your paper, you mention that " we removed all proteins from the training set that had either sequence identity greater than 50% or ligand similarity greater than 0.9 and sequence identity greater than 30%".

How do you define sequence identity and ligand similarity?
Could you provide the scripts to calculate sequence identity and ligand similarity?
You mention sequence identity twice, with thresholds of 50% and 30%. Do you mean protein sequence identity greater than 50% and ligand sequence identity greater than 30%?

Training Classifier Problem

You give the code block below as an example for training the classifier

You are using --data_dir (-d) in train.py as below

 eptrain = molgrid.ExampleProvider(shuffle=True, stratify_receptor=True, labelpos=0,balanced=True,
                                      data_root=args.data_dir,recmolcache=args.train_recmolcache)

Where are you reading data_dir from? The training classifier example you shared does not pass data_dir in the code block.
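
An optional command-line argument can be omitted from an example command when it has a default. A sketch of that pattern with `argparse`; the empty-string default and the help text are assumptions, not copied from train.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative subset of training options")
    # Assumed default: with an empty root, paths are used as given in the types file.
    parser.add_argument("-d", "--data_dir", type=str, default="",
                        help="root directory for paths listed in the types files")
    parser.add_argument("--train_types", type=str, required=True)
    return parser


args_without = build_parser().parse_args(["--train_types", "scPDB_train0.types"])
args_with = build_parser().parse_args(
    ["--train_types", "scPDB_train0.types", "-d", "/data/scPDB"]
)
```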

Broken Link to Model Checkpoints, "404 FILE NOT FOUND"

The link provided in the README/documentation for downloading the prepared types, molcache, and saved model checkpoints seems to be broken. When attempting to access the resources through the link, it results in a 404 File Not Found error.

Could you please look into this and provide an updated link or guidance on how to access these materials?

Best regards.

predict.py running error

Hi, I have some issues when running 'predict.py'.

It seems it can't find 'class_checkpoint' and 'seg_checkpoint'.

python predict.py -p Downloads/3g73.pdb -c first_model_fold1_best_test_auc_85001.pth.tar -s seg0_best_test_IOU_91.pth.tar -r 3

DeepPocket/predict.py:14: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/pairwise2.py:283: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
BiopythonDeprecationWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3096.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3097.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3098.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3146.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 3177.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain D is discontinuous at line 3218.
PDBConstructionWarning,
***** POCKET HUNTING BEGINS *****
mkdir: cannot create directory ‘Downloads/3g73_nowat_out/pockets’: File exists
***** POCKET HUNTING ENDS *****
Traceback (most recent call last):
  File "DeepPocket/predict.py", line 87, in <module>
    class_checkpoint=torch.load(args.class_checkpoint)
  File "/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/torch/serialization.py", line 579, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'first_model_fold1_best_test_auc_85001.pth.tar'

Plz help...!!
Thanks

Prediction Error

Thank you for this valued work.

When I run the code below, I am getting an error.

python predict.py -p 6aah.pdb -c first_model_fold1_best_test_auc_85001.pth.tar -s seg0_best_test_IOU_91.pth.tar -r 3

Fpocket files were created properly but then the program throws an error

Note: All libraries properly installed before running.

/content/DeepPocket/predict.py:14: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
***** POCKET HUNTING BEGINS ***** 
***** POCKET HUNTING ENDS ***** 
==============================
*** **Open Babel Warning  in PerceiveBondOrders
  Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders** (title is /content/6aah_protein_nowat.pdb)

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/tarfile.py", line 187, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: '\x04ctorch.'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/tarfile.py", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/local/lib/python3.7/tarfile.py", line 1095, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/usr/local/lib/python3.7/tarfile.py", line 1037, in frombuf
    chksum = nti(buf[148:156])
  File "/usr/local/lib/python3.7/tarfile.py", line 189, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 555, in _load
    return legacy_load(f)
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 466, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/usr/local/lib/python3.7/tarfile.py", line 1593, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/local/lib/python3.7/tarfile.py", line 1623, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/local/lib/python3.7/tarfile.py", line 1486, in __init__
    self.firstmember = self.next()
  File "/usr/local/lib/python3.7/tarfile.py", line 2301, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/DeepPocket/predict.py", line 87, in <module>
    class_checkpoint=torch.load(args.class_checkpoint)
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 559, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: /content/gdrive/MyDrive/Colab Notebooks/Ainnocence/DeepPocket/first_model_fold1_best_test_auc_85001.pth.tar is a zip archive (did you mean to use torch.jit.load()?)
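
This error typically appears when a checkpoint written in the zip-based format that `torch.save` has used by default since PyTorch 1.6 is loaded with an older PyTorch, which only understands the legacy format. A small standard-library check can confirm which format a file is in before deciding whether to upgrade PyTorch; the helper name `checkpoint_format` is hypothetical.

```python
import zipfile


def checkpoint_format(path):
    """Return 'zip' for zip-based checkpoint files, 'legacy' for anything else."""
    return "zip" if zipfile.is_zipfile(path) else "legacy"
```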

The number of pocket in types file

Hi, I read the train.types like seg_scPDB_train0.types. The first few lines are as follows:
1 -18.161927039784217 32.606813980669806 85.32244760620364 10mh_1/protein_0.gninatypes 10mh_1/cavity6.mol2
1 -11.51310276710522 28.98620689253697 91.02771812783796 10mh_1/protein_0.gninatypes 10mh_1/cavity6.mol2
1 14.198903210849663 9.972515184884662 25.079490147212237 12gs_1/protein_0.gninatypes 12gs_1/cavity6.mol2
1 6.117556524238361 -2.4037784058248697 32.47945066104617 12gs_1/protein_0.gninatypes 12gs_1/cavity6.mol2

My confusion is that for this PDB 10mh_1 , it seems that there is only one cavity in the source folder "scPDB/10mh_1/", but there are two lines about 10mh_1 in seg_scPDB_train0.types.

No output after running train_segmentation.py

Hi, I've been trying to run this code but keep running into issues while trying to run the segmentation code.

The first few times I tried to run that code, I got the same error as shown in a comment on Issue #18.

I re-downloaded the "scPDB_new" file and tried running it again, and now the code doesn't show an error, but it doesn't show any output either. I checked wandb as well, and it doesn't show any output (the train.py output had no issues and was represented perfectly on wandb).

Here is my code (on Google Colab) -
!python /content/drive/MyDrive/DeepPocket/train_segmentation.py
--train_types /content/drive/MyDrive/DeepPocket/seg_scPDB_train0.types
--test_types /content/drive/MyDrive/DeepPocket/seg_scPDB_test0.types
-d /content/drive/MyDrive/DeepPocket/data/
--train_recmolcache scPDB_new.molcache2
--test_recmolcache scPDB_new.molcache2
-b 8
-o /content/drive/MyDrive/DeepPocket/model_saves/seg0
-e 200 -r seg0

And here is the output -

I've been trying to fix this for a week, not sure what else I could do here. Any fixes or suggestions would be appreciated.


Transform part in Train

    for b in range(batch_size):
        center = molgrid.float3(float(centers[b][0]), float(centers[b][1]), float(centers[b][2]))
        # initialise transformer for rotational augmentation
        transformer = molgrid.Transform(center, 0, True)
        # center=transformer.get_quaternion().rotate(center.x,center.y,center.z)
        # random rotation on input protein
        transformer.forward(batch[b],batch[b])
        # update input tensor with b'th datapoint of the batch
        gmaker.forward(center, batch[b].coord_sets[0], input_tensor[b])

The above code appears in train.py. Can you explain why you are using it?

Why do we need to rotate or otherwise transform the coordinate data during training?

I am asking this question because

transformer.get_rotation_center().x is equal to centers[b][0]
transformer.get_rotation_center().y is equal to centers[b][1]
transformer.get_rotation_center().z is equal to centers[b][2]

What is your main goal in using the transformer here?
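
On the observation about the rotation center: rotating about a point leaves that point fixed and moves everything else around it, so a grid centered on the pocket stays centered while the surrounding atoms appear in a new orientation. The sketch below shows this with a plain rotation about the z-axis through a chosen center; it illustrates the geometry only and does not use molgrid.

```python
import math


def rotate_about(point, center, angle):
    """Rotate point by angle (radians) around the z-axis passing through center."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (center[0] + cos_a * dx - sin_a * dy,
            center[1] + sin_a * dx + cos_a * dy,
            point[2])


center = (1.0, 2.0, 3.0)
atom = (2.0, 2.0, 3.0)
moved_center = rotate_about(center, center, math.pi / 2)
moved_atom = rotate_about(atom, center, math.pi / 2)
```

Random rotations of this kind are a common data-augmentation step, so that a network sees each pocket in many orientations.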
