devalab / deeppocket
Ligand Binding Site detection using Deep Learning
License: MIT License
OS: Ubuntu 22.04
mamba: 1.5.5
conda: 23.11.0
Hi,
I am trying to use your program to predict binding sites in a protein PDB file. When I run
python ~/DeepPocket/predict.py -p 1A9N_frame_0.pdb -c ~/DeepPocket/first_model_fold1_best_test_auc_85001.pth.tar -s ~/DeepPocket/seg0_best_test_IOU_91.pth.tar -r 3
I get the following terminal message:
/home/luis/DeepPocket/rank_pockets.py:87: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
all_probs.append(F.softmax(output).detach().cpu())
Traceback (most recent call last):
File "/home/luis/DeepPocket/predict.py", line 106, in <module>
fout.write(''.join(confidence_types))
TypeError: sequence item 0: expected str instance, numpy.float32 found
I tried to circumvent the TypeError by changing line 106 in predict.py from fout.write(''.join(confidence_types)) to fout.write(''.join(str(confidence_types))), but that only led to another, more cryptic error when I reran the command above:
Traceback (most recent call last):
File "/home/luis/DeepPocket/predict.py", line 122, in <module>
test(seg_model, seg_eptest, seg_gmaker,device,dx_name, args)
File "/home/luis/DeepPocket/segment_pockets.py", line 145, in test
output_pocket_pdb(dx_name+'_pocket'+str(count)+'.pdb',prot_prody,pred_aa)
File "/home/luis/DeepPocket/segment_pockets.py", line 85, in output_pocket_pdb
pocket=prot_prody.select(sel_str)
File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/atomic.py", line 232, in select
return SELECT.select(self, selstr, **kwargs)
File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/select.py", line 894, in select
indices = self.getIndices(atoms, selstr, **kwargs)
File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/select.py", line 952, in getIndices
torf = self.getBoolArray(atoms, selstr, **kwargs)
File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/select.py", line 1003, in getBoolArray
parser = self._getParser(selstr)
File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/prody/atomic/select.py", line 1102, in _getParser
parser.enablePackrat()
File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/pyparsing/util.py", line 265, in _inner
return fn(*args, **kwargs)
File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/pyparsing/core.py", line 1132, in enable_packrat
ParserElement.packrat_cache = _FifoCache(cache_size_limit) # type: ignore[assignment]
File "/home/luis/miniforge3/envs/DeepPocketEnv/lib/python3.9/site-packages/pyparsing/util.py", line 105, in __init__
keyring = [object()] * size
TypeError: can't multiply sequence by non-int of type 'Forward'
Do you have an idea how to fix these issues?
Cheers Foly
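The first traceback says that str.join received an item that is not a string. Note that ''.join(str(confidence_types)) converts the whole list to its printed form first, so brackets and commas end up in the output file. A minimal sketch of a per-item conversion follows, assuming confidence_types is a flat list that mixes strings and numeric values (its real contents in predict.py are not shown in this thread). Plain floats stand in for numpy.float32, and join_as_text is a hypothetical helper, not part of DeepPocket.

```python
def join_as_text(items, sep=""):
    """Convert every item to str before joining, so numeric entries are accepted."""
    return sep.join(str(item) for item in items)


# Hypothetical stand-in for confidence_types; floats replace numpy.float32 here.
confidence_types = [0.5, " 1.0 2.0 3.0 protein_nowat.gninatypes\n"]

try:
    "".join(confidence_types)  # reproduces the same class of error as the traceback
except TypeError as exc:
    print("join failed:", exc)

# Each entry is converted on its own, so no list brackets or commas are written.
print(join_as_text(confidence_types), end="")
```

Whether the entries should be separated by nothing, a space, or a newline depends on how the list is built upstream, which would need to be checked in predict.py.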
Which version of molgrid is this running? I am running molgrid=0.1.1 and getting this error:
Traceback (most recent call last):
File "predict.py", line 89, in <module>
protein_gninatype=gninatype(protein_nowat_file)
File "/content/fpocket/DeepPocket/types_and_gninatyper.py", line 20, in gninatype
dataloader=molgrid.ExampleProvider(atom_map,shuffle=False,default_batch_size=1)
ValueError: Unknown keyword argument default_batch_size
Any idea how to fix this?
Hi, I want to use train_segmentation.py, but it reports that the gninatypes files are missing. Can you provide these files, or should I generate them myself?
Thanks
Hi @RishalAggarwal,
Firstly, thank you for this repo.
python train.py -m model.py --train_types scPDB_train0.types --test_types scPDB_test0.types -i 200000 --train_recmolcache scPDB_new.molcache2 --test_recmolcache scPDB_new.molcache2 -r val0 -o /model_saves/val9 --base_lr 0.001 --solver Adam
As seen above, the file scPDB_train0.types is required to train the classifier.
Sample content of the scPDB_train0.types file:
0 -6.417309121621622 37.99337461018711 86.51209004677753 10mh_1/protein_0.gninatypes
0 -48.73792600326857 40.15845814418013 90.75518894134738 10mh_1/protein_0.gninatypes
0 -22.384561944279785 38.16762551867219 62.667952578541794 10mh_1/protein_0.gninatypes
0 4.418982018111255 43.43278783958602 81.18465174644241 10mh_1/protein_0.gninatypes
...
My first question is how you assigned the labels (0 or 1) that mark whether a set of coordinates is a pocket. Is this a public dataset? The paper does not mention it. How did you create this train file?
My second question: if you labeled this dataset yourself, how can I do the same pocket / non-pocket (0 or 1) labeling from coordinates for my own protein files?
Note: none of the COACH420, HOLO4k, or scPDB datasets contain coordinates for non-druggable regions. How did you label entries in your scPDB_train0 file as 0 (non-druggable) or 1 (druggable)?
Since I could not find any code related to this, I am wondering about the details of the preprocessing.
I guess the protein and the binding site are used to mask the ground truth. But which files did you use? The scPDB dataset contains many files, such as protein.mol2, site.mol2, cavity6.mol2, ligand.mol2, etc., and I am getting confused.
My question is simple, but I believe the answer will help everyone understand the paper better.
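For readers following along, the quoted lines can be read as whitespace-separated fields. The sketch below is a small parser under these assumptions: the first field is an integer label, the next three are x, y, z coordinates, the fifth is a receptor path, and an optional sixth field (seen in other lines quoted in this thread) is an extra file path. The meaning of the label is the poster's reading, not something the thread confirms. TypesRecord and parse_types_line are hypothetical names.

```python
from typing import NamedTuple, Optional


class TypesRecord(NamedTuple):
    label: int            # assumed: 0 / 1 class label (or a rank in ranked files)
    x: float
    y: float
    z: float
    receptor: str         # path to a .gninatypes file
    extra: Optional[str]  # optional sixth field, e.g. a .mol2 path


def parse_types_line(line: str) -> TypesRecord:
    """Split one line of a .types file into typed fields."""
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}: {line!r}")
    label = int(fields[0])
    x, y, z = (float(value) for value in fields[1:4])
    extra = fields[5] if len(fields) == 6 else None
    return TypesRecord(label, x, y, z, fields[4], extra)


record = parse_types_line(
    "0 -6.417309121621622 37.99337461018711 86.51209004677753 10mh_1/protein_0.gninatypes"
)
print(record.label, record.receptor)
```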
The following command needs to be run to train the classifier:
python train.py -m model.py --train_types scPDB_train0.types --test_types scPDB_test0.types -i 200000 --train_recmolcache scPDB_new.molcache2 --test_recmolcache scPDB_new.molcache2 -r val0 -o /model_saves/val9 --base_lr 0.001 --solver Adam
Here is an example line from the train and test files:
1 50.69633356250253 -8.818796255105756 9.213237190116068 2bel_4/protein_0.gninatypes 2bel_4/cavity6.mol2
I have two questions.
First, what does the number 1 in the first field represent?
Second, does the last field (2bel_4/cavity6.mol2) need to be present in the train and test files? If I delete 2bel_4/cavity6.mol2 from the end of the line, will training still work, or do I need the mol2 files too?
Isn't the gninatypes file alone (2bel_4/protein_0.gninatypes) enough?
Hello,
I'm trying to run the example from the Predicting Binding Site section:
python predict.py -p protein.pdb -c first_model_fold1_best_test_auc_85001.pth.tar -s seg0_best_test_IOU_91.pth.tar -r 3
But it crashes with the following errors:
***** POCKET HUNTING BEGINS *****
***** POCKET HUNTING ENDS *****
/usr/local/lib/python3.7/dist-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 1951.
PDBConstructionWarning,
/usr/local/lib/python3.7/dist-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 2008.
PDBConstructionWarning,
/content/DeepPocket/rank_pockets.py:87: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
all_probs.append(F.softmax(output).detach().cpu())
@> 1674 atoms and 1 coordinate set(s) were parsed in 0.01s.
Traceback (most recent call last):
File "predict.py", line 116, in <module>
test(seg_model, seg_eptest, seg_gmaker,device,dx_name, args)
File "/content/DeepPocket/segment_pockets.py", line 142, in test
output_pocket_pdb(dx_name+'_pocket'+str(count)+'.pdb',prot_prody,pred_aa)
File "/content/DeepPocket/segment_pockets.py", line 82, in output_pocket_pdb
pocket=prot_prody.select(sel_str)
File "/usr/local/lib/python3.7/dist-packages/prody/atomic/atomic.py", line 232, in select
return SELECT.select(self, selstr, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 885, in select
indices = self.getIndices(atoms, selstr, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 943, in getIndices
torf = self.getBoolArray(atoms, selstr, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 995, in getBoolArray
tokens = parser(selstr, parseAll=True)
File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 1100, in _noParser
return [self._default(selstr, 0, selstr.split())]
File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 1118, in _default
torf, err = self._and2(sel, loc, tokens)
File "/usr/local/lib/python3.7/dist-packages/prody/atomic/select.py", line 1319, in _and2
firsttoken = tokens[0] if not isinstance(tokens[0], Iterable) else list(tokens[0])
IndexError: list index out of range
Ubuntu 18.04 on Google Colab (pytorch 1.9+cuda 10.2) with prody version 2.0.
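The IndexError is raised on tokens[0] after ProDy splits the selection string, which suggests (but does not prove) that an empty selection string reached prot_prody.select, for instance when no residues were predicted for a pocket. How sel_str is built in segment_pockets.py is not shown in this thread, so the sketch below is only a guard under that assumption. build_selection is a hypothetical helper, and the "resindex" keyword with space-separated values is assumed to be the intended selection form.

```python
from typing import Iterable, Optional


def build_selection(residue_indices: Iterable[int]) -> Optional[str]:
    """Return a selection string, or None when there is nothing to select."""
    unique = sorted(set(residue_indices))
    if not unique:
        return None  # caller should skip this pocket instead of selecting
    return "resindex " + " ".join(str(index) for index in unique)


for indices in ([], [12, 7, 12]):
    selection = build_selection(indices)
    if selection is None:
        print("no residues predicted, skipping pocket")
    else:
        print(selection)  # this string would be passed to prot_prody.select(...)
```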
The content of bary_centers_ranked.types is as follows
3 -5.8872820891631195 2.4274225254193103 16.306759363865474 /content/xxx_nowat.gninatypes
1 -3.399881608822576 0.7210002432695428 11.260844794031787 /content/xxx_nowat.gninatypes
2 4.8732080737995735 -7.189318852006624 6.719965229046755 /content/xxxx_nowat.gninatypes
6 -8.189112907608695 8.49208722826087 8.578613858695654 /content/xxxx_nowat.gninatypes
5 -11.534677691417235 3.978282288738218 31.699375888870517 /content/xxxx_nowat.gninatype
How do you calculate the druggability score for each pocket centre? I'm trying to understand the Pocket Probability calculation.
Hi, I have some questions about data preparation.
In your paper, you mentioned that "The proteins and ligands were separated from the corresponding structure files using the Biopython library". But I can't find the corresponding code in this repo. Could you share it?
A pdb file in HOLO4K may have several ligands. Do you keep all ligands or remove some? What are the criteria for choosing ligands in a pdb file?
When you use Fpocket to choose pocket candidates, do you run Fpocket on the original pdb file, on the pdb file without ligands, or on a single chain of the pdb file?
Hello,
I have not found the file seg0_best_test_IOU_91.pth.tar.
Can you tell me where I can find it?
Thank you.
Hi,
I want to use DeepPocket to make some predictions. For that, I think I need the checkpoint files of the models. I tried downloading the archived zip file from OneDrive, but I cannot unzip it. Every attempt results in errors indicating that the zip file is corrupted or damaged.
Could you please assist in resolving this issue?
Thanks
Hi,
Thank you for making the code public. However, I am getting an error in the data preprocessing stage. When I try to convert a .pdb file to gninatypes, I get the error "Could not open gninamap". I separated out the data preprocessing stage (using your code without any modifications) to create a self-contained example that shows the error.
from Bio.PDB import PDBParser, PDBIO, Select
import Bio
import os
import sys
import molgrid
import struct
import numpy as np


class NonHetSelect(Select):
    # keep only standard amino-acid residues
    def accept_residue(self, residue):
        return 1 if Bio.PDB.Polypeptide.is_aa(residue,standard=True) else 0


def clean_pdb(input_file,output_file):
    pdb = PDBParser().get_structure("protein", input_file)
    io = PDBIO()
    io.set_structure(pdb)
    io.save(output_file, NonHetSelect())


def gninatype(file):
    # creates gninatype file for model input
    f=open(file.replace('.pdb','.types'),'w')
    f.write(file)
    f.close()
    atom_map=molgrid.FileMappedGninaTyper('gninamap')
    dataloader=molgrid.ExampleProvider(atom_map,shuffle=False,default_batch_size=1)
    train_types=file.replace('.pdb','.types')
    dataloader.populate(train_types)
    example=dataloader.next()
    coords=example.coord_sets[0].coords.tonumpy()
    types=example.coord_sets[0].type_index.tonumpy()
    types=np.int_(types)
    print(coords)
    fout=open(file.replace('.pdb','.gninatypes'),'wb')
    for i in range(coords.shape[0]):
        fout.write(struct.pack('fffi',coords[i][0],coords[i][1],coords[i][2],types[i]))
        print(struct.pack('fffi',coords[i][0],coords[i][1],coords[i][2],types[i]))
    fout.close()
    os.remove(train_types)
    return file.replace('.pdb','.gninatypes')


def create_types(file,protein):
    # create types file for model predictions
    # 'with' closes both files, so the output is flushed to disk
    with open(file,'r') as fin, open(file.replace('.txt','.types'),'w') as fout:
        for line in fin:
            fout.write(' '.join(line.split()) + ' ' + protein +'\n')
    return file.replace('.txt','.types')


protein_file="/home/ubuntu/Data/1a8o.pdb"
protein_nowat_file=protein_file.replace('.pdb','_nowat.pdb')
clean_pdb(protein_file,protein_nowat_file)
protein_gninatype=gninatype(protein_nowat_file)
The code ends with the error
ValueError Traceback (most recent call last)
/tmp/ipykernel_13436/2408537986.py in <module>
2 protein_nowat_file=protein_file.replace('.pdb','_nowat.pdb')
3 clean_pdb(protein_file,protein_nowat_file)
----> 4 protein_gninatype=gninatype(protein_nowat_file)
/tmp/ipykernel_13436/3305498276.py in gninatype(file)
4 f.write(file)
5 f.close()
----> 6 atom_map=molgrid.FileMappedGninaTyper('gninamap')
7 dataloader=molgrid.ExampleProvider(atom_map,shuffle=False,default_batch_size=1)
8 train_types=file.replace('.pdb','.types')
ValueError: Could not open gninamap
Can you please help me with this issue? Thank you.
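The string 'gninamap' is a relative path, so it is presumably looked up in the current working directory, which for a notebook is usually not the DeepPocket checkout. The sketch below resolves the file against an explicit directory and fails early with a clearer message. It assumes the gninamap file sits in the directory you pass in; resolve_resource is a hypothetical helper, and the returned path would then be given to molgrid.FileMappedGninaTyper in place of the bare name.

```python
from pathlib import Path


def resolve_resource(name: str, base_dir: str) -> str:
    """Return an absolute path to `name` inside `base_dir`, or raise if missing."""
    candidate = Path(base_dir).expanduser().resolve() / name
    if not candidate.is_file():
        raise FileNotFoundError(f"{name!r} not found in {candidate.parent}")
    return str(candidate)


# Example call; the directory below is an assumption about where the repo lives:
# gninamap_path = resolve_resource("gninamap", "~/DeepPocket")
```

Changing the working directory to the repository before running the snippet would likely have the same effect.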
Hello and congrats on your repo!
As I see in the segmentation training, you have num_classes=1. Does that mean label=1 -> pocket and label=0 -> not pocket?
I am trying to follow your instructions to prepare data for training a new classifier.
I am stuck at the make_types step because I can't find the train.txt and test.txt files.
Moreover, I have 4 questions:
I tried it on this pdb.
Thank you very much.
What is 'bary_centers.txt'? It is neither created automatically nor passed in as an external argument, and I'm getting an error. Could you tell me what this txt file is? Thank you.
Thanks for your great work. I have one question. In your training code, you use the first 14 channels in the gninamap file, which are
Hydrogen, PolarHydrogen, AliphaticCarbonXSHydrophobe, AliphaticCarbonXSNonHydrophobe, AromaticCarbonXSHydrophobe, AromaticCarbonXSNonHydrophobe, Nitrogen, NitrogenXSDonor, NitrogenXSDonorAcceptor, NitrogenXSAcceptor, Oxygen, OxygenXSDonor, OxygenXSDonorAcceptor and OxygenXSAcceptor
However, in Supporting Information Table S1 (https://pubs.acs.org/doi/suppl/10.1021/acs.jcim.1c00799/suppl_file/ci1c00799_si_001.pdf), they are
AliphaticCarbonXSHydrophobe, AliphaticCarbonXSNonHydrophobe, AromaticCarbonXSHydrophobe, AromaticCarbonXSNonHydrophobe, Bromine Iodine Chlorine Fluorine, Nitrogen NitrogenXSAcceptor, NitrogenXSDonor NitrogenXSDonorAcceptor, Oxygen OxygenXSAcceptor, OxygenXSDonorAcceptor OxygenXSDonor, Sulfur SulfurAcceptor, Phosphorus, Calcium, Zinc and GenericMetal Boron Manganese Magnesium Iron
Why are they different? Am I misunderstanding something?
Hi, I am trying to install DeepPocket using the listed dependencies but running into many conflicts between packages. Could you please provide an updated .yml file or perhaps a Google Colab Notebook that installs the dependencies and runs DeepPocket? Many thanks!
I'm getting a userwarning from rank_pockets.py
DeepPocket/rank_pockets.py:88: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
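The warning asks for an explicit dimension. Assuming output has shape (batch, classes), which the thread does not show, the call would presumably become F.softmax(output, dim=1). The dependency-free sketch below only illustrates what normalising along the class dimension computes; softmax_rows is a hypothetical helper, not the PyTorch function.

```python
import math


def softmax_rows(logits):
    """Softmax over the last axis of a 2-D nested list (one row per example)."""
    result = []
    for row in logits:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(value - shift) for value in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


# Each row sums to 1, so every example gets its own class distribution.
for probs in softmax_rows([[2.0, 0.0], [0.0, 0.0]]):
    print(probs, sum(probs))
```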
Thank you for open-sourcing this great work!
I am new to this topic, and I notice that the paper uses many different metrics, such as accuracy, DCA, DCC, DVO, success rate of Top-N, Top-(N+2), and ratio.
Could you please provide the testing scripts that calculate these metrics on the four datasets, so that the results in your paper can be reproduced?
It will be a great help to cite and compare with your paper. Thanks in advance.
When predicting binding sites for a protein .pdb file using:
python predict.py -p pdb/1alb_A.pdb -c first_model_fold1_best_test_auc_85001.pth.tar -s seg0_best_test_IOU_91.pth.tar -r 3
I get a 'segmentation fault' right after:
***** POCKET HUNTING BEGINS *****
***** POCKET HUNTING ENDS *****
I used gdb to inspect the core file:
'Failed to read a valid object file image from memory.
Core was generated by `python3 predict.py -p pdb/1alb_A.pdb -c first_model_fold1_best_test_auc_85001.p'.
Program terminated with signal 11, Segmentation fault.'
Hi, I was trying DeepPocket on chain B of 1a5h and tried to segment ALL pockets predicted by fpocket. I noticed that out of the 14 pockets predicted by fpocket, only 3 were segmented (I removed the lines that break the loop when count >= 3). In this example, these are pockets 1, 3, and 10, which correspond to fpocket pockets 2, 1, and 5, respectively. I checked, and the others have no predicted residues:
Pocket 11 of 1a5h_B has [] predicted residues
Pocket 12 of 1a5h_B has [] predicted residues
Pocket 13 of 1a5h_B has [] predicted residues
Pocket 14 of 1a5h_B has [] predicted residues
Pocket 15 of 1a5h_B has [] predicted residues
Pocket 16 of 1a5h_B has [] predicted residues
Pocket 17 of 1a5h_B has [] predicted residues
And this is what masks_pred looks like:
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]])
Any idea why this is? I would expect a pocket could be segmented out of each fpocket prediction. Thanks!
In the "Data sets and Preprocessing" section of your paper, you mention that "we removed all proteins from the training set that had either sequence identity greater than 50% or ligand similarity greater than 0.9 and sequence identity greater than 30%".
You give the command below as an example for Training Classifier:
python train.py -m model.py --train_types scPDB_train0.types --test_types scPDB_test0.types -i 200000 --train_recmolcache scPDB_new.molcache2 --test_recmolcache scPDB_new.molcache2 -r val0 -o /model_saves/val9 --base_lr 0.001 --solver Adam
train.py uses --data_dir (-d) as shown below:
eptrain = molgrid.ExampleProvider(shuffle=True, stratify_receptor=True, labelpos=0,balanced=True,
data_root=args.data_dir,recmolcache=args.train_recmolcache)
Where is data_dir read from? The Training Classifier example you shared does not pass data_dir on the command line.
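One possible explanation is that the argument parser defines a default for --data_dir, so args.data_dir exists even when -d is omitted. The actual definition in train.py is not shown in this thread, so the sketch below is a hypothetical subset of options with an assumed empty-string default. It only demonstrates how argparse fills in a value that was not given on the command line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of train.py options; names mirror the quoted command."""
    parser = argparse.ArgumentParser(description="sketch of optional --data_dir")
    parser.add_argument("--train_types", required=True)
    parser.add_argument(
        "-d", "--data_dir",
        default="",  # assumed default, not confirmed by the source
        help="root directory for the paths listed in the types file",
    )
    return parser


args = build_parser().parse_args(["--train_types", "scPDB_train0.types"])
print(repr(args.data_dir))  # the default is used because -d was not passed
```

With an empty root, the relative paths in the types file would be looked up from the working directory or from the molcache, which is worth verifying against the real script.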
The link provided in the README/documentation for downloading the prepared types, molcache, and saved model checkpoints seems to be broken. When attempting to access the resources through the link, it results in a 404 File Not Found error.
Could you please look into this and provide an updated link or guidance on how to access these materials?
Best regards.
Hi, I have some issues when running 'predict.py'.
It seems that it can't find 'class_checkpoint' and 'seg_checkpoint'.
python predict.py -p Downloads/3g73.pdb -c first_model_fold1_best_test_auc_85001.pth.tar -s seg0_best_test_IOU_91.pth.tar -r 3
DeepPocket/predict.py:14: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/pairwise2.py:283: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
BiopythonDeprecationWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3096.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3097.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3098.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3146.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 3177.
PDBConstructionWarning,
/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/Bio/PDB/StructureBuilder.py:92: PDBConstructionWarning: WARNING: Chain D is discontinuous at line 3218.
PDBConstructionWarning,
***** POCKET HUNTING BEGINS *****
mkdir: cannot create directory ‘Downloads/3g73_nowat_out/pockets’: File exists
***** POCKET HUNTING ENDS *****
Traceback (most recent call last):
File "DeepPocket/predict.py", line 87, in <module>
class_checkpoint=torch.load(args.class_checkpoint)
File "/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/torch/serialization.py", line 579, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/park/miniconda3/envs/torchdrug/lib/python3.7/site-packages/torch/serialization.py", line 211, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'first_model_fold1_best_test_auc_85001.pth.tar'
Please help!
Thanks
Thank you for this valued work.
When I run the code below, I am getting an error.
python predict.py -p 6aah.pdb -c first_model_fold1_best_test_auc_85001.pth.tar -s seg0_best_test_IOU_91.pth.tar -r 3
The Fpocket files were created properly, but then the program throws an error.
Note: all libraries were properly installed before running.
/content/DeepPocket/predict.py:14: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
***** POCKET HUNTING BEGINS *****
***** POCKET HUNTING ENDS *****
==============================
*** Open Babel Warning in PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders (title is /content/6aah_protein_nowat.pdb)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/tarfile.py", line 187, in nti
n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: '\x04ctorch.'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/tarfile.py", line 2289, in next
tarinfo = self.tarinfo.fromtarfile(self)
File "/usr/local/lib/python3.7/tarfile.py", line 1095, in fromtarfile
obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
File "/usr/local/lib/python3.7/tarfile.py", line 1037, in frombuf
chksum = nti(buf[148:156])
File "/usr/local/lib/python3.7/tarfile.py", line 189, in nti
raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 555, in _load
return legacy_load(f)
File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 466, in legacy_load
with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
File "/usr/local/lib/python3.7/tarfile.py", line 1593, in open
return func(name, filemode, fileobj, **kwargs)
File "/usr/local/lib/python3.7/tarfile.py", line 1623, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/usr/local/lib/python3.7/tarfile.py", line 1486, in __init__
self.firstmember = self.next()
File "/usr/local/lib/python3.7/tarfile.py", line 2301, in next
raise ReadError(str(e))
tarfile.ReadError: invalid header
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/DeepPocket/predict.py", line 87, in <module>
class_checkpoint=torch.load(args.class_checkpoint)
File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 386, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.7/site-packages/torch/serialization.py", line 559, in _load
raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: /content/gdrive/MyDrive/Colab Notebooks/Ainnocence/DeepPocket/first_model_fold1_best_test_auc_85001.pth.tar is a zip archive (did you mean to use torch.jit.load()?)
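The final RuntimeError is the message older PyTorch releases give when handed a checkpoint written in the zip-based format that torch.save has used by default since PyTorch 1.6. A small standard-library check can confirm which format a file uses before loading it. checkpoint_format is a hypothetical helper, and the label it returns for non-zip files is deliberately vague because the file could also be truncated or corrupted.

```python
import zipfile


def checkpoint_format(path: str) -> str:
    """Report whether a checkpoint file is a zip archive or something else."""
    # zipfile.is_zipfile checks the file's magic number; it does not load the model.
    return "zip" if zipfile.is_zipfile(path) else "legacy-or-other"


# Example call (the path is taken from the command quoted above):
# print(checkpoint_format("first_model_fold1_best_test_auc_85001.pth.tar"))
```

If the result is "zip", loading the file should need PyTorch 1.6 or newer, so upgrading the installed torch is the likely fix rather than switching to torch.jit.load.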
Hi, I looked at the training types files, such as seg_scPDB_train0.types. The first few lines are as follows:
1 -18.161927039784217 32.606813980669806 85.32244760620364 10mh_1/protein_0.gninatypes 10mh_1/cavity6.mol2
1 -11.51310276710522 28.98620689253697 91.02771812783796 10mh_1/protein_0.gninatypes 10mh_1/cavity6.mol2
1 14.198903210849663 9.972515184884662 25.079490147212237 12gs_1/protein_0.gninatypes 12gs_1/cavity6.mol2
1 6.117556524238361 -2.4037784058248697 32.47945066104617 12gs_1/protein_0.gninatypes 12gs_1/cavity6.mol2
My confusion is that for the entry 10mh_1 there seems to be only one cavity in the source folder "scPDB/10mh_1/", but seg_scPDB_train0.types contains two lines for 10mh_1. Why is that?
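A quick way to see how many lines each structure contributes is to count lines per receptor path. The sketch below does that for the four lines quoted above. Reading each line as a separate candidate centre that shares one cavity file is an assumption the thread does not confirm, and centers_per_receptor is a hypothetical helper.

```python
from collections import Counter


def centers_per_receptor(lines):
    """Count how many lines of a .types file refer to each receptor path."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 5:  # label, x, y, z, receptor[, extra]
            counts[fields[4]] += 1
    return counts


sample = [
    "1 -18.161927039784217 32.606813980669806 85.32244760620364 10mh_1/protein_0.gninatypes 10mh_1/cavity6.mol2",
    "1 -11.51310276710522 28.98620689253697 91.02771812783796 10mh_1/protein_0.gninatypes 10mh_1/cavity6.mol2",
    "1 14.198903210849663 9.972515184884662 25.079490147212237 12gs_1/protein_0.gninatypes 12gs_1/cavity6.mol2",
    "1 6.117556524238361 -2.4037784058248697 32.47945066104617 12gs_1/protein_0.gninatypes 12gs_1/cavity6.mol2",
]
for receptor, count in centers_per_receptor(sample).items():
    print(receptor, count)
```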
Hi, I've been trying to run this code but keep running into issues with the segmentation training.
The first few times I ran it, I got the same error as reported in Issue #18 (comment).
I re-downloaded the "scPDB_new" file and ran it again. Now the code doesn't show an error, but it doesn't produce any output either. I checked wandb as well, and nothing is logged there (the train.py run had no issues and showed up correctly on wandb).
Here is my command (on Google Colab):
!python /content/drive/MyDrive/DeepPocket/train_segmentation.py
--train_types /content/drive/MyDrive/DeepPocket/seg_scPDB_train0.types
--test_types /content/drive/MyDrive/DeepPocket/seg_scPDB_test0.types
-d /content/drive/MyDrive/DeepPocket/data/
--train_recmolcache scPDB_new.molcache2
--test_recmolcache scPDB_new.molcache2
-b 8
-o /content/drive/MyDrive/DeepPocket/model_saves/seg0
-e 200 -r seg0
I've been trying to fix this for a week, not sure what else I could do here. Any fixes or suggestions would be appreciated.
for b in range(batch_size):
    center = molgrid.float3(float(centers[b][0]), float(centers[b][1]), float(centers[b][2]))
    # initialise transformer for rotational augmentation
    transformer = molgrid.Transform(center, 0, True)
    # center=transformer.get_quaternion().rotate(center.x,center.y,center.z)
    # random rotation on input protein
    transformer.forward(batch[b],batch[b])
    # Update input tensor with b'th datapoint of the batch
    gmaker.forward(center, batch[b].coord_sets[0], input_tensor[b])
The code above is part of train.py. Could you please explain why you use it?
Why do we need to rotate, or otherwise transform, the coordinate data during training?
I am asking because transformer.get_rotation_center().x is equal to centers[b][0], transformer.get_rotation_center().y is equal to centers[b][1], and transformer.get_rotation_center().z is equal to centers[b][2].
What is your main goal in using the transformer here?
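A rotation about a point leaves that point where it is, which would explain why the rotation centre matches centers[b]: only the surrounding atoms move. The sketch below illustrates this with a rotation about the z-axis through a centre. It is a simplification under stated assumptions, since the comments in the quoted code indicate that molgrid applies a random 3-D rotation, and rotate_about_center is a hypothetical helper rather than a molgrid function.

```python
import math


def rotate_about_center(point, center, angle_rad):
    """Rotate `point` about the z-axis that passes through `center`."""
    # Work in coordinates relative to the centre of rotation.
    px, py, pz = (p - c for p, c in zip(point, center))
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    rx = cos_a * px - sin_a * py
    ry = sin_a * px + cos_a * py
    # Shift back to the original frame; z is unchanged by a z-axis rotation.
    return (rx + center[0], ry + center[1], pz + center[2])


center = (1.0, 2.0, 3.0)
atom = (2.0, 2.0, 3.0)
print(rotate_about_center(center, center, math.pi / 2))  # the centre stays fixed
print(rotate_about_center(atom, center, math.pi / 2))    # the atom moves around it
```

Feeding the network randomly rotated copies of the same pocket is a common form of data augmentation for voxel grids, because a grid-based CNN is not rotation invariant by construction. Whether that is the authors' stated motivation is not confirmed in this thread.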