Full test scripts to reproduce the metrics results in the paper. (deeppocket issue, open, 6 comments)

devalab commented on July 21, 2024

Full test scripts to reproduce the metrics results in the paper.

from deeppocket.

Comments (6)

RishalAggarwal commented on July 21, 2024

Hey, unfortunately I no longer have those scripts, as the server holding those files went down :( Reproducing the metrics should be simple enough, though, once you've generated the predictions across the dataset. After that, it's just manipulation of the generated text files using Python. To load molecule files, one can use Open Babel.

ljpadam commented on July 21, 2024

Thanks for your reply. I will try to implement these scripts by myself.

I have some more questions about the details of the evaluation.

1. When you calculate the Top-N success rate, do you first calculate the success rate of each protein and then average them? Or do you pool the predictions of all proteins and divide by the number of ground-truth pockets across all proteins?
2. Which model(s) are used to get the metrics on COACH420 and HOLO4k? One of the 10-fold models trained on scPDB, all of them, or a new model retrained on COACH420?
3. I found that there are training and testing types files for COACH420 and HOLO4k. Is only the testing types file used in evaluation? What is the purpose of the training types files?
4. In the "Data Sets and Preprocessing" section of your paper, you first mention that "there are 291 protein structures and 359 ligands, 3413 protein structures, and 4288 ligands for COACH420 and HOLO4k", and then mention "207 out of 291 proteins (71.13%) and 2752 out of 3413 proteins (80.63%) for the COACH420 and HOLO4k data sets". Do you mean that the numbers of proteins in COACH420 and HOLO4k are 291 and 3413 for classification, and 207 and 2752 for segmentation?

Sorry for asking so many questions. I am very interested in your work and sincerely thank you for your help.

RishalAggarwal commented on July 21, 2024

1. Top-N is calculated for each protein individually: take the top N predictions for each protein, where N is the number of annotated pockets for that protein, and calculate the metric. You also need to be careful about subpockets, as fpocket sometimes gives multiple pocket centers for the same pocket (essentially predicting the same pocket again). You can cross-check that using the proximity to the corresponding ligand.
2. We have separately trained models for COACH420 and HOLO4k.
3. The training types files contain datapoints from the scPDB dataset after removing proteins that are similar to datapoints in the corresponding test set.
4. Yes, that is correct.
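
For readers implementing this themselves, the Top-N selection with subpocket handling might be sketched roughly as below. This is not the authors' script (those were lost). The function name `top_n_hits`, the 4.0 Å cutoff, and the use of ligand centers as the proximity reference are all assumptions made for illustration. A prediction that lands on a ligand that was already hit is treated as a repeated subpocket and does not use up one of the N slots.

```python
import math

def top_n_hits(ranked_centers, ligand_centers, threshold=4.0):
    """Return the indices of ligands hit by the Top-N unique predictions.

    ranked_centers: predicted pocket centers, best-ranked first.
    ligand_centers: one center per annotated pocket (N = len(ligand_centers)).
    threshold: hypothetical distance cutoff in angstroms (an assumption).
    """
    n = len(ligand_centers)
    hit = set()      # ligands already matched by a kept prediction
    used_slots = 0   # how many of the N slots have been consumed
    for center in ranked_centers:
        if used_slots == n:
            break
        # Find the ligand closest to this predicted center.
        dists = [math.dist(center, lig) for lig in ligand_centers]
        nearest = min(range(n), key=dists.__getitem__)
        close = dists[nearest] <= threshold
        if close and nearest in hit:
            # Same pocket predicted again (subpocket): skip, keep the slot.
            continue
        used_slots += 1
        if close:
            hit.add(nearest)
    return hit

# Toy example: two annotated pockets, four ranked predictions.
ligands = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ranked = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 1.0, 0.0), (30.0, 0.0, 0.0)]
hits = top_n_hits(ranked, ligands)
print(sorted(hits))
```

In the toy example the second prediction duplicates the first pocket, so it is skipped and the third prediction fills the second slot. Without that check, both Top-2 slots would go to the same pocket.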

RishalAggarwal commented on July 21, 2024

My bad: for the first point, the success rate is calculated by pooling the (Top-N, unique) predictions of all proteins together and dividing by the total number of ground-truth pockets.
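
To make the difference between the two aggregation schemes from the original question concrete, here is a small sketch. The data layout (a dict mapping a protein ID to a pair of hit count and annotated pocket count) and the function names are assumptions for illustration only. The per-protein hit counts are taken as already computed from the Top-N unique predictions.

```python
def pooled_success_rate(per_protein):
    """Pool all proteins: total hits divided by total ground-truth pockets."""
    total_hits = sum(hits for hits, _ in per_protein.values())
    total_pockets = sum(n for _, n in per_protein.values())
    return total_hits / total_pockets

def mean_per_protein_rate(per_protein):
    """Alternative scheme: average of each protein's own success rate."""
    rates = [hits / n for hits, n in per_protein.values()]
    return sum(rates) / len(rates)

# Hypothetical counts: protein -> (pockets hit in Top-N, annotated pockets).
counts = {"protA": (1, 1), "protB": (1, 3)}

pooled = pooled_success_rate(counts)        # (1 + 1) / (1 + 3)
averaged = mean_per_protein_rate(counts)    # (1/1 + 1/3) / 2
print(pooled, averaged)
```

The pooled value weights every pocket equally, so proteins with many annotated pockets contribute more. The per-protein average weights every protein equally, which is why the two numbers differ.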
