Comments (6)
Hey, unfortunately I no longer have those scripts, as the server hosting those files went down. Reproducing the metrics should be simple enough, though, once you've generated the predictions across the dataset. After that, it's a matter of manipulating the generated text files in Python. To load molecule files, you can use Open Babel.
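For anyone reimplementing this, here is a minimal sketch of loading a ligand with Open Babel's Python bindings and reducing it to a single center point. It assumes Open Babel 3.x (where `pybel` is imported from the `openbabel` package), that the ligand file is in SDF format, and that the geometric center of the ligand atoms is a reasonable reference point for a pocket. None of these choices are confirmed by the thread, and `centroid` and `ligand_center` are hypothetical helper names.

```python
from statistics import fmean


def centroid(coords):
    """Return the mean (x, y, z) of a non-empty sequence of 3D points."""
    xs, ys, zs = zip(*coords)
    return (fmean(xs), fmean(ys), fmean(zs))


def ligand_center(path, fmt="sdf"):
    """Read the first molecule in `path` and return its geometric center.

    `fmt` is an Open Babel format code; "sdf" is only an assumed default.
    """
    # Imported here so that centroid() stays usable without Open Babel installed.
    from openbabel import pybel

    mol = next(pybel.readfile(fmt, path))
    return centroid([atom.coords for atom in mol.atoms])
```

`centroid` has no Open Babel dependency, so it can also be applied to coordinates parsed from the generated text files.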
from deeppocket.
Thanks for your reply. I will try to implement these scripts by myself.
I have some more questions about the details of the evaluation.
- When you calculate the Top-N success rate, do you first calculate the success rate for each protein and then average them? Or do you put the predictions of all proteins together and divide by the number of ground-truth pockets of all proteins?
- Which model(s) are used to obtain the metrics on COACH420 and HOLO4k? One of the 10-fold models trained on scPDB, all of them, or a new model retrained on COACH420?
- I found that there are training and testing types files for COACH420 and HOLO4k. Is only the testing types file used in evaluation? What is the purpose of the training types files?
- In the "Data Sets and Preprocessing" section of your paper, you first mention that "there are 291 protein structures and 359 ligands, 3413 protein structures, and 4288 ligands for COACH420 and HOLO4k", and then mention that "207 out of 291 proteins (71.13%) and 2752 out of 3413 proteins (80.63%) for the COACH420 and HOLO4k data sets". Do you mean that the numbers of proteins in COACH420 and HOLO4k are 291 and 3413 for classification, and 207 and 2752 for segmentation?
Sorry for asking so many questions. I am very interested in your work and sincerely thank you for your help.
from deeppocket.
- Top-N is calculated for each protein individually: take the top N predictions for each protein, where N is the number of annotated pockets for that protein, and calculate the metric. You also need to be careful about subpockets, as fpocket sometimes gives multiple pocket centers for the same pocket (essentially predicting the same pocket again). You can cross-check that using the proximity to the corresponding ligand.
- We have separately trained models for COACH420 and HOLO4k.
- The training types files contain datapoints from the scPDB dataset after removing proteins that are similar to datapoints in the corresponding test set.
- Yes, that is correct.
from deeppocket.
My bad: for the first point, the success rate is calculated by putting all (Top-N unique) predictions of all proteins together and dividing by the total number of ground-truth pockets.
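As a rough illustration of the pooled calculation described above, here is a minimal Python sketch. It rests on several assumptions that the thread does not confirm: a prediction counts as a hit when its center lies within a distance `cutoff` of a ground-truth pocket center, "unique" is approximated by dropping lower-ranked centers that fall within `merge_radius` of a higher-ranked one, and both default values are placeholders. The function names are hypothetical and this is not the authors' script.

```python
import math


def unique_centers(centers, merge_radius):
    """Keep ranked centers, dropping any that lie within merge_radius
    of a higher-ranked center that was already kept."""
    kept = []
    for c in centers:
        if all(math.dist(c, k) >= merge_radius for k in kept):
            kept.append(c)
    return kept


def pooled_top_n_success(proteins, cutoff=4.0, merge_radius=0.0):
    """Pooled Top-N success rate.

    proteins: iterable of (predicted_centers, true_centers) pairs, where
    predicted_centers is sorted from best to worst rank and every center
    is an (x, y, z) tuple. N is the number of true pockets of that protein.
    """
    hits = 0
    total = 0
    for predicted, truth in proteins:
        top_n = unique_centers(predicted, merge_radius)[:len(truth)]
        # A ground-truth pocket is recovered if any top-N center is close enough.
        hits += sum(any(math.dist(p, t) <= cutoff for p in top_n) for t in truth)
        total += len(truth)
    # Hits are pooled over all proteins before dividing, not averaged per protein.
    return hits / total if total else 0.0
```

With `merge_radius=0.0` no predictions are merged, so repeated subpocket centers can use up Top-N slots. Raising it shows how much the deduplication step affects the reported number.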
from deeppocket.