RishalAggarwal commented on July 21, 2024

Hey, unfortunately I no longer have those scripts, as the server with those files went down :(. Reproducing the metrics should be simple enough, though, once you've generated the predictions across the dataset. After that, it's just a matter of manipulating the generated text files with Python. To load molecule files, you can use Open Babel.
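The reply suggests Open Babel for loading molecule files. As a dependency-free alternative, and assuming each ligand is saved as a PDB file, the minimal sketch below reads coordinates directly from the fixed-column ATOM/HETATM records and returns the ligand's geometric center, a convenient reference point for checking how close a predicted pocket center is to the ligand. The function name `ligand_center_from_pdb` is hypothetical and does not come from the DeepPocket repository; other formats (mol2, sdf) would still need Open Babel or a similar library.

```python
def ligand_center_from_pdb(pdb_text: str) -> tuple[float, float, float]:
    """Return the geometric center of all ATOM/HETATM records in a PDB string.

    Assumes the text holds only the ligand (e.g. a separate ligand PDB file).
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # PDB fixed columns (1-based): x = 31-38, y = 39-46, z = 47-54
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    if not coords:
        raise ValueError("no ATOM/HETATM records found")
    n = len(coords)
    # Average each axis separately to get the centroid
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return (cx, cy, cz)
```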

from deeppocket.

ljpadam commented on July 21, 2024

Thanks for your reply. I will try to implement these scripts by myself.

I have some more questions about the details of the evaluation.

  1. When you calculate the Top-N success rate, do you first calculate the success rate for each protein and then average them? Or do you pool the predictions of all proteins and divide by the total number of ground-truth pockets across all proteins?

  2. Which model(s) did you use to obtain the metrics on COACH420 and HOLO4k: one of the 10-fold models trained on scPDB, all of them, or a new model retrained on COACH420?

  3. I found that there are training and testing types files for COACH420 and HOLO4k. Is only the testing types file used in evaluation? What is the purpose of the training types files?

  4. In the "Data Sets and Preprocessing" section of your paper, you first mention that "there are 291 protein structures and 359 ligands, 3413 protein structures, and 4288 ligands for COACH420 and HOLO4k", and then mention that "207 out of 291 proteins (71.13%) and 2752 out of 3413 proteins (80.63%) for the COACH420 and HOLO4k data sets".
    Do you mean that the numbers of proteins in COACH420 and HOLO4k are 291 and 3413 for classification, and 207 and 2752 for segmentation?
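The two options in question 1 are not equivalent whenever proteins have different numbers of annotated pockets. The small sketch below, with hypothetical function and argument names, computes both from per-protein counts of found pockets so the difference is easy to see on toy numbers.

```python
from typing import Sequence


def success_rates(hits: Sequence[int], pockets: Sequence[int]) -> tuple[float, float]:
    """Return (per-protein average, pooled) success rates.

    hits[i]:    annotated pockets of protein i found by its Top-N predictions
    pockets[i]: annotated pockets of protein i (must be > 0)
    """
    if len(hits) != len(pockets) or not pockets:
        raise ValueError("need one (hits, pockets) pair per protein")
    # Option A: success rate per protein, then a plain average over proteins
    per_protein = sum(h / p for h, p in zip(hits, pockets)) / len(pockets)
    # Option B: all hits pooled, divided by all annotated pockets
    pooled = sum(hits) / sum(pockets)
    return per_protein, pooled
```

For example, with one protein whose single pocket is found and another whose three pockets are all missed, `success_rates([1, 0], [1, 3])` returns `(0.5, 0.25)`: averaging per protein gives 50%, while pooling gives 25%.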

Sorry for asking so many questions. I am very interested in your work, and I sincerely thank you for your help.


RishalAggarwal commented on July 21, 2024

  1. Top-N is calculated for each protein individually: take the top N predictions for each protein, where N is the number of annotated pockets for that protein, and calculate the metric. You also need to be careful about subpockets, as fpocket sometimes gives multiple pocket centers for the same pocket (essentially predicting the same pocket again). You can cross-check this using the proximity to the corresponding ligand.
  2. We trained separate models for COACH420 and HOLO4k.
  3. The training types files contain datapoints from the scPDB dataset after removing proteins that are similar to those in the corresponding test set.
  4. Yes, that is correct.


RishalAggarwal commented on July 21, 2024

My bad: for the first point, the success rate is calculated by pooling all (Top-N unique) predictions of all proteins and dividing the number of correct predictions by the total number of ground-truth pockets.
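Combining this definition with the Top-N and subpocket notes in the earlier reply, one possible way to compute the pooled success rate is sketched below. It is not the authors' lost script, and several details are assumptions: predictions for each protein are already sorted by score; each annotated pocket is represented by its ligand's center; a prediction counts as a hit when its center lies within `cutoff` Å of a not-yet-found ligand center (the 4 Å default is an assumed value, not stated in this thread); and a prediction whose only nearby ligands are already found is treated as a subpocket and skipped instead of taking one of the N slots. The names `protein_hits` and `pooled_success_rate` are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def protein_hits(pred_centers: Sequence[Point],
                 ligand_centers: Sequence[Point],
                 cutoff: float = 4.0) -> int:
    """Count annotated pockets found by one protein's Top-N unique predictions.

    pred_centers:   predicted pocket centers, best-ranked first
    ligand_centers: one center per annotated pocket, so N = len(ligand_centers)
    cutoff:         hit threshold in angstroms (assumed, not from the thread)
    """
    n = len(ligand_centers)
    found = [False] * n
    used = 0  # predictions that have taken one of the N slots
    for center in pred_centers:
        if used == n:
            break
        near = [i for i, lig in enumerate(ligand_centers)
                if math.dist(center, lig) <= cutoff]
        unfound = [i for i in near if not found[i]]
        if near and not unfound:
            # Only re-hits pockets that are already found: treat as a subpocket.
            continue
        used += 1
        if unfound:
            # Credit the closest still-unfound pocket; one prediction = one hit.
            best = min(unfound, key=lambda i: math.dist(center, ligand_centers[i]))
            found[best] = True
    return sum(found)


def pooled_success_rate(proteins: Sequence[tuple[Sequence[Point], Sequence[Point]]],
                        cutoff: float = 4.0) -> float:
    """Pool hits over all (predictions, ligands) pairs; divide by all pockets."""
    total_pockets = sum(len(ligs) for _, ligs in proteins)
    if total_pockets == 0:
        raise ValueError("no annotated pockets")
    total_hits = sum(protein_hits(preds, ligs, cutoff) for preds, ligs in proteins)
    return total_hits / total_pockets
```

Skipping a prediction only when it sits near an already-found ligand follows the suggestion to cross-check subpockets against ligand proximity. Duplicate centers around a false-positive site are not merged by this rule and would still use up slots; merging predictions that lie close to a higher-ranked prediction is one possible extension.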
