Hi, very interesting model. But here are a few suggestions from us drug design develop

Suggestions from drug design developers about diffdock HOT 4 CLOSED

gcorso commented on August 19, 2024 13

Suggestions from drug design developers

from diffdock.

Comments (4)

gcorso commented on August 19, 2024 4

Hi @HBioquant,
Thank you for the suggestions. I'll try to answer all your points:

1. Indeed, for GNINA and SMINA we also test them in combination with another method that finds the pocket, such as P2Rank. This improves their performance on blind (global) docking, but not by much (around 29% top-1 RMSD < 2 Å). Please find the results and more discussion in the paper. Also, from my understanding, QuickVina-W was developed for blind docking.
2. There are two points here. (1) One might argue that predicting the pose when the pocket is known is a more useful task. This is likely the case for many drug developers but, from my understanding, blind docking is also important for many critical applications, such as reverse screening and understanding the biological mechanisms behind side effects. That said, we are currently working on a version of DiffDock where one can specify the pocket. Stay tuned :). (2) I do not agree that, for blind docking, predicting the pocket and calculating the pose separately is the best approach. Predicting which pocket is correct without knowing how the ligand could fit in it, or even the ligand identity, seems suboptimal to me, and this has also been shown in previous works such as TANKBind [Lu et al. 2022]. Indeed, computing RMSD and centroid distance does not make much sense when the pocket is wrong, but in any case such predictions will never count among the RMSD below 2 Å (and 5 Å) predictions that we report.
3. Indeed, the prediction of the torsion angles is definitely not random. The way we predict the conformation is analogous to the approach we took in our previous work, Torsional Diffusion for Molecular Conformer Generation [Jing et al. 2022]. There you can find significantly more analysis of the quality of the conformers generated by the method (both in terms of geometry and energy). We will try to add similar analyses to this paper as well, even though we expect the results to be similar. Did you have another type of comparison in mind to evaluate the fidelity of the predicted conformer?
4. We fully recognize that we have not solved virtual screening, and it was not the task we were tackling in this paper. We are currently building on top of this work to get closer and closer to the virtual screening task and setting. From my perspective, one should not expect a single paper to go all the way; otherwise, it would have to rely significantly on existing frameworks and would not really bring an entirely new approach to the field. However, in my view, accurate docking is a necessary (but not sufficient) condition for virtual screening, and that is why we (like many before us) decided to focus on it in this paper. Stay tuned for future work!
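
To make the metrics mentioned above concrete, here is a minimal sketch of ligand RMSD, centroid distance, and the fraction of predictions under a threshold. It is only an illustration, not the evaluation code used in the paper: the function names and the toy coordinates are assumptions, atoms are assumed to be listed in the same order in both poses, and no symmetry correction is applied.

```python
import math

def rmsd(pred, ref):
    """RMSD between two poses given as equally ordered lists of (x, y, z) atoms.

    No alignment is performed: both poses are assumed to share the protein frame.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))

def centroid_distance(pred, ref):
    """Distance between the geometric centers of two poses."""
    def centroid(coords):
        n = len(coords)
        return tuple(sum(c[i] for c in coords) / n for i in range(3))
    return math.dist(centroid(pred), centroid(ref))

def success_rate(values, threshold):
    """Fraction of values strictly below the threshold (e.g. RMSD < 2 angstrom)."""
    return sum(v < threshold for v in values) / len(values)

# Toy data (hypothetical): one near-native pose and one pose in the wrong pocket.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near_native = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
wrong_pocket = [(20.0, 0.0, 0.0), (21.0, 0.0, 0.0)]

rmsds = [rmsd(near_native, reference), rmsd(wrong_pocket, reference)]
print(success_rate(rmsds, 2.0))  # the wrong-pocket pose simply counts as a failure
```

A pose placed in the wrong pocket gets a very large RMSD and centroid distance, so it can only lower the reported success rate; it never shows up among the predictions below the threshold.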

Thanks again for your comments,
Gabriele

HBioquant commented on August 19, 2024

Thanks for your reply, Gabriele. I'll also give my views point by point:
1 & 2. Yep, QuickVina-W is a better choice for the benchmark. My confusion is mainly about the performance of P2Rank. In our internal tests of P2Rank, the accuracy is unsatisfactory, both for the top-1 scored pocket and for the predicted pocket nearest to the native pocket. Have you tested the performance of P2Rank? You could use a more state-of-the-art pocket prediction tool, not necessarily the same one as in the TANKBind implementation;
3. I think the MMFF94 energy is a good indicator for evaluating the generated conformer. My main concern is that Torsional Diffusion learned from DRUGS, which contains many molecular conformers, while DiffDock learned from PDBBind and is conditioned on the CA pocket. The torsion angles may be affected by this difference, and the model may not have learned the exact torsion angle distribution;
4’. Regarding the examples shown in Figure 10 and at https://github.com/gcorso/DiffDock/tree/main/visualizations, the problem shown is not a real problem: if we really know our protein, as with the symmetric complexes, we just delete the homologous monomers or remove the symmetry and keep only one asymmetric component, whose binding site is formed within a single monomer or at a monomer-monomer interface. So it is not a real problem for regression-based methods. But I agree with using a generative model for pose prediction. Really nice work!
Thanks again for your work and your reply,
David

gcorso commented on August 19, 2024

Thanks David for the reply!
1 & 2. In TANKBind, the reported top-1 performance of P2Rank is 73%. What would you suggest as a more accurate pocket prediction tool? Note that I believe the difference between these results and those typically reported for docking methods (often in the 80s %) arises because the latter sometimes assume knowledge not only of the binding pocket location but also of a very restrictive bounding box, which even good pocket prediction methods cannot provide.
3. Thanks for the comment, we'll try to make this evaluation!
Thanks,
Gabriele

rdk commented on August 19, 2024

Hi, very interesting discussion. Author of P2Rank here.

A P2Rank prediction success rate of around 73% for the top-1 pocket is consistent with my experience (the exact number depends on the specific identification criterion used). This might seem rather low (seemingly leaving 27% of binding sites on the table), but I believe it is quite close to the Bayes-optimal rate that can be achieved in a benchmark on noisy datasets. The reason is that, more often than not, proteins have more than one binding site, but not all of them are present in many or most of the datasets currently used in the field (which usually consist of proteins + ligands taken straight from the PDB). Even a hypothetical perfect binding site prediction method (one that predicts all true binding sites and nothing else) would at times predict a true binding site that is not contained in the dataset and assign it the highest score. In such cases, the method would be penalized by the top-1 metric for that correct prediction. My guess is that the Bayes-optimal top-1 prediction success rate for structures (proteins + ligands) taken straight from the PDB is not higher than 80%.
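
The argument above can be sketched as a back-of-the-envelope calculation. The function name and both input numbers below are hypothetical and chosen only for illustration; they are not measurements from any dataset.

```python
def perfect_predictor_top1(fraction_with_hidden_site, prob_hidden_ranked_first):
    """Top-1 success rate of a predictor that returns only true binding sites.

    fraction_with_hidden_site: share of proteins that have a true site which is
        missing from the dataset annotation (hypothetical input).
    prob_hidden_ranked_first: chance that the predictor scores that unannotated
        site highest on such a protein (hypothetical input).
    """
    for value in (fraction_with_hidden_site, prob_hidden_ranked_first):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be probabilities in [0, 1]")
    # The only "failures" are correct predictions the benchmark cannot recognize.
    return 1.0 - fraction_with_hidden_site * prob_hidden_ranked_first

# Assumed numbers: 40% of structures hide a second true site, and the predictor
# ranks the hidden site first half of the time.
print(perfect_predictor_top1(0.4, 0.5))
```

Under these assumed numbers, even a predictor that never outputs a false pocket lands at roughly 80% top-1, purely because of incomplete annotation.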

Consequently, I wouldn't use just the top-1 prediction of P2Rank (or of any other prediction method, for that matter) for anything, especially not docking. Instead, I think it makes more sense to consider at least the top-3 or top-5 pockets (or to use a reasonably permissive pocket score threshold) in combination with larger bounding boxes (~8 Å or more).
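
As a rough sketch of what a top-k evaluation could look like, the snippet below checks whether any of the k highest-scoring pocket centers lies near the ligand. This is not P2Rank's own identification criterion: the function name, the center-to-center distance test, the 8 Å cutoff, and the toy pockets are all assumptions made for illustration.

```python
import math

def topk_hit(pockets, ligand_center, k, max_dist):
    """Return True if one of the k best-scored pockets is close to the ligand.

    pockets: list of (score, (x, y, z)) tuples, one per predicted pocket center.
    ligand_center: (x, y, z) center of the known ligand.
    max_dist: hypothetical cutoff on the center-to-center distance, in angstrom.
    """
    ranked = sorted(pockets, key=lambda pocket: pocket[0], reverse=True)
    return any(math.dist(center, ligand_center) <= max_dist
               for _, center in ranked[:k])

# Toy prediction (hypothetical): the best-scored pocket is a different site,
# while the second-ranked pocket is the one holding the ligand.
pockets = [(0.9, (30.0, 0.0, 0.0)), (0.7, (1.0, 1.0, 0.0)), (0.2, (50.0, 50.0, 50.0))]
ligand_center = (0.0, 0.0, 0.0)

print(topk_hit(pockets, ligand_center, k=1, max_dist=8.0))  # top-1 misses
print(topk_hit(pockets, ligand_center, k=3, max_dist=8.0))  # top-3 recovers the site
```

In this toy case the top-1 metric counts a failure even though the relevant pocket was predicted, which is why docking against several top-ranked pockets seems the safer choice.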

@HBioquant, could you elaborate on what kind of dataset the top-1 accuracy of P2Rank was unsatisfactory on? I would be interested in looking into it. (Here or privately, if you prefer. I wasn't able to find your contact.)

@gcorso, very interesting work. Has your understanding of the role/limitations of ligand binding site prediction in docking changed since?
