
pocketgen's Introduction

📢 PocketGen: Generating Full-Atom Ligand-Binding Protein Pockets

Environment

Install conda environment via conda yaml file

conda env create -f pocketgen.yaml
conda activate pocketgen

Install via Conda and Pip

conda create -n targetdiff python=3.8
conda activate targetdiff
conda install pytorch pytorch-cuda=11.6 -c pytorch -c nvidia
conda install pyg -c pyg
conda install rdkit openbabel tensorboard pyyaml easydict python-lmdb -c conda-forge
conda install -c conda-forge openmm pdbfixer flask
conda install -c conda-forge numpy swig boost-cpp sphinx sphinx_rtd_theme
pip install meeko==0.1.dev3 wandb scipy pdb2pqr vina==1.2.2 
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
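After either installation route, a quick import check can confirm that the environment is usable. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the module list is an assumption based on the usual import names of the packages installed above (for example, pyyaml imports as `yaml`, python-lmdb as `lmdb`, and pyg as `torch_geometric`).

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED_MODULES = [
    "torch", "torch_geometric", "rdkit", "openbabel", "tensorboard",
    "yaml", "easydict", "lmdb", "openmm", "pdbfixer", "meeko", "vina",
]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat a broken or half-installed package as missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Running this inside the activated environment lists any package that still needs to be installed. It only checks that modules can be located, not that their versions are compatible.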

Benchmark Datasets

We use CrossDocked and Binding MOAD datasets to benchmark pocket generation.

CrossDocked

We download and process the CrossDocked dataset as described by the authors of TargetDiff.
First, download crossdocked_v1.1_rmsd1.0.tar.gz and split_by_name.pt and put them under the ./data directory.
Then use the following commands to extract pockets, create index_seq.pkl, and split the dataset.

python data_preparation/extract_pockets.py
python data_preparation/split_pl_dataset.py
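Before running the two scripts, it may help to confirm that the downloaded files are where the instructions place them. This is a minimal sketch under that assumption: the file names and the ./data directory come from the text above, while the helper `missing_inputs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names as given in the instructions above.
REQUIRED_FILES = ["crossdocked_v1.1_rmsd1.0.tar.gz", "split_by_name.pt"]


def missing_inputs(data_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in required if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("./data")
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("All expected downloads found.")
```

The check only looks for the two downloads named above. It does not verify their contents or whether the archive has been unpacked.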

Binding MOAD

We download and process the Binding MOAD dataset following the authors of DiffSBDD. Download the dataset:

wget http://www.bindingmoad.org/files/biou/every_part_a.zip
wget http://www.bindingmoad.org/files/biou/every_part_b.zip
wget http://www.bindingmoad.org/files/csv/every.csv

unzip every_part_a.zip
unzip every_part_b.zip

Process the raw data using:

python -W ignore process_bindingmoad.py <bindingmoad_dir>

Use the following commands to extract pockets, create index_seq.pkl, and split the dataset.

python data_preparation/extract_pockets_moad.py
python data_preparation/split_pl_dataset_moad.py

Processed datasets

We also provide the processed datasets for training from scratch on Zenodo.

Each dataset requires the preprocessed .lmdb file and the split file _split.pt.

Benchmark Results

Benchmarking PocketGen and other approaches for pocket generation on two datasets. Reported are average and standard deviation values across three independent runs. The best results are bolded.

| Model | AAR (↑), CrossDocked | Designability (↑), CrossDocked | Vina (↓), CrossDocked | AAR (↑), Binding MOAD | Designability (↑), Binding MOAD | Vina (↓), Binding MOAD |
|---|---|---|---|---|---|---|
| Test set | - | 0.77 | -7.016 | - | 0.79 | -8.076 |
| DEPACT | 31.52±3.26% | 0.68±0.04 | -6.632±0.18 | 35.30±2.19% | 0.67±0.06 | -7.571±0.15 |
| dyMEAN | 38.71±2.16% | 0.71±0.03 | -6.855±0.06 | 41.22±1.40% | 0.70±0.03 | 0.71±0.04 |
| FAIR | 40.16±1.17% | 0.73±0.02 | -7.015±0.12 | 43.68±0.92% | 0.72±0.05 | -7.930±0.15 |
| RFDiffusion | 46.57±2.07% | 0.74±0.01 | -6.936±0.07 | 45.31±2.73% | 0.75±0.05 | -7.942±0.14 |
| RFDiffusionAA | 50.85±1.85% | 0.75±0.03 | -7.012±0.09 | 49.09±2.49% | 0.78±0.03 | -8.020±0.11 |
| PocketGen | **63.40±1.64%** | **0.77±0.02** | **-7.135±0.08** | **64.43±2.35%** | **0.80±0.04** | **-8.112±0.14** |
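Each entry in the benchmark is reported as a mean and standard deviation over three independent runs. The sketch below shows one way to produce that "mean±std" format. The function name `summarize` and the input values are made up for illustration, and the use of the sample standard deviation (n-1 denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def summarize(runs, digits=2):
    """Format per-run scores as 'mean±std' with the given number of decimals."""
    mean = statistics.mean(runs)
    # Assumption: sample standard deviation (n-1). Use statistics.pstdev
    # instead if the population form is wanted.
    std = statistics.stdev(runs)
    return f"{mean:.{digits}f}±{std:.{digits}f}"


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    example_runs = [0.60, 0.65, 0.70]
    print(summarize(example_runs))
```

With only three runs, the sample and population forms differ noticeably, so it is worth stating which one is used when comparing against the reported numbers.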

Training

Train on CrossDocked:

python train_recycle.py --config ./config/train_model.yml

Train on Binding MOAD:

python train_recycle.py --config ./config/train_model_moad.yml

Model Checkpoints

Pretrained checkpoint on the CrossDocked training dataset: checkpoint.pt

Generation

python generate_new.py

We provide one example of a generated pocket for PDB ID 2p16 and visualize its interactions with PLIP.

Evaluation

The code to compute self-consistency scores, such as scRMSD, scTM, and pLDDT, can be found in eval.

The code to run protein-ligand interaction analysis can be found in interaction.

Acknowledgement

This project draws in part from TargetDiff and ByProt, which are released under the MIT License and the Apache-2.0 License. Thanks for their great work and code!

Contact

Zaixi Zhang ([email protected])

Sincerely appreciate your suggestions on our work!

License

This project is licensed under the terms of the MIT license. See LICENSE for additional details.

Reference

@article{zhang2024pocketgen,
  title={PocketGen: Generating Full-Atom Ligand-Binding Protein Pockets},
  author={Zhang, Zaixi and Shen, Wanxiang and Liu, Qi and Zitnik, Marinka},
  journal={arXiv},
  url={},
  year={2024}
}

pocketgen's Issues

Question about the validation/test set

Hi there,

Great work on PocketGen! I've been reviewing your paper and code, and I have a question about the data splitting and validation process for the CrossDocked dataset.

I downloaded the crossdocked_split.pt file from Zenodo and noticed something unexpected:

a = torch.load('crossdocked_split.pt')
(set(a['val']) & set(a['test'])) == set(a['test'])

This returns True, indicating that the validation and test sets are identical. Is this intentional?

Could you please clarify if this is the intended data split, and if so, the reasoning behind it?

Thanks for any clarification you can provide!
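Note that the check above shows that every test entry also appears in the validation set. Confirming that the two sets are identical would additionally require comparing them for equality. A minimal sketch for reporting the overlap between every pair of subsets follows. The helper `split_overlaps` is hypothetical, and it assumes the loaded split is a dictionary mapping subset names to lists of hashable entries, as the snippet in the question suggests.

```python
from itertools import combinations


def split_overlaps(split):
    """Return {(name_a, name_b): shared entry count} for each pair of subsets."""
    overlaps = {}
    for a, b in combinations(sorted(split), 2):
        overlaps[(a, b)] = len(set(split[a]) & set(split[b]))
    return overlaps


if __name__ == "__main__":
    # Toy data for illustration. With the real file one would pass
    # torch.load('crossdocked_split.pt') instead.
    toy_split = {"train": [0, 1, 2], "val": [3, 4], "test": [3, 4]}
    for pair, shared in split_overlaps(toy_split).items():
        print(pair, shared)
    print("val == test:", set(toy_split["val"]) == set(toy_split["test"]))
```

A non-zero count for the train/test or train/val pair would indicate leakage into training, which is a separate question from whether validation and test coincide.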

Unpacking Error Encountered While Running the Model

Hello, I encountered some issues while trying to run your model with the use case '2p16'.
Here is the detailed error message:

Traceback (most recent call last):
  File "generate_new.py", line 305, in <module>
    aar, rmsd, attend_logits = model.generate(batch, dir_name)
  File "/home/dell/download/pocketgen/models/PD.py", line 354, in generate
    res_H, res_X, pred_ligand, ligand_feat, pred_res_type, attend_logits = self.encoder(res_H, res_X, res_S, res_batch, pred_ligand, ligand_feat, batch['ligand_mask'], batch['edit_residue_num'], residue_mask)
ValueError: not enough values to unpack (expected 6, got 5)

It seems that the return values from the function call are not as expected. Can you provide guidance on how to resolve this issue?
Thank you for your assistance!
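The traceback says the encoder call returned five values while the calling line unpacks six. One generic way to investigate such a mismatch is to capture the return value as a single tuple and inspect it before unpacking. The sketch below illustrates the idea only: `describe_outputs` and `fake_encoder` are hypothetical stand-ins, not functions from the repository, and the sketch does not identify which of the six expected values is absent.

```python
def describe_outputs(outputs):
    """Return the number of returned items and the type name of each item."""
    if not isinstance(outputs, tuple):
        outputs = (outputs,)
    return len(outputs), [type(item).__name__ for item in outputs]


def fake_encoder():
    """Hypothetical stand-in for a call that returns five values, not six."""
    return 1.0, 2.0, "ligand", [0.1], {"type": 3}


if __name__ == "__main__":
    count, kinds = describe_outputs(fake_encoder())
    print(f"encoder returned {count} values: {kinds}")
```

Comparing the reported count and types against the names on the left-hand side of the failing line can show whether the calling script and the model definition come from different versions of the code.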

Is it possible to add restraints to part of the sequence forming the binding pocket?

Very interesting work!

Although your method is trained to predict the full pocket, do you expect it to work
properly (if not better) for partial binding site inpainting (fixing a part of the residues)?
Are you planning to implement this option in the scripts, or to fine-tune a model for such partial pocket sequence retrieval?

Thanks!

Issues with installing environment.

Hi,

Thanks for your work on this interesting model!

I'm trying to create an environment using the yaml file provided, but it seems pip can't find suitable versions for many of the required libraries. An excerpt from my error message:

Pip subprocess error:

ERROR: Ignored the following versions that require a different python version: 0.0.1 Requires-Python >=3.9,<4.0; 0.21 Requires-Python >=3.9; 0.21.1 Requires-Python >=3.9; 0.21.2 Requires-Python >=3.9; 0.21.post1 Requires-Python >=3.9; 0.21rc1 Requires-Python >=3.9;

ERROR: Could not find a version that satisfies the requirement pyg-lib==0.2.0 (from versions: none)
ERROR: No matching distribution found for pyg-lib==0.2.0

Is there a different version of the environment file that I can try to use?

Thanks so much for your help!
