
pocketflow's Introduction

PocketFlow is a data-and-knowledge driven structure-based molecular generative model

Deep learning-based molecular generation has extensive applications in many fields, particularly drug discovery. However, the majority of current deep generative models (DGMs) are ligand-based and do not consider chemical knowledge in the molecular generation process, often resulting in a relatively low success rate. We herein propose a structure-based molecular generative framework with chemical knowledge explicitly considered (named PocketFlow), which generates novel ligand molecules inside protein binding pockets. In various computational evaluations, PocketFlow showed state-of-the-art performance, with generated molecules being 100% chemically valid and highly drug-like. Ablation experiments demonstrate the critical role of chemical knowledge in ensuring the validity and drug-likeness of the generated molecules. We applied PocketFlow to two new target proteins related to epigenetic regulation, HAT1 and YTHDC1, and successfully obtained wet-lab validated bioactive compounds. The binding modes of the active compounds with the target proteins are close to those predicted by molecular docking and were further confirmed by the X-ray crystal structure. All the results suggest that PocketFlow is a useful deep generative model, capable of generating innovative bioactive molecules from scratch given a protein binding pocket.

Create the Conda environment

conda env create -f environment.yml

Molecular generation

Molecules can be generated by running the following command. The pocket PDB file and the model parameter file are required; the remaining parameters are optional:

python main_generate.py -pkt test_samples/test_pocket10/1bvr_C_rec_pocket10-surf.pdb --ckpt ckpt/ZINC-pretrained-255000.pt -n 100 -d cuda:0 --root_path gen_results --name 1bvr -at 1.0 -bt 1.0 --max_atom_num 35 -ft 0.5 -cm True --with_print True
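When several pockets need to be processed, it can be convenient to assemble the same command from Python. The following is a minimal sketch: `build_generate_command` is a hypothetical helper (not part of PocketFlow) that uses only flags shown in the command above, and it assumes the script is launched from the repository root.

```python
import shlex
import sys


def build_generate_command(pocket, ckpt, num_gen=100, device="cuda:0",
                           root_path="gen_results", name="receptor",
                           extra_args=()):
    """Assemble the argument list for main_generate.py (hypothetical helper)."""
    cmd = [
        sys.executable, "main_generate.py",
        "-pkt", pocket,          # required: pocket PDB file
        "--ckpt", ckpt,          # required: model parameter file
        "-n", str(num_gen),
        "-d", device,
        "--root_path", root_path,
        "--name", name,
    ]
    cmd.extend(extra_args)       # any further optional flags, passed through as-is
    return cmd


cmd = build_generate_command(
    "test_samples/test_pocket10/1bvr_C_rec_pocket10-surf.pdb",
    "ckpt/ZINC-pretrained-255000.pt",
    name="1bvr",
    extra_args=["-at", "1.0", "-bt", "1.0", "--max_atom_num", "35"],
)
print(shlex.join(cmd))  # shell-quoted form, useful for logging
```

To actually launch generation, pass the list to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids quoting problems with paths that contain spaces.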

All parameters of generation:

usage: main_generate.py [-h] [-pkt POCKET] [--ckpt CKPT] [-n NUM_GEN] [--name NAME] [-d DEVICE] [-at ATOM_TEMPERATURE] [-bt BOND_TEMPERATURE] [--max_atom_num MAX_ATOM_NUM] [-ft FOCUS_THRESHOLD] [-cm CHOOSE_MAX]
                        [--min_dist_inter_mol MIN_DIST_INTER_MOL] [--bond_length_range BOND_LENGTH_RANGE] [-mdb MAX_DOUBLE_IN_6RING] [--with_print WITH_PRINT] [--root_path ROOT_PATH] [--readme README]

optional arguments:
  -h, --help            show this help message and exit
  -pkt POCKET, --pocket POCKET
                        the pdb file of pocket in receptor
  --ckpt CKPT           the path of saved model
  -n NUM_GEN, --num_gen NUM_GEN
                        the number of generateive molecule
  --name NAME           receptor name
  -d DEVICE, --device DEVICE
                        cuda:x or cpu
  -at ATOM_TEMPERATURE, --atom_temperature ATOM_TEMPERATURE
                        temperature for atom sampling
  -bt BOND_TEMPERATURE, --bond_temperature BOND_TEMPERATURE
                        temperature for bond sampling
  --max_atom_num MAX_ATOM_NUM
                        the max atom number for generation
  -ft FOCUS_THRESHOLD, --focus_threshold FOCUS_THRESHOLD
                        the threshold of probility for focus atom
  -cm CHOOSE_MAX, --choose_max CHOOSE_MAX
                        whether choose the atom that has the highest prob as focus atom
  --min_dist_inter_mol MIN_DIST_INTER_MOL
                        inter-molecular dist cutoff between protein and ligand.
  --bond_length_range BOND_LENGTH_RANGE
                        the range of bond length for mol generation.
  -mdb MAX_DOUBLE_IN_6RING, --max_double_in_6ring MAX_DOUBLE_IN_6RING
  --with_print WITH_PRINT
                        whether print SMILES in generative process
  --root_path ROOT_PATH
                        the root path for saving results
  --readme README, -rm README
                        description of this genrative task
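
The -at and -bt options set sampling temperatures. As a general illustration of what a sampling temperature does, the sketch below divides logits by the temperature before applying a softmax. This is the common technique and an assumption here, not a description of how PocketFlow implements it internally; the function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 1.0))   # baseline distribution
print(softmax_with_temperature(logits, 0.5))   # sharper: favors the top choice
print(softmax_with_temperature(logits, 2.0))   # flatter: more diverse samples
```

Under this scheme, temperatures below 1.0 concentrate probability on the most likely choice, while temperatures above 1.0 spread it out.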

Splitting the Pocket

Based on the pose of the ligand, the pocket structure can be split from the protein structure:

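from pocket_flow import SplitPocket, Protein, Ligand

# Load the receptor and the ligand whose pose defines the pocket
pro = Protein('/path/to/protein.pdb')
lig = Ligand('/path/to/ligand.sdf')
dist_cutoff = 10  # distance cutoff around the ligand
pocket_block, _ = SplitPocket._split_pocket_with_surface_atoms(pro, lig, dist_cutoff)

# Write the pocket PDB block; the with-statement closes the file
with open('/path/to/pocket.pdb', 'w') as f:
    f.write(pocket_block)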
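
To illustrate the idea behind a distance cutoff, here is a minimal sketch that selects protein atoms lying within a given distance of any ligand atom. It is an assumption about the general approach, not PocketFlow's implementation: the real `SplitPocket` may select whole residues and also handles surface atoms. The function name and the toy coordinates are hypothetical.

```python
import math


def atoms_within_cutoff(protein_coords, ligand_coords, dist_cutoff=10.0):
    """Return indices of protein atoms within dist_cutoff of any ligand atom."""
    selected = []
    for i, p in enumerate(protein_coords):
        # Keep the atom if at least one ligand atom is close enough
        if any(math.dist(p, q) <= dist_cutoff for q in ligand_coords):
            selected.append(i)
    return selected


# Toy coordinates: two protein atoms near the ligand, one far away
protein_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
ligand_coords = [(1.0, 0.0, 0.0)]
print(atoms_within_cutoff(protein_coords, ligand_coords))  # [0, 1]
```

A larger cutoff keeps more of the protein around the ligand, which gives the model more context at a higher computational cost.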

Dataset

The raw CrossDocked2020 dataset is large and needs about 50 GB of disk space. You can download the processed data from Pocket2Mol.

from pocket_flow import CrossDocked2020

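# Collect the last field of each non-empty line; blank lines (such as a
# trailing newline) are skipped so that split()[-1] cannot raise IndexError
with open('data/unexcept_element_sample_new.csv') as f:
    unexpected_sample = [line.split()[-1] for line in f if line.strip()]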
cs2020 = CrossDocked2020(
    './data/crossdocked_pocket10/',
    './data/crossdocked_pocket10/index.pkl',
    unexpected_sample=unexpected_sample
    )
cs2020.run(
    dataset_name='crossdocked_pocket10_processed_35Atoms.lmdb',
    max_ligand_atom=35,
    only_backbone=False,
    lmdb_path='./data/'
    )

The pretraining dataset of PocketFlow was chosen from ZINC 3D. You can download ZINC 3D and then use make_pretrain_data.py to produce the pretraining dataset.


pocketflow's Issues

Code for Metrics Calculation

Hi, thank you so much for publishing the code; it is very nice work.
I'm a relative newcomer and could not find instructions for calculating the metrics (including SA scores etc.). Could you offer some information?

[No module named 'pocket_flow'] and [RuntimeError: Trying to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1]

Hello, I tried your model on Windows 11 and tested "python main_generate.py -pkt test_samples/test_pocket10/1bvr_C_rec_pocket10-surf.pdb --ckpt ckpt/ZINC-pretrained-255000.pt -n 100 -d cuda:0 --root_path gen_results --name 1bvr -at 1.0 -bt 1.0 --max_atom_num 35 -ft 0.5 -cm True --with_print True", which ran successfully.
CUDA is installed (torch.version.cuda is '11.6'), but using "-d cuda:1" raises "RuntimeError: Trying to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1". The details are as follows:

E:\0_deep_learning_program\project\PocketFlow\PocketFlow-master\PocketFlow-master\pocket_flow\utils\transform.py:134: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  protein_atom_indices = torch.tensor(data.protein_atom_to_aa_type, dtype=torch.int64)
Loading model ...
Traceback (most recent call last):
  File "main_generate.py", line 79, in <module>
    ckpt = torch.load(args.ckpt, map_location=device)
  File "E:\0_deep_learning_program\miniconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "E:\0_deep_learning_program\miniconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 1049, in _load
    result = unpickler.load()
  File "E:\0_deep_learning_program\miniconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "E:\0_deep_learning_program\miniconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "E:\0_deep_learning_program\miniconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 970, in restore_location
    return default_restore_location(storage, map_location)
  File "E:\0_deep_learning_program\miniconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "E:\0_deep_learning_program\miniconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "E:\0_deep_learning_program\miniconda3\envs\pytorch\lib\site-packages\torch\serialization.py", line 143, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on CUDA device '
RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.

As I have no programming background, I cannot solve this problem myself. I would be extremely grateful if you could help!
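
For readers hitting the same error: CUDA device indices are zero-based, so a machine where torch.cuda.device_count() returns 1 only has cuda:0, and "-d cuda:1" names a GPU that does not exist. The sketch below shows one way to validate a requested device string before loading a checkpoint. `resolve_device` is a hypothetical helper, not part of PocketFlow, and it takes the GPU count as a plain argument so it can be tried without a GPU.

```python
def resolve_device(requested, gpu_count):
    """Return `requested` if it names an existing device, else a safe fallback."""
    if requested == "cpu" or gpu_count == 0:
        return "cpu"
    # "cuda" alone means the default GPU; "cuda:N" names GPU N (zero-based)
    index = int(requested.split(":")[1]) if ":" in requested else 0
    if index < gpu_count:
        return requested
    return "cuda:0"  # requested index is out of range; fall back to the first GPU


print(resolve_device("cuda:1", 1))  # cuda:0
print(resolve_device("cuda:0", 1))  # cuda:0
print(resolve_device("cuda:0", 0))  # cpu
```

In practice the count would come from `torch.cuda.device_count()`, and the resolved string would be passed as `map_location` to `torch.load`, as the error message itself suggests.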

Having trouble with requirements.txt

Hi,

Thank you so much for publishing the code; it must have been a lot of work to write. I am very keen to try out the repository, but I'm having trouble installing from the requirements.txt file because most of the packages seem to be local to the computer on which the repository was developed.

Could you provide a clean conda env file via conda env export --no-builds?

Also, could you let me know which CUDA driver version works with the repository?

Thank you so much and I look forward to your reply. 😁

Best regards,
Yew Mun

Request for Code for Metrics Calculation

Hi,

Thanks for your impressive work!

I noticed that the QED of the CrossDocked2020 test set, used as a reference in your paper, is 0.531 ± 0.210. However, most SBDD papers report results of around 0.48 for this metric.

I could not find the code to calculate metrics (including SAscores etc.) in your repository. Could you please provide it?

@Saoge123
