deephops's Introduction


Supporting Information for the paper "Deep Scaffold Hopping with Multimodal Transformer Neural Networks"

DeepHop is a multimodal molecular transformation framework. It accepts a hit molecule and a target protein sequence of interest as inputs, and designs molecular structures that are isofunctional to the source compound.



Create a conda environment for QSAR-scorer:

conda env create -f score/env.yaml

Create a conda environment for Deephop:

conda env create -f deephop/env.yaml

Note that you should replace three source files of the torchtext library in "your deep hop env path/python3.7/site-packages/torchtext/data" with the corresponding three files contained in "deephop/replace_torchtext", since we have modified the code.
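The replacement step can also be scripted. The following is a minimal sketch, not part of the repository: the helper name `replace_torchtext_files` is hypothetical, it copies every `.py` file it finds in the replacement directory rather than hard-coding the three file names (which are not listed here), and it backs up each overwritten file with a `.bak` suffix so the change can be undone.

```python
import shutil
from pathlib import Path


def replace_torchtext_files(replace_dir, torchtext_data_dir):
    """Copy each .py file in replace_dir over its namesake in torchtext_data_dir.

    Hypothetical helper: an existing target file is first backed up as
    <name>.bak. Returns the names of the files that were copied.
    """
    replace_dir = Path(replace_dir)
    torchtext_data_dir = Path(torchtext_data_dir)
    copied = []
    for src in sorted(replace_dir.glob("*.py")):
        dst = torchtext_data_dir / src.name
        if dst.exists():
            # Keep the original so the change can be reverted.
            shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
        shutil.copy2(src, dst)
        copied.append(src.name)
    return copied
```

For example, `replace_torchtext_files("deephop/replace_torchtext", "<your deep hop env path>/python3.7/site-packages/torchtext/data")` would apply the change and return the list of copied file names.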

Scaffold hopping pairs construction

For convenience of illustration, we assume that your code is extracted to /data/u1/projects/mget_3d and that the conda environment for DeepHop is named deephop_env.

cd /data/u1/projects/mget_3d
conda activate deephop_env

You can use the provided script to generate hopping pairs.

Dataset split

python -out_dir data40_tue_3d/0.60 -protein_group  data40 -target_uniq_rate 0.6 -hopping_pairs_dir hopping_pairs_with_scaffold

Data preprocessing

python -train_src data40_tue_3d/0.60/src-train.txt -train_tgt data40_tue_3d/0.60/tgt-train.txt -train_cond data40_tue_3d/0.60/cond-train.txt -valid_src data40_tue_3d/0.60/src-val.txt -valid_tgt data40_tue_3d/0.60/tgt-val.txt -valid_cond data40_tue_3d/0.60/cond-val.txt -save_data data40_tue_3d/0.60/seqdata -share_vocab -src_seq_length 1000 -tgt_seq_length 1000 -src_vocab_size 1000 -tgt_vocab_size 1000 -with_3d_confomer
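Before preprocessing, it can help to confirm that the source, target, and condition files line up. The sketch below assumes (this is not stated above) that the three files are parallel, with line i of each file describing the same training pair. The function names are hypothetical.

```python
def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(src_path, tgt_path, cond_path):
    """Return the shared line count, or raise ValueError if the counts differ.

    Assumption: src, tgt and cond files hold one example per line, in the
    same order.
    """
    counts = {str(p): count_lines(p) for p in (src_path, tgt_path, cond_path)}
    if len(set(counts.values())) != 1:
        raise ValueError(f"line counts differ: {counts}")
    return next(iter(counts.values()))
```

A call such as `check_parallel("data40_tue_3d/0.60/src-train.txt", "data40_tue_3d/0.60/tgt-train.txt", "data40_tue_3d/0.60/cond-train.txt")` would catch a truncated or misaligned file before the preprocessing step runs.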

Model training

python -condition_dim 768  -use_graph_embedding -arch after_encoding -data data40_tue_3d/0.60/seqdata -save_model experiments/data40_tue_3d/after/models/model -seed 42 -save_checkpoint_steps 158 -keep_checkpoint 400 -train_steps 95193 -param_init 0 -param_init_glorot -max_generator_batches 32 -batch_size 8192 -batch_type tokens -normalization tokens -max_grad_norm 0 -accum_count 4 -optim adam -adam_beta1 0.9 -adam_beta2 0.998 -decay_method noam -warmup_steps 475 -learning_rate 2 -label_smoothing 0.0 -report_every 10 -layers 4 -rnn_size 256 -word_vec_size 256 -encoder_type transformer -decoder_type transformer -dropout 0.1 -position_encoding -share_embeddings -global_attention general -global_attention_function softmax -self_attn_type scaled-dot -heads 8 -transformer_ff 2048 -log_file experiments/data40_tue_3d/after/train.log -tensorboard -tensorboard_log_dir experiments/data40_tue_3d/after/logs -world_size 4 -gpu_ranks 0 1 2 3 -valid_steps 475 -valid_batch_size 32
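The options `-decay_method noam`, `-learning_rate 2`, `-warmup_steps 475`, and `-rnn_size 256` jointly determine the learning-rate curve. The sketch below assumes the training script follows the standard Noam schedule (linear warm-up, then inverse-square-root decay, scaled by the model size), as is usual in OpenNMT-style code; the flag names suggest this, but the text above does not confirm it. The function name is illustrative.

```python
def noam_learning_rate(step, base_lr=2.0, model_size=256, warmup_steps=475):
    """Noam schedule: linear warm-up followed by inverse-square-root decay.

    The defaults mirror -learning_rate 2, -rnn_size 256 and -warmup_steps 475;
    whether the training script uses exactly this formula is an assumption.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return base_lr * model_size ** -0.5 * scale


if __name__ == "__main__":
    # Print the rate at the first step, at the end of warm-up, and at the
    # final training step.
    for step in (1, 475, 95193):
        print(step, noam_learning_rate(step))
```

Under this schedule the rate peaks at step 475 and decays afterwards, so `-learning_rate 2` acts as a multiplier rather than the actual step size.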

Hops generation

To generate the output SMILES by loading a saved model:

python -condition_dim 768  -use_graph_embedding -arch after_encoding -with_3d_confomer -model /data/u1/projects/mget_3d/experiments/data40_tue/3d_gcn/models/ -gpu 0 -src data40_tue_3d/src-test.txt -cond data40_tue_3d/cond-test.txt -output /data/u1/projects/mget_3d/summary_tue/data40/after/9500/pred.txt -beam_size 10 -n_best 10 -batch_size 16 -replace_unk -max_length 200 -fast -use_protein40
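With `-n_best 10`, the prediction file holds several candidates per input molecule. The sketch below groups them back by source. It assumes (not confirmed above) that pred.txt contains exactly `n_best` consecutive lines per source line and that SMILES tokens are separated by spaces. `group_predictions` is a hypothetical helper name.

```python
def group_predictions(lines, n_best=10):
    """Split a flat list of prediction lines into one list per source molecule.

    Assumes n_best consecutive lines per source and space-separated tokens.
    """
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % n_best != 0:
        raise ValueError("number of lines is not a multiple of n_best")
    groups = []
    for start in range(0, len(lines), n_best):
        chunk = lines[start:start + n_best]
        # Join tokens such as "C C ( = O ) O" back into "CC(=O)O".
        groups.append([candidate.replace(" ", "") for candidate in chunk])
    return groups
```

Reading the file with `open(...).readlines()` and passing the result to `group_predictions` yields one list of candidate SMILES per test molecule, in input order.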


To evaluate our model:

 python -beam_size 10 -src summary_tue/data40/after/9500/src-test-protein.txt -prediction /data/u1/projects/mget_3d/summary_tue/data40/after/9500/pred.txt -score_file /data/u1/projects/mget_3d/summary_tue/data40/after/9500/score.csv -invalid_smiles -cond summary_tue/data40/after/9500/cond-test-protein.txt -train_data_dir /data/u1/projects/mget_3d/data40_tue_3d -scorer_model_dir /data/u1/projects/score/total_mtr -pvalue_dir /data/u1/projects/mget_3d/score_train_data

The final result report is saved at /data/u1/projects/mget_3d/summary_tue/data40/after/9500/score_final.csv


Please cite the following paper if you use this code in your work.

Zheng, Shuangjia; Lei, Zengrong; Ai, Haitao; Chen, Hongming; Deng, Daiguo; Yang, Yuedong. Deep scaffold hopping with multimodal transformer neural networks. Journal of Cheminformatics.

deephops's Issues

data source

Hello, I want to ask a question: can I get the source data named /home/aht/paper_code/shaungjia/chembl_webresource_client/scaffold_hopping_320target.csv?

Issue about translate

Hi, after training, I ran the script to generate SMILES following the instructions, but I hit an error. Can you suggest a solution?
python -condition_dim 768 -use_graph_embedding -arch after_encoding -with_3d_confomer -model /data/u1/projects/mget_3d/experiments/data40_tue/3d_gcn/models/ -gpu 0 -src data40_tue_3d/src-test.txt -cond data40_tue_3d/cond-test.txt -output /data/u1/projects/mget_3d/summary_tue/data40/after/9500/pred.txt -beam_size 10 -n_best 10 -batch_size 16 -replace_unk -max_length 200 -fast -use_protein40

Traceback (most recent call last):
File "", line 100, in
File "", line 72, in main
opt = opt)
File "/home/data/aidd/deepHops/deephop/onmt/translate/", line 257, in translate
batch_data = self.translate_batch(batch, data,
File "/home/data/aidd/deepHops/deephop/onmt/translate/", line 441, in translate_batch
File "/home/data/aidd/deepHops/deephop/onmt/translate/", line 581, in _fast_translate_batch
[alive_seq.index_select(0, select_indices),
RuntimeError: expected scalar type Long but found Float
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers

about the file ReadoutFunction

I'm glad you can share the code. The ReadoutFunction module imported in MPNNs is not found. Where could I find the file? Thank you very much.



I tried to install the environments according to the guide in the README, but I ran into a problem:

conda create env -f=score/env.yaml
usage: conda create [-h] [--clone ENV] [-n ENVIRONMENT | -p PATH] [-c CHANNEL]
                    [--use-local] [--override-channels]
                    [--repodata-fn REPODATA_FNS] [--strict-channel-priority]
                    [--no-channel-priority] [--no-deps | --only-deps]
                    [--no-pin] [--copy] [-C] [-k] [--offline] [-d] [--json]
                    [-q] [-v] [-y] [--download-only] [--show-channel-urls]
                    [--file FILE] [--no-default-packages]
                    [--experimental-solver {classic,libmamba,libmamba-draft}]
                    [package_spec ...]
conda create: error: argument -f/--force: ignored explicit argument 'score/env.yaml'

Then I changed it to the correct form:

cd score
conda env create -f env.yaml

But it didn't work:

Collecting package metadata (repodata.json): done
Solving environment: failed

  - cudnn==7.6.5=cuda10.1_0
  - tensorflow==2.1.0=gpu_py36h2e5cdaa_0
  - fontconfig==2.13.1=he4413a7_1000
  - cudatoolkit==10.1.243=h6bb024c_0
  - c-ares==1.15.0=h7b6447c_1001
  - matplotlib-base==3.1.1=py36hfd891ef_0
  - xz==5.2.5=h7b6447c_0
  - icu==58.2=hf484d3e_1000
  - lzo==2.10=h14c3975_1000
  - xorg-libxrender==0.9.10=h516909a_1002
  - libprotobuf==3.11.4=hd408876_0
  - wrapt==1.12.1=py36h7b6447c_1
  - readline==8.0=h7b6447c_0
  - libxcb==1.13=h14c3975_1002
  - rdkit==2020.03.2.0=py36hc20afe1_1
  - py-boost==1.67.0=py36h04863e7_4
  - tk==8.6.10=hed695b0_0
  - xorg-libxext==1.3.4=h516909a_0
  - pthread-stubs==0.4=h14c3975_1001
  - xorg-xproto==7.0.31=h14c3975_1007
  - tornado==6.0.4=py36h8c4c3a4_1
  - zlib==1.2.11=h516909a_1006
  - grpcio==1.27.2=py36hf8bcb03_0
  - pyqt==4.11.4=py36_3
  - libuuid==2.32.1=h14c3975_1000
  - jpeg==9c=h14c3975_1001
  - tensorflow-gpu==2.1.0=h0d30ee6_0
  - cupti==10.1.168=0
  - hdf5==1.10.4=hb1b8bf9_0
  - brotlipy==0.7.0=py36h8c4c3a4_1000
  - libstdcxx-ng==9.1.0=hdf63c60_0
  - tensorflow-base==2.1.0=gpu_py36h6c5654b_0
  - xorg-libxdmcp==1.1.3=h516909a_0
  - numexpr==2.7.1=py36h830a2c2_1
  - pixman==0.34.0=h14c3975_1003
  - cython==0.29.17=py36h831f99a_0
  - xorg-libsm==1.2.3=h84519dc_1000
  - h5py==2.10.0=py36h7918eee_0
  - lz4-c==1.9.2=he1b5a44_1
  - libiconv==1.15=h516909a_1006
  - bzip2==1.0.8=h516909a_2
  - numpy==1.18.4=py36h7314795_0
  - libopenblas==0.3.7=h5ec1e0e_6
  - xorg-libice==1.0.10=h516909a_0
  - libgcc-ng==9.1.0=hdf63c60_0
  - libtiff==4.1.0=hc7e4089_6
  - scipy==1.4.1=py36h2d22cac_3
  - sqlite==3.31.1=h62c20be_1
  - libxml2==2.9.9=h13577e0_2
  - freetype==2.10.2=he06d7ca_0
  - pcre==8.44=he1b5a44_0
  - ld_impl_linux-64==2.33.1=h53a641e_7
  - zstd==1.4.4=h6597ccf_3
  - openssl==1.1.1g=h516909a_0
  - xorg-xextproto==7.3.0=h14c3975_1002
  - kiwisolver==1.2.0=py36hdb11119_0
  - gettext==
  - protobuf==3.11.4=py36he6710b0_0
  - ncurses==6.2=he6710b0_1
  - libffi==3.3=he6710b0_1
  - libxgboost==1.0.2=he1b5a44_1
  - _tflow_select==2.1.0=gpu
  - cryptography==2.9.2=py36h45558ae_0
  - biopython==1.76=py36h516909a_0
  - libedit==3.1.20181209=hc058e9b_0
  - xorg-kbproto==1.0.7=h14c3975_1002
  - pillow==7.1.2=py36h8328e55_0
  - scikit-learn==0.23.0=py36h0e1014b_0
  - pytables==3.6.1=py36h71ec239_0
  - libboost==1.67.0=h46d08c1_4
  - xorg-renderproto==0.11.1=h14c3975_1002
  - xorg-libx11==1.6.9=h516909a_0
  - libgfortran-ng==7.3.0=hdf63c60_5
  - libwebp-base==1.1.0=h516909a_3
  - python==3.6.10=h7579374_2
  - pandas==1.0.3=py36h830a2c2_1
  - xorg-libxau==1.0.9=h14c3975_0
  - libpng==1.6.37=hed695b0_1
  - blosc==1.18.1=he1b5a44_0

I used macOS Monterey 12.1

Is there a way to fix the installation, or could the guide be updated?
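The pinned entries in the log above carry build strings that appear to be Linux builds (the list includes `ld_impl_linux-64`), and such builds generally cannot be resolved on macOS. One possible workaround, offered as an untested sketch rather than a fix confirmed by the authors, is to strip the build string from each dependency before creating the environment. The helper name `strip_build_strings` is hypothetical, and GPU-specific packages such as `cudatoolkit` or `tensorflow-gpu` may still be unavailable on macOS.

```python
import re

# Matches "  - name==version=build" or "  - name=version=build" and captures
# everything up to, but not including, the trailing "=build" part.
_PINNED = re.compile(r"^(\s*-\s*[^=\s]+={1,2}[^=\s]+)=[^=\s]+\s*$")


def strip_build_strings(yaml_text):
    """Return yaml_text with conda build strings removed from dependency lines.

    Lines that do not look like "name==version=build" are left unchanged.
    """
    out = []
    for line in yaml_text.splitlines():
        match = _PINNED.match(line)
        out.append(match.group(1) if match else line)
    return "\n".join(out) + "\n"
```

Writing the result to a new file and passing it to `conda env create -f` lets the solver choose platform-appropriate builds for the same versions, where they exist.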
