luost26 / diffab
✌🏻 Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures (NeurIPS 2022)
License: Apache License 2.0
Hi, I am wondering why you don't use the FAPE loss to supervise training. It seems a more natural choice, since it optimizes the rotation and translation predictions jointly.
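For reference, the FAPE loss from AlphaFold2 (Jumper et al. 2021) compares atom positions after expressing them in each predicted and ground-truth local frame, which is what couples the rotations and translations in a single term. A simplified form (the original adds a small epsilon and a length scale):

```latex
\mathrm{FAPE} = \frac{1}{N_{\text{frames}}\,N_{\text{atoms}}}
\sum_{i,j} \min\!\Big( d_{\text{clamp}},\;
\big\| T_i^{-1} \circ \vec{x}_j \;-\; \big(T_i^{\text{true}}\big)^{-1} \circ \vec{x}_j^{\text{true}} \big\| \Big)
```

Here \(T_i\) denotes the rigid frame (rotation plus translation) of residue \(i\), so an error in either the rotation or the translation moves the transformed atom positions and is penalized jointly.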
According to the paper attachment, after sampling a random rotation
However, in the released code,
diffab/diffab/modules/diffusion/transition.py
Line 118 in 6febe02
diffab/diffab/modules/diffusion/transition.py
Line 134 in 6febe02
Which operation is correct?
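One thing worth noting when comparing the two lines: rotation composition is non-commutative, so left- and right-multiplying by the sampled rotation are genuinely different operations, and only one order can match the paper. A tiny self-contained illustration (pure Python, not the repo's code):

```python
# Illustration only: 3-D rotation matrices generally do not commute,
# so E @ R and R @ E are different operations.

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

RX = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]   # 90-degree rotation about x
RZ = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # 90-degree rotation about z

print(matmul(RX, RZ) == matmul(RZ, RX))  # -> False
```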
Hello,
I was wondering if you could add streamlit to the env yaml file, so that streamlit is installed directly and the demo can be used out of the box. Alternatively, could you provide installation instructions via pip?
I am having a really hard time getting streamlit to run in a Docker container together with diffab; the issue is that I cannot install streamlit with conda in a separate step.
Best regards,
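For what it's worth, a pip subsection in the conda environment file would sidestep the separate conda install step. This is a hypothetical fragment, not the repo's actual env.yaml (the surrounding dependency names are assumptions):

```yaml
# Hypothetical addition to env.yaml: install streamlit via pip
# inside the conda environment, avoiding a second conda step.
dependencies:
  - pip
  - pip:
      - streamlit
```

With this in place, `conda env create -f env.yaml` would install streamlit in the same step as everything else, which also works inside a Docker build.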
Hi. In the forward diffusion process, given e_0 (the initial rotations), this function generates the noised rotations as R_noisy = E_scaled @ R0_scaled, where E_scaled is sampled from a prior distribution. I wonder why e_normal is declared but never used?
diffab/diffab/modules/diffusion/transition.py
Line 112 in 6febe02
In your paper, R_noisy should be the interpolation between c0 * v_0 (the scaled v_0) and c1 * e_scaled, but your code does not use c1 at all.
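For comparison, in the Euclidean DDPM forward process both coefficients appear explicitly, and my reading of the paper is that the rotation case should mirror this in the axis-angle parameterization (this is my interpretation of the question, not a statement about the intended code):

```latex
x_t = \underbrace{\sqrt{\bar{\alpha}_t}}_{c_0}\, x_0
    \;+\; \underbrace{\sqrt{1 - \bar{\alpha}_t}}_{c_1}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
```

If c1 were dropped here, the noise term would not shrink as \(\bar{\alpha}_t \to 1\), which is presumably what the question is getting at.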
Hi, Luo,
Thanks for sharing the code.
I notice that you produce a key named 'seqmap' for each entity. However, this 'seqmap' does not seem to be used anywhere. This is a very tiny issue, and I just wonder about the significance of this value.
Thanks for sharing this work, it's really fascinating. I was wondering: would this model work on nanobodies as well?
In design_testset.py, what does the argument in parser.add_argument('index', type=int) mean? What does this index refer to? Thanks for the guidance.
Thanks for your quick reply #14 (comment).
It is still confusing to me. I also take the column vectors as the basis vectors. In this case,
Moreover,
I'm trying to deploy the model on our local cluster, but got the following error while running the example script.
> python design_pdb.py ./data/examples/7DK2_AB_C.pdb --config ./configs/test/codesign_single.yml
[INFO] Renumbered chain A (H)
[INFO] Renumbered chain B (K)
[INFO] Chain C does not contain valid Fv: Variable chain sequence not recognized: "CPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGXXX"
[2022-10-21 23:45:07,716::sample::INFO] Data ID: 7DK2_AB_C.pdb
[2022-10-21 23:45:07,717::sample::INFO] Results will be saved to ./results/codesign_single/7DK2_AB_C.pdb_2022_10_21__23_45_07
[2022-10-21 23:45:07,962::sample::INFO] Loading model config and checkpoints: ./trained_models/codesign_single.pt
[2022-10-21 23:45:13,578::sample::INFO]
[2022-10-21 23:45:14,304::sample::INFO] Start sampling for: H_CDR1
Traceback (most recent call last):
File "/gshare/xielab/jianfc/DLTEST/diffab/design_pdb.py", line 4, in
design_for_pdb(args_from_cmdline())
File "/gshare/xielab/jianfc/DLTEST/diffab/diffab/tools/runner/design_for_pdb.py", line 177, in design_for_pdb
save_pdb(data_native, os.path.join(log_dir, variant['tag'], 'REF1.pdb')) # w/ OpenMM minimization
File "/gshare/xielab/jianfc/DLTEST/diffab/diffab/utils/protein/writers.py", line 58, in save_pdb
unique_chain_nb = data['chain_nb'].unique().tolist()
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/_tensor.py", line 586, in unique
return torch.unique(self, sorted=sorted, return_inverse=return_inverse, return_counts=return_counts, dim=dim)
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/_jit_internal.py", line 422, in fn
return if_false(*args, **kwargs)
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/_jit_internal.py", line 422, in fn
return if_false(*args, **kwargs)
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/functional.py", line 946, in _return_output
output, _, _ = _unique_impl(input, sorted, return_inverse, return_counts, dim)
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/functional.py", line 860, in _unique_impl
output, inverse_indices, counts = torch._unique2(
RuntimeError: std::bad_alloc
Although I have temporarily worked around the problem by changing line 58 of writers.py
from:
unique_chain_nb = data['chain_nb'].unique().tolist()
to:
unique_chain_nb = data['chain_nb'].to('cuda').unique().tolist()
(i.e., running unique() on the GPU, after which everything works), I'm still wondering why the problem occurs on CPU and how to fix it properly.
I'm working on a cluster node with 512 GB RAM and two A100-40GB GPUs, with Python 3.9, PyTorch 1.11.0 and CUDA 11.3.1.
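In the meantime, a workaround that avoids torch.unique entirely may be more robust than moving the tensor to GPU, since chain_nb is a small 1-D integer tensor. A minimal sketch (unique_sorted is a hypothetical helper, not part of the repo):

```python
# Hypothetical torch-free replacement for line 58 of
# diffab/utils/protein/writers.py, avoiding the torch._unique2 path
# that crashed with std::bad_alloc on this CPU build.

def unique_sorted(values):
    """Return the sorted unique elements of an iterable of hashable values."""
    return sorted(set(values))

# In writers.py this would become (assuming data['chain_nb'] is a 1-D tensor):
#   unique_chain_nb = unique_sorted(data['chain_nb'].tolist())

print(unique_sorted([2, 0, 1, 1, 2, 0]))  # -> [0, 1, 2]
```

torch.unique sorts its output by default, so the result should be equivalent for this use case.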
The output folders are created inside a loop, so how do I set the number of cycles?
And is the last one the final, best result?
Hi, do you have the results of your model as well as the baselines on light chains? I want to list your method as a baseline, and it would be very helpful if you could provide this information!
Thanks.
Hello,
diffab is really powerful and opens up a lot of possibilities for antibody design. Great work!
I tried out the code on Google Colab (link below), and running the test produced the following error. Could you please take a look and advise how to solve it?
https://colab.research.google.com/drive/1O4vz0A-Y84sOE3dAjePAhL_VmYFhZe8A?usp=share_link
python /content/diffab/design_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/codesign_single.yml
CalledProcessError Traceback (most recent call last)
in
----> 1 get_ipython().run_cell_magic('bash', '', 'source activate diffab\n#%%shell\n#eval "$(conda shell.bash hook)" # copy conda command to shell\n#conda config --show\n#conda config --get channels\n#conda list\n# python commands are ready to run within your environment\npython --version\npython /content/diffab/design_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/codesign_single.yml\n#python /content/diffab/diffab/tools/runner/design_for_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/abopt_singlecdr.yml\n')
3 frames
/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2357 with self.builtin_trap:
2358 args = (magic_arg_s, cell)
-> 2359 result = fn(*args, **kwargs)
2360 return result
2361
/usr/local/lib/python3.8/dist-packages/IPython/core/magics/script.py in named_script_magic(line, cell)
140 else:
141 line = script
--> 142 return self.shebang(line, cell)
143
144 # write a basic docstring:
in shebang(self, line, cell)
/usr/local/lib/python3.8/dist-packages/IPython/core/magic.py in (f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
/usr/local/lib/python3.8/dist-packages/IPython/core/magics/script.py in shebang(self, line, cell)
243 sys.stderr.flush()
244 if args.raise_error and p.returncode!=0:
--> 245 raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
246
247 def _run_script(self, p, cell, to_close):
CalledProcessError: Command 'b'source activate diffab\n#%%shell\n#eval "$(conda shell.bash hook)" # copy conda command to shell\n#conda config --show\n#conda config --get channels\n#conda list\n# python commands are ready to run within your environment\npython --version\npython /content/diffab/design_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/codesign_single.yml\n#python /content/diffab/diffab/tools/runner/design_for_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/abopt_singlecdr.yml\n'' returned non-zero exit status 1.
Dear author,
I did single-CDR-loop CDR-H3 structure-sequence co-design with a paired Ab-Ag complex as input, using this exact config file: https://github.com/luost26/diffab/blob/main/configs/test/codesign_single.yml. Interestingly, one of the designed loop structures clashed with the input antigen. In theory, if the design is conditioned on the antigen structure, we shouldn't observe such a clash. Could you please help me understand this specific case?
Thanks!
Dear author,
When I use the abopt_singlecdr.yml config, there are several empty residues with insertion-code indices xxxA, xxxB, … Does this also happen in your experiments? If so, how did you solve it?
Thank you very much!
Best,
Jerry
Hi there,
How much time does it take to train a model?
Thanks!
Hello, thank you for sharing this great work. I am trying to reproduce some of the experiments using the example data. However, there seems to be no configuration file for that. Could you help me run experiments with the example data? Many thanks!
Could you explain the derivation in more detail?
https://github.com/luost26/diffab/blob/main/diffab/modules/diffusion/transition.py#L204
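In case it helps other readers: if that line implements the standard DDPM posterior mean (Ho et al. 2020, Eq. 7), it follows from applying Bayes' rule to the two Gaussians q(x_t | x_{t-1}) and q(x_{t-1} | x_0). This is my guess at what the line computes, not a confirmed reading of the code:

```latex
\tilde{\mu}_t(x_t, x_0)
= \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\, x_0
\;+\; \frac{\sqrt{\alpha_t}\,\big(1-\bar{\alpha}_{t-1}\big)}{1-\bar{\alpha}_t}\, x_t
```

Completing the square in the exponent of the product of the two Gaussian densities yields these two coefficients directly.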
Hi. I am trying to reproduce your test results for generating antibody CDRs (sequence-structure co-design) with the DiffAb model.
Using the design_testset.py script, index 10 (PDB 7bwj_H_L_E), and the codesign_single checkpoint, the results on CDR-H3 are unacceptably bad.
The following table shows the Cα RMSD between the generated and native structures.

|      | H_CDR1   | H_CDR2   | H_CDR3     | L_CDR1   | L_CDR2   | L_CDR3   |
|------|----------|----------|------------|----------|----------|----------|
| mean | 1.428283 | 1.659423 | 52.084544  | 1.605134 | 0.385445 | 3.645147 |
| min  | 0.861669 | 0.935878 | 28.523428  | 0.730051 | 0.208125 | 1.603799 |
| max  | 2.923090 | 3.224749 | 153.499802 | 2.118108 | 0.752684 | 6.416183 |
The Cα RMSD is calculated by the following code:

generate_flags = variant['data']['generate_flag']
native_atom_positions = variant['data']['pos_heavyatom'][..., BBHeavyAtom.CA, :][generate_flags]
# native_atom_positions = native_atom_positions[mask_ha[generate_flags]]
pred_atom_positions = pos_ha[..., BBHeavyAtom.CA, :][generate_flags]
# pred_atom_positions = pred_atom_positions[mask_ha[generate_flags]]
# Note: no square root is taken here, so this is actually the mean squared
# deviation rather than a true RMSD.
rmsd = ((native_atom_positions - pred_atom_positions) ** 2).sum(-1).mean()
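As a cross-check: a true RMSD takes the square root of the mean squared deviation, which the snippet above omits. A minimal dependency-free sketch (rmsd_ca is a hypothetical helper, not the repo's evaluation code):

```python
import math

def rmsd_ca(native, pred):
    """RMSD between two equal-length lists of 3-D coordinates.

    Computes sqrt(mean over atoms of squared Euclidean distance).
    No superposition is performed; coordinates are compared as given.
    """
    assert len(native) == len(pred) and len(native) > 0
    sq_dists = [sum((a - b) ** 2 for a, b in zip(p, q))
                for p, q in zip(native, pred)]
    return math.sqrt(sum(sq_dists) / len(sq_dists))

# Two atoms, one displaced by 2 Angstrom: MSD = (0 + 4) / 2 = 2, RMSD = sqrt(2)
print(rmsd_ca([[0, 0, 0], [1, 0, 0]], [[0, 0, 0], [3, 0, 0]]))  # -> 1.4142...
```

Without the square root, a reported value of 52 would correspond to an RMSD of roughly 7.2 Å, which may partly explain the gap to the paper's numbers.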
If this case has such a high RMSD, I suspect the test-set RMSD reported in Table 1 of your paper would also be high.
No offense intended; I am trying to find out what is wrong with my reproduction.
Let me know if you want more details about my reproduction.
Hi. First off, thank you very much for releasing the code accompanying your awesome work here!
I was curious if this source code is currently (or will eventually be) accompanied by any software licenses (e.g., MIT).
Hi,
As a proof-of-concept, I've been trying to sample the H-CDR3 structure of a given antibody to see if your method would be able to reproduce the crystal structure conformation. For the designed structures I'm especially interested in identifying those conformations where the RMSD is <= 2 Å.
I'm not sure if my protocol is correct, but I've been running it as follows:
With this protocol, I was able to identify some conformations where the RMSD was smaller than 2 Å.
However, I noticed that across different runs, the energy for the reference structure was changing, which would make my analysis not comparable and non-reproducible. For instance, the absolute difference between the dG_ref in two runs was 100, which is a lot.
So, I started debugging the code and found out that 'diffab/tools/relax/run.py' relaxes the reference and 'diffab/tools/eval/run.py' uses it as the reference to calculate the energy. That explains why across different runs I got different dG_refs. I then set PyRosetta to use the same seed in 'diffab/tools/relax/run.py' and I was able to get the same values across different calls of 'diffab/tools/eval/run.py'.
Despite that, I'm still getting minor differences for the reference within the same run. See an example below:
The difference is small, but it is strange to have different energies for the same complex. I tried to set PyRosetta's seed in 'diffab/tools/eval/run.py', but it didn't work.
Would you know what is going on here?
Thank you for this great tool and for your help.
Best,