
diffab's People

Contributors

luost26


diffab's Issues

Why not use FAPE loss?

Hi, I am wondering why you don't use FAPE loss to supervise the training. It seems a more natural choice, since it optimizes the rotation and translation predictions jointly.

left-multiply or right-multiply ?

According to the paper's appendix, after sampling a random rotation $E \sim \mathcal{IG}_{SO(3)}(I, \sigma^2)$, we need to left-multiply $E$ by $R$ to get the desired random value $RE \sim \mathcal{IG}_{SO(3)}(R, \sigma^2)$.

However, on the released code,

R_noisy = E_scaled @ R0_scaled

R_next = E @ so3vec_to_rotation(v_next)

$R$ is multiplied onto $E$ from the right.

Which operation is correct?
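For what it's worth, the two conventions really do produce different rotations in general, which is why the question matters. Below is a minimal pure-Python check; the helper names and the choice of axes are illustrative, not taken from the DiffAb codebase:

```python
import math

def matmul(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

R = rot_z(0.7)  # stand-in for the mean rotation R
E = rot_x(0.3)  # stand-in for a perturbation sampled around the identity

RE = matmul(R, E)  # left-multiplying R onto E (the paper's convention)
ER = matmul(E, R)  # right-multiplying, as in the released code

# Rotations about different axes do not commute, so the convention matters:
max_diff = max(abs(RE[i][j] - ER[i][j]) for i in range(3) for j in range(3))
```

Since `max_diff` is clearly nonzero here, at least one of the two conventions must be an error relative to the intended distribution.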

missing streamlit dependency

Hello,

I was wondering if you could add streamlit to the env yaml file, so that streamlit is installed directly and the demo can be used.
Or could you provide an alternative installation, e.g. with pip?

I have a really hard time getting streamlit to run in a Docker container together with diffab. The issue is that I cannot install streamlit with conda in a separate step.

Best regards,

code is inconsistent with your paper

def add_noise(self, v_0, mask_generate, t):

Hi. In the forward diffusion process, given v_0, the initial rotations, this function generates the noised rotations by R_noisy = E_scaled @ R0_scaled, where E_scaled is sampled from a prior distribution. I wonder why e_normal is declared but never used?

e_normal = e_scaled / (c1 + 1e-8)

In your paper, R_noisy should be an interpolation between c0 * v_0 (the scaled v_0) and c1 * e_scaled, but your code does not use c1 at all.
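For readers following along, the scalar analogue of the step under discussion may help. In a standard diffusion forward process, the noised sample is an interpolation x_t = c0·x0 + c1·ε with c0 = sqrt(alpha_bar) and c1 = sqrt(1 − alpha_bar); on SO(3), the c0 scaling is applied in axis-angle space and the c1 factor can enter through how the noise rotation is sampled rather than as a separate multiply (this is one reading of the code, not confirmed by the authors). A minimal scalar sketch:

```python
import math
import random

random.seed(0)  # fixed seed for reproducibility

def add_noise_scalar(x0, alpha_bar):
    """Scalar analogue of the diffusion forward step:
    x_t = c0 * x0 + c1 * eps, with c0 = sqrt(alpha_bar), c1 = sqrt(1 - alpha_bar)."""
    c0 = math.sqrt(alpha_bar)
    c1 = math.sqrt(1.0 - alpha_bar)
    eps = random.gauss(0.0, 1.0)  # noise sampled from the standard normal prior
    return c0 * x0 + c1 * eps

x_early = add_noise_scalar(1.0, 0.9999)  # almost no noise: stays near x0
x_late = add_noise_scalar(1.0, 0.0001)   # almost pure noise
```

The question then reduces to where, in the SO(3) version, the c1 scaling is actually realized.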

Why generate seqmap during data preprocessing?

Hi, Luo,
Thanks for sharing the code.
I notice that you produce a key named 'seqmap' for each entity. However, this 'seqmap' does not seem to be used anywhere. This is a very tiny issue; I just wonder about the significance of this value.

inquiry about nanobodies

Thanks for sharing this work, it's really fascinating. I was wondering - would this model work on nanobodies as well?

Further questions about left/right-multiply in issue #14

Thanks for your quick reply #14 (comment).

It is still confusing to me. I also take the column vectors as the basis vectors. In this case, $RE$ is more intuitive to me (we first get $E$ around $I$ and transform it with $R$).

Moreover, $ER = (ERE^{-1})E \sim \mathcal{IG}_{SO(3)}(ERE^{-1}, \sigma^2)$, so it seems that $RE$ and $ER$ do different things, because $ERE^{-1} = R$ is not always true.
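The algebraic point in the question can be checked numerically: the identity $ER = (ERE^{-1})E$ holds exactly, while the conjugated mean $ERE^{-1}$ differs from $R$ for generic rotations. A small pure-Python sketch (illustrative axes and angles; for a rotation matrix, the inverse is the transpose):

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

R, E = rot_z(0.7), rot_x(0.3)
conj = matmul(matmul(E, R), transpose(E))  # E R E^{-1}

# The identity E R = (E R E^{-1}) E holds exactly (up to float error) ...
lhs = matmul(E, R)
rhs = matmul(conj, E)
err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))

# ... but E R E^{-1} is generally NOT equal to R:
dev = max(abs(conj[i][j] - R[i][j]) for i in range(3) for j in range(3))
```

Here `err` is at floating-point-noise level while `dev` is clearly nonzero, supporting the claim that $RE$ and $ER$ follow different distributions.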

Problem with diffab.utils.protein.writers.save_pdb()

I'm trying to deploy the model on our local cluster, but got the following error while running the example script.
> python design_pdb.py ./data/examples/7DK2_AB_C.pdb --config ./configs/test/codesign_single.yml

[INFO] Renumbered chain A (H)
[INFO] Renumbered chain B (K)
[INFO] Chain C does not contain valid Fv: Variable chain sequence not recognized: "CPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGXXX"
[2022-10-21 23:45:07,716::sample::INFO] Data ID: 7DK2_AB_C.pdb
[2022-10-21 23:45:07,717::sample::INFO] Results will be saved to ./results/codesign_single/7DK2_AB_C.pdb_2022_10_21__23_45_07
[2022-10-21 23:45:07,962::sample::INFO] Loading model config and checkpoints: ./trained_models/codesign_single.pt
[2022-10-21 23:45:13,578::sample::INFO]
[2022-10-21 23:45:14,304::sample::INFO] Start sampling for: H_CDR1
Traceback (most recent call last):
File "/gshare/xielab/jianfc/DLTEST/diffab/design_pdb.py", line 4, in
design_for_pdb(args_from_cmdline())
File "/gshare/xielab/jianfc/DLTEST/diffab/diffab/tools/runner/design_for_pdb.py", line 177, in design_for_pdb
save_pdb(data_native, os.path.join(log_dir, variant['tag'], 'REF1.pdb')) # w/ OpenMM minimization
File "/gshare/xielab/jianfc/DLTEST/diffab/diffab/utils/protein/writers.py", line 58, in save_pdb
unique_chain_nb = data['chain_nb'].unique().tolist()
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/_tensor.py", line 586, in unique
return torch.unique(self, sorted=sorted, return_inverse=return_inverse, return_counts=return_counts, dim=dim)
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/_jit_internal.py", line 422, in fn
return if_false(*args, **kwargs)
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/_jit_internal.py", line 422, in fn
return if_false(*args, **kwargs)
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/functional.py", line 946, in _return_output
output, _, _ = _unique_impl(input, sorted, return_inverse, return_counts, dim)
File "/share/home/jianfc/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/functional.py", line 860, in _unique_impl
output, inverse_indices, counts = torch._unique2(
RuntimeError: std::bad_alloc

Although I have solved the problem temporarily by changing line 58 of writers.py from:
unique_chain_nb = data['chain_nb'].unique().tolist()
to:
unique_chain_nb = data['chain_nb'].to('cuda').unique().tolist()
(running unique() on the GPU), after which everything works, I'm still wondering why the problem occurs on CPU and how to fix it properly.
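As an alternative workaround that avoids `torch.unique` entirely, the deduplication can be done in plain Python after `.tolist()`. Since `torch.unique` sorts its output by default, `sorted(set(...))` is equivalent for 1-D integer data. A sketch, with a stand-in list in place of the real tensor:

```python
# Stand-in for data['chain_nb'].tolist(); with the real tensor you would write:
#   unique_chain_nb = sorted(set(data['chain_nb'].tolist()))
chain_nb = [0, 0, 0, 1, 1, 2, 2, 2]
unique_chain_nb = sorted(set(chain_nb))  # [0, 1, 2]
```

This sidesteps `torch._unique2` (where the `std::bad_alloc` is raised) without requiring a GPU.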

I'm working on a cluster node with 512GB RAM and two A100-40GB GPUs, with Python 3.9, PyTorch 1.11.0 and CUDA 11.3.1.

Fix-backbone Performance on Light Chains

Hi, do you have the results of your model, as well as the baselines, on light chains? I want to list your method as a baseline, and it would be very helpful if you could provide this information!

Thanks.

Running diffab on Colab - a question

Hello,
diffab is very powerful and opens up a lot of possibilities for antibody design. Great work!
I tinkered with the code on Google Colab (link below), and running the test produced the following problem. Could you take a look and advise how to solve it?
https://colab.research.google.com/drive/1O4vz0A-Y84sOE3dAjePAhL_VmYFhZe8A?usp=share_link

python /content/diffab/design_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/codesign_single.yml

Python 3.8.15
[INFO] Renumbered chain A (H)
[INFO] Renumbered chain B (K)
[INFO] Chain C does not contain valid Fv: Variable chain sequence not recognized: "CPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGXXX"
[2022-12-18 05:44:18,014::sample::INFO] Data ID: 7DK2_AB_C.pdb
[2022-12-18 05:44:18,014::sample::INFO] Results will be saved to ./results/codesign_single/7DK2_AB_C.pdb_2022_12_18__05_44_18
[2022-12-18 05:44:18,185::sample::INFO] Loading model config and checkpoints: ./trained_models/codesign_single.pt
Traceback (most recent call last):
File "/content/diffab/design_pdb.py", line 4, in
design_for_pdb(args_from_cmdline())
File "/content/diffab/diffab/tools/runner/design_for_pdb.py", line 141, in design_for_pdb
ckpt = torch.load(config.model.checkpoint, map_location='cpu')
File "/usr/local/envs/diffab/lib/python3.8/site-packages/torch/serialization.py", line 699, in load
with _open_file_like(f, 'rb') as opened_file:
File "/usr/local/envs/diffab/lib/python3.8/site-packages/torch/serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/usr/local/envs/diffab/lib/python3.8/site-packages/torch/serialization.py", line 211, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './trained_models/codesign_single.pt'

CalledProcessError Traceback (most recent call last)
in
----> 1 get_ipython().run_cell_magic('bash', '', 'source activate diffab\n#%%shell\n#eval "$(conda shell.bash hook)" # copy conda command to shell\n#conda config --show\n#conda config --get channels\n#conda list\n# python commands are ready to run within your environment\npython --version\npython /content/diffab/design_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/codesign_single.yml\n#python /content/diffab/diffab/tools/runner/design_for_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/abopt_singlecdr.yml\n')

3 frames
/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2357 with self.builtin_trap:
2358 args = (magic_arg_s, cell)
-> 2359 result = fn(*args, **kwargs)
2360 return result
2361

/usr/local/lib/python3.8/dist-packages/IPython/core/magics/script.py in named_script_magic(line, cell)
140 else:
141 line = script
--> 142 return self.shebang(line, cell)
143
144 # write a basic docstring:

in shebang(self, line, cell)

/usr/local/lib/python3.8/dist-packages/IPython/core/magic.py in (f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):

/usr/local/lib/python3.8/dist-packages/IPython/core/magics/script.py in shebang(self, line, cell)
243 sys.stderr.flush()
244 if args.raise_error and p.returncode!=0:
--> 245 raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
246
247 def _run_script(self, p, cell, to_close):

CalledProcessError: Command 'b'source activate diffab\n#%%shell\n#eval "$(conda shell.bash hook)" # copy conda command to shell\n#conda config --show\n#conda config --get channels\n#conda list\n# python commands are ready to run within your environment\npython --version\npython /content/diffab/design_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/codesign_single.yml\n#python /content/diffab/diffab/tools/runner/design_for_pdb.py /content/diffab/data/examples/7DK2_AB_C.pdb --config /content/diffab/configs/test/abopt_singlecdr.yml\n'' returned non-zero exit status 1.

Designed CDR clashed with antigen

Dear author,

5CIL_0023_CDRH3.pdb.zip

I did single-CDR-loop CDRH3 structure-sequence co-design with a pair of Ab-Ag complexes as input, using this exact config file: https://github.com/luost26/diffab/blob/main/configs/test/codesign_single.yml. Interestingly, one of the designed loop structures clashes with the input antigen. In theory, if the design is conditioned on the antigen structure, we shouldn't observe such a clash. Could you please help me understand this specific case?

Thanks!

Training time

Hi there,

How much time does it take to train a model?

Thanks!

Error when running sabdab.py

When running sabdab.py, errors occur. How should they be solved?
The first error is: ValueError: invalid literal for int() with base 10: 'V'.
The second error is shown in the screenshot. How can these two problems be properly resolved?
(screenshot: 20230617-131318)

Example

Hello, thank you for sharing the great work. I am trying to reproduce some of the experiments using the example data, but there does not seem to be a configuration file for that. Could you help me run experiments with the example data? Many thanks!

questions regarding the reproduction of your test results

Hi. I am trying to reproduce your test results about generating antibody CDRs (sequence-structure co-design) using the DiffAb model.
Using the design_testset.py script, index 10 (PDB 7bwj_H_L_E), and the codesign_single checkpoint, the results on CDR-H3 are unacceptably bad.
The following table shows the Cα RMSD between the generated and native structures.

         H_CDR1     H_CDR2      H_CDR3     L_CDR1     L_CDR2     L_CDR3
mean    1.428283   1.659423   52.084544   1.605134   0.385445   3.645147
min     0.861669   0.935878   28.523428   0.730051   0.208125   1.603799
max     2.923090   3.224749  153.499802   2.118108   0.752684   6.416183

The Cα RMSD is calculated by the following code.

generate_flags = variant['data']['generate_flag']
native_atom_positions = variant['data']['pos_heavyatom'][..., BBHeavyAtom.CA, :][generate_flags]
# native_atom_positions = native_atom_positions[mask_ha[generate_flags]]
pred_atom_positions = pos_ha[..., BBHeavyAtom.CA, :][generate_flags]
# pred_atom_positions = pred_atom_positions[mask_ha[generate_flags]]
rmsd = ((native_atom_positions - pred_atom_positions)**2).sum(-1).mean()  # note: this is the mean squared deviation; a true RMSD would take .sqrt() of it

If this case has such a high RMSD, I suspect that the test-set RMSD reported in Table 1 of your paper would also be high.
No offense intended; I am just trying to find out what is wrong with my reproduction.
Let me know if you want more details about my reproduction.
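As a cross-check, here is a minimal, self-contained Cα-RMSD computation in pure Python on illustrative coordinates (not the tensors from the snippet in this issue); note the final square root, which distinguishes an RMSD from a mean squared deviation:

```python
import math

def ca_rmsd(native, pred):
    """Calpha RMSD between two equal-length lists of (x, y, z) coordinates."""
    assert len(native) == len(pred)
    msd = sum(
        (nx - px) ** 2 + (ny - py) ** 2 + (nz - pz) ** 2
        for (nx, ny, nz), (px, py, pz) in zip(native, pred)
    ) / len(native)
    return math.sqrt(msd)

# Two atoms, each displaced by 1 A along z:
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(ca_rmsd(native, pred))  # 1.0
```

Without the square root, the same input would report 1.0 as well here, but a mean squared deviation of 52 corresponds to an RMSD of only about 7.2 Å, which is worth checking against the tabulated values.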

Issues with Sidechain packing

Hi, for 7DK2_AB_C.pdb I sampled using codesign_multicdrs.yml, and when I try to do side-chain packing and full-atom refinement it gives errors. I am running diffab/tools/relax/run.py over the sampled files.
(screenshot: 2022-12-22 at 7:21:10 PM)

License?

Hi. First off, thank you very much for releasing the code accompanying your awesome work here!
I was curious if this source code is currently (or will eventually be) accompanied by any software licenses (e.g., MIT).

Issue with non-reproducibility in the energy calculation

Hi,

As a proof-of-concept, I've been trying to sample the H-CDR3 structure of a given antibody to see if your method would be able to reproduce the crystal structure conformation. For the designed structures I'm especially interested in identifying those conformations where the RMSD is <= 2 Å.

I'm not sure if my protocol is correct, but I've been running it as follows:

  • Step 1: run design_pdb.py with the config file 'strpred.yml'. I've modified it to sample only the H-CDR3.
  • Step 2: run diffab/tools/relax/run.py with the pipeline set to 'pyrosetta' to relax the sampled structures.
  • Step 3: run diffab/tools/eval/run.py to obtain RMSDs and energies.

With this protocol, I was able to identify some conformations where the RMSD was smaller than 2 Å.

However, I noticed that across different runs, the energy for the reference structure was changing, which would make my analysis not comparable and non-reproducible. For instance, the absolute difference between the dG_ref in two runs was 100, which is a lot.

So, I started debugging the code and found out that 'diffab/tools/relax/run.py' relaxes the reference and 'diffab/tools/eval/run.py' uses it as the reference to calculate the energy. That explains why across different runs I got different dG_refs. I then set PyRosetta to use the same seed in 'diffab/tools/relax/run.py' and I was able to get the same values across different calls of 'diffab/tools/eval/run.py'.

Despite that, I'm still getting minor differences for the reference within the same run. See an example below:

(screenshot attached)

The difference is small, but it is strange to have different energies for the same complex. I tried to set PyRosetta's seed in 'diffab/tools/eval/run.py', but it didn't work.

Would you know what is going on here?

Thank you for this great tool and for your help.
Best,
