rosettacommons / rfdiffusion
Code for running RFdiffusion
License: Other
Some proteins need to form a dimer to function as an enzyme. How can such enzymes be designed?
Hi, amazing repo!
Will icosahedral symmetric design (as mentioned in the preprint) also be made available on GitHub?
Thanks!
Hello!
Thank you for publishing your work! I'm currently experimenting with your software, and I have some ideas in mind, but I am not sure to what extent they are practical and possible with RFdiffusion. As I am new to the protein design field, feel free to correct any assumptions I may have made; I would greatly appreciate any guidance.
I have a target protein and a highly stable scaffold protein; however, the scaffold protein was not designed to bind to this specific target, so no actual binding motif is present. Is it possible (and effective) to use the RFdiffusion protocol, such as motif scaffolding, to redesign some parts of the existing scaffold that face the target hotspot to create a binding interface? Or would it be more effective to use partial diffusion/fold conditioned binder design to create new, but structurally similar scaffolds? If I understand correctly, the latter approach will cause loss of the binder sequence, which might lead to a possible loss of stability compared to the original scaffold.
I apologize if I am missing something about the presented pipelines.
Thank you!
On the Colab notebook, I am using the mirror image of a naturally occurring protein (an uploaded PDB file) as the template for binder generation. Generation of the poly-glycine backbone works well, but when this input is used for ProteinMPNN and AlphaFold evaluation, the outputs revert the stereochemistry of the input protein to the natural enantiomer. It's not clear whether this happens at the ProteinMPNN step or whether it is a step taken by AlphaFold. Is there a way to discern whether ProteinMPNN is assigning side chains based on the mirror-image protein input (or whether ProteinMPNN reverts the stereochemistry)?
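One way to pin down where the flip happens is to check handedness at each stage: compute the signed volume of the Cα-centered (N, C, Cβ) frame for each residue. Mirroring a structure flips this sign, so comparing the sign for the RFdiffusion input against the ProteinMPNN/AlphaFold outputs shows which step reverted the stereochemistry. A minimal sketch with illustrative (not physically accurate) coordinates:

```python
import numpy as np

def chirality_sign(n, ca, c, cb):
    """Sign of the signed volume spanned by the CA->N, CA->C, CA->CB vectors.

    A mirror reflection of the structure flips this sign, so comparing the
    sign for the input backbone against the redesigned/refolded model tells
    you whether the enantiomer was preserved.
    """
    v = np.stack([n - ca, c - ca, cb - ca])
    return float(np.sign(np.linalg.det(v)))

# Illustrative coordinates for one residue:
ca = np.array([0.0, 0.0, 0.0])
n  = np.array([1.5, 0.0, 0.0])
c  = np.array([0.0, 1.5, 0.0])
cb = np.array([0.0, 0.0, 1.5])

s = chirality_sign(n, ca, c, cb)
mirror = np.array([-1.0, 1.0, 1.0])  # reflect through the yz plane
s_mirror = chirality_sign(n * mirror, ca * mirror, c * mirror, cb * mirror)
assert s == -s_mirror  # reflection flips the chirality sign
```

Running this on the poly-glycine backbone won't work directly (glycine has no Cβ), but any designed output with side chains can be checked residue by residue.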
Any way to get this to run on multiple GPUs simultaneously?
Right now it only runs on a single GPU even when multiple are present. Any flags I might try?
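As far as I know, RFdiffusion runs each trajectory on a single GPU. A common workaround is to split inference.num_designs across independent processes, each pinned to one GPU via CUDA_VISIBLE_DEVICES and writing to its own output prefix. A sketch (the contig and output paths are placeholders):

```python
import os
import subprocess

def split_designs(total, n_gpus):
    """Split a design budget as evenly as possible across GPUs."""
    base, extra = divmod(total, n_gpus)
    return [base + (i < extra) for i in range(n_gpus)]

def launch(total_designs=100, n_gpus=4, dry_run=True):
    procs = []
    for gpu, n in enumerate(split_designs(total_designs, n_gpus)):
        cmd = [
            "./scripts/run_inference.py",
            "contigmap.contigs=[150-150]",
            f"inference.output_prefix=test_outputs/gpu{gpu}/test",  # distinct prefix per GPU
            f"inference.num_designs={n}",
        ]
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))  # pin to one GPU
        if dry_run:
            print(gpu, n, " ".join(cmd))
        else:
            procs.append(subprocess.Popen(cmd, env=env))
    for p in procs:
        p.wait()

launch()  # dry run: prints one command line per GPU
```

Because the trajectories are independent, this scales essentially linearly with GPU count.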
Any way to get RFdiffusion to connect an N to C terminus to form a circularized protein?
If possible, this would be phenomenally useful functionality.
Many thanks.
I've been using RFdiffusion on Colab Pro for a few weeks now, using Google's GPUs. Realistically, we want to generate hundreds to thousands of designs and then filter through them. How is this supposed to be implemented when the maximum I can do in Colab Pro is 32 designs? I'm not sure how running RFdiffusion locally on my own GPU (RTX 3060) would work out; any thoughts? I have never done any serious computing since I'm mostly at the bench, so technically I am very naive.
Thanks for RFdiffusion!
OS: CentOS Linux 7
GPU: gtx 1080
Hi! I get the following error running any of the example scripts:
RuntimeError: NVTX functions not installed. Are you sure you have a CUDA build?
When using the current SE3nv.yml I get the following versions
pytorch 1.9.1 cpu_py39hc5866cc_3 conda-forge
torchaudio 0.9.1 py39 pytorch
torchvision 0.14.1 cpu_py39h39206e8_1 conda-forge
I did a clean install running pip3 install --force-reinstall torch torchvision torchaudio
torch 2.0.0 pypi_0 pypi
torchaudio 2.0.1 pypi_0 pypi
torchvision 0.15.1 pypi_0 pypi
That seems to run every example without issue. I've run into issues before with conda installs of PyTorch when not using the most recent version. Is there a known issue keeping RFdiffusion from moving to PyTorch 2.0?
Hi there! I've been using the Google Colab version of the program and wanted to know where to enter the denoiser.noise_scale and denoiser.noise_scale_frame overrides in the code. I would also like to know how to filter i_pae results down to < 10. Thank you! Love the program, keep up the great work! :)
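On the filtering part: assuming you can collect each design's i_pae from the AF2 evaluation step into a mapping, the cutoff is a one-liner (the scores below are made up for illustration):

```python
def filter_by_ipae(scores, cutoff=10.0):
    """Keep designs whose interface PAE is below the cutoff (lower is better)."""
    return {name: ipae for name, ipae in scores.items() if ipae < cutoff}

# Hypothetical scores gathered from the AF2 evaluation step:
scores = {"design_0": 7.2, "design_1": 14.8, "design_2": 9.9}
good = filter_by_ipae(scores, cutoff=10.0)
print(sorted(good))  # ['design_0', 'design_2']
```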
Dear Authors,
Thank you for your great work! I am writing to inquire if there are any plans to release the pretraining code for both the modified RosettaFold and RF diffusion on GitHub. As someone with a keen interest in this field, I am particularly curious about this aspect and would appreciate any information or updates you could provide.
Really amazing work, firstly. I had a suggestion: enable Discussions in the GitHub repository so that interested users could discuss potential uses etc. https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/enabling-or-disabling-github-discussions-for-a-repository
When I am trying to run inference to yield an unconditional monomer as described in the README, I get the following error:
[2023-04-01 20:22:08,689][__main__][INFO] - Making design test_outputs/test_0
[2023-04-01 20:22:08,692][inference.model_runners][INFO] - Using contig: ['150-150']
Error executing job with overrides: ['contigmap.contigs=[150-150]', 'inference.output_prefix=test_outputs/test', 'inference.num_designs=10']
Traceback (most recent call last):
File "C:\Users\Norb\RFdiffusion\run_inference.py", line 76, in main
x_init, seq_init = sampler.sample_init()
File "C:\Users\Norb\RFdiffusion\inference\model_runners.py", line 341, in sample_init
seq_t[contig_map.hal_idx0] = seq_orig[contig_map.ref_idx0]
RuntimeError: Index put requires the source and destination dtypes match, got Long for the destination and Int for the source.
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
(SE3nv)
Sorry in case I am missing something basic, I am an absolute beginner. Thank you so much in advance.
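For what it's worth, the message means the destination tensor (seq_t, int64/"Long") and the source (seq_orig, int32/"Int") disagree in dtype for the indexed assignment; casting the source with .long() first makes the error go away. A minimal standalone reproduction (these are toy tensors, not RFdiffusion's actual ones):

```python
import torch

seq_t = torch.zeros(10, dtype=torch.long)      # destination: int64 ("Long")
seq_orig = torch.arange(3, dtype=torch.int32)  # source: int32 ("Int")
idx = torch.tensor([2, 5, 7])

try:
    seq_t[idx] = seq_orig       # indexed assignment requires matching dtypes
except RuntimeError as e:
    print(e)

seq_t[idx] = seq_orig.long()    # fix: cast the source to int64 first
```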
Hey! Thanks for sharing this :)
I just had the following problem—maybe I'm doing something wrong on my end—but by trying both (i) RFdiffusion/examples/design_motifscaffolding.sh and (ii) RFdiffusion/examples/design_unconditional.sh, the randomly sampled ranges only contain Gs (e.g., for the motif example: "GGGGGGGGGGGGGGGGGGEVNKIKSALLSTNKAVVSLGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG")
I've just set this up in WSL2 and had no issues during setup. I just copied the unconditional design example to quickly test if it works but I'm getting the error below. I'm not sure how to interpret this error so any help would be great!
~/RFdiffusion$ scripts/run_inference.py inference.output_prefix=example_outputs/design_unconditional 'contigmap.contigs=[100-200]' inference.num_designs=10
[2023-06-26 22:44:09,438][__main__][INFO] - Found GPU with device_name NVIDIA GeForce RTX 3060. Will run RFdiffusion on NVIDIA GeForce RTX 3060
Reading models from /home/usr/RFdiffusion/rfdiffusion/inference/../../models
[2023-06-26 22:44:09,439][rfdiffusion.inference.model_runners][INFO] - Reading checkpoint from /home/usr/RFdiffusion/rfdiffusion/inference/../../models/Base_ckpt.pt
This is inf_conf.ckpt_path
/home/usr/RFdiffusion/rfdiffusion/inference/../../models/Base_ckpt.pt
Error executing job with overrides: ['inference.output_prefix=example_outputs/design_unconditional', 'contigmap.contigs=[100-200]', 'inference.num_designs=10']
Traceback (most recent call last):
File "/home/usr/RFdiffusion/scripts/run_inference.py", line 54, in main
sampler = iu.sampler_selector(conf)
File "/home/usr/RFdiffusion/rfdiffusion/inference/utils.py", line 511, in sampler_selector
sampler = model_runners.SelfConditioning(conf)
File "/home/usr/RFdiffusion/rfdiffusion/inference/model_runners.py", line 37, in init
self.initialize(conf)
File "/home/usr/RFdiffusion/rfdiffusion/inference/model_runners.py", line 103, in initialize
self.load_checkpoint()
File "/home/usr/RFdiffusion/rfdiffusion/inference/model_runners.py", line 181, in load_checkpoint
self.ckpt = torch.load(
File "/home/usr/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/serialization.py", line 594, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/usr/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/usr/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/serialization.py", line 211, in init
super(_open_file, self).init(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/usr/RFdiffusion/rfdiffusion/inference/../../models/Base_ckpt.pt'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
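For anyone hitting this: the FileNotFoundError just means the model weights were never downloaded into the models/ directory; the download step in the README has to be run first. A small pre-flight check (the names listed are only the two checkpoints mentioned in these threads; see the README for the full set):

```python
from pathlib import Path

def missing_checkpoints(model_dir, names=("Base_ckpt.pt", "Complex_beta_ckpt.pt")):
    """Return the expected checkpoint files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [n for n in names if not (model_dir / n).is_file()]

missing = missing_checkpoints("models")
if missing:
    print("Download these into models/ (see the README):", missing)
```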
Thanks a lot for making the RFDiffusion project available! I am trying to wrap my head around what is needed to get the whole design workflow set up locally.
RFdiffusion as described here only seems to output "poly-glycine" PDBs, so we still need to run ProteinMPNN and AF2 filtering on all candidate solutions. The Colab version of RFdiffusion seems to perform these steps through a call to colabdesign/rf/designability_test.py. However, that script doesn't seem to exist in either this repo or the colabdesign/rf one.
Could you please add this script to this repo here so that one can really reproduce the workflow described in your paper?
Hello, how can I solve the following error?
conda error.txt
Hi,
For the past week, whenever I try to run partial diffusion, I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'outputs/traj/test_0_pX0_traj.pdb'
I attached a picture with my input. After setting those, I only did Runtime -> Run All.
Could you help me to solve this problem?
Thank you!
Is it possible to set deterministic in conf/inference/base.yaml to True and set its seed value from a command-line argument?
Lines 42 to 43 in 1a39202
Or is it preferable to simply set deterministic to True and a large value of inference.num_designs?
But in that case, it will take longer to get the result because it will be executed sequentially instead of in parallel.
Is there a big difference in output between running in parallel with many seeds and inference.num_designs = 1, versus running sequentially with a large value of inference.num_designs?
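Setting aside RFdiffusion's exact seeding machinery (I haven't verified what the deterministic flag wires up), the underlying principle is just explicit seeding: with a fixed seed a run reproduces the same noise draws, so N parallel runs with inference.num_designs=1 and N distinct seeds sample the same distribution as one sequential run of N designs; only wall-clock time differs. A stdlib sketch of the principle:

```python
import random

def noise_draw(seed, n=5):
    """Draw pseudo-random noise reproducibly from an explicit seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

assert noise_draw(1234) == noise_draw(1234)   # same seed -> same trajectory noise
assert noise_draw(1234) != noise_draw(5678)   # different seeds -> independent designs
```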
I found out, after several failed attempts to run the inference script, that on Windows it will not work if one uses
'contigmap.contigs=[B1-100/0 100-100]'
or 'ppi.hotspot_res=[A30,A33,A34]'.
One needs to use double quotes (") rather than single quotes (') around the arguments; otherwise the arguments are not parsed correctly and an error is raised.
E.g. 'contigmap.contigs=[B1-100/0 100-100]' becomes "contigmap.contigs=[B1-100/0 100-100]"
Not sure if this is because the default language of my Windows is not English.
The Colab notebook is great, but Google Colab has recently been particularly unreliable on free accounts. Would it be possible for someone to translate the Colab notebook into a Jupyter notebook?
Today I got the following error in Google Colab when running a diffusion; last Friday it was working well:
[Errno 2] No such file or directory: 'outputs/traj/test_0_pX0_traj.pdb'
Dear all, please help me with this error. Thank you very much.
./scripts/run_inference.py 'contigmap.contigs=[150-150]' inference.output_prefix=test_outputs/test inference.num_designs=10
[2023-06-04 19:49:19,789][__main__][INFO] - Found GPU with device_name NVIDIA GeForce RTX 3090. Will run RFdiffusion on NVIDIA GeForce RTX 3090
Reading models from /home/hesong/local/RFdiffusion/rfdiffusion/inference/../../models
[2023-06-04 19:49:19,790][rfdiffusion.inference.model_runners][INFO] - Reading checkpoint from /home/hesong/local/RFdiffusion/rfdiffusion/inference/../../models/Base_ckpt.pt
This is inf_conf.ckpt_path
/home/hesong/local/RFdiffusion/rfdiffusion/inference/../../models/Base_ckpt.pt
Assembling -model, -diffuser and -preprocess configs from checkpoint
USING MODEL CONFIG: self._conf[model][n_extra_block] = 4
USING MODEL CONFIG: self._conf[model][n_main_block] = 32
USING MODEL CONFIG: self._conf[model][n_ref_block] = 4
USING MODEL CONFIG: self._conf[model][d_msa] = 256
USING MODEL CONFIG: self._conf[model][d_msa_full] = 64
USING MODEL CONFIG: self._conf[model][d_pair] = 128
USING MODEL CONFIG: self._conf[model][d_templ] = 64
USING MODEL CONFIG: self._conf[model][n_head_msa] = 8
USING MODEL CONFIG: self._conf[model][n_head_pair] = 4
USING MODEL CONFIG: self._conf[model][n_head_templ] = 4
USING MODEL CONFIG: self._conf[model][d_hidden] = 32
USING MODEL CONFIG: self._conf[model][d_hidden_templ] = 32
USING MODEL CONFIG: self._conf[model][p_drop] = 0.15
USING MODEL CONFIG: self._conf[model][SE3_param_full] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 8, 'l0_out_features': 8, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 32}
USING MODEL CONFIG: self._conf[model][SE3_param_topk] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 64, 'l0_out_features': 64, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 64}
USING MODEL CONFIG: self._conf[model][freeze_track_motif] = False
USING MODEL CONFIG: self._conf[model][use_motif_timestep] = True
USING MODEL CONFIG: self._conf[diffuser][T] = 50
USING MODEL CONFIG: self._conf[diffuser][b_0] = 0.01
USING MODEL CONFIG: self._conf[diffuser][b_T] = 0.07
USING MODEL CONFIG: self._conf[diffuser][schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][so3_type] = igso3
USING MODEL CONFIG: self._conf[diffuser][crd_scale] = 0.25
USING MODEL CONFIG: self._conf[diffuser][so3_schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][min_b] = 1.5
USING MODEL CONFIG: self._conf[diffuser][max_b] = 2.5
USING MODEL CONFIG: self._conf[diffuser][min_sigma] = 0.02
USING MODEL CONFIG: self._conf[diffuser][max_sigma] = 1.5
USING MODEL CONFIG: self._conf[preprocess][sidechain_input] = False
USING MODEL CONFIG: self._conf[preprocess][motif_sidechain_input] = True
USING MODEL CONFIG: self._conf[preprocess][d_t1d] = 22
USING MODEL CONFIG: self._conf[preprocess][d_t2d] = 44
USING MODEL CONFIG: self._conf[preprocess][prob_self_cond] = 0.5
USING MODEL CONFIG: self._conf[preprocess][str_self_cond] = True
USING MODEL CONFIG: self._conf[preprocess][predict_previous] = False
[2023-06-04 19:49:22,270][rfdiffusion.inference.model_runners][INFO] - Loading checkpoint.
[2023-06-04 19:49:24,666][rfdiffusion.diffusion][INFO] - Using cached IGSO3.
Error executing job with overrides: ['contigmap.contigs=[150-150]', 'inference.output_prefix=test_outputs/test', 'inference.num_designs=10']
Traceback (most recent call last):
File "/home/hesong/local/RFdiffusion/./scripts/run_inference.py", line 54, in main
sampler = iu.sampler_selector(conf)
File "/home/hesong/local/RFdiffusion/rfdiffusion/inference/utils.py", line 511, in sampler_selector
sampler = model_runners.SelfConditioning(conf)
File "/home/hesong/local/RFdiffusion/rfdiffusion/inference/model_runners.py", line 37, in init
self.initialize(conf)
File "/home/hesong/local/RFdiffusion/rfdiffusion/inference/model_runners.py", line 130, in initialize
self.diffuser = Diffuser(**self._conf.diffuser, cache_dir=schedule_directory)
File "/home/hesong/local/RFdiffusion/rfdiffusion/diffusion.py", line 582, in init
self.so3_diffuser = IGSO3(
File "/home/hesong/local/RFdiffusion/rfdiffusion/diffusion.py", line 198, in init
self.igso3_vals = self._calc_igso3_vals(L=L)
File "/home/hesong/local/RFdiffusion/rfdiffusion/diffusion.py", line 233, in _calc_igso3_vals
igso3_vals = read_pkl(cache_fname)
File "/home/hesong/local/RFdiffusion/rfdiffusion/diffusion.py", line 144, in read_pkl
raise (e)
File "/home/hesong/local/RFdiffusion/rfdiffusion/diffusion.py", line 140, in read_pkl
return pickle.load(handle)
EOFError: Ran out of input
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
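For reference, `EOFError: Ran out of input` from pickle.load almost always means the cached file is empty or truncated (for instance, a previous run was killed mid-write). Deleting the stale schedule cache so it gets regenerated usually resolves this. A hedged sketch (the cache location is an assumption; check where your install writes its schedule .pkl files):

```python
from pathlib import Path
import pickle

def prune_bad_caches(cache_dir):
    """Delete cached schedule pickles that are empty or unreadable."""
    removed = []
    for pkl in Path(cache_dir).glob("*.pkl"):
        try:
            with open(pkl, "rb") as fh:
                pickle.load(fh)
        except (EOFError, pickle.UnpicklingError):
            pkl.unlink()  # stale/truncated cache; it will be recomputed on the next run
            removed.append(pkl.name)
    return removed
```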
Hi
Thank you for providing this excellent tool for protein design.
I have a question: is it possible to use RFdiffusion to generate a small-molecule-binding protein, assuming there is a designed pocket (only separate amino acids that have good interactions with the molecule)?
Thanks
Hi all,
May I check whether it is possible to run the code in this repo on an Intel MacBook without an Nvidia GPU? I installed PyTorch, but this error keeps coming up:
I installed PyTorch like this:
I searched Nvidia's website for CUDA toolkit 11.1, but it seems there isn't an option for Mac.
If it is possible, may I know how I can install the missing packages?
Greatly appreciate any help! Thank you!
Hi RFdiffusion team, thank you for this great project and taking the time to make it public and write comprehensive documentation!
I was wondering if it is possible to provide multiple sequence ranges to provide_seq when doing partial diffusion? For instance, the following does not throw an error:
'contigmap.provide_seq=[0-383,498-580,692-821]'
However only AAs 0-383 appear to be unmasked. Am I missing something?
Thank you!
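If only the first range ends up unmasked, the override string may be getting truncated at the first comma somewhere in parsing. As a sanity check on the intended behavior, here is an illustrative parser (not RFdiffusion's actual code) that expands a provide_seq-style string into a per-residue mask:

```python
def seq_mask(ranges_str, length):
    """Turn '0-383,498-580' into a boolean keep-sequence mask (inclusive ranges)."""
    mask = [False] * length
    for part in ranges_str.split(","):
        start, end = (int(x) for x in part.split("-"))
        for i in range(start, end + 1):
            mask[i] = True
    return mask

mask = seq_mask("0-3,6-8", 10)
print([i for i, m in enumerate(mask) if m])  # [0, 1, 2, 3, 6, 7, 8]
```

With all three ranges handled this way, residues 0-383, 498-580, and 692-821 would all stay unmasked; comparing against the actual output can confirm whether the later ranges are being dropped.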
Hi, in the 'Generation of Symmetric Oligomers' section,
I saw the command is:
"./scripts/run_inference.py --config-name symmetry inference.symmetry=tetrahedral 'contigmap.contigs=[360]' inference.output_prefix=test_sample/tetrahedral inference.num_designs=1"
, but an error is reported when running it:
File ".../RFdiffusion/rfdiffusion/contigs.py", line 137, in get_sampled_mask
contig_list = self.contigs[0].strip().split()
AttributeError: 'int' object has no attribute 'strip'
The symmetry.yaml in config/inference is set as follows:
contigmap:
  # Specify a single integer value to sample unconditionally.
  # Must be evenly divisible by the number of chains in the symmetry.
  contigs: ['100']
So, is there a problem in this part?
Finally, please standardize the content of the README.md; there seem to be some differences between the README.md and the code. Thanks!
I've tried to use the cyclic symmetric mode to generate a set of symmetric loops between symmetric structural units (e.g. loops to join together helices in a helical bundle with cyclic symmetry).
However, while this kind of operation works for generating non-symmetric loops, when switching to symmetry it always assumes chain breaks after the newly generated loops, so they are never positioned to actually join, e.g., the helices of the symmetric helical bundle.
Based on the text in the nickel design example, which says that chain breaks don't strictly need to be given in the contigs when using symmetry, it seems this might be a known limitation of the symmetry mode. Is this true? Is it possible to circumvent this limitation?
This is really amazing!
Not sure what I am doing wrong, but conda is telling me that it found some conflicts when creating the environment. So I removed the version constraints, changed the channel from defaults to conda-forge, and that solved my issue:
name: SE3nv
channels:
  - conda-forge
  - pytorch
  - dglteam
dependencies:
  - python
  - pytorch
  - torchaudio
  - torchvision
  - cudatoolkit
  - dgl-cuda11.1
  - pip
  - pip:
      - hydra-core
      - pyrsistent
      - icecream
Not sure if this is the way to go either, but it seems to have worked for me! And all the examples I have tried so far run smoothly.
Computer config:
... again amazing!
Hello! I'm having a lot of fun playing around with this!
One feature that would be convenient to have is a constant directory where all of the diffuser's pre-computed schedules live. Right now the inference code recomputes the schedule pkl if it doesn't exist in ./. To avoid recomputing whenever you're running from a different directory, a constant cache directory would ensure that each schedule is only computed once.
I've implemented a simple version of this by replacing line 116 in inference/model_runners.py with
self.diffuser = Diffuser(**self._conf.diffuser, cache_dir=f'{SCRIPT_DIR}/../schedules')
instead of
self.diffuser = Diffuser(**self._conf.diffuser)
This assumes that there is a schedules directory in the RFdiffusion folder where all pre-computed schedules will live. The only downside to this simple approach is that the cache directory can't be overridden (but this is technically consistent with the model checkpoints).
Hi,
I've noticed that when using a symmetric mode (cyclic, dihedral, etc), it's not possible to supply a length range for the new diffused regions. A single value must be used, otherwise 'ValueError: Sequence length must be divisble by n' is returned.
I'm guessing it's because the lengths of the diffused regions in each chain/monomer aren't tied, so the total sequence length usually ends up indivisible by the oligomeric state?
Is this something that might be possible in future updates?
Thanks!
Ali
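Until length ranges are supported in the symmetric modes, one workaround is to precompute which total lengths in the desired range satisfy the divisibility constraint and launch one run per valid length (each passed as a single-value contig, e.g. 'contigmap.contigs=[102-102]'):

```python
def valid_lengths(lo, hi, n):
    """Total sequence lengths in [lo, hi] compatible with n-fold symmetry."""
    return [L for L in range(lo, hi + 1) if L % n == 0]

lengths = valid_lengths(100, 130, 3)
print(lengths)  # [102, 105, 108, 111, 114, 117, 120, 123, 126, 129]
```

This recovers length diversity across designs even though each individual run uses a fixed length.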
Dear developers, I am repeating the analysis of "Design of C3-symmetric oligomers to scaffold the binding interface of the designed ACE2 mimic", but I found that RFdiffusion may change the orientation of the motif protomer.
Data link: https://drive.google.com/drive/folders/19BZTqTx-uKEjVqGp06Ez2zufb7hgva-q?usp=share_link
The file 7uhc.pdb, which I have centered, can be accessed from this link. I used Chimera to verify that it is C3-symmetric along the z axis:
open 7uhc.pdb
delete #0:.B #0:.C #0:.E #0:.F
sym #0 group C3 axis z
open 7uhc.pdb
Then I used RFdiffusion to design the C3-symmetric oligomers with the following command:
run_inference.py \
inference.symmetry=C3 \
inference.num_designs=1 \
inference.output_prefix=Spike_Symmetric_PPI/1_structure_design/Spike_Symmetric_PPI_1.0_0.1_Base \
'potentials.guiding_potentials=["type:olig_contacts,weight_intra:1.0,weight_inter:0.1"]' \
potentials.olig_intra_all=True \
potentials.olig_inter_all=True \
potentials.guide_scale=2 \
diffuser.T=50 \
potentials.guide_decay=quadratic \
inference.input_pdb=7uhc.pdb \
'contigmap.contigs=[D1-55/120/0 E1-55/120/0 F1-55/120/0]' \
inference.ckpt_override_path=models/Base_ckpt.pt
The output file can be accessed from the same link. I opened 7uhc.pdb and Spike_Symmetric_PPI_1.0_0.1_Base_0.pdb with Chimera:
open 7uhc.pdb
open Spike_Symmetric_PPI_1.0_0.1_Base_0.pdb
delete #0:.A #0:.B #0:.C
display @CA
~ribbon
Then I aligned model #1 to model #0:
mm #0 #1
Only one of the motifs matches perfectly.
After building RFdiffusion's Docker container and pulling it to our HPC, I ran the following test using Singularity:
singularity run --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0,OPENMM_CPU_THREADS=10,HYDRA_FULL_ERROR=1 \
-B $HOME/outputs,$HOME/models,$HOME/inputs \
--pwd /app/RFdiffusion \
--nv $HOME/rfdiffusion/rfdiffusion_v1.1.0.sif \
inference.output_prefix=$HOME/outputs/motifscaffolding \
inference.model_directory_path=$HOME/models \
inference.input_pdb=$HOME/inputs/5TPN.pdb \
inference.num_designs=3 \
'contigmap.contigs=[10-40/A163-181/10-40]'
However, I got errors as follows:
Traceback (most recent call last):
File "/usr/lib/python3.9/pathlib.py", line 1313, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'outputs/2023-06-24/15-54-50'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.9/pathlib.py", line 1313, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'outputs/2023-06-24'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/RFdiffusion/scripts/run_inference.py", line 194, in <module>
main()
File "/usr/local/lib/python3.9/dist-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/usr/local/lib/python3.9/dist-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/usr/local/lib/python3.9/dist-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/usr/local/lib/python3.9/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/usr/local/lib/python3.9/dist-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/usr/local/lib/python3.9/dist-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/usr/local/lib/python3.9/dist-packages/hydra/_internal/hydra.py", line 119, in run
ret = run_job(
File "/usr/local/lib/python3.9/dist-packages/hydra/core/utils.py", line 146, in run_job
Path(str(output_dir)).mkdir(parents=True, exist_ok=True)
File "/usr/lib/python3.9/pathlib.py", line 1317, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/usr/lib/python3.9/pathlib.py", line 1317, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/usr/lib/python3.9/pathlib.py", line 1313, in mkdir
self._accessor.mkdir(self, mode)
OSError: [Errno 30] Read-only file system: 'outputs'
Can you please let me know what I did incorrectly?
Thanks a lot in advance.
First of all, thanks to the RFDiffusion team for this tool and making it open source! I'm excited to use it!
I've begun the installation process and tried to create the conda environment from the SE3nv.yml file with conda env create -f env/SE3nv.yml, but I get the following error:
ResolvePackageNotFound:
- icecream
- cudatoolkit=11.1
I resolved this issue by adding - nvidia to the channels and moving icecream to the pip installs. The final file looks like this:
name: SE3nv
channels:
  - defaults
  - pytorch
  - dglteam
  - nvidia
dependencies:
  - python=3.9
  - pytorch=1.9
  - torchaudio
  - torchvision
  - cudatoolkit=11.1
  - dgl-cuda11.1
  - pip
  - pip:
      - icecream
      - hydra-core
      - pyrsistent
Not sure if this is the best way to handle creating the environment, but it seems to have worked for me!
First, thank you to the authors for releasing this code and model!
When I run RFDiffusion for binder design, the output .pdbs show the binder as polyglycine (expected) and the target protein with the original sequence (also expected). However, when you look at the structure, the target protein no longer has side chains, but only the backbone atoms are preserved (not what I expected). Is this a problem if I intend to use these .pdbs as inputs to ProteinMPNN? Or should I take the RFDiffusion backbone and make a new pdb with the original target structure with sidechains and all?
Hi, RF diffusion team
This is great work.
I am trying to make a binder to a beta sheet, and I tried using Complex_beta_ckpt.pt.
The results showed that the binder sequences were GGGGGGGG....
What should I do to solve this problem?
My command is: python ${script_path}/run_inference.py inference.output_prefix=out/design_ppi inference.input_pdb=input/target.pdb 'contigmap.contigs=[C1-62/0 50-79]' 'ppi.hotspot_res=[C2,C3,C14,C15,C16,C17,C18,C20]' inference.ckpt_override_path=${script_path}/models/Complex_beta_ckpt.pt inference.num_designs=10 denoiser.noise_scale_ca=0 denoiser.noise_scale_frame=0
Thank you.
Is there any estimate for when RFdiffusion will be extended to allow design of ligand-protein interactions?
Really awesome program so far!
Thanks!
Trying to use the tool in WSL2 with my RTX 4090. The Windows version doesn't work (see issue #13).
Everything loads fine, but then I see an error:
(SE3nv) pavel@Gigabyte-PC:~/RFdiffusion$ python scripts/run_inference.py inference.model_directory_path=/mnt/d/Models/RFdiffusion 'contigmap.contigs=[150-150]' inference.output_prefix=test_outputs/test inference.num_designs=10
Reading models from /mnt/d/Models/RFdiffusion
[2023-04-06 15:59:06,421][rfdiffusion.inference.model_runners][INFO] - Reading checkpoint from /mnt/d/Models/RFdiffusion/Base_ckpt.pt
This is inf_conf.ckpt_path
/mnt/d/Models/RFdiffusion/Base_ckpt.pt
Assembling -model, -diffuser and -preprocess configs from checkpoint
USING MODEL CONFIG: self._conf[model][n_extra_block] = 4
USING MODEL CONFIG: self._conf[model][n_main_block] = 32
USING MODEL CONFIG: self._conf[model][n_ref_block] = 4
USING MODEL CONFIG: self._conf[model][d_msa] = 256
USING MODEL CONFIG: self._conf[model][d_msa_full] = 64
USING MODEL CONFIG: self._conf[model][d_pair] = 128
USING MODEL CONFIG: self._conf[model][d_templ] = 64
USING MODEL CONFIG: self._conf[model][n_head_msa] = 8
USING MODEL CONFIG: self._conf[model][n_head_pair] = 4
USING MODEL CONFIG: self._conf[model][n_head_templ] = 4
USING MODEL CONFIG: self._conf[model][d_hidden] = 32
USING MODEL CONFIG: self._conf[model][d_hidden_templ] = 32
USING MODEL CONFIG: self._conf[model][p_drop] = 0.15
USING MODEL CONFIG: self._conf[model][SE3_param_full] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 8, 'l0_out_features': 8, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 32}
USING MODEL CONFIG: self._conf[model][SE3_param_topk] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 64, 'l0_out_features': 64, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 64}
USING MODEL CONFIG: self._conf[model][d_time_emb] = 0
USING MODEL CONFIG: self._conf[model][d_time_emb_proj] = 10
USING MODEL CONFIG: self._conf[model][freeze_track_motif] = False
USING MODEL CONFIG: self._conf[model][use_motif_timestep] = True
USING MODEL CONFIG: self._conf[diffuser][T] = 50
USING MODEL CONFIG: self._conf[diffuser][b_0] = 0.01
USING MODEL CONFIG: self._conf[diffuser][b_T] = 0.07
USING MODEL CONFIG: self._conf[diffuser][schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][so3_type] = igso3
USING MODEL CONFIG: self._conf[diffuser][crd_scale] = 0.25
USING MODEL CONFIG: self._conf[diffuser][so3_schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][min_b] = 1.5
USING MODEL CONFIG: self._conf[diffuser][max_b] = 2.5
USING MODEL CONFIG: self._conf[diffuser][min_sigma] = 0.02
USING MODEL CONFIG: self._conf[diffuser][max_sigma] = 1.5
USING MODEL CONFIG: self._conf[preprocess][sidechain_input] = False
USING MODEL CONFIG: self._conf[preprocess][motif_sidechain_input] = True
USING MODEL CONFIG: self._conf[preprocess][d_t1d] = 22
USING MODEL CONFIG: self._conf[preprocess][d_t2d] = 44
USING MODEL CONFIG: self._conf[preprocess][prob_self_cond] = 0.5
USING MODEL CONFIG: self._conf[preprocess][str_self_cond] = True
USING MODEL CONFIG: self._conf[preprocess][predict_previous] = False
[2023-04-06 15:59:10,778][rfdiffusion.inference.model_runners][INFO] - Loading checkpoint.
[2023-04-06 15:59:13,459][rfdiffusion.diffusion][INFO] - Calculating IGSO3.
Successful diffuser __init__
[2023-04-06 15:59:17,256][__main__][INFO] - Making design test_outputs/test_0
[2023-04-06 15:59:17,260][rfdiffusion.inference.model_runners][INFO] - Using contig: ['150-150']
With this beta schedule (linear schedule, beta_0 = 0.04, beta_T = 0.28), alpha_bar_T = 0.00013696048699785024
[2023-04-06 15:59:17,271][rfdiffusion.inference.model_runners][INFO] - Sequence init: ------------------------------------------------------------------------------------------------------------------------------------------------------
Error executing job with overrides: ['inference.model_directory_path=/mnt/d/Models/RFdiffusion', 'contigmap.contigs=[150-150]', 'inference.output_prefix=test_outputs/test', 'inference.num_designs=10']
Traceback (most recent call last):
File "/home/pavel/RFdiffusion/scripts/run_inference.py", line 85, in main
px0, x_t, seq_t, plddt = sampler.sample_step(
File "/home/pavel/RFdiffusion/rfdiffusion/inference/model_runners.py", line 665, in sample_step
msa_prev, pair_prev, px0, state_prev, alpha, logits, plddt = self.model(msa_masked,
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pavel/RFdiffusion/rfdiffusion/RoseTTAFoldModel.py", line 114, in forward
msa, pair, R, T, alpha_s, state = self.simulator(seq, msa_latent, msa_full, pair, xyz[:,:,:3],
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pavel/RFdiffusion/rfdiffusion/Track_module.py", line 420, in forward
msa_full, pair, R_in, T_in, state, alpha = self.extra_block[i_m](msa_full,
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pavel/RFdiffusion/rfdiffusion/Track_module.py", line 332, in forward
R, T, state, alpha = self.str2str(msa, pair, R_in, T_in, xyz, state, idx, motif_mask=motif_mask, top_k=0)
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 141, in decorate_autocast
return func(*args, **kwargs)
File "/home/pavel/RFdiffusion/rfdiffusion/Track_module.py", line 266, in forward
shift = self.se3(G, node.reshape(B*L, -1, 1), l1_feats, edge_feats)
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pavel/RFdiffusion/rfdiffusion/SE3_network.py", line 83, in forward
return self.se3(G, node_features, edge_features)
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/transformer.py", line 140, in forward
basis = basis or get_basis(graph.edata['rel_pos'], max_degree=self.max_degree, compute_gradients=False,
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/basis.py", line 167, in get_basis
spherical_harmonics = get_spherical_harmonics(relative_pos, max_degree)
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/basis.py", line 58, in get_spherical_harmonics
sh = o3.spherical_harmonics(all_degrees, relative_pos, normalize=True)
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/e3nn/o3/_spherical_harmonics.py", line 180, in spherical_harmonics
return sh(x)
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/e3nn/o3/_spherical_harmonics.py", line 82, in forward
sh = _spherical_harmonics(self._lmax, x[..., 0], x[..., 1], x[..., 2])
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}
template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}
extern "C" __global__
void fused_pow_pow_pow_su_9196483836509741110(float* tz_1, float* ty_1, float* tx_1, float* aten_mul, float* aten_mul_1, float* aten_mul_2, float* aten_sub, float* aten_add, float* aten_mul_3, float* aten_pow) {
{
if (512 * blockIdx.x + threadIdx.x<22350 ? 1 : 0) {
float ty_1_1 = __ldg(ty_1 + 3 * (512 * blockIdx.x + threadIdx.x));
aten_pow[512 * blockIdx.x + threadIdx.x] = ty_1_1 * ty_1_1;
float tz_1_1 = __ldg(tz_1 + 3 * (512 * blockIdx.x + threadIdx.x));
float tx_1_1 = __ldg(tx_1 + 3 * (512 * blockIdx.x + threadIdx.x));
aten_mul_3[512 * blockIdx.x + threadIdx.x] = (float)((double)(tz_1_1 * tz_1_1 - tx_1_1 * tx_1_1) * 0.8660254037844386);
aten_add[512 * blockIdx.x + threadIdx.x] = tx_1_1 * tx_1_1 + tz_1_1 * tz_1_1;
aten_sub[512 * blockIdx.x + threadIdx.x] = ty_1_1 * ty_1_1 - (float)((double)(tx_1_1 * tx_1_1 + tz_1_1 * tz_1_1) * 0.5);
aten_mul_2[512 * blockIdx.x + threadIdx.x] = (float)((double)(ty_1_1) * 1.732050807568877) * tz_1_1;
aten_mul_1[512 * blockIdx.x + threadIdx.x] = (float)((double)(tx_1_1) * 1.732050807568877) * ty_1_1;
aten_mul[512 * blockIdx.x + threadIdx.x] = (float)((double)(tx_1_1) * 1.732050807568877) * tz_1_1;
}
}
}
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
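For anyone hitting the nvrtc `invalid value for --gpu-architecture` error above, one plausible cause (an assumption, not a confirmed diagnosis) is that the installed PyTorch build's bundled NVRTC was not compiled for this GPU's compute capability. A minimal stdlib-guarded sketch to compare the device's capability against the architectures the torch build supports:

```python
# Hedged diagnostic sketch: report whether the current torch build was compiled
# for this GPU's architecture. Guarded so it also runs without torch installed.
import importlib.util

def arch_report():
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "CUDA not available"
    cap = torch.cuda.get_device_capability(0)   # e.g. (8, 6) for an RTX 30xx
    built = torch.cuda.get_arch_list()          # e.g. ['sm_37', ..., 'sm_86']
    wanted = f"sm_{cap[0]}{cap[1]}"
    status = "supported" if wanted in built else "NOT in this build; consider upgrading torch/CUDA"
    return f"device needs {wanted}; build has {built}; {status}"

print(arch_report())
```

If the device's `sm_XX` is missing from `get_arch_list()`, upgrading to a PyTorch build matching a newer CUDA toolkit is the usual remedy.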
How can I perform partial diffusion for all residues except three (the residues of an active site)? I tried to fix them in 'contigs', but they moved anyway. I also tried using models/ActiveSite_ckpt.pt, but it ruined the fold.
I got an error with parameters like this; I wonder if it's an incompatibility problem because I'm trying to design a binder against more than one molecule? ['contigmap.contigs=[C/0 F/0 100-120]', 'ppi.hotspot_res=[C80,C82,C86,C87,C90,C92,C93,E128,C138,C185,C187]', 'denoiser.noise_scale_ca=0', 'denoiser.noise_scale_frame=0'
Traceback (most recent call last):
File "/Share/app/RFdiffusion/scripts/run_inference.py", line 194, in
main()
File "/Share/app/miniconda3.9/envs/SE3nv/lib/python3.9/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/Share/app/miniconda3.9/envs/SE3nv/lib/python3.9/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/Share/app/miniconda3.9/envs/SE3nv/lib/python3.9/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/Share/app/miniconda3.9/envs/SE3nv/lib/python3.9/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/Share/app/miniconda3.9/envs/SE3nv/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/Share/app/miniconda3.9/envs/SE3nv/lib/python3.9/site-packages/hydra/_internal/utils.py", line 458, in
lambda: hydra.run(
File "/Share/app/miniconda3.9/envs/SE3nv/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/Share/app/miniconda3.9/envs/SE3nv/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/Share/app/miniconda3.9/envs/SE3nv/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/Share/app/RFdiffusion/scripts/run_inference.py", line 84, in main
x_init, seq_init = sampler.sample_init()
File "/Share/app/RFdiffusion/rfdiffusion/inference/model_runners.py", line 278, in sample_init
self.contig_map = self.construct_contig(self.target_feats)
File "/Share/app/RFdiffusion/rfdiffusion/inference/model_runners.py", line 240, in construct_contig
return ContigMap(target_feats, **self.contig_conf)
File "/Share/app/RFdiffusion/rfdiffusion/contigs.py", line 78, in init
) = self.expand_sampled_mask()
File "/Share/app/RFdiffusion/rfdiffusion/contigs.py", line 225, in expand_sampled_mask
int(subcon.split("-")[0][1:]), int(subcon.split("-")[1]) + 1
ValueError: invalid literal for int() with base 10: ''
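The `ValueError: invalid literal for int() with base 10: ''` above comes from the line in contigs.py that does `int(subcon.split("-")[0][1:])` on each chain token: a bare chain letter like `C` or `F` carries no residue range, so `int()` receives an empty string. It appears chain segments need explicit ranges, e.g. `C1-210/0 F1-120/0 100-120`. A simplified stdlib sketch of that check (an approximation of the parser, not the actual RFdiffusion code):

```python
# Simplified approximation of the failing parse in rfdiffusion/contigs.py:
# chain tokens must look like 'C1-210', not just 'C'.
def check_chain_token(subcon: str) -> bool:
    """Return True if a contig token parses the way expand_sampled_mask expects."""
    if not subcon[0].isalpha():
        return True  # a plain length range like '100-120' is handled elsewhere
    parts = subcon.split("-")
    try:
        int(parts[0][1:])   # residue start after the chain letter ('' for a bare 'C')
        int(parts[1])       # residue end
        return True
    except (IndexError, ValueError):
        return False

print(check_chain_token("C1-210"))  # True
print(check_chain_token("C"))       # False -- reproduces the ValueError's cause
```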
Hi, RFdiffusion team,
The results show that every designed residue is output as glycine, and that we should use ProteinMPNN to assign these residues.
My question is: if we want to fix some residues from the input structure (e.g., for enzyme design or scaffolding a functional motif, I want to fix the active site or functional motif), how do we specify this in ProteinMPNN?
Thank you.
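For fixing residues during ProteinMPNN redesign, ProteinMPNN accepts a fixed-positions JSONL (its helper_scripts include make_fixed_positions_dict.py; the mapping shape sketched below is an assumption based on that helper, so check the ProteinMPNN README for the exact format). A minimal sketch that pins a hypothetical active-site triad on chain A:

```python
# Hedged sketch: write a ProteinMPNN-style fixed_positions JSONL.
# "my_design" and the positions [45, 72, 105] are hypothetical placeholders.
import json

fixed = {"my_design": {"A": [45, 72, 105]}}  # {pdb_name: {chain: [1-indexed residues]}}

with open("fixed_positions.jsonl", "w") as fh:
    fh.write(json.dumps(fixed) + "\n")
```

The file would then be passed to ProteinMPNN via its fixed-positions input option so those residues keep their input identities while everything else is redesigned.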
Hi,
thank you for providing this code and the examples! Suppose I have a .pdb file of a protein with a long alpha helix at the N-terminus. Is it possible to extend this helix by a helix bundle, i.e. to generate an N-terminal fusion of a three helix bundle to my target protein? The helical bundles that are generated by the design_ppi script would be perfect.
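For an N-terminal fusion like the one described, the contig grammar in the README suggests (this is my reading, not a confirmed recipe) that a generated length range placed before the chain segment yields new residues N-terminal to it. A sketch of how such an override string could be composed, with all lengths hypothetical:

```python
# Hedged sketch: build a hydra override that would prepend 60-90 generated
# residues (e.g. a helix bundle) N-terminal to chain A of a 200-residue target.
# The chain range and lengths are placeholders, not values from the question.
target_chain = "A1-200"   # existing protein, chain A residues 1-200
new_nterm = "60-90"       # length range for the generated N-terminal fusion
contig = f"contigmap.contigs=[{new_nterm}/{target_chain}]"
print(contig)  # contigmap.contigs=[60-90/A1-200]
```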
Thanks so much to the RosettaCommons team for open-sourcing RFdiffusion! I'm looking forward to getting started.
Unfortunately, something appears to have gone wrong with the folder organization of my installation. I'm trying to execute the first example in the README, generating unconstrained backbones with 150 residues. Here is my attempt to execute, and my output:
(SE3nv) john@john-Desktop:~/RFdiffusion$ ./scripts/run_inference.py 'contigmap.contigs=[150-150]' inference.output_prefix=test_outputs/test inference.num_designs=10
Traceback (most recent call last):
File "/home/john/RFdiffusion/./scripts/run_inference.py", line 24, in <module>
from rfdiffusion.util import writepdb_multi, writepdb
ModuleNotFoundError: No module named 'rfdiffusion'
So run_inference.py is being found -- but it's looking for the util.py file somewhere other than where it is, which is /home/john/RFdiffusion/rfdiffusion.
I know how to reorganize Python folders and/or to provide a folder link, to hack my way around this problem, but I'm concerned that I would cause other problems by doing so.
Please advise, thanks.
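For a `ModuleNotFoundError: No module named 'rfdiffusion'`, one workaround that avoids reorganizing folders (assuming the repo lives at ~/RFdiffusion and was not installed with `pip install -e .`) is to put the repo root on the import path:

```python
# Hedged workaround sketch: make `import rfdiffusion` resolve without moving
# folders, by prepending the repo root (assumed path) to sys.path.
import os
import sys

repo_root = os.path.expanduser("~/RFdiffusion")
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)  # now repo_root/rfdiffusion is importable
```

The cleaner fix, if the repo ships a setup.py, is `pip install -e .` from the repo root inside the SE3nv environment.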
I'd like to finetune RFdiffusion to fit my data.
However, there seems to be no training code provided at the moment.
Is there any plan to release training code?
Thanks!
Hi team, congratulations on the great paper and milestone results.
I've collected multiple remarks on the code while I was studying it; hopefully this will help in improving the codebase.
* terminology
* dependencies
* ignored parameters/code
  * self.guide_scale is missing here: RFdiffusion/rfdiffusion/potentials/manager.py, line 202 in 92b83de
* distributing weights
* computations
  * Sergey's one-hot trick (used in multiple places) - strangely, the created embedding layer is never used. Better to just create a linear module, or only a trainable parameter: RFdiffusion/rfdiffusion/kinematics.py, lines 293 to 295 in 92b83de
* checkpointing
  * lambda *args: module(*args, topk=...)
* duplicated code
* Minor:
Hello authors,
Thank you for providing this fantastic code for protein design! I was wondering if there is a smart way to incorporate some sequence information into the binder design process, or to add a hotspot potential to the partial denoising process. I was trying to optimize a binder. If I use the partial denoising pipeline, I cannot define the hotspots; if I use the binder design pipeline, I cannot fix the part of the sequence of interest. I assume there should be a way to combine both features, because they are simply two different potential functions. Could you let me know if that's possible, or give me some clue for implementing this feature? Thank you very much!
Best,
Shuhao
Here is a partial comparison between the original structure and the one after diffusion. Confusingly, I always get wrong connections within the parts that I did not declare to be designed.
I use /0 as the sign for a chain break, but it seems to have failed, since chains E and F were connected incorrectly.
I haven't figured out the exact rule behind this; any advice would be appreciated!
A different driver version (in my case: Driver Version: 510.108.03, CUDA Version: 11.6) fails to install the CUDA version of PyTorch, resulting in a runtime error when trying to run any of the examples.
This is probably because there is no cudatoolkit version 11.1 (which the original SE3nv yaml requires) for this driver.
To solve it, I installed cudatoolkit 11.6 and pytorch 1.12.1. I have attached an export of my environment in case someone else encounters this problem.
Thanks for sharing RFDiffusion.
I'm using it on a T4 GPU. It takes 30-60 seconds for this:
RFdiffusion-main/scripts/run_inference.py 'contigmap.contigs=[10-20]' inference.output_prefix=test_outputs/test inference.num_designs=1
CUDA is also available.
How can I make sure it is using the GPU?
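A quick way to check whether PyTorch actually sees the GPU inside the container is to query CUDA availability directly (guarded here so the snippet also runs where torch is absent):

```python
# Minimal sketch: report whether torch can see a CUDA device.
import importlib.util

def cuda_status():
    """Return a short description of CUDA availability, or a note if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available; running on CPU"

print(cuda_status())
```

If this reports CPU inside the container, the final `pip install --force-reinstall torch ...` line in the Dockerfile below is a likely culprit, since it can replace the CUDA build of torch with a CPU-only wheel.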
This is my dockerfile:
FROM nvidia/cuda:11.1.1-cudnn8-runtime-ubuntu20.04
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
RUN apt-get update
RUN apt-get install -y wget git && rm -rf /var/lib/apt/lists/*
RUN wget \
https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& mkdir /root/.conda \
&& bash Miniconda3-latest-Linux-x86_64.sh -b \
&& rm -f Miniconda3-latest-Linux-x86_64.sh
RUN conda --version
COPY RFdiffusion-main RFdiffusion-main
RUN conda env create -f RFdiffusion-main/env/SE3nv.yml
RUN echo "conda activate SE3nv" >> ~/.bashrc
SHELL ["/bin/bash", "--login", "-c"]
SHELL ["conda", "run", "--no-capture-output", "-n", "SE3nv", "/bin/bash", "-c"]
RUN pip install --no-cache-dir -r RFdiffusion-main/env/SE3Transformer/requirements.txt
RUN python RFdiffusion-main/env/SE3Transformer/setup.py install
RUN pip install -e RFdiffusion-main
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY entrypoint.sh entrypoint.sh
RUN chmod +x entrypoint.sh
RUN pip install se3-transformer-pytorch
RUN pip install -e RFdiffusion-main/env/SE3Transformer
RUN pip install --force-reinstall torch torchvision torchaudio