bjing2016 / alphaflow
AlphaFold Meets Flow Matching for Generating Protein Ensembles
License: MIT License
Hi,
Is there a new wheel for CUDA 12 installation? I tried this on a Debian 12 system with CUDA 12, but the setup fails to build the wheel. I also tried with CUDA 11.3, where it fails because Debian 12 ships g++ 12 and the build requires g++ <= 10.
Thanks
Hi, very interesting work!
I wonder if it is possible to diffuse only the selected residues?
Best,
I have pasted the error below. I am attempting to use the AlphaFlow MD + Template model and I am using this model + sequence:
https://alphafold.ebi.ac.uk/entry/A0A2P6NC61
predict.py 133
main()
_contextlib.py 115 decorate_context
return func(*args, **kwargs)
predict.py 119 main
prots = model.inference(batch, as_protein=True, noisy_first=args.noisy_first,
wrapper.py 374 inference
output = self.model(batch, prev_outputs=prev_outputs)
module.py 1532 _wrapped_call_impl
return self._call_impl(*args, **kwargs)
module.py 1541 _call_impl
return forward_call(*args, **kwargs)
alphafold.py 240 forward
extra_pseudo_beta = pseudo_beta_fn(batch['aatype'], batch['extra_all_atom_positions'], None)
feats.py 38 pseudo_beta_fn
pseudo_beta = torch.where(
RuntimeError:
The size of tensor a (184) must match the size of tensor b (183) at non-singleton dimension 1
[!!] 2024-05-15 16:55:47,353 Command 'source activate AlphaFlow; python alphaflow/predict.py --mode alphafold --input_csv alphaflow_input.csv --msa_dir AlphaFlow_MSA_Results --weights alphaflow/alphaflow_md_templates_base_202402.pt --samples 10 --outpdb upload/ --templates_dir alphaflow_template' returned non-zero exit status 1. (main.py:252)
Hi,
I have a problem.
When I draw many samples (at least 10,000, so the law of large numbers should apply) from the harmonic prior, the mean distance between adjacent alpha carbons is not approximately 3.8 Å as stated in the paper.
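One possible source of the discrepancy, sketched under an assumption rather than confirmed against the repo: if the prior's strength is chosen so that the root-mean-square adjacent distance is 3.8 Å, the plain mean will come out lower, because for an isotropic 3D Gaussian bond vector the mean length is sqrt(8 / (3π)) ≈ 0.92 times the RMS length. The parameterization `a = 3 / 3.8**2` and the function name below are assumptions made for illustration; the code samples bond vectors directly and does not call anything from alphaflow.

```python
import math
import random

def adjacent_ca_stats(n_bonds=20000, rms_target=3.8, seed=0):
    """Sample bond vectors of a harmonic chain; return (mean, rms) bond length.

    Assumption (not taken from the repo): the prior density is proportional
    to exp(-(a / 2) * sum_i ||x_i - x_{i+1}||^2) with a = 3 / rms_target**2,
    so each bond vector is an isotropic 3D Gaussian with per-axis variance 1 / a.
    """
    rng = random.Random(seed)
    a = 3.0 / rms_target ** 2
    sigma = math.sqrt(1.0 / a)
    lengths = []
    for _ in range(n_bonds):
        dx, dy, dz = (rng.gauss(0.0, sigma) for _ in range(3))
        lengths.append(math.sqrt(dx * dx + dy * dy + dz * dz))
    mean = sum(lengths) / n_bonds
    rms = math.sqrt(sum(r * r for r in lengths) / n_bonds)
    return mean, rms

mean, rms = adjacent_ca_stats()
print(f"mean = {mean:.2f} A, rms = {rms:.2f} A")
```

Under this assumption the RMS lands near 3.8 Å while the mean lands near 3.5 Å, so it is worth checking which of the two statistics is being compared with the paper's figure.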
Hi,
Thank you for your impressive work.
There are no help messages for prediction options.
parser.add_argument('--input_csv', type=str, default='splits/transporters_only.csv')
parser.add_argument('--templates_dir', type=str, default='./data')
parser.add_argument('--msa_dir', type=str, default='./alignment_dir')
parser.add_argument('--mode', choices=['alphafold', 'esmfold'], default='alphafold')
parser.add_argument('--samples', type=int, default=10)
parser.add_argument('--steps', type=int, default=10)
parser.add_argument('--outpdb', type=str, default='./outpdb/default')
parser.add_argument('--weights', type=str, default=None)
parser.add_argument('--ckpt', type=str, default=None)
parser.add_argument('--original_weights', action='store_true')
parser.add_argument('--pdb_id', nargs='*', default=[])
parser.add_argument('--subsample', type=int, default=None)
parser.add_argument('--resample', action='store_true')
parser.add_argument('--tmax', type=float, default=1.0)
parser.add_argument('--templates', action='store_true')
parser.add_argument('--no_diffusion', action='store_true', default=False)
parser.add_argument('--self_cond', action='store_true', default=False)
parser.add_argument('--noisy_first', action='store_true', default=False)
parser.add_argument('--runtime_json', type=str, default=None)
parser.add_argument('--no_overwrite', action='store_true', default=False)
Could you add help messages, or describe what each option does in the README?
I could figure out some of them after reading the paper and the README, but others are still unclear.
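As a minimal sketch of what this could look like, the snippet below attaches `help=` strings to a few of the options listed above and uses `argparse.ArgumentDefaultsHelpFormatter` so the defaults appear in `--help`. The help texts are guesses based on how the flags are used in commands elsewhere in these issues, not descriptions from the authors, and `build_parser` is a hypothetical name.

```python
import argparse

def build_parser():
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to every
    # argument that has a help string.
    parser = argparse.ArgumentParser(
        description='Illustrative subset of the prediction options',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # NOTE: the help strings are assumptions, not author documentation.
    parser.add_argument('--input_csv', type=str, default='splits/transporters_only.csv',
                        help='CSV file listing the targets')
    parser.add_argument('--msa_dir', type=str, default='./alignment_dir',
                        help='directory containing the MSAs')
    parser.add_argument('--mode', choices=['alphafold', 'esmfold'], default='alphafold',
                        help='which model family to run')
    parser.add_argument('--samples', type=int, default=10,
                        help='number of samples per target')
    parser.add_argument('--outpdb', type=str, default='./outpdb/default',
                        help='output directory for PDB files')
    parser.add_argument('--weights', type=str, default=None,
                        help='path to a model checkpoint')
    return parser

if __name__ == '__main__':
    build_parser().print_help()
```

Options without a `help=` string still parse normally, but their defaults are not shown, so the formatter only pays off once every option is documented.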
Could you provide a version that can be trained using PDBs?
In addition, the installation should pin the mdtraj version; otherwise a newer mdtraj may pull in different versions of numpy and scipy.
Any chance of adapting this for protein-protein interfaces?
I built the Docker image from commit 2c27c69. Running predict.py produces the following stack trace:
root@3f4467776483:/opt/alphaflow# /opt/conda/bin/python -V
Python 3.9.7
root@3f4467776483:/opt/alphaflow# /opt/conda/bin/python /opt/alphaflow/predict.py
/opt/conda/lib/python3.9/site-packages/openfold-1.0.1-py3.9-linux-x86_64.egg/openfold/data/templates.py:88: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar.
"template_domain_names": np.object,
Traceback (most recent call last):
File "/opt/alphaflow/predict.py", line 29, in <module>
from alphaflow.data.data_modules import collate_fn
File "/opt/alphaflow/alphaflow/data/data_modules.py", line 32, in <module>
from alphaflow.data import data_pipeline, feature_pipeline
File "/opt/alphaflow/alphaflow/data/data_pipeline.py", line 22, in <module>
from openfold.data import templates, parsers, mmcif_parsing
File "/opt/conda/lib/python3.9/site-packages/openfold-1.0.1-py3.9-linux-x86_64.egg/openfold/data/templates.py", line 88, in <module>
"template_domain_names": np.object,
File "/opt/conda/lib/python3.9/site-packages/numpy/__init__.py", line 324, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'object'.
`np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
I note that when the container is built, numpy 1.21.2 is installed first and is later uninstalled when mdtraj is built, leaving 1.26.4 instead.
Stored in directory: /root/.cache/pip/wheels/4f/17/89/f855cce8e6394e9029e1b972cb623c8813b706d3d1ca81832f
Successfully built mdtraj
Installing collected packages: fair-esm, typing-extensions, pyparsing, numpy, astunparse, scipy, mdtraj, pytorch_lightning
Attempting uninstall: typing-extensions
Found existing installation: typing-extensions 3.10.0.2
Uninstalling typing-extensions-3.10.0.2:
Successfully uninstalled typing-extensions-3.10.0.2
Attempting uninstall: numpy
Found existing installation: numpy 1.21.2
**Uninstalling numpy-1.21.2:
Successfully uninstalled numpy-1.21.2**
Attempting uninstall: scipy
Found existing installation: scipy 1.7.3
Uninstalling scipy-1.7.3:
Successfully uninstalled scipy-1.7.3
Attempting uninstall: pytorch_lightning
Found existing installation: pytorch-lightning 1.5.10
Uninstalling pytorch-lightning-1.5.10:
Successfully uninstalled pytorch-lightning-1.5.10
Successfully installed astunparse-1.6.3 fair-esm-2.0.0 mdtraj-1.10.0 **numpy-1.26.4** pyparsing-3.1.2 pytorch_lightning-2.0.4 scipy-1.13.1 typing-extensions-4.12.2
I am going to try pinning numpy 1.21.2 throughout the pip installations. Please advise if there is a better or different route.
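For context, the `np.object` alias was deprecated in NumPy 1.20 (as the error message says) and removed in NumPy 1.24, which is why 1.21.2 works and 1.26.4 does not. A small sketch of a pre-flight check follows; the helper names are hypothetical and the check relies only on the version string, so it does not import numpy.

```python
def parse_version(version):
    """Return the leading numeric components, e.g. '1.26.4' -> (1, 26, 4)."""
    parts = []
    for piece in version.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def has_np_object_alias(numpy_version):
    """True if this NumPy release still exposes the `np.object` alias.

    The alias was deprecated in 1.20 and removed in 1.24.
    """
    return parse_version(numpy_version) < (1, 24)

for v in ('1.21.2', '1.26.4'):
    print(v, has_np_object_alias(v))
# prints:
# 1.21.2 True
# 1.26.4 False
```

Any numpy release below 1.24 should therefore avoid this particular AttributeError; whether other dependencies in the image accept such a pin is not something this check can tell.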
Hi all! Thanks for your work.
I'm reaching out to ask about the availability of MD ensemble evaluation scripts, particularly for metrics beyond RMSD and RMSF. While these two metrics are relatively straightforward to compute, I've had trouble reproducing others, such as the root mean W2-distance and the PCA-based metrics.
Could you provide guidance or scripts for calculating these metrics? Thanks a lot for your help.
Best,
Shaoning
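Until scripts are available, one reading of the root mean W2-distance is sketched here. It assumes (this is an assumption, not something confirmed in this thread) that a 3D Gaussian is fitted to each atom's positions in the predicted and the reference ensemble, that the closed-form 2-Wasserstein distance between the two Gaussians is taken per atom, and that the result is the root mean of the squared distances over the N atoms. The closed form for Gaussians is standard; the per-atom aggregation is the assumed part.

```latex
W_2^2\big(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\big)
  = \lVert \mu_1-\mu_2 \rVert_2^2
  + \operatorname{tr}\!\Big(\Sigma_1+\Sigma_2
      -2\big(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\big)^{1/2}\Big)

\mathrm{RMWD}
  = \sqrt{\frac{1}{N}\sum_{i=1}^{N}
      W_2^2\big(\mathcal{N}(\mu_i^{\mathrm{pred}},\Sigma_i^{\mathrm{pred}}),\,
                \mathcal{N}(\mu_i^{\mathrm{ref}},\Sigma_i^{\mathrm{ref}})\big)}
```

The first term measures how far the mean atom positions are apart and the trace term measures the mismatch in fluctuation shape and size, so the two can be reported separately if that helps with debugging.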
Thank you for this fantastic repository! Would it be possible to provide a Google Colab demo for running the selected model? It would be extremely helpful for quick tests.
Thank you!
Hello, I put the weights, CSV, and a3m files in folders (in particular, the a3m in /cluster/home/xxx/alphaflow/splits/6DS0_A/a3m/6DS0_A.a3m) and ran the following command:
python predict.py --mode alphafold --input_csv /cluster/home/xxx/alphaflow/splits/6DS0_test.csv --msa_dir /cluster/home/xxx/alphaflow/splits --weights /cluster/home/xxx/alphaflow/splits/alphaflow_pdb_base_202402.pt --samples 200 --outpdb /cluster/home/xxx/alphaflow/splits/output --self_cond --resample
Then I get this error:
2024-02-22 13:45:48,506 [node83:61063] [INFO] Loading the model
Traceback (most recent call last):
File "/cluster/home/xxx/alphaflow/predict.py", line 132, in <module>
main()
File "/cluster/home/xxx/.conda/envs/alphaflow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/cluster/home/xxx/alphaflow/predict.py", line 78, in main
model = model_class(**ckpt['hyper_parameters'], training=False)
File "/cluster/home/xxx/alphaflow/alphaflow/model/wrapper.py", line 496, in __init__
self.model = AlphaFold(config,
File "/cluster/home/xxx/alphaflow/alphaflow/model/alphafold.py", line 73, in __init__
self.extra_msa_stack = ExtraMSAStack(
TypeError: __init__() missing 2 required positional arguments: 'opm_first' and 'fuse_projection_weights'
Could you offer me some help to solve it? Thanks.
Hi,
Thanks for the wonderful work. I am planning to do some conformation sampling with it, but the hard requirement of CUDA 11.6 is an issue. I've tried different installers on CUDA 12 and can't get it to work, and the machines I have access to are all CUDA 12.
It seems that OpenFold has a branch, pl_upgrades, that can support CUDA 12+. Would it be possible to port AlphaFlow to this branch or provide a Dockerfile?
Thanks.
Hi! Great work!
Is multimer supported as in ESMFold?
I was trying to use a separation token ":" as in ESM but it doesn't seem to work.
Hi,
I've been trying to run some experiments using your model and scripts and am running into a problem. The error arises when using the following command from a testing directory inside the main project directory:
python3 ../predict.py \
--weights ../model_weights/alphaflow_pdb_distilled_202402.pt \
--mode alphafold \
--input_csv ghsr.csv \
--msa_dir ./msas \
--samples 10 \
--outpdb ./out \
--noisy_first \
--no_diffusion
I get the error:
2024-02-27 08:10:02,292 [iwe547170:52263] [INFO] Loading the model
Traceback (most recent call last):
File "/media/data/software/alphaflow/workdir/../predict.py", line 138, in <module>
main()
File "/home/iwe34/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/media/data/software/alphaflow/workdir/../predict.py", line 84, in main
model = model_class(**ckpt['hyper_parameters'], training=False)
File "/media/data/software/alphaflow/alphaflow/model/wrapper.py", line 496, in __init__
self.model = AlphaFold(config,
File "/media/data/software/alphaflow/alphaflow/model/alphafold.py", line 77, in __init__
self.evoformer = EvoformerStack(
TypeError: __init__() missing 1 required positional argument: 'no_column_attention'
To figure out why that happened, I modified the alphaflow/config.py file by manually adding the flag no_column_attention: False, and realized later that this config is only used when predict.py is called with the additional flag --original_weights.
However, inspecting the loaded checkpoint ckpt from the lines
if args.weights:
    ckpt = torch.load(args.weights, map_location='cpu')
    model = model_class(**ckpt['hyper_parameters'], training=False)
showed that the model weights alphaflow_pdb_distilled_202402.pt don't contain the no_column_attention field. It worked fine when I used the original weights params_model_1.npz (apart from a CUDA error, which is a separate problem).
Simple question: What am I doing wrong? Why can I provide new model weights when these could never be used because the EvoformerStack in openfold requires this argument?
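This kind of TypeError is consistent with the installed openfold being a different revision from the one the checkpoint's hyperparameters were written for; that is an assumption, not a confirmed diagnosis (another report in this list installs openfold pinned to commit 103d037). A quick way to see which constructor arguments the installed class expects but the loaded config does not supply is to compare against the signature. The sketch below uses a hypothetical stand-in class and illustrative config values rather than openfold itself.

```python
import inspect

def missing_required_args(cls, kwargs):
    """List constructor parameters of `cls` that have no default value
    and are not supplied in `kwargs`."""
    missing = []
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == 'self':
            continue
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        if param.default is param.empty and name not in kwargs:
            missing.append(name)
    return missing

# Hypothetical stand-in for a class whose constructor gained a new
# required argument in a newer library release.
class NewerStack:
    def __init__(self, c_m, no_blocks, no_column_attention):
        pass

checkpoint_config = {'c_m': 256, 'no_blocks': 48}  # illustrative values only
print(missing_required_args(NewerStack, checkpoint_config))
# prints ['no_column_attention']
```

Running the same helper against the real class and the relevant sub-config from `ckpt['hyper_parameters']` would show whether the mismatch is in the library version or in the checkpoint.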
Hi BoWen, I am trying to run AlphaFlow's train.py but get the error below:
AttributeError: 'AlphaFoldWrapper' object has no attribute 'extra_msa_stack'
I am running:
python train.py --lr 5e-4 --noise_prob 0.8 --accumulate_grad 8 --train_epoch_len 80000 --train_cutoff 2018-05-01 --filter_chains --train_data_dir ../unpack_mmcif_out --train_msa_dir ../openfold/pdb --mmcif_dir ../pdb_mmcif --val_msa_dir ../openfold/alignment_db --run_name alphaflow_train
Do I need to add extra_msa_stack to the AlphaFoldWrapper class, or should I drop this line?
I believe your code has some discrepancies compared to the pseudocode in your article.
Does this pseudocode correspond to your code in wrapper.py, ModelWrapper.distillation_training_step?
The same holds for ModelWrapper.inference.
The atoms in the PDB output seem to be clustered together very densely, which results in an abnormal protein structure.
Great work on the latest version of the paper and thanks for putting this repo out.
I was trying to test the basic inference you outlined using either the ESMFlow or AlphaFlow models and weights and ran into problems at every corner. I'll detail my specific issues below but repos always get increased usage when authors provide at least one full example input line for inference, so if you provide that I'm sure it would help many people checking out your code. Thanks!
Trying ESMFlow Model
mkdir output
mkdir weights
python predict.py --mode esmfold --input_csv splits/atlas_test.csv --weights weights/esmflow_md_distilled_202402.pt --samples 5 --outpdb output/
Output
2024-02-26 12:54:34,511 [---] [INFO] Loading the model
2024-02-26 12:55:16,878 [---] [INFO] Model has been loaded
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:25<00:00, 5.08s/it]
Traceback (most recent call last):
File "/---/alphaflow/predict.py", line 132, in <module>
main()
File "/---/miniconda3/envs/AlphaFlow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/---/alphaflow/predict.py", line 126, in main
f.write(protein.prots_to_pdb(result))
File "/---/alphaflow/alphaflow/utils/protein.py", line 163, in prots_to_pdb
prot = to_pdb(prot)
File "/---/miniconda3/envs/AlphaFlow/lib/python3.9/site-packages/openfold/np/protein.py", line 341, in to_pdb
chain_index = prot.chain_index.astype(np.int32)
AttributeError: 'NoneType' object has no attribute 'astype'
Tried with esmflow_pdb_base_202402.pt weights as well...same result.
Trying AlphaFlow Model
Preparing the MSA
python -m scripts.mmseqs_query --split splits/atlas_test.csv --outdir output
COMPLETE: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 450/450 [elapsed: 00:02 remaining: 00:00]
SUCCESS!
Running Inference
python predict.py --mode alphafold --input_csv splits/atlas_test.csv --msa_dir output/ --weights weights/alphaflow_pdb_distilled_202402.pt --samples 5 --outpdb output/
2024-02-26 13:17:56,383 [---] [INFO] Loading the model
Traceback (most recent call last):
File "/---/alphaflow/predict.py", line 132, in <module>
main()
File "/---/miniconda3/envs/AlphaFlow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/---/alphaflow/predict.py", line 78, in main
model = model_class(**ckpt['hyper_parameters'], training=False)
File "/---/alphaflow/alphaflow/model/wrapper.py", line 496, in __init__
self.model = AlphaFold(config,
File "/---/alphaflow/alphaflow/model/alphafold.py", line 73, in __init__
self.extra_msa_stack = ExtraMSAStack(
TypeError: __init__() missing 2 required positional arguments: 'opm_first' and 'fuse_projection_weights'
Thanks again for your assistance. Looking forward to trying out this great work.
It appears that the scripts.prep_atlas.py file for the Atlas data processing has been removed in the most recent update.
I tried to preprocess pdbs into NPZ files using "scripts/unpack_mmcif.py"
e.g.:
$ python -m scripts.unpack_mmcif.py --mmcif_dir ../testpdb/data_dir/ --outdir ../testpdb/outesmflow/
but it tries to load betafold:
"from betafold.data.data_pipeline import DataPipeline"
and fails:
"ModuleNotFoundError: No module named 'betafold'"
I tried replacing "betafold.data.data_pipeline" with "alphaflow.data.data_pipeline", but I ran into other issues that make me believe "unpack_mmcif.py" is outdated.
Can you confirm this?
Hi,
I'm trying to install AlphaFlow on a machine with A30 GPUs and CUDA 12.1. Even though I found a compatible PyTorch version, I get the following error after running pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@103d037':
"In file included from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/extension.h:5,
from openfold/utils/kernel/csrc/softmax_cuda_kernel.cu:18:
/home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++17 or later compatible compiler is required to use PyTorch.
4 | #error C++17 or later compatible compiler is required to use PyTorch.
| ^~~~~
In file included from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/util/string_view.h:4,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/util/StringUtil.h:6,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/util/Exception.h:5,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/core/Device.h:5,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:11,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/ATen/core/Tensor.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/ATen/Tensor.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/extension.h:5,
from openfold/utils/kernel/csrc/softmax_cuda_kernel.cu:18:
/home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/util/C++17.h:27:2: error: #error You need C++17 to compile PyTorch
27 | #error You need C++17 to compile PyTorch
| ^~~~~
In file included from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:4,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:9,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/extension.h:5,
from openfold/utils/kernel/csrc/softmax_cuda_kernel.cu:18:
/home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/ATen/ATen.h:4:2: error: #error C++17 or later compatible compiler is required to use ATen.
4 | #error C++17 or later compatible compiler is required to use ATen.
| ^~~~~
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1"
My gcc version is 11.3
Hi @bjing2016 ! Extremely useful work!
I have some PDBs with gaps in their sequences (i.e. excluded IDRs).
Is it possible to use a PDB file as input?
If not, I would appreciate it if this became available in the future.
Hi,
I have some long sequences that can't be predicted using AlphaFlow's default settings.
To handle them, I changed the arguments in the prediction script as shown below.
config = model_config(
    'initial_training',
    train=False,
    low_prec=False,
    long_sequence_inference=True
)
Is the 'initial_training' preset not needed for long-sequence prediction? I'm afraid it decreases precision. I want conformer ensembles that cover a wide range of conformations with high precision.
There seems to be no description of best practices or recommended settings for predicting proteins with long sequences.