
Open source code for AlphaFold.

License: Apache License 2.0


AlphaFold

This package provides an implementation of the inference pipeline of AlphaFold v2. For simplicity, we refer to this model as AlphaFold throughout the rest of this document.

We also provide:

An implementation of AlphaFold-Multimer. This represents a work in progress and AlphaFold-Multimer isn't expected to be as stable as our monomer AlphaFold system. Read the guide for how to upgrade and update code.
The technical note containing the models and inference procedure for an updated AlphaFold v2.3.0.
A CASP15 baseline set of predictions along with documentation of any manual interventions performed.

Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper and, if applicable, the AlphaFold-Multimer paper.

Please also refer to the Supplementary Information for a detailed description of the method.

You can use a slightly simplified version of AlphaFold with this Colab notebook or community-supported versions (see below).

If you have any questions, please contact the AlphaFold team at [email protected].

Installation and running your first prediction

You will need a machine running Linux; AlphaFold does not support other operating systems. Full installation requires up to 3 TB of disk space to keep genetic databases (SSD storage is recommended) and a modern NVIDIA GPU (GPUs with more memory can predict larger protein structures).

Please follow these steps:

Install Docker.
- Install NVIDIA Container Toolkit for GPU support.
- Set up Docker to run as a non-root user.

Clone this repository and cd into it.

git clone https://github.com/deepmind/alphafold.git
cd ./alphafold

Download genetic databases and model parameters:
- Install aria2c. On most Linux distributions it is available via the package manager as the aria2 package (on Debian-based distributions this can be installed by running sudo apt install aria2).
- Please use the script scripts/download_all_data.sh to download and set up full databases. This may take substantial time (download size is 556 GB), so we recommend running this script in the background:
```
scripts/download_all_data.sh <DOWNLOAD_DIR> > download.log 2> download_all.log &
```
- Note: The download directory <DOWNLOAD_DIR> should not be a subdirectory in the AlphaFold repository directory. If it is, the Docker build will be slow as the large databases will be copied into the docker build context.
- It is possible to run AlphaFold with reduced databases; please refer to the complete documentation.
Check that AlphaFold will be able to use a GPU by running:
```
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```
The output of this command should show a list of your GPUs. If it doesn't, check if you followed all steps correctly when setting up the NVIDIA Container Toolkit or take a look at the following NVIDIA Docker issue.

If you wish to run AlphaFold using Singularity (a common containerization platform on HPC systems) we recommend using some of the third party Singularity setups as linked in #10 or #24.

Build the Docker image:

docker build -f docker/Dockerfile -t alphafold .

If you encounter the following error:

W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.

use the workaround described in #463 (comment).

Install the run_docker.py dependencies. Note: You may optionally wish to create a Python Virtual Environment to prevent conflicts with your system's Python environment.
```
pip3 install -r docker/requirements.txt
```
Make sure that the output directory exists (the default is /tmp/alphafold) and that you have sufficient permissions to write into it.
Run run_docker.py pointing to a FASTA file containing the protein sequence(s) for which you wish to predict the structure (--fasta_paths parameter). AlphaFold will search for the available templates before the date specified by the --max_template_date parameter; this could be used to avoid certain templates during modeling. --data_dir is the directory with downloaded genetic databases and --output_dir is the absolute path to the output directory.
```
python3 docker/run_docker.py \
  --fasta_paths=your_protein.fasta \
  --max_template_date=2022-01-01 \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```
Once the run is over, the output directory will contain the predicted structures of the target protein. Please check the documentation below for additional options and troubleshooting tips.
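The requirement that the output directory exists and is writable can be checked before starting a long run. The following is a minimal sketch, not part of AlphaFold; the helper name `ensure_writable_dir` is hypothetical.

```python
import os
import tempfile


def ensure_writable_dir(path: str) -> str:
    """Create `path` if needed and confirm files can be written into it.

    Hypothetical helper (not part of AlphaFold). Returns the absolute path,
    which is the form that --output_dir expects.
    """
    abs_path = os.path.abspath(path)
    os.makedirs(abs_path, exist_ok=True)
    # Creating a throwaway file raises OSError if the directory is not
    # writable; the file is deleted again when the block exits.
    with tempfile.NamedTemporaryFile(dir=abs_path):
        pass
    return abs_path
```

Calling `ensure_writable_dir("/tmp/alphafold")` before `run_docker.py` surfaces permission problems early instead of at the end of the MSA search.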

Genetic databases

This step requires aria2c to be installed on your machine.

AlphaFold needs multiple genetic (sequence) databases to run:

BFD,
MGnify,
PDB70,
PDB (structures in the mmCIF format),
PDB seqres – only for AlphaFold-Multimer,
UniRef30 (FKA UniClust30),
UniProt – only for AlphaFold-Multimer,
UniRef90.

We provide a script scripts/download_all_data.sh that can be used to download and set up all of these databases:

Recommended default:
```
scripts/download_all_data.sh <DOWNLOAD_DIR>
```
will download the full databases.
With reduced_dbs parameter:
```
scripts/download_all_data.sh <DOWNLOAD_DIR> reduced_dbs
```
will download a reduced version of the databases to be used with the reduced_dbs database preset. Use this together with the corresponding AlphaFold parameter --db_preset=reduced_dbs later during the AlphaFold run (please see the AlphaFold parameters section).

📒 Note: The download directory <DOWNLOAD_DIR> should not be a subdirectory in the AlphaFold repository directory. If it is, the Docker build will be slow as the large databases will be copied during the image creation.

We don't provide exactly the database versions used in CASP14; see the note on reproducibility. Some of the databases are mirrored for speed; see mirrored databases.

📒 Note: The total download size for the full databases is around 556 GB and the total size when unzipped is 2.62 TB. Please make sure you have a large enough hard drive space, bandwidth and time to download. We recommend using an SSD for better genetic search performance.

📒 Note: If the download directory and datasets don't have full read and write permissions, it can cause errors with the MSA tools, with opaque (external) error messages. Please ensure the required permissions are applied, e.g. with the sudo chmod 755 --recursive "$DOWNLOAD_DIR" command.
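The notes above (download directory outside the repository, enough free disk space) can be turned into a quick pre-flight check. This is a sketch under stated assumptions: the helper names are hypothetical, and the size constant is simply the ~2.62 TB figure quoted above.

```python
import os
import shutil

# Approximate unzipped size of the full databases, taken from the note above.
FULL_DBS_BYTES = int(2.62e12)


def is_inside(child: str, parent: str) -> bool:
    """Return True if `child` is `parent` itself or lies somewhere below it."""
    child = os.path.realpath(child)
    parent = os.path.realpath(parent)
    return os.path.commonpath([child, parent]) == parent


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes
```

For example, `is_inside(download_dir, repo_dir)` should be `False` and `has_free_space(download_dir, FULL_DBS_BYTES)` should be `True` before `scripts/download_all_data.sh` is started.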

The download_all_data.sh script will also download the model parameter files. Once the script has finished, you should have the following directory structure:

$DOWNLOAD_DIR/                             # Total: ~ 2.62 TB (download: 556 GB)
    bfd/                                   # ~ 1.8 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                                # ~ 120 GB (download: 67 GB)
        mgy_clusters_2022_05.fa
    params/                                # ~ 5.3 GB (download: 5.3 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # 5 AlphaFold-Multimer models,
        # LICENSE,
        # = 16 files.
    pdb70/                                 # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                             # ~ 238 GB (download: 43 GB)
        mmcif_files/
            # About 199,000 .cif files.
        obsolete.dat
    pdb_seqres/                            # ~ 0.2 GB (download: 0.2 GB)
        pdb_seqres.txt
    small_bfd/                             # ~ 17 GB (download: 9.6 GB)
        bfd-first_non_consensus_sequences.fasta
    uniref30/                              # ~ 206 GB (download: 52.5 GB)
        # 7 files.
    uniprot/                               # ~ 105 GB (download: 53 GB)
        uniprot.fasta
    uniref90/                              # ~ 67 GB (download: 34 GB)
        uniref90.fasta

bfd/ is only downloaded if you download the full databases, and small_bfd/ is only downloaded if you download the reduced databases.
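The expected layout can be verified after the download finishes. The sketch below only checks that the top-level directories listed above exist; the function name is hypothetical and it does not validate file contents or counts.

```python
import os

# Top-level directories from the listing above that every preset needs.
COMMON_DIRS = [
    "mgnify", "params", "pdb70", "pdb_mmcif",
    "pdb_seqres", "uniref30", "uniprot", "uniref90",
]
# bfd/ comes with the full databases, small_bfd/ with the reduced ones.
PRESET_DIRS = {"full_dbs": ["bfd"], "reduced_dbs": ["small_bfd"]}


def missing_database_dirs(download_dir: str, db_preset: str = "full_dbs") -> list:
    """Return the expected sub-directories that are absent from `download_dir`."""
    expected = COMMON_DIRS + PRESET_DIRS[db_preset]
    return [
        name for name in expected
        if not os.path.isdir(os.path.join(download_dir, name))
    ]
```

An empty list means every expected directory is present for the chosen preset.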

Model parameters

While the AlphaFold code is licensed under the Apache 2.0 License, the AlphaFold parameters and CASP15 prediction data are made available under the terms of the CC BY 4.0 license. Please see the Disclaimer below for more detail.

The AlphaFold parameters are available from https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar, and are downloaded as part of the scripts/download_all_data.sh script. This script will download parameters for:

5 models which were used during CASP14, and were extensively validated for structure prediction quality (see Jumper et al. 2021, Suppl. Methods 1.12 for details).
5 pTM models, which were fine-tuned to produce pTM (predicted TM-score) and PAE (predicted aligned error) values alongside their structure predictions (see Jumper et al. 2021, Suppl. Methods 1.9.7 for details).
5 AlphaFold-Multimer models that produce pTM and PAE values alongside their structure predictions.

Updating existing installation

If you have a previous version you can either reinstall fully from scratch (remove everything and run the setup from scratch) or you can do an incremental update that will be significantly faster but will require a bit more work. Make sure you follow these steps in the exact order they are listed below:

Update the code.
- Go to the directory with the cloned AlphaFold repository and run git fetch origin main to get all code updates.
Update the UniProt, UniRef, MGnify and PDB seqres databases.
- Remove <DOWNLOAD_DIR>/uniprot.
- Run scripts/download_uniprot.sh <DOWNLOAD_DIR>.
- Remove <DOWNLOAD_DIR>/uniclust30.
- Run scripts/download_uniref30.sh <DOWNLOAD_DIR>.
- Remove <DOWNLOAD_DIR>/uniref90.
- Run scripts/download_uniref90.sh <DOWNLOAD_DIR>.
- Remove <DOWNLOAD_DIR>/mgnify.
- Run scripts/download_mgnify.sh <DOWNLOAD_DIR>.
- Remove <DOWNLOAD_DIR>/pdb_mmcif. PDB SeqRes and PDB must be from exactly the same date; skipping this step can cause errors when searching for templates with AlphaFold-Multimer.
- Run scripts/download_pdb_mmcif.sh <DOWNLOAD_DIR>.
- Run scripts/download_pdb_seqres.sh <DOWNLOAD_DIR>.
Update the model parameters.
- Remove the old model parameters in <DOWNLOAD_DIR>/params.
- Download new model parameters using scripts/download_alphafold_params.sh <DOWNLOAD_DIR>.
Follow Running AlphaFold.

Using deprecated model weights

To use the deprecated v2.2.0 AlphaFold-Multimer model weights:

Change SOURCE_URL in scripts/download_alphafold_params.sh to https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar, and download the old parameters.
Change the _v3 to _v2 in the multimer MODEL_PRESETS in config.py.

To use the deprecated v2.1.0 AlphaFold-Multimer model weights:

Change SOURCE_URL in scripts/download_alphafold_params.sh to https://storage.googleapis.com/alphafold/alphafold_params_2022-01-19.tar, and download the old parameters.
Remove the _v3 in the multimer MODEL_PRESETS in config.py.

Running AlphaFold

The simplest way to run AlphaFold is using the provided Docker script. This was tested on Google Cloud with a machine using the nvidia-gpu-cloud-image with 12 vCPUs, 85 GB of RAM, a 100 GB boot disk, the databases on an additional 3 TB disk, and an A100 GPU. For your first run, please follow the instructions in the Installation and running your first prediction section.

By default, AlphaFold will attempt to use all visible GPU devices. To use a subset, specify a comma-separated list of GPU UUID(s) or index(es) using the --gpu_devices flag. See GPU enumeration for more details.
You can control which AlphaFold model to run by adding the --model_preset= flag. We provide the following models:
- monomer: This is the original model used at CASP14 with no ensembling.
- monomer_casp14: This is the original model used at CASP14 with num_ensemble=8, matching our CASP14 configuration. This is largely provided for reproducibility as it is 8x more computationally expensive for limited accuracy gain (+0.1 average GDT gain on CASP14 domains).
- monomer_ptm: This is the original CASP14 model fine tuned with the pTM head, providing a pairwise confidence measure. It is slightly less accurate than the normal monomer model.
- multimer: This is the AlphaFold-Multimer model. To use this model, provide a multi-sequence FASTA file. In addition, the UniProt database should have been downloaded.
You can control MSA speed/quality tradeoff by adding --db_preset=reduced_dbs or --db_preset=full_dbs to the run command. We provide the following presets:
- reduced_dbs: This preset is optimized for speed and lower hardware requirements. It runs with a reduced version of the BFD database. It requires 8 CPU cores (vCPUs), 8 GB of RAM, and 600 GB of disk space.
- full_dbs: This runs with all genetic databases used at CASP14.
Running the command above with the monomer model preset and the reduced_dbs data preset would look like this:
```
python3 docker/run_docker.py \
  --fasta_paths=T1050.fasta \
  --max_template_date=2020-05-14 \
  --model_preset=monomer \
  --db_preset=reduced_dbs \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```
After generating the predicted model, AlphaFold runs a relaxation step to improve local geometry. By default, only the best model (by pLDDT) is relaxed (--models_to_relax=best); alternatively, all of the models (--models_to_relax=all) or none of them (--models_to_relax=none) can be relaxed.
The relaxation step can be run on GPU (faster, but could be less stable) or CPU (slow, but stable). This can be controlled with --enable_gpu_relax=true (default) or --enable_gpu_relax=false.
AlphaFold can re-use MSAs (multiple sequence alignments) for the same sequence via --use_precomputed_msas=true option; this can be useful for trying different AlphaFold parameters. This option assumes that the directory structure generated by the first AlphaFold run in the output directory exists and that the protein sequence is the same.
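When runs are launched from another script, it can help to assemble the `run_docker.py` command programmatically. The following sketch only uses flags and preset names mentioned in this section; the function `build_run_docker_command` is a hypothetical wrapper and does not run anything itself.

```python
import os

MODEL_PRESETS = ("monomer", "monomer_casp14", "monomer_ptm", "multimer")
DB_PRESETS = ("full_dbs", "reduced_dbs")


def build_run_docker_command(fasta_paths, data_dir, output_dir,
                             max_template_date,
                             model_preset="monomer", db_preset="full_dbs"):
    """Return the argument list for docker/run_docker.py (hypothetical helper)."""
    if model_preset not in MODEL_PRESETS:
        raise ValueError(f"unknown model preset: {model_preset}")
    if db_preset not in DB_PRESETS:
        raise ValueError(f"unknown database preset: {db_preset}")
    if not os.path.isabs(output_dir):
        raise ValueError("--output_dir must be an absolute path")
    return [
        "python3", "docker/run_docker.py",
        # Several FASTA files are passed as one comma-separated value.
        f"--fasta_paths={','.join(fasta_paths)}",
        f"--max_template_date={max_template_date}",
        f"--model_preset={model_preset}",
        f"--db_preset={db_preset}",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; building a list rather than a single string avoids shell quoting issues with paths.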

Running AlphaFold-Multimer

All steps are the same as when running the monomer system, but you will have to:

provide an input FASTA file with multiple sequences,
set --model_preset=multimer.

An example that folds a protein complex multimer.fasta:

python3 docker/run_docker.py \
  --fasta_paths=multimer.fasta \
  --max_template_date=2020-05-14 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir

By default, the multimer system will run 5 seeds per model (25 total predictions). For a small drop in accuracy, you may wish to run a single seed per model. This can be done by setting the flag --num_multimer_predictions_per_model=1.

AlphaFold prediction speed

The table below reports prediction runtimes for proteins of various lengths. We only measure unrelaxed structure prediction with three recycles while excluding runtimes from MSA and template search. When running docker/run_docker.py with --benchmark=true, this runtime is stored in timings.json. All runtimes are from a single A100 NVIDIA GPU. Prediction speed on A100 for smaller structures can be improved by increasing global_config.subbatch_size in alphafold/model/config.py.

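| No. residues | Prediction time (s) |
|---|---|
| 100 | 4.9 |
| 200 | 7.7 |
| 300 | 13 |
| 400 | 18 |
| 500 | 29 |
| 600 | 36 |
| 700 | 53 |
| 800 | 60 |
| 900 | 91 |
| 1,000 | 96 |
| 1,100 | 140 |
| 1,500 | 280 |
| 2,000 | 450 |
| 2,500 | 969 |
| 3,000 | 1,240 |
| 3,500 | 2,465 |
| 4,000 | 5,660 |
| 4,500 | 12,475 |
| 5,000 | 18,824 |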
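For a rough estimate at lengths between the measured points, the table can be interpolated. The sketch below uses plain linear interpolation between neighbouring rows, which is an assumption made for illustration (runtime does not grow linearly), and the numbers only apply to the single-A100 setup described above.

```python
from bisect import bisect_left

# (number of residues, prediction time in seconds), copied from the table.
TIMINGS = [
    (100, 4.9), (200, 7.7), (300, 13), (400, 18), (500, 29), (600, 36),
    (700, 53), (800, 60), (900, 91), (1000, 96), (1100, 140), (1500, 280),
    (2000, 450), (2500, 969), (3000, 1240), (3500, 2465), (4000, 5660),
    (4500, 12475), (5000, 18824),
]


def estimate_prediction_seconds(num_residues: int) -> float:
    """Linearly interpolate the measured runtimes (illustrative only)."""
    lengths = [n for n, _ in TIMINGS]
    if not lengths[0] <= num_residues <= lengths[-1]:
        raise ValueError("length outside the measured range of 100 to 5,000")
    i = bisect_left(lengths, num_residues)
    upper_n, upper_t = TIMINGS[i]
    if upper_n == num_residues:
        return float(upper_t)  # exact table entry
    lower_n, lower_t = TIMINGS[i - 1]
    fraction = (num_residues - lower_n) / (upper_n - lower_n)
    return lower_t + (upper_t - lower_t) * fraction
```

Lengths outside the measured range raise an error rather than extrapolating, since the steep growth above 3,500 residues makes extrapolation unreliable.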

Examples

Below are examples on how to use AlphaFold in different scenarios.

Folding a monomer

Say we have a monomer with the sequence <SEQUENCE>. The input fasta should be:

>sequence_name
<SEQUENCE>

Then run the following command:

python3 docker/run_docker.py \
  --fasta_paths=monomer.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=monomer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir

Folding a homomer

Say we have a homomer with 3 copies of the same sequence <SEQUENCE>. The input fasta should be:

>sequence_1
<SEQUENCE>
>sequence_2
<SEQUENCE>
>sequence_3
<SEQUENCE>

Then run the following command:

python3 docker/run_docker.py \
  --fasta_paths=homomer.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir

Folding a heteromer

Say we have an A2B3 heteromer, i.e. with 2 copies of <SEQUENCE A> and 3 copies of <SEQUENCE B>. The input fasta should be:

>sequence_1
<SEQUENCE A>
>sequence_2
<SEQUENCE A>
>sequence_3
<SEQUENCE B>
>sequence_4
<SEQUENCE B>
>sequence_5
<SEQUENCE B>

Then run the following command:

python3 docker/run_docker.py \
  --fasta_paths=heteromer.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
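The homomer and heteromer inputs above follow one pattern: each chain copy gets its own FASTA record. A small sketch that writes such a file from a stoichiometry is shown below; the function name and the `sequence_N` record names mirror the examples but are otherwise arbitrary.

```python
def multimer_fasta(chains):
    """Build multimer FASTA text from a list of (sequence, copies) pairs.

    Every copy of a chain becomes its own record, named sequence_1,
    sequence_2, ... in order, as in the examples above.
    """
    lines = []
    index = 1
    for sequence, copies in chains:
        for _ in range(copies):
            lines.append(f">sequence_{index}")
            lines.append(sequence)
            index += 1
    return "\n".join(lines) + "\n"
```

For the A2B3 case, `multimer_fasta([(seq_a, 2), (seq_b, 3)])` yields five records, which can be written to `heteromer.fasta` and passed via `--fasta_paths`.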

Folding multiple monomers one after another

Say we have two monomers, monomer1.fasta and monomer2.fasta.

We can fold both sequentially by using the following command:

python3 docker/run_docker.py \
  --fasta_paths=monomer1.fasta,monomer2.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=monomer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir

Folding multiple multimers one after another

Say we have two multimers, multimer1.fasta and multimer2.fasta.

We can fold both sequentially by using the following command:

python3 docker/run_docker.py \
  --fasta_paths=multimer1.fasta,multimer2.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir

AlphaFold output

The outputs will be saved in a subdirectory of the directory provided via the --output_dir flag of run_docker.py (defaults to /tmp/alphafold/). The outputs include the computed MSAs, unrelaxed structures, relaxed structures, ranked structures, raw model outputs, prediction metadata, and section timings. The --output_dir directory will have the following structure:

<target_name>/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relax_metrics.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniref_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto

The contents of each output file are as follows:

features.pkl – A pickle file containing the input feature NumPy arrays used by the models to produce the structures.
unrelaxed_model_*.pdb – A PDB format text file containing the predicted structure, exactly as outputted by the model.
relaxed_model_*.pdb – A PDB format text file containing the predicted structure, after performing an Amber relaxation procedure on the unrelaxed structure prediction (see Jumper et al. 2021, Suppl. Methods 1.8.6 for details).
ranked_*.pdb – A PDB format text file containing the predicted structures, after reordering by model confidence. Here ranked_i.pdb should contain the prediction with the (i + 1)-th highest confidence (so that ranked_0.pdb has the highest confidence). To rank model confidence, we use predicted LDDT (pLDDT) scores (see Jumper et al. 2021, Suppl. Methods 1.9.6 for details). If --models_to_relax=all then all ranked structures are relaxed. If --models_to_relax=best then only ranked_0.pdb is relaxed (the rest are unrelaxed). If --models_to_relax=none, then the ranked structures are all unrelaxed.
ranking_debug.json – A JSON format text file containing the pLDDT values used to perform the model ranking, and a mapping back to the original model names.
relax_metrics.json – A JSON format text file containing relax metrics, for instance remaining violations.
timings.json – A JSON format text file containing the times taken to run each section of the AlphaFold pipeline.
msas/ – A directory containing the files describing the various genetic tool hits that were used to construct the input MSA.
result_model_*.pkl – A pickle file containing a nested dictionary of the various NumPy arrays directly produced by the model. In addition to the output of the structure module, this includes auxiliary outputs such as:
- Distograms (distogram/logits contains a NumPy array of shape [N_res, N_res, N_bins] and distogram/bin_edges contains the definition of the bins).
- Per-residue pLDDT scores (plddt contains a NumPy array of shape [N_res] with the range of possible values from 0 to 100, where 100 means most confident). This can serve to identify sequence regions predicted with high confidence or as an overall per-target confidence score when averaged across residues.
- Present only if using pTM models: predicted TM-score (ptm field contains a scalar). As a predictor of a global superposition metric, this score is designed to also assess whether the model is confident in the overall domain packing.
- Present only if using pTM models: predicted pairwise aligned errors (predicted_aligned_error contains a NumPy array of shape [N_res, N_res] with the range of possible values from 0 to max_predicted_aligned_error, where 0 means most confident). This can serve for a visualisation of domain packing confidence within the structure.

The pLDDT confidence measure is stored in the B-factor field of the output PDB files (although unlike a B-factor, higher pLDDT is better, so care must be taken when using for tasks such as molecular replacement).
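Because pLDDT sits in the B-factor field, it can be read back with plain fixed-column parsing of the PDB text. The sketch below assumes the standard PDB column layout (B-factor in columns 61 to 66) and, as a further assumption, that reading the CA atom of each residue is enough to get that residue's score; the function names are hypothetical.

```python
def per_residue_plddt(pdb_text: str) -> dict:
    """Return {(chain_id, residue_number): pLDDT} read from CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 22 the chain, 23-26 the residue
        # number and 61-66 the B-factor (1-based, per the PDB format).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def mean_plddt(pdb_text: str) -> float:
    """Average the per-residue pLDDT into one per-target confidence value."""
    scores = per_residue_plddt(pdb_text)
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores.values()) / len(scores)
```

Reading the text of `ranked_0.pdb` and passing it to `mean_plddt` gives a single confidence number for the top-ranked prediction.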

This code has been tested to match mean top-1 accuracy on a CASP14 test set with pLDDT ranking over 5 model predictions (some CASP targets were run with earlier versions of AlphaFold and some had manual interventions; see our forthcoming publication for details). Some targets such as T1064 may also have high individual run variance over random seeds.
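The pLDDT ranking over model predictions amounts to sorting models by their confidence score. A minimal sketch of that idea follows; it takes a plain mapping from model name to mean pLDDT and is not the actual AlphaFold ranking code, and the function name is hypothetical.

```python
def rank_models(confidences: dict) -> dict:
    """Map ranked file names to model names, highest confidence first.

    `confidences` maps a model name to its mean pLDDT. ranked_0.pdb goes to
    the most confident model, ranked_1.pdb to the next one, and so on.
    """
    order = sorted(confidences, key=confidences.get, reverse=True)
    return {f"ranked_{i}.pdb": name for i, name in enumerate(order)}
```

Given scores for `model_1` to `model_5`, the result shows which model each `ranked_i.pdb` corresponds to, which is the kind of mapping that `ranking_debug.json` records.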

Inferencing many proteins

The provided inference script is optimized for predicting the structure of a single protein, and it will compile the neural network to be specialized to exactly the size of the sequence, MSA, and templates. For large proteins, the compile time is a negligible fraction of the runtime, but it may become more significant for small proteins or if the multi-sequence alignments are already precomputed. In the bulk inference case, it may make sense to use our make_fixed_size function to pad the inputs to a uniform size, thereby reducing the number of compilations required.
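The benefit of padding to a uniform size can be illustrated by counting how many distinct input sizes a batch of proteins would produce. The sketch below is not the `make_fixed_size` function itself; it only rounds sequence lengths up to a bucket boundary, and the bucket size of 256 is an arbitrary assumption.

```python
def padded_length(num_residues: int, bucket_size: int = 256) -> int:
    """Round a sequence length up to the next multiple of `bucket_size`."""
    return -(-num_residues // bucket_size) * bucket_size


def count_compilations(lengths, bucket_size=None) -> int:
    """Count distinct input sizes, i.e. how often the network is compiled.

    Without bucketing every distinct length needs its own compilation;
    with bucketing, lengths sharing a padded size share one compilation.
    """
    if bucket_size is None:
        return len(set(lengths))
    return len({padded_length(n, bucket_size) for n in lengths})
```

Larger buckets mean fewer compilations but more wasted computation on padding, so the bucket size is a trade-off to tune for the length distribution at hand.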

We do not provide a bulk inference script, but it should be straightforward to develop on top of the RunModel.predict method with a parallel system for precomputing multi-sequence alignments. Alternatively, this script can be run repeatedly with only moderate overhead.

Note on CASP14 reproducibility

AlphaFold's output for a small number of proteins has high inter-run variance, and may be affected by changes in the input data. The CASP14 target T1064 is a notable example; the large number of SARS-CoV-2-related sequences recently deposited changes its MSA significantly. This variability is somewhat mitigated by the model selection process: running 5 models and taking the most confident.

To reproduce the results of our CASP14 system as closely as possible you must use the same database versions we used in CASP. These may not match the default versions downloaded by our scripts.

For genetics:

UniRef90: v2020_01
MGnify: v2018_12
Uniclust30: v2018_08
BFD: only version available

For templates:

PDB: (downloaded 2020-05-14)
PDB70: 2020-05-13

An alternative for templates is to use the latest PDB and PDB70, but pass the flag --max_template_date=2020-05-14, which restricts templates only to structures that were available at the start of CASP14.
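The effect of a template date cutoff can be pictured as a filter over release dates. The sketch below is illustrative only: the function and the template identifiers are hypothetical, and treating the cutoff date as inclusive is an assumption, not a statement about how AlphaFold implements `--max_template_date`.

```python
from datetime import date


def filter_templates(release_dates: dict, max_template_date: str) -> list:
    """Return the template IDs released on or before the cutoff date.

    `release_dates` maps a template ID to an ISO date string (YYYY-MM-DD).
    The inclusive comparison is an assumption made for this sketch.
    """
    cutoff = date.fromisoformat(max_template_date)
    return sorted(
        template_id
        for template_id, released in release_dates.items()
        if date.fromisoformat(released) <= cutoff
    )
```

With a cutoff of `2020-05-14`, any structure released later is dropped, which is how an up-to-date PDB can still be used to approximate the CASP14 template set.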

Citing this work

If you use the code or data in this package, please cite:

@Article{AlphaFold2021,
  author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
  journal = {Nature},
  title   = {Highly accurate protein structure prediction with {AlphaFold}},
  year    = {2021},
  volume  = {596},
  number  = {7873},
  pages   = {583--589},
  doi     = {10.1038/s41586-021-03819-2}
}

In addition, if you use the AlphaFold-Multimer mode, please cite:

@article {AlphaFold-Multimer2021,
  author       = {Evans, Richard and O{\textquoteright}Neill, Michael and Pritzel, Alexander and Antropova, Natasha and Senior, Andrew and Green, Tim and {\v{Z}}{\'\i}dek, Augustin and Bates, Russ and Blackwell, Sam and Yim, Jason and Ronneberger, Olaf and Bodenstein, Sebastian and Zielinski, Michal and Bridgland, Alex and Potapenko, Anna and Cowie, Andrew and Tunyasuvunakool, Kathryn and Jain, Rishub and Clancy, Ellen and Kohli, Pushmeet and Jumper, John and Hassabis, Demis},
  journal      = {bioRxiv},
  title        = {Protein complex prediction with AlphaFold-Multimer},
  year         = {2021},
  elocation-id = {2021.10.04.463034},
  doi          = {10.1101/2021.10.04.463034},
  URL          = {https://www.biorxiv.org/content/early/2021/10/04/2021.10.04.463034},
  eprint       = {https://www.biorxiv.org/content/early/2021/10/04/2021.10.04.463034.full.pdf},
}

Community contributions

Colab notebooks provided by the community (please note that these notebooks may vary from our full AlphaFold system and we did not validate their accuracy):

The ColabFold AlphaFold2 notebook by Martin Steinegger, Sergey Ovchinnikov and Milot Mirdita, which uses an API hosted at the Södinglab based on the MMseqs2 server (Mirdita et al. 2019, Bioinformatics) for the multiple sequence alignment creation.

Acknowledgements

AlphaFold communicates with and/or references the following separate libraries and packages:

We thank all their contributors and maintainers!

Get in Touch

If you have any questions not covered in this overview, please contact the AlphaFold team at [email protected].

We would love to hear your feedback and understand how AlphaFold has been useful in your research. Share your stories with us at [email protected].

License and Disclaimer

This is not an officially supported Google product.

AlphaFold Code License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Model Parameters License

The AlphaFold parameters are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode

Third-party software

Use of the third-party software, libraries or code referred to in the Acknowledgements section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Mirrored Databases

The following databases have been mirrored by DeepMind, and are available with reference to the following:

BFD (unmodified), by Steinegger M. and Söding J., available under a Creative Commons Attribution-ShareAlike 4.0 International License.
BFD (modified), by Steinegger M. and Söding J., modified by DeepMind, available under a Creative Commons Attribution-ShareAlike 4.0 International License. See the Methods section of the AlphaFold proteome paper for details.
Uniref30: v2021_03 (unmodified), by Mirdita M. et al., available under a Creative Commons Attribution-ShareAlike 4.0 International License.
MGnify: v2022_05 (unmodified), by Mitchell AL et al., available free of all copyright restrictions and made fully and freely available for both non-commercial and commercial use under CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.


alphafold's Issues

Session crashed after using all available RAM (AlphaFold Colab)

Hi, I'm trying to get the oligomeric structure of a protein. I'm able to get the monomer through AlphaFold Colab, but when I try to use the oligomeric feature it crashes.
Error # Your session crashed after using all available RAM. If you are interested in access to high-RAM runtimes, you may want to check out Colab Pro.

NameError Traceback (most recent call last)
in ()
5 use_model = {}
6 if "model_params" not in dir(): model_params = {}
----> 7 for model_name in ["model_1","model_2","model_3","model_4","model_5"][:num_models]:
8 use_model[model_name] = True
9 if model_name not in model_params:
NameError: name 'num_models' is not defined

When I try to use a local runtime, I get another error:
Unrecognized runtime "python3"; defaulting to "python2"
Please help if anyone has a solution for this.
Thanks
Pankaj

Is there a way to get per-residue embeddings from the model?

Pretty much self-explanatory.

running via docker: cannot initialize backend TPU

In trying to repro the Google Cloud setup specified, I am seeing the following log messages using the Docker image approach. They are "INFO" level -- are they an indication that my setup is incorrect and I would get sub-optimal performance?

I0718 14:02:57.560956 139711506118464 run_docker.py:180] I0718 14:02:57.560250 139800994285376 tpu_client.py:54] Starting the local TPU driver.
I0718 14:02:57.561176 139711506118464 run_docker.py:180] I0718 14:02:57.560675 139800994285376 xla_bridge.py:231] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0718 14:02:57.777399 139711506118464 run_docker.py:180] I0718 14:02:57.776700 139800994285376 xla_bridge.py:231] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.

FYI, my setup instructions

About distogram format

Thanks for the amazing work! I've managed to make it work on my machine; it was really easy as the instructions are very clear. I just have a question about the output of the program, in particular the distogram. I've seen that distograms are stored in the result_model_{1,2,3,4,5}.pkl pickle files, under the distogram key. However, I'm not sure how to interpret this subdictionary. My suspicion is that bin_edges contains the distance boundaries for each bin (closed or open intervals?), and logits is a 3D array with shape (nres, nres, bins) where each z-layer corresponds to the probability of the residue pair being in the bin defined by bin_edges. Is this the case? Thanks!
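One way to explore the question above is to turn the logits into per-bin probabilities with a softmax over the last axis. The following is only a sketch on synthetic data: the helper names distogram_probabilities and bin_bounds are hypothetical, and it assumes that bin_edges holds one fewer value than the number of bins, so the first bin is taken to run from 0 up to the first edge and the last bin from the final edge upward. That layout is an assumption to check against your own result files, not a statement about the stored format.

```python
import numpy as np


def distogram_probabilities(logits):
    """Softmax over the last (bin) axis of an (nres, nres, nbins) array."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the per-pair maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


def bin_bounds(bin_edges):
    """Return (lower, upper) bounds per bin.

    Assumption: len(bin_edges) == nbins - 1, with an open-ended last bin.
    """
    edges = np.asarray(bin_edges, dtype=np.float64)
    lower = np.concatenate(([0.0], edges))
    upper = np.concatenate((edges, [np.inf]))
    return lower, upper


# Synthetic stand-in for result["distogram"]; not real model output.
rng = np.random.default_rng(0)
nres, nbins = 5, 8
logits = rng.normal(size=(nres, nres, nbins))
bin_edges = np.linspace(2.0, 20.0, nbins - 1)

probs = distogram_probabilities(logits)
lower, upper = bin_bounds(bin_edges)
most_likely_bin = probs.argmax(axis=-1)  # (nres, nres) index of the top bin
print(probs.shape, lower[most_likely_bin[0, 1]], upper[most_likely_bin[0, 1]])
```

With a real file, the synthetic arrays would be replaced by the logits and bin_edges entries of the unpickled distogram dictionary.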

BFD download bottleneck

Hi,
this is not really an issue with the code but it seems that downloading the BFD database is a real bottleneck (at least for me). I tried aria2c and wget and with both I only get around 200 kb/s download rate. It is certainly not a limit of bandwidth on my side. Estimated time of completion is 9d! Does anyone maybe have the file mirrored?

Operating system?

What's the preferred operating system? Can it be set up on Windows?

About RuntimeError of hhsearch

Because of network issues, I use the non-Docker version. During inference, there was a problem with hhsearch:
RuntimeError: HHSearch failed:
stdout:
stderr:

00:22:03.447 INFO: /tmp/tmpkkewtvxt/query.a3m is in A2M, A3M or FASTA format
00:22:03.593 INFO: Searching 80799 database HHMs without prefiltering
00:22:03.629 INFO: Iteration 1
00:22:03.986 INFO: Scoring 80799 HMMs using HMM-HMM Viterbi alignment
00:22:04.056 INFO: Alternative alignment: 0
00:23:25.517 INFO: 80799 alignments done
00:23:25.786 INFO: Alternative alignment: 1
00:23:30.387 INFO: 3802 alignments done
00:23:30.391 INFO: Alternative alignment: 2
00:23:31.297 INFO: 438 alignments done
00:23:31.297 INFO: Alternative alignment: 3

Can someone give me some suggestions or solutions? Thanks.

Rerunning models using existing MSAs

Is it possible to run the feature calculation process using the MSAs computed on an earlier run? Does one need the HHR file corresponding to hhsearch_hits below?

https://github.com/deepmind/alphafold/blob/main/alphafold/data/pipeline.py#L147

It doesn't look like it's saved as the other sto/a3m files are.

Commercial license: how to obtain one.

Hi,

I would like to use AlphaFold for commercial purposes. Is it possible to license the model?

AlphaFold Prediction Server

An online service for predicting the structure of a protein (similar to I-TASSER server) would be useful for academic users.

predicted TM-score (pTM) and predicted aligned errors

Using the same random seed, it looks like 'model_1_ptm' and 'model_1' do not output identical results.
Is there any reason to use 'model_1_ptm', 'model_2_ptm', 'model_3_ptm', 'model_4_ptm', 'model_5_ptm' rather than 'model_1', 'model_2', 'model_3', 'model_4', 'model_5'?

Is there any difference in accuracy between them, please?

uniclust30 bottleneck

Thank you for your work fixing #13 -- that really helped!

I am now seeing a similar download bottleneck with uniclust30 (it has effectively stalled).

jax and jaxlib versions

TL;DR is it okay to use the jaxlib version of 0.1.68+cuda110 instead of 0.1.69+cuda110?

I have tried to write a script and construct a conda environment that does not use Docker. When I used the same versions of jax and jaxlib defined in docker/Dockerfile, I had some issues during inference. The scripts worked fine for model_{1,3,4} but raised CUDA_ERROR_ILLEGAL_ADDRESS errors for model_{2,5}. I have no idea why this happened.
So I tested many variants of the environment and found that jax=0.2.17 (probably the same version as the original) and jaxlib=0.1.68+cuda110 run smoothly without Docker in my custom conda environment. This jaxlib version is the one installed together with jax by the command
pip3 install jax[cuda110] -f https://storage.googleapis.com/jax-releases/jax_releases.html
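When comparing environments like the ones described above, it can help to record exactly which jax and jaxlib versions are installed. The snippet below is a small sketch using only the standard library (importlib.metadata, Python 3.8+); the helper name installed_versions is hypothetical, and a value of None simply means the package was not found in the current environment.

```python
from importlib import metadata


def installed_versions(packages=("jax", "jaxlib")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print the versions found in the current Python environment.
print(installed_versions())
```

Running this once inside the Docker image and once inside the conda environment gives two dictionaries that can be compared directly.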

Odd release in conjunction with RoseTTAFold gaining traction

While it's great you finally released your code publicly (after months of criticism for keeping it secret against the ethos of mutual aid in the scientific world), it certainly seems strange to do it just around the time RoseTTAFold, a system that does nearly the same thing for free at a fraction of the computational cost, gains significant traction in news stories.

That sudden release might give the impression of an orchestrated effort of your PR department to bury the RoseTTAFold news story.

I'm certain you will deny any of this, but to be beyond the shadow of a doubt in the future, DeepMind might want to share their work openly and proactively with the scientific world, instead of releasing it in what seems a defensive way.

failed to build docker

Wonderful repo! Thanks so much to the AlphaFold2 team. I have a problem building the AlphaFold Docker image. The error output is shown below. It seems that the related CUDA resource is not found. Thanks, all.

$ docker build -f docker/Dockerfile -t alphafold .

Sending build context to Docker daemon  12.69MB
Step 1/19 : ARG CUDA=11.0
Step 2/19 : FROM nvidia/cuda:${CUDA}-base
 ---> 2ec708416bb8
Step 3/19 : ARG CUDA
 ---> Using cache
 ---> 076eace7d488
Step 4/19 : SHELL ["/bin/bash", "-c"]
 ---> Using cache
 ---> b57b88dc2b9a
Step 5/19 : RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y       build-essential       cmake       cuda-command-line-tools-${CUDA/./-}       git       hmmer       kalign       tzdata       wget     && rm -rf /var/lib/apt/lists/*
 ---> Running in 52e953f25c2b
...
...


E: Failed to fetch https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/by-hash/SHA256/27b2dbb347d54776dec155d98e1eefbde6c10a3fd1295007c3e836cfd9b98522  404  Not Found [IP: 58.205.210.80 443]
E: Some index files failed to download. They have been ignored, or old ones used instead.

Using a subset of genetic databases

Is it possible to use a subset of the genetic databases for running the model end to end for a particular protein sequence?

Alphafold protein domain compartmentalization

AlphaFold has an issue when it comes to a class of proteins known as membrane proteins. Briefly, membrane proteins are located in multiple cell compartments, where the domains of the membrane protein are separated by a physical barrier.

I have encountered multiple examples of AlphaFold becoming confused and trying to make domains that are biologically separated interact. Luckily, a large training set of annotated proteins already exists in the UniProt database. For example, the insulin receptor (https://www.uniprot.org/uniprot/P06213#subcellular_location) has extracellular domains, a transmembrane domain and an intracellular domain. Thus only domains in the same compartment need to be close to each other in the final fold, whereas domains in separate compartments need only be within a distance that can be spanned by the peptide chain.

Additionally, some proteins are modified after translation. These modifications include the formation of disulfide bridges, which lock the distance and geometry of the two cysteine residues involved. This information is also available in UniProt (https://www.uniprot.org/uniprot/P06213#ptm_processing).

While I recognize that AlphaFold was designed for de novo prediction, I wonder how much more powerful it could become if it were allowed to incorporate known biochemical information into its predictions.

While this is not an issue per se, it is something that should be tried.

Is there any way to turn off the relaxation option without modifications?

Hi,

I want to turn off the Amber relaxation option without modifying the code.

When I commented out the relaxation step in the script run_alphafold.py,

I was able to turn it off.

But is it also possible to control it with a command-line argument?

Thanks.

Download data "uniclust30" problem

Hi,

I cannot download the uniclust30 dataset with the script. I also cannot download uniref30 for another repository, RoseTTAFold.

I am not a biology major and I am not familiar with the organization behind wwwuser.gwdg.de.
Has anyone actually been able to download these datasets?

Thanks!!!

Following are the download URLs:
uniclust30: http://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz
uniref30: http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz

Error downloading PDB mmCIF

When I run the scripts/download_all_data.sh to download the databases I get the error:

Downloading PDB mmCIF files...
Running rsync to fetch all mmCIF files (note that the rsync progress estimate might be inaccurate)...
rsync: failed to connect to rsync.rcsb.org (132.249.210.234): Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(125) [Receiver=3.1.2]

Proper release and build instructions ?

Hi,
This code is generating a lot of interest in the advanced research computing community. Unfortunately, it currently cannot be supported on most clusters because of its use of Docker and its lack of release versions.

Any plan to address these would be appreciated:

1) Make proper release versions (I recommend Semantic Versioning: https://semver.org/lang)
2) Provide instructions to install the code without the use of containers or Anaconda (I suggest using Autoconf or CMake as a build tool)
3) If 2) is impossible or too hard, providing a Singularity container, rather than Docker, would already be a good start.

Docker image missing cuda libs

When using a docker image created from the repo, libs like cusolver and libcudnn are not installed. I get the following set of warnings, ending with "Skipping registering GPU devices".

Was the docker image meant to include these libs? I see libcusolver.so.10 (a different version than listed in the warning), in the 11.0-runtime image (the Dockerfile is based on 11.0-base). And cudnn would require an additional install step.

I0718 14:20:35.488188 139711506118464 run_docker.py:180] 2021-07-18 14:20:35.487292: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I0718 14:20:35.632766 139711506118464 run_docker.py:180] 2021-07-18 14:20:35.632116: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I0718 14:20:35.633013 139711506118464 run_docker.py:180] 2021-07-18 14:20:35.632167: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
I0718 14:20:35.633130 139711506118464 run_docker.py:180] Skipping registering GPU devices...

"stereo_chemical_props.txt" missing for model relaxation

Hi Developers of AlphaFold2,

Thanks for sharing alphafold! It's really exciting!

I noticed that a file, alphafold/common/stereo_chemical_props.txt, containing structural parameters, is needed for model relaxation, as indicated in alphafold/common/residue_constants.py line 406. But this file is actually missing from the alphafold/common folder.

Although it's not hard to find a similar file in openstructure (https://openstructure.org), I think adding this file to the folder would be great for others!

Best,
Junhui

OpenMM patch fails in docker build (python version discrepancy)

When I attempted to build the Docker container for alphafold, the build process failed at step 16/19. This step involves applying a patch to OpenMM.

I think the issue is that the patch tries to apply to /opt/conda/lib/python3.8 whereas the Docker build image has Python 3.9 installed. I don't know whether this is caused by a change in miniconda-latest or OS default python versions.

(Ubuntu 18.04 LTS host OS)

Predicted residuewise errors

Thank you for the nice work! It worked smoothly and was faster than I expected :)

I have a question about the predicted residue-wise errors. I guess the information is stored in $output_dir/result_model_{1,2,3,4,5}.pkl, but it is not recorded in the output PDB files. I found that pkl['plddt'] holds the predicted residue-wise errors. Is that what I am looking for?
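To inspect the per-residue values mentioned above, the pickle can be loaded and the plddt entry summarized. This is a minimal sketch, not the project's own tooling: load_plddt and summarize_plddt are hypothetical helpers, the demo file is synthetic, and the cutoff of 70 is only an illustrative threshold. Note that pLDDT is a confidence score on a 0 to 100 scale where higher means more confident, so low values mark the residues with larger expected error.

```python
import os
import pickle
import tempfile

import numpy as np


def load_plddt(result_path):
    """Read the 'plddt' entry from a pickled result dictionary."""
    with open(result_path, "rb") as handle:
        result = pickle.load(handle)
    return np.asarray(result["plddt"], dtype=float)


def summarize_plddt(plddt, threshold=70.0):
    """Mean confidence plus 1-based indices of residues below the threshold."""
    low = [index + 1 for index, value in enumerate(plddt) if value < threshold]
    return {"mean": float(np.mean(plddt)), "low_confidence_residues": low}


# Synthetic stand-in for a result_model_N.pkl file; not real model output.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "result_demo.pkl")
    with open(demo_path, "wb") as handle:
        pickle.dump({"plddt": np.array([91.2, 85.0, 64.5, 40.1])}, handle)
    summary = summarize_plddt(load_plddt(demo_path))

print(summary)
```

Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.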

RuntimeError: cuSolver internal error

Did anyone solve the cuSolver internal error?

I0716 21:59:04.723145 139668278384448 run_docker.py:180] WARNING:tensorflow:From /app/alphafold/alphafold/model/tf/input_pipeline.py:151: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
I0716 21:59:04.723469 139668278384448 run_docker.py:180] Instructions for updating:
I0716 21:59:04.723672 139668278384448 run_docker.py:180] Use fn_output_signature instead
I0716 21:59:04.723857 139668278384448 run_docker.py:180] W0716 13:59:04.722220 140425547745088 deprecation.py:528] From /app/alphafold/alphafold/model/tf/input_pipeline.py:151: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
I0716 21:59:04.724043 139668278384448 run_docker.py:180] Instructions for updating:
I0716 21:59:04.724218 139668278384448 run_docker.py:180] Use fn_output_signature instead
I0716 21:59:08.106853 139668278384448 run_docker.py:180] 2021-07-16 13:59:08.105871: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I0716 21:59:08.107234 139668278384448 run_docker.py:180] 2021-07-16 13:59:08.105914: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
I0716 21:59:08.107458 139668278384448 run_docker.py:180] Skipping registering GPU devices...
I0716 21:59:09.326027 139668278384448 run_docker.py:180] I0716 13:59:09.324977 140425547745088 model.py:132] Running predict with shape(feat) = {'aatype': (4, 44), 'residue_index': (4, 44), 'seq_length': (4,), 'template_aatype': (4, 4, 44), 'template_all_atom_masks': (4, 4, 44, 37), 'template_all_atom_positions': (4, 4, 44, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 44), 'msa_mask': (4, 508, 44), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 44, 3), 'template_pseudo_beta_mask': (4, 4, 44), 'atom14_atom_exists': (4, 44, 14), 'residx_atom14_to_atom37': (4, 44, 14), 'residx_atom37_to_atom14': (4, 44, 37), 'atom37_atom_exists': (4, 44, 37), 'extra_msa': (4, 5120, 44), 'extra_msa_mask': (4, 5120, 44), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 44), 'true_msa': (4, 508, 44), 'extra_has_deletion': (4, 5120, 44), 'extra_deletion_value': (4, 5120, 44), 'msa_feat': (4, 508, 44, 49), 'target_feat': (4, 44, 22)}
I0716 21:59:58.832500 139668278384448 run_docker.py:180] 2021-07-16 13:59:58.831660: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I0716 21:59:58.929851 139668278384448 run_docker.py:180] Traceback (most recent call last):
I0716 21:59:58.930104 139668278384448 run_docker.py:180] File "/app/alphafold/run_alphafold.py", line 283, in <module>
I0716 21:59:58.930308 139668278384448 run_docker.py:180] app.run(main)
I0716 21:59:58.930593 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
I0716 21:59:58.930781 139668278384448 run_docker.py:180] _run_main(main, args)
I0716 21:59:58.930959 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
I0716 21:59:58.931135 139668278384448 run_docker.py:180] sys.exit(main(argv))
I0716 21:59:58.931310 139668278384448 run_docker.py:180] File "/app/alphafold/run_alphafold.py", line 255, in main
I0716 21:59:58.931483 139668278384448 run_docker.py:180] predict_structure(
I0716 21:59:58.931658 139668278384448 run_docker.py:180] File "/app/alphafold/run_alphafold.py", line 137, in predict_structure
I0716 21:59:58.931832 139668278384448 run_docker.py:180] prediction_result = model_runner.predict(processed_feature_dict)
I0716 21:59:58.932008 139668278384448 run_docker.py:180] File "/app/alphafold/alphafold/model/model.py", line 134, in predict
I0716 21:59:58.932183 139668278384448 run_docker.py:180] result = self.apply(self.params, jax.random.PRNGKey(0), feat)
I0716 21:59:58.932358 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 183, in reraise_with_filtered_traceback
I0716 21:59:58.932534 139668278384448 run_docker.py:180] return fun(*args, **kwargs)
I0716 21:59:58.932709 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/_src/api.py", line 424, in cache_miss
I0716 21:59:58.932884 139668278384448 run_docker.py:180] out_flat = xla.xla_call(
I0716 21:59:58.933057 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 1560, in bind
I0716 21:59:58.933231 139668278384448 run_docker.py:180] return call_bind(self, fun, *args, **params)
I0716 21:59:58.933458 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 1551, in call_bind
I0716 21:59:58.933639 139668278384448 run_docker.py:180] outs = primitive.process(top_trace, fun, tracers, params)
I0716 21:59:58.933813 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 1563, in process
I0716 21:59:58.933987 139668278384448 run_docker.py:180] return trace.process_call(self, fun, tracers, params)
I0716 21:59:58.934161 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 606, in process_call
I0716 21:59:58.934336 139668278384448 run_docker.py:180] return primitive.impl(f, *tracers, **params)
I0716 21:59:58.934510 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/interpreters/xla.py", line 592, in _xla_call_impl
I0716 21:59:58.934684 139668278384448 run_docker.py:180] compiled_fun = _xla_callable(fun, device, backend, name, donated_invars,
I0716 21:59:58.934857 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/linear_util.py", line 262, in memoized_fun
I0716 21:59:58.935029 139668278384448 run_docker.py:180] ans = call(fun, *args)
I0716 21:59:58.935202 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/interpreters/xla.py", line 723, in _xla_callable
I0716 21:59:58.935374 139668278384448 run_docker.py:180] out_nodes = jaxpr_subcomp(
I0716 21:59:58.935548 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/interpreters/xla.py", line 462, in jaxpr_subcomp
I0716 21:59:58.935724 139668278384448 run_docker.py:180] ans = rule(c, axis_env, extend_name_stack(name_stack, eqn.primitive.name),
I0716 21:59:58.935896 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/_src/lax/control_flow.py", line 350, in _while_loop_translation_rule
I0716 21:59:58.936069 139668278384448 run_docker.py:180] new_z = xla.jaxpr_subcomp(body_c, body_jaxpr.jaxpr, backend, axis_env,
I0716 21:59:58.936244 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/interpreters/xla.py", line 462, in jaxpr_subcomp
I0716 21:59:58.936418 139668278384448 run_docker.py:180] ans = rule(c, axis_env, extend_name_stack(name_stack, eqn.primitive.name),
I0716 21:59:58.936592 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/interpreters/xla.py", line 1040, in f
I0716 21:59:58.936766 139668278384448 run_docker.py:180] outs = jaxpr_subcomp(c, jaxpr, backend, axis_env, _xla_consts(c, consts),
I0716 21:59:58.936941 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/interpreters/xla.py", line 462, in jaxpr_subcomp
I0716 21:59:58.937116 139668278384448 run_docker.py:180] ans = rule(c, axis_env, extend_name_stack(name_stack, eqn.primitive.name),
I0716 21:59:58.937289 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/_src/lax/control_flow.py", line 350, in _while_loop_translation_rule
I0716 21:59:58.937474 139668278384448 run_docker.py:180] new_z = xla.jaxpr_subcomp(body_c, body_jaxpr.jaxpr, backend, axis_env,
I0716 21:59:58.937648 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/interpreters/xla.py", line 453, in jaxpr_subcomp
I0716 21:59:58.937821 139668278384448 run_docker.py:180] ans = rule(c, *in_nodes, **eqn.params)
I0716 21:59:58.937993 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jax/_src/lax/linalg.py", line 503, in _eigh_cpu_gpu_translation_rule
I0716 21:59:58.938167 139668278384448 run_docker.py:180] v, w, info = syevd_impl(c, operand, lower=lower)
I0716 21:59:58.938340 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jaxlib/cusolver.py", line 281, in syevd
I0716 21:59:58.938514 139668278384448 run_docker.py:180] lwork, opaque = cusolver_kernels.build_syevj_descriptor(
I0716 21:59:58.938688 139668278384448 run_docker.py:180] jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: cuSolver internal error
I0716 21:59:58.938864 139668278384448 run_docker.py:180]
I0716 21:59:58.939038 139668278384448 run_docker.py:180] The stack trace below excludes JAX-internal frames.
I0716 21:59:58.939213 139668278384448 run_docker.py:180] The preceding is the original exception that occurred, unmodified.
I0716 21:59:58.939386 139668278384448 run_docker.py:180] --------------------
I0716 21:59:58.939731 139668278384448 run_docker.py:180]
I0716 21:59:58.939903 139668278384448 run_docker.py:180] The above exception was the direct cause of the following exception:
I0716 21:59:58.940074 139668278384448 run_docker.py:180]
I0716 21:59:58.940248 139668278384448 run_docker.py:180] Traceback (most recent call last):
I0716 21:59:58.940423 139668278384448 run_docker.py:180] File "/app/alphafold/run_alphafold.py", line 283, in <module>
I0716 21:59:58.940596 139668278384448 run_docker.py:180] app.run(main)
I0716 21:59:58.940770 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
I0716 21:59:58.940943 139668278384448 run_docker.py:180] _run_main(main, args)
I0716 21:59:58.941116 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
I0716 21:59:58.941291 139668278384448 run_docker.py:180] sys.exit(main(argv))
I0716 21:59:58.941488 139668278384448 run_docker.py:180] File "/app/alphafold/run_alphafold.py", line 255, in main
I0716 21:59:58.941663 139668278384448 run_docker.py:180] predict_structure(
I0716 21:59:58.941836 139668278384448 run_docker.py:180] File "/app/alphafold/run_alphafold.py", line 137, in predict_structure
I0716 21:59:58.942028 139668278384448 run_docker.py:180] prediction_result = model_runner.predict(processed_feature_dict)
I0716 21:59:58.942201 139668278384448 run_docker.py:180] File "/app/alphafold/alphafold/model/model.py", line 134, in predict
I0716 21:59:58.942372 139668278384448 run_docker.py:180] result = self.apply(self.params, jax.random.PRNGKey(0), feat)
I0716 21:59:58.942544 139668278384448 run_docker.py:180] File "/opt/conda/lib/python3.8/site-packages/jaxlib/cusolver.py", line 281, in syevd
I0716 21:59:58.942715 139668278384448 run_docker.py:180] lwork, opaque = cusolver_kernels.build_syevj_descriptor(
I0716 21:59:58.942886 139668278384448 run_docker.py:180] RuntimeError: cuSolver internal error

Non docker setup

Dear AlphaFold authors,

Since most IT departments do not allow Docker on their HPC clusters or production servers, we have made a small attempt to create a how-to for a non-Docker AlphaFold setup.

It can be found here: https://github.com/kalininalab/alphafold_non_docker

Thank you for the tool!

Kind regards,
Sanjay

link to paper in readme is down

When clicking the link to the paper I get the following

If you could share the proper link I'd be happy to submit a PR :)

GPU required?

The README implies that AlphaFold would at least prefer to use GPUs. However, it doesn't mention whether or not GPUs are required.

ValueError: Cannot create a tensor proto whose content is larger than 2GB.

Got a system for which the features.pkl is 5 GB. Any workaround for this?

Traceback (most recent call last):
  File "/X/alphafold/run_alphafold.py", line 283, in <module>
    app.run(main)
  File "/X/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/X/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/X/alphafold/run_alphafold.py", line 255, in main
    predict_structure(
  File "/X/alphafold/run_alphafold.py", line 132, in predict_structure
    processed_feature_dict = model_runner.process_features(
  File "/X/alphafold/alphafold/model/model.py", line 103, in process_features
    return features.np_example_to_features(
  File "/X/alphafold/alphafold/model/features.py", line 93, in np_example_to_features
    tensor_dict = proteins_dataset.np_to_tensor_dict(
  File "/X/alphafold/alphafold/model/tf/proteins_dataset.py", line 162, in np_to_tensor_dict
    tensor_dict = {k: tf.constant(v) for k, v in np_example.items()
  File "/X/alphafold/alphafold/model/tf/proteins_dataset.py", line 162, in <dictcomp>
    tensor_dict = {k: tf.constant(v) for k, v in np_example.items()
  File "/X/miniconda3/envs/alphafold/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 162, in constant_v1
    return _constant_impl(value, dtype, shape, name, verify_shape=verify_shape,
  File "/X/miniconda3/envs/alphafold/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 281, in _constant_impl
    tensor_util.make_tensor_proto(
  File "/X/miniconda3/envs/alphafold/lib/python3.8/site-packages/tensorflow/python/framework/tensor_util.py", line 527, in make_tensor_proto
    raise ValueError(
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
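As a first diagnostic for the error above, it can be useful to see which entries of the feature dictionary are large enough to hit the limit when passed to tf.constant. The sketch below is an assumption-laden illustration rather than a fix: oversized_features is a hypothetical helper, the 2**31-byte default is taken from the wording of the error message, and the demo uses tiny synthetic arrays with a small limit so that nothing large is allocated.

```python
import numpy as np


def oversized_features(feature_dict, limit_bytes=2**31):
    """List (name, size_in_bytes) for arrays at or above the limit, largest first."""
    sizes = {name: np.asarray(value).nbytes for name, value in feature_dict.items()}
    too_big = [(name, size) for name, size in sizes.items() if size >= limit_bytes]
    return sorted(too_big, key=lambda item: item[1], reverse=True)


# Synthetic stand-in for the unpickled features.pkl dictionary.
demo_features = {
    "msa": np.zeros((10, 100), dtype=np.int32),   # 4000 bytes
    "aatype": np.zeros(100, dtype=np.int32),      # 400 bytes
}

# A deliberately small limit so the demo flags the larger array.
print(oversized_features(demo_features, limit_bytes=1000))
```

Applied to the real feature dictionary with the default limit, this would show which arrays are responsible, which is a starting point for deciding what to reduce.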

Running on CUDA 10.X

Hi!

It looks like our server's GPU nodes only support up to CUDA 10.2. With downgraded versions of TensorFlow and other modules/packages, will the output be consistent with that produced by the default setup?

Thanks!

GDT Score

Hi,
I want to get GDT and lDDT scores.

How can I get GDT and lDDT scores?
- Can I compute them from the PDB files?
- Is there any support for this?

Please suggest a solution.
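For the GDT part of the question, a rough score can be computed from matched C-alpha coordinates once the predicted and reference structures have been superimposed. The function below is only a simplified sketch: gdt_ts_superimposed is a hypothetical helper, the coordinates are synthetic, and it assumes a single fixed superposition, whereas a full GDT_TS calculation searches over superpositions to maximize the fraction of residues under each cutoff. It does not cover lDDT.

```python
import numpy as np


def gdt_ts_superimposed(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of residues within each cutoff (in angstroms), times 100.

    Assumes both arrays have shape (nres, 3), are matched residue by residue,
    and are already superimposed.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError("coordinate arrays must have the same shape")
    distances = np.linalg.norm(predicted - reference, axis=-1)
    fractions = [np.mean(distances <= cutoff) for cutoff in cutoffs]
    return 100.0 * float(np.mean(fractions))


# Synthetic coordinates: four residues displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = np.zeros((4, 3))
predicted = reference.copy()
predicted[:, 0] = [0.5, 1.5, 3.0, 10.0]

score = gdt_ts_superimposed(predicted, reference)
print(score)
```

Because the superposition is not optimized here, the value should be read as a conservative estimate compared with dedicated structure-comparison tools.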

how to get databases without aria2?

I cannot get aria2 to install (sudo yum install aria2). Is there a way to get the databases without aria2?

Protein Embedding with last activation layers?

Is it possible to obtain the last activation values using AlphaFold?

Something like what ESM allows with its model.forward method.

Failure on Nvidia devices with compute capability 8.6

Just an FYI. Running on an Ampere A10 with CC 8.6.

I0725 13:13:22.066373 139761799472960 run_docker.py:200] 2021-07-25 03:13:22.065889: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:235] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.6
I0725 13:13:22.066596 139761799472960 run_docker.py:200] 2021-07-25 03:13:22.065923: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:238] Used ptxas at ptxas
I0725 13:13:22.066839 139761799472960 run_docker.py:200] 2021-07-25 03:13:22.066559: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:625] failed to get PTX kernel "shift_right_logical_3" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
I0725 13:13:22.067006 139761799472960 run_docker.py:200] 2021-07-25 03:13:22.066620: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2040] Execution of replica 0 failed: Internal: Could not find the corresponding function
I0725 13:13:22.068676 139761799472960 run_docker.py:200] Traceback (most recent call last):
I0725 13:13:22.068786 139761799472960 run_docker.py:200] File "/app/alphafold/run_alphafold.py", line 303, in <module>
I0725 13:13:22.068876 139761799472960 run_docker.py:200] app.run(main)
I0725 13:13:22.068992 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0725 13:13:22.069080 139761799472960 run_docker.py:200] _run_main(main, args)
I0725 13:13:22.069165 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0725 13:13:22.069255 139761799472960 run_docker.py:200] sys.exit(main(argv))
I0725 13:13:22.069342 139761799472960 run_docker.py:200] File "/app/alphafold/run_alphafold.py", line 285, in main
I0725 13:13:22.069428 139761799472960 run_docker.py:200] random_seed=random_seed)
I0725 13:13:22.069509 139761799472960 run_docker.py:200] File "/app/alphafold/run_alphafold.py", line 149, in predict_structure
I0725 13:13:22.069588 139761799472960 run_docker.py:200] prediction_result = model_runner.predict(processed_feature_dict)
I0725 13:13:22.069675 139761799472960 run_docker.py:200] File "/app/alphafold/alphafold/model/model.py", line 134, in predict
I0725 13:13:22.069755 139761799472960 run_docker.py:200] result = self.apply(self.params, jax.random.PRNGKey(0), feat)
I0725 13:13:22.069834 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/_src/random.py", line 75, in PRNGKey
I0725 13:13:22.069914 139761799472960 run_docker.py:200] k1 = convert(lax.shift_right_logical(seed_arr, lax._const(seed_arr, 32)))
I0725 13:13:22.070003 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 382, in shift_right_logical
I0725 13:13:22.070081 139761799472960 run_docker.py:200] return shift_right_logical_p.bind(x, y)
I0725 13:13:22.070159 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 264, in bind
I0725 13:13:22.070236 139761799472960 run_docker.py:200] out = top_trace.process_primitive(self, tracers, params)
I0725 13:13:22.070315 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 604, in process_primitive
I0725 13:13:22.070394 139761799472960 run_docker.py:200] return primitive.impl(*tracers, **params)
I0725 13:13:22.070472 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 262, in apply_primitive
I0725 13:13:22.070549 139761799472960 run_docker.py:200] return compiled_fun(*args)
I0725 13:13:22.070631 139761799472960 run_docker.py:200] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 378, in _execute_compiled_primitive
I0725 13:13:22.070705 139761799472960 run_docker.py:200] out_bufs = compiled.execute(input_bufs)
I0725 13:13:22.070770 139761799472960 run_docker.py:200] RuntimeError: Internal: Could not find the corresponding function

ModuleNotFoundError

Hi,
When I follow the steps to run AlphaFold via the Colab notebook, the "Search against genetic databases" cell cannot find the 'alphafold.model' module. Note: the "Download AlphaFold" cell ran without problems, so I assume the module should be installed.

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-22-1640ae7ce3a3> in <module>()
     24 import py3Dmol
     25 
---> 26 from alphafold.model import model
     27 from alphafold.model import config
     28 from alphafold.model import data

ModuleNotFoundError: No module named 'alphafold.model'
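A quick way to narrow down an error like the one above is to check, from a fresh notebook cell, whether the package and its submodule can be located at all. The following is only a diagnostic sketch: the helper name `module_available` is hypothetical, and it does not explain why the module is missing in a given Colab session.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located by the import system.

    Note: for a dotted name, find_spec imports the parent package first.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package (here, `alphafold`) is missing
        # or fails to import.
        return False


for name in ("alphafold", "alphafold.model"):
    status = "found" if module_available(name) else "MISSING"
    print(f"{name}: {status}")
```

If `alphafold` itself is reported as missing, the install step probably did not complete in the current runtime; if only `alphafold.model` is missing, a different package or directory named `alphafold` may be shadowing the intended one. Both readings are assumptions to verify, not confirmed diagnoses.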

How long does it take on T1050 (779 residues)

It is still extracting the MSA.
Could anyone share the running time on T1050 (779 residues)? Thanks very much!

GPU supported

Hi,
I want to use AlphaFold 2, but my Linux server has only a CPU and no GPU. Is that OK? Thank you!
hqin

ValueError: Could not find Jackhmmer database /mnt/mgnify_database_path/mgy_clusters_2018_08.fa

The download script for MGnify downloaded and created the file mgy_clusters.fa; however, the run could not locate mgy_clusters_2018_08.fa, which seems to be referenced by /app/alphafold/alphafold/data/tools/jackhmmer.py

packages/absl/flags/_validators.py:203: UserWarning: Flag --preset has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
I0724 21:57:01.485210 139819066255168 run_docker.py:193] warnings.warn(
I0724 21:57:02.218386 139819066255168 run_docker.py:193] I0724 21:57:02.217257 140387317352256 templates.py:880] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I0724 21:57:02.224482 139819066255168 run_docker.py:193] E0724 21:57:02.223666 140387317352256 jackhmmer.py:65] Could not find Jackhmmer database /mnt/mgnify_database_path/mgy_clusters_2018_08.fa
I0724 21:57:02.225311 139819066255168 run_docker.py:193] Traceback (most recent call last):
I0724 21:57:02.225568 139819066255168 run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 283, in <module>
I0724 21:57:02.225908 139819066255168 run_docker.py:193] app.run(main)
I0724 21:57:02.226118 139819066255168 run_docker.py:193] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
I0724 21:57:02.226256 139819066255168 run_docker.py:193] _run_main(main, args)
I0724 21:57:02.226412 139819066255168 run_docker.py:193] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
I0724 21:57:02.226542 139819066255168 run_docker.py:193] sys.exit(main(argv))
I0724 21:57:02.226696 139819066255168 run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 218, in main
I0724 21:57:02.226851 139819066255168 run_docker.py:193] data_pipeline = pipeline.DataPipeline(
I0724 21:57:02.226989 139819066255168 run_docker.py:193] File "/app/alphafold/alphafold/data/pipeline.py", line 104, in __init__
I0724 21:57:02.227127 139819066255168 run_docker.py:193] self.jackhmmer_mgnify_runner = jackhmmer.Jackhmmer(
I0724 21:57:02.227246 139819066255168 run_docker.py:193] File "/app/alphafold/alphafold/data/tools/jackhmmer.py", line 66, in __init__
I0724 21:57:02.227406 139819066255168 run_docker.py:193] raise ValueError(f'Could not find Jackhmmer database {database_path}')
I0724 21:57:02.227567 139819066255168 run_docker.py:193] ValueError: Could not find Jackhmmer database /mnt/mgnify_database_path/mgy_clusters_2018_08.fa
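The traceback shows the failure only surfaces once the data pipeline is constructed. A small pre-flight check can report every missing database path up front. This is a sketch under assumptions: the helper `missing_databases` is hypothetical, and the paths in the example are copied from the log above purely as placeholders.

```python
import os
from typing import Dict, List


def missing_databases(paths: Dict[str, str]) -> List[str]:
    """Return '--flag=path' strings for every configured path that does not exist."""
    return [
        f"--{flag}={path}"
        for flag, path in paths.items()
        if not os.path.exists(path)
    ]


# Placeholder paths taken from the log above; adjust to the local setup.
configured = {
    "mgnify_database_path": "/mnt/mgnify_database_path/mgy_clusters_2018_08.fa",
    "obsolete_pdbs_path": "/mnt/obsolete_pdbs_path/obsolete.dat",
}
for entry in missing_databases(configured):
    print("Missing:", entry)
```

When the file on disk has a different name than the one configured (here mgy_clusters.fa versus mgy_clusters_2018_08.fa), one possible cause is that the download script and the run script come from different revisions of the repository. That is an assumption, not something the log confirms.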

How many GPUs has Alphafold used for training?

Supplementary Table 4 of the published Nature paper says that initial training takes about 7 days and fine-tuning about 4 days, but we can't find any mention of the number of GPUs that were used. Could you please clarify that?

ModuleNotFoundError

I am running the Colab notebook for AlphaFold, but it gives me the following error even though I followed the steps. I ran it once on one protein and it worked, but now it cannot seem to find the module. I hope you can help.

Regards,
Rubin.

ModuleNotFoundError Traceback (most recent call last)
in ()
24 import py3Dmol
25
---> 26 from alphafold.model import model
27 from alphafold.model import config
28 from alphafold.model import data

ModuleNotFoundError: No module named 'alphafold.model'

NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

Failed to build docker on Step 16/19 - patch issue

I'm using a nvidia-gpu-cloud-image on GCP with all the specifications given in the readme. When I run

docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

I see the expected output.

Building the Docker image failed the first time, and when I tried restarting I got this error:

Sending build context to Docker daemon  12.69MB
Step 1/19 : ARG CUDA=11.0
Step 2/19 : FROM nvidia/cuda:${CUDA}-base
 ---> 2ec708416bb8
Step 3/19 : ARG CUDA
 ---> Using cache
 ---> 3cd6e5d7aef1
Step 4/19 : SHELL ["/bin/bash", "-c"]
 ---> Using cache
 ---> 657140758d4a
Step 5/19 : RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y       build-essential       cmake       cuda-command-line-tools-${CUDA/./-}       git       hmmer       kalign       tzdata       wget     && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> d2d4f3239339
Step 6/19 : RUN git clone --branch v3.3.0 https://github.com/soedinglab/hh-suite.git /tmp/hh-suite     && mkdir /tmp/hh-suite/build
 ---> Using cache
 ---> a080135b9366
Step 7/19 : WORKDIR /tmp/hh-suite/build
 ---> Using cache
 ---> 51cca5fbe9a6
Step 8/19 : RUN cmake -DCMAKE_INSTALL_PREFIX=/opt/hhsuite ..     && make -j 4 && make install     && ln -s /opt/hhsuite/bin/* /usr/bin     && rm -rf /tmp/hh-suite
 ---> Using cache
 ---> ca9d782a9cf7
Step 9/19 : RUN wget -q -P /tmp   https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh     && bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda     && rm /tmp/Miniconda3-latest-Linux-x86_64.sh
 ---> Using cache
 ---> b0be57ee6667
Step 10/19 : ENV PATH="/opt/conda/bin:$PATH"
 ---> Using cache
 ---> bb0f01a8e0ce
Step 11/19 : RUN conda update -qy conda     && conda install -y -c conda-forge       openmm=7.5.1       cudatoolkit==${CUDA}.3       pdbfixer       pip
 ---> Using cache
 ---> 5b8200c0a89d
Step 12/19 : COPY . /app/alphafold
 ---> Using cache
 ---> 88f827ee547a
Step 13/19 : RUN wget -q -P /app/alphafold/alphafold/common/   https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
 ---> Using cache
 ---> e11c50f82963
Step 14/19 : RUN pip3 install --upgrade pip     && pip3 install -r /app/alphafold/requirements.txt     && pip3 install --upgrade jax jaxlib==0.1.69+cuda${CUDA/./} -f       https://storage.googleapis.com/jax-releases/jax_releases.html
 ---> Using cache
 ---> bb6bb12e43ad
Step 15/19 : WORKDIR /opt/conda/lib/python3.8/site-packages
 ---> Using cache
 ---> 47ea4f7f278b
Step 16/19 : RUN patch -p0 < /app/alphafold/docker/openmm.patch
 ---> Running in ffae81eced58
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: simtk/openmm/app/topology.py
|===================================================================
|--- simtk.orig/openmm/app/topology.py
|+++ simtk/openmm/app/topology.py
--------------------------
File to patch: 
Skip this patch? [y] 
Skipping patch.
1 out of 1 hunk ignored
The command '/bin/bash -c patch -p0 < /app/alphafold/docker/openmm.patch' returned a non-zero code: 1

Any idea what may be going wrong?
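The build log shows the patch being applied from a hard-coded /opt/conda/lib/python3.8/site-packages directory, while Miniconda is fetched as "latest". One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the installed Python version no longer matches that directory, so simtk/openmm/app/topology.py is not where the patch expects it. The sketch below, with a hypothetical helper `find_patch_target`, reports where the running interpreter keeps its site-packages and whether the file to be patched is there.

```python
import sysconfig
from pathlib import Path
from typing import Tuple


def find_patch_target(
    relative_path: str = "simtk/openmm/app/topology.py",
) -> Tuple[Path, bool]:
    """Return this interpreter's site-packages dir and whether the target file exists in it."""
    site_packages = Path(sysconfig.get_paths()["purelib"])
    return site_packages, (site_packages / relative_path).is_file()


site_packages, found = find_patch_target()
print(f"site-packages: {site_packages}")
print(f"patch target present: {found}")
```

Running this with the conda interpreter inside the partially built image would show whether the WORKDIR in the Dockerfile points at the directory that actually contains the OpenMM files.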

400 Client Error

Hello,

I am trying to run AlphaFold using docker (command: python3 docker/run_docker.py --use_gpu=False --fasta_paths=seq.fa --max_template_date=2021-12-12) and I get this error:

Traceback (most recent call last):
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/docker/api/client.py", line 268, in _raise_for_status
response.raise_for_status()
File "/usr/lib/python3/dist-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http+docker://localhost/v1.41/containers/create

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "[HOME_dir_redacted]/Software/alphafold/docker/run_docker.py", line 201, in
app.run(main)
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "[HOME_dir_redacted]/Software/alphafold/docker/run_docker.py", line 173, in main
container = client.containers.run(
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/docker/models/containers.py", line 811, in run
container = self.create(image=image, command=command,
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/docker/models/containers.py", line 870, in create
resp = self.client.api.create_container(**create_kwargs)
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/docker/api/container.py", line 430, in create_container
return self.create_container_from_config(config, name)
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/docker/api/container.py", line 441, in create_container_from_config
return self._result(res, True)
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/docker/api/client.py", line 274, in _result
self._raise_for_status(response)
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/docker/api/client.py", line 270, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "[HOME_dir_redacted]/.local/lib/python3.9/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 400 Client Error for http+docker://localhost/v1.41/containers/create: Bad Request ("invalid mount config for type "bind": bind source path does not exist: /tmp/alphafold")

I did not set up any keys for Docker. Is this related? How can I fix this? I tried removing the alphafold image and purging/reinstalling Docker. Still, nothing helps.

Thanks,
Lukasz
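The final line of the traceback names the immediate cause: Docker refuses to create a bind mount because the source path /tmp/alphafold does not exist on the host. A minimal sketch of a pre-flight step follows; the helper `ensure_bind_sources` is hypothetical and is not part of run_docker.py.

```python
import os
from typing import Iterable, List


def ensure_bind_sources(paths: Iterable[str]) -> List[str]:
    """Create any missing host directories and return the ones that were created."""
    created = []
    for path in paths:
        if not os.path.isdir(path):
            # exist_ok guards against a race with another process.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

Calling `ensure_bind_sources(["/tmp/alphafold"])` before launching docker/run_docker.py would create the directory named in the error message. This assumes the path is meant to be a directory that the user is allowed to create; Docker keys are unrelated to this particular message.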

Releasing the AlphaFold package on pypi

Thank you for sharing the AlphaFold codebase!

I would like to suggest releasing the package on PyPI to make it easier to install, especially in production environments, and, if possible, including the run_alphafold script.

Please let me know if you'll consider this suggestion.

MGnify download unreachable

Hi,
I have been trying to download the MGnify dataset, but the host is unreachable.
"Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... failed: Connection timed out."
Is it possible to put this dataset on Google Cloud as well?

absl FATAL flags parsing error

After I built the Docker image, I found that it failed to run. The error messages are below. I need help fixing it.

/opt/conda/lib/python3.7/site-packages/absl/flags/_validators.py:206: UserWarning: Flag --preset has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!

'command line!' % flag_name)

FATAL Flags parsing error:

flag --fasta_paths=None: Flag --fasta_paths must have a value other than None.

flag --output_dir=None: Flag --output_dir must have a value other than None.

flag --model_names=None: Flag --model_names must have a value other than None.

flag --data_dir=None: Flag --data_dir must have a value other than None.

flag --uniref90_database_path=None: Flag --uniref90_database_path must have a value other than None.

flag --mgnify_database_path=None: Flag --mgnify_database_path must have a value other than None.

flag --pdb70_database_path=None: Flag --pdb70_database_path must have a value other than None.

flag --template_mmcif_dir=None: Flag --template_mmcif_dir must have a value other than None.

flag --max_template_date=None: Flag --max_template_date must have a value other than None.

flag --obsolete_pdbs_path=None: Flag --obsolete_pdbs_path must have a value other than None.

Pass --helpshort or --helpfull to see help on flags.
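Every flag in the error above is reported as None, which suggests the script was started without any arguments (for example, by running the container directly instead of through docker/run_docker.py; that is an assumption, not something the log states). The sketch below uses the flag names from the error message to validate a set of values before a command line is built. The helper `build_flag_args` and the example values are hypothetical.

```python
from typing import Dict, List, Optional

# Flag names copied from the "must have a value other than None" messages above.
REQUIRED_FLAGS = (
    "fasta_paths",
    "output_dir",
    "model_names",
    "data_dir",
    "uniref90_database_path",
    "mgnify_database_path",
    "pdb70_database_path",
    "template_mmcif_dir",
    "max_template_date",
    "obsolete_pdbs_path",
)


def build_flag_args(values: Dict[str, Optional[str]]) -> List[str]:
    """Return '--name=value' arguments, or raise if any required flag is unset."""
    missing = [name for name in REQUIRED_FLAGS if values.get(name) is None]
    if missing:
        raise ValueError("Missing required flags: " + ", ".join(missing))
    return [f"--{name}={values[name]}" for name in REQUIRED_FLAGS]


try:
    build_flag_args({"fasta_paths": "seq.fa"})  # deliberately incomplete
except ValueError as err:
    print(err)
```

Checking the values in one place gives a single, readable list of what is missing instead of one error line per flag.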

full_dbs setting requires no creative commons license only Apache?

If I run AlphaFold using the full_dbs setting, does that mean only the Apache license applies? Is it only the CASP14 parameters that are covered under the Creative Commons license?

CUDA error out of memory

Hello,

Right now, we have a machine with 12 GB of VRAM, and we get the error "CUDA error out of memory" when folding with model 4 or 5. We suspect this is because the GPU does not have enough memory for the model. If so, our lab is wondering whether there are ways to decrease the memory requirements. If not, we can try to find a different machine or GPU to run it.

Thanks!
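One option that may help with out-of-memory errors is unified memory, which lets the GPU spill into host RAM at the cost of speed. It is controlled by two environment variables that must be set before JAX is imported. The sketch below is illustrative: the helper name `configure_unified_memory` is hypothetical, the value 4.0 is only an example, and whether this is enough for a given model and sequence length on a 12 GB card is not guaranteed.

```python
import os


def configure_unified_memory(mem_fraction: str = "4.0") -> None:
    """Set the environment variables that enable unified (host + device) memory."""
    # Allow GPU allocations to fall back to host memory.
    os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"
    # Let the XLA client address more than the physical GPU memory
    # (a fraction above 1.0 only makes sense with unified memory enabled).
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = mem_fraction


configure_unified_memory()
# Import jax only after this point so that the settings take effect.
```

Predictions that spill into host memory are expected to run noticeably slower than ones that fit entirely on the GPU.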

Database disk type

The README states the need to put "the databases on an additional 3 TB disk". Should this be an SSD?

Distribution over multiple GPUs

Hi,
I ran AlphaFold on a 2k sequence using 2x V100 (32 GB) GPUs. As with shorter sequences, 29 GB are allocated on the first GPU and 300 MB on the other from the start. After hhblits, I got an out-of-memory error from JAX saying that 26 GB couldn't be allocated. Does everything need to be loaded onto a single GPU, or should it in theory be possible to use the next GPU if the first one runs out of memory? Why are 300 MB allocated to the second GPU from the start?
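The small allocation on the second GPU may simply reflect the framework creating a context on every visible device; that is an assumption, not something confirmed in this thread. If the goal is to keep a run on one specific GPU (for example, to run a separate prediction on each card), the standard CUDA_VISIBLE_DEVICES variable can be set before JAX is imported. The helper `pin_to_gpu` below is hypothetical, and this sketch does not spread a single prediction across several GPUs.

```python
import os


def pin_to_gpu(index: int) -> str:
    """Expose only the GPU with the given index to this process."""
    if index < 0:
        raise ValueError("GPU index must be non-negative")
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


pin_to_gpu(0)
# Import jax only after this point; the other GPU is then hidden from the process.
```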
