generatebio / chroma

A generative model for programmable protein design

License: Apache License 2.0

chroma's Introduction

Get Started | Sampling | Design | Conditioners | License

Chroma is a generative model for designing proteins programmatically.

Protein space is complex and hard to navigate. With Chroma, protein design problems are represented in terms of composable building blocks from which diverse, all-atom protein structures can be automatically generated. As a joint model of structure and sequence, Chroma can also be used for common protein modeling tasks such as generating sequences given backbones, packing side-chains, and scoring designs.

We provide protein conditioners for a variety of constraints, including substructure, symmetry, shape, and neural-network predictions of some protein classes and annotations. We also provide an API for creating your own conditioners in a few lines of code.

Internally, Chroma uses diffusion modeling, equivariant graph neural networks, and conditional random fields to efficiently sample all-atom structures with a complexity that is sub-quadratic in the number of residues. It can generate large complexes in a few minutes on a commodity GPU. You can read more about Chroma, including biophysical and crystallographic validation of some early designs, in our paper, Illuminating protein space with a programmable generative model. Nature 2023.

Get Started

Note: An API key is required to download and use the pretrained model weights. It can be obtained here.

Colab Notebooks. The quickest way to get started with Chroma is our Colab notebooks, which provide starting points for a variety of use cases in a preconfigured, in-browser environment.

Chroma Quickstart: GUI notebook demonstrating unconditional and conditional generation of proteins with Chroma.
Chroma API Tutorial: Code notebook demonstrating protein I/O, sampling, and design configuration directly in python.
Chroma Conditioner API Tutorial: A deeper dive under the hood for implementing new Chroma Conditioners.

PyPI package. You can install the latest release of Chroma with:

pip install generate-chroma

Install the latest Chroma from GitHub:

git clone https://github.com/generatebio/chroma.git
pip install -e chroma # use `-e` for it to be editable locally.

Sampling

Unconditional monomer. We provide a unified entry point to both unconditional and conditional protein design with the Chroma.sample() method. When no conditioners are specified, we can sample a simple 200-amino acid monomeric protein with

from chroma import Chroma

chroma = Chroma()
protein = chroma.sample(chain_lengths=[200])

protein.to("sample.cif")
display(protein)

Generally, Chroma.sample() takes as input design hyperparameters and Conditioners and outputs Protein objects representing the all-atom structures of protein systems which can be loaded to and from disk in PDB or mmCIF formats.

Unconditional complex. To sample a complex instead of a monomer, we can simply do

from chroma import Chroma

chroma = Chroma()
protein = chroma.sample(chain_lengths=[100, 200])

protein.to("sample-complex.cif")

Conditional complex. We can further customize sampling towards design objectives via Conditioners and sampling hyperparameters. For example, to sample a C3-symmetric homo-trimer with 100 residues per monomer, we can do

from chroma import Chroma, conditioners

chroma = Chroma()
conditioner = conditioners.SymmetryConditioner(G="C_3", num_chain_neighbors=2)
protein = chroma.sample(
    chain_lengths=[100],
    conditioner=conditioner,
    langevin_factor=8,
    inverse_temperature=8,
    sde_func="langevin",
    potts_symmetry_order=conditioner.potts_symmetry_order)

protein.to("sample-C3.cif")

Because compositions of conditioners are conditioners, even relatively complex design problems can follow this basic usage pattern. See the demo notebooks and docstrings for more information on hyperparameters, conditioners, and starting points.

Design

Robust design. Chroma is a joint model of sequence and structure that uses a common graph neural network base architecture to parameterize both backbone generation and conditional sequence and sidechain generation. These sequence and sidechain decoders are diffusion-aware in the sense that they have been trained to predict sequence and side chain not just for natural structures at diffusion time $t=0$ but also on noisy structures at all diffusion times $t \in [0,1]$. As a result, the $t$ hyperparameter of the design network provides a kind of tunable robustness via diffusion augmentation, in which we trade off how closely the model attempts to design the backbone exactly as specified (e.g. $t=0.0$) against robust design within a small neighborhood of nearby backbone conformations (e.g. $t=0.5$).

While all results presented in the Chroma publication were done with exact design at $t=0.0$, we have found robust design at times near $t=0.5$ frequently improves one-shot refolding while incurring only minor, often Ångstrom-scale, relaxation adjustments to target backbones. When we compare the performance of these two design modes on our set of 50,000 unconditional backbones that were analyzed in the paper, we see very large improvements in refolding across both AlphaFold and ESMFold that stratify well across protein length, percent helicity, or similarity to a known structure (see Chroma Supplementary Figure 14 for further context).

The value of diffusion time conditioning $t$ can be set via the design_t parameter in Chroma.sample and Chroma.design. We find that for generated structures, $t = 0.5$ produces highly robust refolding results and is, therefore, the default setting. For experimentally-precise structures, $t = 0.0$ may be more appropriate, and values in between may provide a useful tradeoff between these two regimes.
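As a hedged sketch of how one might compare the two regimes, the helper below redesigns the same input once per `design_t` value. Only `Chroma.design` and its `design_t` argument come from the text above; the helper name `design_at_times` and the `load_protein` callback are hypothetical, and the protein is reloaded for every call because this sketch makes no assumption about whether `design` modifies its input.

```python
from typing import Any, Callable, Dict, Sequence


def design_at_times(
    model: Any,
    load_protein: Callable[[], Any],
    times: Sequence[float] = (0.0, 0.5),
) -> Dict[float, Any]:
    """Redesign a freshly loaded protein once per diffusion time.

    `model` is expected to behave like a `chroma.Chroma` instance, i.e. to
    expose `design(protein, design_t=...)`. The helper itself is hypothetical.
    """
    designs = {}
    for t in times:
        # Diffusion times are defined on the interval [0, 1].
        if not 0.0 <= t <= 1.0:
            raise ValueError(f"design_t must lie in [0, 1], got {t}")
        designs[t] = model.design(load_protein(), design_t=t)
    return designs


# Intended usage (needs Chroma and its weights, so it is left commented out):
#   from chroma import Chroma, Protein
#   designs = design_at_times(Chroma(), lambda: Protein("1GFP"))
#   designs[0.0].to("1GFP-exact.cif")
#   designs[0.5].to("1GFP-robust.cif")
```

Keeping the sweep in one place makes it easy to refold each output and check which setting suits a given backbone.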

Design a la carte. Chroma's design network can be accessed separately to design, redesign, and pack arbitrary protein systems. Here we load a protein from the PDB and redesign as

# Redesign a Protein
from chroma import Protein, Chroma
chroma = Chroma()

protein = Protein('1GFP')
protein = chroma.design(protein)

protein.to("1GFP-redesign.cif")

Clamped sub-sequence redesign is also available and compatible with a built-in selection algebra, along with position- and mutation-specific mask constraints as

# Redesign a Protein
from chroma import Protein, Chroma
chroma = Chroma()

protein = Protein('my_favorite_protein.cif') # PDB is fine too
protein = chroma.design(protein, design_selection="resid 20-50 around 5.0") #  5 angstrom bubble around indices 20-50

protein.to("my_favorite_protein_redesign.cif")

We provide more examples of design in the demo notebooks.

Conditioners

Protein design with Chroma is programmable. Our Conditioner framework allows for automatic conditional sampling under arbitrary compositions of protein specifications, which can come in the form of restraints (biasing the distribution of states) or constraints (directly restricting the domain of the underlying sampling process); see Supplementary Appendix M in our paper. We have pre-defined multiple conditioners, including for controlling substructure, symmetry, shape, semantics, and natural-language prompts (see chroma.layers.structure.conditioners), which can be used in arbitrary combinations.

Conditioner	Class(es) in `chroma.conditioners`	Example applications
Symmetry constraint	`SymmetryConditioner`, `ScrewConditioner`	Large symmetric assemblies
Substructure constraint	`SubstructureConditioner`	Substructure grafting, scaffold enforcement
Shape restraint	`ShapeConditioner`	Molecular shape control
Secondary structure	`ProClassConditioner`	Secondary-structure specification
Domain classification	`ProClassConditioner`	Specification of class, such as Pfam, CATH, or Taxonomy
Text caption	`ProCapConditioner`	Natural language prompting
Sequence	`SubsequenceConditioner`	Subsequence constraints.

How it works. The central idea of Conditioners is composable state transformations, where each Conditioner is a function that modifies the state and/or energy of a protein system in a differentiable way (Supplementary Appendix M). For example, to encode symmetry as a constraint we can take as input the asymmetric unit and tessellate it according to the desired symmetry group to output a protein system that is symmetric by construction. To encode something like a neural network restraint, we can adjust the total system energy by the negative log probability of the target condition. For both of these, we add on the diffusion energy to the output of the Conditioner(s) and then backpropagate the total energy through all intermediate transformations to compute the unconstrained forces that are compatible with generic sampling SDEs such as annealed Langevin dynamics.
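To make the restraint case concrete, here is a one-dimensional toy, not Chroma's sampler. A quadratic "prior" energy stands in for the diffusion energy, a quadratic restraint stands in for the negative log probability of a condition, and the force on their sum drives unadjusted Langevin dynamics. All names and constants are illustrative assumptions; the force is written analytically rather than obtained by backpropagation, and no annealing over diffusion time is included.

```python
import math
import random


def prior_energy_grad(x: float) -> float:
    # Gradient of U_prior(x) = x^2 / 2 (stand-in for the diffusion energy).
    return x


def restraint_energy_grad(x: float, target: float, k: float) -> float:
    # Gradient of U_restraint(x) = k * (x - target)^2 / 2
    # (stand-in for -log p(condition | x)).
    return k * (x - target)


def langevin_samples(target, k, steps=20000, step_size=0.01, seed=0):
    """Unadjusted Langevin dynamics on the summed energy U_prior + U_restraint."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(steps):
        grad = prior_energy_grad(x) + restraint_energy_grad(x, target, k)
        noise = math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
        x += -step_size * grad + noise
        samples.append(x)
    return samples[steps // 10:]  # drop the first 10% as burn-in


samples = langevin_samples(target=2.0, k=4.0)
mean = sum(samples) / len(samples)
# For these two quadratics the exact restrained mean is k * target / (1 + k) = 1.6.
print(f"sample mean = {mean:.2f}")
```

The restraint pulls samples away from the prior mean at 0 toward the target at 2, and a stiffer `k` pulls harder. In this toy, that is the sense in which a restraint biases the distribution of states without restricting its domain.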

We schematize this overall Conditioners framework below.

The Conditioner class is the composable building block of protein design with Chroma.

Conditioner API

It is simple to develop new conditioners. A Conditioner is a PyTorch nn.Module which takes in the system state - i.e. the structure, energy, and diffusion time - and outputs potentially updated structures and energies as

from typing import Union

import torch


class Conditioner(torch.nn.Module):
    """A composable function for parameterizing protein design problems.
    """
    def __init__(self, *args, **kwargs):
        super().__init__()
        # Setup your conditioner's hyperparameters

    def forward(
        self,
        X: torch.Tensor,                # Input coordinates
        C: torch.LongTensor,            # Input chain map (for complexes)
        O: torch.Tensor,                # Input sequence (one-hot, not used)
        U: torch.Tensor,                # Input energy
        t: Union[torch.Tensor, float],  # Diffusion time
    ):
        # `update_state` and `update_energy` are placeholders for your own logic.
        # Update the state, e.g. map from an unconstrained to constrained manifold
        X_update, C_update = update_state(X, C, t)

        # Update the energy, e.g. add a restraint potential
        U_update = U + update_energy(X, C, t)
        return X_update, C_update, O, U_update, t

Roughly speaking, Conditioners are composable by construction because their input and output type signatures are matched (i.e. each one is an endomorphism). So we can also simply build conditioners from conditioners by "stacking" them, much as we would with traditional neural network layer development. With the final Conditioner as an input, Chroma.sample() will then leverage PyTorch's automatic differentiation facilities to automatically furnish a diffusion-annealed MCMC sampling algorithm to sample with this conditioner. (We note this isn't magic, and taking care to scale and parameterize appropriately is important.)
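The matched-signature argument can be illustrated without Chroma or PyTorch. In the toy sketch below, each "conditioner" is a plain function from a state tuple `(X, C, O, U, t)` to a state tuple of the same shape, so any sequence of them folds into a single function with the same signature. The names `compose`, `shift_x`, and `harmonic_restraint` are hypothetical, and `X` is a plain list of floats rather than a coordinate tensor.

```python
from functools import reduce
from typing import Callable, List, Tuple

State = Tuple[List[float], List[int], None, float, float]  # (X, C, O, U, t)
ToyConditioner = Callable[[State], State]


def shift_x(offset: float) -> ToyConditioner:
    """Constraint-like transform: modifies the state X, leaves the energy U alone."""
    def apply(state: State) -> State:
        X, C, O, U, t = state
        return [x + offset for x in X], C, O, U, t
    return apply


def harmonic_restraint(center: float, k: float) -> ToyConditioner:
    """Restraint-like transform: leaves X alone, adds a penalty to the energy U."""
    def apply(state: State) -> State:
        X, C, O, U, t = state
        penalty = 0.5 * k * sum((x - center) ** 2 for x in X)
        return X, C, O, U + penalty, t
    return apply


def compose(*conditioners: ToyConditioner) -> ToyConditioner:
    """Apply conditioners left to right; the result is itself a ToyConditioner."""
    return lambda state: reduce(lambda s, f: f(s), conditioners, state)


stacked = compose(shift_x(1.0), harmonic_restraint(center=0.0, k=2.0))
X, C, O, U, t = stacked(([0.0, 1.0], [1, 1], None, 0.0, 0.5))
print(X, U)
```

Because `compose` returns something of the same type as its arguments, a stacked conditioner can itself be passed to `compose` again, which is the property the paragraph above relies on.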

A minimal Conditioner: 2D lattice symmetry

The code snippet below shows how in a few lines of code we can add a conditioner that stipulates the generation of a 2D crystal-like object, where generated proteins are arrayed in an M x N rectangular lattice.

import torch
from chroma.models import Chroma
from chroma.layers.structure import conditioners

class Lattice2DConditioner(conditioners.Conditioner):
    def __init__(self, M, N, cell):
        super().__init__()
        # Setup the coordinates of a 2D lattice
        self.order = M*N
        x = torch.arange(M) * cell[0]
        y = torch.arange(N) * cell[1]
        xx, yy = torch.meshgrid(x, y, indexing="ij")
        dX = torch.stack([xx.flatten(), yy.flatten(), torch.zeros(M * N)], dim=1)
        self.register_buffer("dX", dX)
        
    def forward(self, X, C, O, U, t): 
        # Tessellate the unit cell on the lattice
        X = (X[:,None,...] + self.dX[None,:,None,None]).reshape(1, -1, 4, 3)
        C = torch.cat([C + C.unique().max() * i for i in range(self.dX.shape[0])], dim=1)
        # Average the gradient across the group (simplifies force scaling)
        X.register_hook(lambda gradX: gradX / self.order)
        return X, C, O, U, t
    
chroma = Chroma().cuda()
conditioner = Lattice2DConditioner(M=3, N=4, cell=[20., 15.]).cuda()
protein = chroma.sample(
    chain_lengths=[70], conditioner=conditioner, sde_func='langevin',
    potts_symmetry_order=conditioner.order
)

protein.to_CIF("lattice_protein.cif")
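To see what the `dX` buffer holds, the standalone sketch below reproduces the same offsets using only the standard library. It mirrors the `indexing="ij"` ordering used above, where the x index varies slowest; the function name `lattice_offsets` is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def lattice_offsets(
    M: int, N: int, cell: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Offsets of an M x N rectangular lattice in the xy-plane.

    Matches torch.meshgrid(x, y, indexing="ij") followed by flatten():
    the first index (x) varies slowest and the second (y) fastest.
    """
    return [(i * cell[0], j * cell[1], 0.0) for i, j in product(range(M), range(N))]


offsets = lattice_offsets(M=3, N=4, cell=[20.0, 15.0])
print(len(offsets))  # 12 copies of the unit cell
print(offsets[:5])
```

The number of offsets equals `self.order`, which is the value passed as `potts_symmetry_order` in the example above.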

Note on Conditioners

An attractive aspect of this conditioner framework is that it is very general, enabling both constraints (which involve operations on $x$) and restraints (which amount to changes to $U$). At the same time, generation under restraints can still be (and often is) challenging, as the resulting effective energy landscape can become arbitrarily rugged and difficult to integrate. We therefore advise caution when using and developing new conditioners or conditioner combinations. We find that inspecting diffusion trajectories (including unconstrained and denoised trajectories, $\hat{x}_t$ and $\tilde{x}_t$) can be a good tool for identifying integration challenges and defining either better conditioner forms or better sampling regimes.
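In symbols, and assuming (as in "How it works" above) that a constraint is a differentiable map $f$ on the coordinates and that a restraint is the negative log probability of a target condition $y$, the two cases can be sketched as:

$$
\text{constraint: } x \mapsto f(x), \qquad \text{restraint: } U(x, t) \mapsto U(x, t) - \log p(y \mid x, t).
$$

The symbols $f$, $y$, and $p(y \mid x, t)$ are notation introduced here for illustration only.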

Citing Chroma

If you use Chroma in your research, please cite:

J. B. Ingraham, M. Baranov, Z. Costello, K. W. Barber, W. Wang, A. Ismail, V. Frappier, D. M. Lord, C. Ng-Thow-Hing, E. R. Van Vlack, S. Tie, V. Xue, S. C. Cowles, A. Leung, J. V. Rodrigues, C. L. Morales-Perez, A. M. Ayoub, R. Green, K. Puentes, F. Oplinger, N. V. Panwar, F. Obermeyer, A. R. Root, A. L. Beam, F. J. Poelwijk, and G. Grigoryan, "Illuminating protein space with a programmable generative model", Nature, 2023 (10.1038/s41586-023-06728-8).

@Article{Chroma2023,
  author  = {Ingraham, John B. and Baranov, Max and Costello, Zak and Barber, Karl W. and Wang, Wujie and Ismail, Ahmed and Frappier, Vincent and Lord, Dana M. and Ng-Thow-Hing, Christopher and Van Vlack, Erik R. and Tie, Shan and Xue, Vincent and Cowles, Sarah C. and Leung, Alan and Rodrigues, Jo\~{a}o V. and Morales-Perez, Claudio L. and Ayoub, Alex M. and Green, Robin and Puentes, Katherine and Oplinger, Frank and Panwar, Nishant V. and Obermeyer, Fritz and Root, Adam R. and Beam, Andrew L. and Poelwijk, Frank J. and Grigoryan, Gevorg},
  journal = {Nature},
  title   = {Illuminating protein space with a programmable generative model},
  year    = {2023},
  volume  = {},
  number  = {},
  pages   = {},
  doi     = {10.1038/s41586-023-06728-8}
}

Acknowledgements

The Chroma codebase is the work of many contributors at Generate Biomedicines. We would like to acknowledge: Ahmed Ismail, Alan Witmer, Alex Ramos, Alexander Bock, Ameya Harmalkar, Brinda Monian, Craig Mackenzie, Dan Luu, David Moore, Frank Oplinger, Fritz Obermeyer, George Kent-Scheller, Gevorg Grigoryan, Jacob Feala, James Lucas, Jenhan Tao, John Ingraham, Martin Jankowiak, Max Baranov, Meghan Franklin, Mick Ward, Rudraksh Tuwani, Ryan Nelson, Shan Tie, Vincent Frappier, Vincent Xue, William Wolfe-McGuire, Wujie Wang, Zak Costello, Zander Harteveld.

License

Chroma Code License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this code except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. See the License for the specific language governing permissions and limitations under the License.

Model Weights License

Chroma weights are freely available to academic researchers and non-profit entities who accept and agree to be bound under the terms of the Chroma Parameters License. Please visit the weights download page for more information. If you are not eligible to use the Chroma Parameters under the terms of the provided License or if you would like to share the Chroma Parameters and/or otherwise use the Chroma Parameters beyond the scope of the rights granted in the License (including for commercial purposes), you may contact the Licensor at: [email protected].

chroma's Issues

Batched conditional generation

Some conditioners have not been tested for batched generation.

How to do deterministic sampling?

Hi! Thanks for the great work!

I have a question about how to do deterministic sampling.
I find that the inference results differ when I run the same code multiple times (for example, when I mask a portion of the residues and redesign them). Even when I choose the maximum-probability residue at each node during autoregressive sampling, the results are not the same. I found that permutations are applied when preprocessing the chains, which might be the reason.
For Potts sampling, the results also differ between runs.

How can I do deterministic sampling, i.e. choose the sequence with maximum probability (the most likely chain)?

How to design the interface residues within complex.

Hi Chroma user and experts,

Recently, I tried to use Chroma to redesign some interface residues between two different subunits. I used the following command:

from chroma import Protein, Chroma
chroma = Chroma()
protein = Protein('test.pdb', device='cuda:0')
protein = chroma.design(protein, design_selection="chain A and resid 241+242+243+245+270+271+272+273+274+275+276+277+278+282+283+284+285+286+287+288+310+311+325+326+327+328+329")
protein.to("redesign.pdb")

This is just an example, adapted from the tutorial. It ran without any error. However, redesign.pdb showed that chain B had also been redesigned at the same positions as chain A. How should I modify the selection string or the command to avoid this? I did not intend to redesign chain B. Please help, thanks so much.

Best regards,
Ning

Caption data

Hello,

Amazing work! Curious if the captions used to train the ProCap will be made available.
Best,
Logan

ProClass Conditioner List of Values

The docstring states, "for a complete list of valid value label pairs import the value dictionary from the GraphClassifierLoader in the zoo."

Where can we find access to this?

thank you for making the codebase very easy to read. 🙏🏽

De-novo binder generation with a hotspot

Hi, this is a question that I have asked before and I know it has been discussed in some other threads, however, I don't feel there is a sufficient solution yet.

Given a receptor, I would like to de-novo design a binder in a particular spot. I have tried two approaches.

De novo generation with no hotspots: This works well, simply using a receptor PDB with one chain and creating a new chain in the Chroma API. However, there is no way to specify where to place the chain, so only 1 in 20 binders are in the right region.
I have tried supplying a PDB with a complex of the receptor and an existing binder, and used the substructure conditioner to keep only one residue from the existing binder, to sort of ground the Chroma diffusion in the right place. While this works in keeping the Chroma design in the binding pocket, the binders that Chroma generates are not good, usually being long, improbable single alpha helices despite using the CATH conditioner for alpha helix bundles. (In contrast, in the first approach the CATH conditioner works really well to design good-looking binders, albeit in the wrong spot.)

Please let me know if there is any recommended solution for this. Thank you! Chroma is awesome!

Chroma use on read-only file systems

Is it possible to use Chroma when installed on a read-only filesystem? The software appears to require write access to the software itself for API and model parameters.

Permission denied: '.../lib/python3.8/site-packages/chroma/layers/structure/params/centering_2g3n.params'

Ideally parameters and API access keys could be configured with a variable or in $HOME - is that possible?

fileNotFound error

I installed chroma in a conda environment. I registered the API key via this code:

from chroma.utility.api import register_key
register_key("[mytoken]")

When I try to run a simple program such as:
from chroma import Chroma

chroma = Chroma()

I receive this error message:

Using cached data from C:\Users\Elijah\AppData\Local\Temp\chroma_weights\90e339502ae6b372797414167ce5a632\weights.pt
Computing reference stats for 2g3n
Traceback (most recent call last):
  File "C:\Users\Elijah\DataspellProjects\pythonProject\chroma1.py", line 3, in <module>
    chroma = Chroma()
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\models\chroma.py", line 84, in __init__
    self.backbone_network = graph_backbone.load_model(
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\models\graph_backbone.py", line 405, in load_model
    return utility_load_model(
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\utility\model.py", line 107, in load_model
    model = model_class(**params["init_kwargs"]).to(device)
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\models\graph_backbone.py", line 114, in __init__
    [
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\models\graph_backbone.py", line 115, in <listcomp>
    BackboneEncoderGNN(
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\models\graph_design.py", line 1192, in __init__
    self.feature_graph = protein_graph.ProteinFeatureGraph(
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\layers\structure\protein_graph.py", line 182, in __init__
    self._load_centering_params(self.centered_pdb)
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\layers\structure\protein_graph.py", line 276, in _load_centering_params
    param_dictionary = self._reference_stats(reference_pdb)
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\layers\structure\protein_graph.py", line 294, in _reference_stats
    X, C, _ = Protein.from_PDBID(reference_pdb).to_XCS()
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\data\protein.py", line 234, in from_PDBID
    RCSB_file_download(pdb_id, ".cif", file_cif)
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\utility\fetchdb.py", line 47, in RCSB_file_download
    return _download_file(url, local_filename)
  File "C:\ProgramData\anaconda3\envs\pdbs\lib\site-packages\chroma\utility\fetchdb.py", line 27, in _download_file
    with open(out_file, "wb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/2g3n.cif'

Did I not register the API key correctly, or is this another issue such as it being in a conda environment?

De-novo binder generation

The supplement of the original Nature paper suggests that De-novo binders could be generated by "Combine (i) substructure conditioning on antigen, (ii) optional scaffold constraint on binder, and (iii) contact constraints on epitope/paratope."

From my current understanding of Chroma, I am unsure how to use an existing antigen as the template for a de novo binder. I've been able to successfully generate redesigns of existing binders using code similar to #9, but de novo design is proving more of a challenge. In practice, what API(s) should one use to ask Chroma to design a brand new chain in complex with an existing antigen?

As a second question, the "(iii) contact constraints" should ostensibly be implemented via the "Substructure distances conditioner" (pg 80, Supp Table 6). My understanding of this module is that it would allow the user to provide Chroma with a pre-specified binding location, akin to hotspots in RFDiffusion. I recognize all the other conditioners from the source code, but I can't find the substructure distances module. Has that one already been implemented and open-sourced, or does it exist only as a mathematical template for now, until it's implemented in Python?

Example of training a classifier based on graphclassifier

Hey all,

Would anyone be able to share a classifier training example, such as what was the script for the ProClassConditioner?

Thanks!

Constrained diffusion on one chain in a protein complex

Hi,

This is a really impressive tool and I am thinking of many cool things this can be used for in the future. I was wondering if the following would be possible with Chroma.

Given a pdb file that is a 2 protein complex (chain A and chain B), how would I do constrained diffusion on chain A? In other words, I want to hold chain B constant, and I want to constrain the output of Chroma to obey the topology of chain A.

Thanks for the help!

Trajectory cif file won't open in ChimeraX

Hello,

This is so cool! I just tried playing with the demo colab notebook and ran the "get a protein" example with the default 160 residues, and downloaded the trajectory cif file. I can open it in PyMOL and play the trajectory. But when trying to open it in ChimeraX (a viewer I am more familiar with), I get this error:

Summary of feedback from opening /home/guillaume/Downloads/protein_trajectory.cif
warnings	Skipping atom_site category: Missing column 'type_symbol' near line 231
No mmCIF models found.

Attaching this particular protein_trajectory.cif.txt file just in case it is helpful, but I suspect this would happen with any file, just with the error at a different line (I had to add a .txt extension for GitHub to accept uploading, you will need to rename it and remove this extension).

Is it possible to save the potts params used for sequence sampling?

Is it possible to save the potts params (L,A) and (L,A,L,A) given structure, used for sequence sampling at the design stage?

linking downloaded weight for offline chroma usage

Dear generatebio github,
I have a GPU node that is not connected to the internet and thus need to run Chroma offline.
Can you help me set up and load the downloaded weights?
I appreciate your help and guidance.

Where could I find the code to generate the results in paper fig3 b?

Great work. Where can I find the code used to generate the data in Figure 3b of your Nature paper, for the out-filling? Thanks.

Length-variable design with substructure conditioner

Dear Chroma team,

Thanks for this amazing work and the open-source of the wonderful code!

I've been playing around with Chroma for a while and testing some of the conditioners. Thanks for the examples you provided. I'm wondering whether the substructure conditioner (more specifically, the infilling task) supports length-variable design, i.e. something like motif scaffolding around a pre-defined substructure.

I tried selecting different motifs from different chains and taking the substructure alone, like:

protein = Protein('./input.pdb', canonicalize=True, device=device)
X, C, S = protein.to_XCS()
chain_B = Protein(X, C==2, S) # Select chain B as predefined substructure
X2, C2, S2 = chain_B.to_XCS()
...... ( The rest of example code)

This is rather a dumb try since I found if I took chain_B object as predefined substructure and then sample a length-variable protein with chroma.sample(..., protein_init=chain_B, chain_lengths=[$length],...), it would raise an error that the

I went through the issue list and found a similar situation raised by @gha2012 in #24 (comment), but there seems to be some unexpected behaviour in the sampled backbones.

So I'm wondering whether the length-variable design task is suitable for the Chroma conditioner architecture. If you could provide some views or examples for this case, I would be very grateful.

Many thanks!

Expected all tensors to be on the same device

Dear developer,

please give a hand on this issue, Thanks!

This example runs fine:
from chroma import Chroma
chroma = Chroma()
protein = chroma.sample(chain_lengths=[200])
protein.to("sample.cif")

But running this example:
from chroma import Protein, Chroma
chroma = Chroma()
protein = Protein('1GFP')
protein = chroma.design(protein)
protein.to("1GFP-redesign.cif")

gave error messages:
Traceback (most recent call last):
  File "testDesignwithChroma.py", line 21, in <module>
    protein = chroma.design(protein)
  File "C:\Users\project\structure\chroma\chroma\models\chroma.py", line 528, in design
    X_sample, S_sample, _ = self.design_network.sample(
  File "C:\Users\miniconda3\envs\py8\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\project\structure\chroma\chroma\data\xcs.py", line 114, in new_func
    return func(*args, **kwargs)
  File "C:\Users\project\structure\chroma\chroma\models\graph_design.py", line 826, in sample
    node_h, edge_h, edge_idx, mask_i, mask_ij = self.encode(X, C, t=t)
  File "C:\Users\project\structure\chroma\chroma\data\xcs.py", line 114, in new_func
    return func(*args, **kwargs)
  File "C:\Users\project\structure\chroma\chroma\models\graph_design.py", line 536, in encode
    node_h, edge_h, edge_idx, mask_i, mask_ij = self.encoder(
  File "C:\Users\miniconda3\envs\py8\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\project\structure\chroma\chroma\data\xcs.py", line 114, in new_func
    return func(*args, **kwargs)
  File "C:\Users\project\structure\chroma\chroma\models\graph_design.py", line 1237, in forward
    node_h, edge_h, edge_idx, mask_i, mask_ij = self._checkpoint(
  File "C:\Users\project\structure\chroma\chroma\models\graph_design.py", line 1253, in _checkpoint
    return module(*args)
  File "C:\Users\miniconda3\envs\py8\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\project\structure\chroma\chroma\layers\structure\protein_graph.py", line 223, in forward
    node_h_l = node_h_l - self.__getattr__(f"node_means_{i}")
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Strange output from inpainting

I have been playing around with inpainting and after replicating the examples that you provided I tried it out on my own protein.

I found that when inpainting a region of 40 amino acids out of 157, the output from Chroma was a protein that appeared to explode, with a completely broken chain. I then attempted to modify the various SDE / Langevin dynamics parameters, but these did not result in more stable designs. In the zipped folder attached, I have the following files:

original_protein.pdb is the starting protein from which I am trying to inpaint a subset of the residues (indices 80-139)
large_inpainting_final.cif is the structure output from the inpainting design
large_inpainting_traj.cif is the trajectory

To debug this further I tried reducing the size of the inpainting. I noticed when inpainting only 10 residues, Chroma provided reasonable looking designs. These files are:

large_inpainting_traj.cif
small_inpainting_traj.cif

Next, I increased the inpainting size from 10 to 20 residues. This time, things did not explode, however, Chroma inpainted a region that forms a knot with the rest of the protein and clashes quite significantly. These files are:

medium_inpainting_final.cif
medium_inpainting_traj.cif

The code I used to do the inpainting is:

from chroma import Chroma, Protein, conditioners

device = 'cuda'
chroma = Chroma()

protein = Protein('original_protein.pdb', device=device)
residues_to_fix = [i for i in range(156) if i < 80 or i > 139]
protein.sys.save_selection(gti=residues_to_fix, selname="infilling_selection")

conditioner = conditioners.SubstructureConditioner(
    protein,
    backbone_model=chroma.backbone_network,
    selection = 'namesel infilling_selection').to(device)

infilled_protein = chroma.sample(
    protein_init=protein,
    conditioner=conditioner,
    langevin_factor=4.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    full_output=True,
    steps=500)

As mentioned above, I also tried modifying langevin_factor, langevin_isothermal, and inverse_temperature.

If you are able to provide any insight into why this may be happening or provide guidance on what I may be doing wrong, that would be great. It is also possible that this is just a case where inpainting seems to not work well for whatever reason (sampling is hard, protein is out of distribution enough). Thank you!

for_github.zip

Doc on design_selection in chroma.design

Hi all, thanks for your work on Chroma.
Could you please provide a description of design_selection in chroma.design (the rules or similar)?
The source code is a bit complex to understand.
Thanks for your help.

Best,
Shaoning.

Computing the TM score against the PDB dataset

How can I compute the TM score against the PDB dataset for a newly generated protein? Similar to what you have done for Fig 2e.

Thanks

Protein folding

Thank you for making this excellent project open source!

Is Chroma able to do protein folding, as in, provided an amino acid sequence with no structural information (i.e. a FASTA file) generate a 3D structure for that sequence?

I don't see any examples of this in the notebooks or demo files.

Substructure Conditioner - Supplementary Appendix O

Hi,

Fantastic work - I've had a lot of fun playing with different design principles enabled by Chroma.

While reviewing the Chroma paper I saw that Supplementary Appendix O details a particular Substructure conditioner that suggests a specific protocol/potential for inter-residue distance restraints. Is this conditioner available (I looked around, but didn't easily find it), or would it need to be derived from the higher-level substructure conditioner? If so, any code resources or advice in the right direction would be appreciated!

Design topology not obeyed

In the colab ipynb https://colab.research.google.com/github/generatebio/chroma/blob/main/notebooks/ChromaDemo.ipynb, I have tried 3 times to create a TIM-barrel (CATH 3.20.20) with different lengths in the range of known TIM-barrels.
None of the 3 times did I get something resembling a TIM-barrel. All three times I got a 4-strand antiparallel beta sheet, surrounded by several helices that do not form a barrel.

Design multiple sequences for the same input structure

Hello,

Thanks for Chroma! It is a pretty great tool.

I do have a query - is there a way to design multiple sequences (say 10) for the same input structure? I am new to Chroma and have tried running the design code multiple times with the same structure, but it gives out the same output.

Any help in this matter would be appreciated. Thank you!

Here is the code I am using (same as the one in the README)

protein = Protein('/path/to/protein.pdb', device='cuda')
protein = chroma.design(protein)
protein.to("output_path.pdb")

Running chroma from docker

Hi everyone,

How do I properly create a Docker image and run a container so that Chroma uses the GPU? For now, Chroma is running on the CPU only.

What I did:

I created an image from Dockerfile using docker build -t chroma . command
Container was started using docker run -i -t --runtime=nvidia --gpus all chroma
nvidia-smi within container will show that I have GPU available
However, any example will run on CPU, using all available cores.
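A quick diagnostic for this situation is to ask PyTorch, from inside the container, whether it can see CUDA. The sketch below is an assumption-laden check rather than a fix: `describe_gpu_visibility` is a hypothetical helper, and it only uses `torch.cuda.is_available()`, `torch.cuda.get_device_name()` and `torch.version.cuda`.

```python
def describe_gpu_visibility():
    """Report whether the PyTorch build in this environment can use CUDA."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    return f"CUDA not available (torch built against CUDA: {torch.version.cuda})"


print(describe_gpu_visibility())
```

If `nvidia-smi` works but this reports that CUDA is not available, one possible explanation is that the image installed a CPU-only PyTorch build; that is a guess to verify, not a confirmed diagnosis.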

Progress Bars during Sampling

Is it possible to turn off the progress bars during sampling?
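One generic workaround, sketched here under the assumption that the bars are drawn by tqdm on standard error (the logs in other threads look like tqdm output), is to redirect stderr while sampling. `silence_stderr` is a hypothetical helper; the commented-out `chroma.sample(...)` line only marks where the call would go.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def silence_stderr():
    """Temporarily send everything written to sys.stderr to the null device."""
    with open(os.devnull, "w") as devnull, contextlib.redirect_stderr(devnull):
        yield


with silence_stderr():
    print("this line is discarded", file=sys.stderr)
    # protein = chroma.sample(...)  # hypothetical call site
```

This also hides warnings printed to stderr during the call, and it would not affect widget-based bars in a notebook, so it is a blunt tool rather than a dedicated option.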

Made a few adjustments, please check out. 'samples' issue and batch splitting.

So... I have been having some issues and tried to find my own way of solving them.

samples argument in chroma.sample:

I wanted to produce more than one protein output from chroma.sample(...) with the 'samples' argument, but sadly it only generated one, probably because there is no code that actually uses the 'samples' integer as the number of outputs.
For my own use, I only needed to implement the 'design' part (since I wanted multiple sequence designs per backbone sample). Thus, I simply added a for loop in the chroma.sample() part, as seen in the models.chroma module script file.
I made the minimal number of changes since I did not want to ruin the logic of the model. So... I hope the team somehow fixes this issue.

The sample run consumes TOO MUCH GPU memory.

I use GeForce rtx3050ti(4GB memory), which is, honestly not the best GPU option out there. However, I had absolutely no problems running RFdiffusion, AlphaFold, and sometimes, even MD simulations with my tiny laptop. chroma module run, on the other hand, always fails with 'Segmentation Fault', due to lack of GPU memory available.
In my case, it turned out to be an issue with the size of the batch. A step count as small as 150 causes the segmentation fault. So, I made a few small changes in the layers.structure.conditioner file and the models.chroma file.
For layers.structure, I added checkpoints for parts with intensive calculations. I added checkpoints to the conditioners I used(substructure, subsequence, procap) and all is well now.
The more significant change though, is splitting the batch in models.chroma file. I simply split the number of steps to 100 steps, and avoided scoring part until the last batch, integrating the result of all batches for final outcomes. For smaller designs, batch size 50 works fine, but I used 100 for the sequence of 364 aa proteins.
This is all I did, but it definitely helped. Again, I made minimal changes, since I did not want to ruin the system.

Please take these issues into consideration when releasing the next version.

Thanks!
chroma_modifications.zip

Installation fails on windows 11 and wsl2 with python >= 3.11

See attached file for error trace

I tried with python 3.11.6 and 3.12.0, both are no go.
I tried from git repo with same result.
I tried using a venv in python 3.11.6, but no go either.

The problem is the scikit-learn version, which still uses distutils or setuptools, which are deprecated.

Even with a compatible version of setuptools, the installation does not get past this point.

On WSL2 there's the same problem.

Could you please advise?

error trace chroma.txt

Prediction of secondary structure

Hi,

In figures 5g and 5h of your paper, you indicate that Chroma can predict the alpha and beta content of the generated molecules. Could you indicate the code you use for this prediction?

Thanks
Antonio

Selection syntax

Hi!

Thanks all for making this open source for the research community and for all of the documentation. I'm attempting to replace 2 chains of a protein complex with a de novo design using the substructure example, but I'm struggling with the selection syntax. I've seen options such as "chain_id =", "resid (2)", etc. in the documentation, but I'm unsure how these map to PDB files. If I aim to infill chains B and G in a structure, would I specify something like "chain B or chain G" as the selection_string, or is the syntax different? Is there a guide to Chroma's selection syntax?

Kind regards,
Bobby

ImportError: cannot import name 'api' from 'chroma'

running:
from chroma import api
api.register_key("xxx")

ImportError: cannot import name 'api' from 'chroma' (/opt/conda/envs/python37/lib/python3.7/site-packages/chroma/__init__.py)

Tensors not on the same device

When I run one of the minimal examples

# test.py
from chroma import Chroma, Protein

chroma = Chroma()
protein = Protein('1GFP')
protein = chroma.design(protein)

protein.to("1GFP-redesign.cif")

I get an error that some tensors are not on the same devices:

(chroma_env) Ξ software/chroma git:(main) ▶ python3 test.py
Using cached data from /tmp/chroma_weights/90e339502ae6b372797414167ce5a632/weights.pt
Loaded from cache
cuda
Using cached data from /tmp/chroma_weights/03a3a9af343ae74998768a2711c8b7ce/weights.pt
Loaded from cache
Traceback (most recent call last):
  File "/home/iwe34/software/chroma/test.py", line 5, in <module>
    protein = chroma.design(protein)
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iwe34/software/chroma/chroma/models/chroma.py", line 532, in design
    X_sample, S_sample, _ = self.design_network.sample(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iwe34/anaconda3/envs/chroma_env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/iwe34/software/chroma/chroma/data/xcs.py", line 114, in new_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/iwe34/software/chroma/chroma/models/graph_design.py", line 826, in sample
    node_h, edge_h, edge_idx, mask_i, mask_ij = self.encode(X, C, t=t)
                                                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iwe34/software/chroma/chroma/data/xcs.py", line 114, in new_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/iwe34/software/chroma/chroma/models/graph_design.py", line 536, in encode
    node_h, edge_h, edge_idx, mask_i, mask_ij = self.encoder(
                                                ^^^^^^^^^^^^^
  File "/home/iwe34/anaconda3/envs/chroma_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iwe34/software/chroma/chroma/data/xcs.py", line 114, in new_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/iwe34/software/chroma/chroma/models/graph_design.py", line 1237, in forward
    node_h, edge_h, edge_idx, mask_i, mask_ij = self._checkpoint(
                                                ^^^^^^^^^^^^^^^^^
  File "/home/iwe34/software/chroma/chroma/models/graph_design.py", line 1253, in _checkpoint
    return module(*args)
           ^^^^^^^^^^^^^
  File "/home/iwe34/anaconda3/envs/chroma_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iwe34/software/chroma/chroma/layers/structure/protein_graph.py", line 224, in forward
    node_h_l = node_h_l - self.__getattr__(f"node_means_{i}")
               ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

When I try to fix that by putting everything on the GPU for this particular line, the error occurs on another line (shown below):

  File "/home/iwe34/software/chroma/chroma/layers/structure/protein_graph.py", line 1443, in forward
    h = torch.exp(-(((h.unsqueeze(-1) - rbf_centers) / self.std) ** 2))
                      ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

It looks to me as though many other tensors would show the same problem and that there is no easy fix, because the device type doesn't seem to be shared across all classes :(

long time persist model parameters

Dear developers,

Here is a patch for persisting the model parameters long term, which would save some network traffic. It would be better to also have a sha256 check in the cache-check process.

diff --git a/chroma/utility/api.py b/chroma/utility/api.py
index 902b776..ce996c8 100644
--- a/chroma/utility/api.py
+++ b/chroma/utility/api.py
@@ -21,7 +21,11 @@ import requests

 import chroma

-ROOT_DIR = os.path.dirname(os.path.dirname(chroma.__file__))
+# SETTING CHROMA_ROOT_DIR or use default directory: ~/.config/chroma
+ROOT_DIR = os.environ.get(
+    "CHROMA_ROOT_DIR",
+    os.path.join(os.path.expanduser("~"), ".config", "chroma"))
+os.makedirs(ROOT_DIR, exist_ok=True)


 def register_key(key: str, key_directory=ROOT_DIR) -> None:
@@ -92,11 +96,8 @@ def download_from_generate(

     # Create a hash of the URL + weight name to determine the path for the cached/temporary file
     url_hash = hashlib.md5((base_url + weights_name).encode()).hexdigest()
-    temp_dir = os.path.join(tempfile.gettempdir(), "chroma_weights", url_hash)
-    destination = os.path.join(temp_dir, "weights.pt")
-
-    # Ensure the directory exists
-    os.makedirs(temp_dir, exist_ok=True)
+    os.makedirs(os.path.join(ROOT_DIR, "weights"), exist_ok=True)
+    destination = os.path.join(ROOT_DIR, "weights", f"{url_hash}.pt")

     # Check if cache exists
     cache_exists = os.path.exists(destination)
@@ -117,8 +118,14 @@ def download_from_generate(
     response = requests.get(base_url, params=params)
     response.raise_for_status()  # Raise an error for HTTP errors

-    with open(destination, "wb") as file:
-        file.write(response.content)
+    # Write into temp_file
+    temp_file = tempfile.TemporaryFile()
+    temp_file.write(response.content)
+
+    # Write into cached destination
+    with open(destination, "wb") as f:
+        temp_file.seek(0)
+        f.write(temp_file.read())

     print(f"Data saved to {destination}")
     return destination
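The sha256 check suggested above could look roughly like the following. This is a minimal standard-library sketch, not part of the patch: `sha256_of_file`, `cache_is_valid` and `write_atomically` are hypothetical names, and it assumes the expected digest is obtainable from somewhere trusted (for example, published alongside the weights), which the thread does not confirm.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_is_valid(path, expected_sha256):
    """A cached file counts as valid only if it exists and its digest matches."""
    return os.path.exists(path) and sha256_of_file(path) == expected_sha256


def write_atomically(data, destination):
    """Write to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(destination) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, destination)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to a temporary file in the same directory and renaming it means an interrupted download never leaves a truncated file at the cached path, which is the failure an existence-only cache check cannot detect.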

Cannot select multiple index ranges of protein for SubstructureConditioner

I am trying to select the ends of my protein and then redesign the middle. After struggling with the syntax, I was able to configure a selection string that did not produce an interpretation error: "resid (1-30) and resid (110-150)"

However, when I try to run the design process, I get the error in the first step of integrating SDE:
torch._C._LinAlgError: linalg.eigh: (Batch element 0): The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated eigenvalues (error code: 1).

I have tried running it with different combinations of arguments and different residue ranges (only using the ends, not using the ends, only one end included, etc.), but I always get this error for more than one residue range. Am I simply selecting the ranges incorrectly, or is this not supported by this particular conditioner?
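Independently of Chroma's grammar, a selection joined with "and" is normally an intersection, and no residue can be in both 1-30 and 110-150. The sketch below illustrates this with plain Python sets; it does not call Chroma, and whether the selection language accepts "or" for a union is an assumption to check against the documentation.

```python
first = set(range(1, 31))      # resid 1-30, inclusive
second = set(range(110, 151))  # resid 110-150, inclusive

both = first & second    # what "A and B" usually means
either = first | second  # what "A or B" usually means

print(len(both), len(either))  # 0 71
```

If the grammar follows this convention, the string in the question selects nothing, which may be what makes the first SDE step fail. An alternative used in another thread on this page is to build an explicit index list and register it with `protein.sys.save_selection(gti=..., selname=...)`; note that such indices may be 0-based while `resid` numbering is not.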

Binder design notebook with specified hotspot

Great work and elegant methods!

Will anyone share a notebook for binder design that can specify hotspot residues on the receptor?

Many thanks!

Chroma

seeking advices for conditional binder design

Thanks chroma authors for open-sourcing your amazing work.

I am trying to use Chroma to modify only 10 aa of an existing binder, given a receptor. In the sampling phase, I want to simultaneously sample both the docking of the binder on the receptor's surface and the new backbone of the 10 modified aa.

What I've been trying is the following:

fix structure and sidechain of receptor by SubstructureConditioner and provide aa mask directly in design
fix structure and sidechain of non-modified part of binder also by SubstructureConditioner and provide aa mask directly in design
sample backbone and sequence in a small part of the binder

My problem is that if I do it this way, the coordinates of both receptor and binder remain unchanged, so no sampling of the docking happens. Can you please suggest how I can do that? Thank you in advance for your help.

protein = Protein(".complex.pdb", device=device)  # receptor in chain B, binder in chain A
X, C, S = protein.to_XCS()

L_binder = (C == 2).sum().item()
L_receptor = (C == 1).sum().item()
L_complex = L_binder+L_receptor

modify_AAs = [i for i in range(321,333)] #indexes of aa being modified

# keep original seqs of unmodified aa by providing the mask
mask_aa = torch.Tensor(L_complex * [[1] * 20])
for i in range(L_complex):
    if i not in modify_AAs:
        mask_aa[i] = torch.Tensor([0] * 20)
        mask_aa[i][S[0][i].item()] = 1
mask_aa = mask_aa[None].cuda()

residues_to_keep_R = [i for i in range(L_receptor)]
protein.sys.save_selection(gti=residues_to_keep_R, selname="receptor")
conditioner_struc_R = conditioners.SubstructureConditioner(
        protein,
        backbone_model=chroma.backbone_network,
        selection = 'namesel receptor').to(device)

residues_to_keep_B = [i for i in range(L_receptor,L_complex) if i not in modify_AAs]
protein.sys.save_selection(gti=residues_to_keep_B, selname="binder")
conditioner_struc_B = conditioners.SubstructureConditioner(
        protein,
        backbone_model=chroma.backbone_network,
        selection = 'namesel binder', gamma=0.5).to(device)

conditioner = conditioners.ComposedConditioner([conditioner_struc_R, conditioner_struc_B, ])

protein, trajectories = chroma.sample(
    protein_init=protein,
    conditioner=conditioner,
    design_selection = mask_aa,
    langevin_factor=2,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    sde_func='langevin',
    full_output=True,
    steps=500,
)

protein.to("sample.cif")

'display' issue

Hi, I'm running on a Mac Mini via remote login and get an error on the demo example (a simple 200-amino-acid monomeric protein). Any thoughts appreciated.

python sample_protein.py
Data saved to /var/folders/7w/86cljl2560722vp9qy4rff4r0000gn/T/chroma_weights/90e339502ae6b372797414167ce5a632/weights.pt
Computing reference stats for 2g3n
Data saved to /var/folders/7w/86cljl2560722vp9qy4rff4r0000gn/T/chroma_weights/03a3a9af343ae74998768a2711c8b7ce/weights.pt
Loaded from cache
Integrating SDE: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [09:40<00:00, 1.16s/it]
Potts Sampling: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:04<00:00, 112.53it/s]
Sequential decoding: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:01<00:00, 193.11it/s]
Traceback (most recent call last):
File "/Users/venuv62/Desktop/biology/sample_protein.py", line 9, in <module>
display(protein)
NameError: name 'display' is not defined
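For context, `display` is not a Python builtin: Jupyter/IPython makes it available in notebooks, so a plain `python sample_protein.py` run does not have it. A minimal sketch of a guard for scripts, where the fallback function is an illustrative assumption rather than anything Chroma provides:

```python
try:
    # Available when IPython/Jupyter is installed; notebooks expose it automatically.
    from IPython.display import display
except ImportError:
    def display(obj):
        """Fallback for plain scripts: print the object's text representation."""
        print(obj)

display("display is now defined in a plain script")
```

Outside a notebook there is no viewer to render the structure, so writing the result to a file with `protein.to("sample.cif")`, as other examples on this page do, and opening it in a molecular viewer is likely the more useful path.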

Question for CDRs Method

When I attempted to redesign the complementarity-determining regions (CDRs) of the antibody, I found that it is only possible to fix multiple epitopes using a method similar to the one depicted in the diagram below. Has anyone tried this approach? Are there any other methods available?

the structure of sequences generated by chroma.design()

Hi, thanks for this amazing work. I am trying to use chroma.design() to redesign my protein (chain4_cdlm.pdb). How can I get the structure of the redesigned sequence (protein1) using Chroma? When I display the redesigned protein (protein1) directly, its structure is exactly the same as my input protein (chain4_cdlm.pdb). I hope you can give me some suggestions. Thanks a lot!

# Configure Substructure Conditioner
from chroma import Chroma, Protein
from chroma.utility.chroma import plane_split_protein

device='cuda'

chroma = Chroma()

protein = Protein('/home/lynn/Desktop/6mrd/6mrd_pdb/chain4_cdlm.pdb',  device=device)

print(protein)

protein1 = chroma.design(protein, design_selection='resid 50-100')
print(protein1)
display(protein1)

Simple example fails on windows 11 on jupyter lab

Example below

from tqdm.autonotebook import tqdm
from chroma import Chroma, conditioners

chroma = Chroma()
conditioner = conditioners.SymmetryConditioner(G="C_3", num_chain_neighbors=2)
protein = chroma.sample(
    chain_lengths=[100],
    conditioner=conditioner,
    langevin_factor=8,
    inverse_temperature=8,
    sde_func="langevin",
    potts_symmetry_order=conditioner.potts_symmetry_order)

protein.to("sample-C3.cif")

fails with:

FileNotFoundError Traceback (most recent call last)
Cell In[4], line 4
1 from tqdm.autonotebook import tqdm
2 from chroma import Chroma, conditioners
----> 4 chroma = Chroma()
5 conditioner = conditioners.SymmetryConditioner(G="C_3", num_chain_neighbors=2)
6 protein = chroma.sample(
7 chain_lengths=[100],
8 conditioner=conditioner,
(...)
11 sde_func="langevin",
12 potts_symmetry_order=conditioner.potts_symmetry_order)

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\models\chroma.py:84, in Chroma.__init__(self, weights_backbone, weights_design, device, strict, verbose)
81 else:
82 device = "cpu"
---> 84 self.backbone_network = graph_backbone.load_model(
85 weights_backbone, device=device, strict=strict, verbose=verbose
86 ).eval()
88 self.design_network = graph_design.load_model(
89 weights_design, device=device, strict=strict, verbose=False,
90 ).eval()

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\models\graph_backbone.py:405, in load_model(weight_file, device, strict, strict_unexpected, verbose)
377 def load_model(
378 weight_file: str,
379 device: str = "cpu",
(...)
382 verbose: bool = True,
383 ) -> GraphBackbone:
384 """Load model GraphBackbone
385
386 Args:
(...)
403 model (GraphBackbone): Instance of GraphBackbone with loaded weights.
404 """
--> 405 return utility_load_model(
406 weight_file,
407 GraphBackbone,
408 device=device,
409 strict=strict,
410 strict_unexpected=strict_unexpected,
411 verbose=verbose,
412 )

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\utility\model.py:107, in load_model(weights, model_class, device, strict, strict_unexpected, verbose)
105 # load model weights
106 params = torch.load(weights, map_location="cpu")
--> 107 model = model_class(**params["init_kwargs"]).to(device)
108 missing_keys, unexpected_keys = model.load_state_dict(
109 params["model_state_dict"], strict=strict
110 )
111 if strict_unexpected and len(unexpected_keys) > 0:

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\models\graph_backbone.py:114, in GraphBackbone.__init__(self, dim_nodes, dim_edges, num_neighbors, node_features, edge_features, num_layers, dropout, node_mlp_layers, node_mlp_dim, edge_update, edge_mlp_layers, edge_mlp_dim, skip_connect_input, mlp_activation, decoder_num_hidden, graph_criterion, graph_random_min_local, backbone_update_method, backbone_update_iterations, backbone_update_num_weights, backbone_update_unconstrained, use_time_features, time_feature_type, time_log_feature_scaling, noise_schedule, noise_covariance_model, noise_beta_min, noise_beta_max, noise_log_snr_range, noise_complex_scaling, loss_scale, loss_scale_ssnr_cutoff, loss_function, checkpoint_gradients, prediction_type, num_graph_cycles, **kwargs)
111 # Encoder GNN process backbone
112 self.num_graph_cycles = args.num_graph_cycles
113 self.encoders = nn.ModuleList(
--> 114 [
115 BackboneEncoderGNN(
116 dim_nodes=args.dim_nodes,
117 dim_edges=args.dim_edges,
118 num_neighbors=args.num_neighbors,
119 node_features=args.node_features,
120 edge_features=args.edge_features,
121 num_layers=args.num_layers,
122 node_mlp_layers=args.node_mlp_layers,
123 node_mlp_dim=args.node_mlp_dim,
124 edge_update=args.edge_update,
125 edge_mlp_layers=args.edge_mlp_layers,
126 edge_mlp_dim=args.edge_mlp_dim,
127 mlp_activation=args.mlp_activation,
128 dropout=args.dropout,
129 skip_connect_input=args.skip_connect_input,
130 graph_criterion=args.graph_criterion,
131 graph_random_min_local=args.graph_random_min_local,
132 checkpoint_gradients=checkpoint_gradients,
133 )
134 for i in range(self.num_graph_cycles)
135 ]
136 )
138 self.backbone_updates = nn.ModuleList(
139 [
140 backbone.GraphBackboneUpdate(
(...)
149 ]
150 )
152 self.use_time_features = args.use_time_features

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\models\graph_backbone.py:115, in <listcomp>(.0)
111 # Encoder GNN process backbone
112 self.num_graph_cycles = args.num_graph_cycles
113 self.encoders = nn.ModuleList(
114 [
--> 115 BackboneEncoderGNN(
116 dim_nodes=args.dim_nodes,
117 dim_edges=args.dim_edges,
118 num_neighbors=args.num_neighbors,
119 node_features=args.node_features,
120 edge_features=args.edge_features,
121 num_layers=args.num_layers,
122 node_mlp_layers=args.node_mlp_layers,
123 node_mlp_dim=args.node_mlp_dim,
124 edge_update=args.edge_update,
125 edge_mlp_layers=args.edge_mlp_layers,
126 edge_mlp_dim=args.edge_mlp_dim,
127 mlp_activation=args.mlp_activation,
128 dropout=args.dropout,
129 skip_connect_input=args.skip_connect_input,
130 graph_criterion=args.graph_criterion,
131 graph_random_min_local=args.graph_random_min_local,
132 checkpoint_gradients=checkpoint_gradients,
133 )
134 for i in range(self.num_graph_cycles)
135 ]
136 )
138 self.backbone_updates = nn.ModuleList(
139 [
140 backbone.GraphBackboneUpdate(
(...)
149 ]
150 )
152 self.use_time_features = args.use_time_features

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\models\graph_design.py:1192, in BackboneEncoderGNN.__init__(self, dim_nodes, dim_edges, num_neighbors, node_features, edge_features, num_layers, node_mlp_layers, node_mlp_dim, edge_update, edge_mlp_layers, edge_mlp_dim, skip_connect_input, mlp_activation, dropout, graph_distance_atom_type, graph_cutoff, graph_mask_interfaces, graph_criterion, graph_random_min_local, checkpoint_gradients, **kwargs)
1182 self.checkpoint_gradients = checkpoint_gradients
1184 graph_kwargs = {
1185 "distance_atom_type": args.graph_distance_atom_type,
1186 "cutoff": args.graph_cutoff,
(...)
1189 "random_min_local": args.graph_random_min_local,
1190 }
-> 1192 self.feature_graph = protein_graph.ProteinFeatureGraph(
1193 dim_nodes=args.dim_nodes,
1194 dim_edges=args.dim_edges,
1195 num_neighbors=args.num_neighbors,
1196 graph_kwargs=graph_kwargs,
1197 node_features=args.node_features,
1198 edge_features=args.edge_features,
1199 )
1201 self.gnn = graph.GraphNN(
1202 dim_nodes=args.dim_nodes,
1203 dim_edges=args.dim_edges,
(...)
1215 checkpoint_gradients=checkpoint_gradients,
1216 )

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\layers\structure\protein_graph.py:182, in ProteinFeatureGraph.__init__(self, dim_nodes, dim_edges, num_neighbors, graph_kwargs, node_features, edge_features, centered, centered_pdb)
180 self.centered_pdb = centered_pdb.lower()
181 if self.centered:
--> 182 self._load_centering_params(self.centered_pdb)
184 """
185 Storing separate linear transformations for each layer, rather than concat + one
186 large linear, provides a more even weighting of the different input
(...)
191 dimensions.
192 """
193 self.node_linears = nn.ModuleList(
194 [nn.Linear(l.dim_out, self.dim_nodes) for l in self.node_layers]
195 )

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\layers\structure\protein_graph.py:276, in ProteinFeatureGraph._load_centering_params(self, reference_pdb)
274 else:
275 print(f"Computing reference stats for {reference_pdb}")
--> 276 param_dictionary = self._reference_stats(reference_pdb)
277 json_line = json.dumps(param_dictionary)
278 f.write(prefix + "\t" + json_line + "\n")

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\layers\structure\protein_graph.py:294, in ProteinFeatureGraph._reference_stats(self, reference_pdb)
293 def _reference_stats(self, reference_pdb):
--> 294 X, C, _ = Protein.from_PDBID(reference_pdb).to_XCS()
295 stats_dict = self._feature_stats(X, C)
296 return stats_dict

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\data\protein.py:234, in Protein.from_PDBID(cls, pdb_id, canonicalize, device)
231 from chroma.utility.fetchdb import RCSB_file_download
233 file_cif = f"/tmp/{pdb_id}.cif"
--> 234 RCSB_file_download(pdb_id, ".cif", file_cif)
235 protein = cls.from_CIF(file_cif, canonicalize=canonicalize, device=device)
236 unlink(file_cif)

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\utility\fetchdb.py:47, in RCSB_file_download(pdb_id, ext, local_filename)
37 """Downloads a file from the RCSB files section.
38
39 Args:
(...)
44 None
45 """
46 url = f"https://files.rcsb.org/view/{pdb_id.upper()}{ext}"
---> 47 return _download_file(url, local_filename)

File ~\OneDrive\Documenti\programmazione\chroma\lib\site-packages\chroma\utility\fetchdb.py:27, in _download_file(url, out_file)
25 with requests.get(url, stream=True) as r:
26 r.raise_for_status()
---> 27 with open(out_file, "wb") as f:
28 for chunk in r.iter_content(chunk_size=8192):
29 if chunk:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/2g3n.cif'
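The traceback shows `Protein.from_PDBID` writing to a hard-coded `/tmp/{pdb_id}.cif`, a directory that usually does not exist on Windows. A portable way to build such a path with the standard library is sketched below; `temp_cif_path` is a hypothetical helper showing the idea, not the library's actual code or an official fix.

```python
import os
import tempfile


def temp_cif_path(pdb_id):
    """Build a path for a downloaded CIF file inside the platform's temp directory."""
    return os.path.join(tempfile.gettempdir(), f"{pdb_id}.cif")


print(temp_cif_path("2g3n"))
```

`tempfile.gettempdir()` resolves to an existing, writable location on Windows, macOS and Linux, so the same code works on all three.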

problem about "protein = chroma.design(protein)"

Hi Chroma experts,

Thanks for your distinguished program for protein design. I'm trying to use it on my laptop, which has one GPU and one CPU. When I ran this command from the tutorial in the python3 interpreter, " protein = chroma.design(protein) ", the following error appeared:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Please help me. I am also a beginner in Python programming.

Best regards,
Ning

Great work! Question re. weight format/transformers 4.35.2

Hi Chroma team, great work! I would need to run Chroma in an environment with an updated transformers library (e.g. 4.35.2). However some of the weights are not compatible. Detailed error below that occurs when I load the ProCap module for text conditioning.

Would it be possible to convert the weights to the new transformers format or is there a workaround?

Thank you!

ERROR:

[..] Exception: Error loading model from checkpoint file: /tmp/chroma_weights/87243729397de5f93afc4f392662d1b5/weights.pt contains 24 unexpected keys: ['language_model.transformer.h.0.attn.attention.bias', 'language_model.transformer.h.0.attn.attention.masked_bias', 'language_model.transformer.h.1.attn.attention.bias', 'language_model.transformer.h.1.attn.attention.masked_bias', 'language_model.transformer.h.2.attn.attention.bias', 'language_model.transformer.h.2.attn.attention.masked_bias', 'language_model.transformer.h.3.attn.attention.bias', 'language_model.transformer.h.3.attn.attention.masked_bias', 'language_model.transformer.h.4.attn.attention.bias', 'language_model.transformer.h.4.attn.attention.masked_bias', 'language_model.transformer.h.5.attn.attention.bias', 'language_model.transformer.h.5.attn.attention.masked_bias', 'language_model.transformer.h.6.attn.attention.bias', 'language_model.transformer.h.6.attn.attention.masked_bias', 'language_model.transformer.h.7.attn.attention.bias', 'language_model.transformer.h.7.attn.attention.masked_bias', 'language_model.transformer.h.8.attn.attention.bias', 'language_model.transformer.h.8.attn.attention.masked_bias', 'language_model.transformer.h.9.attn.attention.bias', 'language_model.transformer.h.9.attn.attention.masked_bias', 'language_model.transformer.h.10.attn.attention.bias', 'language_model.transformer.h.10.attn.attention.masked_bias', 'language_model.transformer.h.11.attn.attention.bias', 'language_model.transformer.h.11.attn.attention.masked_bias']
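One possible workaround is to drop the 24 reported keys from the state dict before loading. The sketch below is pure Python over a toy dictionary: `drop_legacy_attention_buffers` is a hypothetical helper, and it rests on the unverified assumption that these `bias` / `masked_bias` entries are attention-mask buffers rather than learned weights, so that the newer library can rebuild them itself.

```python
LEGACY_SUFFIXES = (".attn.attention.bias", ".attn.attention.masked_bias")


def drop_legacy_attention_buffers(state_dict):
    """Return a copy of state_dict without the keys reported as unexpected."""
    return {k: v for k, v in state_dict.items() if not k.endswith(LEGACY_SUFFIXES)}


# Toy stand-in for a checkpoint's state dict (values would be tensors in practice).
toy_state = {
    "language_model.transformer.h.0.attn.attention.bias": 0,
    "language_model.transformer.h.0.attn.attention.masked_bias": 0,
    "language_model.transformer.h.0.attn.attention.k_proj.weight": 1,
}
print(sorted(drop_legacy_attention_buffers(toy_state)))
```

A traceback elsewhere on this page shows the loader reading `params["model_state_dict"]` and taking a `strict_unexpected` argument, so filtering that dictionary or relaxing that flag are the two places such a workaround would plausibly hook in. Results should be compared against the original environment before trusting them.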

HTTPError: 500 Server Error: Internal Server Error for url: https://chroma-weights.generatebiomedicines.com/downloads?token=%5B<api-key>%5D&weights=chroma_backbone_v1.0.pt

Hello, I have been trying to redesign an N-terminus.
I am still fairly new to coding/commands, so please be gentle and thorough with explanations.
I have created a .py file as follows:
name: VP2_test.py
Code inside file:

from chroma import api
api.register_key("api-key")

from chroma import Protein, Chroma
chroma = Chroma()

protein = Protein('TAPGKKRPVEPSPQRSPDSSTGIGKKGQQPARKRLNFGQTGDSESVPDPQPLGEPPAAPSGVGPNTMAAGGGAPMADNNEGADGVGSSSGNWHCDSTWLGDRVITTSTRTWALPTYNNHLYKQISNGTSGGATNDNTYFGYSTPWGYFDFNRFHCHFSPRDWQRLINNNWGFRPKRLSFKLFNIQVKEVTQNEGTKTIANNLTSTIQVFTDSEYQLPYVLGSAHQGCLPPFPADVFMIPQYGYLTLNNGSQAVGRSSFYCLEYFPSQMLRTGNNFQFTYTFEDVPFHSSYAHSQSLDRLMNPLIDQYLYYLSRTQTTGGTANTQTLGFSQGGPNTMANQAKNWLPGPCYRQQRVSTTTGQNNNSNFAWTAGTKYHLNGRNSLANPGIAMATHKDDEERFFPSNGILIFGKQNAARDNADYSDVMLTSEEEIKTTNPVATEEYGIVADNLQQQNTAPQIGTVNSQGALPGMVWQNRDVYLQGPIWAKIPHTDGNFHPSPLMGGFGLKHPPPQILIKNTPVPADPPTTFNQSKLNSFITQYSTGQVSVEIEWELQKENSKRWNPEIQYTSNYYKSTSVDFAVNTEGVYSEPRPIGTRYLTRNL')
protein = chroma.design(protein, design_selection="resid 1-65 around 5.0")

protein.to("test.pdb")

Every time I try to run it from the Windows command prompt:

Python VP2_test.py

It gives me this error, and I am unsure why.

(hw2) C:\Users\myuser\chroma>python VP2_test.py
Traceback (most recent call last):
File "C:\Users\myuser\chroma\VP2_test.py", line 6, in <module>
chroma = Chroma()
File "C:\Users\myuser\chroma\chroma\models\chroma.py", line 84, in __init__
self.backbone_network = graph_backbone.load_model(
File "C:\Users\myuser\chroma\chroma\models\graph_backbone.py", line 405, in load_model
return utility_load_model(
File "C:\Users\myuser\chroma\chroma\utility\model.py", line 104, in load_model
weights = api.download_from_generate(
File "C:\Users\myuser\chroma\chroma\utility\api.py", line 118, in download_from_generate
response.raise_for_status() # Raise an error for HTTP errors
File "C:\Users\myuser.conda\envs\hw2\lib\site-packages\requests-2.31.0-py3.9.egg\requests\models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://chroma-weights.generatebiomedicines.com/downloads?token=%5Bapi-key%5D&weights=chroma_backbone_v1.0.pt

Can you help me? Thank you

(edited to mask API key)
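
One detail visible in the reported URL is that the token is wrapped in `%5B` and `%5D`, which are the percent-encoded forms of `[` and `]`. The thread does not establish whether this caused the 500 response, but it suggests the key may have been registered with literal square brackets around it. The sketch below decodes the token from such a URL and checks for stray brackets. `extract_token` and `has_stray_brackets` are hypothetical helpers written for illustration and are not part of Chroma.

```python
from urllib.parse import parse_qs, urlsplit


def extract_token(url: str) -> str:
    """Return the percent-decoded `token` query parameter of a URL."""
    query = parse_qs(urlsplit(url).query)
    return query["token"][0]


def has_stray_brackets(token: str) -> bool:
    """True if the token starts with '[' or ends with ']'."""
    return token.startswith("[") or token.endswith("]")


# URL copied from the traceback above (key already masked by the poster).
url = (
    "https://chroma-weights.generatebiomedicines.com/downloads"
    "?token=%5Bapi-key%5D&weights=chroma_backbone_v1.0.pt"
)
token = extract_token(url)
print(token)                      # prints [api-key]
print(has_stray_brackets(token))  # prints True
```

If the check returns `True` for a real key, passing the key to `api.register_key` without the surrounding brackets would be the first thing to try.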

Example on how to condition on Sequence (and Structure)

Hi,

I would like to use Chroma for the following experiment. Suppose I have a protein structure with PDB ID 1XYZ. I would like to condition my protein design on (1) the structure and (2) the sequence. More specifically:

I would like to be able to specify protein residues that should be structurally unchanged.
I also would like to specify regions of the protein sequence that should be unchanged. For example, let's assume that the residues at positions [1,2,3,10] are the residues that I do not want Chroma to "mutate" to a different type (i.e. they have to be the same residue types as in 1XYZ).

I know how to condition on the structure, as explained in one of the notebooks. Can anyone provide an example of how to tell Chroma to not change the residue identity for residues at positions [1,2,3,10]?

Thank you for your support and great work,
Fabio
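
A design call elsewhere on this page passes `design_selection="resid 1-65 around 5.0"` to `chroma.design`, which suggests that the selection names the residues to be redesigned. Under that assumption, keeping positions [1, 2, 3, 10] fixed amounts to selecting every other residue. The sketch below computes those complementary ranges and formats them as a selection string. Both helpers are hypothetical, and joining ranges with `or` is an assumption about the selection grammar that should be checked against the Chroma documentation.

```python
from typing import Iterable, List, Optional, Tuple


def designable_ranges(n_residues: int, fixed: Iterable[int]) -> List[Tuple[int, int]]:
    """Return 1-based inclusive ranges covering every residue not in `fixed`."""
    fixed_set = set(fixed)
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for resid in range(1, n_residues + 1):
        if resid in fixed_set:
            # A fixed residue closes the currently open designable range.
            if start is not None:
                ranges.append((start, resid - 1))
                start = None
        elif start is None:
            start = resid
    if start is not None:
        ranges.append((start, n_residues))
    return ranges


def to_selection(ranges: List[Tuple[int, int]]) -> str:
    """Format ranges as a selection string (the `or` syntax is assumed)."""
    return " or ".join(f"resid {a}-{b}" for a, b in ranges)


# Example: a 20-residue protein with positions 1, 2, 3 and 10 held fixed.
ranges = designable_ranges(20, [1, 2, 3, 10])
selection = to_selection(ranges)
print(ranges)
print(selection)
```

The resulting string could then be tried as the `design_selection` argument, after confirming that the fixed residues really do keep their original identities in the output.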

Is the training code available?

Hi,

I am currently interested in using my own dataset to fine-tune Chroma for a specific use case. However, I have noticed that the training code is not available in the repository. I believe that having access to the training scripts would greatly enhance the project's utility and allow the community to contribute more effectively.

Would it be possible for you to consider releasing the training code or providing guidance on how to train the model with custom data? I understand that this might require additional time and effort, but I believe it would be a valuable addition to the project.

Thank you for considering my request.

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I ran the following redesign code:

# Redesign a Protein

from chroma import Protein, Chroma
chroma = Chroma()
protein = Protein('1GFP')
protein = chroma.design(protein)
protein.to("1GFP-redesign.cif")

It fails with errors (attached).
errors.txt

Create functional proteins from a given protein Template/Profile?

Thank you for developing Chroma. Chroma is such an excellent generative AI tool for de novo protein design.
I would like to inquire about how one can use Chroma to design new functional proteins (not just substructures), for instance:

What is the process for leveraging Chroma to create proteins inspired by the functions of existing enzymes? Could one, for example, use a protein template (such as a specific domain extracted from a PDB file, an MSA file, or a pre-trained HMM profile) to guide the design process?

A point of note: in the demo file provided, the CATH inputs appear to be restricted to three levels (e.g., "3.30.40"), yet four levels could be more useful for designing a functional protein. Is there a way to incorporate this additional level of detail in Chroma's design process?

In the context of scaffold design, is there a method within Chroma to designate certain critical amino acid residues as invariant hotspots or an interface, ensuring they remain unaltered throughout the design process?

Conditioning on AA composition

Hi,

Amazing work developing Chroma. While playing around with it, I was wondering whether there is a way to condition the sequence/structure on residue content. For example, is it possible to bias the composition towards alanine and serine so that they represent x% and y% of all residues, respectively?

Best,

Alex
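
Whatever mechanism is used to bias generation, it helps to measure the composition of each designed sequence afterwards. The sketch below does not condition Chroma in any way. It only computes per-residue fractions and checks them against target values, which could serve as a post hoc filter on sampled designs. The function names and the default tolerance are illustrative choices.

```python
from collections import Counter
from typing import Dict


def composition(sequence: str) -> Dict[str, float]:
    """Return the fraction of each residue type in a one-letter sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def meets_targets(sequence: str, targets: Dict[str, float], tolerance: float = 0.05) -> bool:
    """True if every targeted residue fraction is within `tolerance` of its target."""
    fractions = composition(sequence)
    return all(
        abs(fractions.get(aa, 0.0) - target) <= tolerance
        for aa, target in targets.items()
    )


# Toy example: a target of 50% alanine and 25% serine.
targets = {"A": 0.50, "S": 0.25}
print(composition("AASG"))
print(meets_targets("AASG", targets))
print(meets_targets("GGGG", targets))
```

Filtering samples this way is rejection sampling, so it becomes inefficient when the target composition is far from what the model produces by default.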
