Comments (11)
Agreed with @wujiewang and thanks for the example @NatureGeorge!
@fabiotrovato, you can specify different selections for the structure step and the sequence step. These are the selection argument of the SubstructureConditioner constructor and the design_selection argument of the chroma.design method, respectively, in @NatureGeorge's example.
I'm a little unclear on something in your previous message, though:
the residues that should be structurally unchanged are 1 to 100 for chain 1. I want residues 1 to 100 to be structurally unchanged for chains 2 and 3, as well.
You're not trying to leave all the backbone coordinates the same and just redesign the sequence, right? That'd just be using the chroma.design method alone.
from chroma.
I have a working script that does exactly the following:
@wujiewang
What you wanted to do requires a two stage process.
- Use chroma._sample with SubstructureConditioner. The SubstructureConditioner can take a coordinate clamping mask as shown in our example notebook, so you can regenerate part of 1XYZ with [1,2,3,10] masked.
- Then, use chroma.design with the same residue constraint you desire. See "Doc on design_selection in chroma.design" (#5) for example usage for sequence masking.
Thanks and let us know if you have more questions.
from chroma import Chroma, Protein, conditioners

chroma = Chroma()
protein = Protein('3HSF.pdb', device='cuda:0')  # NOTE: change to your PDB file
residues_to_design = list(range(67, 84))  # NOTE: change to your desired region
protein.sys.save_selection(gti=residues_to_design, selname="infilling_selection")
# Clamp the structure of everything outside the infilling region
str_conditioner = conditioners.SubstructureConditioner(
    protein=protein,
    backbone_model=chroma.backbone_network,
    selection='not namesel infilling_selection').to('cuda:0')
# NOTE: use `chroma._sample` instead of `chroma.sample`; the former keeps the aa sequence fixed
infilled_protein = chroma._sample(
    protein_init=protein,
    conditioner=str_conditioner,
    langevin_factor=4.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    sde_func='langevin',
    steps=500)
# Re-save the selection on the new protein, then redesign only that region's sequence
infilled_protein.sys.save_selection(gti=residues_to_design, selname="infilling_selection")
infilled_protein = chroma.design(infilled_protein, design_selection='namesel infilling_selection')
display(infilled_protein)
That's it.
I also thought conditioners.SubsequenceConditioner would be suitable for this job, but it doesn't seem to work.
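For a multi-chain protein, the residues_to_design list in the script above has to cover positions in every chain. The sketch below shows one way to build such a list in plain Python. It assumes that the indices passed as gti count residues across the chains concatenated in order, starting from 0; this thread does not confirm that, so check it against your own protein first. The helper global_indices is hypothetical and not part of chroma.

```python
from itertools import accumulate


def global_indices(chain_lengths, positions_per_chain):
    """Map per-chain positions to positions in the concatenated complex.

    Assumption (not confirmed in this thread): chains are concatenated in
    order and positions are counted from 0 within each chain.
    """
    # Offset of each chain = total length of the chains before it.
    offsets = [0] + list(accumulate(chain_lengths))[:-1]
    indices = []
    for offset, length, positions in zip(offsets, chain_lengths, positions_per_chain):
        for pos in positions:
            if not 0 <= pos < length:
                raise ValueError(f"position {pos} is outside a chain of length {length}")
            indices.append(offset + pos)
    return indices


# Hypothetical complex: three chains of 100 residues each, with the same
# range as in the script above applied to every chain.
residues_to_design = global_indices([100, 100, 100], [range(67, 84)] * 3)
print(len(residues_to_design), residues_to_design[0], residues_to_design[-1])
```

The resulting list can then be passed as gti to protein.sys.save_selection, as in the script above.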
from chroma.
Great! In this case, you can just modify @NatureGeorge's very nice example above. To be explicit, here's a snippet that redesigns the structure of a subset of one chain and the sequence of a subset of another chain.
from chroma import Chroma, Protein, conditioners

chroma = Chroma()
protein = Protein("1XYZ", device="cuda")
# Clamp the structure everywhere except chain A, residues 30-60
str_conditioner = conditioners.SubstructureConditioner(
    protein=protein,
    backbone_model=chroma.backbone_network,
    selection="not (chain A and resid 30-60)",
).to("cuda")
new_protein = chroma._sample(
    protein_init=protein,
    conditioner=str_conditioner,
    langevin_factor=4.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    sde_func="langevin",
    steps=500,
)
# Redesign the sequence of chain B, residues 30-60 only
new_protein = chroma.design(new_protein, design_selection="chain B and resid 30-60")
Here's a check of the sequence redesign. Note that sometimes you get the same residue as in the original protein.
import torch

X_old, C_old, S_old = protein.to_XCS(all_atom=True)
X_new, C_new, S_new = new_protein.to_XCS(all_atom=True)
# C_old.abs() == 2 selects chain B
# in the tensor below, index 29 corresponds to resid 30, so the slice 19:39
# spans resid 20-39: ten residues before the designed region, then its first ten
torch.isclose(S_old, S_new)[C_old.abs() == 2][19:39]
To check the structure redesign, first we need to standardize the origin. To do this, we translate both structures by the location of a particular undesigned residue so that it's at the origin in both structures. Then, we can check some coordinates.
# C > 0 selects residues for which we have structural information
X_old_standardized = X_old - X_old[0,25,0].expand(X_old.shape) * (C_old > 0)[:, :, None, None].expand(X_old.shape)
X_new_standardized = X_new - X_new[0,25,0].expand(X_new.shape) * (C_new > 0)[:, :, None, None].expand(X_new.shape)
# residues which have been moved, should get Falses
print(torch.isclose(X_old_standardized[0,29:39,0,0], X_new_standardized[0,29:39,0,0]))
# residues which have not been moved, should get Trues
print(torch.isclose(X_old_standardized[0,400:410,0,0], X_new_standardized[0,400:410,0,0]))
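The idea behind this check can be sketched without chroma or torch. The toy example below is an illustration only: the function moved_after_translation and the coordinates are made up for this sketch. As in the check above, it removes only a translation (not a rotation) by moving an anchor point to the origin in both sets before comparing.

```python
import math


def moved_after_translation(old, new, anchor, tol=1e-5):
    """Flag points that differ once both sets are shifted so the anchor is the origin."""

    def centered(points):
        ax, ay, az = points[anchor]
        return [(x - ax, y - ay, z - az) for x, y, z in points]

    flags = []
    for p, q in zip(centered(old), centered(new)):
        same = all(math.isclose(a, b, abs_tol=tol) for a, b in zip(p, q))
        flags.append(not same)
    return flags


# Toy data: the new set is the old one shifted by (5, 5, 5),
# except point 2, which was rebuilt at a different location.
old = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 1.0)]
new = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.5, 6.0, 4.0), (8.0, 6.0, 6.0)]
print(moved_after_translation(old, new, anchor=0))  # [False, False, True, False]
```

The anchor must be a point that was not redesigned; otherwise every other point will appear to have moved.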
I hope this helps! Please let us know if you have any more questions.
from chroma.
Thanks @NatureGeorge for this nice example! We should add this to the Chroma cookbook.
@fabiotrovato you might want to follow this. Note that you need to use ._sample() and .design() separately.
The SubsequenceConditioner uses classifier-guidance-style conditioning, so it performs a type of "soft" sequence conditioning. @NatureGeorge's script enforces hard conditioning on the model directly.
from chroma.
- My protein has three chains.
You can just use a design mask if that is easier. If you want to clamp residues [1,2,3,10], you just set the corresponding residues to 1. You can also just use a selection string, but I am less familiar with the grammar.
Example:
import torch

complex_mask = []
for _ in range(3):  # one mask per chain, here with 200 positions each
    mask = torch.zeros(1, 200).to(torch.long)
    mask[:, torch.LongTensor([1, 2, 3, 10])] = 1  # 0-based tensor positions
    complex_mask.append(mask)
complex_mask = torch.cat(complex_mask, dim=-1)  # shape (1, 600)
- For any of the three chains, the residues for structure (1 to 100) and sequence ([1,2,3,10]) conditioning are different.
Additionally, I want to specify the same residues for structure and sequence conditioning for all 3 chains.
I understand that the mask will be different for sequence and structure conditioning. So you can just specify the structure conditioning (1 to 100) and the sequence conditioning ([1,2,3,10]) separately, the same way @NatureGeorge did. It is probably easier if you just specify each mask as a binary long Tensor.
Thanks for the feedback! Let us know if this makes more sense.
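As a rough sketch of the two separate masks for the three-chain case, the plain-Python version below builds them as 0/1 lists. It assumes three chains of 100 residues and that residue number n sits at 0-based position n - 1 within its chain; both are assumptions to verify against the actual protein. The helper names chain_mask and complex_mask are hypothetical.

```python
def chain_mask(length, selected):
    """Return a 0/1 list of the given length with 1 at each selected 0-based position."""
    mask = [0] * length
    for pos in selected:
        mask[pos] = 1
    return mask


def complex_mask(chain_lengths, selected_per_chain):
    """Concatenate the per-chain masks in chain order."""
    mask = []
    for length, selected in zip(chain_lengths, selected_per_chain):
        mask.extend(chain_mask(length, selected))
    return mask


chain_lengths = [100, 100, 100]  # assumed lengths of the three chains
# Structure: residues 1-100 of every chain -> 0-based positions 0..99.
structure_mask = complex_mask(chain_lengths, [range(100)] * 3)
# Sequence: residues 1, 2, 3 and 10 of every chain -> 0-based positions 0, 1, 2, 9.
sequence_mask = complex_mask(chain_lengths, [[0, 1, 2, 9]] * 3)
print(sum(structure_mask), sum(sequence_mask))  # 300 12
```

Each list can be turned into a binary long tensor with a leading batch dimension, for example torch.tensor(sequence_mask)[None], which has shape (1, 300).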
from chroma.
Thanks @aismail3-gnr8 (and everyone else)
I tried your code on my protein and the results are great.
selection="not (chain A and resid 30-60)" correctly leaves unchanged (or nearly so) the backbone of all residues except residues 30 to 60 of chain A. The latter have a different conformation but the same sequence, as intended.
new_protein = chroma.design(new_protein, design_selection="chain B and resid 30-60") leaves the sequence of chains A and C unchanged. The sequence changes occur in chain B, residues 30 to 60.
Thanks for your help,
Fabio
from chroma.
For keeping residues unchanged during sequence design, try the design_selection argument, which accepts either selection strings or mask tensors. For instance:
from chroma import Chroma, Protein

chroma = Chroma()
protein = Protein("5SV5", device="cuda")
chroma.design(protein, design_selection="not resid 1-10")  # keeps residues 1-10 fixed
You can use design_selection with chroma.sample as well, and it will get passed to the underlying sequence design step.
from chroma.
What you wanted to do requires a two stage process.
- Use chroma._sample with SubstructureConditioner. The SubstructureConditioner can take a coordinate clamping mask as shown in our example notebook, so you can regenerate part of 1XYZ with [1,2,3,10] masked.
- Then, use chroma.design with the same residue constraint you desire. See #5 for example usage for sequence masking.
Thanks and let us know if you have more questions.
from chroma.
Thanks for the directions! I am not sure I fully understand how to apply them to my case. Suppose that protein 1XYZ has three chains, each with residues numbered from 1 to 100, according to the pdb (not sure how the residues in the pdb are mapped to chroma).
What I want to achieve is:
- the residues that should be structurally unchanged are 1 to 100 for chain 1. I want residues 1 to 100 to be structurally unchanged for chains 2 and 3, as well.
- The sequence of residues [1,2,3,10] should be unchanged for chains 1, 2 and 3.
I was playing around with the notebook ChromaDemo.ipynb. This is the code that I have come up with for keeping the substructure unchanged.
pdb_id = "1XYZ"
chain1_length = 100
chain2_length = 100
chain3_length = 100
protein = Protein.from_PDBID(pdb_id, canonicalize=True, device=device)
X, C, _ = protein.to_XCS()
selection_string = "resid 1-300"
residues_to_design = plane_split_protein(X, C, protein, 0.5).nonzero()[:, 1].tolist()
protein.sys.save_selection(gti=residues_to_design, selname=selection_string)
struct_conditioner = conditioners.SubstructureConditioner(
    protein, backbone_model=chroma.backbone_network, selection=selection_string
).to(device)
conditioner = struct_conditioner
infilled_protein, trajectories = chroma.sample(
    chain_lengths=[chain1_length, chain2_length, chain3_length],
    protein_init=protein,
    conditioner=conditioner,
    langevin_factor=4.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    steps=500,
    sde_func="langevin",
    full_output=True,
)
I don't know how the PDB residues are mapped internally in chroma, so my first question is: do I have to use selection_string = "resid 1-300" or selection_string = "resid 1-100"?
Regarding the sequence masking, if I had one chain and residues [1,2,3,10] to mask, I would do:
design_mask = torch.Tensor([1] * 3 + [0] * 6 + [1] * 1 + [0] * 90)[None].cuda()
protein = chroma.sample(chain_lengths=[chain1_length], design_selection=design_mask)
print( protein.sequence() )
Is the code snippet ^^^ correct for the case of 1 chain?
I am not sure how I should modify the above snippet for the case of 3 chains, since I don't know how the PDB residues are mapped to chroma. Can you please clarify?
Best,
Fabio
from chroma.
Hi @NatureGeorge and @wujiewang , thanks for your examples and explanations.
What I was asking is a bit more complex and I am not sure I fully understand if the example by @NatureGeorge applies to my case.
Please see my last post. A brief summary of that post:
- My protein has three chains.
- For any of the three chains, the residues for structure (1 to 100) and sequence ([1,2,3,10]) conditioning are different. Additionally, I want to specify the same residues for structure and sequence conditioning for all 3 chains.
Thanks again.
from chroma.
Hi @aismail3-gnr8
My post was misleading since I described the particular case of leaving all backbone atoms unchanged and performing sequence design on a subset of residues, but that's not the general case I have in mind. In the general case, I only want to condition the structure of a subset of residues (which can be different from the sequence mask).
Hope this clarifies a bit more my intentions. Thanks!
from chroma.