Comments (11)

aismail3-gnr8 avatar aismail3-gnr8 commented on September 27, 2024 3

Agreed with @wujiewang and thanks for the example @NatureGeorge!

@fabiotrovato, you can specify different selections for the structure step and the sequence step. These are the selection argument of the SubstructureConditioner constructor and the design_selection argument of the chroma.design method, respectively, in @NatureGeorge's example.

I'm a little unclear on something in your previous message, though:

the residues that should be structurally unchanged are 1 to 100 for chain 1. I want residues 1 to 100 to be structurally unchanged for chains 2 and 3, as well.

You're not trying to leave all the backbone coordinates the same and just redesign the sequence, right? That'd just be using the chroma.design method alone.

from chroma.

NatureGeorge avatar NatureGeorge commented on September 27, 2024 2

I have a working script that does exactly the following:

@wujiewang
What you wanted to do requires a two stage process.

  • Use chroma._sample with SubstructureConditioner. The SubstructureConditioner can take coordinate clamping mask as shown in our example notebook, so you can regenerate part of 1XYZ with [1,2,3,10] masked.
  • Then, use chroma.design with the same residue constraint you desire. See Doc on design_selection in chroma.design #5 for example usage for sequence masking.

Thanks and let us know if you have more questions.

from chroma import Chroma, Protein, conditioners

chroma = Chroma()
protein = Protein('3HSF.pdb', device='cuda:0')      # NOTE: change to your PDB file
residues_to_design = list(range(67, 84))            # NOTE: change to your desired region
protein.sys.save_selection(gti=residues_to_design, selname="infilling_selection")
str_conditioner = conditioners.SubstructureConditioner(
        protein=protein,
        backbone_model=chroma.backbone_network,
        selection = 'not namesel infilling_selection').to('cuda:0')

# NOTE: use `chroma._sample` instead of `chroma.sample`; the former keeps the amino-acid sequence fixed
infilled_protein = chroma._sample(
             protein_init=protein,
             conditioner=str_conditioner,
             langevin_factor=4.0,
             langevin_isothermal=True,
             inverse_temperature=8.0,
             sde_func='langevin',
             steps=500)

infilled_protein.sys.save_selection(gti=residues_to_design, selname="infilling_selection")
infilled_protein = chroma.design(infilled_protein, design_selection='namesel infilling_selection') 

display(infilled_protein)

That's it.

I also thought conditioners.SubsequenceConditioner would be suitable for this job, but it does not seem to work.

aismail3-gnr8 avatar aismail3-gnr8 commented on September 27, 2024 2

Great! In this case, you can just modify @NatureGeorge's very nice example above. To be explicit, here's a snippet that redesigns the structure of a subset of one chain and the sequence of a subset of another chain.

from chroma import Chroma, Protein, conditioners

chroma = Chroma()
protein = Protein("1XYZ", device="cuda")
str_conditioner = conditioners.SubstructureConditioner(
    protein=protein,
    backbone_model=chroma.backbone_network,
    selection="not (chain A and resid 30-60)",
).to("cuda")

new_protein = chroma._sample(
    protein_init=protein,
    conditioner=str_conditioner,
    langevin_factor=4.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    sde_func="langevin",
    steps=500,
)

new_protein = chroma.design(new_protein, design_selection="chain B and resid 30-60")

Here's a check of the sequence redesign. Note that sometimes you get the same residue as in the original protein.

import torch

X_old, C_old, S_old = protein.to_XCS(all_atom=True)
X_new, C_new, S_new = new_protein.to_XCS(all_atom=True)
# C_old.abs() == 2 selects chain B
# in the tensor below, index 29 corresponds to resid 30, so the slice 19:39
# covers 10 undesigned positions followed by the first 10 designed ones
torch.isclose(S_old, S_new)[C_old.abs() == 2][19:39]

To check the structure redesign, first we need to standardize the origin. To do this, we translate both structures by the location of a particular undesigned residue so that it's at the origin in both structures. Then, we can check some coordinates.

# C > 0 selects residues for which we have structural information
X_old_standardized = X_old - X_old[0,25,0].expand(X_old.shape) * (C_old > 0)[:, :, None, None].expand(X_old.shape)
X_new_standardized = X_new - X_new[0,25,0].expand(X_new.shape) * (C_new > 0)[:, :, None, None].expand(X_new.shape)
# residues which have been moved, should get Falses
print(torch.isclose(X_old_standardized[0,29:39,0,0], X_new_standardized[0,29:39,0,0]))
# residues which have not been moved, should get Trues
print(torch.isclose(X_old_standardized[0,400:410,0,0], X_new_standardized[0,400:410,0,0])) 

I hope this helps! Please let us know if you have any more questions.

wujiewang avatar wujiewang commented on September 27, 2024 1

Thanks @NatureGeorge for this nice example! We should add this to the Chroma cookbook.

@fabiotrovato, you might want to follow this. Note that you need to call ._sample() and .design() separately.

The SubsequenceConditioner uses classifier-guidance-style conditioning, so it applies a "soft" form of sequence conditioning. @NatureGeorge's script instead enforces hard conditioning on the model directly.
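
To make the soft/hard distinction concrete, here is a toy sketch in plain Python. It is not Chroma's sampler or the actual SubsequenceConditioner: the `run` function, the pull-toward-zero "model", and the `weight` parameter are all made up for illustration. A guidance-style term only biases each update toward the target, while a hard clamp overwrites the conditioned entries after every step.

```python
from typing import List, Set


def run(
    x0: List[float],
    target: List[float],
    conditioned: Set[int],
    mode: str,  # "soft" or "hard"
    steps: int = 200,
    rate: float = 0.1,
    weight: float = 4.0,
) -> List[float]:
    """Iterate a toy sampler under soft or hard conditioning (illustrative only)."""
    x = list(x0)
    for _ in range(steps):
        for i in range(len(x)):
            drift = -x[i]  # toy unconditional model: pull every entry toward 0
            if mode == "soft" and i in conditioned:
                # guidance term: gradient of -(weight / 2) * (x - target) ** 2
                drift += weight * (target[i] - x[i])
            x[i] += rate * drift
        if mode == "hard":
            for i in conditioned:
                x[i] = target[i]  # clamp: overwrite with the target value
    return x


start, goal = [0.3, 0.3], [1.0, 1.0]
soft = run(start, goal, conditioned={0}, mode="soft")
hard = run(start, goal, conditioned={0}, mode="hard")
print([round(v, 3) for v in soft])  # [0.8, 0.0]
print([round(v, 3) for v in hard])  # [1.0, 0.0]
```

In this toy, the soft run settles at a compromise (0.8) between the model's pull and the target, while the hard run lands on the target exactly. That is the qualitative difference described above, nothing more.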

wujiewang avatar wujiewang commented on September 27, 2024 1
  • My protein has three chains.

You can just use a design mask if that is easier. If you want to clamp residues [1,2,3,10], just set the corresponding mask entries to 1. You can also use a selection string, but I am less familiar with the grammar.

example:

import torch
complex_mask = []

for _ in range(3):  # one mask per chain
    mask = torch.zeros(1, 200).to(torch.long)
    mask[:, torch.LongTensor([1, 2, 3, 10])] = 1

    complex_mask.append(mask)

# concatenate the per-chain masks along the residue dimension
complex_mask = torch.cat(complex_mask, dim=-1)
  • For any of the three chains, the residues for structure (1 to 100) and sequence ([1,2,3,10]) conditioning are different.
    Additionally, I want to specify the same residues for structure and sequence conditioning for all 3 chains.

I understand that the mask will be different for sequence and structure conditioning. So you can just specify the structure conditioning (1 to 100) and sequence conditioning ([1,2,3,10]) separately the same way @NatureGeorge did. It is probably easier if you just specify the mask as a binary long Tensor.

Thanks for the feedback! Let us know if this makes more sense.
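
If your residue lists are 1-based (as in a PDB file) while the mask is indexed from 0, a small helper can keep the bookkeeping straight across chains. This is only a sketch in plain Python: `build_residue_mask` is a hypothetical name, not part of Chroma, and it assumes each chain is numbered 1..length with no gaps and that the chains are concatenated in order.

```python
from typing import Dict, Iterable, List


def build_residue_mask(
    chain_lengths: List[int],
    residues_per_chain: Dict[int, Iterable[int]],
) -> List[int]:
    """Return a flat 0/1 mask over all chains, concatenated in chain order.

    residues_per_chain maps a 0-based chain index to 1-based residue
    numbers within that chain (assumed to run 1..length with no gaps).
    """
    mask: List[int] = []
    for chain_idx, length in enumerate(chain_lengths):
        chain_mask = [0] * length
        for resnum in residues_per_chain.get(chain_idx, ()):
            if not 1 <= resnum <= length:
                raise ValueError(
                    f"residue {resnum} is outside chain {chain_idx} (1..{length})"
                )
            chain_mask[resnum - 1] = 1  # 1-based residue number -> 0-based index
        mask.extend(chain_mask)
    return mask


# Three chains of 100 residues, marking residues 1, 2, 3 and 10 in each.
marked = [1, 2, 3, 10]
mask = build_residue_mask([100, 100, 100], {0: marked, 1: marked, 2: marked})
print(sum(mask))  # 12
print([i for i, v in enumerate(mask) if v][:5])  # [0, 1, 2, 9, 100]
```

Wrapping the result as `torch.tensor(mask)[None]` should give a tensor of shape (1, 300). Whether a 1 means "design this residue" or "keep it fixed" depends on the argument you pass the mask to, and this thread does not settle that, so check before relying on it.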

fabiotrovato avatar fabiotrovato commented on September 27, 2024 1

Thanks @aismail3-gnr8 (and everyone else)
I tried your code on my protein and the results are great.

selection="not (chain A and resid 30-60)" correctly leaves the backbone of all residues unchanged (or nearly so), except for residues 30 to 60 of chain A. The latter have a different conformation but the same sequence, as intended.

new_protein = chroma.design(new_protein, design_selection="chain B and resid 30-60") has the effect of leaving the sequences of chains A and C unchanged. The sequence changes occur in chain B, residues 30 to 60.

Thanks for your help,
Fabio

aismail3-gnr8 avatar aismail3-gnr8 commented on September 27, 2024

For keeping residues unchanged during sequence design, try the design_selection argument, which accepts either selection strings or mask tensors. For instance:

chroma = Chroma()
protein = Protein("5SV5", device="cuda")
chroma.design(protein, design_selection="not resid 1-10") # keeps residues 1-10 fixed

You can use design_selection with chroma.sample as well, and it will get passed to the underlying sequence design step.
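
For a scattered set of residues to keep fixed (say 1, 2, 3 and 10), the selection string can be assembled programmatically. The sketch below is plain Python with hypothetical helper names (`to_ranges`, `design_all_except`). It only uses pieces of the grammar that appear in this thread (`not`, `and`, parentheses, `resid a-b`), but this particular combination has not been tested here, and it assumes `resid` matches the same residue numbers in every chain.

```python
from typing import Iterable, List, Tuple


def to_ranges(residues: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse residue numbers into sorted, inclusive (start, end) runs."""
    ranges: List[Tuple[int, int]] = []
    for r in sorted(set(residues)):
        if ranges and r == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], r)  # extend the current run
        else:
            ranges.append((r, r))  # start a new run
    return ranges


def design_all_except(residues: Iterable[int]) -> str:
    """Build a selection string that excludes the given residue numbers."""
    runs = to_ranges(residues)
    if not runs:
        raise ValueError("need at least one residue to exclude")
    # Single residues are written as a-a so only the range form is used.
    return " and ".join(f"(not resid {a}-{b})" for a, b in runs)


print(design_all_except([1, 2, 3, 10]))
# (not resid 1-3) and (not resid 10-10)
```

The resulting string would then be passed as design_selection. If the parser rejects it, a mask tensor is the fallback.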

wujiewang avatar wujiewang commented on September 27, 2024

What you wanted to do requires a two stage process.

  • Use chroma._sample with SubstructureConditioner. The SubstructureConditioner can take coordinate clamping mask as shown in our example notebook, so you can regenerate part of 1XYZ with [1,2,3,10] masked.
  • Then, use chroma.design with the same residue constraint you desire. See #5 for example usage for sequence masking.

Thanks and let us know if you have more questions.

fabiotrovato avatar fabiotrovato commented on September 27, 2024

Thanks for the directions! I am not sure I fully understand how to apply them to my case. Suppose that protein 1XYZ has three chains, each with residues numbered from 1 to 100 according to the PDB file (I am not sure how the residues in the PDB file are mapped to Chroma).

What I want to achieve is:

  • the residues that should be structurally unchanged are 1 to 100 for chain 1. I want residues 1 to 100 to be structurally unchanged for chains 2 and 3, as well.
  • The sequence of residues [1,2,3,10] should be unchanged for chains 1, 2 and 3.

I was playing around with the notebook ChromaDemo.ipynb. This is the code that I have come up with for keeping the substructure unchanged.

pdb_id = "1XYZ"  
chain1_length = 100
chain2_length = 100
chain3_length = 100

protein = Protein.from_PDBID(pdb_id, canonicalize=True, device=device)
X, C, _ = protein.to_XCS()
selection_string = "resid 1-300" 
residues_to_design = plane_split_protein(X, C, protein, 0.5).nonzero()[:, 1].tolist()
protein.sys.save_selection(gti=residues_to_design, selname=selection_string)
struct_conditioner = conditioners.SubstructureConditioner(
        protein, backbone_model=chroma.backbone_network, selection=selection_string
    ).to(device)

conditioner = struct_conditioner
infilled_protein, trajectories = chroma.sample(
    chain_lengths=[chain1_length, chain2_length, chain3_length],
    protein_init=protein,
    conditioner=conditioner,
    langevin_factor=4.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    steps=500,
    sde_func="langevin",
    full_output=True,
)

I don't know how the PDB residues are mapped internally in Chroma, so my first question is: should I use selection_string = "resid 1-300" or selection_string = "resid 1-100"?

Regarding the sequence masking, if I had one chain and residues [1,2,3,10] to mask, I would do:

design_mask = torch.Tensor([1] * 3 + [0] * 6 + [1] * 1  + [0] * 90)[None].cuda()
protein = chroma.sample(chain_lengths=[chain1_length], design_selection=design_mask)
print( protein.sequence() )

Is the code snippet above correct for the case of one chain?

I am not sure how I should modify the above snippet for the case of three chains, since I don't know how the PDB residues are mapped to Chroma. Can you please clarify?

Best,
Fabio

fabiotrovato avatar fabiotrovato commented on September 27, 2024

Hi @NatureGeorge and @wujiewang , thanks for your examples and explanations.
What I was asking is a bit more complex and I am not sure I fully understand if the example by @NatureGeorge applies to my case.

Please see my last post. A brief summary of that post:

  • My protein has three chains.
  • For any of the three chains, the residues for structure (1 to 100) and sequence ([1,2,3,10]) conditioning are different. Additionally, I want to specify the same residues for structure and sequence conditioning for all 3 chains.

Thanks again.

fabiotrovato avatar fabiotrovato commented on September 27, 2024

Hi @aismail3-gnr8
My post was misleading, since I described the particular case of leaving all backbone atoms unchanged and performing sequence design on a subset of residues, but that's not the general case I have in mind. In the general case, I only want to condition the structure of a subset of residues (which can be different from the sequence mask).

I hope this clarifies my intentions a bit more. Thanks!
