
Comments (6)

jchodera commented on June 2, 2024

I presume we currently want to use the canonical isomeric SMILES string as our way of judging whether fragments are unique. Later on, we may relax that.
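A minimal sketch of that uniqueness check, assuming each fragment already carries a canonical isomeric SMILES produced upstream by a single cheminformatics toolkit (the canonicalization itself is not shown). The function name `unique_fragments` and the `(smiles, payload)` layout are hypothetical, and the SMILES strings are illustrative rather than toolkit output.

```python
def unique_fragments(fragments):
    """Keep the first fragment seen for each canonical isomeric SMILES.

    `fragments` is an iterable of (canonical_isomeric_smiles, payload) pairs.
    Plain string comparison is only meaningful if every string was
    canonicalized by the same toolkit with the same settings.
    """
    seen = {}
    for smiles, payload in fragments:
        # setdefault stores the payload only when the key is new.
        seen.setdefault(smiles, payload)
    return seen


# Illustrative input: two enantiomers plus one repeat of the first.
fragments = [
    ("C[C@H](N)C(=O)O", "fragment from parent 1"),
    ("C[C@@H](N)C(=O)O", "fragment from parent 2"),  # other enantiomer, distinct key
    ("C[C@H](N)C(=O)O", "fragment from parent 3"),   # duplicate of the first
]
unique = unique_fragments(fragments)
```

Because the key is isomeric, the two enantiomers stay separate; switching to a non-isomeric key later would merge them.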

from openff-fragmenter.

ChayaSt commented on June 2, 2024

For the incoming molecules to fragment: should we enforce isomeric SMILES? If the input is not isomeric, should we generate an isomeric SMILES before we generate conformations of the molecule to fragment?
In general I need to add a cleanup step for incoming molecules, and generating an isomeric SMILES can be part of that step. However, shouldn't we also enumerate stereoisomers if several stereocenters exist? The fragments can be different for different stereoisomers.
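To give a feel for how quickly that enumeration grows, here is a rough sketch that lists every R/S labelling for a given number of unspecified tetrahedral centers. It only enumerates labels; the helper name is hypothetical, and building actual molecules (and discarding duplicates such as meso forms) would be left to the cheminformatics toolkit.

```python
from itertools import product


def enumerate_stereo_assignments(n_unspecified_centers):
    """Return every R/S labelling for the unspecified tetrahedral centers.

    The result is an upper bound on the number of distinct stereoisomers:
    symmetry can make some labellings describe the same molecule.
    """
    return list(product("RS", repeat=n_unspecified_centers))


assignments = enumerate_stereo_assignments(3)
```

With three unspecified centers there are 2**3 labellings, so each such input molecule could yield up to eight molecules to fragment.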


jchodera commented on June 2, 2024

To clarify here:

  • For incoming molecules, we need one stage that expands ambiguous stereochemistry and enumerates likely protonation/tautomeric states. After this stage, every molecule will have specific stereochemistry and explicit hydrogens.
  • We should expand the JSON generated to specify both explicit-hydrogen canonical isomeric SMILES and non-isomeric SMILES so that either can be used as a key for indexing.
  • The resulting molecules can then be distributed in parallel to another function, which fragments each of these molecules and passes fragments on to the next stage.
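The indexing idea in the second point can be sketched as follows. The field names are hypothetical and not the project's actual JSON schema, and the SMILES strings are illustrative and written without explicit hydrogens for readability. Note that an isomeric key maps to one record, while a non-isomeric key can map to several stereoisomers, so that index holds lists.

```python
import json
from collections import defaultdict

# Hypothetical records; field names are illustrative, not the project's schema.
records = [
    {"canonical_isomeric_smiles": "C[C@H](N)C(=O)O", "canonical_smiles": "CC(N)C(=O)O"},
    {"canonical_isomeric_smiles": "C[C@@H](N)C(=O)O", "canonical_smiles": "CC(N)C(=O)O"},
]

# Round-trip through JSON to mimic reading the generated file.
records = json.loads(json.dumps(records))

# Isomeric SMILES identify exactly one record each.
by_isomeric = {r["canonical_isomeric_smiles"]: r for r in records}

# Non-isomeric SMILES may be shared by several stereoisomers.
by_non_isomeric = defaultdict(list)
for r in records:
    by_non_isomeric[r["canonical_smiles"]].append(r)
```

Looking up `by_non_isomeric["CC(N)C(=O)O"]` returns both stereoisomers, whereas each isomeric key returns a single record.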


ChayaSt commented on June 2, 2024

> We should expand the JSON generated to specify both explicit-hydrogen canonical isomeric SMILES and non-isomeric SMILES so that either can be used as a key for indexing.

@jchodera, the reason we want to expand the JSON to include explicit-hydrogen canonical SMILES is to avoid ambiguity for charged states. However, when the charged states are expanded, explicit hydrogens are added where needed. So is the explicit-hydrogen SMILES redundant?

Example for a positively charged Imatinib:
Cc1ccc(cc1Nc2nccc(n2)c3ccc[nH+]c3)NC(=O)c4ccc(cc4)CN5CCN(CC5)C
This is what it looks like with explicit hydrogens:
[H]c1c(c(c([n+](c1[H])[H])[H])c2c(c(nc(n2)N([H])c3c(c(c(c(c3C([H])([H])[H])[H])[H])N([H])C(=O)c4c(c(c(c(c4[H])[H])C([H])([H])N5C(C(N(C(C5([H])[H])([H])[H])C([H])([H])[H])([H])[H])([H])[H])[H])[H])[H])[H])[H])[H]

Currently each fragment has five SMILES strings associated with it:

  1. Explicit-hydrogen, index-tagged SMILES (canonical and isomeric)
  2. Canonical isomeric SMILES
  3. Canonical isomeric explicit-hydrogen SMILES
  4. Canonical SMILES
  5. Canonical explicit-hydrogen SMILES

As the database grows, storing five SMILES strings per fragment might become too much to maintain.


jchodera commented on June 2, 2024

I think all five will be useful! But it's possible 3 and 5 are redundant if the tautomeric and charge state are both encoded by 2 and 4. This is probably a question for @bannanc or @cbayly13.
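One way to probe the redundancy question empirically is to check whether the shorter key always determines the explicit-hydrogen key across the stored records. This is only a sketch: `is_redundant` and the field names are hypothetical, the SMILES strings are illustrative rather than toolkit-canonical, and passing the check on a dataset is evidence, not proof.

```python
from collections import defaultdict


def is_redundant(records, short_key, long_key):
    """Return True if `short_key` determines `long_key` in every record."""
    seen = defaultdict(set)
    for r in records:
        seen[r[short_key]].add(r[long_key])
    # Redundant only if no short key maps to more than one long key.
    return all(len(values) == 1 for values in seen.values())


# Pyridine and pyridinium: the charge state is already visible in the short form.
charge_records = [
    {"canonical_smiles": "c1ccncc1",
     "canonical_explicit_h_smiles": "[H]c1c([H])c([H])nc([H])c1[H]"},
    {"canonical_smiles": "c1cc[nH+]cc1",
     "canonical_explicit_h_smiles": "[H]c1c([H])c([H])[n+]([H])c([H])c1[H]"},
]

# Two enantiomers: the non-isomeric form does not determine the isomeric one.
stereo_records = [
    {"canonical_smiles": "CC(N)C(=O)O", "canonical_isomeric_smiles": "C[C@H](N)C(=O)O"},
    {"canonical_smiles": "CC(N)C(=O)O", "canonical_isomeric_smiles": "C[C@@H](N)C(=O)O"},
]

explicit_h_redundant = is_redundant(
    charge_records, "canonical_smiles", "canonical_explicit_h_smiles")
isomeric_redundant = is_redundant(
    stereo_records, "canonical_smiles", "canonical_isomeric_smiles")
```

In this toy data the explicit-hydrogen string adds no information beyond the short form, while the isomeric string does, which matches the intuition that 3 and 5 may be droppable but 2 and 4 are not interchangeable.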


ChayaSt commented on June 2, 2024

Stereoisomer (cis/trans, R/S) enumeration was addressed with #10.

