Git Product home page Git Product logo

lukasturcani / stk Goto Github PK

View Code? Open in Web Editor NEW
237.0 11.0 45.0 45.83 MB

A Python library which allows construction and manipulation of complex molecules, as well as automatic molecular design and the creation of molecular databases.

License: MIT License

Python 99.97% Just 0.03%
chemistry cheminformatics reactions computational-science computational-chemistry materials-science materials-design materials-discoveries materials-screening molecular-structures

stk's Introduction

maintainers

lukasturcani, andrewtarzia

documentation

https://stk.readthedocs.io

discord

https://discord.gg/zbCUzuxe2B

image

image

Overview

stk is a Python library which allows construction and manipulation of complex molecules, as well as automatic molecular design, and the creation of molecular, and molecular property, databases. The documentation of stk is available on https://stk.readthedocs.io and the project's Discord server can be joined through https://discord.gg/zbCUzuxe2B.

Installation

To get stk, you can install it with pip:

pip install stk

If you would like to get updated when a new release of stk comes out, which happens pretty regularly, click on the watch button on the top right corner of the GitHub page. Then select Releases only from the dropdown menu.

You can see the latest releases here:

https://github.com/lukasturcani/stk/releases

There will be a corresponding release on pip for each release on GitHub, and you can update your stk with:

pip install stk --upgrade

Developer Setup

  1. Install just.
  2. In a new virtual environment run:
just dev
  1. Setup the MongoDB container (make sure docker is installed):
just mongo
  1. Run code checks:
just check

How To Cite

If you use stk please cite

https://github.com/lukasturcani/stk

and

https://aip.scitation.org/doi/10.1063/5.0049708

Publications

about stk

using stk

Acknowledgements

I began developing this code when I was working in the Jelfs group, http://www.jelfs-group.org/, whose members often provide me with very valuable feedback, which I gratefully acknowledge.

stk's People

Contributors

andrewtarzia avatar austin-mroz avatar eberardo avatar fiszczyp avatar joshkamm avatar lukasturcani avatar marcinmiklitz avatar nevetse avatar stevenkbennett avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

stk's Issues

Implement a metal coordination environment pre optimizer class

This will be part of the implementation of metal organic systems and will develop an optimizer to be the first part of a sequence which assists stk in maintaining desired metal geometry. I.e. pre restricted optimization, the user would perform this new optimization.

Bug when building Polymer() in stk==2019.4.6.1

Bug, feature request, or proposal:

Bug

Expected Behaviour:

Building of Polymer() works in version 2019.4.3.0 from the 3 building blocks attached and the code below.

Current behavior:

MacroMoleculeBuildError() exception occurs using up-to-date version of stk from pip:

An error message comes up on Traceback:

RuntimeError: Pre-condition Violation
bond already exists
Violation occurred on line 284 in file Code/GraphMol/RWMol.cpp
Failed Expression: !(boost::edge(atomIdx1, atomIdx2, d_graph).second)
RDKIT: 2018.09.1
BOOST: 1_65_1

To reproduce:

import stk

# load in molecules
core = stk.StructUnit('core.mol', ['bromine'])
liga = stk.StructUnit('liga.mol', ['bromine'])
link = stk.StructUnit('link.mol', ['bromine'])

# build molecule
    polymer = stk.Polymer([liga, link, core],
                          stk.Linear(repeating_unit='ABCBA',
                                     orientation=[0, 0, 0, 1, 1],
                                     n=1, ends='fg'))
# output as built
json_file = 'Test.json'
polymer.dump(json_file)
mol_file = 'Test.mol'
polymer.write(mol_file)

I had to convert .mol files to .txt to upload, although I am sure there is a way around this.
core.txt
liga.txt
link.txt

Give each molecule a proper 3D transform.

Each molecule should have a method "transform" which returns and completely describes its position and orientation in space. Methods which set this transform to a desired value should also be added. The transform will be a 4x4 matrix, consisting of a 3x3 orthonormal matrix which described the orientation, the rest of the 4x4 matrix will be the translational component.

The internal coordinate system, ie the 3x3 orthonormal matrix, will be arbitrary but derived consistently. The approach would be to take the longest axis of the molecule and use that as the
first orthonormal vector. From this, take a perpendicular vector as the second orthonormal vector (chosen arbitrarily but consistently). Take the cross product of the first and second orthonormal vectors to get the third orthonomal vector.

Change initializers of ConstructedMolecule.

The current __init__ method of ConstructedMolecule, previously called MacroMolecule has the general form:

def __init__(self, building_blocks, topology, name, note):
    # do stuff

This causes problems in cases like the Cage class:

Cage(
    building_blocks=[bb1, bb2, bb3], 
    topology=FourPlusSix(
        bb_assignments={
            bb1: [0, 1]
            bb2: [2, 3],
            bb3: [1, 2, 3, 4, 5]
        }
    )
)

Notice that there is a dependency between the FourPlusSix instance and the list of building blocks in the initializer. This means you cannot pass the topology freely around, because it requires that its bb_assignments parameter contains the same building blocks as the list. It also violates DRY.

The solution is to change the initializer of Cage so that it can be

Cage(building_blocks=[bb1, bb2, bb3], topology=FourPlusSix())

or

bbs = {
    bb1: [0, 1]
    bb2: [2, 3],
    bb3: [1, 2, 3, 4, 5]
}
Cage(building_blocks=bbs, topology=FourPlusSix())

This initializer no longer violates DRY and the topology instance is independent of the Cage it is used with. It is OK for the building_blocks to depend on the topology it is used with because the initializer is always aware of what topology it is being used with. However, the topology should not have to be aware of which initializer uses it, which is the problem with the current code.

This change will require a bunch of documentation updates as well, especailly in the introductory change.
In general it means that ConstructedMolecule subclasses do not, in general, have to use the same __init__ signature.

Create a general Vertex and Edge class.

At the moment, when new topologies are created, tools to assembled a new topology graph easily do not really exist and are written from scratch for each class of topologies. It would be good to create a general Vertex and Edge class which can be used for any class of topologies. These won't have to be used, but should be available when desired.

add seed to polymer.Linear

when initializing a polymer.Linear a seed shoudl be provided. When the topology is initialized the orientation of the vertices should be decided and fixed. This means that every time make a molecule with the same topology you get the same constructed molecule - even if the orientations supplied to the initializer are random. They are determined randomly - but fixed after initialization.

mol.write folder not found

When using population.write(), the output folder is created when it does not exist. In contrast, molecule.write() requires the folder to exist, otherwise a FileNotFoundError is raised.

Re-implement gen_key()

gen_key() should be defined for every molecular class, such that it accepts the the same arguments as __init__(). The method should then just return a key based on the on the __init__ signature. This should mean that only one Cached metaclass needs to exist. Docs will need to be updated, in the section where adding new Molecule classes is discussed, that a gen_key() method will need to be defined for each new Molecule class which defines a new __init__() method.

improve docs for EA

The overview document should explain the steps of the EA and how the calculators defined in the input file are used by it.

subprocess.call() in xtb functions fails with Shell=False

Only occurs if unlimited_memory=False.

Using Shell=True in all cases introduces security risks due to the possibility of external input into the command.

A better alternative is to use something other than subprocess.call or to remove the possibility of external input into the command. Currently, the input of strings for xTB arguments is a security risk.

Clean up tests.

Lots of the test code is really hard to read. Also split conftest into several files.

Add greater functionality to progress writers.

Currently only mol.id and repr(mol) is written to the log file. The proposed change would allow the user to select what is written depending on the input file for the EA.

with open('progress.log', 'w') as logfile:
                logfile.write(
                    '\n'.join(self.log_file_content(self.progress))
                )

Part that selects what is written to log file:

 @staticmethod
    def log_file_content(progress):
        for sp in progress.subpopulations:
            for mol in sp:
                yield f'{mol.id} {mol}'
            yield '\n'

Documentation should provide a better summary of available Molecular classes and topologies.

Ideally in the initial page should have a summary of all ConstructedMolecule types.
In addition to this each ConstructedMolecule type should have an Example page and a list of topologies which can be used with the constructed molecule type.

Similarly each Topology type should refer back to what ConstructedMolecule it should be used with and have a picture showing an example of a constructed molecule.

Add mutation function which substitutes terminal atomic groups.

It would be nice to add a mutation function which allows the user to specify a query molecule, and a replacement molecule and replace the substructure matched by the query molecule the replacement molecule.

Something like the following code but implemented as a mutation function in stk.

    helicene = rdkit.MolFromMolFile('h_oh.mol', removeHs=False, sanitize=False)
    query = rdkit.MolFromSmarts('[C][H]')
    replacement = rdkit.MolFromSmiles('[N]')
    rdkit.EmbedMolecule(replacement, rdkit.ETKDGv2())
    new_mols = rdkit.ReplaceSubstructs(helicene, query, replacement)
    for i, new_mol in enumerate(new_mols):
        rdkit.SanitizeMol(new_mol)
        rdkit.UFFOptimizeMolecule(new_mol)
        rdkit.MolToMolFile(new_mol, f'new_mol_{i}.mol')

XTBExtractor test not working on Windows

Some character in the xtb extraction cannot be read.

____________________________________________________________________________________________ test_xtb_extractor ____________________________________________________________________________________________

    def test_xtb_extractor():
        known_output_file = join('../data', 'xtb_energy.output')
    
        # Get properties from output_file.
>       xtbext = stk.XTBExtractor(output_file=known_output_file)

..\test_utilities.py:10:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\..\stk\utilities\utilities.py:1281: in __init__
    self.output_lines = f.readlines()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <encodings.cp1252.IncrementalDecoder object at 0x000002D9C98866A0>
input = b'\n         #    Occupation            Energy/Eh            Energy/eV\r\n      --------------------------------------...C   40 0.952\r\n    45  H   1.000        C   40 0.960\r\n    46  H   0.997
      C   41 0.977\r\n    47  H   0.997   '
final = False

    def decode(self, input, final=False):
>       return codecs.charmap_decode(input,self.errors,decoding_table)[0]
E       UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5367: character maps to <undefined>

encodings\cp1252.py:23: UnicodeDecodeError
============================================================================ 1 failed, 116 passed, 11 skipped in 22.42 seconds =============================================================================```

bug using stk.ETKDG for long polymers

When optimizing a polymer with n=5, an overflow error is produced within ETKDG.optimize(). It seems to the conf_id variable having a negative sign when calling RemoveConformer, which takes an unsigned int. MWE.txt is a minimum working example that requires the two attached mol files (change ending from .txt to .mol).

MWE.txt
CNben.txt
SBI.txt

Add a use_cache parameter to molecule initializers and turn it off by default.

Much like calculators, it would be good if the molecular cache was turned on or off in the initializer of a given molecule

polymer = Polymer([bb1, bb2], Linear("AB", [0, 0], 3), use_cache=True)

For things which create molecules, such as a Mutator they could be set to use the cache or not for every molecule they create.

mutator = SomeMutator(use_cache=True)
# Mutant is added to and retrieved from the cache.
mutant = mutator.mutate(some_mol)

BuildingBlock identity key needs to account for changes in fgs.

Lets say you make a building block

bb = stk.BuildingBlock('NCCCN', ['amine'])

and then you remove one of its functional groups

bb.func_groups = bb.func_groups[0:1]

Its identity key will remain the same.

This is confusing, because the building block will no longer be identitical to

bb2 = stk.BuildingBlock('NCCCN', ['amine'])

There are couple of solutions to this, - calculate identity_key on the fly or provide a way to
change fgs with a method, and this method will be responsible for updating _identity_key
maybe get_func_groups() and set_func_groups() should be provided.

Create alignment_vector() method for StructUnit2 and StructUnit3

At the moment calling bonder_direction_vectors() on a molecule such as 'BrCBr' will often cause problems, because the bonder in both bromine functional groups is the same atom. As a result there is not direction vector, or rather the vector is (0, 0, 0). While this is the correct answer, a new method is needed which can be used even when bonder_direction_vectors() returns useless results. As a result alignmnet_vector() should be added.

alignment_vector() for StructUnit2 will first attempt to use bonder_direction_vectors() and if this returns (0, 0, 0) it will return the direction vector between the centroids of the functional groups as a backup.

For StructUnit3, if the 3 bonder atoms are all the same atom, the plane will not be defined properly. In this case use the centroid of the functional groups to generate the plane should be used, instead of the bonder centroids.

FormationEnergy example should be usable

Atm the formation energy example does not optimize the water molecule, which means the result it produces is going to be a uselsess, incorrect value.

The energy docs in general shoiuld state that a single point claculation is performed.

Add a mutation function which substitutes atoms.

Add a mutation function which allows the user to specify a query molecule, which should match a single atom, and a replacement atom. The mutation function should substitute the atom matched by the query molecule with the one specified as an input.

The function should also take an optional sanitization function as a parameter. This function can perform any additional cleanup operations on the atom and molecule to make sure the valences are in order, for example.

Add GFN-xTB energy calculator, output reader and other capabilities

An extension of the previously added capability which added an optimization function, that calculates single point energies using the GFN software.

Also require as part of the GFN optimization/energy class:

  • the ability to extract energies (total energy and/or free energy) from GFN output
  • run numerical hessian calculation as part of optimization (extension of optimization class) and - - single point energy (extension of energy class)
  • assign overall charge to each calculation
  • assign an implicit solvent model to each calculation

Make vertex_data and edge_data private

In cage topologies, there is no reason for vertex_data and edge_data to be public attributes. Update the introduction docs to account for this change.

Undesired reordering of StructUnits in Polymer

Bug, feature request, or proposal:

Potential Bug

Expected Behaviour:

Order of StructUnits is the same as input

Current behavior:

For some polymer sequences, StructUnits are reordered, leading to inconsistencies in polymer structures.

To reproduce:

Here is an example of a function which is able to reproduce the issue:

def build_polymer(smiles1, smiles2, sequence):
    
    a = rdkit.MolFromSmiles(smiles1)
    rdkit.AddHs(a)
    rdkit.AllChem.EmbedMolecule(a, rdkit.AllChem.ETKDG())
    A = stk.StructUnit2.rdkit_init(a, 'bromine')

    b = rdkit.MolFromSmiles(smiles2)
    rdkit.AddHs(b)
    rdkit.AllChem.EmbedMolecule(b, rdkit.AllChem.ETKDG())
    B = stk.StructUnit2.rdkit_init(b, 'bromine')

    polymer = stk.Polymer([A, B], stk.Linear(sequence, [0]*12, n=1))
    stk.rdkit_ETKDG(polymer)
    polymer.write('test.mol')
    
    return polymer

Calling the function like so:

build_polymer('Brc1ccc(Br)s1', 'Brc1ccc(Br)cc1', "ABABBBBBABBB")

returns:

Polymer(building_blocks=["StructUnit2 ['bromine', 'InChI=1S/C4H2Br2S/c5-3-1-2-4(6)7-3/h1-2H']", "StructUnit2 ['bromine', 'InChI=1S/C6H4Br2/c7-5-1-2-6(8)4-3-5/h1-4H']"], topology=Linear(ends='h', n=1, orientation=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], repeating_unit='ABABBBBBABBB'))

while calling the function with a different sequence like so:

build_polymer('Brc1ccc(Br)s1', 'Brc1ccc(Br)cc1', "AAABBBBBBBBB")

leads to an inversion of the StructUnits, returning:

Polymer(building_blocks=["StructUnit2 ['bromine', 'InChI=1S/C6H4Br2/c7-5-1-2-6(8)4-3-5/h1-4H']", "StructUnit2 ['bromine', 'InChI=1S/C4H2Br2S/c5-3-1-2-4(6)7-3/h1-2H']"], topology=Linear(ends='h', n=1, orientation=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], repeating_unit='AAABBBBBBBBB'))

This results in monomer 'A' and monomer 'B' being inverted when the polymer structure is written. I have not been able to figure out what causes the reordering, though here are some other sequences that result in reordering:

AABBBBABBBBB
AABBABBBBBBB
AABBBBABBBBB

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.