tcutility's People

Contributors

ksalimans, sebeutick, siebelede, torigijzen, yhordijk


tcutility's Issues

VibrationalAnalysis jobs have a different folder structure

When performing a VibrationalAnalysis job using ADF, no adf.rkf is created, only the ams.rkf. All info is contained in ams.rkf; however, TCutility will try to read from adf.rkf.

  • We should add a check for the type of task used and then, depending on the task, read from different files (see the sketch below).
  • Separate the reading of vibrational properties in reader.adf.get_properties into a new function that takes a reader object as argument, and simply supply the ams.rkf when the task equals VibrationalAnalysis.
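
A minimal sketch of the proposed dispatch (the helper name is hypothetical, not part of TCutility):

import os

def select_engine_rkf(calc_dir, task):
    # VibrationalAnalysis jobs only write ams.rkf; other ADF tasks also produce
    # adf.rkf, which normally holds the engine-level properties
    if task.lower() == 'vibrationalanalysis':
        return os.path.join(calc_dir, 'ams.rkf')
    return os.path.join(calc_dir, 'adf.rkf')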

Pathfunc module

In preparation for some of the code related to the new job-running module and subsequent analysis, I think it would be nice to include some functionality related to paths. For example, getting all subdirectories in a path, or getting new versions of paths (e.g. suffixed with .002). I already have some functionality in yutility.pathfunc that I will be porting over to TCutility.

Some useful functions:

get_size(path)

Returns the total size of all files in the tree rooted at path.

get_path_versions(path)

Returns all paths that correspond to path, including its versions.
E.g. calling get_path_versions('root/dir_or_file') with the following file-structure

root
|- dir_or_file
|- dir_or_file.002
|- dir_or_file.003
|- ...
|- dir_or_file.n

will return all paths starting with root/dir_or_file.

next_path_version(path)

Returns a new path with a version number attached to the end that is not yet taken.
E.g. calling next_path_version('root/dir_or_file') with the following file-structure

root
|- dir_or_file
|- dir_or_file.002
|- dir_or_file.003

returns root/dir_or_file.004.
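
A rough sketch of next_path_version, assuming the '.00N' suffix convention from the examples above (not the actual tcutility code):

import os

def next_path_version(path):
    # versions start at .002 by convention; keep counting until a free suffix is found
    version = 2
    while os.path.exists(f'{path}.{version:03d}'):
        version += 1
    return f'{path}.{version:03d}'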

split_all(path)

Returns a list of all path parts, unlike os.path.split, which only splits on the last slash/backslash.

get_subdirectories(root)

Returns a list of all subdirectories in the root.

match_paths(root, pattern)

This is in my opinion the most important function. It will return a list of subdirectories of root matching a certain pattern, and also return information related to the pattern.
E.g. given the following file-structure

root
|- NH3-BH3
    |- BLYP_QZ4P
       |- extra_dir
       |- blablabla
    |- BLYP_TZ2P
       |- another_dir
    |- M06-2X_TZ2P
|- SN2
    |- BLYP_TZ2P
    |- M06-2X_TZ2P
       |- M06-2X_TZ2P

and calling match_paths('root', '{system}/{functional}_{basis_set}') will return a tuple containing a list of subdirectories that match the pattern and a dictionary mapping each pattern group to its captured values. We can print the results in a pretty way:

from tcutility import log

# gather the matching directories and the values captured for each pattern group
dirs, groups = match_paths('pathfunc_test', '{system}/{functional}_{basis_set}')
group_keys = groups.keys()

# one table row per directory: the path followed by its captured group values
rows = []
for i, d in enumerate(dirs):
    rows.append([d] + [groups[key][i] for key in group_keys])
log.table(rows, ['Directory'] + list(group_keys))

will print

[2024/01/17 14:39:08] Directory             system    functional   basis_set
[2024/01/17 14:39:08] ───────────────────────────────────────────────────────
[2024/01/17 14:39:08] SN2/M06-2X_TZ2P       SN2       M06-2X       TZ2P     
[2024/01/17 14:39:08] NH3-BH3/BLYP_TZ2P     NH3-BH3   BLYP         TZ2P     
[2024/01/17 14:39:08] NH3-BH3/M06-2X_TZ2P   NH3-BH3   M06-2X       TZ2P     
[2024/01/17 14:39:08] SN2/BLYP_TZ2P         SN2       BLYP         TZ2P     
[2024/01/17 14:39:08] NH3-BH3/BLYP_QZ4P     NH3-BH3   BLYP         QZ4P     

It therefore ignores extra subdirectories that do not match the pattern, and it decomposes each matching subdirectory path into its constituent parts.
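
For reference, a rough sketch of how such a function could be implemented (illustrative only, not the actual tcutility.pathfunc code; a real version would also need to escape regex characters in the pattern):

import os
import re

def match_paths(root, pattern):
    # turn '{system}/{functional}_{basis_set}' into a regex with named groups
    regex = re.sub(r'{(\w+)}', r'(?P<\1>[^/]+)', pattern)
    dirs, groups = [], {}
    for current, _subdirs, _files in os.walk(root):
        rel = os.path.relpath(current, root).replace(os.sep, '/')
        match = re.fullmatch(regex, rel)
        if match:
            dirs.append(rel)
            for key, value in match.groupdict().items():
                groups.setdefault(key, []).append(value)
    return dirs, groups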

Restructure package to /src structure and extend README

Now we have a "flat" structure in which the package itself ("TCUtility") is in the main folder, together with the "docs" and "tests" folders.

A better practice is to use a "src/" layout, since you can add multiple subpackages this way and keep the namespace clean (you force yourself to build the package before importing it).

Also, the readme should be extended with:

  • a code of conduct
  • how to contribute (CONTRIBUTING.md)
  • purpose
  • example script
  • FAQ and more.

Examples of a good README and a list of good READMEs

Integrate a molecular viewing program

We can integrate yviewer (currently working on yviewer2) into TCutility. This can be useful for generating pictures of molecules, orbitals, and changes in geometry.

Convert list properties to numpy arrays

Following a discussion in PR #86, we should consider converting all list data to numpy arrays. This would save us time later on, since we would not have to convert them when needed.
Especially for properties such as normal modes this would be useful.

Error when reading calculations on single atoms

An error will be encountered when reading in results from calculations on single atoms. This is because some properties that are usually lists (such as VDD charges) are read as floats or integers instead.
Fix:
Add ensure_list to TCutility. This function ensures that a list stays a list and that non-iterables are wrapped in one-element lists. It should then be called on the properties in question so they are always lists, removing the error.
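
A minimal sketch of the helper, assuming the behaviour described above:

def ensure_list(obj):
    # lists and tuples are passed through; anything else is wrapped in a one-element list
    if isinstance(obj, (list, tuple)):
        return list(obj)
    return [obj]

ensure_list(1.25)      # -> [1.25]
ensure_list([1, 2, 3]) # -> [1, 2, 3]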

How often to make a new release? How to handle versioning?

I am wondering how often we should make a new release. As I see it now, every merge is a potential fully functional release. I propose we create a new release with every merge request.
Regarding version numbering: we should probably follow the x.y.z convention, where x is the major version (currently 0), y the minor version, and z the micro version. I propose:

  • major version should be incremented when a module is feature-complete,
  • increment the minor version number after every successful merge,
  • bug fixes should increment the micro version and should also be published as a release.

Please let me know your thoughts!

Read Vibration data from DFTB calcs

This one is easy: vibration results in DFTB calculations are structured exactly the same as in a regular DFT calculation. We will simply copy the function over to DFTB.

Reading of thermodynamic data

Currently, thermodynamic data is not being read. I propose we add reading of the Gibbs free energy and enthalpy to result.properties.energy.gibbs and result.properties.energy.enthalpy, respectively.

Guess fragments based on xyz files

Currently it takes quite a lot of effort to get the correct fragment atoms from an xyz file. I propose we accept two types of notation:

As molecule flags

8

N       0.00000000       0.00000000      -0.81474153 
B      -0.00000000      -0.00000000       0.83567034 
H       0.47608351      -0.82460084      -1.14410295 
H       0.47608351       0.82460084      -1.14410295 
H      -0.95216703       0.00000000      -1.14410295 
H      -0.58149793       1.00718395       1.13712667 
H      -0.58149793      -1.00718395       1.13712667 
H       1.16299585      -0.00000000       1.13712667 

frag_Donor = 1, 3:5
frag_Acceptor = 2, 6:8

Fragments are then defined by their atom indices (starting at 1). Indices can be given as integers or as ranges (e.g. 3:5). The flag that defines the indices should start with frag_ so that we can easily get the name of the fragment.

As atom flags

8

N       0.00000000       0.00000000      -0.81474153 frag=Donor
B      -0.00000000      -0.00000000       0.83567034 frag=Acceptor
H       0.47608351      -0.82460084      -1.14410295 frag=Donor
H       0.47608351       0.82460084      -1.14410295 frag=Donor
H      -0.95216703       0.00000000      -1.14410295 frag=Donor
H      -0.58149793       1.00718395       1.13712667 frag=Acceptor
H      -0.58149793      -1.00718395       1.13712667 frag=Acceptor
H       1.16299585      -0.00000000       1.13712667 frag=Acceptor

The second method is to give the atoms themselves flags. In this case I propose we let the user specify the fragment with the frag flag. In this way you do not have to mess with the indices.

With this notation in place, we should add the tcutility.molecule.guess_fragments function. This should take a molecule loaded with tcutility.molecule.load and return a dictionary with fragment names as keys and molecule objects as values.
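
Hypothetical usage, assuming the function is added as proposed (the file name and printed values are made up for illustration):

from tcutility import molecule

mol = molecule.load('NH3BH3.xyz')          # xyz file using one of the notations above
fragments = molecule.guess_fragments(mol)  # e.g. {'Donor': <Molecule>, 'Acceptor': <Molecule>}
for name, frag in fragments.items():
    print(name, frag)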

Clarify behaviour and error messages for results.Result objects

Currently the results.Result object is not behaving ideally.

  1. Accessing an empty key should return None instead of an empty Result object. Alternatively, we should raise an error.
  2. Accessing an empty key should not create it; only when we assign to a key should the higher-level empty keys be created.
  3. Error messages should originate from raising a KeyError exception.

Support reading of lists in custom xyz-files

Currently key=value pairs only allow single values. We can extend the syntax to support lists as values as well. I propose we add support for the following:

12

C       0.00000000       1.14206764       0.90887116
C       0.00000000       1.19134753      -0.47779047
C       0.00000000       0.00000000      -1.18863009
C       0.00000000      -1.19134753      -0.47779047
C       0.00000000      -1.14206764       0.90887116
N       0.00000000       0.00000000       1.58935329
H       0.00000000       2.05493274       1.49740370
H       0.00000000       2.14479192      -0.99199860
H       0.00000000       0.00000000      -2.27369478
H       0.00000000      -2.14479192      -0.99199860
H       0.00000000      -2.05493274       1.49740370
Xx       0.12739120      -0.00077609       2.50049043

conn = 6, 12
active_atoms = 8

Currently conn will be read as '6, 12'. I think splitting on , will be the easiest way to create lists. We can simply split the value on , and then parse each part in a small list comprehension. This should then yield conn: list[int] = [6, 12].
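
A small sketch of that parsing step (the helper names are hypothetical):

def _parse_scalar(part):
    # try int, then float, otherwise keep the string
    for cast in (int, float):
        try:
            return cast(part)
        except ValueError:
            pass
    return part

def parse_flag_value(value):
    # 'conn = 6, 12' -> [6, 12]; 'active_atoms = 8' stays a single int
    if ',' in value:
        return [_parse_scalar(part.strip()) for part in value.split(',')]
    return _parse_scalar(value)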

Issue when reading molecules that have only one atom

When getting molecules from rkf files that contain only one atom, errors might crop up in various places. This is caused by plams returning a single floating point value instead of a list of floats for some variables in the kf files when only one atom is present.
We can use ensure_list to remedy this.

Improve support for jobs with History

Currently jobs such as IRC and PES-scans are not fully read. I propose to add some variables to help analyse them.

  • Read indices of accepted geometries
  • Add variable for energy landscape of accepted geometries
  • Add coordinates of accepted geometries
  • Read and store reaction coordinate along accepted geometries

Compatible python versions

I think we should make the package compatible with Python 3.7 and newer to ensure that it can be used by users on various versions. Moreover, other packages, such as plams, are not always up to date with the latest Python version. Thus, if we want to offer the option to use this package in combination with plams, I would sincerely recommend writing the code in such a way that it supports Python 3.7 and newer.

I can include a GitHub workflow routine that tries to build this package with various versions. In the meantime, please use the old-style type hints:

from typing import Dict, List, Tuple

# use the typing aliases instead of the built-in generics,
# which are only subscriptable from Python 3.9 onwards:
# list[int]        -> List[int]
# dict[str, int]   -> Dict[str, int]
# tuple[int, ...]  -> Tuple[int, ...]

Add molecular formula parsing

In preparation for doing #65 we should add a parser for molecular formulas. The function should be able to take some string and return an edited string that is valid in either HTML or LaTeX.

from TCutility import formula

print(formula.molecule('F- + CH3Cl', 'html'))
>>> F<sup>-</sup> + CH<sub>3</sub>Cl

Which would render "F- + CH3Cl" as F⁻ + CH₃Cl in HTML. It should convert charge signs + and - to superscript and also convert numbers to subscript. If a plus sign is used to denote a reaction, it should not be put in a superscript.
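
A rough sketch of the HTML mode (illustrative only; the real tcutility.formula implementation and API may differ):

import re

def molecule_to_html(formula):
    # digits directly after an element symbol or closing bracket become subscripts
    formula = re.sub(r'(?<=[A-Za-z)\]])(\d+)', r'<sub>\1</sub>', formula)
    # charge signs attached to a species become superscripts; the ' + ' separating
    # reactants is preceded by a space and is therefore left alone
    formula = re.sub(r'(?<=\S)([+-]+)(?=\s|$)', r'<sup>\1</sup>', formula)
    return formula

print(molecule_to_html('F- + CH3Cl'))  # F<sup>-</sup> + CH<sub>3</sub>Cl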

Make print module

Create a new module which will be responsible for generating text-based summaries of series of calculations. For example, building molecular coordinates + energies + vibrational frequencies for supporting information documents.

  • #51
  • Use #51 to write some useful outputs

Improvements to the Result class

While working with the Result class I noticed some areas for improvement.

First is to be able to convert a Result object into a plams.Settings object. For this I have written a simple function that takes care of the conversion, but it requires my other package dictfunc to work.

Second is to add a function that gets data associated with a "multikey". For example, with this we can do the following:

from TCutility import results

columns = [
    'properties.energy.bond',
    'level.xc.functional',
    'level.xc.category',
    'level.xc.dispersion',
    'level.basis.type',
    'level.summary',
    'status.name',
]

res = results.read('path_to_calculation')

data = {column: res.get_multi_key(column) for column in columns}
print(data)

>>> {'level.basis.type': 'QZ4P',
>>>  'level.summary': 'OLYP/QZ4P',
>>>  'level.xc.category': 'GGA',
>>>  'level.xc.dispersion': None,
>>>  'level.xc.functional': 'OLYP',
>>>  'properties.energy.bond': 105.44853900200906,
>>>  'status.name': 'SUCCESS'}
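
One possible sketch of get_multi_key (not the actual implementation):

def get_multi_key(result, multikey, default=None):
    # walk the nested Result object one key at a time, e.g. 'level.xc.functional'
    current = result
    for key in multikey.split('.'):
        try:
            current = current[key]
        except (KeyError, TypeError):
            return default
    return current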

Add logging module

I think some of the functions in https://github.com/YHordijk/yutility/blob/master/log.py will be very helpful in debugging/reporting data from TCutility. These functions are very old and from when I did not have as much Python experience, so I will rewrite them before adding them to TCutility.
I propose we add:

  • log function
  • Emojis class
  • NoPrint context-manager
  • arrow function, which will be renamed to flow
  • print_list function, which will be renamed to table
  • loading_bar2 function, which will be renamed to loadbar
  • boxed_text function, which will be renamed to boxed

The rest I will not port over, as I never use them. We can add those later if needed.

Return generic info when failing to read calculation

The tcutility.results.read function first attempts to read the calculation using the AMS reader and, if that fails, tries the ORCA reader. Currently it simply raises an error if the ORCA reader also fails. I suggest we return some generic information about the requested calculation when both readers fail.
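
A rough sketch of that fallback (the reader callables and the keys set on the Result are placeholders, not the actual tcutility internals):

def read(calc_dir):
    for reader in (read_ams, read_orca):  # hypothetical reader callables
        try:
            return reader(calc_dir)
        except Exception:
            continue
    # both readers failed: return generic information instead of raising
    info = Result()
    info.status.name = 'UNKNOWN'
    info.files.root = calc_dir
    return info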

Simplify the Result object and add case-insensitivity

Currently the Result object is defined in a convoluted way with a lot of duplicated code. We should simplify it by adding a few helper functions. We should also take this opportunity to implement case-insensitivity, similar to the plams.Settings class.

Copying of molecule objects loaded from custom xyz files fails

When loading a molecule using TCutility.molecule.load, we capture information about the molecule and its atoms in their respective flags attributes. Because flags is a TCutility.results.result.Result object, there will be issues whenever you call the copy method of the molecule. This method is used very often (for example in workflows), so we should fix this. One way of fixing it is to overload the __reduce__ method of the TCutility.results.result.Result class. This method should return a single string specifying the name and namespace of the class. This tells copy.deepcopy (which plams uses) what the class is supposed to be.
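
A minimal sketch of that fix (the class body is a stand-in; only the __reduce__ part mirrors the proposal). When __reduce__ returns a string, copy.deepcopy treats the object as atomic and returns it unchanged, which sidesteps the copying failure:

class Result(dict):
    # stand-in for tcutility.results.result.Result

    def __reduce__(self):
        # a string return value makes copy.deepcopy leave the object as-is
        return 'tcutility.results.result.Result'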

Element data module

It would be nice to have a module that stores and can easily retrieve data about elements, e.g. atomic radii, CPK atom colors, ionisation potentials, etc.
We can copy the version in yutility, or create a new one. One intuitive way would be to create a CSV/Excel file that stores all the data, is easily editable, and can be read using Python. Another way would be to have all the data simply in a Python script.

Add a test-case for single-atom calculations

We encountered an error earlier when reading results from single-atom calculations. This was related to some variables being lists when multiple atoms are present and floats/ints when only a single atom is present. To prevent future errors we have to add a test case for these systems.

Read input and convert to Result object

We should implement a small method to read AMS user input and return a Result object. This can then be used for reading information that is hard to get from the rkf files. For example, level-of-theory information, TSSearch coordinates, etc...

Convert to pathlib

As Siebe has mentioned before, pathlib is a much nicer alternative for path manipulation than the os module. I propose we go through the repository and convert all instances of os path manipulation to pathlib. We can also consider creating a global path module that has some predefined paths for easy access.
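
For illustration, the kind of conversion this would entail (the paths are made up):

import os
from pathlib import Path

# os.path based
calc_dir = os.path.join('calculations', 'NH3-BH3', 'BLYP_TZ2P')
has_rkf = os.path.exists(os.path.join(calc_dir, 'adf.rkf'))

# pathlib based equivalent
calc_dir = Path('calculations') / 'NH3-BH3' / 'BLYP_TZ2P'
has_rkf = (calc_dir / 'adf.rkf').exists()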

Slurm module

We are currently accessing information from slurm in the results/__init__.py file. It would be better to include this as a new module instead.
I propose we add a module tcutility.slurm which will contain tools to use slurm, such as detecting if slurm is available, running squeue, and maybe more later on. This module can then be used to detect if a job is pending, running, cancelled, etc. It can also be used to easily set dependencies between jobs using the slurm job IDs.
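
A hedged sketch of what such a module could contain (the function names are proposals, not an existing API):

from typing import Optional
import shutil
import subprocess

def has_slurm():
    # slurm is considered available when squeue is on the PATH
    return shutil.which('squeue') is not None

def squeue(user: Optional[str] = None):
    # thin wrapper around squeue; parsing job states would come later
    cmd = ['squeue'] + (['-u', user] if user else [])
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout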

Improvements to the log module

In using the log module I came up with a few improvements that we can make.

  1. Add a function to get the full name of a caller function.
  2. Use the caller function name to give more information when logging a message. For example, log.warn should also tell you where the warning occurred.
  3. Add more logging functions for each logging level. E.g. have a debug, info, warn, error, and critical function.

Example:

from TCutility import log

class TestClass:
    def test_method(self):
        log.warn('I am testing the warning function')

TestClass().test_method()

Will print

[2024/01/04 19:57:33] [WARNING](__main__.TestClass.test_method): I am testing the warning function

If a function or class is imported from a different module, e.g.

from yutility.job import ADFJob, OrcaJob

if __name__ == '__main__':
    with ADFJob() as job:
        job.molecule = r"D:\Users\Yuman\Desktop\PhD\TCutility\test\fixtures\chloromethane_sn2_ts\ts sn2.results\output.xyz"
        job.sbatch(p='tc', ntasks_per_node=15)

        job.functional('BM123K')
        job.charge(10)
        job.spin_polarization(1)
        job.transition_state()
        job.optimization()
        job.solvent('Ethanol')

    with OrcaJob() as job:
        job.sbatch(p='tc', mem=224_000)
        job.molecule = r"D:\Users\Yuman\Desktop\PhD\TCutility\test\fixtures\chloromethane_sn2_ts\ts sn2.results\output.xyz"
        job.settings.main.append('SP')
        job.settings.main.append('CCSD(T)')
        job.settings.main.append('cc-pVDZ')
        job.settings.SCF.MaxIter = 500

        job.write()

will print:

[2024/01/04 20:10:42] [INFO](yutility.job.ADFJob.molecule): Succesfully loaded molecule C₁Cl₂H₃ from path.
[2024/01/04 20:10:42] [WARNING](yutility.job.ADFJob.functional): XC-functional BM123K not defined. Defaulting to using LibXC.
[2024/01/04 20:10:42] [INFO](yutility.job.OrcaJob.molecule): Succesfully loaded molecule C₁Cl₂H₃ from path.
[2024/01/04 20:10:42] [WARNING](yutility.job.OrcaJob.write): MaxCore and nprocs not specified. Please use SBATCH settings or set job.processes and job.memory.
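
A sketch of how the caller's full name could be obtained (an assumption, not the actual tcutility.log implementation):

import inspect

def caller_name(depth=2):
    # depth 0 is this function, 1 is the log function calling it, 2 is the user code
    frame_info = inspect.stack()[depth]
    module = frame_info.frame.f_globals.get('__name__', '?')
    # include the class name when the caller is a method with a 'self' argument
    self_obj = frame_info.frame.f_locals.get('self')
    if self_obj is not None:
        return f'{module}.{type(self_obj).__name__}.{frame_info.function}'
    return f'{module}.{frame_info.function}'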

Read ORCA data

I am running some ORCA calculations. It might be nice to extract some info from them.

  • #69
  • Read the output file/extra files

Module for geometry tools

We should have a module for generating/preparing molecules. For example, a simple stretching/turning/twisting of a molecule.

Add support for reading .t21 files for ADF calculations

Older versions of AMS (< 2019) are currently not supported. These versions use .t21 files instead of .rkf and therefore require special treatment. Hopefully people will start using newer versions of AMS and we can drop this in the future.
Tasks:

  • Enable grabbing .t21 files from calc folders
  • Add function to read info from .t21 files in ams.py
  • Add functions to read properties and results from these files in a new module (adf_t21.py)

Add energy functions

Please include nuclear internal energy in energy functions, e.g.:

energy.nuclear_internal

:)

Ensure support for Python 3.12

Python 3.12 was released very recently, so it is probably a good idea to include it in the pytest workflow. Siebe and I discussed this a little bit last week already. One topic that came up is dropping support for the oldest version of Python (Python 3.7).

Dropping Python 3.7 would mean the oldest version is 3.8, of which the most relevant features are:

  • Assignment expressions could make our code more elegant, but the use-cases are quite limited in my opinion and not very necessary in normal Python use.
  • Positional-only arguments can catch some unexpected results when keyword arguments are provided as positional arguments. This is probably not very necessary for us, as long as we implement good tests and are careful with our function definitions and calls.

In my opinion, we don't really need the features from Python 3.8. I think it is fine to keep supporting 3.7 for now.

Add geometry module

I often find myself using functions from yutility.geometry and they will be very useful for TCutility projects, such as PyOrb or TCviewer. I will copy most functions from yutility to here, skipping some unused ones.

Create overview of results object contents

We should add a list containing all keys that can be populated when reading a calculation, for example on the "reading a calculation" page of the TCutility documentation.

Include spell-checking module

It might be nice to implement a basic spell-checking module. For example, if a user makes a typo it would be nice to give them suggestions for correct keys. In yutility I have implemented two algorithms to calculate the so-called Levenshtein distance between two strings (link). This distance can then be used to make suggestions to users.

For example, it would look something like this:

from tcutility.data import functionals

# try to get information about a functional, but we made a mistake in the functional-name
# the two closest functional names would be OLYP-D3(BJ) and BLYP-D3(BJ)
info = functionals.get('LYP-D3(BJ)')

>>> [2024/01/29 17:18:26] [WARNING](functionals.get): Could not find "LYP-D3(BJ)". Did you mean BLYP-D3(BJ) or OLYP-D3(BJ)?

This principle can be applied in many different places. For example, when accessing results from a calculation, choosing settings for jobs, or entering a wrong file name.
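
For reference, the textbook dynamic-programming version of the Levenshtein distance (the yutility implementation may differ):

def levenshtein(a, b):
    # previous row of the edit-distance matrix
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein('LYP-D3(BJ)', 'BLYP-D3(BJ)'))  # 1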

Read and summarize the level-of-theory for ADF calculations

We are currently missing an easy way to get the level-of-theory from calculations. We can either search the rkf files for the information, or read the user input. In the latter case we should create a method for reading inputs and converting them to Result objects.

Add automatic writing of coordinates in SI files

It would be nice to have a function/module for writing results to a Word file, for example for supporting information files.
I propose to add a new module, TCutility.report, that will be responsible for generating these files. I think the following syntax would be nice:

from TCutility import report, results

with report.SI('path/to/si.docx', 'w+') as si:
    si.add_section('Description of following systems')
    res = results.read('path/to/calc')
    si.write_xyz(res, 'name of mol')
    res2 = results.read('path/to/second/calc')
    si.write_xyz(res2, 'name of second mol')

It might also be nice to load a calculation if the path is given instead of a Result object:

from TCutility import report, results

with report.SI('path/to/si.docx', 'w+') as si:
    si.add_section('Description of following systems')
    si.write_xyz('path/to/calc', 'name of mol')
    si.write_xyz('path/to/second/calc', 'TS: F- + CH3F')

This should yield something like:

File: path/to/si.docx

Description of following systems

name of mol
E = -639.8 kcal mol⁻¹
G = -632.6 kcal mol⁻¹
H = -612.0 kcal mol⁻¹

C      0.00000000     0.00000000     0.37180722
F     -0.00000000    -0.00000000     1.82698320
H     -0.51522847    -0.89240188     0.02344328
H     -0.51522847     0.89240188     0.02344328
H      1.03045693     0.00000000     0.02344328
F     -0.00000000    -0.00000000    -2.26912024

TS: F⁻ + CH3F
E = -631.7 kcal mol⁻¹
G = -623.2 kcal mol⁻¹
H = -605.0 kcal mol⁻¹
νimag = 403i cm⁻¹

C      0.00000000     0.00000000     0.00000000
F     -0.00000000    -0.00000000     1.86154599
H      0.53782492     0.93154009     0.00000000
H      0.53782492    -0.93154009     0.00000000
H     -1.07564984    -0.00000000     0.00000000
F     -0.00000000    -0.00000000    -1.86154599

Support for custom xyz files

In my projects I use custom xyz files that can provide additional information about the system, for example flags for the whole molecule or for individual atoms:

27

C	 2.2470037053	 0.3166885271	 1.5935128849	R2 R3 active_atoms_0 origin=dienophile origin_name=acrolein_E 
C	 1.0262978235	-0.0048642245	 1.1196013247	R4 active_atoms_1 origin=dienophile origin_name=acrolein_E 
C	 0.0368765282	 0.9384192964	 0.6362957151	R1 Rch origin=dienophile origin_name=acrolein_E 
C	-0.7623413264	 3.2111668548	-0.0902164448	origin=dienophile origin_name=acrolein_E origin_R=R1 
C	 0.3597768316	 2.4074931465	 0.5586673637	conn origin=dienophile origin_name=acrolein_E origin_R=R1 
H	-1.6848173064	 3.1024219200	 0.4805689166	origin=dienophile origin_name=acrolein_E origin_R=R1 
H	-0.9382451479	 2.8578169150	-1.1068791522	origin=dienophile origin_name=acrolein_E origin_R=R1 
H	-0.4894506861	 4.2647240836	-0.1243377652	origin=dienophile origin_name=acrolein_E origin_R=R1 
H	 0.5391294895	 2.7622247363	 1.5788916693	origin=dienophile origin_name=acrolein_E origin_R=R1 
H	 1.2933676848	 2.5328333047	 0.0033117263	origin=dienophile origin_name=acrolein_E origin_R=R1 
H	 2.8722517352	-0.4225784304	 2.0717944157	conn origin=dienophile origin_name=acrolein_E origin_R=R2 
H	 2.6086645727	 1.3345015677	 1.6269263942	conn origin=dienophile origin_name=acrolein_E origin_R=R3 
H	 0.6643199677	-1.0273046508	 1.1960953661	conn origin=dienophile origin_name=acrolein_E origin_R=R4 
O	-1.0945945094	 0.5491656261	 0.3150545087	conn plane_2 origin=dienophile origin_name=acrolein_E origin_R=Rch 
Al	-2.0860602380	-0.9706326867	-0.1242814397	anchor conn origin=catalyst origin_name=catalyst origin_R=Rcat 
F	-1.0542417492	-1.7667404701	-1.1584304150	origin=catalyst origin_name=catalyst origin_R=Rcat 
F	-3.4300869155	-0.3609345462	-0.8479122759	origin=catalyst origin_name=catalyst origin_R=Rcat 
F	-2.3214187292	-1.7587748795	 1.3077296544	origin=catalyst origin_name=catalyst origin_R=Rcat 
C	 2.1031242723	-1.4947653286	-1.0264668386	R2 align_0 center origin=diene origin_name=heterocycle5 
C	 3.3902934384	-1.2208486776	-0.5254191252	R3 align_1 origin=diene origin_name=heterocycle5 
C	 3.5249366590	 0.1461089051	-0.5483998020	R4 Rhet active_atoms_0 plane_1 origin=diene origin_name=heterocycle5 
C	 1.5408615686	-0.2765951210	-1.3117329740	R1 active_atoms_1 plane_0 origin=diene origin_name=heterocycle5 
H	 0.6132243248	-0.0109757861	-1.7943004647	conn origin=diene origin_name=heterocycle5 origin_R=R1 
H	 1.6305320605	-2.4576864884	-1.1288220998	conn origin=diene origin_name=heterocycle5 origin_R=R2 
H	 4.1147067240	-1.9313725440	-0.1641562583	conn origin=diene origin_name=heterocycle5 origin_R=R3 
H	 4.3557430238	 0.7972873061	-0.3275724226	conn origin=diene origin_name=heterocycle5 origin_R=R4 
O	 2.4361541766	 0.7097003953	-1.1118541527	conn plane_2 origin=diene origin_name=heterocycle5 origin_R=Rhet 

spinpol = 0.0
charge = 0.0
task = TransitionStateSearch
level = BLYP-D3(BJ)/TZ2P/Good
solvent = Toluene

This format can still be used by AMS as normal. It can be useful to have this for Python workflows, for example for ML purposes.
I implemented reading and writing of this format a long time ago (https://github.com/YHordijk/yutility/blob/master/molecule.py), so adding this would be a simple copy-and-paste. I propose we add a new molecules.py module and add it there.

Example usage

from yutility import settings, run, molecule

def run_full(xyzfile):
    '''
    Run a full frequency calculation for the given molecule using ADF
    '''
    mol = molecule.load(xyzfile)  # load the molecule
    preset = mol['flags'].get('level', 'OLYP/TZ2P/Good')  # read the level of theory, default is OLYP with TZ2P. In the xyz file it is given using the "level" flag
    sett = settings.default(preset)
    sett = settings.charge(mol['flags']['charge'], sett)  # read the charge from the xyz file
    sett = settings.spin_polarization(mol['flags']['spinpol'], sett)  # read the spin polarization from the xyz file
    sett = settings.vibrations(sett)
    # also set the solvation if it is given in the xyz file. By default it uses COSMO
    if 'solvent' in mol['flags']:
        sett = settings.solvent(mol['flags']['solvent'], sett=sett)
    # run the calculation
    run.run(mol['molecule'], sett, path=j('calculations', xyzfile.removesuffix('.xyz')), folder='dft_freq', skip_already_run=True)

run_full('test.xyz')

test.xyz:

31
minimum
C -0.99133222 0.76721814 2.67722570
C 0.25265637 -1.05111280 1.89059000
C 1.24566773 -0.08425630 2.47294093
C 0.36469727 0.78933634 3.37987552
H -1.05660402 1.54014543 1.89822281
H -1.85054670 0.86372170 3.34728631
H 2.04519542 -0.59568623 3.02169572
H 1.73650707 0.52182235 1.68976652
H 0.27466116 0.33107479 4.37395851
H 0.73904848 1.81184872 3.50639817
C 0.55357407 -2.29794146 1.26708176
O -0.24428078 -3.12203979 0.80387045
O 1.92460503 -2.50592485 1.22904665
C 2.31507675 -3.74885552 0.62202317
H 1.97325180 -3.80199748 -0.41966059
H 3.40825277 -3.76524396 0.66672566
H 1.89634651 -4.60276016 1.17035001
C -2.17443904 -1.16217766 1.62994925
C -3.52731155 -0.66756710 1.69206938
C -3.95763504 0.67991049 1.80004244
C -4.54267172 -1.65438532 1.57052742
C -5.31233920 1.00258368 1.82453374
H -3.23778364 1.49292357 1.82422093
C -5.89056031 -1.32413631 1.59808156
H -4.24456491 -2.69774640 1.46261213
C -6.29043963 0.00981867 1.73464940
H -5.60704364 2.04966697 1.90192718
H -6.63801684 -2.11311541 1.51121666
H -7.34764525 0.27217980 1.75545957
H -2.01265334 -2.16445577 1.23660330
N -1.03624778 -0.56849116 2.00953570

level = BP86/TZ2P/Good
solvent = Toluene

Missing import for geometry module

We forgot to import the geometry module in __init__.py. This prevents us from calling

import TCutility

TCutility.geometry
>>> AttributeError: module 'TCutility' has no attribute 'geometry'
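
The fix is presumably a one-line import in the package's __init__.py (exact file layout assumed):

# tcutility/__init__.py
from . import geometry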

Structuring the package

I would like to address the issue of structuring the package. As I currently see it, the rkf folder will contain

Factors to take into account for considering the structure

  • The scope of the package: it should be clear what the package should and should not include. Otherwise, we risk replicating the plams package without the running module.
  • Get information from the logfile / outputfile when an rkf file may not be present (due to the large size of adf.rkf files it is sometimes wiser to access the logfile / outputfile). The user should not have to care about this in my opinion and should just use one function. This requires overloading functions (overloading means accepting multiple types of arguments and giving the same result in return).

Also, I would recommend changing the name to TCAnalysis if we don't want to include any (plams) jobrunners and jobs.

I would structure the folder as follows (folder = a directory, file = a file, func = a function):

analysis (folder)
|- calc_info (folder)
|   |- timings (file)
|   |   |- get_timing (func)
|   |   |- another_func (func)
|   |- termination_status (file)
|       |- get_termination_status (func)
|- properties (folder)
    |- energy (file)
    |   |- get_energy (overloaded func)
    |       |- _get_energy_from_rkf
    |       |- _get_energy_from_log
    |       |- ...
    |- enthalpy (file)
        |- get_enthalpy (overloaded func)
            |- _get_enthalpy_from_rkf
            |- _get_enthalpy_from_log
            |- ...
and more properties such as frequencies, nmr shifts, (VDD) charges, etc.

Using this structure, we can really make it modular and we can add specific utilities, for example for plotting, or for other uses.

Simple job running API

Adding a simple job-running API would be very helpful for automation tasks. There are of course many possible ways of implementing this. API calls might look like:

from TCutility import runner, molecule

mol = molecule.load(...)
res = runner.optimize(mol, 'BLYP-D3(BJ)/TZ2P', charge=..., spinpol=..., solvent='water')  # returns a Result object
print(res.summary)

Or, a little more verbose, but more flexible:

from TCutility import runner, molecule

mol = molecule.load(...)
job = runner.Job()
job.functional = 'BLYP-D3(BJ)'
job.basisset = 'TZ2P'
job.quality = 'VeryGood'
job.charge = ...
job.spinpol = ...
job.solvent = 'water'
job.molecule = mol
job.task = 'GeometryOptimization'

job.settings.input.adf.print = 'FmatSFO'  # specify your own settings using the settings property of job

res = job.run()
print(res.summary)

  • Choose an approach
  • Think about what would be good default settings
    • We should always include printing of Fmat SFO for later analysis with PyOrb
  • Support engines other than ADF (DFTB, BAND, CREST, ORCA, Gaussian, etc.)
