
aiida-raspa's People

Contributors

borellim, danieleongari, ezpzbz, kjappelbaum, ltalirz, mpougin, yakutovicha


aiida-raspa's Issues

Reduce the size of retrieved files

From a recent analysis, my 31 GB AiiDA repository contains approximately:

  • 7 GB (23%) occupied by retrieved RASPA restart files
  • 6 GB (20%) occupied by RASPA output .data files

The other RASPA-related files (cif, simulate.input, _aiidasubmit.sh, and force-field .def files) are under 1 GB, so they are not a concern for now.

Given the large number of RaspaCalculations launched by our workchains, I suggest taking action to reduce the size of the repository.

Restart files

I suggest not retrieving them by default, but using them locally as we do e.g. for cube files in aiida-ddec.
However, I would leave an option for the user to retrieve them: this is needed for saturated systems that require a long equilibration, or for the simulated-annealing workchains of aiida-lsmo, where a cif file is computed from the restart (https://github.com/lsmo-epfl/aiida-lsmo/blob/6c32eebdaefd11dc226d849415e363c6222a475a/aiida_lsmo/workchains/sim_annealing.py#L88-L106).
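A sketch of what the opt-in could look like (the flag name and the `Restart/System_0/*` glob are assumptions to be checked against RASPA's folder layout; `additional_retrieve_list` is AiiDA's standard mechanism for retrieving extra files from the work directory):

```python
# Hypothetical opt-in sketch: by default, restart files stay on the remote
# machine; the user requests retrieval via the calculation's `settings` input.
def build_settings(retrieve_restart=False):
    settings = {}
    if retrieve_restart:
        # AiiDA's generic mechanism to fetch extra files from the work directory
        settings['additional_retrieve_list'] = ['Restart/System_0/*']
    return settings

print(build_settings(retrieve_restart=True))
# {'additional_retrieve_list': ['Restart/System_0/*']}
```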

Output files

Here we need to think more carefully, because the file contains different types of data:

  • check values that are computed directly from the input (or from default parameters), which we may want to keep for reproducibility
  • info about the cycles, printed according to PrintEvery, which one wants to inspect to see what is going on in the calculation. We generally use 10 checkpoints, so this is not a huge amount of data and we should keep it.
  • outputs that cannot be suppressed but are useless for some calculations (e.g., null Gibbs/Widom results when doing GCMC calculations). We cannot suppress them unless we change the RASPA code.

Therefore, for the output files we need more discussion on how to reduce the disk consumption.

Conversion to g/g is in reality to mg/g

The key conversion_factor_molec_uc_to_gr_gr of output_parameters is parsed from RASPA's output:

Conversion factor molecules/unit cell -> gr/gr:       1.0756975415 [-]

which should instead read molecules/unit cell -> mg/g.
It is indeed suspicious that this value is close to 1 for typical systems.

We should check whether this error is still present in the most recent version of RASPA (and if so, open an issue there), and decide how to deal with the problem while maintaining backward compatibility.
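The dimensional argument can be verified with a short back-of-the-envelope computation (the masses below are hypothetical illustration values, not from any real system):

```python
# One adsorbed molecule per unit cell contributes
# (guest molar mass) / (framework mass per unit cell) in g/g.
guest_molar_mass = 44.0          # g/mol, e.g. CO2 (illustrative)
framework_mass_per_uc = 40000.0  # g/mol of framework per unit cell (hypothetical)

factor_g_per_g = guest_molar_mass / framework_mass_per_uc   # molecules/uc -> g/g
factor_mg_per_g = 1000.0 * factor_g_per_g                   # molecules/uc -> mg/g

print(factor_g_per_g)   # ~0.001: far from 1
print(factor_mg_per_g)  # ~1.1: order 1, like the factor RASPA prints
```

So a conversion factor of order 1 is consistent with mg/g, not g/g.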

base_binary_restart is hanging under pytest

I can submit the calculation example_base_binary_restart.py from the command line without any problem. However, if I run it through pytest, it keeps hanging. This was discovered in PR #34.

  • Rename the file fixme_base_binary_restart.py back to example_base_binary_restart.py
  • Fix the problem with hanging test

Perform some validation

According to the RASPA manual, the .cif file actually used is the one in ./Movies/System_0. It would be good to compare the original input with the file in this folder to check whether RASPA read all loops, especially the charges, correctly. In some cases I saw some rounding happen, so one could use PyCifRW to extract the loops from both cifs and compare whether

  • the fractional coordinates
  • the charges

are correctly read by RASPA.

A common but somewhat harder check to implement is whether the atom types are read correctly. One can encounter different issues:

  • Some atom types are not read correctly, and RASPA prints NO Interaction in the output file where we would actually expect an interaction
  • There is the known issue of short labels overwriting longer ones.

An easier check might hence be to extract the atomic symbols (using ase or pymatgen) and then verify that all of them also appear in the output. If not, I would print a warning.
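The symbol comparison itself could be as simple as a set difference; a minimal sketch (the CIF-reading step, e.g. via ase.io.read(...).get_chemical_symbols(), is left out, and the helper name is made up):

```python
import warnings

def check_symbols(input_symbols, raspa_symbols):
    """Return the chemical symbols present in the input structure but missing
    from the structure RASPA actually used (./Movies/System_0); warn if any."""
    missing = sorted(set(input_symbols) - set(raspa_symbols))
    if missing:
        warnings.warn(f'symbols {missing} from the input cif do not appear '
                      'in the RASPA output; atom types may be misread')
    return missing

print(check_symbols(['Zn', 'O', 'C', 'H'], ['Zn', 'O', 'C']))  # ['H']
```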

Similarly, I would think about introducing a warning about e.g. negatively charged hydrogens, which can lead to non-physical interactions. Of course, hydrides do have negative hydrogens, but I would guess that in the most common applications of RASPA negative hydrogen charges would be unusual.
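A corresponding charge sanity check might look like this (hypothetical helper; symbols and charges are parallel lists extracted from the input cif):

```python
import warnings

def warn_negative_hydrogens(symbols, charges):
    """Warn about hydrogen atoms carrying a negative partial charge,
    which is unusual outside hydrides. Returns the flagged atom indices."""
    flagged = [i for i, (sym, q) in enumerate(zip(symbols, charges))
               if sym == 'H' and q < 0.0]
    if flagged:
        warnings.warn(f'{len(flagged)} hydrogen atom(s) with negative charge '
                      f'at indices {flagged}; check the assigned charges')
    return flagged

print(warn_negative_hydrogens(['C', 'H', 'H'], [-0.2, 0.1, -0.05]))  # [2]
```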

Linking @danieleongari for notice.

Modify the README before the 1.0 release

At the moment the README is quite outdated with respect to the changes recently introduced in the develop branch. This needs to be fixed before making the official release.

Incompatibility in pre-commit requirements

Currently, the requirements

        "pre-commit": [
                "pre-commit==1.16.1",
                "yapf==0.27.0",
                "prospector==1.1.6.2",
                "pylint==1.9.4; python_version<'3.0'",
                "pylint==2.2.2; python_version>='3.0'"
        ]

result in the error:

ERROR: prospector 1.1.6.2 has requirement pylint==2.1.1, but you'll have pylint 2.2.2 which is incompatible.

I updated these to the current aiida-core-v1.0.0 pre-commit requirements in my last PR (#46):

        "pre-commit": [
                "pre-commit==1.18.3",
                "yapf==0.28.0",
                "prospector==1.1.7",
                "pylint==1.9.4; python_version<'3.0'",
                "pylint==2.3.1; python_version>='3.0'"
        ]

Parser improvement for Enthalpy of adsorption

In the case of a multi-component simulation, the associated output for the enthalpy of adsorption has a different structure, as follows:

Enthalpy of adsorption:
=======================

    Total enthalpy of adsorption
    ----------------------------
    Block[ 0] -1971.45386        [K]
    Block[ 1] -2133.60014        [K]
    Block[ 2] -1963.42612        [K]
    Block[ 3] -1959.66056        [K]
    Block[ 4] -1981.73614        [K]
    ------------------------------------------------------------------------------
    Average          -2001.97536 +/-         132.492585 [K]
                       -16.64535 +/-           1.101605 [KJ/MOL]
    Note: Ug should be subtracted from this value
    Note: The heat of adsorption Q=-H

    Component 0 [xenon]
    -------------------------------------------------------------
        Block[ 0] -2116.19692        [-]
        Block[ 1] -2163.03260        [-]
        Block[ 2] -1985.42962        [-]
        Block[ 3] -2026.21605        [-]
        Block[ 4] -2108.72107        [-]
        ------------------------------------------------------------------------------
        Average          -2079.91925 +/-         129.199899 [K]
                           -17.29342 +/-           1.074228 [KJ/MOL]
        Note: Ug should be subtracted to this value
        Note: The heat of adsorption Q=-H

    Component 1 [krypton]
    -------------------------------------------------------------
        Block[ 0] -1898.43190        [-]
        Block[ 1] -2113.96402        [-]
        Block[ 2] -1949.28827        [-]
        Block[ 3] -1907.16244        [-]
        Block[ 4] -1890.00666        [-]
        ------------------------------------------------------------------------------
        Average          -1951.77066 +/-         167.242473 [K]
                           -16.22793 +/-           1.390532 [KJ/MOL]
        Note: Ug should be subtracted to this value
        Note: The heat of adsorption Q=-H

    Total enthalpy of adsorption from components and measured mol-fraction
    ----------------------------------------------------------------------
        Block[ 0] -1954.19993        [-]
        Block[ 1] -2129.73762        [-]
        Block[ 2] -1959.94534        [-]
        Block[ 3] -1941.68449        [-]
        Block[ 4] -1952.52440        [-]
        ------------------------------------------------------------------------------
        Average          -1987.61836 +/-         142.609898 [K]
                           -16.52598 +/-           1.185725 [KJ/MOL]
        Note: Ug should be subtracted to this value
        Note: The heat of adsorption Q=-H

Therefore, the current parser does not extract the correct results.

Also, the enthalpy of adsorption needs to be moved from a general key of output_parameters to the per-component section.
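A per-component extraction for the block above could be sketched roughly as follows (an illustration with made-up key names, not the plugin's actual parser; it also assumes component names are single words, as in "[xenon]"):

```python
import re

# Match "Component <n> [<name>]" followed (non-greedily) by the first
# "Average <value> +/- <dev> [K]" line in that component's block.
PATTERN = re.compile(
    r'Component\s+(\d+)\s+\[(\w+)\].*?'
    r'Average\s+(-?\d+\.\d+)\s+\+/-\s+(\d+\.\d+)\s+\[K\]',
    re.DOTALL)

def parse_enthalpy_of_adsorption(text):
    """Return {component_name: {...}} with average enthalpies in K."""
    section = text.split('Enthalpy of adsorption:')[-1]
    result = {}
    for num, name, avg, dev in PATTERN.findall(section):
        result[name] = {
            'component_number': int(num),
            'enthalpy_of_adsorption_average': float(avg),  # [K]
            'enthalpy_of_adsorption_dev': float(dev),      # [K]
        }
    return result

sample = """Enthalpy of adsorption:
    Component 0 [xenon]
        Block[ 0] -2116.19692        [-]
        Average          -2079.91925 +/-         129.199899 [K]
    Component 1 [krypton]
        Average          -1951.77066 +/-         167.242473 [K]
"""
print(parse_enthalpy_of_adsorption(sample))
```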

Improve readme

  • explain how to use it
  • describe the content of the plugin

Missing MANIFEST.in

One needs to add the setup.json file to MANIFEST.in (see examples in other plugins) so that the file is present in the packages on, e.g., PyPI.
Also, check e.g. the zeopp plugin for how to properly include a README.md in setup.py.
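A minimal MANIFEST.in along these lines should suffice (sketch; adjust to the files the package actually ships):

```
include setup.json
include README.md
```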

I didn't check if you have other plugins with similar issues.

"Average Widom Rosenbluth-weight" not parsed

When Widom insertions are performed, one of the quantities in the RASPA output is "Average Widom Rosenbluth-weight" for the specified component. However, this quantity is not parsed by the raspa plugin (i.e. it does not appear among the outputs in either the ParameterData object output_parameters or component_0). Attached is an example RASPA input and output file that exhibits this problem.

output_framework_1.1.1_298.000000_0.data.txt
simulation.input.txt
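A parsing sketch for the missing quantity (the line format below is an assumption and should be checked against the attached output file; key names are made up):

```python
import re

# Assumed line shape (verify against a real .data file):
#     Average Widom Rosenbluth-weight:   0.786456 +/-   0.012345 [-]
WIDOM_RE = re.compile(
    r'Average Widom Rosenbluth-weight:\s+(-?\d+\.\d+)\s+\+/-\s+(\d+\.\d+)')

def parse_widom_rosenbluth_weight(text):
    match = WIDOM_RE.search(text)
    if match is None:
        return None
    return {'widom_rosenbluth_factor_average': float(match.group(1)),
            'widom_rosenbluth_factor_dev': float(match.group(2))}

sample = "Average Widom Rosenbluth-weight:   0.786456 +/-   0.012345 [-]"
print(parse_widom_rosenbluth_weight(sample))
```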

Parse more data

I think if the parser has the file already open, it can parse some additional data.

  1. Grep for 'WARNING' and put the warnings into the output parameters and the report. Since we already loop over the file, I currently do something like:

     if "WARNING" in line:
         self.logger.warning(line)
         warnings.append(line.split('WARNING:')[-1])

  2. For debugging and the development of new sampling strategies it is useful to have information about the MC efficiencies, which are of course only printed if the move is actually used (otherwise the line contains OFF). I currently use the following, which is ugly but seems to work:
def parse_performance_line(line):
    """
    :param line: string in the format
        Component [tip4pew] total tried: 386.000000 succesfull growth: 3.000000 (0.777202 [%]) accepted: 3.000000 (0.777202 [%])
        ("succesfull" is RASPA's own spelling in the output)
    :return: dictionary with the parsed counters and ratios
    """
    parts = line.split()
    total_tried = parts[4]
    successful_growth_total = parts[7]
    successful_growth_ratio = parts[8].strip('(')
    accepted_total = parts[-3]
    accepted_ratio = parts[-2].strip('(')
    return {
        'total_tried': int(float(total_tried)),
        'successful_growth_total': int(float(successful_growth_total)),
        'successful_growth_ratio': float(successful_growth_ratio),
        'accepted_total': int(float(accepted_total)),
        'accepted_ratio': float(accepted_ratio)
    }


def parse_performance_block(lines):
    """
    :param lines: slice of the *reversed* output file (hence the non-obvious
        indices below), covering a block that reads, in forward order:
            total        1.000000 1.000000 2.000000
            succesfull   1.000000 1.000000 1.000000
            accepted     1.000000 1.000000 0.500000
            displacement 0.487500 0.487500 0.325000
    :return: dictionary with totals, acceptance ratios and drift
    """
    import numpy as np
    totals = [int(float(i)) for i in lines[4].split()[1:]]
    successful = [int(float(i)) for i in lines[3].split()[1:]]
    acceptance_ratio = [float(i) for i in lines[2].split()[1:]]
    drift = [float(i) for i in lines[1].split()[1:]]
    return {
        'total': totals,
        'successful': successful,
        'acceptance_ratio': acceptance_ratio,
        'drift': drift,
        'acceptance_ratio_mean': np.mean(acceptance_ratio)
    }


def parse_performance_mc(f):
    """
    Parse some performance metrics of the MC moves.
    :param f: file as a list of lines
    :return: dictionary with the efficiencies of the MC moves
    """
    efficiencies_dict = {}
    rev = f[::-1]  # read from the end, where the statistics are printed
    for i, line in enumerate(rev):
        if 'Performance of the Reinsertion move:' in line and 'OFF' not in line:
            efficiencies_dict['reinsertion'] = parse_performance_line(rev[i - 2])
        if 'Performance of the swap deletion move:' in line and 'OFF' not in line:
            efficiencies_dict['deletion'] = parse_performance_line(rev[i - 2])
        if 'Performance of the swap addition move:' in line and 'OFF' not in line:
            efficiencies_dict['addition'] = parse_performance_line(rev[i - 2])
        if 'Performance of the rotation move:' in line and 'OFF' not in line:
            efficiencies_dict['rotation'] = parse_performance_block(rev[i - 7:i - 2])
        if 'Performance of the translation move:' in line and 'OFF' not in line:
            efficiencies_dict['translation'] = parse_performance_block(rev[i - 7:i - 2])
        if 'Monte-Carlo moves statistics' in line:
            break

    return efficiencies_dict
  3. Also, I would parse all energies that are printed, i.e. also the tail-correction energy.
  4. For MD, I would be interested in quantities like the energy drift.
  5. I also feel it would be good to store some calculation settings such as EnergyOverlapCriteria, whose default appears to differ between the manual and the code, to ensure reproducibility across different versions of RASPA (maybe also simply line 3 of the output, which states the RASPA version).

I will add additional energies that are always there to this issue. Maybe @danieleongari can also give some of his experience.

Automatic retrieval based on input

Needs discussion.

For some files, especially smaller ones, one could enhance the user experience with automatic retrieval and automatic parsing. I am thinking especially of the MSD and the radial distribution functions.

For the RDF I currently use something like

        # ToDo: this is not really elegant, optimize it
        general = self.ctx.parameters['GeneralSettings']
        if str(general.get('ComputeRDF', 'no')).lower() == 'yes':
            self.ctx.settings = ParameterData(
                dict={
                    'additional_retrieve_list':
                    ['RadialDistributionFunctions/System_0/*'],
                })

which of course would be needed to be generalized to multiple systems. It can get large if the user specifies a lot of different atom types.

In my case, I also parse the RDF. One could save space by saving r only once as it is the same for all files.

from pathlib import Path

import pandas as pd


def parse_rdfs(rdf_file_list):
    rdf_dict = {}
    for file in rdf_file_list:
        # note: str.strip('RDF_') would strip any of the characters R, D, F, _
        # from both ends, so remove the prefix explicitly instead
        name = Path(file).stem
        if name.startswith('RDF_'):
            name = name[len('RDF_'):]
        df = pd.read_csv(file, comment='#', header=None, sep=r'\s+')
        r = df.iloc[:, 0].values
        rdf = df.iloc[:, 1].values
        rdf_dict[name] = {'r': r, 'rdf': rdf}
    return rdf_dict

Linking @danieleongari for notice.
