
aiida-raspa's People

Contributors

borellim, danieleongari, ezpzbz, kjappelbaum, ltalirz, mpougin, yakutovicha


aiida-raspa's Issues

Reduce the size of retrieved files

From a recent analysis, my 31 GB AiiDA repository contains approximately:

  • 7 GB (23%) occupied by retrieved RASPA restart files
  • 6 GB (20%) occupied by RASPA output .data files

The other RASPA-related files (cif, simulate.input, _aiidasubmit.sh, and force-field .def files) are under 1 GB, so they are not a concern for now.

Given the large number of RaspaCalculations launched by our workchains, I suggest taking action to reduce the size of the repository.

Restart files

I suggest not retrieving them by default, but using them locally as we do e.g. for cube files in aiida-ddec.
However, I would leave an option for the user to retrieve them: this is needed for saturated systems that require a long equilibration, or for the simulated-annealing workchains of aiida-lsmo, where a cif file is computed from the restart (https://github.com/lsmo-epfl/aiida-lsmo/blob/6c32eebdaefd11dc226d849415e363c6222a475a/aiida_lsmo/workchains/sim_annealing.py#L88-L106).
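A sketch of what the opt-in could look like (the flag name and the `Restart/System_0/*` glob are assumptions to be checked against RASPA's folder layout; `additional_retrieve_list` is AiiDA's standard mechanism for retrieving extra files from the work directory):

```python
# Hypothetical opt-in sketch: by default, restart files stay on the remote
# machine; the user requests retrieval via the calculation's `settings` input.
def build_settings(retrieve_restart=False):
    settings = {}
    if retrieve_restart:
        # AiiDA's generic mechanism to fetch extra files from the work directory
        settings['additional_retrieve_list'] = ['Restart/System_0/*']
    return settings

print(build_settings(retrieve_restart=True))
# {'additional_retrieve_list': ['Restart/System_0/*']}
```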

Output files

Here we need to think more carefully, because the file contains different types of data:

  • check values that are computed directly from the input (or from default parameters), which we may want to keep for reproducibility
  • info about the cycles, printed according to PrintEvery, which one wants to inspect to see what is going on in the calculation. We generally use 10 checkpoints, so this is not a huge amount of data and we should keep it.
  • outputs that cannot be suppressed but are useless for some calculations (e.g., null Gibbs/Widom results when doing GCMC calculations). We cannot suppress them unless we change the RASPA code.

Therefore, for the output files we need more discussion on how to reduce the disk consumption.

Conversion to g/g is in reality to mg/g

The key conversion_factor_molec_uc_to_gr_gr of output_parameters is parsed from RASPA's output:

Conversion factor molecules/unit cell -> gr/gr:       1.0756975415 [-]

which should instead read molecules/unit cell -> mg/g.
It is indeed suspicious that this value is close to 1 for typical systems.

We should check whether this error is still present in the most recent version of RASPA (and if so, open an issue there), and decide how to deal with the problem while maintaining backward compatibility.
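The dimensional argument can be verified with a short back-of-the-envelope computation (the masses below are hypothetical illustration values, not from any real system):

```python
# One adsorbed molecule per unit cell contributes
# (guest molar mass) / (framework mass per unit cell) in g/g.
guest_molar_mass = 44.0          # g/mol, e.g. CO2 (illustrative)
framework_mass_per_uc = 40000.0  # g/mol of framework per unit cell (hypothetical)

factor_g_per_g = guest_molar_mass / framework_mass_per_uc   # molecules/uc -> g/g
factor_mg_per_g = 1000.0 * factor_g_per_g                   # molecules/uc -> mg/g

print(factor_g_per_g)   # ~0.001: far from 1
print(factor_mg_per_g)  # ~1.1: order 1, like the factor RASPA prints
```

So a conversion factor of order 1 is consistent with mg/g, not g/g.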

base_binary_restart is hanging under pytest

I can submit the calculation example_base_binary_restart.py from the command line without any problem. However, if I run it through pytest, it keeps hanging. This was discovered in PR #34.

  • Rename the file fixme_base_binary_restart.py back to example_base_binary_restart.py
  • Fix the problem with hanging test

Perform some validation

According to the RASPA manual, the .cif file actually used is the one in ./Movies/System_0. It would be good to compare the original input with the file in this folder to check whether RASPA read all loops, especially the charges, correctly. In some cases I saw some rounding happen, so one could use PyCifRW to extract the loops from both cifs and compare whether

  • the fractional coordinates
  • the charges

are correctly read by RASPA.

A common but somewhat harder check to implement is whether the atom types are read correctly. One can encounter different issues:

  • Some atom types are not read correctly, and RASPA prints NO Interaction in the output file where we would actually expect an interaction
  • There is the known issue of short labels overwriting longer ones.

An easier check might hence be to extract the atomic symbols (using ase or pymatgen) and then verify that all of them also appear in the output. If not, I would print a warning.
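The symbol comparison itself could be as simple as a set difference; a minimal sketch (the CIF-reading step, e.g. via ase.io.read(...).get_chemical_symbols(), is left out, and the helper name is made up):

```python
import warnings

def check_symbols(input_symbols, raspa_symbols):
    """Return the chemical symbols present in the input structure but missing
    from the structure RASPA actually used (./Movies/System_0); warn if any."""
    missing = sorted(set(input_symbols) - set(raspa_symbols))
    if missing:
        warnings.warn(f'symbols {missing} from the input cif do not appear '
                      'in the RASPA output; atom types may be misread')
    return missing

print(check_symbols(['Zn', 'O', 'C', 'H'], ['Zn', 'O', 'C']))  # ['H']
```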

Similarly, I would think about introducing a warning about e.g. negatively charged hydrogens, which can lead to non-physical interactions. Of course, hydrides do have negative hydrogens, but I would guess that in the most common applications of RASPA negative hydrogen charges would be unusual.
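A corresponding charge sanity check might look like this (hypothetical helper; symbols and charges are parallel lists extracted from the input cif):

```python
import warnings

def warn_negative_hydrogens(symbols, charges):
    """Warn about hydrogen atoms carrying a negative partial charge,
    which is unusual outside hydrides. Returns the flagged atom indices."""
    flagged = [i for i, (sym, q) in enumerate(zip(symbols, charges))
               if sym == 'H' and q < 0.0]
    if flagged:
        warnings.warn(f'{len(flagged)} hydrogen atom(s) with negative charge '
                      f'at indices {flagged}; check the assigned charges')
    return flagged

print(warn_negative_hydrogens(['C', 'H', 'H'], [-0.2, 0.1, -0.05]))  # [2]
```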

Linking @danieleongari for notice.

Modify the README before the 1.0 release

At the moment the README is quite outdated with respect to the changes recently introduced in the develop branch. This needs to be fixed before making the official release.

Incompatibility in pre-commit requirements

Currently, the requirements

        "pre-commit": [
                "pre-commit==1.16.1",
                "yapf==0.27.0",
                "prospector==1.1.6.2",
                "pylint==1.9.4; python_version<'3.0'",
                "pylint==2.2.2; python_version>='3.0'"
        ]

result in the error:

ERROR: prospector 1.1.6.2 has requirement pylint==2.1.1, but you'll have pylint 2.2.2 which is incompatible.

I updated these to the current aiida-core-v1.0.0 pre-commit requirements in my last PR (#46):

        "pre-commit": [
                "pre-commit==1.18.3",
                "yapf==0.28.0",
                "prospector==1.1.7",
                "pylint==1.9.4; python_version<'3.0'",
                "pylint==2.3.1; python_version>='3.0'"
        ]

Parser improvement for Enthalpy of adsorption

In the case of a multi-component simulation, the associated output for the enthalpy of adsorption has a different structure, as follows:

Enthalpy of adsorption:
=======================

    Total enthalpy of adsorption
    ----------------------------
    Block[ 0] -1971.45386        [K]
    Block[ 1] -2133.60014        [K]
    Block[ 2] -1963.42612        [K]
    Block[ 3] -1959.66056        [K]
    Block[ 4] -1981.73614        [K]
    ------------------------------------------------------------------------------
    Average          -2001.97536 +/-         132.492585 [K]
                       -16.64535 +/-           1.101605 [KJ/MOL]
    Note: Ug should be subtracted from this value
    Note: The heat of adsorption Q=-H

    Component 0 [xenon]
    -------------------------------------------------------------
        Block[ 0] -2116.19692        [-]
        Block[ 1] -2163.03260        [-]
        Block[ 2] -1985.42962        [-]
        Block[ 3] -2026.21605        [-]
        Block[ 4] -2108.72107        [-]
        ------------------------------------------------------------------------------
        Average          -2079.91925 +/-         129.199899 [K]
                           -17.29342 +/-           1.074228 [KJ/MOL]
        Note: Ug should be subtracted to this value
        Note: The heat of adsorption Q=-H

    Component 1 [krypton]
    -------------------------------------------------------------
        Block[ 0] -1898.43190        [-]
        Block[ 1] -2113.96402        [-]
        Block[ 2] -1949.28827        [-]
        Block[ 3] -1907.16244        [-]
        Block[ 4] -1890.00666        [-]
        ------------------------------------------------------------------------------
        Average          -1951.77066 +/-         167.242473 [K]
                           -16.22793 +/-           1.390532 [KJ/MOL]
        Note: Ug should be subtracted to this value
        Note: The heat of adsorption Q=-H

    Total enthalpy of adsorption from components and measured mol-fraction
    ----------------------------------------------------------------------
        Block[ 0] -1954.19993        [-]
        Block[ 1] -2129.73762        [-]
        Block[ 2] -1959.94534        [-]
        Block[ 3] -1941.68449        [-]
        Block[ 4] -1952.52440        [-]
        ------------------------------------------------------------------------------
        Average          -1987.61836 +/-         142.609898 [K]
                           -16.52598 +/-           1.185725 [KJ/MOL]
        Note: Ug should be subtracted to this value
        Note: The heat of adsorption Q=-H

Therefore, the current parser does not extract the correct results.

Also, the enthalpy of adsorption needs to be moved from a general key of output_parameters to the per-component section.
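A per-component extraction for the block above could be sketched roughly as follows (an illustration with made-up key names, not the plugin's actual parser; it also assumes component names are single words, as in "[xenon]"):

```python
import re

# Match "Component <n> [<name>]" followed (non-greedily) by the first
# "Average <value> +/- <dev> [K]" line in that component's block.
PATTERN = re.compile(
    r'Component\s+(\d+)\s+\[(\w+)\].*?'
    r'Average\s+(-?\d+\.\d+)\s+\+/-\s+(\d+\.\d+)\s+\[K\]',
    re.DOTALL)

def parse_enthalpy_of_adsorption(text):
    """Return {component_name: {...}} with average enthalpies in K."""
    section = text.split('Enthalpy of adsorption:')[-1]
    result = {}
    for num, name, avg, dev in PATTERN.findall(section):
        result[name] = {
            'component_number': int(num),
            'enthalpy_of_adsorption_average': float(avg),  # [K]
            'enthalpy_of_adsorption_dev': float(dev),      # [K]
        }
    return result

sample = """Enthalpy of adsorption:
    Component 0 [xenon]
        Block[ 0] -2116.19692        [-]
        Average          -2079.91925 +/-         129.199899 [K]
    Component 1 [krypton]
        Average          -1951.77066 +/-         167.242473 [K]
"""
print(parse_enthalpy_of_adsorption(sample))
```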

Improve readme

  • explain how to use it
  • describe the content of the plugin

Missing MANIFEST.in

One needs to add the setup.json file to MANIFEST.in (see examples in other plugins) so that the file is present in the packages on, e.g., PyPI.
Also, check e.g. the zeopp plugin for how to properly include a README.md in setup.py.
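A minimal MANIFEST.in along these lines should suffice (sketch; adjust to the files the package actually ships):

```
include setup.json
include README.md
```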

I didn't check if you have other plugins with similar issues.

"Average Widom Rosenbluth-weight" not parsed

When Widom insertions are performed, one of the quantities in the RASPA output is "Average Widom Rosenbluth-weight" for the specified component. However, this quantity is not parsed by the raspa plugin (i.e. it does not appear among the outputs in either the ParameterData object output_parameters or component_0). Attached is an example RASPA input and output file that exhibits this problem.

output_framework_1.1.1_298.000000_0.data.txt
simulation.input.txt
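A parsing sketch for the missing quantity (the line format below is an assumption and should be checked against the attached output file; key names are made up):

```python
import re

# Assumed line shape (verify against a real .data file):
#     Average Widom Rosenbluth-weight:   0.786456 +/-   0.012345 [-]
WIDOM_RE = re.compile(
    r'Average Widom Rosenbluth-weight:\s+(-?\d+\.\d+)\s+\+/-\s+(\d+\.\d+)')

def parse_widom_rosenbluth_weight(text):
    match = WIDOM_RE.search(text)
    if match is None:
        return None
    return {'widom_rosenbluth_factor_average': float(match.group(1)),
            'widom_rosenbluth_factor_dev': float(match.group(2))}

sample = "Average Widom Rosenbluth-weight:   0.786456 +/-   0.012345 [-]"
print(parse_widom_rosenbluth_weight(sample))
```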

Parse more data

I think if the parser has the file already open, it can parse some additional data.

  1. Grep for 'WARNING' and put the warnings into the output parameters and the report. Since we already loop over the file, I currently do something like:

     if "WARNING" in line:
         self.logger.warning(line)
         warnings.append(line.split('WARNING:')[-1])

  2. For debugging and the development of new sampling strategies it is useful to have information about the MC efficiencies, which are of course only printed if the move is actually used (otherwise the line contains OFF). I currently use the following, which is ugly but seems to work:
def parse_performance_line(line):
    """
    :param line: string in the format
        Component [tip4pew] total tried: 386.000000 succesfull growth: 3.000000 (0.777202 [%]) accepted: 3.000000 (0.777202 [%])
        ("succesfull" is RASPA's own spelling in the output)
    :return: dictionary with the parsed counters and ratios
    """
    parts = line.split()
    total_tried = parts[4]
    successful_growth_total = parts[7]
    successful_growth_ratio = parts[8].strip('(')
    accepted_total = parts[-3]
    accepted_ratio = parts[-2].strip('(')
    return {
        'total_tried': int(float(total_tried)),
        'successful_growth_total': int(float(successful_growth_total)),
        'successful_growth_ratio': float(successful_growth_ratio),
        'accepted_total': int(float(accepted_total)),
        'accepted_ratio': float(accepted_ratio)
    }


def parse_performance_block(lines):
    """
    :param lines: slice of the *reversed* output file (hence the non-obvious
        indices below), covering a block that reads, in forward order:
            total        1.000000 1.000000 2.000000
            succesfull   1.000000 1.000000 1.000000
            accepted     1.000000 1.000000 0.500000
            displacement 0.487500 0.487500 0.325000
    :return: dictionary with totals, acceptance ratios and drift
    """
    import numpy as np
    totals = [int(float(i)) for i in lines[4].split()[1:]]
    successful = [int(float(i)) for i in lines[3].split()[1:]]
    acceptance_ratio = [float(i) for i in lines[2].split()[1:]]
    drift = [float(i) for i in lines[1].split()[1:]]
    return {
        'total': totals,
        'successful': successful,
        'acceptance_ratio': acceptance_ratio,
        'drift': drift,
        'acceptance_ratio_mean': np.mean(acceptance_ratio)
    }


def parse_performance_mc(f):
    """
    Parse some performance metrics of the MC moves.
    :param f: file as a list of lines
    :return: dictionary with the efficiencies of the MC moves
    """
    efficiencies_dict = {}
    rev = f[::-1]  # read from the end, where the statistics are printed
    for i, line in enumerate(rev):
        if 'Performance of the Reinsertion move:' in line and 'OFF' not in line:
            efficiencies_dict['reinsertion'] = parse_performance_line(rev[i - 2])
        if 'Performance of the swap deletion move:' in line and 'OFF' not in line:
            efficiencies_dict['deletion'] = parse_performance_line(rev[i - 2])
        if 'Performance of the swap addition move:' in line and 'OFF' not in line:
            efficiencies_dict['addition'] = parse_performance_line(rev[i - 2])
        if 'Performance of the rotation move:' in line and 'OFF' not in line:
            efficiencies_dict['rotation'] = parse_performance_block(rev[i - 7:i - 2])
        if 'Performance of the translation move:' in line and 'OFF' not in line:
            efficiencies_dict['translation'] = parse_performance_block(rev[i - 7:i - 2])
        if 'Monte-Carlo moves statistics' in line:
            break

    return efficiencies_dict
  3. Also, I would parse all energies that are printed, i.e. also the tail-correction energy.
  4. For MD, I would be interested in quantities like the energy drift.
  5. I also feel it would be good to store some calculation settings such as EnergyOverlapCriteria, whose default appears to differ between the manual and the code, to ensure reproducibility across different versions of RASPA (maybe also simply line 3 of the output, which states the RASPA version).

I will add additional energies that are always there to this issue. Maybe @danieleongari can also give some of his experience.

Automatic retrieval based on input

Needs discussion.

For some files, especially smaller ones, one could enhance the user experience with automatic retrieval and automatic parsing. I am thinking especially of the MSD and the radial distribution functions.

For the RDF I currently use something like

        # ToDo: this is not really elegant, optimize it
        general = self.ctx.parameters['GeneralSettings']
        if str(general.get('ComputeRDF', 'no')).lower() == 'yes':
            self.ctx.settings = ParameterData(
                dict={
                    'additional_retrieve_list':
                    ['RadialDistributionFunctions/System_0/*'],
                })

which of course would be needed to be generalized to multiple systems. It can get large if the user specifies a lot of different atom types.

In my case, I also parse the RDF. One could save space by saving r only once as it is the same for all files.

from pathlib import Path

import pandas as pd


def parse_rdfs(rdf_file_list):
    rdf_dict = {}
    for file in rdf_file_list:
        # note: str.strip('RDF_') would strip any of the characters R, D, F, _
        # from both ends, so remove the prefix explicitly instead
        name = Path(file).stem
        if name.startswith('RDF_'):
            name = name[len('RDF_'):]
        df = pd.read_csv(file, comment='#', header=None, sep=r'\s+')
        r = df.iloc[:, 0].values
        rdf = df.iloc[:, 1].values
        rdf_dict[name] = {'r': r, 'rdf': rdf}
    return rdf_dict

Linking @danieleongari for notice.
