aiidateam / aiida-atomistic Goto Github PK

View Code? Open in Web Editor NEW

1.0 1.0 6.0 1.61 MB

AiiDA plugin which contains data and methods for atomistic simulations.

License: MIT License

Python 56.44% Jupyter Notebook 43.56%

aiida-atomistic's People

Contributors

Stargazers

Watchers

Forkers

mfkiwl mikibonacci superstar54

aiida-atomistic's Issues

Kind error

I set allow_kinds to False, then got an error.

properties = {'cell': {'value': [[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
 'pbc': {'value': [True, True, True]},
 'positions': {'value': [[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]},
 'symbols': {'value': ['Li', 'Li']},
 'mass': {'value': [6.941, 6.941]},
 'charge': {'value': [1.0, 0.0]},
#  'kinds': {'value': ['Li0', 'Li0']},
 }
structure = StructureData(properties = properties, allow_kinds=False)

--> 780                     self.get_kinds(kind_tags=self.properties.kinds.value)
    781 
    782         # Store the properties in the StructureData node.

[~/repos/mikibonacci/aiida-atomistic/src/aiida_atomistic/data/structure/properties/property_utils.py](https://file+.vscode-resource.vscode-cdn.net/home/xing/tests/aiida/atomistic/~/repos/mikibonacci/aiida-atomistic/src/aiida_atomistic/data/structure/properties/property_utils.py) in <lambda>(self, type_hint, attr)
    105                 assert issubclass(type_hint, BaseProperty)
    106                 cls._valid_properties.add(attr)
--> 107                 func_get = lambda self, type_hint=type_hint, attr=attr: self._template_property(type_hint=type_hint, attr=attr)
    108 
    109                 # We do not allow to set any property after the creation of the instance:

[~/repos/mikibonacci/aiida-atomistic/src/aiida_atomistic/data/structure/properties/property_utils.py](https://file+.vscode-resource.vscode-cdn.net/home/xing/tests/aiida/atomistic/~/repos/mikibonacci/aiida-atomistic/src/aiida_atomistic/data/structure/properties/property_utils.py) in _template_property(self, type_hint, attr)
    125         return type_hint(
    126             parent=self._parent,
--> 127             **self.get_property_attribute(attr)
...
---> 98         return self._property_attributes[key]
     99 
    100     def _inspect_properties(self,properties):

KeyError: 'kinds'

As design principle, we want to have the StructureData as immutable even if it is not stored. This is done in order to avoid the user to attempt to modify the stored node afterwards, as one usually would do in standard python.

However, this blocks also flexibility on how we produce our StructureData instance as ready to be used in calculation. For example, it will be no more possible to use methods like append_atom() for the unstored node, as well as the append_hubbard_parameter in case of the Hubbard property.

To change/update the instance, the only possibility is to generate a new instance, and this is made easier by having the to_dict method, which returns a python dictionary with everything we need to initialise a new object.
However, I think that then the user has to deal with a rather complex object:

properties = {'cell': {'value': [[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
 'pbc': {'value': [True, True, True]},
 'positions': {'value': [[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]},
 'symbols': {'value': ['Li', 'Li']},
 'mass': {'value': [6.941, 6.941]},
 'charge': {'value': [1.0, 0.0]},
 'kinds': {'value': ['Li0', 'Li1']}}

(we can always dump this as an AttributeDict).

The main difficulty here is that, to change a property, we have to go into this nested structure every time. Moreover, there are not method to change multiple properties at the same time (e.g., a python dictionary has no remove_atom method) - the user should implements its own function to do it.

*I am thinking that maybe returning a different object, can make the modification easier for the user. *

For example, we can think to return an instance of the Properties class (property attribute of the structure node), which actually contains all the information on the StructureData node. This contains the instances of the properties as attributes, so we can call methods of the properties to modify them if we need it.

In [1]: structure.properties.charge
Out[1]: Charge(parent=<StructureData: uuid: 4ccb3961-e6ff-4b32-8dcf-a638b765981c (unstored)>, domain='intra-site', default_kind_threshold=0.1, value=[1.0, 0.0])

The modified Properties instance can be used to initialise a new StructureData (consistency checks are done in the __init__).
One issue that I see is that right now the properties are connected to the StructureData instance.
We can think on making the properties mutable only if they are not attached to any StructureData instance, i.e. if are only attributes of the Properties instance:

In [2]: properties = structure.to_mutable() 
In [3]: properties.charge
Out[3]: Charge(domain='intra-site', default_kind_threshold=0.1, value=[1.0, 0.0])

we can then even think to provide methods to append/remove atoms and properties:

In [4]: properties.append_atom(positions=[0,0,0], symbol="Li", mass=6.941, charge=-1)
In [5]: properties.charge.append_charge(charge=-1, atom_index=1)

It seems that in this way we are providing a new class, which is not AiiDA datatype. It seems that we have then to maintain two classes: actually we are somehow transferring all the modification methods (append, remove atoms) to the Property class.

***I am really not sure this is the way to go, it is just an idea on how we can try to improve user experience. ***

In the end we can somehow achieve a similar thing (limited wrt the properties which are supported), by using ASE or Pymatgen combined with StructureData.from/to_ase/pymatgen.

Tests for the StructureData

Initialization

Immutability of

StructureData
properties attribute
properties.x, where x is a property

I have to check how this is done in aiida-core.

`StructureData.cell` in `dir()` but raises `AttributeError`

I was playing around with the basic StructureData and found some very strange behavior. Running the following:

from aiida_atomistic.data.structure import StructureData

properties_dict = {
    "cell":{"value":[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
    "pbc":{"value":[True,True,True]},
    "positions":{"value":[[0.0, 0.0, 0.0],[1.5, 1.5, 1.5]]},
    "symbols":{"value":["Li","Li"]},
    }

structure = StructureData(properties = properties_dict)

print('cell' in dir(structure))
structure.cell

prints True, but then raises an AttributeError when I try to access cell. 😅

Adding the `symbols` and `positions` input parameters

Up to know, atom can be placed in the cell or by providing ASE, Pymatgen objects, or using the appdend_atom() method of the original orm.StructureData .

However, as the StructureData will be basically read-only in this new implementation, also atomic positions should be provided in the initialization phase of the instance. This means that we should provide symbols and positions input parameters.

On the implementation, I think a good starting point is to use the append_atom() method during the __init__ phase.

Add the input parameters and the logic in the init.
Specify in the doc string also the units and the format of this input.
Remove the possiblity to append_atom after the initialization.

3. How properties should be stored inside the StructureData Node

The `properties` namespace

Properties may be stored inside an attribute called properties. For example in case of the pbc property, it should be stored as StructureData.properties.pbc: this is the class Pbc and its actual value (which is a list of booleans) can be accessed via StructureData.properties.pbc.value.

everything in the properties attribute can be subject to a consistency check, to understand if there is something unknown. and then raise an error (the same check can be done via the structure.get_valid_properties() and structure.get_defined_properties() methods).
The question also becomes: what is a property? Are the pbc a property? What about the mass of each site?

Physical storage of a property: database of repository?

The advantage of database (storing properties as attribute of StructureData node) is that it is possible to query and search. The repository, instead, is not subject to any query.
However, for example in case of Hubbard parameters (see HubbardStructureData), these are stored in the repository as JSON files, using pydantic functionalities.

Sometimes, we want to be queryable only some part of the properties. This means that we can design a property class to determine which of its parameters/values should be stored and where; we add a level of flexibility for the storage location. This flexibility however should be only in the implementation of the property, i.e. we cannot decide the storage location on-the-fly when creating a StructureData instance.

The getter and setter methods (of the current implementation) should be adjusted to be general and support both db- and repo- based type of properties. Or we should define two different methods, one for db and one for repository

DB-like properties

Stored in self.base.attributes._property_attributes dictionary. --> should change to self.base.attributes.property

Repository-like properties

Stored in the repository and accessed via pydantic BaseModel functionalities.

Tasks

Beta Give feedback

Add possibility to store a property in db or in repository (the decision is property-specific, for now we can store everything in the db)
create a documentation about simple query of the properties
Options

Consider use case with different boundary conditions than periodic

Currently we have strong bias towards periodic boundary conditions built-in the StructureData. Moreover, we consider the dimensionality to be directly derived from the periodic boundary conditions.

However, there are certainly use cases where you have e.g. a 3D system, but different boundary conditions along a certain direction, see:

aiidaplugins/aiida-lammps#121

Test the performance of `atomistic.StructureData` vs `orm.StructureData`

test if the new StructureData is slower than the original one.

create a test (not pytest) that does the same thing for both types, N number of structures
run for 1, 10, 100, 1000 structures
test with the kinds check

Should all properties be equal?

Currently, the StructureData is completely defined as a list of properties (as explained here):

global:
- cell
- periodic boundary conditions (PBC)
site:
- positions
- symbols
- masses
- charge
- magnetization - Coming soon!
- Hubbard U parameters - Coming soon!
inter-site:
- Hubbard V parameters - Coming soon!

I understand the appeal of this, and maybe it's the way to go. Part of me is still wondering if all properties should be equal, i.e. if some properties should receive special treatment (cell, pbc, positions, ...). Thinking of @mikibonacci's question here, I'd prefer to still be able to access the cell as structure.cell, not as:

structure.properties.cell

which then outputs a Cell object:

Cell(parent=<StructureData: uuid: 2f9fc6f5-e3a6-4811-89d6-41015b5f50b4 (unstored)>, domain='global', value=[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]])

If I want to look at the cell, I'd just want to have structure.cell return a list of list or numpy array. I don't want to have to deal with the Cell object and then figure out how to get the cell vectors from this.

That said, other tools like pymatgen and ASE also have their own "cell" variants (Structure -> Lattice; Atoms -> Cell). And I can see how Cell.value lets you get what you desire. So maybe I'm just ranting for no reason here. Will think about it some more as I'm dogfooding.

EDIT: I think I may have managed to derail the conversation from my original point immediately in the same post. I insta-"Mr. Scope creeped" myself. 😅

Production of a documentation based on the User Stories

What's the difference between supported properties and Stored properties?

In the user guide, one print out the properties:

The whole list of currently supported properties is: 
['symbols', 'cell', 'mass', 'charge', 'positions', 'pbc', 'custom', 'kinds']
Stored properties are: 
['symbols', 'cell', 'mass', 'positions', 'pbc', 'kinds']

Why charge and custom is not stored?

2. Unsupported and custom properties

Currently, we cannot define unsupported properties (some check in set_property method against the get_valid_properties output), or defined property with some parameters which is not recognized (extra parameters);

unsupported properties

A given plugin may not support all the implemented properties (e.g. hubbard).
Two codes that are using the same StructureData may not support the same properties that are stored in the node.

When a structure is passed to a plugin calcjob/workchain, there should be a check on all the properties associated to the structure. Then two behaviors can happen:

1 - the code uses a calcfunction to create a new instance of the StructureData without the not supported property
2 - the code warns about the unsupported property, but skips it and continue; this can also be done using some keyword ignore_property="hubbard"

The second option is the preferred one if we have multiple codes which support different properties, and also avoid some StructureData duplication.
-> it should be obvious how and why the property is ignored in the plugin (for example if you look at an old calculation, you should understand the decision clearly). There should be a part of the documentation which explain how to do it in a plugin, the keywords to be used, the warnings/exceptions

custom properties

We may also let the user to define some custom properties, as done in optimade by means of a prefix.
This means that the StructureData should support such custom property definition, and maybe store them under a different key in the attributes, in such a way to easily find/recognise them.
We should provide a template on how to define such properties.

Remove `ase`/`pymatgen`/... inputs from constructor in favor of `from_` `classmethod`s

Currently the constructor of the StructureData still has e.g. the ase input to initialize an instance from an ASE Atoms object:

aiida-atomistic/src/aiida_atomistic/data/structure/__init__.py

Lines 717 to 727 in 1af101e

 def __init__( 

 self, 

 ## RM cell=None, 

 ## RM pbc=None, 

 ase=None, 

 pymatgen=None, 

 pymatgen_structure=None, 

 pymatgen_molecule=None, 

 properties: Dict[str, Dict[str, Any]] = {}, 

 **kwargs 

 ): # pylint: disable=too-many-arguments

I would instead want to see a from_ase() class method, e.g.:

from ase.build import bulk

bulk_si = bulk('Si', cubic=True)

structure_data = StructureData.from_ase(bulk_si)

Should we store default values for properties or access the defaults on the fly?

for now we store default, but I think it would be memory-efficient to just call them on the fly. to be discussed. @khsrali @GeigerJ2

1. Setting a property: built-in methods vs constructor

Using built-in methods:

The current implementation supports two ways to set a property, both via methods of the StructureData itself or of the property class:

### Store the Pbc property in a StructureData node.
##########################################################
#
#
structure = StructureData()
#
# (1) using the set_property method:
#
structure.set_property(name='pbc', value={'value':[True,False,True]})
#
# (2) using built-in method of Pbc property. 
#     This calls internally the set_property method.
#
structure.pbc.set_from_string(dimensionality="3D") 
# ==> structure.pbc.value is then [True,True,True]
#
#The built-in methods then should work in this way:
structure.pbc.set_pbc(input_list=[True,True,True],...)
structure.pbc.from_structure(structure=other_structure,...)
#
# We cannot do the following:
#
# A property should be  read-only if accessed directly:
structure.pbc.value = 54
structure.pbc = Pbc([True,False,True])



#
# Not possible to set unsupported properties or to provide random parameters
structure.set_property(name='unsupported', value={'value':5})
structure.set_property(name='pbc', value={'unsupported_params':5})

The built-in methods of the properties should call the set_property method in order to set/update the value of a property. This ensures consistency and data validation checks and also consistency for the usage of the StructureData in different plugins.

Using a constructor

However, in this way the immutability of a property is less robust: we can always use one of the methods above to override a property at any time (of course, only if the node is not stored). This may also lead to some unconsistency with other properties which are already defined. A way to avoid these two issues is to delegate everything (StructureData and properties setting) to the constructor of the StructureData class:

### Store the Pbc property in a StructureData node.
##########################################################
#
#
pbc_3d = Pbc(value=[True,True,True])
spin_up = Magnetization(moments=[1,1,1,...])
...
#
structure = StructureData(pbc=pbc_3d, magnetization=spin_up, ...)

some advantages are:

concrete immutability: you have to create a new StructureData class to change a property - everything is really read-only;
consistency checks at creation time
can instantiate a property directly; more intuitive and flexible - at least before the StructureData node creation

A disadvantage that I see it is that structure dependent property that we may want to initialize are not straightforward: for example, if we want to provide Hubbard U for a given set of atoms, we can provide the name of the atom, and then a set of operations are done in order to understand each atomic site which should have the same U. This cannot be done a priori and it is a bit counterintuitive on how to do it form the constructor (we pass the Hubbard class, or a dictionary then to be internally processed in order to set the instance of Hubbard?)

In my opinion, we can provide some method in the property classes that allows to instantiate one of them. This however may give a fake hint on the possiblity to mutate them once the StructureData is created. I don't really know about this, but if I think of the Hubbard property, one and very useful method is the from_list one.

`get_ase()` error

from aiida import orm, load_profile
load_profile()

from aiida_atomistic.data.structure import StructureData

properties_dict = {
    "cell":{"value":[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
    "pbc":{"value":[True,True,True]},
    "positions":{"value":[[0.0, 0.0, 0.0],[1.5, 1.5, 1.5]]},
    "charge": {"value": [1.0, 1.0]},
    "symbols":{"value":["Li","Li"]},
    }
structure = StructureData(properties = properties_dict)
structure.get_ase()

Raise error:

   1992         _kinds = self.kinds
   1993 

[~/miniconda3/envs/aiida/lib/python3.11/site-packages/aiida/orm/nodes/node.py](https://file+.vscode-resource.vscode-cdn.net/home/xing/tests/aiida/atomistic/~/miniconda3/envs/aiida/lib/python3.11/site-packages/aiida/orm/nodes/node.py) in __getattr__(self, name)
    720             return getattr(self.base.links, new_name)
    721 
--> 722         raise AttributeError(name)

AttributeError: cell

Should be related to #12 .

Simple syntax for users

When create the StructureData, one needs to

properties_dict = {
    "cell":{"value":[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
    "pbc":{"value":[True,True,True]},
    "positions":{"value":[[0.0, 0.0, 0.0],[1.5, 1.5, 1.5]]},
    "symbols":{"value":["Li","Li"]},
    }
structure = StructureData(properties = properties_dict)

It would be great if the user can provide the dict without specific the value attribute.

properties_dict = {
    "cell":[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]],
    "pbc":[True,True,True],
    "positions":[[0.0, 0.0, 0.0],[1.5, 1.5, 1.5]],
    "symbols":["Li","Li"],
    }

It would also be good if it support:

structure = StructureData(cell=[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]], 
                          pbc=[True, True, True],
                          positions=[[0.0, 0.0, 0.0],[1.5, 1.5, 1.5]],
                          symbols=["Li", "Li"])

4. A site-based approach

In this issue we comment on the description of properties in a site-based approach.

pros:

intuitive and straightforward definition of the properties (just a list where for each site we define the property)
no "interference" between properties (successive kind redefinition if one property is set, may create ambiguities)
Hubbard parameters are naturally site-based: some property is complicated to be defined via kinds.
intersite properties (e.g. Hubbard) are easy to define in a site-based approach

cons:

need a mapping into kinds by means of thresholds(like magmoms), and this can be done on the fly in the plugin or as method in our StructureData ->may results in unexpected behaviors (too many starting magnetizations, i.e. too many kinds); however, we can do it in a self-consistent way (if kinds>10, increase the thresholds...)
duplication of information with respect to kinds, which are more compact

The `get_kinds` method

If we adopt the sites based approach, the sites-to-kinds mapping can be provided via a get_kinds method of the StructureData. This can support tolerances for the definition of new kinds.
However, such procedure can be also done by hands for specific cases (for example, we may want a specific order of the kinds)

Thoughts on JSON serialisation and `pydantic`?

Looking at the example on how to create a StructureData instance:

from aiida_atomistic.data.structure import StructureData

properties_dict = {
    "cell":{"value":[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
    "pbc":{"value":[True,True,True]},
    "positions":{"value":[[0.0, 0.0, 0.0],[1.5, 1.5, 1.5]]},
    "symbols":{"value":["Li","Li"]},
    }

structure = StructureData(properties = properties_dict)

And the fact the entire structure data is now captured by the properties, after #10 the constructor would effectively only have the properties input. This immediately made me think of just having a JSON serializing method (to_json, from_json, or similar). But then I thought about @sphuber's AEP (aiidateam/AEP#40) - which I still have to read 🙈 - and if we shouldn't be thinking about how to integrate the new StructureData with the concepts there.

I'll have to read up some more before having a solid opinion on what to do, but I already wanted to raise the issue.

	def __init__(
	self,
	## RM cell=None,
	## RM pbc=None,
	ase=None,
	pymatgen=None,
	pymatgen_structure=None,
	pymatgen_molecule=None,
	properties: Dict[str, Dict[str, Any]] = {},
	**kwargs
	): # pylint: disable=too-many-arguments