extxyz.jl's Introduction

ExtXYZ.jl

This package provides Julia bindings for the extxyz C library which implements a parser and writer for the extended XYZ file format used in materials and molecular modelling, following the specification set out in the extxyz repo. Moreover the ExtXYZ.Atoms object directly adheres to the AtomsBase common interface for atomistic structures.

Maintainer: James Kermode (@jameskermode).

Installation

This package is registered in the General registry, so installation of the latest stable release is as simple as pressing ] to enter pkg> mode in the Julia REPL, and then entering:

pkg> add ExtXYZ

or for the development version:

pkg> dev https://github.com/libAtoms/ExtXYZ.jl

Related packages

  • The JuLIP.jl package is an optional - but recommended - companion. JuLIP can use ExtXYZ.jl to read and write extended XYZ files to/from JuLIP.Atoms instances, using the functions JuLIP.read_extxyz() and JuLIP.write_extxyz().
  • The package is integrated with AtomsIO.jl to provide a uniform interface (based on AtomsBase) for reading and writing a large range of atomistic structure files.

Please open issues/PRs here with suggestions of other packages it would be useful to provide interfaces to.

Basic Usage

Four key functions are exported: read_frame() and write_frame() for reading and writing single configurations (snapshots), respectively, and read_frames() and write_frames() for reading and writing trajectories. Moreover, ExtXYZ.Atoms provides a data structure that exposes the configurations read in an AtomsBase-compatible manner. All read and write functions work with string filenames, an open Base.IO instance or (intended primarily for internal use) a C FILE* pointer, stored as a Ptr{Cvoid} type.

using ExtXYZ

frame = read_frame("input.xyz")  # single atomic configuration, represented as a Dict{String}{Any}
write_frame("output.xyz", frame) # write a single frame to `output.xyz`. 

frame10 = read_frame("input.xyz", 10) # read a specific frame, counting from 1 for first frame in file

all_frames = read_frames("seq.xyz")  # read all frames, returns Vector{Dict{String}{Any}}
frames = read_frames("seq.xyz", 1:4) # specific range of frames

write_frames("output.xyz", frames, append=true) # append four frames to output

# Get a frame as AtomsBase-compatible ExtXYZ.Atoms object:
Atoms(read_frame("input.xyz"))

# Get a list of frames as AtomsBase-compatible ExtXYZ.Atoms objects:
Atoms.(read_frames("seq.xyz", 1:4))

The function iread_frames() provides lazy file-reading using a Channel:

for frame in iread_frames("input.xyz")
    process(frame) # do something with each frame
end

write_frames() can also be used for asynchronous writing by passing in a Channel:

Channel() do ch
    @async write_frames(outfile, ch)
    
    for frame in frames
        put!(ch, frame)
    end
end

Atoms data structure

In lieu of a package-independent data structure for representing atomic structures (i.e. an equivalent to ASE's Atoms class in the Python ecosystem), this package uses a Dict{String}{Any}. For the extended XYZ file:

8
Lattice="5.44 0.0 0.0 0.0 5.44 0.0 0.0 0.0 5.44" Properties=species:S:1:pos:R:3 Time=0.0
Si        0.00000000      0.00000000      0.00000000
Si        1.36000000      1.36000000      1.36000000
Si        2.72000000      2.72000000      0.00000000
Si        4.08000000      4.08000000      1.36000000
Si        2.72000000      0.00000000      2.72000000
Si        4.08000000      1.36000000      4.08000000
Si        0.00000000      2.72000000      2.72000000
Si        1.36000000      4.08000000      4.08000000

The internal representation, shown in JSON format for readability, is as follows:

{
   "N_atoms": 8,
   "arrays": {
      "pos": [
         [
            0.0,
            0.0,
            0.0
         ],
         [
            1.36,
            1.36,
            1.36
         ],
         [
            2.72,
            2.72,
            0.0
         ],
         [
            4.08,
            4.08,
            1.36
         ],
         [
            2.72,
            0.0,
            2.72
         ],
         [
            4.08,
            1.36,
            4.08
         ],
         [
            0.0,
            2.72,
            2.72
         ],
         [
            1.36,
            4.08,
            4.08
         ]
      ],
      "species": [
         "Si",
         "Si",
         "Si",
         "Si",
         "Si",
         "Si",
         "Si",
         "Si"
      ]
   },
   "info": {
      "Time": 0.0
   },
   "cell": [
      [
         5.44,
         0.0,
         0.0
      ],
      [
         0.0,
         5.44,
         0.0
      ],
      [
         0.0,
         0.0,
         5.44
      ]
   ]
}

Important dictionary keys include:

  • N_atoms - the number of atoms (mandatory)
  • cell - the unit cell, a 3x3 matrix of floats containing the cell vectors as rows, i.e. the same as ASE (mandatory)
  • pbc - periodic boundary conditions, Vector{Bool} of length 3 (optional)
  • info - dictionary containing per-configuration key/value pairs parsed from the comment (line #2 in each frame). These can include scalars, vectors and matrices of integer, real, bool and string scalars or vectors. (mandatory, can be empty)
  • arrays - dictionary containing per-atom properties as an N_component x N_atoms matrix, reduced to a vector for the case N_component = 1. These represent scalar (N_component = 1) or vector (N_component > 1) per-atom properties, of integer (I), real (R), bool (L) or string (S, scalars only) type. The set of properties is extracted from the special Properties key in the comment line. (mandatory, and must contain at least a string property "species" containing atomic symbols and a 3-column vector property
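
As a rough illustration of the layout described above, the dictionary for the 8-atom Si example can be assembled by hand in plain Julia. This is only a sketch: it does not call ExtXYZ, and whether write_frame() accepts a hand-built dictionary unchanged is not verified here.

```julia
# Sketch only: build the frame dictionary for the 8-atom Si cell by hand.
# Plain Julia, no ExtXYZ calls; key names follow the list above.
a = 5.44  # cubic lattice constant taken from the Lattice= entry

# Positions stored as one column per atom (N_component x N_atoms = 3 x 8).
pos = [0.0  1.36  2.72  4.08  2.72  4.08  0.0   1.36;
       0.0  1.36  2.72  4.08  0.0   1.36  2.72  4.08;
       0.0  1.36  0.0   1.36  2.72  4.08  2.72  4.08]

frame = Dict{String,Any}(
    "N_atoms" => 8,
    "cell"    => [a 0.0 0.0; 0.0 a 0.0; 0.0 0.0 a],  # cell vectors as rows
    "info"    => Dict{String,Any}("Time" => 0.0),
    "arrays"  => Dict{String,Any}(
        "species" => fill("Si", 8),
        "pos"     => pos,
    ),
)

# Position of the second atom: the second column of the matrix.
second_atom = frame["arrays"]["pos"][:, 2]
```

Keeping one atom per column means that slicing a single atom is a contiguous read in Julia's column-major storage.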

extxyz.jl's People

Contributors

cortner, github-actions[bot], jamesgardner1421, jameskermode, janhab, joegilkes, kailiersch, mfherbst, tjjarvinen


extxyz.jl's Issues

Files with missing cell

I just tried to read an xyz file that Ilyes sent me, a dataset of cumulene molecules in vacuum, so no cell is defined and the "cell" is missing entirely. Could the package be extended to allow this, or, when pbc = "FFF", to specify an automatic cubic cell that includes all particles?

But maybe it goes against the specification of the format?

Example:

13
Properties=species:S:1:pos:R:3:forces:R:3 energy=-66.79083251953125 pbc="F F F"
C       -5.13553286       0.00000000       0.00000000       0.00006186      -0.00000000       0.00000000
H       -5.72485781      -0.91726500       0.00000000      -0.00001280       0.00000756       0.00000000
H       -5.72485781       0.91726500       0.00000000      -0.00001280      -0.00000756      -0.00000000
C       -3.82464290       0.00000000       0.00000000      -0.00004186       0.00000000       0.00000000
C       -2.55226111       0.00000000       0.00000000       0.00003090      -0.00000000      -0.00000000
C       -1.27494299       0.00000000       0.00000000      -0.00009426       0.00000000      -0.00000000
C        0.00000000       0.00000000       0.00000000       0.00000000      -0.00000000      -0.00000000
C        1.27494299       0.00000000       0.00000000       0.00009426       0.00000000       0.00000000
C        2.55226111       0.00000000       0.00000000      -0.00003090      -0.00000000       0.00000000
C        3.82464290       0.00000000       0.00000000       0.00004186       0.00000000      -0.00000000
H        5.72485781       0.00000000       0.91726500       0.00001280      -0.00000000      -0.00000756
H        5.72485781       0.00000000      -0.91726500       0.00001280       0.00000000       0.00000756
C        5.13553286       0.00000000       0.00000000      -0.00006186      -0.00000000      -0.00000000

Incorrect file to dict transformation with multiple atoms

Hi again, I think I've found another issue.

When you add a second atom to the test file:

2
Lattice="20.0 0.0 0.0 0.0 20.0 0.0 0.0 0.0 20.0" Properties=species:S:1:pos:R:3:map_shift:I:3:n_neighb:I:1:gap_force:R:3:dft_force:R:3 config_type=isolated_atom gap_energy=-157.7272532 gap_virial="0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0" dft_energy=-158.54496821 cutoff=5.5 nneightol=1.2 pbc="T T T"
Si      1.00000000      2.00000000      3.00000000       -1       -1       -1        0       0.00000000       0.00000000       0.00000000       0.00000000       0.00000000       0.00000000
Si      4.00000000      5.00000000      6.00000000       -1       -1       -1        0       0.00000000       0.00000000       0.00000000       0.00000000       0.00000000       0.00000000

and then read the positions:

data = read_frame("test.xyz")
data["arrays"]["pos"]

3×2 Matrix{Float64}:
 1.0  2.0
 3.0  4.0
 5.0  6.0

The positions are read in the incorrect order. You can also see this in the README example, where the data is in the wrong order. When you write the file again it preserves the original structure, so the transformation is correctly reversible, but the dict is wrong.

I had a go at fixing this, but it's a bit complicated for me. It arises from the row-major vs column-major data representations in C and Julia, but when you fix it for positions it breaks something else in the info field, since all the data uses the same convert function. Maybe you can find a nice way to fix it.
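
The row-major vs column-major mismatch can be reproduced in plain Julia without the package. The sketch below assumes the C side hands over a flat, atom-by-atom buffer; how ExtXYZ actually converts the data is not shown in this report, so this only illustrates the symptom.

```julia
# Flat buffer as a C row-major N_atoms x 3 array would store it:
# atom 1 = (1, 2, 3), atom 2 = (4, 5, 6).
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n_atoms, n_comp = 2, 3

# Treating the buffer as a column-major N_atoms x 3 matrix and transposing
# reproduces the scrambled matrix from the report.
scrambled = permutedims(reshape(flat, n_atoms, n_comp))

# Reading it as a column-major 3 x N_atoms matrix gives one atom per column.
per_atom = reshape(flat, n_comp, n_atoms)
```

Here `scrambled` is `[1 2; 3 4; 5 6]`, matching the output above, while `per_atom` is `[1 4; 2 5; 3 6]`.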

Reading in frames from an IOBuffer

I have some code that generates XYZ geometries, and I need these to be converted into ExtXYZ frames. I can write these geometries to temporary files and read them back in as frames from those files, but this becomes incredibly taxing on disk IO when passing around thousands of XYZs, which I unfortunately have to do quite regularly.

I can write each geometry to a string, but I can't find a way to then read this string back in as a frame, since read_frames only works with a string for the path to the XYZ file, or with an IOStream to a file.

Is there some way of extending the current read_frames implementation to work with IOBuffers? That way I could write the string-form XYZ directly to an IOBuffer and read back in as a frame entirely within memory, which would be much faster.
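
For context, this is roughly what in-memory reading could look like for plain XYZ text. The helper `parse_plain_xyz` is hypothetical and not part of ExtXYZ.jl; it handles only "symbol x y z" rows, ignores extended-XYZ key/value pairs, omits the "cell" key, and stores the raw comment line under an assumed "comment" entry.

```julia
# Hypothetical helper, not part of ExtXYZ.jl: parse one plain-XYZ frame
# (atom count, comment line, then "symbol x y z" rows) from any IO object.
function parse_plain_xyz(io::IO)
    n_atoms = parse(Int, strip(readline(io)))
    comment = readline(io)
    species = Vector{String}(undef, n_atoms)
    pos = Matrix{Float64}(undef, 3, n_atoms)  # one column per atom
    for i in 1:n_atoms
        fields = split(readline(io))
        species[i] = String(fields[1])
        pos[:, i] = parse.(Float64, fields[2:4])
    end
    return Dict{String,Any}(
        "N_atoms" => n_atoms,
        "arrays"  => Dict{String,Any}("species" => species, "pos" => pos),
        "info"    => Dict{String,Any}("comment" => comment),
    )
end

xyz = """
2
hydrogen molecule
H 0.0 0.0 0.0
H 0.0 0.0 0.74
"""

# The string never touches the disk: it is wrapped in an IOBuffer.
frame = parse_plain_xyz(IOBuffer(xyz))
```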

Rename properties in AtomsBase Interface

Hello, I wonder why you didn't use the AtomsBase names for the properties in atom_data and system_data.
Wouldn't it be good to rename "box" to "bounding_box", "positions" to "position", "atomic_symbols" to "atomic_symbol", and so on?
I tried it with a few of these properties, and for those it seems to work.

Are the names you used important for something, or would it be possible to change them? If so, would you welcome having these names changed, @jameskermode?

(Request) Support for molecular XYZs without Lattice key

Hi James,

I was just wondering if there would be any interest in making this compatible with extXYZ files that don't include a Lattice key, i.e. for molecular geometries with no inherent unit cell?

I'd really like to use this as a simple, lightweight library for doing some low-level manipulation of molecular xyz files within Julia, but unfortunately the hard requirement for a unit cell to be provided means this package won't work on those files.

Thanks!

Segfault when reading non-existent file

If "bla.extxyz" does not exist, this

using ExtXYZ
ExtXYZ.read_frame("bla.extxyz")

yields a segfault. So does read_frames. An empty file is fine, though.
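
Until the reader itself checks for this, a caller-side guard is one possible workaround. The wrapper `checked_read` below is hypothetical and not part of ExtXYZ.jl; it uses a stand-in reader so the sketch runs without the package installed.

```julia
# Hypothetical guard, not part of ExtXYZ.jl: refuse to call the reader
# unless the path points at an existing regular file.
function checked_read(reader, path::AbstractString)
    isfile(path) || throw(ArgumentError("no such file: $path"))
    return reader(path)
end

# Stand-in reader so the sketch runs on its own; in practice one would
# pass ExtXYZ.read_frame here instead.
fake_reader(path) = "read $path"

# A path inside a fresh temporary directory is guaranteed not to exist.
missing_path = joinpath(mktempdir(), "bla.extxyz")

result = try
    checked_read(fake_reader, missing_path)
catch err
    err
end
```

With this guard the missing file surfaces as an ordinary ArgumentError before any C code is reached.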

Setup coverage tracking

To find bugs such as the one fixed in #30, it would be useful to set up automatic coverage tracking in the package.

Saving isolated molecules to a file results in a file that cannot be read

If you create an isolated molecule with AtomsBase, save it using ExtXYZ and then try to load the file, loading fails.

Here is a working example:

using AtomsBase
using ExtXYZ
using Unitful

hydrogen = isolated_system([
        :H => [0, 0, 0.]u"Å",
        :H => [0, 0, 1.]u"Å"
])

ExtXYZ.save("example.xyz", hydrogen)

ExtXYZ.load("example.xyz")   # results in error

The saved file reads

2
pbc=[F, F, F] Lattice="inf 0.00000000 0.00000000 0.00000000 inf 0.00000000 0.00000000 0.00000000 inf" Properties=species:S:1:pos:R:3:velocities:R:3:Z:I:1
H         0.00000000       0.00000000       0.00000000         0.00000000       0.00000000       0.00000000          1
H         0.00000000       0.00000000       1.00000000         0.00000000       0.00000000       0.00000000          1

If you remove the lattice and pbc information from the file, then it works as intended.

So, one fix would be to check whether the structure has an infinite bounding box and, if so, omit the cell information when saving.
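
The proposed check could look roughly like this when applied to the dictionary representation from the README. The helper name `drop_infinite_cell` is hypothetical, and the "cell" and "pbc" key names are taken from the README's key list; the package's actual save path may be organised differently.

```julia
# Hypothetical helper, not part of ExtXYZ.jl: return a copy of the frame
# dictionary without "cell" and "pbc" when any cell entry is infinite.
function drop_infinite_cell(frame::AbstractDict)
    cell = get(frame, "cell", nothing)
    if cell !== nothing && any(isinf, cell)
        return filter(kv -> first(kv) ∉ ("cell", "pbc"), frame)
    end
    return frame
end

frame = Dict{String,Any}(
    "N_atoms" => 2,
    "cell"    => [Inf 0.0 0.0; 0.0 Inf 0.0; 0.0 0.0 Inf],
    "pbc"     => [false, false, false],
    "info"    => Dict{String,Any}(),
    "arrays"  => Dict{String,Any}(
        "species" => ["H", "H"],
        "pos"     => [0.0 0.0; 0.0 0.0; 0.0 1.0],
    ),
)

cleaned = drop_infinite_cell(frame)
```

The original dictionary is left untouched, so the in-memory system keeps its infinite bounding box while only the written copy omits it.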

show_system for AtomsBase interface

Hello. I think it would be nice to add a Base.show function similar to the way it is done in AtomsBase.
That routine shows the atomic_symbols, the box and, if all boundary_conditions are the same, the boundary_conditions.
Additionally, the output begins by stating whether the system is a FlexibleSystem or a FastSystem.
If one just creates an atomic_system in AtomsBase, a FlexibleSystem is created.

I am not completely sure whether ExtXYZ would create a system similar to a FlexibleSystem or a FastSystem.

I would like to add this function; any thoughts on this?

xyz file produced by LAMMPS cannot be parsed

Hi @jameskermode. In AtomsIO I received mfherbst/AtomsIO.jl#19, which essentially is a parsing failure in ExtXYZ for an xyz file produced by LAMMPS --- so something I'd argue we would want to support.

Here is a simplified reproducer:

1
Atoms. Timestep: 1000000
Ar    1.4102613692638457    0.9647607662828660    1.3209769521273491

where the error message printed is:

Failed to parse string at pos 7

Errors in parsing should throw an exception

I find it a bit counterintuitive that erroneous XYZ files only print something (to stderr I believe) but otherwise simply return an empty list. Instead I would have expected an exception with the error to be raised.

JuLIP Dependency

I think it may be worth removing the JuLIP dependency. There is some discussion about implementing a few abstract atoms and structure interfaces that will be shared and that one could depend on, but JuLIP will not be taken up by the community. I'll most likely retire it in the near future. Instead, until then, we can import ExtXYZ.jl from JuLIP and add JuLIP-specific wrappers.

InitError on Windows loading libcleri.dll

I noticed this issue when attempting to use the latest version of JuLIP, which no longer works on Windows due to this package. The error I encounter is:

julia> using ExtXYZ
[ Info: Precompiling ExtXYZ [352459e4-ddd7-4360-8937-99dcb397b478]
ERROR: LoadError: LoadError: InitError: could not load library "C:\Users\james\.julia\artifacts\465749424f91519e30d0c324b5228a91738a5131\bin\libcleri.dll"
Access is denied.

From what I understand, dlopen cannot open the library since it has not been marked as executable. I checked the permissions for libcleri.dll in the Windows tarball and it's -rw-r--r--, whereas for libextxyz.dll it's -rwxr-xr-x.

There's been some discussion about this previously, where the permissions were changed manually during the build with chmod, but there was an update which is supposed to do this automatically. So here it should have been fixed automatically, as with libextxyz.dll, but for some reason it's not working?

I noticed that the CI only runs on macOS and Linux, so maybe Windows support isn't intended? But if this could be fixed, that would be great.

Read error

I get a segfault trying to read .xyz files with ExtXYZ version 0.1.13.

I think it's caused by extxyz_jll version 0.1.3+0 because if I downgrade that to 0.1.0+0 everything works.

Issues with AtomsBase interface

When integrating ExtXYZ with AtomsIO I found a bunch of issues in the AtomsBase interface, which led me to rewrite most of it in AtomsIO; see https://github.com/mfherbst/AtomsIO.jl/blob/master/src/extxyz.jl. Of course, it would be better to integrate this here.

One concern for me at the moment is the lines
https://github.com/libAtoms/ExtXYZ.jl/blob/2c35dcb51728dd2e458807332a2c4fe27c31c8bb/src/atoms.jl#L53--L66
which effectively strip the units from all data stored in the ExtXYZ.Atoms. I think this is problematic, because it is later impossible to recover the unit associated with a key. I think it should be the user's responsibility to do the intrusive stripping of units; the package should rather just warn about unsupported unitful quantities and ignore them. My rationale is that it is better to ignore a key than to convey information that could accidentally be misleading. This is the approach I have followed in AtomsIO. Essentially, I only support unitless quantities, or unitful quantities that follow a convention documented in AtomsBase (see this PR).

@jameskermode What is your opinion about this?

Update to AB0.4

  • changes to function names
  • require that SVectors are returned

Set velocities to zero by default in AtomsBase interface

Hello, I think it would be good to set the velocities to zero by default in the AtomsBase interface.
Currently, calling velocity(system) returns an error if velocities are not defined in the loaded file.
In AtomsBase, if you create a system without specifying velocities, they are set to zero. This means you get a Vector of Vectors filled with zeros.
I think this would be a good adaptation. However, velocities would then be written to the file when the system is saved, even if they are all zero.
One possibility would be to check in the write_dict function whether the velocities are all zero.

What do you think about this?

velocity for AtomsBase interface

Hello.
I already opened an issue on this once, but would like to ask again, because I saw a problem with not having velocities in every system.
Currently, in your AtomsBase interface, velocity(system) only works if a velocity is stored in the system.

In AtomsBase, when you instantiate an atom without defining velocities, these velocities are set to zero by default.
https://github.com/JuliaMolSim/AtomsBase.jl/blob/master/src/atom.jl?plain=1#L37

If another package reads an AtomsBase system and automatically tries to read velocity(system), this can lead to errors that don't need to occur.

My idea would be to do it similarly to AtomsBase and initialise velocities with a zero vector if they aren't given in the file.
In save, these velocities would not need to be written if they are zero.
Alternatively, one could change it so that reading velocity(system) returns zeros, or maybe "missing", if the system has no velocities.

What are your thoughts on that? @jameskermode
Do you have any suggestions about this? @mfherbst
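
One way to picture the second variant (return zeros on read, store nothing) on the dictionary representation is sketched below. The accessor `velocities_or_zero` is hypothetical and is neither the ExtXYZ.jl nor the AtomsBase API; the "velocities" key name follows the Properties string seen in saved files, and units are left out for brevity.

```julia
# Hypothetical accessor: return the stored velocities, or a 3 x N_atoms
# matrix of zeros when the frame has none. Nothing is written back into
# the frame, so a later save would not gain an all-zero velocities column.
function velocities_or_zero(frame::AbstractDict)
    arrays = frame["arrays"]
    return get(arrays, "velocities") do
        zeros(3, frame["N_atoms"])
    end
end

frame = Dict{String,Any}(
    "N_atoms" => 2,
    "arrays"  => Dict{String,Any}(
        "species" => ["H", "H"],
        "pos"     => [0.0 0.0; 0.0 0.0; 0.0 1.0],
    ),
)

v = velocities_or_zero(frame)
```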

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Add a Consistency Check between mass and atomic_mass

atomic_mass is technically retired, but still used in some implementations. Atoms should be careful when converting: use either if only one is available, and confirm consistency of the two if both are available.
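
The rule stated above can be sketched as a small function over a property dictionary. The helper `resolve_mass` is hypothetical; the :mass and :atomic_mass keys come from the issue text, while the tolerance, the error type and the `nothing` fallback are assumptions.

```julia
# Hypothetical helper: pick a mass from a property dictionary.
# Use either key if only one is present; if both are present,
# require that they agree within a relative tolerance.
function resolve_mass(props::AbstractDict; rtol=1e-8)
    has_new = haskey(props, :mass)
    has_old = haskey(props, :atomic_mass)
    if has_new && has_old
        isapprox(props[:mass], props[:atomic_mass]; rtol=rtol) ||
            throw(ArgumentError("mass and atomic_mass disagree"))
        return props[:mass]
    elseif has_new
        return props[:mass]
    elseif has_old
        return props[:atomic_mass]
    end
    return nothing  # neither key present
end

only_old = resolve_mass(Dict(:atomic_mass => 28.085))
both_ok  = resolve_mass(Dict(:mass => 28.085, :atomic_mass => 28.085))
```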

interface for arbitrary properties

Hello. I am wondering if we should add functionality to the AtomsBase interface to get the arbitrary data stored in atom_data or system_data, similar to AtomsBase.
In AtomsBase you can add arbitrary information per Atom and per System. For this, a Dict is used for each Atom and for the System.
This means all information about the System (or Atom), apart from the mandatory information (like bounding_box or boundary_conditions for the system), is stored in the Dict.
In ExtXYZ, if you call system_data, you get all information in system_data, including the information about bounding_box and boundary_conditions.

I think it would be great to add two functions:

  • data(atom::Atoms)
  • data(atom::Atoms, i)
    to get information about the system or about each Atom, similar to AtomsBase.

What do you think about this?
