ashenoy463 / mdx

Parallelized and lazy processing for reactive molecular dynamics simulation data

License: GNU General Public License v3.0

Python 100.00%

mdx's Issues

Xarray doesn't support indexing sparse arrays

The alternative is sparse.COO, which has to be densified before it can be added to a dataset. The process for creating and storing sparse arrays is not yet clear.

Special chunks are handled awkwardly

They can go in a separate directory whose structure is isomorphic to /home/Work/sim_data, with a separate mdx.ingest.Simulation object initialized for them, since they are always analysed separately from the regular data anyway.

Rework meta file format

Reworks needed for xarray:

Execution dates need to be extracted (through #5)
Current nested format is not supported by xarray

Expand bonds parsing to entire LAMMPS spec

https://docs.lammps.org/fix_reaxff_bonds.html

id = atom id
type = atom type
nb = number of bonds
id_1 = atom id of first bond
id_nb = atom id of Nth bond
mol = molecule id
bo_1 = bond order of first bond
bo_nb = bond order of Nth bond
abo = atom bond order (sum of all bonds)
nlp = number of lone pairs
q = atomic charge
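
A minimal sketch of how one per-atom row could be parsed, assuming the columns appear in exactly the order listed above (id, type, nb, id_1..id_nb, mol, bo_1..bo_nb, abo, nlp, q). The `BondRecord` and `parse_bond_line` names are hypothetical and not part of mdx.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BondRecord:
    atom_id: int
    atom_type: int
    neighbours: List[int]      # id_1 .. id_nb
    mol: int
    bond_orders: List[float]   # bo_1 .. bo_nb
    abo: float
    nlp: float
    q: float


def parse_bond_line(line: str) -> BondRecord:
    """Parse one row: id type nb id_1..id_nb mol bo_1..bo_nb abo nlp q."""
    tok = line.split()
    nb = int(tok[2])
    # 3 leading fields, nb neighbour ids, mol, nb bond orders, 3 trailing fields
    expected = 3 + nb + 1 + nb + 3
    if len(tok) != expected:
        raise ValueError(f"expected {expected} fields for nb={nb}, got {len(tok)}")
    ids_end = 3 + nb
    bo_start = ids_end + 1
    return BondRecord(
        atom_id=int(tok[0]),
        atom_type=int(tok[1]),
        neighbours=[int(t) for t in tok[3:ids_end]],
        mol=int(tok[ids_end]),
        bond_orders=[float(t) for t in tok[bo_start:bo_start + nb]],
        abo=float(tok[-3]),
        nlp=float(tok[-2]),
        q=float(tok[-1]),
    )
```

Because the row width depends on nb, the length check catches truncated or misaligned rows before they are misread as valid data.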

Expand thermo parsing to entire LAMMPS spec

https://docs.lammps.org/thermo_style.html

one args = none
multi args = none
yaml args = none
custom args = list of keywords
  possible keywords = step, elapsed, elaplong, dt, time,
                      cpu, tpcpu, spcpu, cpuremain, part, timeremain,
                      atoms, temp, press, pe, ke, etotal,
                      evdwl, ecoul, epair, ebond, eangle, edihed, eimp,
                      emol, elong, etail,
                      enthalpy, ecouple, econserve,
                      vol, density,
                      xlo, xhi, ylo, yhi, zlo, zhi,
                      xy, xz, yz,
                      avecx, avecy, avecz,
                      bvecx, bvecy, bvecz,
                      cvecx, cvecy, cvecz,
                      lx, ly, lz,
                      xlat, ylat, zlat,
                      cella, cellb, cellc, cellalpha, cellbeta, cellgamma,
                      pxx, pyy, pzz, pxy, pxz, pyz,
                      bonds, angles, dihedrals, impropers,
                      fmax, fnorm, nbuild, ndanger,
                      c_ID, c_ID[I], c_ID[I][J],
                      f_ID, f_ID[I], f_ID[I][J],
                      v_name, v_name[I]
    step = timestep
    elapsed = timesteps since start of this run
    elaplong = timesteps since start of initial run in a series of runs
    dt = timestep size
    time = simulation time
    cpu = elapsed CPU time in seconds since start of this run
    tpcpu = time per CPU second
    spcpu = timesteps per CPU second
    cpuremain = estimated CPU time remaining in run
    part = which partition (0 to Npartition-1) this is
    timeremain = remaining time in seconds on timer timeout.
    atoms = # of atoms
    temp = temperature
    press = pressure
    pe = total potential energy
    ke = kinetic energy
    etotal = total energy (pe + ke)
    evdwl = van der Waals pairwise energy (includes etail)
    ecoul = Coulombic pairwise energy
    epair = pairwise energy (evdwl + ecoul + elong)
    ebond = bond energy
    eangle = angle energy
    edihed = dihedral energy
    eimp = improper energy
    emol = molecular energy (ebond + eangle + edihed + eimp)
    elong = long-range kspace energy
    etail = van der Waals energy long-range tail correction
    enthalpy = enthalpy (etotal + press*vol)
    ecouple = cumulative energy change due to thermo/baro statting fixes
    econserve = pe + ke + ecouple = etotal + ecouple
    vol = volume
    density = mass density of system
    xlo,xhi,ylo,yhi,zlo,zhi = box boundaries
    xy,xz,yz = box tilt for restricted triclinic (non-orthogonal) simulation boxes
    avecx,avecy,avecz = components of edge vector A of the simulation box
    bvecx,bvecy,bvecz = components of edge vector B of the simulation box
    cvecx,cvecy,cvecz = components of edge vector C of the simulation box
    lx,ly,lz = box lengths in x,y,z
    xlat,ylat,zlat = [lattice](https://docs.lammps.org/lattice.html) spacings as calculated by lattice command
    cella,cellb,cellc = periodic cell lattice constants a,b,c
    cellalpha, cellbeta, cellgamma = periodic cell angles alpha,beta,gamma
    pxx,pyy,pzz,pxy,pxz,pyz = 6 components of pressure tensor
    bonds,angles,dihedrals,impropers = # of these interactions defined
    fmax = max component of force on any atom in any dimension
    fnorm = length of force vector for all atoms
    nbuild = # of neighbor list builds
    ndanger = # of dangerous neighbor list builds
    c_ID = global scalar value calculated by a compute with ID
    c_ID[I] = Ith component of global vector calculated by a compute with ID, I can include wildcard (see below)
    c_ID[I][J] = I,J component of global array calculated by a compute with ID
    f_ID = global scalar value calculated by a fix with ID
    f_ID[I] = Ith component of global vector calculated by a fix with ID, I can include wildcard (see below)
    f_ID[I][J] = I,J component of global array calculated by a fix with ID
    v_name = value calculated by an equal-style variable with name
    v_name[I] = value calculated by a vector-style variable with name, I can include wildcard (see below)
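
One way to start covering the custom keyword list is to classify each keyword of a `thermo_style custom` command as either a fixed keyword or a compute/fix/variable reference. The sketch below is only an illustration: `FIXED_KEYWORDS` holds a small subset of the list above, and `classify_keyword` and `parse_thermo_style` are hypothetical names, not existing mdx functions.

```python
import re

# Subset of the fixed keywords listed above; extend to the full list as needed.
FIXED_KEYWORDS = {
    "step", "elapsed", "dt", "time", "atoms", "temp", "press",
    "pe", "ke", "etotal", "vol", "density",
}

# Matches c_ID, c_ID[I], c_ID[I][J] and the f_/v_ equivalents.
# An index may be a wildcard such as "*", so any non-"]" text is accepted.
REFERENCE = re.compile(r"^(?P<kind>[cfv])_(?P<name>\w+)(?P<idx>(\[[^\]]+\]){0,2})$")
KINDS = {"c": "compute", "f": "fix", "v": "variable"}


def classify_keyword(keyword):
    """Return (kind, name, indices) for one thermo_style custom keyword."""
    if keyword in FIXED_KEYWORDS:
        return ("fixed", keyword, ())
    match = REFERENCE.match(keyword)
    if match is None:
        raise ValueError(f"unrecognised thermo keyword: {keyword!r}")
    kind = KINDS[match.group("kind")]
    indices = tuple(re.findall(r"\[([^\]]+)\]", match.group("idx")))
    if kind == "variable" and len(indices) > 1:
        # The list above only defines v_name and v_name[I].
        raise ValueError(f"variables take at most one index: {keyword!r}")
    return (kind, match.group("name"), indices)


def parse_thermo_style(command):
    """Classify every keyword of a 'thermo_style custom ...' command line."""
    tokens = command.split()
    if tokens[:2] != ["thermo_style", "custom"]:
        raise ValueError("only 'thermo_style custom' is handled in this sketch")
    return [classify_keyword(tok) for tok in tokens[2:]]
```

Indices are kept as strings so that wildcards survive unchanged; expanding them would need the size of the referenced compute or fix, which is not known at parse time.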

Metafiles should be validated

Can be implemented in the mdx.ingest.Simulation constructor once #4 is closed

Appropriate exception will be needed
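
A rough sketch of what validation with a dedicated exception might look like. The meta file schema is not settled in this issue, so the field names in `REQUIRED_FIELDS`, the `MetaFileError` class and the `validate_meta` function are all hypothetical placeholders.

```python
class MetaFileError(ValueError):
    """Raised when a simulation meta file is missing fields or has wrong types."""


# Hypothetical schema: the real field names depend on the reworked meta format.
REQUIRED_FIELDS = {"name": str, "timestep": float, "n_atoms": int}


def validate_meta(meta):
    """Return meta unchanged if it satisfies REQUIRED_FIELDS, else raise MetaFileError."""
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in meta:
            problems.append(f"missing field '{key}'")
        elif not isinstance(meta[key], expected):
            found = type(meta[key]).__name__
            problems.append(f"field '{key}' should be {expected.__name__}, got {found}")
    if problems:
        # Report every problem at once so the file can be fixed in one pass.
        raise MetaFileError("; ".join(problems))
    return meta
```

Calling a function like this at the top of the constructor would let a bad meta file fail early, before any parsing work starts.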

Inclusion of boundary type as nCDF variable

Are there simulations where the boundary types change?
Maybe multistage ones?

Would not be expensive to store entire series anyway
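
To check whether boundary types ever change within a run, the flags can be read from each frame header of a text dump. This sketch assumes the dump writes its flags on the `ITEM: BOX BOUNDS` line (for example `pp pp pp`, preceded by `xy xz yz` for triclinic boxes); `boundary_series` and `boundaries_change` are hypothetical helper names.

```python
def boundary_series(dump_text):
    """Return one tuple of boundary flags per frame found in a text dump."""
    series = []
    for line in dump_text.splitlines():
        if line.startswith("ITEM: BOX BOUNDS"):
            extra = line.split()[3:]
            # Triclinic headers carry "xy xz yz" first; the flags are the last three tokens.
            series.append(tuple(extra[-3:]))
    return series


def boundaries_change(series):
    """True if at least two frames report different boundary flags."""
    return len(set(series)) > 1
```

Storing the whole series costs three short strings per frame, so keeping it is cheap even if the flags turn out to be constant.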

Consolidate dependencies and use prepared environment

Extend trajectory parsing to entire LAMMPS spec

https://docs.lammps.org/dump.html

custom or custom/gz or custom/zstd or cfg or cfg/gz or cfg/zstd or cfg/uef or netcdf or netcdf/mpiio or yaml attributes:

id = atom ID
mol = molecule ID
proc = ID of processor that owns atom
procp1 = ID+1 of processor that owns atom
type = atom type
element = name of atom element, as defined by dump_modify command
mass = atom mass
x,y,z = unscaled atom coordinates
xs,ys,zs = scaled atom coordinates
xu,yu,zu = unwrapped atom coordinates
xsu,ysu,zsu = scaled unwrapped atom coordinates
ix,iy,iz = box image that the atom is in
vx,vy,vz = atom velocities
fx,fy,fz = forces on atoms
q = atom charge
mux,muy,muz = orientation of dipole moment of atom
mu = magnitude of dipole moment of atom
radius,diameter = radius, diameter of spherical particle
omegax,omegay,omegaz = angular velocity of spherical particle
angmomx,angmomy,angmomz = angular momentum of aspherical particle
tqx,tqy,tqz = torque on finite-size particles
c_ID = per-atom vector calculated by a compute with ID
c_ID[I] = Ith column of per-atom array calculated by a compute with ID, I can include wildcard (see below)
f_ID = per-atom vector calculated by a fix with ID
f_ID[I] = Ith column of per-atom array calculated by a fix with ID, I can include wildcard (see below)
v_name = per-atom vector calculated by an atom-style variable with name
i_name = custom integer vector with name
d_name = custom floating point vector with name
i2_name[I] = Ith column of custom integer array with name, I can include wildcard (see below)
d2_name[I] = Ith column of custom floating point array with name, I can include wildcard (see below)

local or local/gz or local/zstd attributes:

possible attributes = index, c_ID, c_ID[I], f_ID, f_ID[I]
  index = enumeration of local values
  c_ID = local vector calculated by a compute with ID
  c_ID[I] = Ith column of local array calculated by a compute with ID, I can include wildcard (see below)
  f_ID = local vector calculated by a fix with ID
  f_ID[I] = Ith column of local array calculated by a fix with ID, I can include wildcard (see below)

grid or grid/vtk attributes:

possible attributes = c_ID:gname:dname, c_ID:gname:dname[I], f_ID:gname:dname, f_ID:gname:dname[I]
  gname = name of grid defined by compute or fix
  dname = name of data field defined by compute or fix
  c_ID = per-grid vector calculated by a compute with ID
  c_ID[I] = Ith column of per-grid array calculated by a compute with ID, I can include wildcard (see below)
  f_ID = per-grid vector calculated by a fix with ID
  f_ID[I] = Ith column of per-grid array calculated by a fix with ID, I can include wildcard (see below)
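
As a starting point for the custom style, the column names can be taken from the `ITEM: ATOMS` header instead of being hard-coded, so any attribute in the list above is picked up automatically. The sketch below assumes the default text layout (TIMESTEP, NUMBER OF ATOMS, BOX BOUNDS, ATOMS, in that order, with an orthogonal box) and does not handle optional sections such as units or time; `iter_frames` is a hypothetical name.

```python
def _num(token):
    """Convert a token to int or float where possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token  # e.g. an element name


def iter_frames(lines):
    """Lazily yield one dict per frame from the lines of a custom-style text dump."""
    it = iter(lines)
    for line in it:
        if not line.startswith("ITEM: TIMESTEP"):
            continue
        timestep = int(next(it))
        next(it)                          # "ITEM: NUMBER OF ATOMS"
        n_atoms = int(next(it))
        next(it)                          # "ITEM: BOX BOUNDS ..."
        box = [tuple(float(v) for v in next(it).split()) for _ in range(3)]
        columns = next(it).split()[2:]    # names after "ITEM: ATOMS"
        rows = [next(it).split() for _ in range(n_atoms)]
        atoms = {
            name: [_num(row[i]) for row in rows]
            for i, name in enumerate(columns)
        }
        yield {"timestep": timestep, "box": box, "atoms": atoms}
```

Because it is a generator, only one frame is held in memory at a time; passing an open file object as `lines` streams the dump without reading it whole.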

`mdx.ingest.Simulation` should hold data states

Instead of just being a holder for parsing methods, the Simulation object should hold trajectory/bonds/species data in its attributes.

Formulate a list of attributes to have
Make class methods update those attributes
Handlers from mdx.io can then be initialized separately by passing the simulation object to a get_writer() function
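
The plan above could look roughly like the following. This is a hypothetical stand-in, not the actual mdx.ingest.Simulation class: the attribute list, the `load` method and the JSON-based writer are assumptions made only to show the shape of the design.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional

FIELDS = ("trajectory", "bonds", "species")


@dataclass
class Simulation:
    """Hypothetical stand-in that keeps parsed data as state."""
    data_path: str
    trajectory: Optional[Any] = None
    bonds: Optional[Any] = None
    species: Optional[Any] = None

    def load(self, kind: str, parser: Callable[[str], Any]) -> None:
        """Run a parser on data_path and store the result on the matching attribute."""
        if kind not in FIELDS:
            raise ValueError(f"unknown data kind: {kind}")
        setattr(self, kind, parser(self.data_path))


def get_writer(sim: Simulation, kind: str):
    """Return a callable that writes one already-loaded attribute to a text stream."""
    if kind not in FIELDS:
        raise ValueError(f"unknown data kind: {kind}")
    data = getattr(sim, kind)
    if data is None:
        raise ValueError(f"{kind} has not been loaded yet")

    def write(stream) -> None:
        # JSON is used only for illustration; a real handler would target the chosen format.
        json.dump({"source": sim.data_path, kind: data}, stream)

    return write
```

Keeping the writer outside the class means output formats can be added without touching the parsing code.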

Create metadata model for experimental parameters

T_Start and T_Stop type params also possible

Implement IO framework for existing frame output formats

Adopting xarray as sole canonical format

Reasons for:

One-time investment: no more dealing with text-stream overhead, only optimised operations.
Respects the poly-indexability of our data. We can index by timestep, box-time or atom_id, and could even form and track groups.
Naturally supports binning, chunking, averaging and mapping array functions.
Thanks to Dask, we get efficient parallel and out-of-core processing, and optimised analysis functions can be developed with minimal effort. We do not need to reinvent the wheel.
