ashenoy463 / mdx
Parallelized and lazy processing for reactive molecular dynamics simulation data
License: GNU General Public License v3.0
The alternative is sparse.COO, which needs to be densified before addition to the dataset. The process to create and store sparse arrays is not clear.
They can be put in a separate directory with a structure isomorphic to /home/Work/sim_data, with a different mdx.ingest.Simulation object initialized for them, since they are always analysed separately from the regular data anyway.
Reworks needed for xarray:
id = atom id
type = atom type
nb = number of bonds
id_1 = atom id of first bond
id_nb = atom id of Nth bond
mol = molecule id
bo_1 = bond order of first bond
bo_nb = bond order of Nth bond
abo = atom bond order (sum of all bonds)
nlp = number of lone pairs
q = atomic charge
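The column list above suggests that each row has a variable width that depends on nb. A minimal sketch of reading one such row follows, assuming the columns appear in the order listed, with id_1 to id_nb and bo_1 to bo_nb expanded in place. `BondRecord` and `parse_bond_line` are hypothetical names for illustration and are not part of mdx.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BondRecord:
    """One row of the bond table (hypothetical container, not an mdx type)."""
    atom_id: int
    atom_type: int
    neighbours: List[int]     # id_1 .. id_nb
    mol: int
    bond_orders: List[float]  # bo_1 .. bo_nb
    abo: float
    nlp: float
    q: float


def parse_bond_line(line: str) -> BondRecord:
    """Split a whitespace-separated row whose width depends on nb."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("row is too short to contain id, type and nb")
    nb = int(fields[2])
    # id, type, nb + nb neighbour ids + mol + nb bond orders + abo, nlp, q
    expected = 3 + nb + 1 + nb + 3
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields for nb={nb}, got {len(fields)}")
    neighbours = [int(f) for f in fields[3:3 + nb]]
    mol = int(fields[3 + nb])
    bond_orders = [float(f) for f in fields[4 + nb:4 + 2 * nb]]
    abo, nlp, q = (float(f) for f in fields[4 + 2 * nb:])
    return BondRecord(int(fields[0]), int(fields[1]), neighbours, mol,
                      bond_orders, abo, nlp, q)
```

Because the row width varies with nb, a rectangular array needs padding (or a sparse layout) once rows are stacked; that step is not shown here.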
one args = none
multi args = none
yaml args = none
custom args = list of keywords
possible keywords = step, elapsed, elaplong, dt, time,
cpu, tpcpu, spcpu, cpuremain, part, timeremain,
atoms, temp, press, pe, ke, etotal,
evdwl, ecoul, epair, ebond, eangle, edihed, eimp,
emol, elong, etail,
enthalpy, ecouple, econserve,
vol, density,
xlo, xhi, ylo, yhi, zlo, zhi,
xy, xz, yz,
avecx, avecy, avecz,
bvecx, bvecy, bvecz,
cvecx, cvecy, cvecz,
lx, ly, lz,
xlat, ylat, zlat,
cella, cellb, cellc, cellalpha, cellbeta, cellgamma,
pxx, pyy, pzz, pxy, pxz, pyz,
bonds, angles, dihedrals, impropers,
fmax, fnorm, nbuild, ndanger,
c_ID, c_ID[I], c_ID[I][J],
f_ID, f_ID[I], f_ID[I][J],
v_name, v_name[I]
step = timestep
elapsed = timesteps since start of this run
elaplong = timesteps since start of initial run in a series of runs
dt = timestep size
time = simulation time
cpu = elapsed CPU time in seconds since start of this run
tpcpu = time per CPU second
spcpu = timesteps per CPU second
cpuremain = estimated CPU time remaining in run
part = which partition (0 to Npartition-1) this is
timeremain = remaining time in seconds on timer timeout.
atoms = # of atoms
temp = temperature
press = pressure
pe = total potential energy
ke = kinetic energy
etotal = total energy (pe + ke)
evdwl = van der Waals pairwise energy (includes etail)
ecoul = Coulombic pairwise energy
epair = pairwise energy (evdwl + ecoul + elong)
ebond = bond energy
eangle = angle energy
edihed = dihedral energy
eimp = improper energy
emol = molecular energy (ebond + eangle + edihed + eimp)
elong = long-range kspace energy
etail = van der Waals energy long-range tail correction
enthalpy = enthalpy (etotal + press*vol)
ecouple = cumulative energy change due to thermo/baro statting fixes
econserve = pe + ke + ecouple = etotal + ecouple
vol = volume
density = mass density of system
xlo,xhi,ylo,yhi,zlo,zhi = box boundaries
xy,xz,yz = box tilt for restricted triclinic (non-orthogonal) simulation boxes
avecx,avecy,avecz = components of edge vector A of the simulation box
bvecx,bvecy,bvecz = components of edge vector B of the simulation box
cvecx,cvecy,cvecz = components of edge vector C of the simulation box
lx,ly,lz = box lengths in x,y,z
xlat,ylat,zlat = [lattice](https://docs.lammps.org/lattice.html) spacings as calculated by lattice command
cella,cellb,cellc = periodic cell lattice constants a,b,c
cellalpha, cellbeta, cellgamma = periodic cell angles alpha,beta,gamma
pxx,pyy,pzz,pxy,pxz,pyz = 6 components of pressure tensor
bonds,angles,dihedrals,impropers = # of these interactions defined
fmax = max component of force on any atom in any dimension
fnorm = length of force vector for all atoms
nbuild = # of neighbor list builds
ndanger = # of dangerous neighbor list builds
c_ID = global scalar value calculated by a compute with ID
c_ID[I] = Ith component of global vector calculated by a compute with ID, I can include wildcard (see below)
c_ID[I][J] = I,J component of global array calculated by a compute with ID
f_ID = global scalar value calculated by a fix with ID
f_ID[I] = Ith component of global vector calculated by a fix with ID, I can include wildcard (see below)
f_ID[I][J] = I,J component of global array calculated by a fix with ID
v_name = value calculated by an equal-style variable with name
v_name[I] = value calculated by a vector-style variable with name, I can include wildcard (see below)
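As a rough illustration of how a custom keyword list could drive parsing, the sketch below pairs each keyword with the value in the same position of a thermo output row. It assumes one whitespace-separated row per output step, in keyword order, and silently skips anything that does not fit; `parse_thermo_rows` is a hypothetical helper, not an mdx function.

```python
from typing import Dict, List


def parse_thermo_rows(keywords: List[str], lines: List[str]) -> List[Dict[str, float]]:
    """Map each numeric row onto the given keyword list, skipping other lines."""
    rows = []
    for line in lines:
        values = line.split()
        if len(values) != len(keywords):
            continue  # not a thermo row for this keyword list
        try:
            rows.append({key: float(value) for key, value in zip(keywords, values)})
        except ValueError:
            continue  # same width but non-numeric, e.g. a warning message
    return rows
```

All values are read as floats for simplicity, so integer-valued keywords such as step would need a cast afterwards.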
Can be implemented in the mdx.ingest.Simulation constructor once #4 is closed.
An appropriate exception will be needed.
Are there simulations where the boundary types change?
Maybe multistage ones?
Would not be expensive to store entire series anyway
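One way to answer the question empirically is to record the boundary flags of every frame and compare them. The sketch below assumes text dump headers of the form `ITEM: BOX BOUNDS pp pp pp` (optionally with `xy xz yz` before the flags) and treats a header without flags as an error; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

PREFIX = "ITEM: BOX BOUNDS"


def boundary_series(lines: Iterable[str]) -> List[Tuple[str, ...]]:
    """Collect the three boundary flags from every BOX BOUNDS header line."""
    series = []
    for line in lines:
        if not line.startswith(PREFIX):
            continue
        tokens = line[len(PREFIX):].split()
        if len(tokens) < 3:
            raise ValueError(f"no boundary flags in header: {line!r}")
        # The flags are the last three tokens; triclinic headers put xy xz yz first.
        series.append(tuple(tokens[-3:]))
    return series


def boundaries_change(series: List[Tuple[str, ...]]) -> bool:
    """True if more than one distinct set of flags occurs in the series."""
    return len(set(series)) > 1
```

Storing the whole series costs three short strings per frame, which is consistent with the note that keeping it would not be expensive.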
custom or custom/gz or custom/zstd or cfg or cfg/gz or cfg/zstd or cfg/uef or netcdf or netcdf/mpiio or yaml attributes:
id = atom ID
mol = molecule ID
proc = ID of processor that owns atom
procp1 = ID+1 of processor that owns atom
type = atom type
element = name of atom element, as defined by dump_modify command
mass = atom mass
x,y,z = unscaled atom coordinates
xs,ys,zs = scaled atom coordinates
xu,yu,zu = unwrapped atom coordinates
xsu,ysu,zsu = scaled unwrapped atom coordinates
ix,iy,iz = box image that the atom is in
vx,vy,vz = atom velocities
fx,fy,fz = forces on atoms
q = atom charge
mux,muy,muz = orientation of dipole moment of atom
mu = magnitude of dipole moment of atom
radius,diameter = radius, diameter of spherical particle
omegax,omegay,omegaz = angular velocity of spherical particle
angmomx,angmomy,angmomz = angular momentum of aspherical particle
tqx,tqy,tqz = torque on finite-size particles
c_ID = per-atom vector calculated by a compute with ID
c_ID[I] = Ith column of per-atom array calculated by a compute with ID, I can include wildcard (see below)
f_ID = per-atom vector calculated by a fix with ID
f_ID[I] = Ith column of per-atom array calculated by a fix with ID, I can include wildcard (see below)
v_name = per-atom vector calculated by an atom-style variable with name
i_name = custom integer vector with name
d_name = custom floating point vector with name
i2_name[I] = Ith column of custom integer array with name, I can include wildcard (see below)
d2_name[I] = Ith column of custom floating point array with name, I can include wildcard (see below)
local or local/gz or local/zstd attributes:
possible attributes = index, c_ID, c_ID[I], f_ID, f_ID[I]
index = enumeration of local values
c_ID = local vector calculated by a compute with ID
c_ID[I] = Ith column of local array calculated by a compute with ID, I can include wildcard (see below)
f_ID = local vector calculated by a fix with ID
f_ID[I] = Ith column of local array calculated by a fix with ID, I can include wildcard (see below)
grid or grid/vtk attributes:
possible attributes = c_ID:gname:dname, c_ID:gname:dname[I], f_ID:gname:dname, f_ID:gname:dname[I]
gname = name of grid defined by compute or fix
dname = name of data field defined by compute or fix
c_ID = per-grid vector calculated by a compute with ID
c_ID[I] = Ith column of per-grid array calculated by a compute with ID, I can include wildcard (see below)
f_ID = per-grid vector calculated by a fix with ID
f_ID[I] = Ith column of per-grid array calculated by a fix with ID, I can include wildcard (see below)
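Since the set of per-atom attributes varies between dumps, a reader can take the column names from the dump itself rather than hard-coding them. The sketch below assumes a text custom dump whose per-atom section starts with a header such as `ITEM: ATOMS id type x y z`; `parse_atoms_section` is a hypothetical helper, and values are kept as strings because attributes like element are not numeric.

```python
from typing import Dict, List

ATOMS_PREFIX = "ITEM: ATOMS"


def parse_atoms_section(header: str, rows: List[str]) -> Dict[str, List[str]]:
    """Return one list of raw string values per attribute named in the header."""
    if not header.startswith(ATOMS_PREFIX):
        raise ValueError(f"not an ATOMS header: {header!r}")
    names = header[len(ATOMS_PREFIX):].split()
    columns: Dict[str, List[str]] = {name: [] for name in names}
    for row in rows:
        values = row.split()
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(values)}")
        for name, value in zip(names, values):
            columns[name].append(value)
    return columns
```

Type conversion per column (int for id and type, float for coordinates) can then be applied once the attribute names are known.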
Instead of just being a holder for parsing methods, the Simulation object should hold trajectory/bonds/species data in its attributes.
Formulate a list of attributes to have.
Make class methods update those attributes.
Handlers from mdx.io can then be initialized separately by passing the simulation object to a get_writer() function.
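The proposal can be pictured roughly as follows. This is only a sketch of the shape being discussed: the attribute names, the `read_bonds` method and the body of `get_writer` are all assumptions made for illustration and do not reflect the actual mdx.ingest or mdx.io code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List


@dataclass
class Simulation:
    """Hypothetical stand-in that stores parsed data instead of only parsing it."""
    path: str
    trajectory: List[Any] = field(default_factory=list)
    bonds: List[Any] = field(default_factory=list)
    species: List[Any] = field(default_factory=list)

    def read_bonds(self, records: Iterable[Any]) -> "Simulation":
        # A class method updates the attribute rather than returning the data.
        self.bonds = list(records)
        return self


def get_writer(sim: Simulation) -> Callable[[str], List[str]]:
    """Build a writer bound to one simulation (placeholder for an mdx.io handler)."""
    def write(kind: str) -> List[str]:
        # A real handler would write to disk; this one only serialises to strings.
        return [repr(item) for item in getattr(sim, kind)]
    return write
```

Keeping the writer outside the class means output formats can be added without touching the object that holds the data.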
T_Start and T_Stop type params also possible
Reasons for:
One-time investment: no more dealing with text stream overhead, only optimised operations.
Respect the poly-indexability of our data: we can index by timestep, box-time or atom_id, and could even form and track groups.
Naturally supports binning, chunking, averaging and mapping array functions.
Thanks to Dask: efficient parallel and out-of-core processing; optimised analysis functions can be developed with minimal effort. We do not need to reinvent the wheel.
Ideally client initialization and handling should be in the analysis code, but would Simulation need to do anything?
Using bag.distinct causes a massive memory leak and fills /tmp.
Kim says the alternative is map_partitions, which needs extremely careful alignment at the cost of performance.
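The general idea of a partition-wise alternative can be sketched in plain Python, without Dask: deduplicate inside each partition first, then merge the much smaller partial results. With a real bag, the first function is the kind of callable one would hand to map_partitions; that wiring, and whether it avoids the memory problem, is an assumption and is not shown or tested here. Both function names are hypothetical.

```python
from typing import Hashable, Iterable, Set


def distinct_in_partition(partition: Iterable[Hashable]) -> Set[Hashable]:
    """Deduplicate a single partition; only its unique items are kept."""
    return set(partition)


def merge_distinct(partials: Iterable[Set[Hashable]]) -> Set[Hashable]:
    """Union the per-partition sets into the global set of distinct items."""
    merged: Set[Hashable] = set()
    for partial in partials:
        merged |= partial
    return merged
```

For example, `merge_distinct(distinct_in_partition(p) for p in partitions)` yields the distinct items across a list of partitions while holding only one partition's raw items at a time.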
Simulation batches may need to be