lumol-org / lumol Goto Github PK
View Code? Open in Web Editor NEWUniversal extensible molecular simulation engine
Home Page: http://lumol.org/
License: BSD 3-Clause "New" or "Revised" License
Universal extensible molecular simulation engine
Home Page: http://lumol.org/
License: BSD 3-Clause "New" or "Revised" License
The API documentation is complete, but lacks examples. Most of the function could get some examples.
Add a Control
algorithm to rewrap molecules inside the unit cell, using the molecule center of mass.
This is easy: just use the functionalities around to build the hybrid move
A custom output where users can specify what they want to be printed to the output stream: temperature, pressure volume, …
This is a three-part bullet point issue:
When I first read through the code of the MonteCarlo propagator I got really confused because it makes no sense to sample a move whose frequency is smaller than the sampled float. Makes a lot more sense if they are cumulative frequencies. It turns out they are, but the attribute is named frequencies
.
Hovewer, if we renamed the attribute to reflect that they are cumulative frequencies things get weird because in the beginning (before setup
is called) they are frequencies.
It turns out the weirdness comes from the methods add
and add_move_with_acceptance
, since they are the ones dealing with frequencies before the setup
.
All of this to ask: is there sometimes a situation where we need to initialize a MonteCarlo
propagator without the associated moves and add the moves later ? Because if not one could pass a Vec
of (Box<MCMove>, f64, Option<f64>)
to the constructor. This way frequencies
would always be cumulative frequencies, and more importantly we would not have methods that can panic, which is always good.
The regression benchmarks (defined in #62) should run for each new commit in master, and a report of the run time should be available (maybe as a graph).
I think we can use Travis for that, even if it is not ideal (there is no way to be sure that the machines always have the same power).
Currently, bad input are tested by checking that the result of the input read is an error. But we do not control the source of the error. Allowing to specify the expected error inside the bad input file would guarantee that we do not get another error that the one wanted (bad TOML, IO error, ...).
Bad input files could use comments to specify the expected input error:
[input]
versn = 1
#^ Missing 'version' key in 'input' table
It would be very convenient to have combining rules. For example, specify in the input:
[input]
version = 1
[[pairs]]
atoms = ["A", "A"]
lj = {sigma = "... ", epsilon = ..."}
[[pairs]]
atoms = ["B", "B"]
lj = {sigma = "... ", epsilon = ..."}
[combining-rule]
atoms = ["A", "B"]
type = "LB" # LB = Lorentz-Berthelot
As an option, one would not need to specify A,B pair interactions. Instead, a combining rule is applied. Another possibility is to make the combining rule "global", i.e.:
[input]
version = 1
[[pairs]]
atoms = ["A", "A"]
lj = {sigma = "... ", epsilon = ..."}
[[pairs]]
atoms = ["B", "B"]
lj = {sigma = "... ", epsilon = ..."}
# the rule is applied to all intermolec. interactions
[combining-rule]
type = "LB" # LB = Lorentz-Berthelot
cymbalum
was a temporary initial name for this project. I need ideas for a better name, which must be
We should add a struct and computation for truncated intermolecular pair potentials with long range corrections.
In Monte-Carlo (MC) simulations we often use potential functions that are truncated at a cutoff distance rc
. Since we don't have to evaluate forces in MC, we don't have to shift the potential at the cutoff distance.
Under the assumption that the pair correlation function is unity beyond rc
, we can compute a tail correction (for the energy and the pressure) by integration of the pair-potential from rc
to infinity. For some potentials this can be solved analytically. The tail correction is a function of the cutoff distance rc
and the systems' density densities of all components (ParticleKind
).
If the simulation crashes, or to restart a simulation for more steps, a restart file is needed. It should be some private binary format, but with compatibility between multiple versions of the code. So a binary dump of the memory is not an option.
Using Chemfiles with a file format containing all the data (NetCDF ? TNG ? Another one ?) could be nice, but chemfiles only provides floats values for now (see chemfiles/chemfiles#34), and double would be better here.
This needs a ResizeCell
Monte-Carlo move.
The following traits should be added:
trait Energy {
// Used in monte-carlo and energy computation
fn energy(&mut self, system: &System) -> f64;
}
trait Forces {
// Used in virial computation
fn forces(&mut self, system: &System) -> Vec<Vector3D>;
}
trait Energetics: Energy + Forces {
// Used in energy minimization
fn forces_and_energy(&mut self, system: &System) -> (f64, Vec<Vector3D>) {
(self.energy(system), self.forces(system))
}
}
This would allow to:
SimpleEnergetics
mode where we do not re-compute periodic boundary conditions when needing both forces and energy;ThreadedEnergetics
, VerletLists
or GPUEnergetics
It would be a good idea to implement a demanded acceptance
for MC moves that have a displacement (translation, volume or box change). That way, the maximum displacement for the move can be adjusted during the simulation to achieve that acceptance. With a fixed displacement as is, simulations can be very inefficient.
Like in
pairs:
- atoms: [*, *]
type: Null
A possible way to efficiently translate or rotate a molecule is to bias the displacement using the actual force/torque acting on a molecule (see this paper). The magnitude of displacement can be adjusted just like in regular translation/rotation moves while the direction is biased towards the force/torque to which an additional random (gaussian) displacement is added.
Implementation of the translation move needs the following steps:
While computations in this move are more involved than in a regular displacement/rotation it offers the possibility to perform multi-particle moves in an efficient manner. In these moves all particles translate/rotate at the same "time". Just like in molecular dynamics, force (and energy) computations can be carried out in parallel leading to an increased efficiency.
[1] it was mentioned in this paper that it is sufficient to approximate the total force by only computing short range contributions. Maybe even with a reduced cut-off distance.
More usefull potentials should be added. I know about theses:
Are there any others?
We should build tests for the systems/states for the NIST Simulation Benchmarks.
These tests should include evaluations of single configurations
as well as simulation runs in different ensembles and potentials:
EDIT: (@Luthaf) Added links
Since I just dove into position scaling for Monte-Carlo volume resizing, I was wondering if one does the same when using the Berendsen Barostat in MD (i.e. only resize the centers of mass, leaving the intramolecular configuration unchanged). Looking at the integration here, I found that currently all particle positions are scaled.
As defined in NIST reference data for LJ potential: https://www.nist.gov/mml/csd/chemical-informatics-research-group/lennard-jones-fluid-reference-calculations
And add corresponding tests in the NIST tests.
I observed a drift in energy for long simulations of LJ particles in the NVT ensemble.
When a move is accepted, I compared the energy delta (energy_new - energy_old) from cache via
du_cache = cache.move_particles_cost(...)
versus the difference computed via system.total_energy()
. Looking at the difference, I found that often the difference is negligible (1e-16) but there are cases when the difference is about 1e-7 to 1e-5.
I think the problem is either move_particles_cost
or the update function of the cache. As far as I can tell, the initialization of the cache works correctly (tail correction is missing at the moment).
You can find the results from a sample simulation here.
This issue was discussed in #58 (comment) and targets computation of the ideal gas pressure:
impl Compute for PressureAtTemperature {
type Output = f64;
fn compute(&self, system: &System) -> f64 {
:
let natoms = system.size() as f64;
return natoms * K_BOLTZMANN * self.temperature / volume + virial / (3.0 * volume);
// ^--- natoms, degrees of freedom, number of molecules?
}
}
We have our own rad2deg and deg2rad functions, but the exact same functions exist in the standard library.
We should replace ours by the ones in the standard library.
Kind of related to #62 but not quite: I intend to do some serious profiling to see where the bottlenecks are, I open this issue to keep you updated. I think I'll do it on some examples and some of the benchmarks.
I'll be starting from this, this and this. If you have any suggestion on what code would be interesting to profile or how to do this (this is the first time I'm profiling Rust code) don't hesitate. @Luthaf I think you told me you already did some kind of profiling, anything I can reuse ?
The main function does really ad-hoc command line arguments parsing. We should use a real library such as clap for command line argument parsing.
It would be great to suggest a fix to the user when the input file contains an error.
Some fixes I can think of:
\t
inside YAML files;hamronic => harmonic
The first one is easy, the others one require some kind of mechanism to register a keyword and associated data to some kind of linter.
Users should be able to get some property for a subgroup of particles: a molecule, all waters, all oxygens, ...
EDIT: removed code example, it made no sense.
I try to read three configurations with a single water molecule from xyz
files. Depending on the order, it either works or throws an error. I simply changed the order by copying the whole line (coordinates are consistent) but it should not matter.
To reproduce, copy .rs
file from this gist into the example folder and .xyz
into example/data.
Error:
$ RUST_BACKTRACE=1 cargo run --example=error_molid --release
thread 'main' panicked at 'assertion failed: new_molid < old_molid', lumol/src/core/src/sys/systems.rs:342
stack backtrace:
1: 0x1095141e8 - std::sys::backtrace::tracing::imp::write::h6f1d53a70916b90d
2: 0x10951637f - std::panicking::default_hook::{{closure}}::h137e876f7d3b5850
3: 0x109515475 - std::panicking::default_hook::h0ac3811ec7cee78c
4: 0x109515a26 - std::panicking::rust_panic_with_hook::hc303199e04562edf
5: 0x1094c04f4 - std::panicking::begin_panic::h341b039f84d0b176
6: 0x1094c9041 - lumol::sys::systems::System::add_bond::ha262c429a07c9463
7: 0x1094ca985 - <chemfiles::frame::Frame as lumol::sys::chfl::ToLumol>::to_lumol::h86945c7802c5c377
8: 0x1094cac6a - lumol::sys::chfl::Trajectory::read_guess_bonds::hb213dbb19c866ab4
9: 0x1094bedd3 - error_molid::main::h83ddaaca3f875982
10: 0x10951693a - __rust_maybe_catch_panic
11: 0x109514fb6 - std::rt::lang_start::h538f8960e7644c80
We need two kind of benchmarks:
Benchmarks live in benches
, and can be run using a nighlty compiler with cargo bench
.
We currently have one regression benchmark for Ewald summation here; an,d one comparison benchmark against LAMMPS for the md simulation of Helium.
Here are some additional ideas:
We already have energy computation for a molecular fluid with charges (water)
It would be nice to have all the combinations of MD/MC -- Lennard-Jones/butane/water -- NVE/NVT/NPT/μVT. Here is a full list:
That is already 18 different simulations, that we should compare against already existing MD and MC codes.
Maybe we can also have tests for bio-molecules, like a small peptide, DNA strand and a bigger protein.
Please comment with more ideas, and open PR to add the benchmarks!
If the box gets too small within a NPT simulation, the cut off radius will be larger than half the minimum box length (rc > L_min/2
). If that is the case, the simulation should stop with an error.
How to get the largest cut off out of PairInteractions
?
Currently, the particles are stored contiguously in a vector, and the molecules have an usize
field pointing to the first particle of the molecule in this vector.
struct System {
particles: Vec<Particle>
molecules: Vec<Molecule>,
}
struct Molecule {
first: usize
}
Pro of this design
for (i, pi) in system.iter().enumerate() {
for pj in system.iter().skip(i + 1) {
// Do work
}
}
Cons of this design
for molecule in system.molecules() {
for i in &molecule {
for j in &molecule {
if j <= i {continue;}
let pi = system[i];
let pj = system[j];
// Do work
}
}
}
Making the particles owned by molecules is not a solution, because it make it very hard to iterate over pairs in the whole system, and add an extra indirection layer.
A possible solution, with a POC implementation here would be to store a pointer to the first atom in the molecule, together with the molecule size. This would make it easier to work with molecular code, but add some unsafe
usage.
In particular, the pointer of all molecules must be updated when adding a new particle to the system, because the vector may allocate additional memory.
So is it worth it to add complexity in the library code in order to simplify the user code?
Control algorithms like RemoveRotation
and RemoveTranslation
do not need to run at each simulation step. they could have separated frequency field to run them only every so often.
Universe properties (Forces
, KineticEnergy
, ...) can be expensive to compute. It would be nice to cache them.
But there are two difficult things in CS: naming and cache invalidation. Because these properties can be accessed at any point in the simulation, the Universe::step
field is a very bad cache indicator. A better one will be harder to compute and use, so I will have to think wether this is worth the gain or not.
To sample intramolecular configurations of molecules, we should implement a "regrowth" or "rebuild" move for Monte Carlo Simulations.
Within the move, an interaction site of a molecule is randomly chosen and translated to a new position. From that single interaction site, neighboring sites are sequentially added using Rosenbluth/configurational bias sampling, which allows for effective sampling of large molecules.
This involve several small steps:
It can be hosted using github pages at lumol-org.github.io
for the moment, we can change the domain name later.
I created an empty repository here to contain the website source.
Triclinic cells are useful for some simulations, but make Ewald sum a bit harder. There is an implementation in the ewald-triclinic
branch, but the tests are failing.
I have been thinking about this for a while, here is a request for comment (RFC) to shape it a bit more and gather feedback. The implementation of this will take time and is not high priority, but I really want to get this right.
A nice way for the code to be easy to extend would be to provide a plugin system. A plugin is some piece of code that is not part of the main lumol code, and is not compiled at the same time than the main binary, nor linked to it.
Having a plugin system would mean that people can write their own code and use it with lumol without recompiling the whole project. It is highly desirable that this plugin system works easily with the input files too. I'd like the loading/use of a plugin to be as easy as:
[input]
version = 1
[plugins]
foo = "path/to/foo/plugin.so"
[[pairs]]
atoms = ["A", "A"]
foo = {} # potential foo is defined inside the foo plugin
[[pairs]]
atoms = ["A", "B"]
bar = {} # potential bar is also defined inside the foo plugin
I see three main kinds of plugins for the Lumol use case:
(1) and (2) are fast, (3) is nice for writing quick prototypes. (1) is easier to implement, and (2) allow to call into other codes for potential computations (to use DFT level energy for example). (1) and (2) would be based on shared library and dynamic loading, while (3) will use some kind of future python binding.
Plugins should be usable to extend lumol in all the possibles means: not only writing new potentials, but also new integrators, new Monte-Carlo moves, ... Everything that is defined as a trait for extensibility in the code should be available as a plugin
I will focus on Rust plugins for now, as they are way easier to implement, and the C plugins can be built on the top of the Rust plugins mechanisms.
Every plugin would be compiled as a shared library, and expose at least a single function: lumol_register
, taking a single argument of type &mut PluginRegistry
defined in lumol_input
crate. This function returns a Result
in case something goes wrong. The PluginRegistry
will define a few method to register extension points
extern crate lumol_input;
use lumol_input::{PluginRegistry, PluginError};
struct Foo;
struct Bar;
impl PairPotential for Foo {...}
impl PairPotential for Bar {...}
pub fn lumol_register(reg: &mut PluginRegistry) -> Result<(), PluginError> {
try!(reg.add_pair::<Foo>("foo"));
try!(reg.add_pair::<Bar>("bar"));
Ok(())
}
// add_pair is defined in lumol_input as
impl PluginRegistry {
pub fn add_pair<T: PairPotential>(&mut self, name: &str) {...}
// a given potential name can only be given once
}
Reading an input file will follow this new algorithm:
PluginRegistry
;lumol_register
for all the plugins using the PluginRegistry
;PluginRegistry
if there is any potential/integrator/MC move with the given name, and use it.This architecture would allow for more modularity. The lumol_core
crate could only contains the main structs (UnitCell
, System
, Simulation
, ...) and the traits (Propagator
, Potential
, ...). Standard implementation would then live in another crate (LennardJones, MolecularDynamics, ...), that is use by lumol_input
.
The problem with this is that traits like MCMove
would live in a separated crate than the MonteCarlo structure.
As rustc does not guarantee a stable ABI for now, calling lumol_register
with two different rustc version could break everything. This may be solved by using a C ABI, but I do not know if it supports passing and returning Rust types.
Thank you for reading this, now please comment if you have any feedback or if clarifications are needed!
Right now, the interactions are stored inside BTreeMap
associating particles types and a vector of interactions. Using a 2D (for pairs and bonds), 3D (for angles) or 4D (for dihedrals) matrix could make the interactions access faster.
This would use more memory, and fill the CPU cache with garbage. So benchmarks are needed!
This is a big issue, and I have been thinking a bit around it.
1 and 3 should not be controversial, and 2 means that input files should look a bit more like
pairs: atoms = [O O], lj = sigma 3.5 A, epsilon = 122K
Than like this
&potp \
3 3 3.5 122 lj
;
About 4, as I plan to integrate a scripting language, there is no need for some ad-hoc scripting language to drive the simulation. A pre-existing language might be more familiar to users, easier to read in most of the cases, will have syntactic highlight in text editors and will be easier to validate without running the simulation.
At this point, JSON is ruled out because it does not support comments, and is a bit too clumsy with ""
around the keys. INI is nice but not standardized. YAML is very good, but has two problems: it is whitespace sensitive (which can confuse the users, see #5 for a wanted workaround), and nest a bit deeply for molecular simulations.
I thing I'll go with TOML, which is clean, simple and readable.
This gist showcases a possible syntax for TOML input files. it is mainly separated in two sections:
This is demonstrated here. We may want to remove the type
section, and use something like this:
[[pairs]]
atoms = ["O", "O"]
lj = {sigma = "3.4 A", epsilon = "0.155 kJ/mol"}
This would allow for potentials to be inside inline tables:
pairs = [
{atoms = ["O", "O"], lj = {sigma = "3.4 A", epsilon = "0.155 kJ/mol"}},
{atoms = ["O", "H"], lj = {sigma = "2.1 A", epsilon = "0.085 kJ/mol"}},
]
There are multiple examples for the syntax:
One of the key points is that systems
and simulations
entry are arrays. That allow to have more than one system to apply some simulation to (think replica-exchange simulations, or gibbs ensemble Monte-Carlo); or to apply multiple simulation to one system: like here.
I might want to remove the type
keyword here too, to replace
[[simulations]]
nsteps = 10_000_000
outputs = [
{type = "Trajectory", file = "filename.xyz", frequency = 100},
{type = "Properties", file = "averages.dat", frequency = 100}
]
with
[[simulations]]
nsteps = 10_000_000
outputs = [
{trajectory = {file = "filename.xyz", frequency = 100}},
{properties = {file = "averages.dat", frequency = 100}}
]
Or this with
[[simulations.propagator]]
type = "MonteCarlo"
moves = [
{translate = {delta = "1 A", proba = 50}},
{rotate = {delta = "20 deg", proba = 50}},
{resize = {proba = 1}},
]
We probably do not want to remove it from everywhere, for example
[[simulations.propagator.moleculardynamics]]
timestep = "1 fs"
integrator = {berendsen_barostat = {pressure = "100 bar", timestep = "10000 fs"}}
thermostat = {berendsen = {temperature = "400 K", timestep = "1000 fs"}}
Feels readable than the original version.
type
section be removed? If yes, where? It appears in potentials, in lists of outputs
or moves
, and as keyword for propagator
, thermostat
, ...For now, the code is plain old serial, and does not exploit the possibilities offered by Rust.
We should discuss about how to make this code parallel. I believe the first section of the code we should parallelize is the energy evaluation (#19 idea is to have both parallel and serial energy computation algorithm). I can see multiple area of improvement:
At the same time, we can already enable some trivial parallelism because the code do not hold any global variable: parallel tempering should be very easy to setup, as should be running the same calculation with a single varying parameter. Here the question is more how can this be added to the input files in an intuitive manner.
Rust code should be cross-platform, but it is better to be sure.
It can also help to build release binaries, cf https://github.com/japaric/rust-everywhere
All the YAML files should have a version
key for indicating the version of the input format. This will prevent strange bugs and incompatibility when updating input format.
version: 1.0
pairs:
- atoms: [O, O]
...
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.