A simple interface to allow electrostatic embedding of machine learning potentials in sander. Based on code by Kirill Zinovjev. The code works by reusing the existing interface between sander and ORCA, meaning that no modifications to sander are needed.
First create a conda environment with all of the required dependencies:
conda env create -f environment.yml
conda activate mlmm
If this fails, try using mamba as a replacement for conda.
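For example, assuming mamba is installed, the creation command becomes:
mamba env create -f environment.yml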
For GPU functionality, you will need to install appropriate CUDA drivers on your host system along with NVCC, the CUDA compiler driver. (This doesn't come with cudatoolkit from conda-forge.)
(Depending on your CUDA setup, you might need to prefix the environment creation command above with something like CONDA_OVERRIDE_CUDA="11.2" to resolve an environment that is compatible with your CUDA driver.)
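For example (the CUDA version shown is illustrative; match it to your driver):
CONDA_OVERRIDE_CUDA="11.2" conda env create -f environment.yml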
Finally, install the sander-mlmm interface:
python setup.py install
If you are developing and want an editable install, use:
python setup.py develop
To start an ML/MM calculation server:
mlmm-server
For usage information, run:
mlmm-server --help
To launch a client to send a job to the server:
orca orca_input
Where orca_input is the path to a fully specified ORCA input file. The orca executable will be called by sander when performing QM/MM, i.e. we are using a fake ORCA as the QM backend.
(Alternatively, you can skip starting the server manually: running orca orca_input will try to connect to an existing server and will start one for you if no connection is found.)
The server and client should both connect to the same hostname and port. These can be specified in a script using the environment variables MLMM_HOST and MLMM_PORT. If not specified, then the same default values will be used for both the client and server.
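For example, in a run script (the hostname and port shown are illustrative, not the defaults):
export MLMM_HOST=localhost
export MLMM_PORT=8888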
To stop the server:
mlmm-stop
The embedding method relies on in vacuo energies and gradients, to which corrections are added based on the predictions of the model. At present we support the use of Rascal, DeePMD-kit, TorchANI or ORCA for the backend, providing reference QM with ML/MM embedding, and pure ML/MM implementations. To specify a backend, use the --backend argument when launching mlmm-server, e.g.:
mlmm-server --backend torchani
(The default backend is torchani.)
When using the rascal backend you will also need to specify a model file and the AMBER parm7 topology file that was used to train this model. These can be specified using the --rascal-model and --rascal-parm7 command-line arguments, or using the RASCAL_MODEL and RASCAL_PARM7 environment variables. Rascal can be used to train system-specific delta-learning models.
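For example, a sketch with hypothetical file names:
mlmm-server --backend rascal --rascal-model model.pkl --rascal-parm7 system.parm7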
When using the orca backend, you will need to ensure that the fake orca executable takes precedence in the PATH. (To check that ML/MM is running, look for an mlmm_backend_log.txt file in the working directory, where backend is the name of the specified backend.) The input for orca will be taken from the &orc section of the sander configuration file, so use this to specify the method, etc.
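A quick sanity check from the shell:
which orca
This should resolve to the fake orca inside the mlmm conda environment, not the real ORCA installation.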
When using deepmd as the backend you will also need to specify a model file to use. This can be passed with the --deepmd-model command-line argument, or using the DEEPMD_MODEL environment variable. This can be a single file, a set of model files specified using wildcards, or a comma-separated list. When multiple files are specified, energies and gradients will be averaged over the models. The model files need to be visible to the mlmm-server, so we recommend the use of absolute paths.
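For example, a sketch that averages over two hypothetical models, passed as absolute paths:
mlmm-server --backend deepmd --deepmd-model /home/user/models/model_1.pb,/home/user/models/model_2.pb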
We currently support CPU and CUDA as the device for PyTorch. This can be configured using the MLMM_DEVICE environment variable, or by using the --device argument when launching mlmm-server, e.g.:
mlmm-server --device cuda
When no device is specified, we will preferentially try to use CUDA if it is available. By default, the first CUDA device index will be used. If you want to specify the index, e.g. when running on a multi-GPU setup, then you can use the following syntax:
mlmm-server --device cuda:1
This would tell PyTorch that we want to use device index 1. The same formatting works for the environment variable, e.g. MLMM_DEVICE=cuda:1.
We support electrostatic, mechanical, and MM embedding. Our implementation of mechanical embedding uses the model to predict charges for the QM region, but ignores the induced component of the potential. MM embedding allows the user to specify fixed MM charges for the QM atoms, with induction once again disabled. Obviously we are advocating our electrostatic embedding scheme, but the use of mechanical or MM embedding provides a useful reference for determining the benefit of using electrostatic embedding for a given system.
The embedding method can be specified using the MLMM_EMBEDDING environment variable, or when launching the server, e.g.:
mlmm-server --embedding mechanical
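Or, equivalently, via the environment variable:
export MLMM_EMBEDDING=mechanical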
The default option is (unsurprisingly) electrostatic. When using MM embedding you will also need to specify MM charges for the atoms within the QM region. This can be done using the --mm-charges option, or via the MLMM_MM_CHARGES environment variable. The charges can be specified as a list of floats (space-separated on the command line, comma-separated when using the environment variable) or as a path to a file. When using a file, this should be formatted as a single column, with one line per QM atom. The units are electron charge.
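For example, sketches with illustrative charge values for a three-atom QM region:
mlmm-server --embedding mm --mm-charges 0.4 -0.8 0.4
export MLMM_MM_CHARGES="0.4,-0.8,0.4"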
The ML/MM implementation uses several ML frameworks to predict energies and gradients. DeePMD-kit or TorchANI can be used for the in vacuo predictions and custom PyTorch code is used to predict corrections to the in vacuo values in the presence of point charges. The frameworks make heavy use of just-in-time compilation. This compilation is performed during the first ML/MM call, hence subsequent calculations are much faster. By using a long-lived server process to handle ML/MM calls from sander we can get the performance gain of JIT compilation.
A demo showing how to run ML/MM on a solvated alanine dipeptide system can be found in the demo directory. To run:
cd demo
./demo.sh
Output will be written to the demo/output directory.
The DeePMD-kit conda package pulls in a version of MPI which may cause problems if using ORCA as the in vacuo backend, particularly when running on HPC resources that might enforce a specific MPI setup. (ORCA will internally call mpirun to parallelise work.) Since we don't need any of the MPI functionality from DeePMD-kit, the problematic packages can be safely removed from the environment with:
conda remove --force mpi mpich
Alternatively, if performance isn't an issue, simply set the number of threads to 1 in the sander input file, e.g.:
&orc
method='XTB2',
num_threads=1
/
When running on an HPC resource it can often take a while for the mlmm-server to start. As such, the client will retry on failure a specified number of times before raising an exception, sleeping 2 seconds between retries. By default, the client will try to connect 100 times. If this is unsuitable for your setup, then the number of attempts can be configured using the MLMM_RETRIES environment variable.
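For example, to double the number of attempts when launching the client:
MLMM_RETRIES=200 orca orca_input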
If you are trying to use the ORCA backend in an HPC environment then you'll need to make sure that the fake orca executable takes precedence in the PATH set within your batch script, e.g. by making sure that you source the mlmm conda environment after loading the orca module. It is also important to make sure that the mlmm environment isn't active when submitting jobs, since the PATH won't be updated correctly within the batch script.
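For example, a minimal batch-script sketch (the module name is hypothetical and will vary between clusters):
module load orca
conda activate mlmm
Activating the environment after loading the module ensures that the fake orca from the mlmm environment ends up ahead of the real ORCA in the PATH.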