
machine_learning_dynamics's Introduction

In this project, we aim to discover dynamics (partial differential equations) from in situ videos of scanning transmission electron microscopy (STEM). Mathematically, we want to find the best symbolic equation

$$u_t = F(u, u_x, u_y, u_{xx}, u_{yy}, u_{xy})$$

that can describe the video. Here $u$ is the intensity, a function of the x-coordinate, the y-coordinate, and t (time). The left-hand side is the temporal derivative, and the right-hand side is an unknown symbolic expression $F$, a function of $u$, the first-order derivatives of $u$ with respect to $x$ and $y$, and the three second-order derivatives $u_{xx}$, $u_{yy}$, and $u_{xy}$.

I divide this project into two major steps. The first step is to evaluate numerical derivatives, and the second step is to find the best symbolic equation. The challenge in the first step comes from the noise and sparsity of experimental data, and the challenge in the second step is to find the global minimum in the symbolic-equation space. To resolve the first challenge, I propose a scheme called deep-learning total variation regularization (DLTVR). To resolve the second challenge, I propose spin sequential Monte Carlo, which samples the symbolic-equation space according to the Bayesian posterior probability distribution.

Deep learning total variation regularization

We first need to evaluate numerical derivatives. This is a challenging task for experimental data, which are noisy and sparse; conventional approaches such as the finite-difference method are of little help in this scenario. Interestingly, deep learning offers an elegant way to resolve this challenge. We may parametrize the in situ video with a neural network, which should be a smooth function of $x$, $y$, and $t$. To guarantee smoothness, I apply total variation regularization to the neural network, which means that I simply add a regularization term to the loss function,

$$\mathcal{L} = \lambda\,\mathrm{TV}(g) + \frac{1}{N}\sum_{i=1}^{N}\bigl(g(x_i, y_i, t_i) - u_i\bigr)^2,$$

where the first term is the total-variation regularization term (with weight $\lambda$), the second term is the mean squared loss, and $g$ is the neural network.
Once we finish training the neural network, we can then use automatic differentiation to obtain numerical derivatives to any order.
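The loss above can be sketched numerically. Below is a minimal NumPy illustration of the two competing terms, assuming an anisotropic discrete total variation (sum of absolute forward differences along each axis) and an arbitrary weight `lam`; the actual project would evaluate this on a neural network's output rather than a fixed array.

```python
import numpy as np

def tv_mse_loss(g, u, lam=0.01):
    """TV-regularized loss for a candidate smoothed video g(x, y, t)
    against the noisy observed video u.  lam is an assumed weight."""
    # Anisotropic discrete total variation: absolute forward
    # differences summed along each of the three axes.
    tv = sum(np.abs(np.diff(g, axis=a)).sum() for a in range(g.ndim))
    mse = np.mean((g - u) ** 2)  # data-fidelity (mean squared) term
    return lam * tv + mse

# Toy example: a noisy ramp and two candidate reconstructions.
rng = np.random.default_rng(0)
smooth = np.linspace(0.0, 1.0, 64).reshape(4, 4, 4)   # smooth candidate
u = smooth + 0.2 * rng.normal(size=(4, 4, 4))          # noisy "video"

print(tv_mse_loss(smooth, u))  # small TV, small MSE
print(tv_mse_loss(u, u))       # zero MSE, but large TV from the noise
```

The TV term is what discourages the network from simply memorizing the noise: fitting the noisy data exactly drives the MSE to zero but inflates the total variation.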

Next, I use a simple example to demonstrate this scheme. The video below is the soft-segmentation result of a real in situ STEM video. The signals at the moving interface are very noisy, which prevents us from using conventional methods to evaluate the numerical derivatives. Let's use DLTVR instead.

DLTVR first smooths the video shown above and returns the smoothed video below.

We can then employ the automatic differentiation implemented in TensorFlow or PyTorch to calculate the derivatives, which is a piece of cake.
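As a sketch of that step, here is how PyTorch's `torch.autograd.grad` yields first- and second-order derivatives. A simple closed-form function stands in for the trained DLTVR network $g(x, y, t)$ (an assumption for illustration); any differentiable `torch` module works identically.

```python
import torch

# Stand-in for a trained DLTVR network g(x, y, t).
def g(x, y, t):
    return x ** 2 * t + torch.sin(y)

x = torch.tensor(0.5, requires_grad=True)
y = torch.tensor(1.0, requires_grad=True)
t = torch.tensor(0.3, requires_grad=True)

u = g(x, y, t)
# First-order derivatives; create_graph=True keeps the graph so we
# can differentiate again for second-order derivatives.
u_x, u_y, u_t = torch.autograd.grad(u, (x, y, t), create_graph=True)
u_xx, = torch.autograd.grad(u_x, x, create_graph=True)

print(u_x.item())   # 2*x*t = 0.3
print(u_t.item())   # x**2  = 0.25
print(u_xx.item())  # 2*t   = 0.6
```

Because the network is a smooth function of its inputs, the same two calls give every derivative the non-linear library needs, with no finite-difference stencils and no sensitivity to the original noise.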

Spin sequential Monte Carlo

In the second step, I propose spin sequential Monte Carlo to find the best partial differential equation (PDE) that can describe the video. I call the algorithm spin sequential Monte Carlo because it combines sequential Monte Carlo with spin-flip Markov chain Monte Carlo. The inspiration comes from the paper by Rudy et al. and my PhD work. Rudy et al. proposed that the RHS of the partial differential equation can be expressed as a linear combination of non-linear terms,

$$u_t = \sum_i \xi_i \Theta_i,$$

where the $\Theta_i$ are the non-linear terms in the non-linear library and the $\xi_i$ are their coefficients.

Now the key problem becomes which terms to select to construct the PDE. Rudy et al. proposed thresholded ridge regression, which may work well when the non-linear library is small but becomes unsuitable for a large one. My PhD work on spin systems inspired me to use spin Monte Carlo sampling instead. To be more specific, we map the linear combination of non-linear terms onto an Ising spin chain: spin up means the corresponding term is selected, and spin down means it is not. We can then use spin-flip Markov chain Monte Carlo to sample the PDE space. To improve sampling efficiency, I combine the spin-flip Monte Carlo with sequential Monte Carlo.
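The spin-flip step can be sketched as follows. This is a minimal Metropolis sampler over a binary term-selection vector, not the project's combined spin sequential Monte Carlo: the library, the complexity penalty `lam`, and the inverse temperature `beta` are all assumptions for illustration, and the energy here is fit error plus a per-term penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny non-linear library: columns are candidate terms.
N, K = 200, 5
Theta = rng.normal(size=(N, K))
u_t = 2.0 * Theta[:, 1]          # synthetic truth: only term 1 is active

lam = 0.1                        # complexity penalty per selected term (assumed)
beta = 20.0                      # inverse temperature (assumed)

def energy(s):
    """Fit error of the best linear fit on the selected terms,
    plus a penalty proportional to how many terms are selected."""
    idx = np.flatnonzero(s)
    if idx.size == 0:
        resid = u_t
    else:
        coef, *_ = np.linalg.lstsq(Theta[:, idx], u_t, rcond=None)
        resid = u_t - Theta[:, idx] @ coef
    return np.mean(np.abs(resid)) + lam * idx.size

s = rng.integers(0, 2, size=K)   # random initial spin chain (1 = selected)
E = energy(s)
best_s, best_E = s.copy(), E

for step in range(2000):
    i = rng.integers(K)          # pick one spin and flip it
    s_new = s.copy()
    s_new[i] ^= 1
    E_new = energy(s_new)
    dE = E_new - E
    if dE <= 0 or rng.random() < np.exp(-beta * dE):  # Metropolis rule
        s, E = s_new, E_new
        if E < best_E:
            best_s, best_E = s.copy(), E

print(best_s)  # should keep only the truly active term (index 1)
```

Each flip either adds or removes one library term, so the chain walks through candidate PDEs, and the complexity penalty drives it away from states that select spurious terms.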

The probability distribution for a PDE is given as the Bayesian posterior,

$$P(\mathrm{PDE} \mid D) \propto P(D \mid \mathrm{PDE})\,P(\mathrm{PDE}),$$

which is the product of the likelihood and the prior. I use a Gaussian error function to define the likelihood,

$$P(D \mid \mathrm{PDE}) \propto \exp\!\left(-\frac{\epsilon^2}{2\sigma^2}\right),$$

where $\epsilon$ is the fitting error, and define the prior as a function of the PDE complexity $k$,

$$P(\mathrm{PDE}) \propto \exp(-\lambda k),$$

as we don't want to end up with too complex a PDE.
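Putting the likelihood and prior together, the (unnormalized) log-posterior for a candidate PDE is just two terms. In this sketch, `sigma` and `lam` are assumed hyperparameters, not values from the project, and the error is taken to be the mean absolute error of the fit.

```python
import math

def log_posterior(mae, k, sigma=0.05, lam=1.0):
    """Unnormalized log-posterior for a candidate PDE:
    Gaussian likelihood in the fitting error plus an
    exponential complexity prior.  sigma, lam are assumed."""
    log_likelihood = -mae ** 2 / (2 * sigma ** 2)
    log_prior = -lam * k
    return log_likelihood + log_prior

# A compact 2-term PDE with a small error beats a 5-term PDE
# that fits only marginally better.
print(log_posterior(mae=0.010, k=2))
print(log_posterior(mae=0.009, k=5))
```

This trade-off between fit quality and term count is exactly what the Pareto frontier below visualizes across all sampled PDEs.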

We plot the final result for the video shown above as a Pareto frontier for PDE complexity and mean absolute error (MAE):


The best PDE is a first-order hyperbolic PDE, which is not a surprise.

