Motion is estimated from the input images with a convolutional neural network. A warping scheme then displaces the last input image along this motion estimate to produce the future image.
The system is trained end to end.
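The warping step described above can be illustrated with a minimal sketch. It assumes backward warping with bilinear interpolation, which is a common choice but not necessarily the scheme used in the original work. The function name `warp_backward`, the `(dx, dy)` flow layout, and the hand-written motion field (standing in for the network's output) are all hypothetical.

```python
import math


def warp_backward(image, flow):
    """Displace `image` along `flow` using backward bilinear sampling.

    image: H x W list of rows of floats (the last input image).
    flow:  H x W list of rows of (dx, dy) displacements in pixels
           (hypothetical stand-in for the CNN motion estimate).
    Output pixel (x, y) takes its value from `image` at (x - dx, y - dy).
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Source location, clamped to the image border (an assumption).
            sx = min(max(x - dx, 0.0), width - 1.0)
            sy = min(max(y - dy, 0.0), height - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
            ax, ay = sx - x0, sy - y0
            # Bilinear blend of the four neighbouring source pixels.
            top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
            bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


# Toy example: one bright pixel and a uniform motion of one pixel to the right.
last_image = [[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]]
uniform_flow = [[(1.0, 0.0)] * 3 for _ in range(3)]
predicted = warp_backward(last_image, uniform_flow)
# The bright pixel ends up one column to the right, at row 1, column 2.
```

With bilinear sampling the output varies continuously with the motion estimate, so a warp of this kind can sit inside a model that is optimized by gradient descent. In practice it would be written in an automatic-differentiation framework rather than plain Python.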
[1] Emmanuel de Bezenac, Arthur Pajot, Patrick Gallinari. Deep Learning for Physical Processes: Incorporating Prior Scientific Knowledge. URL: https://arxiv.org/abs/1711.07970