fandosa / normal_flow_prediction

Deep learning model to predict the normal flow between two consecutive frames, where the normal flow is the projection of the optical flow onto the gradient directions.


Normal Flow Prediction

This project consists of a deep learning model that predicts the normal flow between two consecutive frames, where the normal flow is the projection of the optical flow onto the gradient directions. The model was trained on the TartanAir dataset, and the architecture is an encoder-decoder with residual blocks based on EVPropNet.

Normal Flow

The normal flow is the projection of the optical flow onto the gradient directions of the image and serves as a representation of image motion. To compute it, the brightness constancy constraint has to be applied. This constraint is one of the fundamental assumptions in optical flow computation and computer vision: the intensity, or a function of the intensity, at a pixel remains constant over two consecutive frames. The mathematical expression of this constraint is:

$I(x,y,t)=I(x + u \delta t, y + v \delta t, t + \delta t)$

where $u$ and $v$ represent the optical flow of the pixel $(x, y)$ (i.e., the motion of the image pixel from time $t$ to $t+1$). Writing $\delta x = u \delta t$ and $\delta y = v \delta t$, the equation can be rewritten as:

$I(x,y,t)=I(x + \delta x, y + \delta y, t + \delta t)$

Approximating the right-hand side of the previous equation with a first-order Taylor expansion, we obtain:

$I(x,y,t)=I(x,y,t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t$

And subtracting $I(x,y,t)$ from both sides of the equation, and then dividing by $\delta t$:

$0 = \frac{\partial I}{\partial x} \frac{\delta x}{\delta t} + \frac{\partial I}{\partial y} \frac{\delta y}{\delta t} + \frac{\partial I}{\partial t} = I_x \frac{\delta x}{\delta t} + I_y \frac{\delta y}{\delta t} + I_t$

Finally, keeping in mind that the optical flow $(u,v)$ is the motion of the image pixel per unit time, $u = \delta x / \delta t$ and $v = \delta y / \delta t$, this last equation can be rewritten as:

$0 = I_x u + I_y v + I_t$
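As a sanity check, the constraint can be verified numerically on a synthetic image pair generated by a known translation. This is a minimal sketch; the smooth test pattern and the flow values are illustrative, not part of the project:

```python
import numpy as np

# Smooth synthetic brightness pattern I(x, y)
def pattern(x, y):
    return np.sin(0.05 * x) * np.cos(0.05 * y)

u, v = 1.5, -0.5                      # known optical flow (pixels per frame)
ys, xs = np.mgrid[0:100, 0:100].astype(float)

I0 = pattern(xs, ys)                  # frame at time t
I1 = pattern(xs - u, ys - v)          # frame at t+1: pattern shifted by (u, v)

# Finite-difference derivatives
Iy, Ix = np.gradient(I0)              # spatial gradients (axis 0 = y, axis 1 = x)
It = I1 - I0                          # temporal derivative (delta t = 1)

# Brightness constancy residual: should be close to zero everywhere,
# up to finite-difference and Taylor-truncation error
residual = Ix * u + Iy * v + It
print(np.abs(residual).max())
```

The residual is not exactly zero because the derivatives are approximated by finite differences and the Taylor expansion is truncated at first order, but it stays small for a smooth pattern and sub-pixel-scale curvature.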

This equation defines the constraint line: for any point $(x,y)$ in the image, its optical flow $(u,v)$ lies on this line. The image below shows an example of this line alongside the gradient vector at a pixel and the corresponding optical flow vector at that pixel. As shown, the optical flow vector (blue arrow) decomposes into two components: the normal flow (red arrow) and the parallel flow (green arrow).

Therefore, to compute the normal flow vector, it is necessary to calculate the unit vector perpendicular to the constraint line and its magnitude, which corresponds to the distance from the origin to the constraint line. The mathematical expressions of these components are:

$\hat{u}_n = \frac{(I_x, I_y)}{\sqrt{I_x^2 + I_y^2}}$

$|u_n| = \frac{|I_t|}{\sqrt{I_x^2 + I_y^2}}$

Substituting $-I_t = I_x u + I_y v$ from the constraint line equation, the final normal flow vector is obtained by combining its magnitude with the unit vector of the normal flow:

$u_n = |u_n| \, \hat{u}_n = \frac{-I_t}{\sqrt{I_x^2 + I_y^2}} \cdot \frac{(I_x,I_y)}{\sqrt{I_x^2 + I_y^2}} = \frac{I_xu + I_yv}{I_x^2 + I_y^2} (I_x,I_y)$
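The final formula maps directly to array code. A minimal NumPy sketch (the `eps` guard against zero-gradient pixels is an addition for numerical safety, not part of the derivation):

```python
import numpy as np

def normal_flow(Ix, Iy, u, v, eps=1e-8):
    """Project the optical flow (u, v) onto the image gradient directions.

    All arguments are (h, w) arrays; returns two (h, w) arrays with the
    x and y components of the normal flow u_n.
    """
    grad_sq = Ix**2 + Iy**2 + eps          # |grad I|^2, guarded against zero
    scale = (Ix * u + Iy * v) / grad_sq    # signed magnitude along the gradient
    return scale * Ix, scale * Iy

# Single-pixel example: gradient (3, 4), optical flow (2, 1)
un_x, un_y = normal_flow(np.array([[3.0]]), np.array([[4.0]]),
                         np.array([[2.0]]), np.array([[1.0]]))
```

By construction, the remainder $(u,v) - u_n$ (the parallel flow) is orthogonal to the gradient, which is an easy property to check.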

Autoencoder

The deep learning model chosen to predict the normal flow between two consecutive frames is an autoencoder. This autoencoder is based on EVPropNet, which in turn is based on ResNet. The encoder contains residual blocks with convolutional layers, and the decoder contains residual blocks with transposed convolutional layers. The gradients are backpropagated using a mean squared error loss computed between the ground-truth and predicted normal flow:

$\arg\min \; \lVert n - \hat{n} \rVert_2^2$
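The objective is a plain per-pixel mean squared error over the two flow channels. A self-contained NumPy sketch (the $(h, w, 2)$ array shapes follow the convention below; the example values are arbitrary):

```python
import numpy as np

def normal_flow_mse(n_true, n_pred):
    """Mean squared error between ground-truth and predicted normal flow.

    Both inputs have shape (h, w, 2); the mean runs over all pixels
    and both flow components.
    """
    return np.mean((n_true - n_pred) ** 2)

gt = np.zeros((4, 4, 2))              # ground-truth normal flow
pred = np.full((4, 4, 2), 0.5)        # constant prediction, off by 0.5
err = normal_flow_mse(gt, pred)       # each element contributes 0.5**2
```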

The model takes as input two concatenated frames and outputs a two-channel map containing the components of the normal flow at each pixel. That is, the dimensions of the input and output tensors are $(h,w,6)$ and $(h,w,2)$, respectively.
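A minimal PyTorch sketch of this shape contract. The layer widths and two-block depth here are illustrative, not the EVPropNet configuration used in the repository; note also that PyTorch tensors are channel-first, so $(h,w,6)$ becomes $(1, 6, h, w)$ below:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convolutions plus a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class NormalFlowNet(nn.Module):
    """Encoder-decoder: two stacked RGB frames in, 2-channel normal flow out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),  # downsample 2x
            ResBlock(32),
        )
        self.decoder = nn.Sequential(
            ResBlock(32),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),    # upsample 2x
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

frames = torch.randn(1, 6, 64, 64)    # two concatenated RGB frames
flow = NormalFlowNet()(frames)        # shape (1, 2, 64, 64)
```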

Run the implementation

As mentioned before, the model was trained on the TartanAir dataset. This dataset provides many image sequences from different scenarios created in Unreal Engine, along with depth maps, optical flow, camera positions and orientations for each image, and more. Visit their website, download the scenarios you want, and organize the images and their optical flow data the same way they are organized in the dataset/train/ folder of this repository. When the data is correctly organized, run

python dataset.py

This will create a JSON file, like the one in this repository, with the paths to all images and optical flow data. Then run

python train.py

and the model will start training. A folder called autoencoder/, like the one in this repository, will be created; the training checkpoints and the loss values are saved there. At the end of training, an image showing the loss curves is also saved. You can check the folder in this repository to see what it looks like and the loss curves I obtained.

To test the model, organise the dataset in the same way as before but using the dataset/test/ folder instead, and run the test file

python test.py

You only have to enter the name of the checkpoint you want to use. In my case, I ran

python test.py --checkpoint=checkpoint_395_best.pth

because that's the name of the checkpoint where the loss value was the lowest in my training.

Results

The following video shows the results obtained on the hospital scenario of the dataset. The video is split into four panels: the top left is the original image, the top right the optical flow, the bottom left the ground-truth normal flow, and the bottom right the predicted normal flow.

https://www.youtube.com/watch?v=17LuuflLbSo
