monodepth-to-manydepth's Introduction

MonoDepth to ManyDepth: Self-Supervised Depth Estimation on Monocular Sequences

  1. Dataset

    • Dense Depth for Autonomous Driving (DDAD)
    • KITTI Eigen Split
    wget -i splits/kitti_archieves_to_download.txt -P kitti_data/
    cd kitti_data/
    unzip "*.zip"
    cd ..
    find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'
    
    • The above conversion command creates images with default chroma subsampling 2x2, 1x1, 1x1.
  2. Problem Setting

    While specialist hardware can give per-pixel depth, a more attractive approach is to require only a single RGB camera.

    The goal is to train a deep network that maps an input image to a depth map.

  3. Methods

    • Geometry Models

      The simplest representation of a camera is an image plane at a given position and orientation in space.

      The pinhole camera geometry models the camera with two sub-parameterizations: intrinsic and extrinsic parameters. Intrinsic parameters model the optical component (without distortion), and extrinsic parameters model the camera position and orientation in space. This projection of the camera is described as:

      image

      A 3D point is projected into an image with the following formula (homogeneous coordinates):

      image
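
As a minimal sketch of the projection described above, assuming the standard pinhole form P = K [R | t] and a homogeneous 3D point, the following NumPy snippet maps a world point to pixel coordinates. The function name `project_point` and all numeric values (focal length, principal point, test point) are illustrative assumptions, not taken from this repository.

```python
import numpy as np


def project_point(K, R, t, X):
    """Project a 3D world point X (shape (3,)) to pixel coordinates (u, v).

    Assumes the standard pinhole model without lens distortion.
    """
    # 3x4 projection matrix: intrinsics K times extrinsics [R | t]
    P = K @ np.hstack([R, t.reshape(3, 1)])
    # Homogeneous image point (u*w, v*w, w)
    x = P @ np.append(X, 1.0)
    # Divide by the third coordinate to get pixel coordinates
    return x[:2] / x[2]


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)      # camera aligned with the world axes
t = np.zeros(3)    # camera at the world origin

uv = project_point(K, R, t, np.array([1.0, 0.5, 5.0]))
print(uv)  # [420. 290.]
```

With an identity rotation and zero translation, the result reduces to `u = f * X / Z + u0` and `v = f * Y / Z + v0`, which makes the example easy to check by hand.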

    • Cross-View Reconstruction

    This approach frames the learning problem as one of novel view synthesis: a network is trained to predict the appearance of a target image from the viewpoint of another image, using depth (disparity).

    The problem is formulated as the minimization of a photometric reprojection error at training time.

    image

    image

    image

    Here, pe is a photometric reconstruction error, proj() gives the 2D coordinates of the projected depths Dₜ in the source view, and <> is the sampling operator. For simplicity of notation we assume the pre-computed intrinsics K of all views are identical, though they can be different. α is set to 0.85.
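
Written out, and assuming the formulation commonly used in self-supervised depth estimation (for example in Monodepth2), the quantities named above can be sketched as follows. The symbols for the source image, the relative camera pose, and the SSIM term are not named in the text and are assumptions. How the error is aggregated over source views (sum, mean, or per-pixel minimum) is not stated here, so it is left out.

```latex
\begin{align}
I_{t' \to t} &= I_{t'} \big\langle \operatorname{proj}(D_t,\, T_{t \to t'},\, K) \big\rangle \\
pe(I_a, I_b) &= \frac{\alpha}{2}\bigl(1 - \operatorname{SSIM}(I_a, I_b)\bigr)
              + (1 - \alpha)\,\lVert I_a - I_b \rVert_1,
\qquad \alpha = 0.85
\end{align}
```

Here $I_{t'}$ is the source image, $I_{t' \to t}$ is the source image warped into the target view, and $T_{t \to t'}$ is the relative pose between the two views.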

    The method considers scene structure and camera motion at the same time, since camera pose estimation has a positive impact on monocular depth estimation. The two sub-networks (depth and pose) are trained jointly, and the entire model is constrained by an image reconstruction loss similar to the one used in stereo matching methods.
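
The image reconstruction loss can be illustrated with a small NumPy sketch of a per-pixel photometric error. It assumes a weighted combination of an SSIM term and an L1 term with α = 0.85, a 3x3 box window for SSIM, and single-channel images in [0, 1]. The function names `box_mean`, `ssim`, and `photometric_error` are hypothetical and are not this repository's `model_loss/` code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_mean(x, k=3):
    """Mean over k x k windows; reflect padding keeps the input shape."""
    pad = k // 2
    xp = np.pad(x, pad, mode="reflect")
    return sliding_window_view(xp, (k, k)).mean(axis=(-2, -1))


def ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM between two single-channel images in [0, 1]."""
    mu_a, mu_b = box_mean(a), box_mean(b)
    var_a = box_mean(a * a) - mu_a ** 2
    var_b = box_mean(b * b) - mu_b ** 2
    cov = box_mean(a * b) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def photometric_error(a, b, alpha=0.85):
    """alpha/2 * (1 - SSIM) + (1 - alpha) * |a - b|, computed per pixel."""
    l1 = np.abs(a - b)
    dssim = np.clip((1.0 - ssim(a, b)) / 2.0, 0.0, 1.0)
    return alpha * dssim + (1.0 - alpha) * l1


# Toy data: a target image and an imperfect reconstruction of it
rng = np.random.default_rng(0)
target = rng.random((8, 8))
reconstruction = np.clip(target + 0.1 * rng.standard_normal((8, 8)), 0.0, 1.0)

error_map = photometric_error(target, reconstruction)
print(error_map.shape)  # (8, 8)
```

In a full training loop the reconstruction would come from warping a source frame with the predicted depth and pose. Here it is replaced by a noisy copy of the target so that the sketch stays self-contained.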

  4. Folder

dataset/
    2011_09_26/
    ...
    ...
model_dataloader/
model_layer/
model_loss/
model_save/
model_test.py
model_train.py
model_parser.py
model_utility.py
  5. Packages
apt-get update -y
apt-get install moreutils
or
apt-get install -y moreutils
  6. Training
python model_train.py --pose_type separate --datatype kitti_eigen_zhou
python model_train.py --pose_type separate --datatype kitti_benchmark
  7. Test
python model_test.py
  8. Evaluation
kitti_eigen_zhou 
abs_rel   sqrt_rel  rmse      rmse_log  a1        a2        a3
0.125     0.977     4.992     0.202     0.861     0.955     0.980

kitti_eigen_benchmark
abs_rel   sqrt_rel  rmse      rmse_log  a1        a2        a3
0.104     0.809     4.502     0.182     0.900     0.963     0.981
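
For reference, the columns above can be computed as in the following sketch, which assumes the standard depth evaluation metrics from the KITTI Eigen protocol. It is not this repository's evaluation code. In particular, it assumes the `sqrt_rel` column is the squared relative error (often written `sq_rel`), and the function name `depth_metrics` is hypothetical.

```python
import numpy as np


def depth_metrics(gt, pred):
    """Standard depth metrics for valid, strictly positive depths (1D arrays)."""
    # Threshold accuracies: fraction of pixels with max(gt/pred, pred/gt) < 1.25^k
    ratio = np.maximum(gt / pred, pred / gt)
    a1 = np.mean(ratio < 1.25)
    a2 = np.mean(ratio < 1.25 ** 2)
    a3 = np.mean(ratio < 1.25 ** 3)

    abs_rel = np.mean(np.abs(gt - pred) / gt)   # absolute relative error
    sq_rel = np.mean((gt - pred) ** 2 / gt)     # squared relative error
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
            "rmse_log": rmse_log, "a1": a1, "a2": a2, "a3": a3}


# Toy example with three pixels (depths in meters)
gt = np.array([1.0, 2.0, 4.0])
pred = np.array([1.0, 2.0, 5.0])
example = depth_metrics(gt, pred)
for name, value in example.items():
    print(f"{name:9s} {value:.3f}")
```

Lower is better for `abs_rel`, `sq_rel`, `rmse`, and `rmse_log`; higher is better for `a1`, `a2`, and `a3`.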

Padding

What is padding and why do we need it?

[Figure: screenshot in which the feature map is shown as a yellow block]

  • What is a feature map? It is the yellow block in the image.

  • It is a collection of N single-channel (two-dimensional) "maps" that each represent a particular "feature" that the model has spotted within the image.

  • This is why convolutional layers are known as feature extractors.

  • How do we get from an input (whether an image or a feature map) to a feature map?

  • Through kernels, also called filters.

  • You configure some number N of kernels per convolutional layer.

  • Each kernel "slides" (convolves) over the input data.
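
The sliding step above can be sketched in plain NumPy. The example assumes a single-channel 5x5 input, N = 2 hand-picked 3x3 kernels, and stride 1; the function name `conv2d_valid` and the kernel values are illustrative only. It also shows why padding matters: without padding the output shrinks, while one pixel of zero padding keeps the spatial size for a 3x3 kernel.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D input with stride 1 and no padding.

    As in most deep learning frameworks, this is cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel and the window under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.arange(25, dtype=float).reshape(5, 5)

# N = 2 kernels: a 3x3 mean filter and a horizontal-gradient filter
kernels = np.stack([
    np.ones((3, 3)) / 9.0,
    np.array([[-1.0, 0.0, 1.0]] * 3),
])

# One 2D map per kernel, stacked into a feature map of shape (N, H', W')
feature_maps = np.stack([conv2d_valid(image, k) for k in kernels])
print(feature_maps.shape)  # (2, 3, 3)

# Zero padding of 1 pixel on each side preserves the 5x5 spatial size
same_size = conv2d_valid(np.pad(image, 1), kernels[0])
print(same_size.shape)  # (5, 5)
```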
