
On the uncertainty of self-supervised monocular depth estimation

Demo code of "On the uncertainty of self-supervised monocular depth estimation", Matteo Poggi, Filippo Aleotti, Fabio Tosi and Stefano Mattoccia, CVPR 2020.

At the moment, we do not plan to release training code.

[Paper] - [Poster] - [YouTube Video]

Citation

@inproceedings{Poggi_CVPR_2020,
  title     = {On the uncertainty of self-supervised monocular depth estimation},
  author    = {Poggi, Matteo and
               Aleotti, Filippo and
               Tosi, Fabio and
               Mattoccia, Stefano},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
}

Contents

  1. Abstract
  2. Usage
  3. Contacts
  4. Acknowledgements

Abstract

Self-supervised paradigms for monocular depth estimation are very appealing since they do not require ground truth annotations at all. Despite the astonishing results yielded by such methodologies, learning to reason about the uncertainty of the estimated depth maps is of paramount importance for practical applications, yet uncharted in the literature. To this end, we explore for the first time how to estimate the uncertainty for this task and how this affects depth accuracy, proposing a novel technique specifically designed for self-supervised approaches. On the standard KITTI dataset, we exhaustively assess the performance of each method with different self-supervised paradigms. Such evaluation highlights that our proposal i) always improves depth accuracy significantly and ii) yields state-of-the-art results for uncertainty estimation when training on sequences, and competitive results when deploying stereo pairs alone.

Usage

Requirements

Getting started

Clone the Monodepth2 repository and set it up using

sh prepare_monodepth2_engine.sh

Download the raw KITTI dataset and the accurate ground truth maps using

sh prepare_kitti_data.sh kitti_data

where kitti_data is the path to the raw KITTI dataset. The script checks whether you already have the raw KITTI images and ground truth maps there, then exports the ground truth depths in Monodepth2 format.

Pretrained models

You can download the following pre-trained models:

Run inference

Launch variants of the following command (see batch_generate.sh for a complete list)

python generate_maps.py --data_path kitti_data \
                        --load_weights_folder weights/M/Monodepth2-Post/models/weights_19/ \
                        --post_process \
                        --eval_split eigen_benchmark \
                        --output_dir experiments/Post/ \
                        --eval_mono

The command assumes you have downloaded the pre-trained models and placed them in the weights folder. Use --eval_stereo for S and MS models.

Extended options (in addition to Monodepth2 arguments):

  • --bootstraps N: loads N models from different trainings
  • --snapshots N: loads N models from the same training
  • --dropout: enables dropout inference
  • --repr: enables repr inference
  • --log: enables log-likelihood estimation (for Log and Self variants)
  • --no_eval: saves results with a custom scale factor (see below), for visualization purposes only
  • --custom_scale: custom scale factor
  • --qual: save qualitative maps for visualization

Results are saved in --output_dir/raw and are ready for evaluation. Qualitative maps are saved in --output_dir/qual.
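For instance, to evaluate a bootstrapped ensemble, a command along these lines should work (the weights folder name here is an assumption; see batch_generate.sh for the exact variants):

python generate_maps.py --data_path kitti_data \
                        --load_weights_folder weights/M/Monodepth2-Boot/models/weights_19/ \
                        --bootstraps 8 \
                        --eval_split eigen_benchmark \
                        --output_dir experiments/Boot/ \
                        --eval_mono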

Run evaluation

Launch the following command

python evaluate.py --ext_disp_to_eval experiments/Post/raw/ \
                   --eval_mono \
                   --max_depth 80 \
                   --eval_split eigen_benchmark \
                   --eval_uncert

Optional arguments:

  • --eval_uncert: evaluates estimated uncertainty

Results

Results for evaluating Post depth and uncertainty maps:

abs_rel | sq_rel |  rmse | rmse_log |    a1 |    a2 |    a3
  0.088 |  0.508 | 3.842 |    0.134 | 0.917 | 0.983 | 0.995

     | abs_rel |  rmse |    a1
AUSE |   0.044 | 2.864 | 0.056
AURG |   0.012 | 0.412 | 0.022

Minor deviations (no greater than 0.01) can occur with different versions of the Python packages.

Minor differences from the paper

  • Results from Drop models fluctuate
  • RMSE for Monodepth2 (S) is 3.868 (Table 2 says 3.942, which is an erroneous copy-paste from Table 1)
  • The original Monodepth2-Snap (MS) weights were lost 😭 we provide new weights that give almost identical results

Contacts

m [dot] poggi [at] unibo [dot] it

Acknowledgements

Thanks to Niantic and Clément Godard for sharing the Monodepth2 code.


mono-uncertainty's Issues

How to train the self-teaching framework, an extended question

I was reading your paper "On the uncertainty of self-supervised monocular depth estimation". Firstly, I would like to congratulate you on writing such a useful paper; not much research has been done on the uncertainty topic, and you have addressed it. Thank you for that. I have a question regarding the loss function in the self-teaching approach. Since the training code is not provided, I wanted to try it out myself, as I want to reproduce your results for my project. If my understanding is correct, I first use the pretrained model to generate pseudo ground truths, and then use them to train the student network. The student network has two output channels: one is the disparity and the other is the uncertainty. So, in the loss function,

[image: the self-teaching loss function from the paper]

  1. Does the first term denote the disparity of the student network, and the second the uncertainty generated at the second output channel?
  2. Does the L1 difference operator take the pixel-wise difference between the teacher and the student, with the sum of all pixel-wise losses then minimized?
    Please let me know. I am stuck at this point, and it would be really helpful if you could shed some light on it.

Edit: I have already seen issue #2, but there was no real mention of the loss there. This question might be a simple one, but I am no expert in this field; I have just started working on it. Excuse any ignorance. Thanks.
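For concreteness, here is a minimal sketch of my current reading of the loss (my assumption, not the authors' released code): the second output channel is treated as the log of the uncertainty, and the attenuated pixel-wise L1 residual is averaged.

import torch

# Hypothetical reading of the self-teaching loss: depth_student and
# depth_teacher are (B, 1, H, W) tensors; log_sigma is the second
# output channel of the student decoder.
def self_teaching_loss(depth_student, depth_teacher, log_sigma):
    sigma = torch.exp(log_sigma)                   # positive by construction
    l1 = torch.abs(depth_student - depth_teacher)  # pixel-wise L1 residual
    return torch.mean(l1 / sigma + log_sigma)      # mean over all pixels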

Content of uncertainty map by log method

Hi, thanks for your great work. I noticed that there were two works on monocular depth estimation at CVPR 2020 using an uncertainty loss; the other one was D3VO. Both used the same uncertainty loss (the Log section in your paper), but obtained totally different uncertainty maps. I can reproduce uncertainty maps like yours, so I'd like to ask if you know the reason for the difference. Looking forward to your reply. Thanks.

Question about self-teaching scheme in mono-uncertainty

Hi, in the paper you mentioned that we can decouple depth and pose when modelling uncertainty with the self-teaching strategy, but I cannot figure out why this scheme provides appropriate uncertainty.

For example, if our teacher model is poorly trained and provides inaccurate depth estimates, with self-teaching we can still get a depth uncertainty from the student model. In this situation, can we trust this uncertainty?

A question about inconsistency in the generate_maps.py related to bayesian uncertainty

Hey @mattpoggi ,

I understand that the Bayesian uncertainty is calculated by adding the uncertainties from a predictive method and an empirical method, for example Snap+Log. However, I seem to be missing something.

[screenshot: generate_maps.py, line 183]
In line 183, you add the variance from the empirical method to the sum of the uncertainties from the predictive method (Log).

[screenshot: generate_maps.py, line 216]

However, in line 216 you take the exponential for the Log method; I believe this is because the predicted values are in fact the log of the uncertainties.

Why do you not apply the same torch.exp to the uncertainties when computing them in the Bayesian way in line 183 above? What am I missing here?

Thanks in advance

Reproducing uncertainty results

When reproducing the results, I get very wrong numbers, as shown below:
[screenshot: terminal output]

When I plot the depths and uncertainties saved in raw/disp and raw/uncert, they look like this (see the image below, which is clearly wrong):
[screenshot: notebook output]

The thing is, I am not getting proper raw pred_depths and uncertainties in the raw directory; however, my qualitative pred_depths and uncertainties are saved correctly in the qual directory, as the examples below show:
[qualitative depth and uncertainty examples]

So the main issue is how the raw pred_depths and uncertainties are scaled before saving.

  1. Could you please tell me why you multiply by gt_depth's width in line 270 of generate_maps.py? When saving the depths, why do you scale them like this:

pred_disps[i]*(dataset.K[0][0]*gt_depths[i].shape[1])*256./ratio/10.

(I know depth = (baseline*focal_length)/disparity, but why by ratio, 256, 10 and gt_depths[i].shape[1], which is the width of the image?)

  2. And when saving the uncertainties in line 279, why scale them like this: pred_uncerts[i]*(256*256-1)?

  3. And when using these saved maps in evaluate.py, you again scale pred_depths before comparing with gt_depth. I can't understand line 167, where you divide pred_disp like this: src = pred_disp / 256. / (0.58*gt_depth.shape[1]) * 10 (my attempted reading of the round trip is sketched after this list).

Please help with this issue. @mattpoggi @FilippoAleotti @fabiotosi92
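My attempted reading of the round trip, with 0.58 assumed to be dataset.K[0][0] (the normalized focal length): the constant factors cancel on the way back in, and only the median-scaling ratio stays baked into the stored map.

# Hypothetical round trip between the save scaling in generate_maps.py
# and the load scaling in evaluate.py (names are from the lines quoted above).
width = 1216                                   # gt_depth width, for example
pred_disp, ratio = 0.05, 1.3                   # example values
saved = pred_disp * (0.58 * width) * 256. / ratio / 10.  # 16-bit quantization range
recovered = saved / 256. / (0.58 * width) * 10.          # == pred_disp / ratio
assert abs(recovered - pred_disp / ratio) < 1e-12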

Sparsification plot | doubt

Hi,

The line here is supposed to sort all the uncertainties from high to low, right? But here we are simply negating them; how does that sort them in reverse order?

Your paper also states the same: "Given an error metric, we sort all pixels in each depth map in order of descending uncertainty."

Regards,
Shrisha
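A quick way to see the trick: negating the values turns an ascending argsort into a descending one.

import numpy as np

# Ascending sort of the negated values == descending sort of the originals.
uncert = np.array([0.2, 0.9, 0.1])
order = np.argsort(-uncert)
print(uncert[order])  # [0.9 0.2 0.1] -> descending uncertainty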

Reproducing Monodepth2-Self

Hi,

I want to apply your self-teaching approach to a different dataset and thus have to train it myself. In order to make sure that I get everything right, I am trying to reproduce your results on KITTI first, but so far without success.

First, I generated the teacher ground-truth using a modified test_simple.py, which stores the predicted depth for all 4 scales as .npy files:

python test_simple.py --image_path KITTI_RAW_DATA_PATH --image_file splits/eigen_zhou/train_files.txt --out_folder KITTI_TEACHER_DATA_PATH --model_name mono_640x192 --pred_depth

(Repeated the same with val_files.txt, of course.)

I then trained the student network using the following loss function:

def compute_loss(self, inputs, outputs, use_sigmoid=True):
    final_loss = 0
    losses = {}
    for scale in range(self.num_scales):
        # teacher pseudo ground truth, precomputed per scale
        depth_teacher = inputs[("pred_depth", scale)]
        _, depth_student = layers.disp_to_depth(outputs[("disp", scale)], self.min_depth, self.max_depth)
        if use_sigmoid:
            # interpret the extra channel as sigma in (0, 1)
            sigma = torch.sigmoid(outputs[("uncert", scale)]) + 1e-6
            log_sigma = torch.log(sigma)
        else:
            # interpret the extra channel as log(sigma)
            log_sigma = outputs[("uncert", scale)]
            sigma = torch.exp(log_sigma) + 1e-6
        # L1 residual attenuated by sigma, plus the log regularizer
        l1_loss = torch.abs(depth_student - depth_teacher.cuda())
        loss = l1_loss / sigma + log_sigma
        losses[("l1_loss", scale)] = torch.mean(l1_loss)
        losses[("loss", scale)] = torch.mean(loss)
        final_loss += losses[("loss", scale)]
    final_loss /= self.num_scales
    losses["loss"] = final_loss
    return losses

With use_sigmoid=False, I get a NaN loss in the last epoch. Evaluating the second-to-last checkpoint (weights_18) on eigen_benchmark yields:

abs_rel | sq_rel |  rmse | rmse_log |    a1 |    a2 |    a3
  0.136 |  1.300 | 4.833 |    0.186 | 0.857 | 0.955 | 0.981

     | abs_rel |  rmse |    a1
AUSE |   0.029 | 0.497 | 0.032
AURG |   0.062 | 3.583 | 0.090

With use_sigmoid=True, training finishes without a problem and the last checkpoint (weights_19) yields:

abs_rel | sq_rel |  rmse | rmse_log |    a1 |    a2 |    a3
  0.131 |  1.200 | 4.769 |    0.184 | 0.857 | 0.958 | 0.983

     | abs_rel |  rmse |    a1
AUSE |   0.029 | 0.491 | 0.034
AURG |   0.058 | 3.541 | 0.088

This is still far from what I get with your pre-trained network. Any idea where the problem might be?
I am using PyTorch 1.10.1 instead of 0.4, since that old version is not compatible with newer GPUs, but otherwise I am unable to find any differences from what you have described in your paper and published code. Training the original Monodepth2 with the newer PyTorch version also gives nearly identical results to the published ones (e.g. AbsRel 0.090 -> 0.091, RMSE 3.942 -> 3.993), so I doubt that this is the problem.

Thanks!

Problem with Bayesian uncertainty method

Hi There,

I'm wondering if there is a discrepancy between this line of code and Equation 17 in the paper, which computes the full uncertainty for the Bayesian approach:

pred_uncert = torch.var(disps_distribution, dim=0, keepdim=False) + torch.sum(torch.cat(uncerts_distribution, 0), dim=0, keepdim=False)

As far as I'm aware, the empirical variance of the ensemble should be added to the square of the learned uncertainty output. Am I correct in associating this line of code with Equation 17 of the paper? If so, there might be an issue here. I was wondering if this affects the visualization of uncertainties in the paper.

Thanks!
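In code, the combination I would have expected from Equation 17 reads roughly as follows (a sketch of my reading, not the repository's line):

import torch

# disps: (N, H, W) depth predictions from N ensemble members;
# uncerts: (N, H, W) learned uncertainties, squared before averaging.
def bayesian_uncertainty(disps, uncerts):
    epistemic = torch.var(disps, dim=0)          # empirical ensemble variance
    aleatoric = torch.mean(uncerts ** 2, dim=0)  # mean of squared predictions
    return epistemic + aleatoric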

some questions about my implementation

Hi, @mattpoggi,
Thanks for your excellent work!!
I have some questions about my implementation.

  1. The depth decoder (S) outputs disparity and uncertainty at 4 scales. I would like to know whether you computed all scale outputs when you implemented Eq. 14.
    [image: Eq. 14 from the paper]

  2. If my uncertainty has some negative values, should I clamp it to be > 0?

which split should I choose when training?

Hi, thanks for your great work! I am confused about the dataset split used in training: can you tell me which split files you chose when training the model? Is it 'eigen_zhou', 'benchmark', or another one? It is my first time using the KITTI dataset. Thanks!

How to train the self-teaching network?

Thanks for your nice work! I wonder how to train the self-teaching network; it seems you didn't release the training code. According to Eq. (14), a group of results is needed; will it take a lot of GPU memory during training?
Besides, I didn't find the Drop+self-teaching result in the paper; I wonder about the performance of this variant.

Training code

Hi, Matteo! Will the training code be published in the future?

What is the initial learning rate for the snap model?

Hey @mattpoggi ,

I tried to look for the initial learning rate for the "snap" training mode, but I could not find it. A couple of questions related to Snap:

  1. The original Snapshot Ensembles authors use initial learning rates of 0.2 and 0.1. Did you also use 0.2 as the initial learning rate?
  2. You use C = 20, meaning one snapshot every epoch; do you then choose the last N snapshots or N random ones (N being 8 according to your paper)? Also, T is the total number of training iterations, which for the eigen_zhou split would be (39810 // batch_size) * num_epochs? (See the schedule sketch below.)

Thanks in advance 👍
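For context, the cyclic schedule from the original Snapshot Ensembles paper (Huang et al., 2017), which I assume is the one used here, anneals the learning rate with a cosine within each of the C cycles:

import math

# lr at iteration t (1-indexed) out of T total iterations, with C cycles;
# a snapshot is taken at the end of each cycle, where the lr approaches 0.
def snapshot_lr(lr0, t, T, C):
    cycle_len = math.ceil(T / C)
    return (lr0 / 2.0) * (math.cos(math.pi * ((t - 1) % cycle_len) / cycle_len) + 1.0)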

NaN in loss while training the log model

Hey @mattpoggi ,

I was trying to train the Log model. I made the necessary changes to the decoder to include the additional channel. When I start training, the initial loss is NaN, and after some batches it is NaN again. While debugging the issue, I stumbled upon this piece of code from your decoder.py:

[screenshot: decoder.py, around lines 81-85]

  1. In line 81, a sigmoid is used as in the original Monodepth2 code, but I do not see a sigmoid being applied to the uncertainties in line 85. Is there any reason for this?

  2. I train on the GPU, but for debugging I use the CPU. While debugging on my CPU with batch_size 2 (any larger size causes memory issues), I used breakpoints to inspect the values of uncert.

[screenshot: debugger showing the values of uncert]
As seen in the image, the min value is negative, and the log of a negative number is NaN. This prompted my first question: why are the uncertainties not clamped between 0 (possibly a tiny bit greater, to avoid inf when the log is taken in the loss function) and 1?
Is my understanding right, or have I misunderstood something?

  3. My loss function is
    [image: the loss function I use]

    Is there a problem with this? Do you also use "to_optimise", which is the min over reprojection and identity losses, or just the original reprojection losses?

EDIT: After reading quite a lot, I feel that my log loss is wrong. Maybe the values coming out of the extra channel are already log(uncertainties), so I would have to correct my loss function as below?
[image: the corrected loss function]

EDIT 2: Would the above edit also hold for the self-teaching loss, meaning the uncertainty outputs are actually log(uncertainties), so I have to apply torch.exp() in the loss?

Thanks in advance
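Under the reading in the edits above, a minimal sketch of the corrected loss would be (my assumption: the decoder's extra channel is log(sigma), so no clamping is needed):

import torch

# reprojection_loss: per-pixel photometric error; log_sigma: the decoder's
# extra output channel, interpreted as the log of the uncertainty.
def log_likelihood_loss(reprojection_loss, log_sigma):
    sigma = torch.exp(log_sigma)  # always positive, so the division is safe
    return torch.mean(reprojection_loss / sigma + log_sigma)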

Training Code

Hi, Matteo! Could you share the training code with me, if you don't mind?

How to deal with the negative loss?

Hi!
When I add the uncertainty part (described in the Log section of your paper) to the network, the loss becomes negative.
Is this normal? How did you solve it?

Modifying Eq 12/13 to deal with negative loss

Hi,
Thanks for this great work. In Eq. 12/13 you have a log(variance) term that goes negative for variance < 1. Assuming the variance is always positive, can we modify Eq. 12/13 to use log(1+variance), to prevent a negative loss and allow better optimization?
Will this modification hamper training or the network output during inference?
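In symbols, assuming Eq. 12/13 carry the usual log-likelihood regularizer, the suggested change is

\[ \log \sigma^2 \;\longrightarrow\; \log\left(1 + \sigma^2\right) \geq 0 \quad \text{for } \sigma^2 \geq 0, \]

so the regularizer alone can no longer push the loss below zero.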

lightweight network for self-teaching

When I use Monodepth2 and implement self-teaching, uncertainty does improve the results. But when I use a lightweight network and implement self-teaching, the result is not better but worse. Why?
