packnet-sfm's Introduction

PackNet-SfM: 3D Packing for Self-Supervised Monocular Depth Estimation

Install // Datasets // Training // Evaluation // Models // License // References

** UPDATE **: We have released a new depth estimation repository here, containing code related to our latest publications. It is an updated version of this repository, so if you are familiar with PackNet-SfM you should be able to migrate easily. Future publications will be included in our new repository, and this one will remain as is. Thank you very much for your support over these past couple of years!

Official PyTorch implementation of self-supervised monocular depth estimation methods invented by the ML Team at Toyota Research Institute (TRI), in particular for PackNet: 3D Packing for Self-Supervised Monocular Depth Estimation (CVPR 2020 oral), Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos and Adrien Gaidon. Although self-supervised (i.e. trained only on monocular videos), PackNet outperforms other self, semi, and fully supervised methods. Furthermore, it gets better with input resolution and number of parameters, generalizes better, and can run in real-time (with TensorRT). See References for more info on our models.

This is also the official implementation of Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion (3DV 2020 oral), Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Wolfram Burgard, Greg Shakhnarovich and Adrien Gaidon. Neural Ray Surfaces (NRS) generalize self-supervised depth and pose estimation beyond the pinhole model to all central cameras, allowing the learning of meaningful depth and pose on non-pinhole cameras such as fisheye and catadioptric.

Install

You need a machine with recent Nvidia drivers and a GPU with at least 6GB of memory (more for the bigger models at higher resolutions). We recommend using docker (see nvidia-docker2 instructions) for a reproducible environment. To set up your environment, type the following in a terminal (only tested on Ubuntu 18.04):

git clone https://github.com/TRI-ML/packnet-sfm.git
cd packnet-sfm
# if you want to use docker (recommended)
make docker-build

All commands below are listed as if run directly inside our container. To run any of them in a container, you can either start the container in interactive mode with make docker-start-interactive, which lands you in a shell where you can type those commands, or you can do it in one step:

# single GPU
make docker-run COMMAND="some-command"
# multi-GPU
make docker-run-mpi COMMAND="some-command"

For instance, to verify that the environment is set up correctly, you can run a simple overfitting test:

# download a tiny subset of KITTI
curl -s https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/datasets/KITTI_tiny.tar | tar xv -C /data/datasets/
# in docker
make docker-run COMMAND="python3 scripts/train.py configs/overfit_kitti.yaml"

If you want to use features related to AWS (for dataset access) and Weights & Biases (WANDB) (for experiment management/visualization), then you should create associated accounts and configure your shell with the following environment variables:

export AWS_SECRET_ACCESS_KEY="something"
export AWS_ACCESS_KEY_ID="something"
export AWS_DEFAULT_REGION="something"
export WANDB_ENTITY="something"
export WANDB_API_KEY="something"

To enable WANDB logging and AWS checkpoint syncing, you can then set the corresponding configuration parameters in configs/<your config>.yaml (cf. configs/default_config.py for defaults and docs):

wandb:
    dry_run: True                                 # Wandb dry-run (not logging)
    name: ''                                      # Wandb run name
    project: os.environ.get("WANDB_PROJECT", "")  # Wandb project
    entity: os.environ.get("WANDB_ENTITY", "")    # Wandb entity
    tags: []                                      # Wandb tags
    dir: ''                                       # Wandb save folder
checkpoint:
    s3_path: ''       # s3 path for AWS model syncing
    s3_frequency: 1   # How often to s3 sync

If you encounter out-of-memory issues, try a lower batch_size parameter in the config file.
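
For example, a minimal override might look like this (a sketch; the datasets.train.batch_size field follows configs/default_config.py):

datasets:
    train:
        batch_size: 1   # reduce until training fits in GPU memory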

NB: if you would rather not use docker, you can create a conda environment by following the steps in the Dockerfile, mixing conda and pip at your own risk...

Datasets

Datasets are assumed to be downloaded in /data/datasets/<dataset-name> (can be a symbolic link).
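
For example, if the data lives elsewhere, a symbolic link works (a sketch; /mnt/storage/KITTI_raw is a hypothetical source path):

mkdir -p /data/datasets   # may require sudo depending on permissions
ln -s /mnt/storage/KITTI_raw /data/datasets/KITTI_raw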

Dense Depth for Automated Driving (DDAD)

Together with PackNet, we introduce Dense Depth for Automated Driving (DDAD): a new dataset that leverages diverse logs from TRI's fleet of well-calibrated self-driving cars equipped with cameras and high-accuracy long-range LiDARs. Compared to existing benchmarks, DDAD enables much more accurate 360-degree depth evaluation at range; see the official DDAD repository for more info and instructions. You can also download DDAD directly via:

curl -s https://tri-ml-public.s3.amazonaws.com/github/DDAD/datasets/DDAD.tar | tar -xv -C /data/datasets/

KITTI

The KITTI (raw) dataset used in our experiments can be downloaded from the KITTI website. For convenience, we provide the standard splits used for training and evaluation: eigen_zhou, eigen_train, eigen_val and eigen_test, as well as pre-computed ground-truth depth maps: original and improved. The full KITTI_raw dataset, as used in our experiments, can be directly downloaded here or with the following command:

# KITTI_raw
curl -s https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/datasets/KITTI_raw.tar | tar -xv -C /data/datasets/

Tiny DDAD/KITTI

For simple tests, we also provide a "tiny" version of DDAD and KITTI:

# DDAD_tiny
curl -s https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/datasets/DDAD_tiny.tar | tar -xv -C /data/datasets/
# KITTI_tiny
curl -s https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/datasets/KITTI_tiny.tar | tar -xv -C /data/datasets/

OmniCam

The raw data for the catadioptric OmniCam dataset can be downloaded from the OmniCam website. For convenience, we provide the dataset we used for testing the Neural Ray Surfaces (NRS) model. It can be downloaded with the following command:

# omnicam
curl -s https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/datasets/OmniCam.tar | tar -xv -C /data/datasets/

The ray surface template we used for training on OmniCam can be found here.

Training

PackNet can be trained from scratch in a fully self-supervised way (from video only, cf. CVPR'20), in a semi-supervised way (with sparse lidar using our reprojected 3D loss, cf. CoRL'19), and it can also use a fixed pre-trained semantic segmentation network to guide the representation learning further (cf. ICLR'20).
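
For reference, switching between these modes is a configuration choice. Below is a hedged sketch of what a semi-supervised setup might look like; the SemiSupModel name and the exact loss fields are assumptions inferred from configs/default_config.py, so check the configs folder for the exact names:

model:
    name: 'SemiSupModel'               # assumption: exact model name may differ
    loss:
        supervised_method: 'sparse-l1' # field shown in configs/default_config.py
        supervised_loss_weight: 0.9
datasets:
    train:
        depth_type: ['velodyne']       # sparse LiDAR depth used as supervision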

Any training, including fine-tuning, can be done by passing either a .yaml config file or a .ckpt model checkpoint to scripts/train.py:

python3 scripts/train.py <config.yaml or checkpoint.ckpt>

If you pass a config file, training will start from scratch using the parameters in that config file. Example config files are in configs. If you instead pass a .ckpt file, training will resume from the state stored in that checkpoint.
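
For example (the config file ships with the repository; the checkpoint path is a hypothetical placeholder):

# start training from scratch with a config file
python3 scripts/train.py configs/train_kitti.yaml
# resume training from a previously saved checkpoint
python3 scripts/train.py /data/experiments/my_run/last.ckpt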

Note that it is also possible to define checkpoints within the config file itself. This can be done either individually for the depth and/or pose networks, or by defining a checkpoint for the model itself, which includes all sub-networks (setting the model checkpoint will override the depth and pose checkpoints). In this case, a new training session will start and the networks will be initialized with the model state in the .ckpt file(s). Below we show where these checkpoints are defined in the config file:

checkpoint:
    # Folder where .ckpt files will be saved during training
    filepath: /path/to/where/checkpoints/will/be/saved
model:
    # Checkpoint for the model (depth + pose)
    checkpoint_path: /path/to/model.ckpt
    depth_net:
        # Checkpoint for the depth network
        checkpoint_path: /path/to/depth_net.ckpt
    pose_net:
        # Checkpoint for the pose network
        checkpoint_path: /path/to/pose_net.ckpt

Every aspect of the training configuration can be controlled by modifying the yaml config file. This includes the model configuration (self-supervised, semi-supervised, loss parameters, etc.), the depth and pose network configurations (choice of architecture and different parameters), optimizers and schedulers (learning rates, weight decay, etc.), datasets (name, splits, depth types, etc.) and much more. For a comprehensive list, please refer to configs/default_config.py.
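
As an illustration, a partial config might override just a few of these fields (a sketch; field names follow configs/default_config.py):

model:
    optimizer:
        depth:
            lr: 0.0001                 # depth network learning rate
        pose:
            lr: 0.0001                 # pose network learning rate
    loss:
        ssim_loss_weight: 0.85
datasets:
    augmentation:
        image_shape: (384, 1280)       # train at a higher resolution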

Evaluation

Similar to the training case, to evaluate a trained model (cf. above or our pre-trained models) you need to provide a .ckpt checkpoint, followed optionally by a .yaml config file that overrides the configuration stored in the checkpoint.

python3 scripts/eval.py --checkpoint <checkpoint.ckpt> [--config <config.yaml>]

You can also directly run inference on a single image or folder:

python3 scripts/infer.py --checkpoint <checkpoint.ckpt> --input <image or folder> --output <image or folder> [--image_shape <input shape (h,w)>]
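
For example (a sketch; the checkpoint and folder paths are placeholders to substitute with your own):

python3 scripts/infer.py --checkpoint /data/models/PackNet01_MR_selfsup_K.ckpt --input /data/input_images --output /data/depth_maps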

Models

DDAD

Model | Abs.Rel. | Sqr.Rel. | RMSE | RMSElog | d < 1.25
ResNet18, Self-Supervised, 384x640, ImageNet → DDAD (D) | 0.213 | 4.975 | 18.051 | 0.340 | 0.761
PackNet, Self-Supervised, 384x640, DDAD (D) | 0.162 | 3.917 | 13.452 | 0.269 | 0.823
ResNet18, Self-Supervised, 384x640, ImageNet → DDAD (D)* | 0.227 | 11.293 | 17.368 | 0.303 | 0.758
PackNet, Self-Supervised, 384x640, DDAD (D)* | 0.173 | 7.164 | 14.363 | 0.249 | 0.835
PackNetSAN, Supervised, 384x640, DDAD (D)* | 0.086/0.038 | 1.609/0.546 | 10.700/5.951 | 0.185/0.115 | 0.909/0.976

*: Note that this repository's results differ slightly from the ones reported in our CVPR'20 paper (first two rows), although conclusions are the same. Since CVPR'20, we have officially released an updated DDAD dataset to account for privacy constraints and improve scene distribution. Please use the latest numbers when comparing to the official DDAD release.

KITTI

Model | Abs.Rel. | Sqr.Rel. | RMSE | RMSElog | d < 1.25
ResNet18, Self-Supervised, 192x640, ImageNet → KITTI (K) | 0.116 | 0.811 | 4.902 | 0.198 | 0.865
PackNet, Self-Supervised, 192x640, KITTI (K) | 0.111 | 0.800 | 4.576 | 0.189 | 0.880
PackNet, Self-Supervised Scale-Aware, 192x640, CS → K | 0.108 | 0.758 | 4.506 | 0.185 | 0.887
PackNet, Self-Supervised Scale-Aware, 384x1280, CS → K | 0.106 | 0.838 | 4.545 | 0.186 | 0.895
PackNet, Semi-Supervised (densified GT), 192x640, CS → K | 0.072 | 0.335 | 3.220 | 0.115 | 0.934
PackNetSAN, Supervised (densified GT), 352x1216, K | 0.052/0.016 | 0.175/0.028 | 2.230/0.902 | 0.083/0.032 | 0.970/0.997

All experiments followed the Eigen et al. protocol for training and evaluation, with Zhou et al.'s preprocessing to remove static training frames. The PackNet model pre-trained on Cityscapes used for fine-tuning on KITTI can be found here.

OmniCam

Our NRS model for OmniCam can be found here.

Precomputed Depth Maps

For convenience, we also provide pre-computed depth maps for supervised training and evaluation:

License

The source code is released under the MIT license.

References

PackNet relies on symmetric packing and unpacking blocks to jointly learn to compress and decompress detail-preserving representations using 3D convolutions. It also uses depth super-resolution, which we introduced in SuperDepth (ICRA 2019). Our network can also output metrically scaled depth thanks to our weak velocity supervision (CVPR 2020).
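
To make the packing idea concrete, here is a minimal PyTorch sketch of a packing block, assuming a simplified Space2Depth → 3D convolution → 2D convolution structure as described above; it is an illustration only, not the repository's exact implementation:

import torch
import torch.nn as nn

class PackingBlock(nn.Module):
    """Sketch of a packing block: fold 2x2 spatial neighborhoods into channels
    (Space2Depth), expand the packed structure with a 3D convolution, then
    compress back to 2D features. Simplified for illustration."""
    def __init__(self, in_channels, out_channels, r=2, d=8):
        super().__init__()
        self.d = d
        self.space2depth = nn.PixelUnshuffle(r)   # Space2Depth; needs PyTorch >= 1.8
        self.conv3d = nn.Conv3d(1, d, kernel_size=3, padding=1)
        self.conv2d = nn.Conv2d(in_channels * r * r * d, out_channels,
                                kernel_size=3, padding=1)

    def forward(self, x):
        x = self.space2depth(x)              # B x C*r^2 x H/r x W/r
        b, c, h, w = x.shape
        x = self.conv3d(x.unsqueeze(1))      # B x d x C*r^2 x H/r x W/r
        x = x.view(b, c * self.d, h, w)      # flatten the learned expansion
        return self.conv2d(x)                # B x out_channels x H/r x W/r

if __name__ == "__main__":
    block = PackingBlock(in_channels=64, out_channels=128)
    print(block(torch.randn(2, 64, 96, 320)).shape)  # torch.Size([2, 128, 48, 160])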

We also experimented with sparse supervision from as few as 4-beam LiDAR sensors, using a novel reprojection loss that minimizes distance errors in the image plane (CoRL 2019). By enforcing a sparsity-inducing data augmentation policy for ego-motion learning, we were also able to effectively regularize the pose network and enable stronger generalization performance (CoRL 2019). In a follow-up work, we propose the injection of semantic information directly into the decoder layers of the depth networks, using pixel-adaptive convolutions to create semantic-aware features and further improve performance (ICLR 2020).

Depending on the application, please use the following citations when referencing our work:

3D Packing for Self-Supervised Monocular Depth Estimation (CVPR 2020 oral)
Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos and Adrien Gaidon, [paper], [video]

@inproceedings{packnet,
  author = {Vitor Guizilini and Rares Ambrus and Sudeep Pillai and Allan Raventos and Adrien Gaidon},
  title = {3D Packing for Self-Supervised Monocular Depth Estimation},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  primaryClass = {cs.CV},
  year = {2020},
}

Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion (CVPR 2021)
Vitor Guizilini, Rares Ambrus, Wolfram Burgard and Adrien Gaidon, [paper]

@inproceedings{packnet-san,
  author = {Vitor Guizilini and Rares Ambrus and Wolfram Burgard and Adrien Gaidon},
  title = {Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  primaryClass = {cs.CV},
  year = {2021},
}

Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion (3DV 2020 oral)
Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Wolfram Burgard, Greg Shakhnarovich, Adrien Gaidon, [paper], [video]

@inproceedings{vasiljevic2020neural,
  title={Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion},
  author={Vasiljevic, Igor and Guizilini, Vitor and Ambrus, Rares and Pillai, Sudeep and Burgard, Wolfram and Shakhnarovich, Greg and Gaidon, Adrien},
  booktitle = {International Conference on 3D Vision},
  primaryClass = {cs.CV},
  year={2020}
}

Semantically-Guided Representation Learning for Self-Supervised Monocular Depth (ICLR 2020)
Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus and Adrien Gaidon, [paper]

@inproceedings{packnet-semguided,
  author = {Vitor Guizilini and Rui Hou and Jie Li and Rares Ambrus and Adrien Gaidon},
  title = {Semantically-Guided Representation Learning for Self-Supervised Monocular Depth},
  booktitle = {International Conference on Learning Representations (ICLR)},
  month = {April},
  year = {2020},
}

Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances (CoRL 2019 spotlight)
Vitor Guizilini, Jie Li, Rares Ambrus, Sudeep Pillai and Adrien Gaidon, [paper],[video]

@inproceedings{packnet-semisup,
  author = {Vitor Guizilini and Jie Li and Rares Ambrus and Sudeep Pillai and Adrien Gaidon},
  title = {Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances},
  booktitle = {Conference on Robot Learning (CoRL)},
  month = {October},
  year = {2019},
}

Two Stream Networks for Self-Supervised Ego-Motion Estimation (CoRL 2019 spotlight)
Rares Ambrus, Vitor Guizilini, Jie Li, Sudeep Pillai and Adrien Gaidon, [paper]

@inproceedings{packnet-twostream,
  author = {Rares Ambrus and Vitor Guizilini and Jie Li and Sudeep Pillai and Adrien Gaidon},
  title = {{Two Stream Networks for Self-Supervised Ego-Motion Estimation}},
  booktitle = {Conference on Robot Learning (CoRL)},
  month = {October},
  year = {2019},
}

SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation (ICRA 2019)
Sudeep Pillai, Rares Ambrus and Adrien Gaidon, [paper], [video]

@inproceedings{superdepth,
  author = {Sudeep Pillai and Rares Ambrus and Adrien Gaidon},
  title = {SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  month = {May},
  year = {2019},
}

packnet-sfm's People

Contributors

adriengaidon-tri, ivasiljevic, kuanleetri, spillai, vitorguizilini, vitorguizilini-tri

packnet-sfm's Issues

PackNet Training Based on our own Dataset and Question about train_kitti.yaml

Hi,

I wanted to use the training part for my own dataset. However, I'm not sure what information I need for training.
Do we need point clouds in addition to the images of our own dataset?

Also, in the train_kitti.yaml config file:

" params:
crop: 'garg'
min_depth: 0.0
max_depth: 80.0 "

What are these params, and how did you determine the min and max depth? Are they based only on the KITTI dataset or on something else?

Thank you so much!
Soheil.

question about provided models & results

Hi,

Pretty cool work, thanks for sharing!
I have some questions regarding the provided KITTI models (linked in the README file), and their performance:

  1. Is the performance (RMSE etc.) of all the models reported w.r.t. the improved GT (your paper's Table 3 refers to "Improved [42]")?
  2. Does the last row (with "densified GT") mean you also train using the improved ground truth from ref [42]? (Generally, it would be nice if you could add a column saying which models have which supervision, just like in your Table 3.)
  3. In your paper, the last row in Table 3 is absolutely amazing: 3.293 [m] RMSE, in absolute scale, without any lidar ground truth, only pose information. This makes the gap to supervised methods much smaller. Is this model one of the models provided here?

Thanks for the help!
Z.

Issue with Using Image dataset (the one without depth)

I was trying to use the image_dataset to train on my own dataset; however, it did not work and could not detect the dataset I provided. It gives me this error:

### Preparing Model
Model: SelfSupModel
DepthNet: PackNet01
PoseNet: PoseNet

Preparing Datasets

Setup train datasets

######### 0 (x1): ./data/datasets/CITY_tiny/city_tiny.txt

Setup validation datasets

######### 0: ./data/datasets/CITY_tiny/city_tiny.txt
######### 0: ./data/datasets/CITY_tiny/city_tiny.txt

Setup test datasets

######### 0: ./data/datasets/CITY_tiny/city_tiny.txt

########################################################################################################################

Config: configs.default_config -> ..configs.train_city_tiny.yaml

Name: default_config-train_city_tiny-2020.07.27-19h11m01s

########################################################################################################################
config:
-- name: default_config-train_city_tiny-2020.07.27-19h11m01s
-- debug: False
-- arch:
---- seed: 42
---- min_epochs: 1
---- max_epochs: 50
-- checkpoint:
---- filepath: ./data/experiments_new/default_config-train_city_tiny-2020.07.27-19h11m01s/{epoch:02d}_{CITY_tiny-city_tiny-abs_rel_pp_gt:.3f}
---- save_top_k: 5
---- monitor: CITY_tiny-city_tiny-abs_rel_pp_gt
---- monitor_index: 0
---- mode: min
---- s3_path:
---- s3_frequency: 1
---- s3_url:
-- save:
---- folder:
---- depth:
------ rgb: True
------ viz: True
------ npz: True
------ png: True
---- pretrained:
-- wandb:
---- dry_run: True
---- name:
---- project:
---- entity:
---- tags: []
---- dir:
---- url:
-- model:
---- name: SelfSupModel
---- checkpoint_path:
---- optimizer:
------ name: Adam
------ depth:
-------- lr: 0.0002
-------- weight_decay: 0.0
------ pose:
-------- lr: 0.0002
-------- weight_decay: 0.0
---- scheduler:
------ name: StepLR
------ step_size: 30
------ gamma: 0.5
------ T_max: 20
---- params:
------ crop: garg
------ min_depth: 0.0
------ max_depth: 80.0
---- loss:
------ num_scales: 4
------ progressive_scaling: 0.0
------ flip_lr_prob: 0.5
------ rotation_mode: euler
------ upsample_depth_maps: True
------ ssim_loss_weight: 0.85
------ occ_reg_weight: 0.1
------ smooth_loss_weight: 0.001
------ C1: 0.0001
------ C2: 0.0009
------ photometric_reduce_op: min
------ disp_norm: True
------ clip_loss: 0.0
------ padding_mode: zeros
------ automask_loss: True
------ velocity_loss_weight: 0.1
------ supervised_method: sparse-l1
------ supervised_num_scales: 4
------ supervised_loss_weight: 0.9
---- depth_net:
------ name: PackNet01
------ checkpoint_path:
------ version: 1A
------ dropout: 0.0
---- pose_net:
------ name: PoseNet
------ checkpoint_path:
------ version:
------ dropout: 0.0
-- datasets:
---- augmentation:
------ image_shape: (192, 640)
------ jittering: (0.2, 0.2, 0.2, 0.05)
---- train:
------ batch_size: 1
------ num_workers: 16
------ back_context: 1
------ forward_context: 1
------ dataset: ['Image']
------ path: ['./data/datasets/CITY_tiny']
------ split: ['city_tiny.txt']
------ depth_type: ['']
------ cameras: [[]]
------ repeat: [1]
------ num_logs: 5
---- validation:
------ batch_size: 1
------ num_workers: 8
------ back_context: 0
------ forward_context: 0
------ dataset: ['Image', 'Image']
------ path: ['./data/datasets/CITY_tiny', './data/datasets/CITY_tiny']
------ split: ['city_tiny.txt', 'city_tiny.txt']
------ depth_type: ['', '']
------ cameras: [[], []]
------ num_logs: 5
---- test:
------ batch_size: 1
------ num_workers: 8
------ back_context: 0
------ forward_context: 0
------ dataset: ['Image']
------ path: ['./data/datasets/CITY_tiny']
------ split: ['city_tiny.txt']
------ depth_type: ['']
------ cameras: [[]]
------ num_logs: 5
-- config: ./configs/train_city_tiny.yaml
-- default: configs/default_config
-- prepared: True
########################################################################################################################

Config: configs.default_config -> ..configs.train_city_tiny.yaml

Name: default_config-train_city_tiny-2020.07.27-19h11m01s

########################################################################################################################

0.00 images [00:00, ? images/s]
Traceback (most recent call last):
File "scripts/train.py", line 64, in
train(args.file)
File "scripts/train.py", line 59, in train
trainer.fit(model_wrapper)
File "/workspace/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 58, in fit
self.train(train_dataloader, module, optimizer)
File "/workspace/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 97, in train
return module.training_epoch_end(outputs)
File "/workspace/packnet-sfm/packnet_sfm/models/model_wrapper.py", line 219, in training_epoch_end
loss_and_metrics = average_loss_and_metrics(output_batch, 'avg_train')

I'm afraid the way I set up my own dataset is wrong, or something similar. I would appreciate it if you could help me and tell me how to use Image as the dataset.

Training with ImageDataset?

Hi,
I'm trying to train this model with my private video data, but the program fails to save the checkpoint file.
I'm afraid that something is wrong in ImageDataset or in my config for the dataset.
Could you give me some advice?

The console output looks like the following:

Epoch 0 | Avg.Loss 0.0722: 100%|████████████████| 856/856 [09:35<00:00,  1.49 images/s]
session1-frame_{:05d}-: 100%|████████████████| 746/746 [04:08<00:00, 3.00 images/s]
Traceback (most recent call last):
  File "scripts/train.py", line 62, in <module>
    train(args.file)
  File "scripts/train.py", line 57, in train
    trainer.fit(model_wrapper)
  File "/workspace/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 61, in fit
    self.check_and_save(module, validation_output)
  File "/workspace/packnet-sfm/packnet_sfm/trainers/base_trainer.py", line 42, in check_and_save
    self.checkpoint.check_and_save(module, output)
  File "/workspace/packnet-sfm/packnet_sfm/models/model_checkpoint.py", line 128, in check_and_save
    filepath = self.format_checkpoint_name(epoch, metrics)
  File "/workspace/packnet-sfm/packnet_sfm/models/model_checkpoint.py", line 117, in format_checkpoint_name
    filename = filename.format(**metrics)
ValueError: unexpected '{' in field name

My config for the dataset is like the following:

datasets:
    ...
    train:
        batch_size: 2
        dataset: ['Image']
        path: ['/workspace/packnet-sfm/data/my_video_001/session0']
        split: ["frame_{:05d}"]
        repeat: [1]

I have a number of images for training data and the path is like:
/workspace/packnet-sfm/data/my_video_001/session0/frame_00123.png

I looked into the code and found that ModelCheckpoint's filename had the value of:
epoch={epoch:02d}_session1-frame_0000{={session1-frame_0000{:01d}--loss:.3f}.
This was the direct cause of the error above.

Thanks in advance,

Simple overfitting test error

[screenshot of the error, 2020-07-21]
I am getting this error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). How should I fix this?
Thanks!

NuScenes Dataset Loader

Hi,

First of all, I want to join the others in congratulating you on your great work and for providing this well-documented code! Thanks.

I am currently trying to write a data loader for the NuScenes dataset. The data split files are formatted as follows

sample_token | backward_context_png | forward_context_png

and the sample is then constructed using this routine:

[image: sample construction code]

However, my training results look as follows:
[image: evaluation output at epoch 2 (evalImg_ep00002)]

One problem I encountered (I am not certain whether it's related or not) is that I had to change the cfg.checkpoint.monitor entry from "loss" to "abs_rel", since the "loss" entry is not initialized if training is False.

My questions are:

  • Do you already have a NuScenes Dataloader which you could share, since you mentioned experimenting with NuScenes in your paper, too?
  • Did you experience such behavior before, or do you have any insights on what might go wrong?
  • Why did you set cfg.checkpoint.monitor = "loss" when "loss" cannot be defined in the validation step? I am guessing that I did something wrong but don't know what.
  • How should the GT depth maps be defined? Zeros everywhere we don't have detections and depth values on pixels with detections, or do you specify them directly as inverse depth maps?

Thanks in advance for your time and help.

Issue with saving our own checkpoint (.ckpt) file after training

Hi,

I have a question regarding training on my own dataset. I first tried to use the KITTI_tiny dataset for training and build my own .ckpt file, but I was not able to. I made some changes in the .yaml file and defined the save directory for checkpoints there, but I could not find the saved checkpoint and do not know how to do it. I would appreciate your help.

Thanks,
Soheil.

Error on step 5: bash evaluate_kitti.sh

docker build -t packnet-sfm:master-latest . -f docker/Dockerfile
ERRO[0001] failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: connect: permission denied
error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=docker%2FDockerfile&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&session=73lzetomzynf87189u2c8lnrk&shmsize=0&t=packnet-sfm%3Amaster-latest&target=&ulimits=null&version=1: context canceled
Makefile:33: recipe for target 'docker-build' failed
make: *** [docker-build] Error 1

Question about using monocular camera images dataset

I was checking the KITTI dataset you used and noticed there are right and left camera images for training. Is it possible to use only one camera for training with PackNet (i.e. not using both left and right camera images)?

Thanks,
Soheil

Why 3D conv?

First of all, congrats on the impressive work! The image reconstruction sanity check is highly inspiring.

I have a question regarding why PackNet uses 3d conv.

I think what PackNet wants to do is blend the 2x2 spatial content that is now scattered into the channel dimension, so it uses the 3rd dimension to blend the channels. Maybe group conv makes more sense in this application?
[attached image]

Another comment: the paper mentions that "2D conv are not designed to directly leverage the tiled structure of this feature space, instead, we propose to first learn to expand this structured representation via a 3d conv layer." I did not actually see in the ablation study how this is the case -- I only see that with 3D conv the results improved, but perhaps this is due to the increased number of parameters in the model?

Thank you very much for your insights!

can't get the desired training result

Hi there,
Thanks so much for sharing the training code. I tried to run the self-sup model on the KITTI dataset, but I get a weird result after running for a few epochs from the pretrained model.

[screenshot from 2020-05-19]

The loss gets smaller during training, but the evaluation error metrics get higher every epoch. Have you ever encountered this situation?

@AdrienGaidon-TRI @VitorGuizilini-TRI @spillai

Size of produced checkpoint (.ckpt) files

I have a question about the size of the checkpoint files. Why do they all have the same size (519.6 MB)? Even when I train on a tiny dataset version, the produced checkpoint file has the same size as the ones you shared. Is there a specific reason?

Fine-tuning performs poorly on my own dataset

Hi, thanks for the great work! I am trying to fine-tune 'PackNet01_MR_selfsup_K.ckpt' on my own dataset, but the performance is not good. Here are some examples:

If I use 'PackNet01_MR_selfsup_K.ckpt' without fine-tuning, the result looks like this:
[attached image]

After fine-tuning this model, I get this result:
[attached image]

The following is the yaml file I used for training.

arch:
    max_epochs: 10
checkpoint:
    filepath: /workspace/packnet-sfm/results/chept
    save_top_k: -1
model:
    name: 'SelfSupModel'
    checkpoint_path: /data/models/PackNet01_MR_semisup_CStoK.ckpt
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.0002
        pose:
            lr: 0.0002
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: 'garg'
        min_depth: 0.0
        max_depth: 80.0
datasets:
    augmentation:
        image_shape: (192, 640)
    train:
        batch_size: 1
        dataset: ['Image']
        path: ['my_training_dataset']
        split: ['{:03d}']
        repeat: [1]
    validation:
        dataset: ['Image']
        path: ['my_eval_dataset']
        split: ['{:03d}']
    test:
        dataset: ['Image']
        path: ['my_test_dataset']
        split: ['{:03d}']

I also changed the dummy_calibration params inside datasets/image_dataset.py:

def dummy_calibration(image):
    w, h = [float(d) for d in image.size]
    return np.array([[343.169585 , 0.    , 321.358181],
                     [0.    , 344.619087, 93.813611],
                     [0.    , 0.    , 1.          ]])

My dataset contains 3k images at a resolution of (192, 640) (mono images only).
The training is done on 4 x Tesla M40, with NVIDIA-SMI 418.56, Driver Version 418.56, CUDA Version 10.1.
After training, the 'Avg.Loss' drops from 0.0370 to 0.0350 (it drops really slowly).

I know my number of training epochs is too low; I am still training now, but the Avg.Loss stays around 0.0350.
Are there any other possible reasons for this poor result? Any help or suggestions would be appreciated. Thanks!

About the warp_ref_image

Hi Vitor,

in

cams.append(Camera(K=K.float()).scaled(scale_factor).to(device))
ref_cams.append(Camera(K=ref_K.float(), Tcw=pose).scaled(scale_factor).to(device))

you only provide one scale to the camera's scaling function. Wouldn't this mean that, in case my image isn't scaled equally in the x and y directions, the camera intrinsics matrix is scaled incorrectly, or is this addressed somewhere else?

Thanks for your time and patience in explaining your code to all of us ^^

Two-stream ego-motion

Hi there,

Thanks for this great work.

I am trying to apply the two-stream ego-motion network to map the internal part of the shoulder recorded during arthroscopy. So far, I have tried to use the pose and depth networks separately. The video frames from the internal part of the joint have a serious problem with lack of texture, so I pretrained the MonoDepth network (Godard 2017) on high-texture environments first and then trained it on the arthroscopic images, and I achieved some success in getting depth for the arthroscopic frames. However, with the pose network I have had little success so far. I trained the pose network in a supervised manner where the coordinates of each frame are known with respect to the center (absolute pose), and it only works well when the same joint is used.

I was thinking of using the two-stream ego-motion network, which gives the relative pose of the target image I_t with respect to the source image I_s. But I am guessing this may not work for my case, as in arthroscopic videos there are occasions when the whole frame is suddenly obscured by tissue while the camera is moving. In that case, I guess there will be no similarity between I_t and I_s (which is the previous frame?). This could lead to a huge drift over time. But when reading your paper I noticed that the similarity-matching cost function (Eq. 3) tries to match I_t to the context images I_S (which I guess are all the available training images?). In that case, and for my application, the pose network should be able to estimate the pose of the first frame right after the obscuring tissue is removed. Or is it otherwise, and after training the pose network only estimates the pose of I_t with respect to I_s?

Regards,
Jacob

Training on NuScenes

Hello!

First and foremost I would like to congratulate you on your great work!

I have read in your paper that you did some testing on the NuScenes dataset, and I was wondering if, in the meantime, you have also done some training on it. This is something I am trying to do right now.

I read about the hierarchy of files and folders in the dataset.proto file, but I could not find anything about how those .json files are generated.

What have you used to generate them?

I am asking because the nuScenes dataset also has some .json files, but they are structured very differently, and I don't know how I can generate .json files with the same structure as yours from the nuScenes ones.

Thank you in advance for your response!

Working directory mismatch in Docker environment

Hi,
In commit f41342d, when you run make docker-run and other Docker-related targets, the default working directory is set to /workspace/packnet-sfm/, but everything (code, data, configs...) is prepared under /workspace/packnet/.

This causes python3 scripts/train.py checkpoint.ckpt to fail, because train.py is actually at ../packnet/scripts/train.py.

Issue with evaluation on the KITTI dataset (first model: ResNet18, Self-Supervised, 192x640, ImageNet → KITTI (K))

Traceback (most recent call last):
File "scripts/eval.py", line 66, in
test(args.checkpoint, args.config, args.half)
File "scripts/eval.py", line 52, in test
model_wrapper.load_state_dict(state_dict)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 830, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ModelWrapper:

Needed GPU Memory

Is there any possibility of using the network with a 4GB GPU? Thanks in advance.

Did you compare packnet to resnet50 monodepth on kitti dataset?

Hello,

Great paper! In Table 3 of your paper, you show that the absolute relative error for monodepth2 ResNet was 0.115, while PackNet was 0.111. This doesn't seem like a very big difference. Was the monodepth2 backbone ResNet18 or ResNet50? If it was only ResNet18, I would suspect that ResNet50 would shrink this small improvement even further. But on your DDAD dataset results, the difference between PackNet and monodepth2 with ResNet18 and ResNet50 is huge.

Can you please elaborate on this?

Thank you.

Issue with saving the output from the Infer.py

My issue is that although I'm running the program correctly and it prints "saving ... to ...", I cannot see the output.

I have no idea why I am getting this issue, but I assume it is related to a permissions problem.

I would appreciate it if someone could help me and give me an idea of how to fix it.

About memory leaks

Hi! When training the SelfSupModel, I found that memory usage grows slowly (from 21G to 38G in one epoch). Does that mean there is a memory leak? Any solution?

Reconstruction loss experiment

Hello, thanks for the great work!

In Figure 4 of your paper, we can see a comparison of the input image, the reconstruction from the classical conv + max-pool + bilinear upsampling, and the reconstruction from the PackNet architecture.

Can you provide code for this experiment?

Evaluation of models

KITTI_raw-eigen_test_files-velodyne: 0.00 images [00:00, ? images/s]
Traceback (most recent call last):
File "scripts/eval.py", line 66, in
test(args.checkpoint, args.config, args.half)
File "scripts/eval.py", line 61, in test
trainer.test(model_wrapper)
File "/workspace/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 128, in test
self.evaluate(test_dataloaders, module)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
return func(*args, **kwargs)
File "/workspace/packnet-sfm/packnet_sfm/trainers/horovod_trainer.py", line 152, in evaluate
return module.test_epoch_end(all_outputs)
File "/workspace/packnet-sfm/packnet_sfm/models/model_wrapper.py", line 262, in test_epoch_end
output_data_batch, self.test_dataset, self.metrics_name)
File "/workspace/packnet-sfm/packnet_sfm/utils/reduce.py", line 53, in all_reduce_metrics
names = [key for key in list(output_data_batch[0][0].keys()) if key.startswith(name)]
IndexError: list index out of range
Makefile:77: recipe for target 'docker-run' failed

I got this error when I tried to evaluate the model.
Also, I did not use the whole KITTI dataset; I only used part of it.

How to configure this project in IDE like Pycharm?

Hi,
This is really great work. I'm trying to use this network on my data, and I'm able to run the code in a shell using the Makefile you provided. But I hope to run it in PyCharm so that debugging is easier. However, since the project is packaged in Docker and the commands are written in the Makefile, I'm not sure how to configure the project in PyCharm. I searched for many tutorials online but couldn't find a working one, maybe due to my limited understanding of Docker.
I'm wondering if you used an IDE during the development of this project. Could you give some instructions on how to configure this project in PyCharm?
Best,

Convert pose sequence to trajectory

Hi,
Can you please share the code you used in one of your experiments for constructing the traversed route from the sequence of pose estimations?
For some reason I'm having trouble with that.
Thanks!

How to force the model to learn the depth in meters?

Hi!

Thanks for your great work! But I'm curious how to force the model to learn depth in meters, as the DepthDecoder only predicts values in 0~1.

In my mind, the PoseNet may predict the true scale thanks to the velocity supervision, but the output of the DepthNet must be multiplied by a factor to restore the correct scale. Is that right?

How to split the Cityscapes for pretraining?

Is there a .txt file containing the training files, just like the eigen_zhou files you provide in the readme?

I would appreciate it if you could share the dataset file for CS and its training file list. Thanks!

Two stage training code to solve infinite depth

Hi,

First off, really good work :) Are there any plans to add the two-stage training technique code? Is it handled as a post-processing step, by creating point clouds from depth estimates and applying RANSAC to determine the ground plane?

using pre-trained semantic segmentation

Hi,
First of all thank you for releasing this awesome repository and for your continuous support.
You mention in your readme that the model "can also use a fixed pre-trained semantic segmentation network to guide the representation learning further", but I didn't see any reference to it in the code.

Do you plan to include this feature in any of your upcoming releases?
Thanks

details about your tensorrt implementation

Dear authors, I'm trying to accelerate your provided models using TensorRT. However, I ran into some problems.
My environment is as follows:
TensorRT 7.0, onnx-tensorrt, pytorch 1.4.0, onnx 1.6.0
When converting the ONNX model to TensorRT, I found that some operations (Pad, group norm) of the given model couldn't be parsed successfully. I was thinking that using the newest TensorRT would cover all operations. So what's your TensorRT version? If possible, would you mind sharing your converted ONNX model with me? @AdrienGaidon-TRI @VitorGuizilini-TRI @spillai

Failed to convert depth_net model to TensorRT model

Hi, I noticed that in your CVPR paper you said:

an inference time of 60ms on a Titan V100 GPU, which can be further improved to < 30ms using TensorRT

So I tried to use TensorRT to improve the performance:

  1. Using the following code to export the ONNX model:
dummy_input = torch.randn(1, 3, 192, 640, device='cuda')
torch.onnx.export(model_wrapper.model.depth_net, dummy_input, "/my_folder/dump.onnx",opset_version=11)
  2. Using the convert tool (trtexec) provided by TensorRT to convert the ONNX model to a TRT model:
./trtexec --onnx=/my_folder/dump.onnx --saveEngine=dump.trt

However, step 2 failed; the log is below:

[07/23/2020-17:14:21] [I] === Model Options ===
[07/23/2020-17:14:21] [I] Format: ONNX
[07/23/2020-17:14:21] [I] Model: /data/exp_results/complex_model/dump.onnx
[07/23/2020-17:14:21] [I] Output:
[07/23/2020-17:14:21] [I] === Build Options ===
[07/23/2020-17:14:21] [I] Max batch: 1
[07/23/2020-17:14:21] [I] Workspace: 16 MB
[07/23/2020-17:14:21] [I] minTiming: 1
[07/23/2020-17:14:21] [I] avgTiming: 8
[07/23/2020-17:14:21] [I] Precision: FP32
[07/23/2020-17:14:21] [I] Calibration: 
[07/23/2020-17:14:21] [I] Safe mode: Disabled
[07/23/2020-17:14:21] [I] Save engine: dump.trt
[07/23/2020-17:14:21] [I] Load engine: 
[07/23/2020-17:14:21] [I] Builder Cache: Enabled
[07/23/2020-17:14:21] [I] NVTX verbosity: 0
[07/23/2020-17:14:21] [I] Inputs format: fp32:CHW
[07/23/2020-17:14:21] [I] Outputs format: fp32:CHW
[07/23/2020-17:14:21] [I] Input build shapes: model
[07/23/2020-17:14:21] [I] Input calibration shapes: model
[07/23/2020-17:14:21] [I] === System Options ===
[07/23/2020-17:14:21] [I] Device: 0
[07/23/2020-17:14:21] [I] DLACore: 
[07/23/2020-17:14:21] [I] Plugins:
[07/23/2020-17:14:21] [I] === Inference Options ===
[07/23/2020-17:14:21] [I] Batch: 1
[07/23/2020-17:14:21] [I] Input inference shapes: model
[07/23/2020-17:14:21] [I] Iterations: 10
[07/23/2020-17:14:21] [I] Duration: 3s (+ 200ms warm up)
[07/23/2020-17:14:21] [I] Sleep time: 0ms
[07/23/2020-17:14:21] [I] Streams: 1
[07/23/2020-17:14:21] [I] ExposeDMA: Disabled
[07/23/2020-17:14:21] [I] Spin-wait: Disabled
[07/23/2020-17:14:21] [I] Multithreading: Disabled
[07/23/2020-17:14:21] [I] CUDA Graph: Disabled
[07/23/2020-17:14:21] [I] Skip inference: Disabled
[07/23/2020-17:14:21] [I] Inputs:
[07/23/2020-17:14:21] [I] === Reporting Options ===
[07/23/2020-17:14:21] [I] Verbose: Disabled
[07/23/2020-17:14:21] [I] Averages: 10 inferences
[07/23/2020-17:14:21] [I] Percentile: 99
[07/23/2020-17:14:21] [I] Dump output: Disabled
[07/23/2020-17:14:21] [I] Profile: Disabled
[07/23/2020-17:14:21] [I] Export timing to JSON file: 
[07/23/2020-17:14:21] [I] Export output to JSON file: 
[07/23/2020-17:14:21] [I] Export profile to JSON file: 
[07/23/2020-17:14:21] [I] 
----------------------------------------------------------------
Input filename:   /data/exp_results/complex_model/dump.onnx
ONNX IR version:  0.0.4
Opset version:    11
Producer name:    pytorch
Producer version: 1.3
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[07/23/2020-17:14:23] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/23/2020-17:14:23] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
ERROR: builtin_op_importers.cpp:2179 In function importPad:
[8] Assertion failed: inputs.at(1).is_weights()
[07/23/2020-17:14:23] [E] Failed to parse onnx file
[07/23/2020-17:14:23] [E] Parsing model failed
[07/23/2020-17:14:23] [E] Engine creation failed
[07/23/2020-17:14:23] [E] Engine set up failed

I tried the DepthResNet and PackNet01 models; both failed with this error:

ERROR: builtin_op_importers.cpp:2179 In function importPad:
[8] Assertion failed: inputs.at(1).is_weights()

Have you met this error before? Any suggestions? Thanks.

Query regarding Self Supervised learning model

Hi, thanks for uploading the code. I had a few queries:

  1. Since we are specifying the camera intrinsics, I believe the pre-trained models won't give good results on datasets other than the ones they were trained on?
  2. Will the model learn depth in a scenario where there are very few distinct objects in the scene and the environment looks like a repeating texture, e.g. desert, farms, etc.? Are these the situations where the model can benefit from some sort of supervision (semi-supervised)?
  3. For the custom Image model, since we specify the image resolution in the config YAML file, do we need to account for the new resolution in the dummy camera matrix manually, or is there a function somewhere that premultiplies the camera matrix with a scale matrix?

reproduction of the monodepth2 with resnet18

Hey! Thanks for your wonderful work!

Now I'm trying to reproduce the result of monodepth2 with a ResNet18 backbone. I used train_kitti.yaml to train for about 50 epochs, but the result is not good: the loss is around 0.075 but abs_rel is only 0.125. I think it may be overfitting due to imperfect hyperparameters. Can you share the .yaml file for training monodepth2?

About ADE20k datasets

Hi @VitorGuizilini,

Thanks for your code and the great work!
I am trying to train your model on the ADE20K (MIT) dataset, which has no depth information or parameters like camera intrinsics. In your paper, you trained your model on Cityscapes, whose situation is similar to ADE20K. Could you tell me how to train on datasets like ADE20K and Cityscapes?
Thank you very much!!

Inference on video

Hi @VitorGuizilini,

Thanks for your code and the great work! I am trying to run inference on a custom video/image sequence, and I was wondering whether this requires a major change in the config or should be done frame-by-frame on the video?

Thanks in advance

Solving the Infinite Depth Problem

I read the paper carefully, but it seems that you didn't address this problem specifically. The weak velocity supervision doesn't explicitly supervise this part, in my opinion, as there is no clear mathematical evidence that it does. You show three attractive depth predictions in the readme where the holes are gone, but I don't think that means you solved it...

Did I miss anything in the paper? Thank you.

Which GPU is used? ("Titan V100 GPU")

Thanks for your contribution first of all.

The paper states "an inference time of 60ms on a Titan V100 GPU". If I'm not mistaken, such a GPU does not exist; it is either the "Titan V" or the "Tesla V100". Could you please clarify?

About the training speed

Thanks for sharing this wonderful work!

I'm trying to train the PackNet model with a single RTX 2080 card, using batch_size=1 (due to the limited memory). But it basically needs 7 seconds to perform an iteration (forward and backward), while DispNet only needs 0.2s. Is this the expected training speed for the model, or is something wrong?

How to perform multi-GPU training?

Hello,
I'm trying to train a model with VelSupModel, and I found that only a single GPU is being used. How can I perform multi-GPU training? Thanks.
