
d4lcn's Introduction

D4LCN: Learning Depth-Guided Convolutions for Monocular 3D Object Detection (CVPR 2020)

Mingyu Ding, Yuqi Huo, Hongwei Yi, Zhe Wang, Jianping Shi, Zhiwu Lu, Ping Luo


Introduction

Our framework is implemented and tested on Ubuntu 16.04 with CUDA 8.0/9.0, Python 3, PyTorch 0.4/1.0/1.1, and NVIDIA Tesla V100/TITAN X GPUs.

If you find our work useful in your research, please consider citing our paper:

@inproceedings{ding2020learning,
  title={Learning Depth-Guided Convolutions for Monocular 3D Object Detection},
  author={Ding, Mingyu and Huo, Yuqi and Yi, Hongwei and Wang, Zhe and Shi, Jianping and Lu, Zhiwu and Luo, Ping},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11672--11681},
  year={2020}
}

Requirements

  • CUDA & cuDNN & Python & PyTorch

    This project is tested with CUDA 8.0/9.0, Python 3, PyTorch 0.4/1.0/1.1, and NVIDIA Tesla V100/TITAN X GPUs. Almost all of the packages we use are covered by Anaconda.

    Please install the proper CUDA and cuDNN versions, then install Anaconda3 and PyTorch.

  • My settings

    source ~/anaconda3/bin/activate (python 3.6.5)
      (base)  pip list
      torch                              1.1.0
      torchfile                          0.1.0
      torchvision                        0.3.0
      numpy                              1.14.3
      numpydoc                           0.8.0
      numba                              0.38.0
      visdom                             0.1.8.9
      opencv-python                      4.1.0.25
      easydict                           1.9
      Shapely                            1.6.4.post2

Data preparation

Download and unzip the full KITTI detection dataset to the folder /path/to/kitti/. Then place a softlink (or the actual data) in data/kitti/. There are two widely used training/validation splits of the KITTI dataset. Here we only show the setup for split1; split2 can be set up accordingly.

cd D4LCN
ln -s /path/to/kitti data/kitti
ln -s /path/to/kitti/testing data/kitti_split1/testing

Our method uses DORN (or other monocular depth models) to extract depth maps for all images. You can download and unzip the depth maps extracted by DORN here and place them (or a softlink) in the folder data/kitti/depth_2/. (You can also change the path in the script setup_depth.py.)

Then use the following scripts to extract the data splits, which use softlinks to the above directory for efficient storage.

python data/kitti_split1/setup_split.py
python data/kitti_split1/setup_depth.py

Next, build the KITTI devkit eval for split1.

sh data/kitti_split1/devkit/cpp/build.sh

Lastly, build the nms modules:

cd lib/nms
make

Training

We use visdom for visualization and graphs. Optionally, start the server from the command line:

sh visdom.sh

The port can be customized in config files. The training monitor can be viewed at http://localhost:9891.

You can change the batch_size according to the number of GPUs; the default is 4 GPUs with batch_size = 8.
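
For reference, the released config (reproduced in the issue dump below) follows a two-images-per-GPU convention (conf.batch_size = 2 * 4); a minimal sketch of the corresponding change, assuming you keep that convention, is:

num_gpus = 2                    # however many GPUs you actually train on
conf.batch_size = 2 * num_gpus  # 8 for the default 4-GPU setup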

If you want to use the ResNet backbone pre-trained on the COCO dataset, it can be downloaded from git or Google Drive; the default is the ImageNet-pretrained PyTorch model. You can also set use_corner and corner_in_3d to False for quicker training.

See the configurations in scripts/config/depth_guided_config and scripts/train.py for details.

sh train.sh

Testing

We provide the weights, model, and config file for the val1 data split, available for download.

Testing requires the paths to the configuration file and the model weights, which are exposed as variables near the top of scripts/test.py. To test a configuration and model, simply update the variables and run the test file as below.

sh test.sh
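
For reference, the two exposed variables might look like the sketch below; the exact variable names in scripts/test.py may differ, and the paths are placeholders for wherever you saved the downloaded files:

conf_path = 'path/to/downloaded/config'      # configuration file from the release
weights_path = 'path/to/downloaded/weights'  # model checkpoint for the val1 split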

Acknowledgements

We thank Garrick Brazil for his great work and repositories.

Contact

For questions regarding D4LCN, feel free to post here or directly contact the authors ([email protected]).

d4lcn's People

Contributors

dingmyu


d4lcn's Issues

Why do you set "use_corner = False, corner_in_3d = False, use_hill_loss = False" ?

Hi @dingmyu, thanks for your work! I have some questions about the losses.

  1. In the config, you set conf.corner_in_3d = False, so you don't compute the 3D corner differences with respect to the GT, am I right?

  2. For use_hill_loss, I guess the hill loss is for the post 3D→2D optimization, i.e., computing the L1 loss of the reprojected 2D coordinates with respect to the GT, am I right?

  3. Why do you disable the two above-mentioned losses in the config? Does setting them to True deteriorate the final performance?
    Thank you.

Depth Images for Testing set

Hi,

Thank you for releasing your work. I found that the depth_2 folder only contains depth images for the training set. Could you share the depth images generated using DORN for the testing set?

Thank you for your attention.

sh train.sh: TypeError: cannot create 'generator' instances

Can someone help me solve this problem? There is a problem when I run train.sh:

/home/lab201/wangziniu/D4LCN-master/lib/lr.py:100: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
File "/home/lab201/wangziniu/D4LCN-master/scripts/train.py", line 215, in
main(sys.argv[1:])
File "/home/lab201/wangziniu/D4LCN-master/scripts/train.py", line 128, in main
cls, prob, bbox_2d, bbox_3d, feat_size = rpn_net(images.cuda(), depths.cuda())
File "/home/lab201/anaconda3/envs/D4LCN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/lab201/anaconda3/envs/D4LCN/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.gather(outputs, self.output_device)
File "/home/lab201/anaconda3/envs/D4LCN/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/lab201/anaconda3/envs/D4LCN/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 67, in gather
return gather_map(outputs)
File "/home/lab201/anaconda3/envs/D4LCN/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
TypeError: cannot create 'generator' instances

Replicate results from the paper

Dear authors,

thank you very much for your work. I would like to ask you a few questions.

First, when I evaluate your provided network, I get the following results:

OLD_test_iter pretrain 2d car --> easy: 0.9277, mod: 0.8439, hard: 0.6785
NEW_test_iter pretrain 2d car --> easy: 0.9342, mod: 0.8377, hard: 0.6742
OLD_test_iter pretrain gr car --> easy: 0.3349, mod: 0.2507, hard: 0.1983
NEW_test_iter pretrain gr car --> easy: 0.3225, mod: 0.2268, hard: 0.1722
OLD_test_iter pretrain 3d car --> easy: 0.2490, mod: 0.2077, hard: 0.1729
NEW_test_iter pretrain 3d car --> easy: 0.2317, mod: 0.1621, hard: 0.1234
OLD_test_iter pretrain 2d pedestrian --> easy: 0.6618, mod: 0.5812, hard: 0.4975
NEW_test_iter pretrain 2d pedestrian --> easy: 0.6896, mod: 0.5670, hard: 0.4756
OLD_test_iter pretrain gr pedestrian --> easy: 0.0628, mod: 0.0512, hard: 0.0483
NEW_test_iter pretrain gr pedestrian --> easy: 0.0471, mod: 0.0391, hard: 0.0321
OLD_test_iter pretrain 3d pedestrian --> easy: 0.0436, mod: 0.0445, hard: 0.0396
NEW_test_iter pretrain 3d pedestrian --> easy: 0.0371, mod: 0.0293, hard: 0.0270
OLD_test_iter pretrain 2d cyclist --> easy: 0.6234, mod: 0.4608, hard: 0.3972
NEW_test_iter pretrain 2d cyclist --> easy: 0.6301, mod: 0.4180, hard: 0.3816
OLD_test_iter pretrain gr cyclist --> easy: 0.0344, mod: 0.0296, hard: 0.0306
NEW_test_iter pretrain gr cyclist --> easy: 0.0295, mod: 0.0168, hard: 0.0168
OLD_test_iter pretrain 3d cyclist --> easy: 0.0293, mod: 0.0270, hard: 0.0262
NEW_test_iter pretrain 3d cyclist --> easy: 0.0263, mod: 0.0149, hard: 0.0148

These are OK results for the car class but not for the pedestrian and cyclist classes. Also, these results are not the same as those you provide in your paper. I mean these results:
(screenshot of the results table from the paper)

Also, when I run train.sh, I get results similar to those I get with the provided model, but they are still not the same as in the paper. In fact, my trained model is significantly better for the pedestrian class and better for the cyclist class.

OLD_test_iter 40000 2d car --> easy: 0.8290, mod: 0.7506, hard: 0.5892
NEW_test_iter 40000 2d car --> easy: 0.8759, mod: 0.7708, hard: 0.6137
OLD_test_iter 40000 gr car --> easy: 0.3448, mod: 0.2528, hard: 0.2053
NEW_test_iter 40000 gr car --> easy: 0.3066, mod: 0.2115, hard: 0.1653
OLD_test_iter 40000 3d car --> easy: 0.2671, mod: 0.1953, hard: 0.1754
NEW_test_iter 40000 3d car --> easy: 0.2230, mod: 0.1503, hard: 0.1193
OLD_test_iter 40000 2d pedestrian --> easy: 0.5670, mod: 0.4883, hard: 0.4096
NEW_test_iter 40000 2d pedestrian --> easy: 0.5822, mod: 0.4813, hard: 0.3946
OLD_test_iter 40000 gr pedestrian --> easy: 0.1323, mod: 0.1156, hard: 0.1137
NEW_test_iter 40000 gr pedestrian --> easy: 0.0528, mod: 0.0424, hard: 0.0351
OLD_test_iter 40000 3d pedestrian --> easy: 0.0473, mod: 0.0482, hard: 0.0413
NEW_test_iter 40000 3d pedestrian --> easy: 0.0405, mod: 0.0314, hard: 0.0287
OLD_test_iter 40000 2d cyclist --> easy: 0.4861, mod: 0.3255, hard: 0.3241
NEW_test_iter 40000 2d cyclist --> easy: 0.4460, mod: 0.2657, hard: 0.2633
OLD_test_iter 40000 gr cyclist --> easy: 0.1132, mod: 0.1058, hard: 0.1064
NEW_test_iter 40000 gr cyclist --> easy: 0.0375, mod: 0.0242, hard: 0.0238
OLD_test_iter 40000 3d cyclist --> easy: 0.1070, mod: 0.0909, hard: 0.0909
NEW_test_iter 40000 3d cyclist --> easy: 0.0213, mod: 0.0141, hard: 0.0144

Can you please tell me how I can obtain the same results as in the paper?
Thank you!

Meaning of the Results

Thanks for your work. I just wonder about the meaning of these fields in the results:
acc - bg
acc - fg
misc - ry
misc - z
dt

FileNotFoundError: [Errno 2] No such file or directory:'/home/rhett.wang/pyProjects/D4LCN/data/kitti_split1/validation/calib/000000.txt'

Hi, Mingyu
I am new to KITTI. I downloaded the dataset (data_object_image_2 and label_2) and tried to train and test your model, but this error appeared:
FileNotFoundError: [Errno 2] No such file or directory:'/home/rhett.wang/pyProjects/D4LCN/data/kitti_split1/validation/calib/000000.txt'

I think it must be my misunderstanding of KITTI and that I forgot to download something else needed for training. I failed to download KITTI from the official website because of the firewall, so I got it from Baiduyun, where the directory structure had been changed. Could you tell me which parts of KITTI are used by your model and what the expected directory structure is?

Pretrained model load failure

Hi, I tried your code for testing using the pretrained model, but I am receiving this error:
Traceback (most recent call last):
File "scripts/test.py", line 48, in
load_weights(net, weights_path, remove_module=True)
File "/data/pri/D4LCN/lib/core.py", line 405, in load_weights
src_weights = torch.load(path)
File "/home/archbinder/anaconda3/envs/d4lcn/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/home/archbinder/anaconda3/envs/d4lcn/lib/python3.6/site-packages/torch/serialization.py", line 574, in _load
result = unpickler.load()
MemoryError

I have verified that I have enough memory to load the model on both CPU and GPU.

Cannot replicate results from the paper

Dear authors,

thank you very much for your work. I have a question about the replication of the results from your paper.

First, when I evaluate your provided weights, model, and config file on the val1 data split, I get the following results:

OLD_test_iter pretrain 2d car --> easy: 0.9277, mod: 0.8439, hard: 0.6785
NEW_test_iter pretrain 2d car --> easy: 0.9342, mod: 0.8377, hard: 0.6742
OLD_test_iter pretrain gr car --> easy: 0.3349, mod: 0.2507, hard: 0.1983
NEW_test_iter pretrain gr car --> easy: 0.3225, mod: 0.2268, hard: 0.1722
OLD_test_iter pretrain 3d car --> easy: 0.2490, mod: 0.2077, hard: 0.1729
NEW_test_iter pretrain 3d car --> easy: 0.2317, mod: 0.1621, hard: 0.1234
OLD_test_iter pretrain 2d pedestrian --> easy: 0.6618, mod: 0.5812, hard: 0.4975
NEW_test_iter pretrain 2d pedestrian --> easy: 0.6896, mod: 0.5670, hard: 0.4756
OLD_test_iter pretrain gr pedestrian --> easy: 0.0628, mod: 0.0512, hard: 0.0483
NEW_test_iter pretrain gr pedestrian --> easy: 0.0471, mod: 0.0391, hard: 0.0321
OLD_test_iter pretrain 3d pedestrian --> easy: 0.0436, mod: 0.0445, hard: 0.0396
NEW_test_iter pretrain 3d pedestrian --> easy: 0.0371, mod: 0.0293, hard: 0.0270
OLD_test_iter pretrain 2d cyclist --> easy: 0.6234, mod: 0.4608, hard: 0.3972
NEW_test_iter pretrain 2d cyclist --> easy: 0.6301, mod: 0.4180, hard: 0.3816
OLD_test_iter pretrain gr cyclist --> easy: 0.0344, mod: 0.0296, hard: 0.0306
NEW_test_iter pretrain gr cyclist --> easy: 0.0295, mod: 0.0168, hard: 0.0168
OLD_test_iter pretrain 3d cyclist --> easy: 0.0293, mod: 0.0270, hard: 0.0262
NEW_test_iter pretrain 3d cyclist --> easy: 0.0263, mod: 0.0149, hard: 0.0148

Then I use your config to train the net; the config is as below:

    conf.model = 'resnet_dilate'
    conf.lr = 0.01
    conf.max_iter = 40000
    conf.use_dropout = True
    conf.drop_channel = True
    conf.dropout_rate = 0.5
    conf.dropout_position = 'early'  # 'early'  'late' 'adaptive'
    conf.do_test = True
    conf.lr_policy = 'onecycle'  # 'onecycle'  # 'cosinePoly'  # 'cosineRestart'  # 'poly'
    conf.restart_iters = 5000
    conf.batch_size = 2 * 4
    conf.base_model = 50
    conf.depth_channel = 1
    conf.adaptive_diated = True
    conf.use_seg = False
    conf.use_corner = False
    conf.corner_in_3d = False
    conf.use_hill_loss = False
    conf.use_rcnn_pretrain = False
    conf.deformable = False

    conf.alias = 'Adaptive_block2'

    conf.result_dir = '_'.join([conf.alias, conf.model + str(conf.base_model), 'batch' + str(conf.batch_size),
                                'dropout' + conf.dropout_position + str(conf.dropout_rate), 'lr' + str(conf.lr),
                                conf.lr_policy, 'iter' + str(conf.max_iter),
                                datetime.datetime.now().strftime("%Y.%m.%d-%H:%M:%S")]).replace('.', '_').replace(':', '_').replace('-', '_')


    # solver settings
    conf.solver_type = 'sgd'

    conf.momentum = 0.9
    conf.weight_decay = 0.0005

    conf.snapshot_iter = 5000
    conf.display = 50


    
    # sgd parameters

    conf.lr_steps = None
    conf.lr_target = conf.lr * 0.00001
    
    # random
    conf.rng_seed = 2
    conf.cuda_seed = 2
    
    # misc network
    conf.image_means = [0.485, 0.456, 0.406]
    conf.image_stds = [0.229, 0.224, 0.225]
    if conf.use_rcnn_pretrain:
        conf.image_means = [102.9801, 115.9465, 122.7717]  # conf.image_means[::-1]
        conf.image_stds = [1, 1, 1]  #conf.image_stds[::-1]
    if conf.use_seg:
        conf.depth_mean = [4413.160626995486, 4413.160626995486, 5.426258330316642]
        conf.depth_std = [3270.0158918863494, 3270.0158918863494, 0.5365540402943388]
    else:
        conf.depth_mean = [4413.160626995486, 4413.160626995486, 4413.160626995486]  # DORN
        conf.depth_std = [3270.0158918863494, 3270.0158918863494, 3270.0158918863494]
        # conf.depth_mean = [8295.013626842678, 8295.013626842678, 8295.013626842678]  # PSM
        # conf.depth_std = [5134.9781439128665, 5134.9781439128665, 5134.9781439128665]
        # conf.depth_mean = [30.83664619525601, 30.83664619525601, 30.83664619525601]  # DISP
        # conf.depth_std = [19.992999492848206, 19.992999492848206, 19.992999492848206]
    if conf.depth_channel == 3:
        conf.depth_mean = [137.39162828, 40.58310471, 140.70854621]  # MONO1
        conf.depth_std = [33.75859339, 51.479677, 65.254889]
        conf.depth_mean = [107.0805491, 68.26778312, 133.50751215]  # MONO2
        conf.depth_std = [38.65614623, 73.59464917, 88.24401221]

    conf.feat_stride = 16
    
    conf.has_3d = True

    # ----------------------------------------
    #  image sampling and datasets
    # ----------------------------------------

    # scale sampling  
    conf.test_scale = 512
    conf.crop_size = [512, 1760]
    conf.mirror_prob = 0.50
    conf.distort_prob = -1
    
    # datasets
    conf.dataset_test = 'kitti_split1'
    conf.datasets_train = [{'name': 'kitti_split1', 'anno_fmt': 'kitti_det', 'im_ext': '.png', 'scale': 1}]
    conf.use_3d_for_2d = True
    
    # percent expected height ranges based on test_scale
    # used for anchor selection 
    conf.percent_anc_h = [0.0625, 0.75]
    
    # labels settings
    conf.min_gt_h = conf.test_scale*conf.percent_anc_h[0]
    conf.max_gt_h = conf.test_scale*conf.percent_anc_h[1]
    conf.min_gt_vis = 0.65
    conf.ilbls = ['Van', 'ignore']
    conf.lbls = ['Car', 'Pedestrian', 'Cyclist']
    
    # ----------------------------------------
    #  detection sampling
    # ----------------------------------------
    
    # detection sampling

    conf.fg_image_ratio = 1.0
    conf.box_samples = 0.20
    conf.fg_fraction = 0.20
    conf.bg_thresh_lo = 0
    conf.bg_thresh_hi = 0.5
    conf.fg_thresh = 0.5
    conf.ign_thresh = 0.5
    conf.best_thresh = 0.35

    # ----------------------------------------
    #  inference and testing
    # ----------------------------------------

    # nms
    conf.nms_topN_pre = 3000
    conf.nms_topN_post = 40
    conf.nms_thres = 0.4
    conf.clip_boxes = False

    conf.test_protocol = 'kitti'
    conf.test_db = 'kitti'
    conf.test_min_h = 0
    conf.min_det_scales = [0, 0]

    # ----------------------------------------
    #  anchor settings
    # ----------------------------------------
    
    # clustering settings
    conf.cluster_anchors = 0
    conf.even_anchors = 0
    conf.expand_anchors = 0
                             
    conf.anchors = None

    conf.bbox_means = None
    conf.bbox_stds = None
    
    # initialize anchors
    base = (conf.max_gt_h / conf.min_gt_h) ** (1 / (12 - 1))
    conf.anchor_scales = np.array([conf.min_gt_h * (base ** i) for i in range(0, 12)])
    conf.anchor_ratios = np.array([0.5, 1.0, 1.5])
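    # i.e. 12 anchor heights spaced geometrically from min_gt_h to max_gt_h, each paired with the 3 aspect ratios above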
    
    # loss logic
    conf.hard_negatives = True
    conf.focal_loss = 1
    conf.cls_2d_lambda = 1
    conf.iou_2d_lambda = 0
    conf.bbox_2d_lambda = 1
    conf.bbox_3d_lambda = 1
    conf.bbox_3d_proj_lambda = 0.0
    
    conf.hill_climbing = True
    
    # visdom
    conf.visdom_port = 9891

    return conf

But the results are not the same as in the paper or as with your provided model, and we get different results every time. Our results:

round1:
OLD_test_iter pretrain 2d car --> easy: 0.8227, mod: 0.7466, hard: 0.6567
NEW_test_iter pretrain 2d car --> easy: 0.8689, mod: 0.7665, hard: 0.6302
OLD_test_iter pretrain gr car --> easy: 0.3332, mod: 0.2484, hard: 0.2033
NEW_test_iter pretrain gr car --> easy: 0.2925, mod: 0.2058, hard: 0.1627
OLD_test_iter pretrain 3d car --> easy: 0.2604, mod: 0.1910, hard: 0.1689
NEW_test_iter pretrain 3d car --> easy: 0.2060, mod: 0.1459, hard: 0.1150
OLD_test_iter pretrain 2d pedestrian --> easy: 0.6248, mod: 0.4880, hard: 0.4088
NEW_test_iter pretrain 2d pedestrian --> easy: 0.5965, mod: 0.4799, hard: 0.3932
OLD_test_iter pretrain gr pedestrian --> easy: 0.0437, mod: 0.0456, hard: 0.0433
NEW_test_iter pretrain gr pedestrian --> easy: 0.0387, mod: 0.0346, hard: 0.0294
OLD_test_iter pretrain 3d pedestrian --> easy: 0.0382, mod: 0.0403, hard: 0.0359
NEW_test_iter pretrain 3d pedestrian --> easy: 0.0298, mod: 0.0263, hard: 0.0214
OLD_test_iter pretrain 2d cyclist --> easy: 0.4188, mod: 0.2548, hard: 0.2557
NEW_test_iter pretrain 2d cyclist --> easy: 0.4219, mod: 0.2430, hard: 0.2432
OLD_test_iter pretrain gr cyclist --> easy: 0.0396, mod: 0.0227, hard: 0.0227
NEW_test_iter pretrain gr cyclist --> easy: 0.0256, mod: 0.0135, hard: 0.0137
OLD_test_iter pretrain 3d cyclist --> easy: 0.0374, mod: 0.0227, hard: 0.0227
NEW_test_iter pretrain 3d cyclist --> easy: 0.0219, mod: 0.0127, hard: 0.0121

round2:
OLD_test_iter pretrain 2d car --> easy: 0.8936, mod: 0.7502, hard: 0.6588
NEW_test_iter pretrain 2d car --> easy: 0.8975, mod: 0.7701, hard: 0.6333
OLD_test_iter pretrain gr car --> easy: 0.3283, mod: 0.2423, hard: 0.1982
NEW_test_iter pretrain gr car --> easy: 0.2892, mod: 0.1931, hard: 0.1571
OLD_test_iter pretrain 3d car --> easy: 0.2591, mod: 0.1890, hard: 0.1639
NEW_test_iter pretrain 3d car --> easy: 0.2069, mod: 0.1415, hard: 0.1061
OLD_test_iter pretrain 2d pedestrian --> easy: 0.5659, mod: 0.4894, hard: 0.4094
NEW_test_iter pretrain 2d pedestrian --> easy: 0.5809, mod: 0.4803, hard: 0.3945
OLD_test_iter pretrain gr pedestrian --> easy: 0.0437, mod: 0.0459, hard: 0.0403
NEW_test_iter pretrain gr pedestrian --> easy: 0.0394, mod: 0.0337, hard: 0.0280
OLD_test_iter pretrain 3d pedestrian --> easy: 0.0392, mod: 0.0380, hard: 0.0334
NEW_test_iter pretrain 3d pedestrian --> easy: 0.0298, mod: 0.0254, hard: 0.0200
OLD_test_iter pretrain 2d cyclist --> easy: 0.5863, mod: 0.3419, hard: 0.3375
NEW_test_iter pretrain 2d cyclist --> easy: 0.5724, mod: 0.3404, hard: 0.3179
OLD_test_iter pretrain gr cyclist --> easy: 0.0752, mod: 0.0477, hard: 0.0473
NEW_test_iter pretrain gr cyclist --> easy: 0.0544, mod: 0.0323, hard: 0.0273
OLD_test_iter pretrain 3d cyclist --> easy: 0.0508, mod: 0.0399, hard: 0.0416
NEW_test_iter pretrain 3d cyclist --> easy: 0.0411, mod: 0.0230, hard: 0.0231

round3:
OLD_test_iter pretrain 2d car --> easy: 0.8492, mod: 0.7583, hard: 0.5942
NEW_test_iter pretrain 2d car --> easy: 0.8977, mod: 0.7804, hard: 0.6208
OLD_test_iter pretrain gr car --> easy: 0.3537, mod: 0.2569, hard: 0.2068
NEW_test_iter pretrain gr car --> easy: 0.3130, mod: 0.2161, hard: 0.1703
OLD_test_iter pretrain 3d car --> easy: 0.2734, mod: 0.1939, hard: 0.1767
NEW_test_iter pretrain 3d car --> easy: 0.2269, mod: 0.1524, hard: 0.1224
OLD_test_iter pretrain 2d pedestrian --> easy: 0.5628, mod: 0.4819, hard: 0.4035
NEW_test_iter pretrain 2d pedestrian --> easy: 0.5641, mod: 0.4587, hard: 0.3722
OLD_test_iter pretrain gr pedestrian --> easy: 0.0692, mod: 0.0542, hard: 0.0516
NEW_test_iter pretrain gr pedestrian --> easy: 0.0491, mod: 0.0393, hard: 0.0324
OLD_test_iter pretrain 3d pedestrian --> easy: 0.0527, mod: 0.0507, hard: 0.0450
NEW_test_iter pretrain 3d pedestrian --> easy: 0.0402, mod: 0.0309, hard: 0.0246
OLD_test_iter pretrain 2d cyclist --> easy: 0.4663, mod: 0.3081, hard: 0.2457
NEW_test_iter pretrain 2d cyclist --> easy: 0.4275, mod: 0.2526, hard: 0.2339
OLD_test_iter pretrain gr cyclist --> easy: 0.0603, mod: 0.0455, hard: 0.0455
NEW_test_iter pretrain gr cyclist --> easy: 0.0272, mod: 0.0134, hard: 0.0136
OLD_test_iter pretrain 3d cyclist --> easy: 0.0586, mod: 0.0455, hard: 0.0455
NEW_test_iter pretrain 3d cyclist --> easy: 0.0254, mod: 0.0125, hard: 0.0123

Can you please tell me how I can obtain the same results as in the paper?
Thank you!

Depth maps estimation

Hi,

I was trying to reproduce the results using the depth maps available on the GitHub repo. I have a question about the standalone model you used for depth map estimation. Sorry if this is a trivial question, but are these methods, such as PSMNet, DispNet, or DORN, trained on the KITTI dataset? If so, how did you split the training/validation/testing sets?

Thank you!

Source of Randomness

Hi, Thanks for open-sourcing the great work.

I downloaded your pretrained model from #4 (comment) and tested it repeatedly. Between each run, I cleared the output folder, but I got slightly different results every time:

OLD_test_iter pretrain 3d car --> easy: 0.2702, mod: 0.2178, hard: 0.1833
OLD_test_iter pretrain 3d car --> easy: 0.2701, mod: 0.2167, hard: 0.1830
OLD_test_iter pretrain 3d car --> easy: 0.2715, mod: 0.2164, hard: 0.1823
OLD_test_iter pretrain 3d car --> easy: 0.2690, mod: 0.2173, hard: 0.1829

What's the source of such randomness?

Also, the model provided in the link above is different from that in the readme. Which is the official model used in the paper?

Thank you very much,

Training my own dataset

I want to use your D4LCN model to train on a dataset I created myself. What changes should I make in the code? I'm looking forward to your reply.
Thank you so much!

The code of depth-guided module

Hello dingmyu!
Thanks for sharing your great code! I am interested in your depth-guided module. When I look at the code for this part, I see two adaptive_diated layers: one between layer2 and layer3, and the other between layer3 and layer4. But in your paper, the adaptive_diated and dynamic_local_filtering modules come after layer3 to prepare for layer4.
Can you explain why we need two such layers in the module, and how they work in the whole net?
Thanks a lot!

Evaluating with error "died with <Signals.SIGSEGV: 11>"

After predicting labels for the validation set, I use

with open(os.devnull, 'w') as devnull:
    out = subprocess.check_output([script, results_path.replace('/data', '')], stderr=devnull)

to run the binary at "data/kitti_split1/devkit/cpp/evaluate_object".
However, there is the error below:

<ipython-input-5-9b215e94f907> in test_kitti_3d_back(dataset_test, test_split, rpn_conf, results_path, test_path, use_log)
----> 9         out = subprocess.check_output([script, results_path.replace('/data', '')], stderr=devnull)
     10 
     11     for lbl in rpn_conf.lbls:

~/anaconda3/lib/python3.7/subprocess.py in check_output(timeout, *popenargs, **kwargs)
    393 
    394     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 395                **kwargs).stdout
    396 
    397 

~/anaconda3/lib/python3.7/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    485         if check and retcode:
    486             raise CalledProcessError(retcode, process.args,
--> 487                                      output=stdout, stderr=stderr)
    488     return CompletedProcess(process.args, retcode, stdout, stderr)
    489 

CalledProcessError: Command '['/home/coolguy/Project/D4LCN/data/kitti_split1/devkit/cpp/evaluate_object', 'output/tmp_results']' died with <Signals.SIGSEGV: 11>.

How can I fix this to get the correct evaluation results? Thanks.

Depth map

I want to know whether the depth map files you provided were generated by the official DORN code. Thank you!

Error when loading model checkpoint

I tried loading up the pretrained model but got this error:

---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
<ipython-input-5-c6549c052d91> in <module>()
     43 
     44 # load weights
---> 45 load_weights(net, weights_path, remove_module=True)
     46 
     47 # switch modes for evaluation

2 frames
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
    611     unpickler = pickle_module.Unpickler(f)
    612     unpickler.persistent_load = persistent_load
--> 613     result = unpickler.load()
    614 
    615     deserialized_storage_keys = pickle_module.load(f, **pickle_load_args)

UnpicklingError: invalid load key, '\x06'.

Low performance on Monodepth2 generated depth maps

Hi, I am trying to use your work with depth maps generated by the monodepth2 depth estimation network.

I tried using one of your KITTI samples to test the pipeline, with the depth configuration set to:
conf.depth_mean = [107.0805491, 68.26778312, 133.50751215]
conf.depth_std = [38.65614623, 73.59464917, 88.24401221]

Even then, the maximum score I get is less than 30 percent.
The depth map and image are uploaded for reference.
Are there any configurations I need to change in the monodepth code itself to get this working?

The model used for testing is the pretrained model provided in the repo.
My monodepth post-processing code is:

import torch
import numpy as np
import matplotlib as mpl
import matplotlib.cm as cm
import PIL.Image as pil
from layers import disp_to_depth  # from the monodepth2 repo

_, scaled_depth = disp_to_depth(disp, 0.1, 100)
depth_resized = torch.nn.functional.interpolate(scaled_depth, (original_height, original_width), mode="bilinear", align_corners=False)
depth_resized_np = depth_resized.squeeze().cpu().numpy()
normalizer_depth = mpl.colors.Normalize(vmin=depth_resized_np.min())
mapper_depth = cm.ScalarMappable(norm=normalizer_depth, cmap='gray')
colormapped_im_depth = (mapper_depth.to_rgba(depth_resized_np)[:, :, :3] * 255).astype(np.uint8)
im_depth = pil.fromarray(colormapped_im_depth)

(Depth map and image for sample 000005 attached.)

The output is:
Car -1 -1 -0.420610 690.218506 147.377106 1291.588501 441.522003 1.693756 1.490031 3.745631 2.358071 1.346878 1.293291 0.648531 0.281668
Car -1 -1 -0.314688 271.298126 136.579041 803.299255 431.537445 1.713774 1.436281 3.459033 1.215498 1.356887 1.107118 0.517339 0.234700

Questions about training results

Hi, I have a question about the results of training. Here is a .txt result from training, such as image_2/002451.txt:
(screenshot of the result file)
The first 4 values represent the upper-left and lower-right coordinates of the 2D box, is that right?
But the 2D box coordinates in the label are (794.92, 171.12, 870.61, 235.39), (301.59, 184.71, 487.48, 262.98)...
The results are different and I don't know why. Thank you for your reply.

Question about "gpu_nms"

Hello and thank you so very much for your fantastic work and awesome code. I have a little question that I would greatly appreciate a response to. 💐

I am trying to run test.sh, but I get the following error (specifically, from the line from lib.nms.gpu_nms import gpu_nms in rpn_util.py):

ModuleNotFoundError: No module named 'lib.nms.gpu_nms'

How can I fix this? Am I forgetting to install or import a particular library/dependency?

Thank you very much.

Executing your model on the KITTI Tracking dataset

Hello,

Congratulations on the phenomenal monocular 3D detection results (and on open-sourcing your code)!

I have successfully run your pretrained model on the KITTI val1 dataset and the results look great. I was wondering if I could make it work for the KITTI Tracking dataset as well (http://www.cvlibs.net/datasets/kitti/eval_tracking.php), and connect it with some tracker to see how it performs.

So basically I am interested in any advice and intuition you have regarding this; for now it seems to me I should (for evaluation with your pretrained weights):

  1. Get DORN depth map results for that dataset
  2. Place the dataset and these depth maps somewhere in your project
  3. probably change a couple of things in your configuration file
  4. modify your test.py script

At this point I'm just wondering if I'm missing some other big step in this process before I start?

how to calculate mean and std

Hi @dingmyu :

Thank you for sharing this great project!
I want to apply it to my own depth prediction model. How should I calculate the mean and std for my depth maps? What is the meaning of the 3 values in the mean and std? Are they averaged over the entire training dataset? Should I use all depth values, or filter only the valid area (e.g., 0-80 m, excluding the sky)? Can you share your code for generating the mean and std?

Thank you so much!

Sincerely
Ziyue Feng
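
(A minimal sketch, not the authors' script, of how such per-channel statistics could be computed over a training set; the directory path and the assumption that the depth maps are single-channel 16-bit PNGs are hypothetical.)

import glob
import cv2
import numpy as np

# Hypothetical location of the training depth maps.
paths = glob.glob('data/kitti_split1/training/depth_2/*.png')

total, total_sq, count = 0.0, 0.0, 0
for p in paths:
    depth = cv2.imread(p, cv2.IMREAD_UNCHANGED).astype(np.float64)
    total += depth.sum()
    total_sq += (depth ** 2).sum()
    count += depth.size

mean = total / count
std = np.sqrt(total_sq / count - mean ** 2)
print(mean, std)  # the released config repeats a single value across its 3 channels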

Reproducing AP|R11 results

Hi, some results in Table 1 were based on the AP|R11 metric. Do we need to change "const double N_SAMPLE_PTS = 41;" to "const double N_SAMPLE_PTS = 11;" in evaluate_object.cpp to reproduce those results?

Your work consumes 0.2 s? What does the 0.2 s include?

Hi, dingmyu! I am quite interested in your work, especially its efficiency, but you don't report the runtime in the paper, so I looked it up on the official KITTI website.
(screenshot of the runtime listed on the KITTI leaderboard)
Could you please tell me what this time represents? Does it include the pre/post-processing around the network, such as the CPU-to-GPU transfer, the 3D anchor transform, NMS, and the post-optimization?
Or does it only measure the network forward pass on the GPU, from the preprocessed image to the feature maps?

result

Excuse me, how do I visualize the result as an image? I can only get results like the one below:
(screenshot of the raw output)

About 'shift-pooling operator'

Hi dingmyu, you mention a shift-pooling operator in your paper. However, I didn't find any relevant implementation in this repo. Any suggestions or follow-ups? Thanks :)

Model different from paper

Hi - thank you for the good work,

I notice that your model in this repo is different from the one you presented in your paper, and that in the README you mention one can achieve better performance by adding more D4LCN modules.

Which model is the one you used to produce the results in the paper? What's the difference in terms of performance?

Thanks

low performance

When I train with your simplified version, it produces poor performance:
(screenshot of the evaluation results)

Depth maps in training set and testing set

Hello, I'm using the pre-trained model from DORN, but I found that the depth errors on the KITTI testing set are much bigger than on the KITTI training set. I want to know whether this issue also exists in your experiments. Thank you very much!

Reproduce 3DNet+CL

Hi @dingmyu, I have the same question here.
(screenshot of the referenced comment)
I want to reproduce your corner-loss result, but what you said above confuses me. Is the corner loss useful here? How can I reproduce "3DNet+CL"?

Originally posted by @kaixinbear in #22 (comment)

GPUs

I have a machine with 4 GPUs, but only two of them can be used. How should I change this in the code? Thanks.
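
(Not specific to this repo, but a common way to restrict which GPUs PyTorch sees is to set CUDA_VISIBLE_DEVICES before any CUDA call, e.g. at the very top of scripts/train.py; the device IDs below are placeholders, and conf.batch_size should be reduced to match.)

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # expose only the two usable GPUs to PyTorch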

TypeError: 'NoneType' object is not subscriptable

Hi Mingyu,

When I try to train the model, it shows the error below.

Traceback (most recent call last):
  File "scripts/train.py", line 213, in <module>
    main(sys.argv[1:])
  File "scripts/train.py", line 118, in main
    iterator, images, depths, imobjs = next_iteration(dataset.loader, iterator)
  File "/home/rhett.wang/pyProjects/D4LCN/lib/core.py", line 604, in next_iteration
    images, depths, imobjs = next(iterator)
  File "/home/rhett.wang/local/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/rhett.wang/local/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/home/rhett.wang/local/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/rhett.wang/local/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/rhett.wang/pyProjects/D4LCN/lib/imdb_util.py", line 234, in __getitem__
    depth = depth[:, :, np.newaxis]
TypeError: 'NoneType' object is not subscriptable
I thought it might be a problem with the PyTorch version, but I have tried PyTorch 1.0/1.1 and CUDA 9.0/10.0 and the problem is still there.

Have you met this error before?

Optimizer.step() & scheduler.step() in wrong order when restoring

Hello,

first of all, thank you for your amazing work.

I am using PyTorch 1.5.1 and I receive the warning that lr_scheduler.step() is called before optimizer.step() when I restore a checkpoint in order to continue training. I don't get this warning when I start a new training from scratch.

Is this normal or should I be worried?

Thanks

Dynamic local filtering?

Hi,
Very interesting work! I have some questions about the dynamic local filtering (line 23 in resnet_dilate.py).
In my understanding, the shift with different dilation rates (e.g., 1) should only operate on the depth feature. However, in this code, it seems that both pad_depth and pad_x use the same shift step size. If so, it may be more like a direct element-wise product between pad_depth and pad_x. Is there something wrong with my understanding?

Actually, this is a very thoughtful idea for depth-guided convolution.
