
DenseFusion

News

We have released the code and arXiv preprint for our new project 6-PACK which is based on this work and used for category-level 6D pose tracking.

Overview

This repository is the implementation code of the paper "DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion" (arXiv, Project, Video) by Wang et al. at the Stanford Vision and Learning Lab and the Stanford People, AI & Robots Group. The model takes an RGB-D image as input and predicts the 6D pose of each object in the frame. The network is implemented in PyTorch and the rest of the framework is in Python. Since this project focuses on the 6D pose estimation process, we do not specifically limit the choice of segmentation models; you can choose your preferred semantic-/instance-segmentation method according to your needs. In this repo, we provide our full implementation code of the DenseFusion model, the Iterative Refinement model, and a vanilla SegNet semantic-segmentation model used in our real-robot grasping experiment. The ROS code of the real-robot grasping experiment is not included.

Requirements

  • Python 2.7/3.5/3.6 (If you want to use Python 2.7 to run this repo, please rebuild lib/knn/ with PyTorch 0.4.1.)
  • PyTorch 0.4.1 (a PyTorch 1.0 branch is also available)
  • PIL
  • scipy
  • numpy
  • pyyaml
  • logging
  • matplotlib
  • CUDA 7.5/8.0/9.0 (Required. CPU-only execution leads to extremely slow training because of the loss calculation for symmetric objects (pixel-wise nearest-neighbour loss).)

Code Structure

  • datasets
    • datasets/ycb
      • datasets/ycb/dataset.py: Data loader for YCB_Video dataset.
      • datasets/ycb/dataset_config
        • datasets/ycb/dataset_config/classes.txt: Object list of YCB_Video dataset.
        • datasets/ycb/dataset_config/train_data_list.txt: Training set of YCB_Video dataset.
        • datasets/ycb/dataset_config/test_data_list.txt: Testing set of YCB_Video dataset.
    • datasets/linemod
      • datasets/linemod/dataset.py: Data loader for LineMOD dataset.
      • datasets/linemod/dataset_config:
        • datasets/linemod/dataset_config/models_info.yml: Object model info of LineMOD dataset.
  • replace_ycb_toolbox: Replacement code for evaluation with the YCB_Video_toolbox.
  • trained_models
    • trained_models/ycb: Checkpoints of YCB_Video dataset.
    • trained_models/linemod: Checkpoints of LineMOD dataset.
  • lib
    • lib/loss.py: Loss calculation for DenseFusion model.
    • lib/loss_refiner.py: Loss calculation for iterative refinement model.
    • lib/transformations.py: Transformation Function Library.
    • lib/network.py: Network architecture.
    • lib/extractors.py: Encoder network architecture adapted from pspnet-pytorch.
    • lib/pspnet.py: Decoder network architecture.
    • lib/utils.py: Logger code.
    • lib/knn/: CUDA K-nearest neighbours library adapted from pytorch_knn_cuda.
  • tools
    • tools/_init_paths.py: Add local path.
    • tools/eval_ycb.py: Evaluation code for YCB_Video dataset.
    • tools/eval_linemod.py: Evaluation code for LineMOD dataset.
    • tools/train.py: Training code for YCB_Video dataset and LineMOD dataset.
  • experiments
    • experiments/eval_result
      • experiments/eval_result/ycb
        • experiments/eval_result/ycb/Densefusion_wo_refine_result: Evaluation result on YCB_Video dataset without refinement.
        • experiments/eval_result/ycb/Densefusion_iterative_result: Evaluation result on YCB_Video dataset with iterative refinement.
      • experiments/eval_result/linemod: Evaluation results on LineMOD dataset with iterative refinement.
    • experiments/logs/: Training log files.
    • experiments/scripts
      • experiments/scripts/train_ycb.sh: Training script on the YCB_Video dataset.
      • experiments/scripts/train_linemod.sh: Training script on the LineMOD dataset.
      • experiments/scripts/eval_ycb.sh: Evaluation script on the YCB_Video dataset.
      • experiments/scripts/eval_linemod.sh: Evaluation script on the LineMOD dataset.
  • download.sh: Script for downloading YCB_Video Dataset, preprocessed LineMOD dataset and the trained checkpoints.

Datasets

This work is tested on two 6D object pose estimation datasets:

  • YCB_Video Dataset: Training and testing sets follow PoseCNN. The training set includes 80 training videos 0000-0047 & 0060-0091 (sampled every 7 frames in our training) and synthetic data 000000-079999. The testing set includes 2949 keyframes from the 10 testing videos 0048-0059.

  • LineMOD: Download the preprocessed LineMOD dataset (including the testing results outputted by the trained vanilla SegNet used for evaluation).

Download the YCB_Video Dataset, the preprocessed LineMOD dataset and the trained checkpoints (you can modify this script according to your needs):

./download.sh

Training

  • YCB_Video Dataset: After you have downloaded and unzipped the YCB_Video_Dataset.zip and installed all the dependency packages, please run:
./experiments/scripts/train_ycb.sh
  • LineMOD Dataset: After you have downloaded and unzipped the Linemod_preprocessed.zip, please run:
./experiments/scripts/train_linemod.sh

Training Process: The training process contains two components: (i) training of the DenseFusion model and (ii) training of the Iterative Refinement model. In this code, a DenseFusion model is trained first. When the average testing distance (ADD for non-symmetric objects, ADD-S for symmetric objects) drops below a certain margin, the training of the Iterative Refinement model starts automatically and the DenseFusion model is then fixed. You can lower this margin to get a better DenseFusion result without refinement, but it will still be inferior to the final result after iterative refinement.
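
The hand-off can be pictured with the following minimal sketch; all names (best_test, refine_margin, estimator, refiner) mirror the description above but are assumptions, not the exact code in tools/train.py:

import torch

def maybe_start_refinement(best_test, refine_margin, refine_start, estimator, refiner, lr):
    # Once the average test distance drops below the margin, fix the DenseFusion
    # estimator and hand the optimizer over to the iterative refiner (sketch only).
    if best_test < refine_margin and not refine_start:
        refine_start = True
        for p in estimator.parameters():
            p.requires_grad = False          # DenseFusion model is fixed from here on
        optimizer = torch.optim.Adam(refiner.parameters(), lr=lr)
        return refine_start, optimizer
    return refine_start, None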

Checkpoints and Resuming: After every 1000 training batches, a pose_model_current.pth / pose_refine_model_current.pth checkpoint is saved; you can use it to resume training. After each testing epoch, if the average distance result is the best so far, a pose_model_(epoch)_(best_score).pth / pose_model_refiner_(epoch)_(best_score).pth checkpoint is saved; you can use it for evaluation.
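
For example, a saved checkpoint can be loaded back into the network like this (a hedged sketch; the PoseNet constructor arguments and the checkpoint path are placeholders/assumptions, not verified against the repo):

import torch
from lib.network import PoseNet   # listed under Code Structure above

estimator = PoseNet(num_points=1000, num_obj=21)   # argument values are assumptions
estimator.load_state_dict(torch.load('trained_models/ycb/pose_model_current.pth'))
estimator = estimator.cuda()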

Notice: Training the iterative refinement model takes some time. Please be patient; the improvement will come after about 30 epochs.

  • vanilla SegNet: Just run:
cd vanilla_segmentation/
python train.py --dataset_root=./datasets/ycb/YCB_Video_Dataset

To make the best use of the training set, several data augmentation techniques are used in this code:

(1) Random noise is added to the brightness, contrast and saturation of the input RGB image with the torchvision.transforms.ColorJitter function; we use torchvision.transforms.ColorJitter(0.2, 0.2, 0.2, 0.05).
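
For reference, the jitter can be applied to a PIL frame like this (a minimal example, not the exact data-loader code; the file name is a placeholder):

from PIL import Image
import torchvision.transforms as transforms

trancolor = transforms.ColorJitter(0.2, 0.2, 0.2, 0.05)   # brightness, contrast, saturation, hue
img = Image.open('000001-color.png')                      # placeholder frame name
img = trancolor(img)                                      # jittered RGB image fed to the network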

(2) Random pose translation noise is added to the training set of the pose estimator; we set the range of the translation noise to 3 cm for both datasets.
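
A sketch of how such a jitter can be generated (the 3 cm range matches the text; exactly how the offset is applied in the data loader is an assumption here):

import random
import numpy as np

noise_trans = 0.03                               # 3 cm range, as stated above
add_t = np.array([random.uniform(-noise_trans, noise_trans) for _ in range(3)])
cloud = np.zeros((500, 3))                       # placeholder: sampled object point cloud
target_t = np.zeros(3)                           # placeholder: ground-truth translation
# assumption: the offset is applied consistently to the points and the translation label
cloud = cloud + add_t
target_t = target_t + add_t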

(3) For the YCB_Video dataset, since the synthetic data does not contain backgrounds, we randomly select real training frames as backgrounds. In each frame, we also randomly select two instance-segmentation clips from another synthetic training image and paste them in front of the input RGB-D image, so that more occlusion situations can be generated.
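
A rough sketch of the background replacement part (the occlusion clips are not shown; file names are placeholders and the mask convention is an assumption):

from PIL import Image

front = Image.open('synth-color.png').convert('RGB')    # synthetic render (placeholder name)
label = Image.open('synth-label.png').convert('L')      # its label image; non-zero = object pixel
mask = label.point(lambda p: 255 if p > 0 else 0)       # binary foreground mask
back = Image.open('real-color.png').convert('RGB')      # randomly chosen real training frame
back = back.resize(front.size)                          # both frames are 640x480 in YCB_Video
rgb = Image.composite(front, back, mask)                # keep objects, fill background from the real frame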

Evaluation

Evaluation on YCB_Video Dataset

For a fair comparison, we use the same segmentation results as PoseCNN and compare with their results after ICP refinement. Please run:

./experiments/scripts/eval_ycb.sh

This script will first download the YCB_Video_toolbox to the root folder of this repo and test the selected DenseFusion and Iterative Refinement models on the 2949 keyframes of the 10 testing videos in the YCB_Video Dataset with the same segmentation results as PoseCNN. The result without refinement is stored in experiments/eval_result/ycb/Densefusion_wo_refine_result and the refined result is in experiments/eval_result/ycb/Densefusion_iterative_result.

After that, you can add the paths experiments/eval_result/ycb/Densefusion_wo_refine_result/ and experiments/eval_result/ycb/Densefusion_iterative_result/ to YCB_Video_toolbox/evaluate_poses_keyframe.m and run it with MATLAB. YCB_Video_toolbox/plot_accuracy_keyframe.m can then show you the comparison plot. The easiest way is to copy the adapted code from the replace_ycb_toolbox/ folder over the corresponding files in the YCB_Video_toolbox/ folder. You might still need to change the path of your YCB_Video_Dataset/ in globals.m and copy the two result folders (Densefusion_wo_refine_result/ and Densefusion_iterative_result/) into the YCB_Video_toolbox/ folder.

Evaluation on LineMOD Dataset

Just run:

./experiments/scripts/eval_linemod.sh

This script will test the models on the testing set of the LineMOD dataset with the masks outputted by the trained vanilla SegNet model. The result will be printed at the end of the execution and saved as a log in experiments/eval_result/linemod/.

Results

  • YCB_Video Dataset:

Quantitative evaluation results with the ADD-S metric compared to other RGB-D methods. Ours (per-pixel) is the result of the DenseFusion model without refinement and Ours (iterative) is the result with iterative refinement.

Important! Before you use these numbers to compare with your method, please be aware of one important issue: one difficulty of testing on the YCB_Video Dataset is letting the network tell the difference between the objects 051_large_clamp and 052_extra_large_clamp. All approaches in this table use the same segmentation masks released by PoseCNN without any detection priors, so all of them suffer a performance drop on these two objects because of the poor detection results, and this drop is also reflected in the final overall score. If you have added detection priors to your detector to distinguish these two objects, please clarify this or do not copy the overall score for comparison experiments.

  • LineMOD Dataset:

Quantitative evaluation results with the ADD metric for non-symmetric objects and ADD-S for symmetric objects (eggbox, glue), compared to other RGB-D methods. High-performance RGB methods are also listed for reference.

The qualitative result on the YCB_Video dataset.

Trained Checkpoints

You can download the trained DenseFusion and Iterative Refinement checkpoints of both datasets from Link.

Tips for your own dataset

As you can see in this repo, the network code and the hyperparameters (lr and w) remain the same for both datasets, which means you might not need to adjust the network structure or hyperparameters much when you use this repo on your own dataset. Please make sure that the distance metric in your dataset is converted to meters, otherwise the hyperparameter w needs to be adjusted. Several useful tools, including LabelFusion and sixd_toolkit, have been tested to work well. (Please make sure to turn on depth image collection in LabelFusion when you use it.)
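
For example, if your depth images are stored in millimetres, the depth-to-point-cloud conversion should divide by the appropriate scale so that the resulting coordinates are in meters (a generic sketch with placeholder intrinsics and scale, not the repo's data loader):

import numpy as np

depth = np.zeros((480, 640), dtype=np.uint16)     # placeholder depth image
cam_scale = 1000.0                                 # e.g. depth stored in millimetres -> meters
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0        # placeholder camera intrinsics
v, u = np.indices(depth.shape)                     # pixel row/column grids
z = depth / cam_scale
x = (u - cx) * z / fx
y = (v - cy) * z / fy
cloud = np.stack([x, y, z], axis=-1)               # (H, W, 3) points in meters, as w expects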

Citations

Please cite DenseFusion if you use this repository in your publications:

@inproceedings{wang2019densefusion,
  title={DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion},
  author={Wang, Chen and Xu, Danfei and Zhu, Yuke and Mart{\'\i}n-Mart{\'\i}n, Roberto and Lu, Cewu and Fei-Fei, Li and Savarese, Silvio},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

License

Licensed under the MIT License


densefusion's Issues

3d bbox

Hello,
There are many 3D bounding boxes in the figures of this paper. Could you tell me how to draw the 3D bounding box from the output of this network (R and T)?
Thanks for the help!
@j96w

Testing on my own data is not correct

Thank you very much for sharing the good code. I have run and tested the code on YCB data correctly, and I can also draw the correct 3D box and point cloud on the 2D image plane. But when I use my own data to test the model trained on YCB, the segmentation result is correct but the pose estimation is not. The depth image is stored in png format, uint16, shown as follows. Could you give me some advice?
(Attached images: Predicted_3DBox_0001, Predicted_Pose_0001, Normalized_depth_0001, Predicted_Label_0001)

About data augmentation implementation

In readme, it says:

For the YCB_Video dataset, since the synthetic data does not contain backgrounds, we randomly select real training data as the background. In each frame, we also randomly select two instance-segmentation clips from another synthetic training image to paste in front of the input RGB-D image, so that more occlusion situations can be generated.

But I did not find such an implementation in the data loader. Is it somewhere else?
BTW, this work is awesome!

KNN segmentation fault

When I try to train on LineMOD, I get this:

./experiments/scripts/train_linemod.sh: line 10: 10128 Segmentation fault python3 ./tools/train.py --dataset linemod --dataset_root ./datasets/linemod/Linemod_preprocessed

I have verified that it happens at this line

Could you point me to any solutions? Thanks!

Evaluation on LineMOD

Thank you for your help. Sorry to bother you again. I still have some questions.
1. The default maximum number of epochs is 500. After training for 104 epochs on YCB, the dis is 0.0090381440936, but it takes a long time, and the YCB trained model you provide is pose_refine_model_69_0.009449292959118935.pth. So how do you determine the number of epochs for training the pose refine model? Is a lower dis always a better model? How do you prevent overfitting?

2. When evaluating on the LineMOD Dataset, the final content of eval_result_logs.txt is as follows:

No.13390 NOT Pass! Distance: 0.0217275395989
No.13391 Pass! Distance: 0.0048154219985
No.13392 Pass! Distance: 0.0142814125866
No.13393 Pass! Distance: 0.00356977432966
No.13394 Pass! Distance: 0.00472941761836
No.13395 Pass! Distance: 0.00784354563802
No.13396 Pass! Distance: 0.0127922594547
No.13397 Pass! Distance: 0.00581285078079
No.13398 NOT Pass! Distance: 0.0267377775162
No.13399 Pass! Distance: 0.00628646928817
No.13400 Pass! Distance: 0.0146940667182
No.13401 NOT Pass! Distance: 0.0340396799147
No.13402 NOT Pass! Distance: 0.0713591426611
No.13403 NOT Pass! Distance: 0.0522820688784
No.13404 Pass! Distance: 0.0038491380401
No.13405 NOT Pass! Distance: 0.0213586390018
No.13406 Pass! Distance: 0.00927203428
Object 1 success rate: 0
Object 2 success rate: 0
Object 4 success rate: 0
Object 5 success rate: 0
Object 6 success rate: 0
Object 8 success rate: 0
Object 9 success rate: 0
Object 10 success rate: 0
Object 11 success rate: 1
Object 12 success rate: 0
Object 13 success rate: 0
Object 14 success rate: 0
Object 15 success rate: 0
ALL success rate: 0

Is anything wrong with the evaluation on the LineMOD Dataset? How can I get an evaluation result similar to the YCB_Video Dataset one run with MATLAB?

Thank you in advance.

Originally posted by @sunshantong in #7 (comment)

Real-time pose detection

Hello.
Thank you for sharing the good code.
I want to test the model trained with my own data.
But the evaluation code seems to require the information in the meta file ('cam_t_m2c', 'cam_R_m2c', ...).
I think it is difficult to test the model in real time with this code.
Is there code that can output only the pose in real time? If not, how can I modify it to do so?
I'll be waiting for the reply.
Thank you.

Data visualisation

Hi @j96w, thanks for your work. I have been really stuck on the visualization part for days. Could you explain to me (with enough detail if possible, please) how to show the detected LineMOD objects on the video files (or photos) in real time?

I trained and evaluated the data, and the result was as expected, but I am missing this real-time visualization part: how to see the object mesh on videos, and also how to work with a camera in real time.

Thank you in advance.
Hamza

Why is "points" added in loss.py?

Thanks for sharing the code, amazing work!

I am reading your code, and I found that in loss.py at line 38, when you calculate the loss, you also add points. Why? See the copied code below:

pred = torch.add(torch.bmm(model_points, base), points + pred_t)

How to get the number of training epochs

Hi @j96w, I just wanted to know how we can get the number of epochs, in order to estimate the training time, because I'm waiting for it to finish. Based on the README, I was assuming that epochs=30, but that's not right.

17:38:09,448 : Train time 16h 25m 47s Epoch 39 Batch 8046 Frame 32184 Avg_dis:0.0042653061100281775
2019-03-24 17:38:09,549 : Train time 16h 25m 47s Epoch 39 Batch 8047 Frame 32188 Avg_dis:0.0038781535113230348
2019-03-24 17:38:09,660 : Train time 16h 25m 47s Epoch 39 Batch 8048 Frame 32192 Avg_dis:0.003846941574010998
2019-03-24 17:38:09,777 : Train time 16h 25m 47s Epoch 39 Batch 8049 Frame 32196 Avg_dis:0.002740198280662298
2019-03-24 17:38:09,915 : Train time 16h 25m 47s Epoch 39 Batch 8050 Frame 32200 Avg_dis:0.003869467240292579
2019-03-24 17:38:10,033 : Train time 16h 25m 47s Epoch 39 Batch 8051 Frame 32204 Avg_dis:0.004200253228191286

How many epochs are set for LineMOD training? Thank you in advance.

Question on loss calculation

Hi,

Thank you for sharing the code. I have several questions about the loss calculation:

Q1.

t = ori_t[which_max[0]] + points[which_max[0]]
Why do you combine one point from the point cloud with pred_t here?

Q2.

new_target = torch.bmm((new_target - ori_t), ori_base).contiguous()
I am sorry, but I am confused about the reason for updating with this method.
Why is the point cloud updated by subtracting pred_t before rotation? Shouldn't it be updated the same way as the prediction (adding pred_t after rotation)?

Following is some of my understanding:

  1. To predict a residual pose, we can:
    1) update the model points the same way as the prediction, while keeping the target the same;
    2) or keep the model points the same and update the target in the reverse way.

In the code, the point cloud and target are updated in the same way, and it's hard for me to understand. Could you please help explain?

Thank you and looking forward to your reply.

Best,
Stacey

PoseNet (no refine) model evaluation result does not match

On the LineMOD dataset, we evaluated the provided model (trained_models/linemod/pose_model_9_0.01310166542980859.pth) without refinement and the success rate is 0.83169, which does not match what is claimed in the paper (per-pixel: 86.2). Is the provided PoseNet model the one used to evaluate the per-pixel performance? If not, could you please provide the model used for that evaluation?
Thank you!

About batch and frame

Hi @j96w! Thank you for your work.
Lines 131~154 in train.py:

            for i, data in enumerate(dataloader, 0):
                # ...
                train_count += 1

                if train_count % opt.batch_size == 0:
                    logger.info('Train time {0} Epoch {1} Batch {2} Frame {3} Avg_dis:{4}'.format(time.strftime("%Hh %Mm %Ss", time.gmtime(time.time() - st_time)), epoch, int(train_count / opt.batch_size), train_count, train_dis_avg / opt.batch_size))

I think one iteration is one batch, so the batch number should be equal to train_count.
In logger.info, why does the batch number equal int(train_count / opt.batch_size) while the frame number equals train_count?

Generalization Ability of the Method

Hi dear authors,

I was thinking of using the method for rather small objects (chips, rings, metal pieces, etc., found in devices such as this) and wondered whether the method could actually generalize, or do we really need a precise 3D model of what we are looking for? Because it is impossible to have models for each and every item in this domain, at some point the method should generalize. Would it actually follow this line of thought?

Regards,

ROS Package

Would it be possible for you to include the ROS package used in the demo video?
Was the inference time affected when the code was deployed to ROS?
Thank you.

Incorrect visualization result on YCB dataset

Hello, thanks for sharing your code!!
I tried to visualize the output on the YCB test set, but the result doesn't align with Fig. 4 in your paper.
Here is one of my visualization results.
The top-left image is generated using the ground-truth R and T in xxx-meta.mat, and it is correct. The bottom-left and bottom-right images are generated by simply replacing R and T with those from the mat files in result_wo_refine_dir and result_refine_dir. The pose estimation in these two images seems to go wrong.
(Attached image: 420_gt)
The visualization process is to transform the points in points.xyz with R and T and then scale and translate them to fit the object's tight bounding box.
I used the trained checkpoints you provided.

May I ask your suggestion on what the problem might be?
Thanks for your time.

Training on Multiple GPUs

Hello, thanks for sharing the code
I am new to PyTorch. As far as I understand, the provided code / DataParallel should run the PoseNet training on multiple GPUs; however, in my case it uses only one out of four GPUs. I have a GPU cluster of 4x 1080 Ti, and torch.cuda.device_count() shows 4. Could you please tell me whether you used multiple GPUs for training and whether there is something I am missing here?

Error - train_ycb.sh - Pytorch-1.0

I used Pytorch-1.0 branch and ran ./experiments/scripts/train_ycb.sh.
Then I got the error:

pred = torch.add(torch.bmm(model_points, base), points + pred_t) RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:441

My system:

  • GeForce RTX 2080 Ti
  • nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04

About image normalization

The documentation of torchvision.models says that the images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].

The cropped image is normalized directly without being scaled into [0, 1].
Is this a bug even though it works?

About the refine_margin

@j96w Thanks for your work!
Here is my question: the condition "best_test < opt.refine_margin (0.013)" is reached after about 10 epochs, which means PoseNet is only trained for about 10 epochs. However, I found that best_test is about 0.006 after training the RefineNet for more than 400 epochs. So, is the refine_margin a little too big? Can I set it smaller (e.g., 0.008) to train PoseNet for more epochs? Or have you found that a refine_margin smaller than 0.013 leads to overfitting of PoseNet?

undefined symbol:_Py_Dealloc

When trying to run the training script ./train_ycb.sh on Ubuntu with Python 2.7.12, it fails with the following stack trace:

  • set -e
  • export PYTHONUNBUFFERED=True
  • export CUDA_VISIBLE_DEVICES=0
  • python ./tools/train.py --dataset ycb --dataset_root ./data/YCB_Video_Dataset
    /home/user/Work/DenseFusion/lib/transformations.py:1912: UserWarning: failed to import module _transformations
    warnings.warn('failed to import module %s' % name)
    Traceback (most recent call last):
    File "./tools/train.py", line 26, in
    from lib.loss import Loss
    File "/home/user/Work/DenseFusion/lib/loss.py", line 9, in
    from lib.knn.init import KNearestNeighbor
    File "/home/user/Work/DenseFusion/lib/knn/init.py", line 7, in
    from lib.knn import knn_pytorch as knn_pytorch
    File "/home/user/Work/DenseFusion/lib/knn/knn_pytorch/init.py", line 3, in
    from ._knn_pytorch import lib as _lib, ffi as _ffi
    ImportError: /home/user/Work/DenseFusion/lib/knn/knn_pytorch/_knn_pytorch.so: undefined symbol: _Py_Dealloc

I tried to address this by building _knn_pytorch.so with Python 2, but it still fails with the following stack trace:

python2 build_ffi.py
Traceback (most recent call last):
File "build_ffi.py", line 19, in
include_dirs=[osp.join(abs_path, 'include')]
File "/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/init.py", line 176, in create_extension
ffi = cffi.FFI()
File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 46, in init
import _cffi_backend as backend
ImportError: /usr/local/lib/python2.7/dist-packages/_cffi_backend.so: undefined symbol: PyUnicodeUCS2_FromUnicode
Makefile:31: recipe for target 'build/knn_pytorch/_knn_pytorch.so' failed
make: *** [build/knn_pytorch/_knn_pytorch.so] Error 1

ImportError: torch.utils.ffi requires the cffi package

Hi, when trying to run the training script experiments/scripts/train_ycb.sh on Ubuntu with Python 2.7.12, it fails with the stack trace:

set -e
export PYTHONUNBUFFERED=True
EXPORT CUDA_VISIBLE_DEVICES=0
python2 ./tools/train.py --dataset ycb --dataset_root ./datasets/ycb/YCB_VIdeoDataset
/root/catkin_ws/densefusion/DenseFusion/lib/transformations.py:1912: UserWarning: failed to import module _transformations

warnings.warn('failed to import module %s' % name)

Traceback (most recent call last):
File "./tools/train.py", line 26, in
from lib.loss import Loss
File "/root/catkin_ws/densefusion/DenseFusion/lib/loss.py", line 9, in
from lib.knn.init import KNearestNeighbor
File "/root/catkin_ws/densefusion/DenseFusion/lib/knn/init.py", line 7, in
from lib.knn import knn_pytorch as knn_pytorch
File "/root/catkin_ws/densefusion/DenseFusion/lib/knn/knn_pytorch/init.py", line 2, in
from torch.utils.ffi import _wrap_function
File "/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/init.py", line 14, in
raise ImportError("torch.utils.ffi requires the cffi package")

ImportError: torch.utils.ffi requires the cffi package

I went into the DenseFusion/lib/knn folder and revised the Makefile to PYTHON := python2.
Then I ran $ make:

python2 build_ffi.py
Traceback (most recent call last):
File "build_ffi.py", line 5, in
from torch.utils.ffi import create_extension
File "/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/init.py", line 14, in
raise ImportError("torch.utils.ffi requires the cffi package")
ImportError: torch.utils.ffi requires the cffi package
Makefile:31: recipe for target 'build/knn_pytorch/_knn_pytorch.so' failed
make: *** [build/knn_pytorch/_knn_pytorch.so] Error 1

make with conda 2 (Python 2) doesn't work

Hi @j96w, I'm working with conda and Python 2.7, so I built knn_pytorch with Python 2 in the conda environment, but I always get this problem with modules:

set -e
export PYTHONUNBUFFERED=True
PYTHONUNBUFFERED=True
export CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
python ./tools/train.py --dataset linemod --dataset_root ./datasets/linemod/Linemod_preprocessed
Traceback (most recent call last):
File "./tools/train.py", line 12, in
import numpy as np
ImportError: No module named numpy

This is the make with python2:
..$ make
python2 build_ffi.py
generating /tmp/tmpmsC9ic/_knn_pytorch.c
setting the current directory to '/tmp/tmpmsC9ic'
running build_ext
building '_knn_pytorch' extension
creating home
creating home/hamza
creating home/hamza/Téléchargements
creating home/hamza/Téléchargements/DenseFusion-master
creating home/hamza/Téléchargements/DenseFusion-master/lib
creating home/hamza/Téléchargements/DenseFusion-master/lib/knn
creating home/hamza/Téléchargements/DenseFusion-master/lib/knn/src
gcc -pthread -B /home/hamza/anaconda2/envs/py27/compiler_compat -Wl,--sysroot=/ -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/hamza/anaconda2/envs/py27/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include -I/home/hamza/anaconda2/envs/py27/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/hamza/anaconda2/envs/py27/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/hamza/Téléchargements/DenseFusion-master/lib/knn/include -I/home/hamza/anaconda2/envs/py27/include/python2.7 -c _knn_pytorch.c -o ./_knn_pytorch.o -std=c99
gcc -pthread -B /home/hamza/anaconda2/envs/py27/compiler_compat -Wl,--sysroot=/ -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/hamza/anaconda2/envs/py27/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include -I/home/hamza/anaconda2/envs/py27/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/hamza/anaconda2/envs/py27/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/hamza/Téléchargements/DenseFusion-master/lib/knn/include -I/home/hamza/anaconda2/envs/py27/include/python2.7 -c /home/hamza/Téléchargements/DenseFusion-master/lib/knn/src/knn_pytorch.c -o ./home/hamza/Téléchargements/DenseFusion-master/lib/knn/src/knn_pytorch.o -std=c99
gcc -pthread -shared -B /home/hamza/anaconda2/envs/py27/compiler_compat -L/home/hamza/anaconda2/envs/py27/lib -Wl,-rpath=/home/hamza/anaconda2/envs/py27/lib -Wl,--no-as-needed -Wl,--sysroot=/ ./_knn_pytorch.o ./home/hamza/Téléchargements/DenseFusion-master/lib/knn/src/knn_pytorch.o /home/hamza/Téléchargements/DenseFusion-master/lib/knn/build/knn_cuda_kernel.so /usr/local/cuda/lib64/libnppitc_static.a /usr/local/cuda/lib64/libcublas_device.a /usr/local/cuda/lib64/libnppc_static.a /usr/local/cuda/lib64/libcusparse_static.a /usr/local/cuda/lib64/libcublas_static.a /usr/local/cuda/lib64/libcufftw_static.a /usr/local/cuda/lib64/libnppicom_static.a /usr/local/cuda/lib64/libnppicc_static.a /usr/local/cuda/lib64/libcufft_static.a /usr/local/cuda/lib64/libcudnn_static.a /usr/local/cuda/lib64/libnppist_static.a /usr/local/cuda/lib64/libcurand_static.a /usr/local/cuda/lib64/libnpps_static.a /usr/local/cuda/lib64/libnppig_static.a /usr/local/cuda/lib64/libcudadevrt.a /usr/local/cuda/lib64/libnppidei_static.a /usr/local/cuda/lib64/libnppisu_static.a /usr/local/cuda/lib64/libnvgraph_static.a /usr/local/cuda/lib64/libcusolver_static.a /usr/local/cuda/lib64/libnppif_static.a /usr/local/cuda/lib64/libnppim_static.a /usr/local/cuda/lib64/libculibos.a /usr/local/cuda/lib64/libcudart_static.a /usr/local/cuda/lib64/libnppial_static.a -L/home/hamza/anaconda2/envs/py27/lib -lpython2.7 -o ./_knn_pytorch.so

I'm working with the LineMOD data.

I also tried it outside the conda environment, always with the same problem. I don't know if I missed something!
Thank you in advance.

some confusion about loss calculation

Thank you for sharing your code.
I have some confusion about the dis calculation in the pose estimation stage.
pred = torch.add(torch.bmm(model_points, base), points + pred_t)
Why not:
pred = torch.add(torch.bmm(model_points, base), pred_t)
Thank you and looking forward to your reply.

how to make my own dataset?

Dear sir, I am new to 6D pose estimation and am trying to learn from your code. When I want to build my own dataset for a specific object (my own item or something else), how do I make my own dataset?

Potential bug in lib.loss.loss_calculation

Disclaimer: I haven't read the paper

That being said, this looks very suspicious.

pred = torch.add(torch.bmm(model_points, base), points + pred_t)

It should be only

pred = torch.add(torch.bmm(model_points, base), pred_t)

like you have in loss_refiner.py. I don't see a reason to add the points acquired from the camera here, especially because you compute the point-to-point distance error a couple of lines below:

dis = torch.mean(torch.norm((pred - target), dim=2), dim=1)

Potential leaked information used in the LineMOD evaluation

In eval_linemod.py, the code still uses rmin, rmax, cmin, cmax = get_bbox(meta['obj_bb']) to obtain rmin, rmax, cmin, cmax. That is an important step for the image crop. In my opinion, gt.yaml is the ground truth for the objects, and obj_bb is the 2D bounding box. I don't know whether the code is right; maybe I am wrong.
Thank you.

module 'lib.knn.knn_pytorch' has no attribute 'knn'

Hi, thanks for sharing your code!
I downloaded the latest version of DenseFusion-Pytorch-1.0. When I ran ./experiments/scripts/train_ycb.sh, I got the error 'AttributeError: module 'lib.knn.knn_pytorch' has no attribute 'knn''. My Python version is 3.6.8 (Anaconda custom, 64-bit) and the PyTorch version is 1.0.1.post2.

I tried to insert pdb.set_trace() at line 20 of ./lib/knn/__init__.py:

inds = torch.empty(query.shape[0], self.k, query.shape[2]).long().cuda()

#import pdb
#pdb.set_trace()
knn_pytorch.knn(ref, query, inds)

return inds

I printed dir(knn_pytorch), which shows the following messages:

(Pdb) p dir(knn_pytorch)
['__doc__', '__loader__', '__name__', '__package__', '__path__', '__spec__']

It seems that the module knn_pytorch doesn't have the knn. How can I solve this error? Please help me.

The details are as follows:

+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ export CUDA_VISIBLE_DEVICES=0
+ CUDA_VISIBLE_DEVICES=0
+ python3 ./tools/train.py --dataset ycb --dataset_root ./datasets/ycb/YCB_Video_Dataset
/home/qingqing/Downloads/qingqing_disk/p4600_disk/DenseFusion/lib/transformations.py:1912:      UserWarning: failed to import module _transformations
 warnings.warn('failed to import module %s' % name)
96189
2949
>>>>>>>>----------Dataset loaded!---------<<<<<<<<
length of the training set: 96189
length of the testing set: 2949
number of sample points on mesh: 500
symmetry object list: [12, 15, 18, 19, 20]
/home/qingqing/anaconda3/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning:    size_average and reduce args will be deprecated, please use reduction='mean' instead.
 warnings.warn(warning.format(ret))
2019-04-23 10:20:35,077 : Train time 00h 00m 00s, Training started
/home/qingqing/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:2351:   UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
 warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/qingqing/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:2423:    UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False    since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/qingqing/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:129:     UserWarning: nn.Upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
/home/qingqing/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py:92:     UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to   include dim=X as an argument.
  input = module(input)
2019-04-23 10:20:36,413 : Train time 00h 00m 01s Epoch 1 Batch 1 Frame 8     Avg_dis:0.1779076661914587
Traceback (most recent call last):
  File "./tools/train.py", line 237, in <module>
    main()
  File "./tools/train.py", line 140, in main
    loss, dis, new_points, new_target = criterion(pred_r, pred_t, pred_c, target, model_points, idx,     points, opt.w, opt.refine_start)
  File "/home/qingqing/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line     489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qingqing/Downloads/qingqing_disk/p4600_disk/DenseFusion/lib/loss.py", line 83, in forward
    return loss_calculation(pred_r, pred_t, pred_c, target, model_points, idx, points, w, refine, self.num_pt_mesh, self.sym_list)
  File "/home/qingqing/Downloads/qingqing_disk/p4600_disk/DenseFusion/lib/loss.py", line 44, in loss_calculation
    inds = knn(target.unsqueeze(0), pred.unsqueeze(0))
  File "/home/qingqing/Downloads/qingqing_disk/p4600_disk/DenseFusion/lib/knn/__init__.py", line 23, in forward
    knn_pytorch.knn(ref, query, inds)
AttributeError: module 'lib.knn.knn_pytorch' has no attribute 'knn'

Segmentation fault

Hi, Thanks for sharing the code!!!

I ran download.sh successfully, so I ran:

sh ./experiments/scripts/train_linemod.sh

After the object buffer loaded, I got this error:

----------Dataset loaded!---------<<<<<<<<
length of the training set: 2373
length of the testing set: 1336
number of sample points on mesh: 500
symmetry object list: [7, 8]
2019-04-08 03:59:01,660 : Train time 00h 00m 00s, Training started
/home/user/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py:1749: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/user/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/container.py:91: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
Segmentation fault (core dumped)

How can I solve this error and run the training code?

Undefined symbol error inside knn while training

I followed issue #33, extracted the egg file, and moved the *.so and knn_python.py to the root knn dir.
After that, when I run ./experiments/scripts/train_ycb.sh, it shows the error below:

  • set -e
  • export PYTHONUNBUFFERED=True
  • export CUDA_VISIBLE_DEVICES=0
  • python3 ./tools/train.py --dataset ycb --dataset_root ./datasets/ycb/YCB_Video_Dataset
    /home/taeuk/network/DenseFusion/DenseFusion-Pytorch-1.0/lib/transformations.py:1912: UserWarning: failed to import module _transformations
    warnings.warn('failed to import module %s' % name)
    Traceback (most recent call last):
    File "./tools/train.py", line 26, in
    from lib.loss import Loss
    File "/home/taeuk/network/DenseFusion/DenseFusion-Pytorch-1.0/lib/loss.py", line 9, in
    from lib.knn.init import KNearestNeighbor
    File "/home/taeuk/network/DenseFusion/DenseFusion-Pytorch-1.0/lib/knn/init.py", line 7, in
    from lib.knn import knn_pytorch as knn_pytorch
    ImportError: /home/taeuk/network/DenseFusion/DenseFusion-Pytorch-1.0/lib/knn/knn_pytorch.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs

my system is:
Ubuntu 16.04
GPU : RTX2080 ti
CUDA 10.0
python 3.6.8
pytorch 1.01

Any ideas would be greatly appreciated.
Thanks.

Problem running YCB_Video_toolbox/evaluate_poses_keyframe.m

Hi, when I use the YCB_Video_toolbox and run the MATLAB code YCB_Video_toolbox/evaluate_poses_keyframe.m, I run into the following problem. Can anyone help me? Thanks a lot in advance!

Error using load
Unable to read file 'Densefusion_iterative_result/0026.mat'. No such file or directory.

Error in evaluate_poses_keyframe (line 50)
    result_my = load(filename);

what is "start_epoch" used for?

Hey, I want to ask what the option "start_epoch" is used for. My understanding is that if the model was trained, for example, until epoch 42 and the process was terminated, next time it can be set to start from epoch 42 if the option "--start_epoch=42" is given. Is that correct?

I have observed that the "Avg_dis" is much higher than before the process was terminated, e.g., the model was trained until epoch 42 and the "Avg_dis" was around 0.00xx before termination; when the process is restarted it is around 0.0x, even if "--start_epoch=42" is set. Is that normal?

Thanks in advance!

pointfeat_1 meaning

Hi, the paper says a PointNet-based network processes each point in the masked 3D point cloud into a geometric feature embedding. But in the code implementation in network.py there is only convolution and ReLU applied to the point cloud x: there is no spatial transformer network (STN) and no max-pooling operation on the point cloud, which are the two most important features of PointNet.

class PoseNetFeat(nn.Module):
    def __init__(self, num_points):
        super(PoseNetFeat, self).__init__()
        self.conv1 = torch.nn.Conv1d(3, 64, 1)      # geometric (point cloud) branch
        self.conv2 = torch.nn.Conv1d(64, 128, 1)

        self.e_conv1 = torch.nn.Conv1d(32, 64, 1)   # color embedding branch
        self.e_conv2 = torch.nn.Conv1d(64, 128, 1)

        self.conv5 = torch.nn.Conv1d(256, 512, 1)
        self.conv6 = torch.nn.Conv1d(512, 1024, 1)

        self.ap1 = torch.nn.AvgPool1d(num_points)
        self.num_points = num_points

    def forward(self, x, emb):
        x = F.relu(self.conv1(x))
        emb = F.relu(self.e_conv1(emb))
        pointfeat_1 = torch.cat((x, emb), dim=1)    # 64 + 64 = 128-d per-point feature

        x = F.relu(self.conv2(x))
        emb = F.relu(self.e_conv2(emb))
        pointfeat_2 = torch.cat((x, emb), dim=1)    # 128 + 128 = 256-d per-point feature

        x = F.relu(self.conv5(pointfeat_2))
        x = F.relu(self.conv6(x))

        ap_x = self.ap1(x)                          # 1024-d global feature (average pooling)

        ap_x = ap_x.view(-1, 1024, 1).repeat(1, 1, self.num_points)
        return torch.cat([pointfeat_1, pointfeat_2, ap_x], 1)  # 128 + 256 + 1024

I am confused. Would you explain this to me?
Thanks in advance.

I have some related questions about this part.

  1. I'm confused about which variable in this code represents the color embeddings and which the geometry embeddings. I understood pointfeat_2 (color embedding + geometry embedding) and ap_x (global feature); based on the paper the embedding dimensions are 128.

What is the meaning of

pointfeat_1 = torch.cat((x, emb), dim=1)
return torch.cat([pointfeat_1, pointfeat_2, ap_x], 1)  # 128 + 256 + 1024

Would you explain this part?
Thanks.

Originally posted by @trevor-taeyeop in #34 (comment)

Inconsistency between equation (2) in the paper and the code implementation

Equation (2) in the paper finds, for each ground-truth model point, the closest point among the predicted model points for symmetric shapes, which is consistent with equation (6) in the PoseCNN paper. But in the code implementation (lines 44-47 in lib/loss.py) you instead find, for each predicted model point, the closest ground-truth point, which is the opposite direction. Can you explain? Those two metrics are different. Thanks!

Training vanilla SegNet

When I try to run
python3 train.py --dataset_root=./datasets/ycb/YCB_Video_Dataset

I get the following issue :

Traceback (most recent call last):
  File "train.py", line 69, in <module>
    for i, data in enumerate(dataloader, 0):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
FileNotFoundError: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/media/intern/disk2/DenseFusion/vanilla_segmentation/data_controller.py", line 47, in __getitem__
    label = np.array(Image.open('{0}/{1}-label.png'.format(self.root, self.path[index])))
  File "/usr/local/lib/python3.5/dist-packages/PIL/Image.py", line 2652, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './datasets/ycb/YCB_Video_Dataset/data_syn/002715-label.png'

I tried several times and got the same issue with different #-label.png files, and the files are in the folder, so I don't really understand the error. It is always the 22nd call to __getitem__ that crashes. Any idea?
Thanks

Output Vanilla SegNet

I'm trying to use the output of the vanilla SegNet network to label YCB-Video images, but I can't find an efficient way to transform the 22*640*480 output into a single 640*480 label image.
For the moment I'm using something like this:

seg_data = seg(rgb) #  output SegNet
seg_data = seg_data.detach().cpu().numpy()[0]
seg_image = np.zeros((480, 640))
obj_list = []
for i in range(480):
    for j in range(640):
        prob_max = 0
        label = 0
        for r in range(22):
            if seg_data[r][i][j] > prob_max:
                label = r
                prob_max = seg_data[r][i][j]
        seg_image[i][j] = label
        if label not in obj_list:
            obj_list.append(label)

How do you use the output for fast segmentation of an rgb image ?
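
For reference, the per-pixel argmax above can be vectorized (a sketch, assuming seg_data is a (22, 480, 640) array as in the snippet above):

import numpy as np

seg_data = np.random.rand(22, 480, 640).astype(np.float32)   # placeholder for the SegNet output
seg_image = np.argmax(seg_data, axis=0)                       # (480, 640) label map
obj_list = np.unique(seg_image).tolist()                      # labels present in the frame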

RuntimeError: the derivative for 'index' is not implemented.

I followed @Mars-y470's suggestion and tried to recompile DenseFusion/lib/knn. The problem from #33, 'module 'lib.knn.knn_pytorch' has no attribute 'knn'', was solved.
However, during YCB training (I ran ./experiments/scripts/train_ycb.sh), I got the error 'RuntimeError: the derivative for 'index' is not implemented'. The details are as follows:

2019-05-11 22:29:54,240 : Test time 08h 36m 53s Test Frame No.2948 
dis:0.005122609902173281
2019-05-11 22:29:54,300 : Test time 08h 36m 53s Epoch 33 TEST FINISH Avg dis: 
0.01275980792574587
33 >>>>>>>>----------BEST TEST MODEL SAVED---------<<<<<<<<
96189
2949
>>>>>>>>----------Dataset loaded!---------<<<<<<<<
length of the training set: 96189
length of the testing set: 2949
number of sample points on mesh: 2600
symmetry object list: [12, 15, 18, 19, 20]
2019-05-11 22:29:54,795 : Train time 08h 36m 53s, Training started
Traceback (most recent call last):
  File "./tools/train.py", line 237, in <module>
    main()
  File "./tools/train.py", line 145, in main
    dis, new_points, new_target = criterion_refine(pred_r, pred_t, new_target, model_points, idx, 
new_points)
  File "/home/qingqing/anaconda3/lib/python3.6/site- 
packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qingqing/Downloads/qingqing_disk/p4600_disk/DenseFusion/lib/loss_refiner.py", 
line 76, in forward
    return loss_calculation(pred_r, pred_t, target, model_points, idx, points, self.num_pt_mesh, 
self.sym_list)
  File "/home/qingqing/Downloads/qingqing_disk/p4600_disk/DenseFusion/lib/loss_refiner.py", 
line 45, in loss_calculation
    target = torch.index_select(target, 1, inds.view(-1) - 1)
RuntimeError: the derivative for 'index' is not implemented

It seems that the refinement process of the network failed.
Could you give me some suggestions? @j96w
Did you meet the same problem? @Mars-y470
Thanks!!

PointNet implementation

Hi, the paper says a PointNet-based network processes each point in the masked 3D point cloud into a geometric feature embedding. But in the code implementation in network.py there is only convolution and ReLU applied to the point cloud x: there is no spatial transformer network (STN) and no max-pooling operation on the point cloud, which are the two most important features of PointNet.

class PoseNetFeat(nn.Module):
    def __init__(self, num_points):
        super(PoseNetFeat, self).__init__()
        self.conv1 = torch.nn.Conv1d(3, 64, 1)      # geometric (point cloud) branch
        self.conv2 = torch.nn.Conv1d(64, 128, 1)

        self.e_conv1 = torch.nn.Conv1d(32, 64, 1)   # color embedding branch
        self.e_conv2 = torch.nn.Conv1d(64, 128, 1)

        self.conv5 = torch.nn.Conv1d(256, 512, 1)
        self.conv6 = torch.nn.Conv1d(512, 1024, 1)

        self.ap1 = torch.nn.AvgPool1d(num_points)
        self.num_points = num_points

    def forward(self, x, emb):
        x = F.relu(self.conv1(x))
        emb = F.relu(self.e_conv1(emb))
        pointfeat_1 = torch.cat((x, emb), dim=1)    # 64 + 64 = 128-d per-point feature

        x = F.relu(self.conv2(x))
        emb = F.relu(self.e_conv2(emb))
        pointfeat_2 = torch.cat((x, emb), dim=1)    # 128 + 128 = 256-d per-point feature

        x = F.relu(self.conv5(pointfeat_2))
        x = F.relu(self.conv6(x))

        ap_x = self.ap1(x)                          # 1024-d global feature (average pooling)

        ap_x = ap_x.view(-1, 1024, 1).repeat(1, 1, self.num_points)
        return torch.cat([pointfeat_1, pointfeat_2, ap_x], 1)  # 128 + 256 + 1024

I am confused. Would you explain this to me?
Thanks in advance.

confusion about get_bbox function

The code is in eval_ycb.py, and datasets/ycb/dataset.py.

I think it takes the ROI from the semantic segmentation as input and makes the new width (height) the smallest value in border_list that is not less than the old one (like a ceil function), so the output ROI can only have a fixed set of sizes.

Why not use the ROIs from the segmentation directly?

Thank you and looking forward to your reply.

border_list = [-1, 40, 80, 120, 160, 200, 240, 280, 320, 360, 400, 440, 480, 520, 560, 600, 640, 680]
def get_bbox(posecnn_rois):
    rmin = int(posecnn_rois[idx][3]) + 1
    rmax = int(posecnn_rois[idx][5]) - 1
    cmin = int(posecnn_rois[idx][2]) + 1
    cmax = int(posecnn_rois[idx][4]) - 1
    r_b = rmax - rmin
    for tt in range(len(border_list)):
        if r_b > border_list[tt] and r_b < border_list[tt + 1]:
            r_b = border_list[tt + 1]
            break
    c_b = cmax - cmin
    for tt in range(len(border_list)):
        if c_b > border_list[tt] and c_b < border_list[tt + 1]:
            c_b = border_list[tt + 1]
            break
    center = [int((rmin + rmax) / 2), int((cmin + cmax) / 2)]
    rmin = center[0] - int(r_b / 2)
    rmax = center[0] + int(r_b / 2)
    cmin = center[1] - int(c_b / 2)
    cmax = center[1] + int(c_b / 2)
    if rmin < 0:
        delt = -rmin
        rmin = 0
        rmax += delt
    if cmin < 0:
        delt = -cmin
        cmin = 0
        cmax += delt
    if rmax > img_width:
        delt = rmax - img_width
        rmax = img_width
        rmin -= delt
    if cmax > img_length:
        delt = cmax - img_length
        cmax = img_length
        cmin -= delt
    return rmin, rmax, cmin, cmax

Problem with Segmentation on LineMod Dataset

I just wonder how you trained the segmentation network on the LineMOD dataset.
The LineMOD dataset doesn't contain segmentation ground truth for multiple objects to be detected in the same picture, and you used the masks preprocessed by singleshotpose, but there is only one mask for one object in each picture.
So did you train separate segmentation networks for each object using these masks as ground truth? Or, if you trained one segmentation network to simultaneously detect all the objects, how did you get the segmentation ground truth?
Thank you!

Training DenseFusion with another dataset

Hey,
I developed a dataset based on a single object, a pink cube, whose data share the same format as YCB (640x480 png images for color, depth and segmentation, and .mat files for the ground-truth values). When I tried to train your pose-estimation network with it, running the .py file with modifications for the number of objects and the location of the files to load, it failed most of the time between the 5th and 15th batch:


2019-06-13 15:08:29,614 : Train time 00h 00m 00s Epoch 1 Batch 1 Frame 8 Avg_dis:11.477169811725616
2019-06-13 15:08:29,961 : Train time 00h 00m 01s Epoch 1 Batch 2 Frame 16 Avg_dis:10.867223545908928
2019-06-13 15:08:30,334 : Train time 00h 00m 01s Epoch 1 Batch 3 Frame 24 Avg_dis:14.736500158905983
2019-06-13 15:08:30,701 : Train time 00h 00m 01s Epoch 1 Batch 4 Frame 32 Avg_dis:7.393726512789726
2019-06-13 15:08:31,080 : Train time 00h 00m 02s Epoch 1 Batch 5 Frame 40 Avg_dis:9.500172346830368
2019-06-13 15:08:31,442 : Train time 00h 00m 02s Epoch 1 Batch 6 Frame 48 Avg_dis:11.712907552719116
2019-06-13 15:08:31,813 : Train time 00h 00m 02s Epoch 1 Batch 7 Frame 56 Avg_dis:34.04928183555603
2019-06-13 15:08:32,182 : Train time 00h 00m 03s Epoch 1 Batch 8 Frame 64 Avg_dis:62.857853412628174
2019-06-13 15:08:32,557 : Train time 00h 00m 03s Epoch 1 Batch 9 Frame 72 Avg_dis:12.033220887184143
2019-06-13 15:08:32,917 : Train time 00h 00m 04s Epoch 1 Batch 10 Frame 80 Avg_dis:nan
2019-06-13 15:08:33,348 : Train time 00h 00m 04s Epoch 1 Batch 11 Frame 88 Avg_dis:nan
2019-06-13 15:08:33,703 : Train time 00h 00m 04s Epoch 1 Batch 12 Frame 96 Avg_dis:nan
2019-06-13 15:08:34,071 : Train time 00h 00m 05s Epoch 1 Batch 13 Frame 104 Avg_dis:nan
2019-06-13 15:08:34,489 : Train time 00h 00m 05s Epoch 1 Batch 14 Frame 112 Avg_dis:nan
2019-06-13 15:08:34,865 : Train time 00h 00m 05s Epoch 1 Batch 15 Frame 120 Avg_dis:nan

NaN values continue like that until the end.

I have no problem training on YCB the way you did, so I think the trouble appears because I am using another dataset.

EDIT:
It looks like the problem could come from the pred_c values.
At the beginning they are too close to 0. During the loss calculation there is this line:
loss = torch.mean((dis * pred_c - w * torch.log(pred_c)), dim=0)
and if at least one pred_c value is treated as equal to 0, then torch.log gives an -inf value, torch.mean returns inf, and during the optimizer step a NaN value appears.

However, I don't know how to manage this issue. Do you have any advice?

Thank you for your help
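
A common numerical guard for the log term described above is to clamp pred_c away from zero before taking the log (a hedged sketch with placeholder tensors, not something present in this repo):

import torch

pred_c = torch.rand(1, 500)                    # placeholder per-point confidences
dis = torch.rand(1, 500)                       # placeholder per-point distances
w = 0.015                                      # placeholder balancing weight
pred_c = torch.clamp(pred_c, min=1e-8)         # keep the log argument strictly positive
loss = torch.mean(dis * pred_c - w * torch.log(pred_c), dim=0)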

Is it possible to share the trained segmentation model on the YCB-Video dataset?

Hi, Thanks for sharing the code!

I'm also trying to use your code in a ROS environment for robot manipulation with objects from the YCB dataset. However, inference in DenseFusion requires segmentation to generate the pose, and it is very time consuming to train a segmentation model on all the training images in the YCB-Video dataset. I tried to train with the vanilla segmentation code and found that even one epoch takes around 10 hours on the YCB-Video dataset with a single GPU, and we don't have many GPU resources. It would be great if you could share the trained segmentation model on the YCB-Video dataset!

Thanks a lot!
