
robotlocomotion / pytorch-dense-correspondence

553 stars · 28 watchers · 133 forks · 74.95 MB

Code for "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation"

Home Page: https://arxiv.org/pdf/1806.08756.pdf

License: Other

Languages: Python 69.05%, Jupyter Notebook 30.19%, Shell 0.46%, Dockerfile 0.29%
Topics: pytorch, robotics, manipulation, computer-vision, deep-learning, 3d, vision, artificial-intelligence, self-supervised-learning

pytorch-dense-correspondence's Introduction

Updates

  • September 4, 2018: Tutorial and data now available! The tutorial, available here, walks step-by-step through getting this repo running.
  • June 26, 2019: We have updated the repo to pytorch 1.1 and CUDA 10. For the code used for the experiments in the paper, see here.

Dense Correspondence Learning in PyTorch

In this project we learn Dense Object Nets, i.e. dense descriptor networks for previously unseen, potentially deformable objects, and potentially classes of objects.

We also demonstrate using Dense Object Nets for robotic manipulation tasks.

Dense Object Nets: Learning Dense Visual Descriptors by and for Robotic Manipulation

This is the reference implementation for our paper:

PDF | Video

Pete Florence*, Lucas Manuelli*, Russ Tedrake

Abstract: What is the right object representation for manipulation? We would like robots to visually perceive scenes and learn an understanding of the objects in them that (i) is task-agnostic and can be used as a building block for a variety of manipulation tasks, (ii) is generally applicable to both rigid and non-rigid objects, (iii) takes advantage of the strong priors provided by 3D vision, and (iv) is entirely learned from self-supervision. This is hard to achieve with previous methods: much recent work in grasping does not extend to grasping specific objects or other tasks, whereas task-specific learning may require many trials to generalize well across object configurations or other tasks. In this paper we present Dense Object Nets, which build on recent developments in self-supervised dense descriptor learning, as a consistent object representation for visual understanding and manipulation. We demonstrate they can be trained quickly (approximately 20 minutes) for a wide variety of previously unseen and potentially non-rigid objects. We additionally present novel contributions to enable multi-object descriptor learning, and show that by modifying our training procedure, we can either acquire descriptors which generalize across classes of objects, or descriptors that are distinct for each object instance. Finally, we demonstrate the novel application of learned dense descriptors to robotic manipulation. We demonstrate grasping of specific points on an object across potentially deformed object configurations, and demonstrate using class general descriptors to transfer specific grasps across objects in a class.

Citing

If you find this code useful in your work, please consider citing:

@article{florencemanuelli2018dense,
  title={Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation},
  author={Florence, Peter and Manuelli, Lucas and Tedrake, Russ},
  journal={Conference on Robot Learning},
  year={2018}
}

Tutorial

Code Setup

Dataset

Training and Evaluation

Miscellaneous

Git management

To prevent the repo from growing in size, we recommend always running "Restart and Clear Outputs" before committing any Jupyter notebooks. If you'd like to save what your notebook looks like, you can always "Download as HTML", which is a great way to snapshot the state of a notebook and share it.

pytorch-dense-correspondence's People

Contributors

manuelli, peteflorence


pytorch-dense-correspondence's Issues

Training Tweaks Log

This should serve as an issue to track tweaks/insights from our training procedure.

Improve data augmentation

A quick chat with Yunzhu suggests that we should also be randomizing

  • hue
  • brightness
  • contrast

I think this is the pytorch function we want to use.

Another repo that might be useful.
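If the pytorch function referenced above is torchvision's ColorJitter (a guess; the original link is missing), a minimal sketch with illustrative, untuned ranges:

from PIL import Image
import torchvision.transforms as T

# ColorJitter randomizes exactly these properties (plus saturation if given).
# The ranges below are illustrative, not tuned values.
color_jitter = T.ColorJitter(brightness=0.4, contrast=0.4, hue=0.1)

img = Image.open("rgb_frame.png")  # hypothetical path to an RGB training image
augmented = color_jitter(img)      # a new random jitter is sampled on each call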

Experiments to Run

This issue serves as a place to record a set of experiments that we would like to run.

  • Test the effect of background randomization. Train the same network with and without background randomization. Perform this experiment with a multi-object dataset.

  • Effect of descriptor dimension. So far we have been training with D = 3. Also test D = 16 and D = 32. This experiment is currently in progress on the multi_object_in_isolation dataset (a sweep sketch follows after this list).

  • Parameter sweeps. Parameters to tweak include learning_rate, weight_decay, non_match_loss_weight, etc.

  • Do we get object consistency without explicitly enforcing it in the loss function?

  • Do multiple objects in isolation generalize to cluttered scenes?

    • Answer: No

Post the results of running such experiments below as comments.
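For the descriptor-dimension sweep, a minimal sketch following the pattern already used in the training notebooks (train is assumed to be the notebook's training object; the config keys match those in the tracebacks below):

num_iterations = 10000  # illustrative value
for d in [3, 16, 32]:
    train._config["dense_correspondence_network"]["descriptor_dimension"] = d
    train._config["training"]["num_iterations"] = num_iterations
    train.run()
    print("finished training descriptor of dimension %d" % d)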

Bug in Dataset?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-1552cea504b9> in <module>()
     10     train._config["dense_correspondence_network"]["descriptor_dimension"] = d
     11     train._config["training"]["num_iterations"] = num_iterations
---> 12     train.run()
     13     print "finished training descriptor of dimension %d" %(d)

/home/manuelli/code/dense_correspondence/training/training.pyc in run(self, loss_current_iteration, use_pretrained)
    283         for epoch in range(50):  # loop over the dataset multiple times
    284 
--> 285             for i, data in enumerate(self._data_loader, 0):
    286                 loss_current_iteration += 1
    287                 start_iter = time.time()

/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in __next__(self)
    194         if self.rcvd_idx in self.reorder_dict:
    195             batch = self.reorder_dict.pop(self.rcvd_idx)
--> 196             return self._process_next_batch(batch)
    197 
    198         if self.batches_outstanding == 0:

/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in _process_next_batch(self, batch)
    228         self._put_indices()
    229         if isinstance(batch, ExceptionWrapper):
--> 230             raise batch.exc_type(batch.exc_msg)
    231         return batch
    232 

TypeError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 42, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 119, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 121, in default_collate
    raise TypeError((error_msg.format(type(batch[0]))))
TypeError: batch must contain tensors, numbers, dicts or lists; found <type 'NoneType'>
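The error above means the dataset's __getitem__ returned None for some index, which default_collate cannot handle. One possible workaround (a sketch, not repo-specific code) is a collate_fn that drops None samples:

from torch.utils.data.dataloader import default_collate

def collate_skip_none(batch):
    # Drop samples for which __getitem__ returned None before collating.
    batch = [sample for sample in batch if sample is not None]
    if len(batch) == 0:
        return None  # the training loop must then skip empty batches
    return default_collate(batch)

# Usage (illustrative): DataLoader(dataset, batch_size=1, collate_fn=collate_skip_none)

The root cause (why __getitem__ returns None) still needs to be tracked down; this just keeps training from crashing.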


Perform Batch Normalization

Per discussion with Yunzhu, think about how to get batch normalization to work with our setup.

One potential idea (sketched in code after this list):

  • Sample N pairs of images (this will be our batch).
  • For each image pair we can find matches [m_a, m_b] and non-matches [nm_a, nm_b].
  • Run all 2*N images through the network simultaneously; this is where we can do batch normalization.
  • Compute the loss function as usual and do back-prop.

Things to be careful about.

  • Masked matches vs. non-masked matches
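A minimal sketch of the idea, assuming image_pairs is a list of (image_a, image_b) tensor pairs and dcn is the descriptor network (both names are placeholders, not repo API):

import torch

# Stack all 2*N images into a single batch so BatchNorm layers see
# statistics across the whole batch rather than a single image.
N = len(image_pairs)
images_a = torch.stack([pair[0] for pair in image_pairs])  # (N, 3, H, W)
images_b = torch.stack([pair[1] for pair in image_pairs])  # (N, 3, H, W)
batch = torch.cat([images_a, images_b], dim=0)             # (2N, 3, H, W)

descriptors = dcn(batch)  # one forward pass for all 2N images
desc_a, desc_b = descriptors[:N], descriptors[N:]
# compute the match / non-match losses per pair as usual, then backprop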

Currently redownloading resnet model on each new docker container

It would be better to persist this model. There could be a couple of different solutions.

I think a while ago the previous docker setup didn't need re-downloads? Potentially related to the addition of ~code/ on the mounting path?

Re-downloading is also not the biggest deal, but this is an issue to track.

re-organize folder structure

  • Put dense_correspondence pytorch code somewhere on the python path.
  • Centralize the location of all config files

running out of GPU memory when computing test loss

Error happened at iteration 9000 of 10000. Is the test-loss computation just a red herring, or are our batch sizes somehow too big?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-f8bf2b034dd8> in <module>()
     10     train._config["dense_correspondence_network"]["descriptor_dimension"] = d
     11     train._config["training"]["num_iterations"] = num_iterations
---> 12     train.run()
     13     print "finished training descriptor of dimension %d" %(d)

/home/manuelli/code/dense_correspondence/training/training.pyc in run(self, loss_current_iteration, use_pretrained)
    423                     logging.info("Computing test loss")
    424                     test_loss, test_match_loss, test_non_match_loss = DCE.compute_loss_on_dataset(dcn,
--> 425                                                                                                   self._data_loader_test, self._config['loss_function'], num_iterations=self._config['training']['test_loss_num_iterations'])
    426 
    427                     update_visdom_test_loss_plots(test_loss, test_match_loss, test_non_match_loss)

/home/manuelli/code/dense_correspondence/evaluation/evaluation.pyc in compute_loss_on_dataset(dcn, data_loader, loss_config, num_iterations)
   1494                                                         matches_b,
   1495                                                         non_matches_a,
-> 1496                                                         non_matches_b)
   1497 
   1498 

/home/manuelli/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in get_loss(self, image_a_pred, image_b_pred, matches_a, matches_b, non_matches_a, non_matches_b, M_descriptor, M_pixel, non_match_loss_weight, use_l2_pixel_loss)
    102                                                        non_matches_a, non_matches_b,
    103                                                        M_descriptor=M_descriptor,
--> 104                                                        M_pixel=M_pixel)
    105         else:
    106             # version with no l2 pixel term

/home/manuelli/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in non_match_loss_with_l2_pixel_norm(self, image_a_pred, image_b_pred, matches_b, non_matches_a, non_matches_b, M_descriptor, M_pixel)
    215         num_non_matches = non_matches_a.size()[0]
    216 
--> 217         non_match_descriptor_loss, num_hard_negatives, _, _ = PCL.non_match_descriptor_loss(image_a_pred, image_b_pred, non_matches_a, non_matches_b, M=M_descriptor)
    218 
    219         non_match_pixel_l2_loss, _, _ = self.l2_pixel_loss(matches_b, non_matches_b, M_pixel=M_pixel)

/home/manuelli/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in non_match_descriptor_loss(image_a_pred, image_b_pred, non_matches_a, non_matches_b, M)
    161         """
    162 
--> 163         non_matches_a_descriptors = torch.index_select(image_a_pred, 1, non_matches_a).squeeze()
    164         non_matches_b_descriptors = torch.index_select(image_b_pred, 1, non_matches_b).squeeze()
    165 

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:5
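One common cause of eval-time OOM is computing the test loss with autograd enabled, so activations for the entire test pass are retained. A sketch of the fix (compute_test_loss, loss_fn, etc. are illustrative names, not this repo's API):

import torch

def compute_test_loss(dcn, data_loader, loss_fn, num_iterations):
    # torch.no_grad() frees activations immediately instead of keeping
    # them around for a backward pass that never happens at eval time.
    dcn.eval()
    total_loss = 0.0
    with torch.no_grad():  # requires pytorch >= 0.4
        for i, batch in enumerate(data_loader):
            if i >= num_iterations:
                break
            total_loss += loss_fn(dcn, batch).item()
    dcn.train()
    return total_loss / max(num_iterations, 1)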

Increase batch_size to more than 1

Currently it is just one, which I think is hurting our performance as it leads to very noisy gradients. We should have a quick chat about what changes to the loss function and/or dataset classes are needed (a gradient-accumulation workaround is sketched below).
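One way to get an effective batch size above 1 without touching the dataset or loss classes is gradient accumulation; a sketch, where optimizer, data_loader, dcn, and compute_loss are illustrative names:

accumulation_steps = 4  # effective batch size

optimizer.zero_grad()
for i, batch in enumerate(data_loader):
    # Dividing each loss by the window size makes the accumulated
    # gradient match that of a true batch of this size.
    loss = compute_loss(dcn, batch) / accumulation_steps
    loss.backward()  # gradients accumulate across iterations
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Note this does not change what BatchNorm sees per forward pass; see the batch normalization issue above for that.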

Use nn.module state_dict and load_state_dict functionality

Currently we are overloading it with our own function calls, but now that DenseCorrespondenceNetwork inherits from nn.Module this is no longer necessary. Removing the overloads should improve compatibility with changing the underlying segmentation network architecture in the future.
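For reference, the built-in pattern is just (a minimal sketch; the path is illustrative and dcn is an existing DenseCorrespondenceNetwork instance):

import torch

# state_dict() / load_state_dict() come for free from nn.Module.
torch.save(dcn.state_dict(), "/tmp/dcn_weights.pth")

# Later, on a network constructed with the same architecture:
dcn.load_state_dict(torch.load("/tmp/dcn_weights.pth"))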

Match Loss during training is very large

Currently it is somewhere in the vicinity of 0.1 after 3500 steps. This translates to an average descriptor distance between matches of sqrt(0.1) ≈ 0.32, which seems incredibly large given that M_pixel = 0.5. This is doubly confusing since the hard negative rate for masked non-matches is fairly low, at 4%; so only 4% of non-matches are below that 0.32 distance.

I think this could potentially be fixed by lowering the learning rate.

Consider pushing images to docker hub rather than always rebuild

Latest culprit:

+ pip install -U pip setuptools
Collecting pip
  HTTP error 503 while getting https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl#sha256=86a60a96d85e329962a9e6f6af612cbc11106293dbc83f119802b5bee9874cf3 (from https://pypi.org/simple/pip/)
  Could not install requirement pip from https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl#sha256=86a60a96d85e329962a9e6f6af612cbc11106293dbc83f119802b5bee9874cf3 because of error 503 Server Error: Backend is unhealthy for url: https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl
Could not install requirement pip from https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl#sha256=86a60a96d85e329962a9e6f6af612cbc11106293dbc83f119802b5bee9874cf3 because of HTTP error 503 Server Error: Backend is unhealthy for url: https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl for URL https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl#sha256=86a60a96d85e329962a9e6f6af612cbc11106293dbc83f119802b5bee9874cf3 (from https://pypi.org/simple/pip/)

Quantitative eval broken

Not sure where, but after pulling up to master (I also have some additional local changes), it just hangs, making no progress... working on debugging...

A candidate set of CI tests to run: do the quantitative eval notebook, a training notebook with 10 steps, and a qualitative eval notebook all return with no errors within some bounded time?

We are running BatchNorm2D with batch_size = 1

It turns out this isn't a big issue because we are doing BatchNorm2D. Straight from the pytorch documentation here:

Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it’s common terminology to call this Spatial Batch Normalization

So even though we have N = 1, the fact that H and W are greater than 1 means the batch normalization still works fine.
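A small self-contained demo (shapes are illustrative):

import torch
import torch.nn as nn

# BatchNorm2d computes per-channel statistics over (N, H, W), so with
# N = 1 the estimates still average over H*W spatial locations.
bn = nn.BatchNorm2d(num_features=64)
x = torch.randn(1, 64, 60, 80)  # N=1, C=64, H=60, W=80
y = bn(x)
print(y.shape)  # torch.Size([1, 64, 60, 80])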

Clean cruft

Lots of cruft from various versions of dataset loaders (with consistency, without, etc.).

What is our segmentation network really doing?

Just a sandbox issue to record some of what I have learned about segmentation networks.

  • We are using the ResNet34_8s network from WSW (warmspringwinds). See here. This achieves 68% mIoU on RV-VOC12. This network is a (simplified) implementation of a network in this paper.

  • We are using output_stride=8. Hence ResNet34_8s takes an [H, W, 3] tensor and maps it to an [H/8, W/8, K] tensor, where K is the feature size. I need to check, but I think that K = 512 for ResNet34 (a toy shape check follows this list).

  • This blog post provides a good overview of the different segmentation networks.

  • For a detailed look at "transposed convolutions" (aka fractionally strided convolutions or "deconvolutions"), which are just a convolutional way of doing upsampling, see here.

  • This post by WSW describes image segmentation and upsampling.
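To make the output_stride=8 shape claim concrete, a toy stand-in (this is not the actual ResNet34_8s, which keeps stride 8 via dilated convolutions; K = 512 is the assumption stated above):

import torch
import torch.nn as nn

# Toy backbone with three stride-2 stages: overall stride 2*2*2 = 8.
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(128, 512, kernel_size=3, stride=2, padding=1),
)
x = torch.randn(1, 3, 480, 640)  # [N, 3, H, W]
y = toy_backbone(x)
print(y.shape)  # torch.Size([1, 512, 60, 80]), i.e. [N, K, H/8, W/8]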

Should we use the norm or squared norm in the loss function?

Currently we are using squared error. Changing this could affect learning rates, and other params.

Tanner uses squared loss. Slightly different from ours though (both variants are sketched below).

  • matches: D(I_a, I_b, u_a, u_b)^2
  • non-matches: max(0, M - D(I_a, I_b, u_a, u_b))^2
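Both variants side by side, as a sketch (d_a, d_b are (num_matches, D) tensors of descriptors gathered at u_a, u_b; the names are illustrative):

import torch

def match_loss(d_a, d_b, squared=True):
    # Descriptor-space distance D(I_a, I_b, u_a, u_b) between matched pixels.
    dist = (d_a - d_b).norm(dim=1)
    return (dist ** 2).mean() if squared else dist.mean()

def non_match_loss(d_a, d_b, M=0.5, squared=True):
    # Hinge on the margin M: max(0, M - D).
    dist = (d_a - d_b).norm(dim=1)
    hinge = torch.clamp(M - dist, min=0)
    return (hinge ** 2).mean() if squared else hinge.mean()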
