
robotlocomotion / pytorch-dense-correspondence

553 stars · 28 watchers · 133 forks · 74.95 MB

Code for "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation"

Home Page: https://arxiv.org/pdf/1806.08756.pdf

License: Other

Languages: Python 69.05%, Jupyter Notebook 30.19%, Shell 0.46%, Dockerfile 0.29%
Topics: pytorch, robotics, manipulation, computer-vision, deep-learning, 3d, vision, artificial-intelligence, self-supervised-learning

pytorch-dense-correspondence's Introduction

Updates

  • September 4, 2018: Tutorial and data now available! The tutorial, available here, walks step-by-step through getting this repo running.
  • June 26, 2019: We have updated the repo to pytorch 1.1 and CUDA 10. For the code used for the experiments in the paper, see here.

Dense Correspondence Learning in PyTorch

In this project we learn Dense Object Nets, i.e. dense descriptor networks for previously unseen, potentially deformable objects, and potentially classes of objects.

We also demonstrate using Dense Object Nets for robotic manipulation tasks.

Dense Object Nets: Learning Dense Visual Descriptors by and for Robotic Manipulation

This is the reference implementation for our paper:

PDF | Video

Pete Florence*, Lucas Manuelli*, Russ Tedrake

Abstract: What is the right object representation for manipulation? We would like robots to visually perceive scenes and learn an understanding of the objects in them that (i) is task-agnostic and can be used as a building block for a variety of manipulation tasks, (ii) is generally applicable to both rigid and non-rigid objects, (iii) takes advantage of the strong priors provided by 3D vision, and (iv) is entirely learned from self-supervision. This is hard to achieve with previous methods: much recent work in grasping does not extend to grasping specific objects or other tasks, whereas task-specific learning may require many trials to generalize well across object configurations or other tasks. In this paper we present Dense Object Nets, which build on recent developments in self-supervised dense descriptor learning, as a consistent object representation for visual understanding and manipulation. We demonstrate they can be trained quickly (approximately 20 minutes) for a wide variety of previously unseen and potentially non-rigid objects. We additionally present novel contributions to enable multi-object descriptor learning, and show that by modifying our training procedure, we can either acquire descriptors which generalize across classes of objects, or descriptors that are distinct for each object instance. Finally, we demonstrate the novel application of learned dense descriptors to robotic manipulation. We demonstrate grasping of specific points on an object across potentially deformed object configurations, and demonstrate using class general descriptors to transfer specific grasps across objects in a class.

Citing

If you find this code useful in your work, please consider citing:

@article{florencemanuelli2018dense,
  title={Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation},
  author={Florence, Peter and Manuelli, Lucas and Tedrake, Russ},
  journal={Conference on Robot Learning},
  year={2018}
}

Tutorial

Code Setup

Dataset

Training and Evaluation

Miscellaneous

Git management

To prevent the repo from growing in size, we recommend always running "Restart and Clear Outputs" before committing any Jupyter notebooks. If you'd like to save what your notebook looks like, you can always "Download as HTML", which is a great way to snapshot the state of a notebook and share it.

pytorch-dense-correspondence's People

Contributors

manuelli, peteflorence


pytorch-dense-correspondence's Issues

Training Tweaks Log

This should serve as an issue to track tweaks/insights from our training procedure.

Improve data augmentation

A quick chat with Yunzhu suggests that we should also be randomizing

  • hue
  • brightness
  • contrast

I think this is the pytorch function we want to use.

Another repo that might be useful.
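If the pytorch function referenced above is torchvision's ColorJitter (a guess; the original link is missing), a minimal sketch with illustrative, untuned ranges:

from PIL import Image
import torchvision.transforms as T

# ColorJitter randomizes exactly these properties (plus saturation if given).
# The ranges below are illustrative, not tuned values.
color_jitter = T.ColorJitter(brightness=0.4, contrast=0.4, hue=0.1)

img = Image.open("rgb_frame.png")  # hypothetical path to an RGB training image
augmented = color_jitter(img)      # a new random jitter is sampled on each call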

Experiments to Run

This issue serves as a place to record a set of experiments that we would like to run.

  • Test the effect of background randomization. Train the same network with and without background randomization. Perform this experiment with a multi-object dataset.

  • Effect of descriptor dimension. So far we have been training with D = 3. Also test D = 16 and D = 32. This experiment is currently in progress on the multi_object_in_isolation dataset (a sweep sketch follows after this list).

  • Parameter sweeps. Parameters to tweak include learning_rate, weight_decay, non_match_loss_weight, etc.

  • Do we get object consistency without explicitly enforcing it in the loss function?

  • Do multiple objects in isolation generalize to cluttered scenes?

    • Answer: No

Post the results of running such experiments below as comments.
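For the descriptor-dimension sweep, a minimal sketch following the pattern already used in the training notebooks (train is assumed to be the notebook's training object; the config keys match those in the tracebacks below):

num_iterations = 10000  # illustrative value
for d in [3, 16, 32]:
    train._config["dense_correspondence_network"]["descriptor_dimension"] = d
    train._config["training"]["num_iterations"] = num_iterations
    train.run()
    print("finished training descriptor of dimension %d" % d)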

Bug in Dataset?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-1552cea504b9> in <module>()
     10     train._config["dense_correspondence_network"]["descriptor_dimension"] = d
     11     train._config["training"]["num_iterations"] = num_iterations
---> 12     train.run()
     13     print "finished training descriptor of dimension %d" %(d)

/home/manuelli/code/dense_correspondence/training/training.pyc in run(self, loss_current_iteration, use_pretrained)
    283         for epoch in range(50):  # loop over the dataset multiple times
    284 
--> 285             for i, data in enumerate(self._data_loader, 0):
    286                 loss_current_iteration += 1
    287                 start_iter = time.time()

/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in __next__(self)
    194         if self.rcvd_idx in self.reorder_dict:
    195             batch = self.reorder_dict.pop(self.rcvd_idx)
--> 196             return self._process_next_batch(batch)
    197 
    198         if self.batches_outstanding == 0:

/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in _process_next_batch(self, batch)
    228         self._put_indices()
    229         if isinstance(batch, ExceptionWrapper):
--> 230             raise batch.exc_type(batch.exc_msg)
    231         return batch
    232 

TypeError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 42, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 119, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 121, in default_collate
    raise TypeError((error_msg.format(type(batch[0]))))
TypeError: batch must contain tensors, numbers, dicts or lists; found <type 'NoneType'>
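The error above means the dataset's __getitem__ returned None for some index, which default_collate cannot handle. One possible workaround (a sketch, not repo-specific code) is a collate_fn that drops None samples:

from torch.utils.data.dataloader import default_collate

def collate_skip_none(batch):
    # Drop samples for which __getitem__ returned None before collating.
    batch = [sample for sample in batch if sample is not None]
    if len(batch) == 0:
        return None  # the training loop must then skip empty batches
    return default_collate(batch)

# Usage (illustrative): DataLoader(dataset, batch_size=1, collate_fn=collate_skip_none)

The root cause (why __getitem__ returns None) still needs to be tracked down; this just keeps training from crashing.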


Perform Batch Normalization

Per discussion with Yunzhu, think about how to get batch normalization to work with our setup.

One potential idea (sketched in code after this list):

  • Sample N pairs of images (this will be our batch).
  • For each image pair we can find matches [m_a, m_b] and non-matches [nm_a, nm_b].
  • Run all 2*N images through the network simultaneously; this is where we can do batch normalization.
  • Compute the loss function as usual and do back-prop.

Things to be careful about.

  • Masked matches vs. non-masked matches
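A minimal sketch of the idea, assuming image_pairs is a list of (image_a, image_b) tensor pairs and dcn is the descriptor network (both names are placeholders, not repo API):

import torch

# Stack all 2*N images into a single batch so BatchNorm layers see
# statistics across the whole batch rather than a single image.
N = len(image_pairs)
images_a = torch.stack([pair[0] for pair in image_pairs])  # (N, 3, H, W)
images_b = torch.stack([pair[1] for pair in image_pairs])  # (N, 3, H, W)
batch = torch.cat([images_a, images_b], dim=0)             # (2N, 3, H, W)

descriptors = dcn(batch)  # one forward pass for all 2N images
desc_a, desc_b = descriptors[:N], descriptors[N:]
# compute the match / non-match losses per pair as usual, then backprop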

Currently redownloading resnet model on each new docker container

It would be better to persist this model. There could be a couple of different solutions.

I think a while ago the previous docker setup didn't need re-downloads? Potentially related to the addition of ~code/ on the mounting path?

Re-downloading is also not the biggest deal, but this is an issue to track.

re-organize folder structure

  • Put dense_correspondence pytorch code somewhere on the python path.
  • Centralize the location of all config files

running out of GPU memory when computing test loss

Error happened at iteration 9000 of 10000. Is the test-loss computation just a red herring, or are our batch sizes somehow too big?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-f8bf2b034dd8> in <module>()
     10     train._config["dense_correspondence_network"]["descriptor_dimension"] = d
     11     train._config["training"]["num_iterations"] = num_iterations
---> 12     train.run()
     13     print "finished training descriptor of dimension %d" %(d)

/home/manuelli/code/dense_correspondence/training/training.pyc in run(self, loss_current_iteration, use_pretrained)
    423                     logging.info("Computing test loss")
    424                     test_loss, test_match_loss, test_non_match_loss = DCE.compute_loss_on_dataset(dcn,
--> 425                                                                                                   self._data_loader_test, self._config['loss_function'], num_iterations=self._config['training']['test_loss_num_iterations'])
    426 
    427                     update_visdom_test_loss_plots(test_loss, test_match_loss, test_non_match_loss)

/home/manuelli/code/dense_correspondence/evaluation/evaluation.pyc in compute_loss_on_dataset(dcn, data_loader, loss_config, num_iterations)
   1494                                                         matches_b,
   1495                                                         non_matches_a,
-> 1496                                                         non_matches_b)
   1497 
   1498 

/home/manuelli/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in get_loss(self, image_a_pred, image_b_pred, matches_a, matches_b, non_matches_a, non_matches_b, M_descriptor, M_pixel, non_match_loss_weight, use_l2_pixel_loss)
    102                                                        non_matches_a, non_matches_b,
    103                                                        M_descriptor=M_descriptor,
--> 104                                                        M_pixel=M_pixel)
    105         else:
    106             # version with no l2 pixel term

/home/manuelli/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in non_match_loss_with_l2_pixel_norm(self, image_a_pred, image_b_pred, matches_b, non_matches_a, non_matches_b, M_descriptor, M_pixel)
    215         num_non_matches = non_matches_a.size()[0]
    216 
--> 217         non_match_descriptor_loss, num_hard_negatives, _, _ = PCL.non_match_descriptor_loss(image_a_pred, image_b_pred, non_matches_a, non_matches_b, M=M_descriptor)
    218 
    219         non_match_pixel_l2_loss, _, _ = self.l2_pixel_loss(matches_b, non_matches_b, M_pixel=M_pixel)

/home/manuelli/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.pyc in non_match_descriptor_loss(image_a_pred, image_b_pred, non_matches_a, non_matches_b, M)
    161         """
    162 
--> 163         non_matches_a_descriptors = torch.index_select(image_a_pred, 1, non_matches_a).squeeze()
    164         non_matches_b_descriptors = torch.index_select(image_b_pred, 1, non_matches_b).squeeze()
    165 

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:5
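One common cause of eval-time OOM is computing the test loss with autograd enabled, so activations for the entire test pass are retained. A sketch of the fix (compute_test_loss, loss_fn, etc. are illustrative names, not this repo's API):

import torch

def compute_test_loss(dcn, data_loader, loss_fn, num_iterations):
    # torch.no_grad() frees activations immediately instead of keeping
    # them around for a backward pass that never happens at eval time.
    dcn.eval()
    total_loss = 0.0
    with torch.no_grad():  # requires pytorch >= 0.4
        for i, batch in enumerate(data_loader):
            if i >= num_iterations:
                break
            total_loss += loss_fn(dcn, batch).item()
    dcn.train()
    return total_loss / max(num_iterations, 1)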

Increase batch_size to more than 1

Currently it is just one, which I think is hurting our performance as it leads to very noisy gradients. We should have a quick chat about what changes to the loss function and/or dataset classes are needed (a gradient-accumulation workaround is sketched below).
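One way to get an effective batch size above 1 without touching the dataset or loss classes is gradient accumulation; a sketch, where optimizer, data_loader, dcn, and compute_loss are illustrative names:

accumulation_steps = 4  # effective batch size

optimizer.zero_grad()
for i, batch in enumerate(data_loader):
    # Dividing each loss by the window size makes the accumulated
    # gradient match that of a true batch of this size.
    loss = compute_loss(dcn, batch) / accumulation_steps
    loss.backward()  # gradients accumulate across iterations
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Note this does not change what BatchNorm sees per forward pass; see the batch normalization issue above for that.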

Use nn.module state_dict and load_state_dict functionality

Currently we are overloading it with our own function calls, but now that DenseCorrespondenceNetwork inherits from nn.Module this is no longer necessary. Removing the overloads should improve compatibility with changing the underlying segmentation network architecture in the future.
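For reference, the built-in pattern is just (a minimal sketch; the path is illustrative and dcn is an existing DenseCorrespondenceNetwork instance):

import torch

# state_dict() / load_state_dict() come for free from nn.Module.
torch.save(dcn.state_dict(), "/tmp/dcn_weights.pth")

# Later, on a network constructed with the same architecture:
dcn.load_state_dict(torch.load("/tmp/dcn_weights.pth"))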

Match Loss during training is very large

Currently it is somewhere in the vicinity of 0.1 after 3500 steps. This translates to an average descriptor distance between matches of sqrt(0.1) ≈ 0.32, which seems incredibly large given that M_pixel = 0.5. This is doubly confusing since the hard negative rate for masked non-matches is fairly low, at 4%; so only 4% of non-matches are below that 0.32 distance.

I think this could potentially be fixed by lowering the learning rate.

Consider pushing images to docker hub rather than always rebuild

Latest culprit:

+ pip install -U pip setuptools
Collecting pip
  HTTP error 503 while getting https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl#sha256=86a60a96d85e329962a9e6f6af612cbc11106293dbc83f119802b5bee9874cf3 (from https://pypi.org/simple/pip/)
  Could not install requirement pip from https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl#sha256=86a60a96d85e329962a9e6f6af612cbc11106293dbc83f119802b5bee9874cf3 because of error 503 Server Error: Backend is unhealthy for url: https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl
Could not install requirement pip from https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl#sha256=86a60a96d85e329962a9e6f6af612cbc11106293dbc83f119802b5bee9874cf3 because of HTTP error 503 Server Error: Backend is unhealthy for url: https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl for URL https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl#sha256=86a60a96d85e329962a9e6f6af612cbc11106293dbc83f119802b5bee9874cf3 (from https://pypi.org/simple/pip/)

Quantitative eval broken

Not sure where, but after pulling up to master (I also have some additional local changes), it just hangs, making no progress... working on debugging...

A candidate set of CI tests to run: do the quantitative eval notebook, a training notebook with 10 steps, and a qualitative eval notebook all return with no errors within some bounded time?

We are running BatchNorm2D with batch_size = 1

It turns out this isn't a big issue because we are doing BatchNorm2D. Straight from the pytorch documentation here:

Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it’s common terminology to call this Spatial Batch Normalization

So even though we have N = 1, the fact that H and W are greater than 1 means the batch normalization still works fine.
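A small self-contained demo (shapes are illustrative):

import torch
import torch.nn as nn

# BatchNorm2d computes per-channel statistics over (N, H, W), so with
# N = 1 the estimates still average over H*W spatial locations.
bn = nn.BatchNorm2d(num_features=64)
x = torch.randn(1, 64, 60, 80)  # N=1, C=64, H=60, W=80
y = bn(x)
print(y.shape)  # torch.Size([1, 64, 60, 80])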

Clean cruft

Lots of cruft from various versions of dataset loaders (with consistency, without, etc.).

What is our segmentation network really doing?

Just a sandbox issue to record some of what I have learned about segmentation networks.

  • We are using the ResNet34_8s network from WSW (warmspringwinds). See here. This achieves 68% mIoU on RV-VOC12. This network is a (simplified) implementation of a network in this paper.

  • We are using output_stride=8. Hence ResNet34_8s takes an [H, W, 3] tensor and maps it to an [H/8, W/8, K] tensor, where K is the feature size. I need to check, but I think that K = 512 for ResNet34 (a toy shape check follows this list).

  • This blog post provides a good overview of the different segmentation networks.

  • For a detailed look at "transposed convolutions" (aka fractionally strided convolutions or "deconvolutions"), which are just a convolutional way of doing upsampling, see here.

  • This post by WSW describes image segmentation and upsampling.
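To make the output_stride=8 shape claim concrete, a toy stand-in (this is not the actual ResNet34_8s, which keeps stride 8 via dilated convolutions; K = 512 is the assumption stated above):

import torch
import torch.nn as nn

# Toy backbone with three stride-2 stages: overall stride 2*2*2 = 8.
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(128, 512, kernel_size=3, stride=2, padding=1),
)
x = torch.randn(1, 3, 480, 640)  # [N, 3, H, W]
y = toy_backbone(x)
print(y.shape)  # torch.Size([1, 512, 60, 80]), i.e. [N, K, H/8, W/8]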

Should we use the norm or squared norm in the loss function?

Currently we are using squared error. Changing this could affect learning rates, and other params.

Tanner uses squared loss. Slightly different from ours though (both variants are sketched below).

  • matches: D(I_a, I_b, u_a, u_b)^2
  • non-matches: max(0, M - D(I_a, I_b, u_a, u_b))^2
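Both variants side by side, as a sketch (d_a, d_b are (num_matches, D) tensors of descriptors gathered at u_a, u_b; the names are illustrative):

import torch

def match_loss(d_a, d_b, squared=True):
    # Descriptor-space distance D(I_a, I_b, u_a, u_b) between matched pixels.
    dist = (d_a - d_b).norm(dim=1)
    return (dist ** 2).mean() if squared else dist.mean()

def non_match_loss(d_a, d_b, M=0.5, squared=True):
    # Hinge on the margin M: max(0, M - D).
    dist = (d_a - d_b).norm(dim=1)
    hinge = torch.clamp(M - dist, min=0)
    return (hinge ** 2).mean() if squared else hinge.mean()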
