[CVPR 2017] Unsupervised deep learning using unlabelled videos on the web

Home Page: https://people.eecs.berkeley.edu/~pathak/unsupervised_video/

License: MIT License

Python 4.88% Shell 9.20% Lua 85.93%

unsupervised-learning deep-learning video-processing video-segmentation machine-learning computer-vision motion-segmentation feature-learning

unsupervised-video's Introduction

Learning Features by Watching Objects Move

In CVPR 2017. [Project Website].

Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan
University of California, Berkeley
Facebook AI Research (FAIR)

This is the code for our CVPR 2017 paper on unsupervised learning from unlabeled videos. This repository contains models trained by the unsupervised motion grouping algorithm, in both Caffe and Torch. If you find this work useful in your research, please cite:

@inproceedings{pathakCVPR17learning,
    Author = {Pathak, Deepak and Girshick, Ross and Doll\'{a}r,
              Piotr and Darrell, Trevor and Hariharan, Bharath},
    Title = {Learning Features by Watching Objects Move},
    Booktitle = {Computer Vision and Pattern Recognition ({CVPR})},
    Year = {2017}
}

1) Fetching Models for Unsupervised Transfer

The models below contain only the layers that are used for unsupervised transfer learning. For the full model, which includes motion segmentation, see the next section.

Clone the repository

git clone https://github.com/pathak22/unsupervised-video.git

Fetch caffe models

cd unsupervised-video/
bash ./models/download_caffe_models.sh
# This will populate the `./models/` folder with trained models.

The models were initially trained in Torch and then converted to Caffe. Because of this, please include the pycaffe-based image_transform_layer.py in your folder. It adjusts the scale and mean of the input image as needed.
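
As a rough illustration of the kind of rescaling and mean-centering such a layer performs, here is a minimal pure-Python sketch. The function names and the values of `input_max`, `mean`, and `std` are placeholders chosen for illustration, not the constants used by image_transform_layer.py; check that file for the actual values and channel order.

```python
def transform_pixel(pixel, input_max, mean, std):
    """Rescale one pixel's channels to [0, 1], then subtract mean and divide by std."""
    return tuple((value / input_max - m) / s for value, m, s in zip(pixel, mean, std))


def transform_image(image, input_max=255.0, mean=(0.5, 0.5, 0.5), std=(1.0, 1.0, 1.0)):
    """Apply transform_pixel to every pixel of an image stored as rows of channel tuples.

    The default input_max, mean, and std are hypothetical placeholders.
    """
    return [[transform_pixel(p, input_max, mean, std) for p in row] for row in image]


if __name__ == "__main__":
    # A 1x1 image with channel values on a 0-255 scale.
    tiny = [[(255, 0, 127.5)]]
    print(transform_image(tiny))
```

A real Caffe Python layer would do the same arithmetic on the input blob rather than on nested lists.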

Fetch torch models

cd unsupervised-video/
bash ./models/download_torch_models.sh
# This will populate the `./models/` folder with trained models.

2) Fetching Motion Segmentation models

Follow the instructions below to download the full motion segmentation model trained on the automatically selected 205K videos from YFCC100m. I trained it in Torch, but you can train your own model from the full data available here in any deep learning package using the training details from the paper.

cd unsupervised-video/
bash ./models/download_torch_motion_model.sh
# This will populate the `./models/` folder with the trained model.

cd motionseg/
th load_motionmodel.lua -input ../models/motionSegmenter_fullModel.t7

3) Additional Software Packages

We are releasing software packages that were developed for this project but could be generally useful for computer vision research. If you find them useful, please consider citing our work. These include:

(a) uNLC [github]: Implementation of an unsupervised bottom-up video segmentation algorithm, which is an unsupervised adaptation of the NLC algorithm by Faktor and Irani, BMVC 2014. For additional details, see Section 5.1 in the paper.

(b) PyFlow [github]: A Python wrapper around Ce Liu's C++ implementation of Coarse2Fine Optical Flow. It is used inside the uNLC implementation and is also generally useful as an independent package.

unsupervised-video's Issues

What was the validation accuracy profile while training the foreground/background segmentation model with noisy segments obtained from the motion segmentation algorithm?

Hi,
I would like to know what the validation accuracy profile looked like when you trained the foreground/background segmentation model with the AlexNet/CaffeNet architecture. At what accuracy did training start, and what validation accuracy did you reach at the end? Was the validation accuracy low because of the noisy labels produced by the imperfect motion segmentation algorithm, or did you observe a general trend of increasing validation accuracy?
Thanks,
Aditya Vora

Request for sample codes generating segmentation result

I tried to run inference with the trained motion segmentation model, motionSegmenter_fullModel.t7.
However, I could not find any input loader or sample inference code.
I only found the following inference method, which does not seem to run.

-- function: inference (used for full scene inference)
function DeepMask:inference()
   self:cuda()
   utils.linear2convTrunk(self.trunk,self.fSz)
   self.trunk:evaluate()
   self.trunk:forward(torch.CudaTensor(1,3,800,800))
   if self.flow then
      utils.linear2convHead(self.flowBranch)
      self.flowBranch:evaluate()
      self.flowBranch:forward(torch.CudaTensor(1,512,300,300))
      return
   end

   utils.linear2convHead(self.maskBranch.modules[1])
   self.maskBranch = self.maskBranch.modules[1]
   self.maskBranch:evaluate()
   self.maskBranch:forward(torch.CudaTensor(1,512,300,300))

   if self.color then
      utils.linear2convHead(self.colorBranch)
      self.colorBranch:evaluate()
      self.colorBranch:forward(torch.CudaTensor(1,512,300,300))
   else
      utils.linear2convHead(self.scoreBranch)
      self.scoreBranch:evaluate()
      self.scoreBranch:forward(torch.CudaTensor(1,512,300,300))
   end
end

Could you provide sample code for inference (generating a segmentation mask from the trained DeepMaskAlexNet) or explain how to do it?
Thanks.

Size of output mask

Hi Deepak,

I have a question about how the output mask is created. If I have an input image of, say, 227x227x3 and get an output mask of 56 x 56, how should I apply the output mask to the image? Or should the mask itself already show a proper segmentation, so that there is no need to apply it to the image? Are there coordinates associated with this mask?

Thanks and Regards
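
One common way to relate a low-resolution mask to its input is to upsample the mask to the image size and threshold it. The sketch below does this with nearest-neighbor resizing in pure Python. It assumes the 56 x 56 mask is spatially aligned with the whole 227 x 227 input, which this repository does not confirm; `resize_nearest`, `apply_mask`, and the 0.5 threshold are hypothetical names and values used only for illustration.

```python
def resize_nearest(mask, out_h, out_w):
    """Upsample a 2D list of scores to out_h x out_w by nearest-neighbor lookup."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def apply_mask(image, mask, threshold=0.5):
    """Keep pixels whose mask score reaches the threshold; black out the rest.

    The threshold is a hypothetical choice; a mask of raw scores may need another cutoff.
    """
    return [
        [pixel if score >= threshold else (0, 0, 0) for pixel, score in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


if __name__ == "__main__":
    small_mask = [[0.0] * 56 for _ in range(56)]
    small_mask[28][28] = 1.0  # a single "foreground" cell
    image = [[(255, 255, 255)] * 227 for _ in range(227)]
    full_mask = resize_nearest(small_mask, 227, 227)
    segmented = apply_mask(image, full_mask)
    print(len(segmented), len(segmented[0]))
```

Bilinear interpolation would give smoother mask boundaries; nearest-neighbor is used here only to keep the example short.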

Only NIL in motionSegmenter_fullModel.t7

I got this error while running "load_motionmodel.lua"
load_motionmodel.lua:16: attempt to index local 'model' (a nil value)

I found that after loading "motionSegmenter_fullModel.t7" I only got 'nil' in model.
Any suggestion?

Error while loading torch model

I am trying to load the torch models but none of them seem to work. I keep getting these errors:

Warning: Failed to load function from bytecode: (binary): cannot load incompatible bytecode
Warning: Failed to load function from bytecode: [string ""]:1: unexpected symbol
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/File.lua:308: bad argument #1 to 'ipairs' (table expected, got nil)
stack traceback:
[C]: in function 'ipairs'
/root/torch/install/share/lua/5.1/torch/File.lua:308: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
load_motionmodel.lua:13: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

I am using this docker image: https://hub.docker.com/r/kaixhin/cuda-torch/ with tag: 8.0
It has Lua 5.1, Torch7, and LuaJIT 2.1 beta installed, and I believe the error arises because these versions of the software read those files differently.

Can you please tell me what versions of the software you used or whether there are any additional setup details? Thank you.

How to train the model?

Doing inference on provided model

I read the paper, downloaded the model and I have some questions:
(I am talking about the "motionSegmenter_fullModel.t7")

How do you provide data for inference? I understand that there is the 'trunk', which is the modified AlexNet, and then there are different heads. I managed to feed it an image and then feed the output of the trunk to the maskBranch and scoreBranch. By following the execution flow, I could figure out that only the maskBranch and scoreBranch are used, which leads me to the next question:
How can I make the model use the colorBranch? And what is the flowBranch used for? It seems that the model in that file just has the scoreBranch in it.
How should I interpret the numbers that the scoreBranch and maskBranch compute? I could see that the maskBranch outputs a feature map with 3136 channels, but what should it be used for?
I had to change the line "model:float()" in load_motionmodel.lua to "model = model:float()", and did the same for cuda, as well as for the float and cuda functions in DeepMask.
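
Note that 3136 = 56 × 56, which matches the 56 x 56 mask size mentioned in another issue. One plausible reading, not confirmed by the authors in this thread, is that the 3136 channels at a given spatial location are a flattened 56 x 56 mask. The sketch below reshapes such a vector into a grid; `unflatten_mask` is a hypothetical helper, and the row-major ordering is an assumption.

```python
def unflatten_mask(scores, side=56):
    """Reshape a flat list of side*side scores into a side x side grid (row-major).

    Row-major ordering is assumed here; the model's actual layout may differ.
    """
    if len(scores) != side * side:
        raise ValueError("expected %d scores, got %d" % (side * side, len(scores)))
    return [scores[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    flat = [0.0] * 3136  # stand-in for the 3136 channel values at one location
    grid = unflatten_mask(flat)
    print(len(grid), len(grid[0]))
```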

Torch model

Hi, the Torch model cannot be read in Torch with LuaJIT 2.1, presumably because of a serialization error. Please fix.