
osvos-pytorch's Introduction

OSVOS: One-Shot Video Object Segmentation

Check our project page for additional information.

This repository was ported to PyTorch 0.4.0!

OSVOS is a method that tackles the task of semi-supervised video object segmentation. It is based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot). Experiments on DAVIS 2016 show that OSVOS is faster than currently available techniques and improves the state of the art by a significant margin (79.8% vs 68.0%).

This PyTorch code is an a posteriori implementation of OSVOS and does not contain the boundary-snapping branch. The results published in the paper were obtained with the Caffe version, which can be found at OSVOS-caffe. A TensorFlow implementation is also available at OSVOS-TensorFlow.

Installation:

  1. Clone the OSVOS-PyTorch repository

    git clone https://github.com/kmaninis/OSVOS-PyTorch.git
  2. If necessary, install the required dependencies:

    • Python (tested with Anaconda 2.7 and 3.6)
    • PyTorch (conda install pytorch torchvision -c pytorch - tested with PyTorch 0.4, CUDA 8.0 and 9.0)
    • Other Python dependencies: numpy, scipy, matplotlib, opencv-python, graphviz.
    • Optionally, install tensorboard (pip install tensorboard tensorboardx)
  3. Edit the paths in mypath.py (a sketch follows this list).
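
For orientation, mypath.py exposes path helpers such as db_root_dir(), which later steps refer to. A hypothetical sketch of an edited version (only db_root_dir() is referenced elsewhere on this page; save_root_dir() and both paths are illustrative):

    class Path(object):
        @staticmethod
        def db_root_dir():
            # Root folder of the DAVIS 2016 dataset
            return '/path/to/DAVIS-2016'

        @staticmethod
        def save_root_dir():
            # Where trained models and results are written (illustrative name)
            return './models'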

Online training and testing

  1. Download the parent model (55 MB), and unzip it under models/, by running:
    cd models/
    chmod +x download_parent_model.sh
    ./download_parent_model.sh
    cd ..
  2. Edit the 'User defined parameters' (e.g. gpu_id) in train_online.py; a sketch of this block follows the list.
  3. Run python train_online.py.
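
For orientation, the 'User defined parameters' block mentioned in step 2 looks roughly like the following. gpu_id, nEpochs, vis_net and the blackswan sequence all appear elsewhere on this page, but treat the exact names and default values as illustrative:

    # Hypothetical sketch of the 'User defined parameters' block in train_online.py
    gpu_id = 0                 # which GPU to train on
    seq_name = 'blackswan'     # DAVIS sequence to fine-tune and test on
    nEpochs = 2000             # number of online fine-tuning epochs (illustrative)
    vis_net = 0                # set to 1 to visualise the network architecture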

Training the parent network (optional)

  1. All the training sequences of DAVIS 2016 are required to train the parent model, so download them from here.
  2. Download the VGG model (55 MB) pretrained on ImageNet, and unzip it under models/, by running:
    cd models/
    chmod +x download_vgg_weights.sh
    ./download_vgg_weights.sh
    cd ..
  3. Place the files with the train and test sequence names in the DAVIS root folder (db_root_dir() in mypath.py); an illustrative fragment follows this list.
  4. Edit the 'User defined parameters' (e.g. gpu_id) in train_parent.py.
  5. Run python train_parent.py. This step takes about 20 hours on a Titan-X Pascal.
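
The sequence lists mentioned in step 3 are plain text files, presumably containing one DAVIS sequence folder name per line (several issues below ask about them). An illustrative fragment of train_seqs.txt:

    bear
    bmx-bumps
    boat
    ...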

Enjoy!

Citation:

@Inproceedings{Cae+17,
  Title          = {One-Shot Video Object Segmentation},
  Author         = {S. Caelles and K.K. Maninis and J. Pont-Tuset and L. Leal-Taix\'e and D. Cremers and L. {Van Gool}},
  Booktitle      = {Computer Vision and Pattern Recognition (CVPR)},
  Year           = {2017}
}

If you encounter any problems with the code, want to report bugs, etc. please contact us at {kmaninis, scaelles}[at]vision[dot]ee[dot]ethz[dot]ch.


osvos-pytorch's Issues

`train_online.py` time

Using the default settings, roughly how long should train_online.py take to train? I'm seeing ~1 hour on a TITANX -- is that in the right neighborhood? (The paper makes it sound like the finetuning runs for < 1 minute.)

~ Ben

Optimizer learning rates

What's the basis for setting different learning rates in the optimizer for each parameter group of the network? Are these learning rates taken from existing research, or were they set arbitrarily at first and then finetuned by you?

confused by a formula in class_balanced_cross_entropy_loss.py

Hi, your repo is really awesome.

I have a problem understanding a formula in class_balanced_cross_entropy_loss.py:
loss_val = torch.mul(output, (labels - output_gt_zero)) - torch.log( 1 + torch.exp(output - 2 * torch.mul(output, output_gt_zero)))

Compared with the original formula, is there any special purpose to writing it this way?
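
For what it's worth, this looks like the numerically stable form of the sigmoid cross-entropy: with output_gt_zero = (output >= 0), the exponent output - 2 * output * output_gt_zero equals -|output|, so the exp never overflows, and loss_val works out to minus the per-pixel binary cross-entropy. A minimal sketch that checks this against PyTorch's reference implementation:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    output = torch.randn(8) * 20  # logits, including large magnitudes
    labels = torch.randint(0, 2, (8,)).float()

    output_gt_zero = (output >= 0).float()
    loss_val = torch.mul(output, (labels - output_gt_zero)) \
        - torch.log(1 + torch.exp(output - 2 * torch.mul(output, output_gt_zero)))

    # The expression equals minus the standard (stable) BCE-with-logits:
    ref = F.binary_cross_entropy_with_logits(output, labels, reduction='none')
    print(torch.allclose(-loss_val, ref, atol=1e-6))  # True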

Loss is large

Hi, thanks for sharing. When I run train_online.py, the loss is 403.75 at epoch 10000; is that normal? Similarly, when I run train_parent.py, the network does not converge. Can you give me some suggestions?

How long should train_online take?

Using the default parameters with the "blackswan" sequence, train_online.py takes about 30 minutes for fine-tuning plus processing the test sequence on a GTX 1080 Ti. Is that normal?

Also, as a quick follow-up: to test with multiple annotated frames, would I simply change the dataloader to load more than one frame when in test mode?

Thanks for sharing this!

Implement three measures

Hello! I am a newcomer to video segmentation, and I think your paper is very good. Did you implement the three measures: region similarity, contour accuracy, and temporal stability? Thank you!
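
The three measures do not appear in this repository; they are typically computed with the official DAVIS benchmark code (the jaccard.py and f_boundary.py files discussed in a later issue). For reference, region similarity J is simply the Jaccard index (intersection over union) of two binary masks; a minimal NumPy sketch, not the official db_eval_iou:

    import numpy as np

    def region_similarity(segmentation, annotation):
        # J = |seg AND gt| / |seg OR gt| for two binary masks
        seg = segmentation.astype(bool)
        gt = annotation.astype(bool)
        union = np.count_nonzero(seg | gt)
        if union == 0:
            return 1.0  # both masks empty: J is taken as 1 by convention
        return np.count_nonzero(seg & gt) / union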

Contour snapping code

Hi,
I read your paper, but I cannot find the code for the contour snapping described in the paper, in particular the Fast Bilateral Solver (FBS) and the CNN trained to detect object contours.
Could you point me to it? Thanks.

OSVOS Contour Branch code in PyTorch

Hey, it would be great if the entire OSVOS code (including the contour branch) were ported to PyTorch, so that we can reproduce the results in the paper and implement our own ideas on top of it. Thanking you in anticipation.

online training

Hello! Does online training use the annotated first frame of the training set as training data, or the first frame of the test set? I don't understand what online training is. Thank you!

net visualisation results in RuntimeError

Hi,
I set the variable vis_net to 1 and ran train_online.py, but the following error came up:

Constructing OSVOS architecture..
Initializing weights..
Traceback (most recent call last):
  File "train_online.py", line 72, in <module>
    y = net.forward(x)
  File "/home/yiming/projects/OSVOS-PyTorch/networks/vgg_osvos.py", line 61, in forward
    x = self.stages[0](x)
  File "/home/yiming/projects/miniconda3/envs/pt3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yiming/projects/miniconda3/envs/pt3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/yiming/projects/miniconda3/envs/pt3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yiming/projects/miniconda3/envs/pt3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

I think the following line should be moved before the visualisation code:
net.to(device) # PyTorch 0.4.0 style

Nice work!
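
For readers hitting the same error: the message means the input tensor is on the GPU while the network weights are still on the CPU. A sketch of the reordering suggested above (PyTorch 0.4 style; variable names follow the traceback rather than the repo):

    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    net.to(device)        # move the weights to the GPU first
    x = x.to(device)      # inputs and weights must live on the same device
    y = net.forward(x)    # the visualisation forward pass now succeeds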

Check on a video

Can you guys explain how to run the generated models on a video? Thank you.

How to add Mask Input?

Hello,
During the test phase, I cannot find where the mask of the first frame is given as input; the network only forwards the image.
Or do you mean that we first run the training phase on the first frame, and then test on the following frames?

Thanks!

Question about evaluation result

Hello, I have run the code and obtained the segmentation of each video after 10000 epochs of online training, but the evaluation result is lower than in the paper. I wonder whether there are tricks to improve the result, or whether I just made a mistake somewhere in the process? Thanks!

Time comparison between PyTorch and TensorFlow

So I ran some timing benchmarks and found that 10 epochs take the following amount of time:

Env:
OS: Ubuntu 16.04
GPU: Dual 1080 TIs with HB SLI bridge

TensorFlow 1.9: 0.85 seconds
PyTorch 0.5: 2.32 seconds

Is this consistent with what you've observed? I'm trying to figure out which of the following is true: 1) it's a side effect of my installation; 2) the TensorFlow version is much more optimized; or 3) there is an underlying effect that makes TensorFlow faster than PyTorch for this network.

Unable to run train_online.py

# python train_online.py
Constructing OSVOS architecture..
Initializing weights..
Done initializing train_seqs Dataset
Done initializing val_seqs Dataset
Start of Online Training, sequence: blackswan
Traceback (most recent call last):
  File "train_online.py", line 129, in <module>
    outputs = net.forward(inputs)
  File "/host/workspace/OSVOS-PyTorch/networks/vgg_osvos.py", line 68, in forward
    side.append(center_crop(self.upscale[i - 1](side_temp), crop_h, crop_w))
  File "/host/workspace/OSVOS-PyTorch/layers/osvos_layers.py", line 57, in center_crop
    crop_h.ceil().int()[0], crop_h.floor().int()[0],
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1914, in pad
    return ConstantPadNd.apply(input, pad, value)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/_functions/padding.py", line 27, in forward
    output = input.new(input.size()[:(ctx.l_diff)] + new_dim).fill_(ctx.value)
TypeError: torch.Size() takes an iterable of 'int' (item 2 is 'Variable')

PyTorch version: 0.4.0a0+71d7321
Python version: Python 3.6.4 :: Anaconda, Inc.

train_online RuntimeError

Hi,

Since today, I get the following RuntimeError after running python train_online.py:

Constructing OSVOS architecture..
Initializing weights..
Done initializing train_seqs Dataset
Done initializing val_seqs Dataset
Start of Online Training, sequence: blackswan
Traceback (most recent call last):
  File "train_online.py", line 124, in <module>
    outputs = net.forward(inputs)
  File "/home/christoph/OSVOS-PyTorch/networks/vgg_osvos.py", line 61, in forward
    x = self.stages[0](x)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 64 3 3 3, expected input[1, 1, 3, 1] to have 3 channels, but got 1 channels instead

Do you know what could be the problem?

Best
Christoph

No file `train_seqs.txt`

When I run train_parent.py I get an error that the train_seqs.txt file is missing -- is this something that ships with the DAVIS dataset, or something I need to create myself?

Are you possibly able to upload a copy here, so I know that I'm using the same train/test split as you all?

Thanks
Ben

Averaged Gradients

Hi, can you please clarify why you average the gradients every n iterations? Is it a way to increase the effective minibatch size when a larger batch does not fit in memory? Thanks!
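
For context, averaging gradients over n iterations is a standard way to emulate a larger batch when it does not fit in memory: backward() accumulates into each parameter's .grad, and the optimizer steps only every n-th iteration. A generic sketch of the pattern (the names are illustrative, not the repo's exact training loop):

    n_ave_grad = 5  # illustrative value
    optimizer.zero_grad()
    for ii, (inputs, gts) in enumerate(dataloader):
        loss = criterion(net(inputs), gts) / n_ave_grad  # scale so the sum is an average
        loss.backward()                  # gradients accumulate across iterations
        if (ii + 1) % n_ave_grad == 0:
            optimizer.step()             # update with the averaged gradient
            optimizer.zero_grad()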

an error occurred when running train_parent.py

python train_parent.py
Using GPU: 0
Constructing OSVOS architecture..
Initializing weights..
Loading weights from Caffe VGG
Traceback (most recent call last):
  File "train_parent.py", line 112, in <module>
    db_train = db.DAVIS2016(train=True, inputRes=None, db_root_dir=db_root_dir, transform=composed_transforms)
  File "/home/fcy/OSVOS_PyTorch/dataloaders/davis_2016.py", line 46, in __init__
    with open(os.path.join(db_root_dir, fname + '.txt')) as f:
FileNotFoundError: [Errno 2] No such file or directory: './DAVIS/train_seqs.txt'
There is no 'train_seqs.txt'; where does this file come from? Could you please tell me how to get it? Thanks.

Confused by "inputs.requires_grad_()"

Hello! I don't understand the purpose of this line. Why should the input require gradients? After inputs.requires_grad_(), the requires_grad attribute of inputs is True, which means gradients will be computed with respect to the input. I don't see why that is needed. Thank you!
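
For context, inputs.requires_grad_() asks autograd to track operations on the input tensor itself, so that gradients with respect to the input can be computed; it is not needed just to train the weights. A minimal sketch of the difference:

    import torch

    x = torch.randn(3)                      # leaf tensor; requires_grad is False
    w = torch.randn(3, requires_grad=True)

    (w * x).sum().backward()
    print(w.grad is not None)  # True: gradients flow to the weights
    print(x.grad)              # None: x was not tracked

    x.requires_grad_()         # in-place switch: now track gradients w.r.t. x
    (w * x).sum().backward()
    print(x.grad)              # populated: dy/dx = w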

class_balanced_cross_entropy_loss

Hello!

[image: the formula from class_balanced_cross_entropy_loss]

I don't understand this formula. It is different from the usual sigmoid cross-entropy loss function. Why do you calculate it this way? Thank you!
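
The 'class-balanced' part refers to HED-style re-weighting: foreground pixels are much rarer than background, so each pixel's cross-entropy term is scaled by the frequency of the opposite class. A sketch of the idea, not a line-for-line copy of the repo's loss:

    import torch
    import torch.nn.functional as F

    def class_balanced_bce(output, labels):
        # output: logits; labels: binary ground-truth mask of the same shape
        labels = labels.float()
        num_pos = labels.sum()
        num_neg = labels.numel() - num_pos
        total = num_pos + num_neg
        # positive pixels weighted by the negative frequency, and vice versa
        weights = torch.where(labels > 0, num_neg / total, num_pos / total)
        bce = F.binary_cross_entropy_with_logits(output, labels, reduction='none')
        return (weights * bce).sum()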

official measure code

I have a question about the official measure code: the jaccard.py file and the f_boundary.py file.

The db_eval_iou function in jaccard.py takes two parameters: a binary annotation map and a binary segmentation map. Does the binary segmentation map refer to the raw network output binarized with a threshold of 0.5, or to the output after the sigmoid activation binarized with a threshold of 0.5?

The db_eval_boundary function in f_boundary.py takes a foreground_mask (the binary segmentation image) and a gt_mask (the binary annotation image). Does gt_mask refer to the binarized annotation? And what exactly is the foreground_mask parameter?

I am a newcomer to learning video segmentation, thank you!
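
One relation worth knowing here: the sigmoid is monotonic with sigmoid(0) = 0.5, so thresholding the sigmoid output at 0.5 gives exactly the same binary mask as thresholding the raw logits at 0 (not at 0.5). A quick check:

    import torch

    logits = torch.tensor([-2.0, -0.1, 0.0, 0.1, 2.0])
    print(torch.sigmoid(logits) > 0.5)  # tensor([False, False, False,  True,  True])
    print(logits > 0)                   # identical mask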

Question about evaluation metric.

Hi, thanks for the nice OSVOS code.
Is there any PyTorch evaluation code (for region similarity, contour accuracy, and temporal stability)?
I hope for a reply. Thanks.

High CPU usage

I'm experiencing very high CPU usage when running train_online.py with the default settings. Is this normal? Usage increases from 30% to around 95% on all 36 available cores.

OS: Ubuntu 16.04
PyTorch: 0.4.1.post2
CUDA: 9.0
Python: 3.7.0

Thanks,
Yannick
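
A plausible (though unconfirmed for this repo) culprit is OpenMP spawning one thread per core for CPU-side tensor operations. If that is the cause, capping the thread count is a cheap thing to try; torch.set_num_threads is a general PyTorch knob, not a documented fix for this code:

    import torch
    torch.set_num_threads(4)  # cap intra-op CPU threads; adjust as needed

Setting the OMP_NUM_THREADS environment variable before launching Python has a similar effect.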

How to Evaluate the model?

@kmaninis How do I compute the evaluation metrics mentioned in the paper? I've obtained the .pth model after training from the code, but I don't know how to evaluate it. I'm new to the research domain, so I'd be thankful if you could guide me through it.

Also, can we get a single combined model for all object masks, or will it just be individual .pth models?

train_seqs.txt and val_seqs.txt

I tried to train the parent network using the DAVIS 2016 dataset.
It needs the train_seqs.txt and val_seqs.txt files, but I cannot find these text files in the DAVIS 2016 dataset.
Can you provide these txt files?
I think these files contain the folder names of the dataset, right?

Code

Hello! Why has the code link for the paper "A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation" disappeared? Where can I download this code? Thank you!

Emergency!

I want to test OSVOS's speed on another machine: how many training epochs (nEpochs) were used for the DAVIS 2016 80.2 J&F result in the paper?
I used the default values in the code and found that the online training time was exceptionally long, exceeding the runtimes listed in other VOS papers. Can you publish some more details of the timing setup?
The speeds reported in other papers differ widely; I look forward to your reply!

Why is the total number of iterations different from the paper?

In train_parent.py, the number of epochs is set to 240 (500,000 / 2079, where 2079 is the number of training images).
But both the TensorFlow implementation and the paper say the total number of iterations is 50,000.
Is there a particular reason for this?
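
For reference, the two iteration counts convert to epochs as follows, assuming an effective batch size of 1 as the issue's own arithmetic does:

    images_per_epoch = 2079
    print(500_000 / images_per_epoch)  # ~240 epochs, the value set in train_parent.py
    print(50_000 / images_per_epoch)   # ~24 epochs, matching the paper's 50,000 iterations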
