
modgenvis's Introduction

A Generalized Framework for Video Instance Segmentation (CVPR 2023)

Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

[arXiv] [BibTeX]


Updates

  • Feb 28, 2023: GenVIS is accepted to CVPR 2023!
  • Jan 20, 2023: Code is now available!

Installation

GenVIS is built upon VITA. See installation instructions.

Getting Started

We provide a script, train_net_genvis.py, that trains all the configs provided in GenVIS.

To train a model with "train_net_genvis.py" on VIS, first set up the corresponding datasets following Preparing Datasets.

Then run training on the target VIS dataset, initializing from the matching pretrained weights in VITA's Model Zoo:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  MODEL.WEIGHTS vita_r50_ovis.pth

To evaluate a model's performance, run:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
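
When running several configs or checkpoints, it can help to assemble the two commands above programmatically. The following is a minimal sketch, not part of GenVIS: the helper name `build_genvis_command` and its parameters are hypothetical, and it only reproduces the flags shown above (`--num-gpus`, `--config-file`, `--eval-only`, and the trailing `MODEL.WEIGHTS` override).

```python
import shlex
from typing import List


def build_genvis_command(
    config_file: str,
    weights: str,
    num_gpus: int = 4,
    eval_only: bool = False,
    script: str = "train_net_genvis.py",
) -> List[str]:
    """Assemble the argv list for a GenVIS training or evaluation run.

    Hypothetical helper: it mirrors the README commands and nothing more.
    """
    cmd = ["python", script, "--num-gpus", str(num_gpus), "--config-file", config_file]
    if eval_only:
        cmd.append("--eval-only")
    # Config overrides go last as KEY VALUE pairs, as in the commands above.
    cmd += ["MODEL.WEIGHTS", weights]
    return cmd


if __name__ == "__main__":
    config = "configs/genvis/ovis/genvis_R50_bs8_online.yaml"
    train_cmd = build_genvis_command(config, "vita_r50_ovis.pth")
    eval_cmd = build_genvis_command(config, "/path/to/checkpoint_file", eval_only=True)
    # Print shell-ready command lines; pass the lists to subprocess.run to launch them.
    print(shlex.join(train_cmd))
    print(shlex.join(eval_cmd))
```

Returning a list rather than a single string keeps paths with spaces intact if the command is later handed to `subprocess.run(cmd, check=True)`.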

Model Zoo

Additional weights will be added soon!

YouTubeVIS-2019

Backbone  Method       AP    AP50  AP75  AR1   AR10  Download
R-50      online       50.0  71.5  54.6  49.5  59.7  model
R-50      semi-online  51.3  72.0  57.8  49.5  60.0  model
Swin-L    online       64.0  84.9  68.3  56.1  69.4  model
Swin-L    semi-online  63.8  85.7  68.5  56.3  68.4  model

YouTubeVIS-2021

Backbone  Method       AP    AP50  AP75  AR1   AR10  Download
R-50      online       47.1  67.5  51.5  41.6  54.7  model
R-50      semi-online  46.3  67.0  50.2  40.6  53.2  model
Swin-L    online       59.6  80.9  65.8  48.7  65.0  model
Swin-L    semi-online  60.1  80.9  66.5  49.1  64.7  model

OVIS

Backbone  Method       AP    AP50  AP75  AR1   AR10  Download
R-50      online       35.8  60.8  36.2  16.3  39.6  model
R-50      semi-online  34.5  59.4  35.0  16.6  38.3  model
Swin-L    online       45.2  69.1  48.4  19.1  48.6  model
Swin-L    semi-online  45.4  69.2  47.8  18.9  49.0  model

License

The majority of GenVIS is licensed under an Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), and VITA (Apache-2.0 License).

Citing GenVIS

If you use GenVIS in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@inproceedings{GenVIS,
  title={A Generalized Framework for Video Instance Segmentation},
  author={Heo, Miran and Hwang, Sukjun and Hyun, Jeongseok and Kim, Hanjung and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={CVPR},
  year={2023}
}

@inproceedings{VITA,
  title={VITA: Video Instance Segmentation via Object Token Association},
  author={Heo, Miran and Hwang, Sukjun and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgement

Our code is largely based on Detectron2, IFC, Mask2Former, Deformable DETR, and VITA. We are truly grateful for their excellent work.

