
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling

Overview

This repository contains the implementation of Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling.

In this work, we address open-vocabulary instance segmentation, which learns to segment novel objects without any mask annotations during training by generating pseudo masks from captioned images.


Installation

Our code is based upon OVR, which is built upon maskrcnn-benchmark. To set up the code, please follow the instructions in INSTALL.md.


Datasets

To download the datasets, please follow the instructions below.

For more information on how the data directory is structured, please refer to maskrcnn_benchmark/config/paths_catalog.py.
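As a quick sanity check before training, the snippet below verifies that the expected top-level directories exist. The directory list is an assumption inferred from the paths used in this README; the authoritative layout is defined in maskrcnn_benchmark/config/paths_catalog.py.

```python
from pathlib import Path

# Directories inferred from this README; the authoritative paths are in
# maskrcnn_benchmark/config/paths_catalog.py -- adjust as needed.
EXPECTED_DIRS = [
    "datasets/coco",
    "datasets/openimages",
    "model_weights",
]

def missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    for d in missing_dirs():
        print(f"missing: {d}")
```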

MS-COCO

  • Please download the MS-COCO 2017 dataset into the ./datasets/coco/ folder from its official website: https://cocodataset.org/#download.
  • Following prior work, partition the classes into base and target classes via:
python ./preprocess/coco/construct_coco_json.py
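Conceptually, this step separates COCO categories into base (seen during training) and target (novel) classes, and drops target-class annotations from the training set. A minimal sketch of that idea on a toy COCO-style dictionary — the target list here is made up; the actual partition lives inside construct_coco_json.py:

```python
import json

# Toy COCO-style annotation; the real files live under ./datasets/coco/.
coco = {
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 2, "name": "umbrella"},
        {"id": 3, "name": "car"},
    ],
    "annotations": [
        {"id": 10, "category_id": 1},
        {"id": 11, "category_id": 2},
    ],
}

# Hypothetical target (novel) classes; construct_coco_json.py defines the
# actual base/target split used in the paper.
TARGET_NAMES = {"umbrella"}

target_ids = {c["id"] for c in coco["categories"] if c["name"] in TARGET_NAMES}
base = {
    "categories": [c for c in coco["categories"] if c["id"] not in target_ids],
    "annotations": [a for a in coco["annotations"]
                    if a["category_id"] not in target_ids],
}

print(json.dumps(sorted(c["name"] for c in base["categories"])))
# → ["car", "person"]
```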

Open Images & Conceptual Captions

Annotations for Open Images

  • Convert the Open Images annotations to COCO format (JSON):
cd ./preprocess/openimages/openimages2coco
python convert_annotations.py -p ../../../datasets/openimages/ --version challenge_2019 --task mask --subsets train
python convert_annotations.py -p ../../../datasets/openimages/ --version challenge_2019 --task mask --subsets val
  • Partition all classes into base and target classes:
python ./preprocess/openimages/construct_openimages_json.py
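The conversion step above maps Open Images annotations into COCO-style JSON. One representative piece of that transformation is turning normalized box coordinates into COCO's absolute [x, y, w, h] pixel format, sketched below with illustrative field names (the converter's real schema may differ):

```python
# Sketch of an Open-Images-style record; the field names here are
# illustrative, not the converter's real CSV schema.
oi_record = {"ImageID": "abc123", "LabelName": "/m/01g317",
             "BoxXMin": 0.25, "BoxXMax": 0.5,
             "BoxYMin": 0.125, "BoxYMax": 0.625}

def to_coco_bbox(rec, img_w, img_h):
    """Convert normalized Open Images box coords to COCO [x, y, w, h] pixels."""
    x = rec["BoxXMin"] * img_w
    y = rec["BoxYMin"] * img_h
    w = (rec["BoxXMax"] - rec["BoxXMin"]) * img_w
    h = (rec["BoxYMax"] - rec["BoxYMin"]) * img_h
    return [x, y, w, h]

print(to_coco_bbox(oi_record, img_w=1000, img_h=500))
# → [250.0, 62.5, 250.0, 250.0]
```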

Annotations for Conceptual Captions

Coming Soon!

Experiments

To reproduce the main experiments in the paper, we provide scripts below to train the teacher and student models on both MS-COCO and Open Images & Conceptual Captions. Please note that the teacher must be trained first, since it produces the pseudo labels/masks used to train the student model.
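The teacher's pseudo masks supervise the student's mask head, and the "robust" part of the method comes from down-weighting supervision where the pseudo labels are likely noisy. The following is a schematic illustration of that idea — uncertainty-weighted pixel-wise BCE in plain Python — and is a stand-in, not the repository's actual loss code:

```python
import math

def binary_entropy(p, eps=1e-8):
    """Entropy of a Bernoulli(p) prediction, in nats."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def uncertainty_weighted_bce(teacher_probs, student_probs, thresh=0.5, eps=1e-8):
    """Pixel-wise BCE against thresholded teacher pseudo-labels,
    down-weighted where the teacher itself is uncertain.

    Schematic stand-in for the repo's loss, not its actual implementation.
    """
    total = 0.0
    for t, s in zip(teacher_probs, student_probs):
        pseudo = 1.0 if t > thresh else 0.0                 # hard pseudo-label
        weight = 1.0 - binary_entropy(t) / math.log(2.0)    # 1 = confident, 0 = coin flip
        s = min(max(s, eps), 1.0 - eps)
        bce = -(pseudo * math.log(s) + (1.0 - pseudo) * math.log(1.0 - s))
        total += weight * bce
    return total / len(teacher_probs)
```

A confident teacher pixel contributes its full loss weight, while a teacher probability near 0.5 contributes almost nothing, so noise on ambiguous pseudo-mask pixels is suppressed rather than propagated into the student.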

MS-COCO

  • Caption pretraining:
    • Please download the pretrained backbone model from here into the folder ./model_weights. This model is from the OVR code base.
    • Alternatively, you can pretrain the model with:
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/coco_cap_det/mmss.yaml --skip-test OUTPUT_DIR ./model_weights/model_pretrained.pth
  • Teacher training:
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/coco_cap_det/zeroshot_mask.yaml OUTPUT_DIR ./checkpoint/mscoco_teacher/ MODEL.WEIGHT ./model_weights/model_pretrained.pth
  • Student training:
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/coco_cap_det/student_teacher_mask_rcnn_uncertainty.yaml OUTPUT_DIR ./checkpoint/mscoco_student/ MODEL.WEIGHT ./checkpoint/mscoco_teacher/model_final.pth
  • Evaluation:
    • To quickly evaluate performance, we provide pretrained teacher/student models under Pretrained Models. Please download them into the ./pretrained_model/ folder and run the following script:
python -m torch.distributed.launch --nproc_per_node=8 tools/test_net.py --config-file configs/coco_cap_det/student_teacher_mask_rcnn_uncertainty.yaml OUTPUT_DIR ./results/mscoco_student MODEL.WEIGHT ./pretrained_model/coco_student/model_final.pth

Open Images & Conceptual Captions

  • Caption pretraining:
Coming Soon!
  • Teacher training:
Coming Soon!
  • Student training:
Coming Soon!
  • Evaluation:
Coming Soon!

Pretrained Models

Dataset                          Teacher   Student
MS-COCO                          model     model
Conceptual Caps + Open Images    model     model

Citation

If this code is helpful for your research, we would appreciate it if you cite our work:

@article{Huynh:CVPR22,
  author = {D.~Huynh and J.~Kuen and Z.~Lin and J.~Gu and E.~Elhamifar},
  title = {Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling},
  journal = {{IEEE} Conference on Computer Vision and Pattern Recognition},
  year = {2022}}

