Generative Region-Language Pretraining for Open-Ended Object Detection

Monash University   ByteDance Inc. 
CVPR 2024

⭐ If GenerateU is helpful to your projects, please help star this repo. Thanks! 🤗


Highlight

  • GenerateU has been accepted to CVPR 2024.
  • We introduce generative open-ended object detection, a more general and practical setting in which categorical information is not explicitly defined. Such a setting is especially meaningful for scenarios where users lack precise knowledge of object categories during inference.
  • Our GenerateU achieves comparable results to the open-vocabulary object detection method GLIP, even though the category names are not seen by GenerateU during inference.

Results

Zero-shot domain transfer to LVIS

[Image: zero-shot domain transfer results on LVIS]

Visualizations

👨🏻‍🎨 Pseudo-label Examples

[Image: pseudo-label examples]

🎨 Zero-shot LVIS

[Image: zero-shot LVIS visualizations]

Overview

[Image: overall structure of GenerateU]

Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/clin1223/GenerateU.git
    cd GenerateU
  2. Create Conda Environment and Install Dependencies

    # create new anaconda env
    conda create -n GenerateU python=3.8 -y
    conda activate GenerateU
    
    # install python dependencies
    pip3 install -r requirements.txt 
    
    # compile Deformable DETR
    cd projects/DDETRS/ddetrs/models/deformable_detr/ops
    bash make.sh

    Requirements:
    • CUDA >= 11.3
    • PyTorch >= 1.10.0
    • Torchvision >= 0.11.1
    • Other required packages in requirements.txt
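The version requirements above can be checked before compiling the Deformable DETR ops. The following is a minimal sketch, not part of the repository: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and only the Python standard library is used so the script runs even when PyTorch is missing.

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements list above.
# CUDA is not a Python package, so it is not covered by this check.
MINIMUMS = {"torch": "1.10.0", "torchvision": "0.11.1"}


def version_tuple(version):
    """Turn '1.10.0+cu113' into (1, 10, 0), ignoring local build suffixes."""
    numbers = re.findall(r"\d+", version.split("+")[0])
    numbers = (numbers + ["0", "0", "0"])[:3]  # pad short versions like '1.10'
    return tuple(int(n) for n in numbers)


def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = (None, False)
            continue
        report[package] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_environment().items():
        status = "ok" if ok else "too old or missing"
        print(f"{package}: {installed or 'not installed'} -> {status}")
```

Once PyTorch is installed, the CUDA version it was built against can be read from `torch.version.cuda` and compared with the 11.3 minimum by hand.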

Get Started

Prepare pretrained models

Download our pretrained models from here to the weights folder. For training, prepare the Swin-Tiny and Swin-Large backbone weights following the instructions in tools/convert-pretrained-swin-model-to-d2.py.

The directory structure should be arranged as:

weights
   |- vg_swinT.pth
   |- vg_swinL.pth
   |- vg_grit5m_swinT.pth
   |- vg_grit5m_swinL.pth
   |- swin_tiny_patch4_window7_224.pkl
   |- swin_large_patch4_window12_384_22k.pkl
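A quick way to confirm the weights folder matches the listing above is a small existence check. This is a sketch under assumptions: the function name `missing_weights` and its `training` flag are hypothetical, and the script is assumed to run from the repository root.

```python
from pathlib import Path

# File names copied from the directory listing above.
PRETRAINED = [
    "vg_swinT.pth",
    "vg_swinL.pth",
    "vg_grit5m_swinT.pth",
    "vg_grit5m_swinL.pth",
]
BACKBONES = [
    "swin_tiny_patch4_window7_224.pkl",
    "swin_large_patch4_window12_384_22k.pkl",
]


def missing_weights(weights_dir="weights", training=False):
    """List expected weight files that are absent from weights_dir.

    The backbone .pkl files are only checked when training=True, because
    the text above asks for them only when training.
    """
    root = Path(weights_dir)
    expected = PRETRAINED + (BACKBONES if training else [])
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights(training=True)
    print("all weights present" if not missing else f"missing: {missing}")
```

You may only need a subset of the checkpoints, so a non-empty result is a reminder rather than an error.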

Dataset preparation

VG Dataset

LVIS Dataset

(Optional) GrIT-20M Dataset

The dataset structure should look like:

datasets
   |- vg
   |   |- images/
   |   |- train_from_objects.json
   |- lvis
   |   |- val2017/
   |   |- lvis_v1_minival.json
   |   |- lvis_v1_clip_a+cname_ViT-H.npy
   |- grit_20m
       |- images/
       |- grit5m_train_pseudo.json
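The folder skeleton above can be created ahead of downloading the data. The sketch below is an illustration only: `prepare_layout` and `LAYOUT` are hypothetical names, the script creates empty directories under `datasets/`, and it reports which annotation files still have to be placed there by hand.

```python
from pathlib import Path

# Directory and file names copied from the tree above.
LAYOUT = {
    "vg": {"dirs": ["images"], "files": ["train_from_objects.json"]},
    "lvis": {
        "dirs": ["val2017"],
        "files": ["lvis_v1_minival.json", "lvis_v1_clip_a+cname_ViT-H.npy"],
    },
    # GrIT-20M is listed as optional above.
    "grit_20m": {"dirs": ["images"], "files": ["grit5m_train_pseudo.json"]},
}


def prepare_layout(root="datasets", include_grit=False):
    """Create the empty folder skeleton and return the files still to be added."""
    base = Path(root)
    todo = []
    for name, spec in LAYOUT.items():
        if name == "grit_20m" and not include_grit:
            continue
        for directory in spec["dirs"]:
            (base / name / directory).mkdir(parents=True, exist_ok=True)
        for filename in spec["files"]:
            path = base / name / filename
            if not path.is_file():
                todo.append(str(path))
    return todo


if __name__ == "__main__":
    for path in prepare_layout():
        print("still needed:", path)
```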

Training

By default, we train GenerateU using 16 A100 GPUs. You can also train on a single node, but this might prevent you from reproducing the results presented in the paper.

Single-Node Training

When pretraining with VG, a single node is enough. On a single node with 8 GPUs, run

python3 launch.py --nn 1 --uni 1 \
--config-file projects/DDETRS/configs/vg_swinT.yaml OUTPUT_DIR outputs/${EXP_NAME}

Multiple-Node Training

# On node 0, run
python3 launch.py --nn 2 --port <PORT> --worker_rank 0 --master_address <MASTER_ADDRESS> \
--uni 1 --config-file /path/to/config/name.yaml  OUTPUT_DIR outputs/${EXP_NAME}
# On node 1, run
python3 launch.py --nn 2 --port <PORT> --worker_rank 1 --master_address <MASTER_ADDRESS> \
--uni 1 --config-file /path/to/config/name.yaml OUTPUT_DIR outputs/${EXP_NAME}

<MASTER_ADDRESS> should be the IP address of node 0. <PORT> should be the same across all nodes. If <PORT> is not specified, the program will generate a random number as <PORT>.
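Because every node must use the same port, one option is to choose the port once and generate the per-node commands from it. The sketch below only assembles command strings with the flags shown above; `multi_node_commands` is a hypothetical helper, and the port range 20000 to 60000 is an arbitrary assumption, not something `launch.py` prescribes.

```python
import random
import shlex


def multi_node_commands(num_nodes, master_address, config_file, exp_name, port=None):
    """Build one launch.py command per node, all sharing the same port."""
    if port is None:
        # Chosen once here so that every node receives the same value.
        port = random.randint(20000, 60000)
    commands = []
    for rank in range(num_nodes):
        argv = [
            "python3", "launch.py",
            "--nn", str(num_nodes),
            "--port", str(port),
            "--worker_rank", str(rank),
            "--master_address", master_address,
            "--uni", "1",
            "--config-file", config_file,
            "OUTPUT_DIR", f"outputs/{exp_name}",
        ]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # The address, config path, and experiment name are placeholders.
    for command in multi_node_commands(2, "10.0.0.1", "/path/to/config/name.yaml", "my_exp"):
        print(command)
```

Each printed line is meant to be run on the node whose rank it names, with node 0 being the machine at the master address.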

Evaluation

To evaluate a trained or pretrained model, run

python3 launch.py --nn 1 --eval-only --uni 1 --config-file /path/to/config/name.yaml  \
OUTPUT_DIR outputs/${EXP_NAME}  MODEL.WEIGHTS /path/to/weight.pth
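The evaluation command can also be assembled programmatically, which helps when scoring several checkpoints in a row. This is a sketch only: `eval_argv` is a hypothetical helper, and pairing `projects/DDETRS/configs/vg_swinT.yaml` with `weights/vg_swinT.pth` in the usage lines is an assumption based on the matching file names.

```python
import shlex


def eval_argv(config_file, weights, exp_name):
    """Argument list for single-node evaluation, mirroring the command above."""
    overrides = {"OUTPUT_DIR": f"outputs/{exp_name}", "MODEL.WEIGHTS": weights}
    argv = [
        "python3", "launch.py",
        "--nn", "1",
        "--eval-only",
        "--uni", "1",
        "--config-file", config_file,
    ]
    for key, value in overrides.items():
        argv += [key, value]  # trailing KEY VALUE pairs, as in the command above
    return argv


if __name__ == "__main__":
    # Config/weight pairing and experiment name are assumptions for illustration.
    argv = eval_argv(
        "projects/DDETRS/configs/vg_swinT.yaml",
        "weights/vg_swinT.pth",
        "eval_vg_swinT",
    )
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```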

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{lin2024generateu,
   title={Generative Region-Language Pretraining for Open-Ended Object Detection},
   author={Lin, Chuang and Jiang, Yi and Qu, Lizhen and Yuan, Zehuan and Cai, Jianfei},
   booktitle={Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2024}
}

Contact

If you have any questions, please feel free to reach out to me at [email protected].

Acknowledgement

This code is based on UNINEXT. Some code is borrowed from FlanT5. Thanks to the authors for their awesome work.

Special thanks to Bin Yan and Junfeng Wu for their valuable contributions.
