SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Mingxuan Liu · Tyler L. Hayes · Elisa Ricci · Gabriela Csurka · Riccardo Volpi

CVPR 2024 ✨Highlight✨

Installation

Requirements:

  • Linux or macOS with Python ≥ 3.8
  • PyTorch ≥ 1.8.2, installed following the instructions at pytorch.org. Make sure the PyTorch version matches the one required by Detectron2.
  • Detectron2: follow Detectron2 installation instructions.
  • OpenAI API key (optional, only needed if you want to construct hierarchies using LLMs)

Setup environment

# Clone this project repository under your workspace folder
git clone https://github.com/naver/shine.git --recurse-submodules
cd shine
# Create conda environment and install the dependencies
conda env create -n shine -f shine.yml
# Activate the working environment
conda activate shine
# Install Detectron2 under your workspace folder
# (Please follow Detectron2 official instructions)
cd ..
git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .

Our project uses two submodules, CenterNet2 and Deformable-DETR. If you forgot to pass --recurse-submodules when cloning, run git submodule init and then git submodule update inside the repository.

Set your OpenAI API key as an environment variable (optional; only needed if you want to generate hierarchies):

export OPENAI_API_KEY=YOUR_OpenAI_Key
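
Before launching anything that calls the OpenAI API, it can help to confirm that the variable is actually visible in the current shell. The function below is a minimal sketch; require_openai_key is a hypothetical helper name and is not part of the SHiNe repository.

```shell
# Hypothetical helper (not shipped with SHiNe): fail early when the key is
# missing instead of partway through hierarchy generation.
require_openai_key() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set; export it before generating hierarchies." >&2
        return 1
    fi
    echo "OPENAI_API_KEY is set."
}
```

Calling require_openai_key at the top of your own wrapper script (for example, require_openai_key || exit 1) stops the run before any API request is attempted.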

OvOD Models Preparation

SHiNe is training-free, so we only need to download off-the-shelf OvOD models and apply SHiNe on top of them. Download the CoDet, Detic, and VLDet checkpoints and put them (or softlink them with ln -s) under the models folder in this repository as:

SHiNe
    └── models
          ├── codet
          │     ├── CoDet_OVLVIS_R5021k_4x_ft4x.pth
          │     └── CoDet_OVLVIS_SwinB_4x_ft4x.pth
          ├── detic
          │     ├── coco_ovod
          │     │     ├── BoxSup_OVCOCO_CLIP_R50_1x.pth
          │     │     ├── Detic_OVCOCO_CLIP_R50_1x_caption.pth
          │     │     ├── Detic_OVCOCO_CLIP_R50_1x_max-size.pth
          │     │     └── Detic_OVCOCO_CLIP_R50_1x_max-size_caption.pth
          │     ├── cross_eval
          │     │     ├── BoxSup-C2_L_CLIP_SwinB_896b32_4x.pth
          │     │     ├── BoxSup-C2_LCOCO_CLIP_SwinB_896b32_4x.pth
          │     │     ├── Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
          │     │     ├── Detic_LI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
          │     │     └── Detic_LI_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
          │     ├── lvis_ovod
          │     │     ├── BoxSup-C2_Lbase_CLIP_R5021k_640b64_4x.pth
          │     │     ├── BoxSup-C2_Lbase_CLIP_SwinB_896b32_4x.pth
          │     │     ├── Detic_LbaseCCcapimg_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
          │     │     ├── Detic_LbaseCCimg_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
          │     │     ├── Detic_LbaseI_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
          │     │     └── Detic_LbaseI_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
          │     └── lvis_std
          │           ├── BoxSup-C2_L_CLIP_R5021k_640b64_4x.pth
          │           ├── BoxSup-DeformDETR_L_R50_4x.pth
          │           ├── Detic_DeformDETR_LI_R50_4x_ft4x.pth
          │           └── Detic_LI_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
          └── vldet
                ├── lvis_base.pth
                ├── lvis_base_swinB.pth
                ├── lvis_vldet.pth
                └── lvis_vldet_swinB.pth
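
If the checkpoints already live elsewhere on disk, softlinking them into the layout above could be sketched as follows. link_checkpoint is a hypothetical helper, not part of the repository. It assumes you pass the path of a downloaded .pth file and the target subfolder under models (for example vldet or detic/lvis_std).

```shell
# Hypothetical helper (not shipped with SHiNe): softlink one downloaded
# checkpoint into the models/ layout.
# Usage: link_checkpoint <path/to/file.pth> <subfolder under models> [models root]
link_checkpoint() (
    src=$1
    subdir=$2
    root=${3:-models}
    [ -f "$src" ] || { echo "checkpoint not found: $src" >&2; exit 1; }
    mkdir -p "$root/$subdir"
    # Use an absolute target so the link does not depend on the current directory.
    abs_src="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
    ln -sfn "$abs_src" "$root/$subdir/$(basename "$src")"
)
```

For instance, link_checkpoint ~/downloads/lvis_base.pth vldet would create models/vldet/lvis_base.pth as a symlink; the download location here is only an illustration.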

Datasets Preparation

Download the datasets and put them (or softlink them with ln -s) under the datasets folder in this repository as:

SHiNe
    └── datasets
          ├── inat
          ├── fsod
          ├── imagenet2012
          ├── coco
          └── lvis
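
A quick way to see which of the folders listed above are still missing is sketched below. check_datasets is a hypothetical helper, not part of the repository, and it only checks that the top-level folders exist, not that their contents are complete.

```shell
# Hypothetical helper (not shipped with SHiNe): report missing dataset folders.
# Usage: check_datasets [datasets root]; returns non-zero if any folder is absent.
check_datasets() (
    root=${1:-datasets}
    missing=0
    for name in inat fsod imagenet2012 coco lvis; do
        # -d follows symlinks, so softlinked dataset folders count as present.
        if [ ! -d "$root/$name" ]; then
            echo "missing: $root/$name"
            missing=1
        fi
    done
    exit "$missing"
)
```

Run from the repository root, check_datasets prints one line per absent folder and exits with status 0 only when all five are in place.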

Run SHiNe on OvOD

Example of applying SHiNe to Detic for the OvOD task on the iNat dataset:

# Vanilla OvOD (baseline)
bash scripts_local/Detic/inat/swin/baseline/inat_detic_SwinB_LVIS-IN-21K-COCO_baseline.sh
 
# SHiNe using dataset-provided hierarchy
bash scripts_local/Detic/inat/swin/shine_gt/inat_detic_SwinB_LVIS-IN-21K-COCO_shine_gt.sh

# SHiNe using LLM-generated synthetic hierarchy
bash scripts_local/Detic/inat/swin/shine_llm/inat_detic_SwinB_LVIS-IN-21K-COCO_shine_llm.sh
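
To compare the three variants side by side, the commands above can be wrapped in a loop that keeps one log file per variant. This is a minimal sketch: both function names are hypothetical, the path pattern is inferred from the three commands above, and the logs folder is an assumption rather than a convention of the repository.

```shell
# Hypothetical helper: build the script path for one variant of the Detic/iNat
# example. The pattern is inferred from the three commands listed above.
inat_detic_script() {
    echo "scripts_local/Detic/inat/swin/$1/inat_detic_SwinB_LVIS-IN-21K-COCO_$1.sh"
}

# Hypothetical helper: run each variant whose script exists and write its
# output to <log dir>/<variant>.log (log dir defaults to ./logs).
run_inat_detic_variants() {
    log_dir=${1:-logs}
    mkdir -p "$log_dir"
    for v in baseline shine_gt shine_llm; do
        script=$(inat_detic_script "$v")
        if [ -f "$script" ]; then
            bash "$script" > "$log_dir/$v.log" 2>&1 || echo "failed: $v (see $log_dir/$v.log)"
        else
            echo "skipped: $script not found"
        fi
    done
}
```

Run run_inat_detic_variants from the repository root; the per-variant logs make it easier to compare the baseline against the two SHiNe settings afterwards.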

Run SHiNe on Zero-shot classification

Example of applying SHiNe to the CLIP zero-shot transfer task on the ImageNet-1k dataset:

# Vanilla CLIP Zero-shot transfer (baseline)
bash scripts_local/Classification/imagenet1k/baseline/imagenet1k_vitL14_baseline.sh

# SHiNe using WordNet hierarchy
bash scripts_local/Classification/imagenet1k/shine_wordnet/imagenet1k_vitL14_shine_wordnet.sh

# SHiNe using LLM-generated synthetic hierarchy
bash scripts_local/Classification/imagenet1k/shine_llm/imagenet1k_vitL14_shine_llm.sh

SHiNe Construction (optional)

Example of constructing the SHiNe classifier for the OvOD task on the iNat dataset:

# SHiNe using dataset-provided hierarchy
bash scripts_build_nexus/inat/build_inat_nexus_gt.sh
# SHiNe using LLM-generated synthetic hierarchy
bash scripts_build_nexus/inat/build_inat_nexus_llm.sh

Hierarchy Tree Planting (optional)

Examples of building hierarchy trees from either dataset-provided or LLM-generated hierarchy entities:

Dataset-provided Hierarchy

# Build hierarchy tree for iNat using dataset-provided hierarchy
bash scripts_plant_hrchy/inat/plant_inat_tree_gt.sh

# Build hierarchy tree for ImageNet-1k using WordNet hierarchy
bash scripts_plant_hrchy/imagenet1k/plant_imagenet1k_tree_wordnet.sh

LLM-generated Hierarchy

# Build hierarchy tree for iNat using LLM-generated synthetic hierarchy
bash scripts_plant_hrchy/inat/plant_inat_tree_llm.sh

# Build hierarchy tree for ImageNet-1k using LLM-generated synthetic hierarchy
bash scripts_plant_hrchy/imagenet1k/plant_imagenet1k_tree_llm.sh

License

This project is licensed under the terms given in the LICENSE file.

Citation

If you find our work useful for your research, please cite our paper using the following BibTeX entry:

@inproceedings{liu2024shine,
  title={{SH}i{N}e: Semantic Hierarchy Nexus for Open-vocabulary Object Detection},
  author={Liu, Mingxuan and Hayes, Tyler L. and Ricci, Elisa and Csurka, Gabriela and Volpi, Riccardo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024},
}

Acknowledgment

SHiNe is built upon the awesome works iNat, FSOD, BREEDS, Hierarchy-CLIP, Detic, VLDet, and CoDet. We sincerely thank them for their work and contributions.

shine's Issues

FileNotFoundError: inat_clip_a+cname_hrchy_l1.npy

Thank you for your excellent work.
When I run bash scripts_local/CoDet/inat/rn50/inat_codet_rn50_baseline.sh, the following error is reported:
FileNotFoundError: [Errno 2] No such file or directory: 'nexus/inat/vitB32/shine_llm/inat_clip_a+cname_hrchy_l1.npy'
How can I get this file? Thank you very much for your reply.

How to reproduce the results of the paper?

Hello, thank you for your excellent work. I have only seen the training script so far. Which script should I run to verify or reproduce the results of your paper?
