Extract Free Dense Labels from CLIP [Project Page]

        ███╗   ███╗ █████╗ ███████╗██╗  ██╗ ██████╗██╗     ██╗██████╗
        ████╗ ████║██╔══██╗██╔════╝██║ ██╔╝██╔════╝██║     ██║██╔══██╗
        ██╔████╔██║███████║███████╗█████╔╝ ██║     ██║     ██║██████╔╝
        ██║╚██╔╝██║██╔══██║╚════██║██╔═██╗ ██║     ██║     ██║██╔═══╝
        ██║ ╚═╝ ██║██║  ██║███████║██║  ██╗╚██████╗███████╗██║██║
        ╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝ ╚═════╝╚══════╝╚═╝╚═╝

This is the code for our paper: Extract Free Dense Labels from CLIP.

This repo is a fork of mmsegmentation, so installation and data preparation are largely the same.

Installation

Step 0. Install PyTorch and Torchvision following official instructions, e.g.,

pip install torch torchvision
# FYI, we're using torch==1.9.1 and torchvision==0.10.1

Step 1. Install MMCV using MIM.

pip install -U openmim
mim install mmcv-full

Step 2. Install CLIP.

pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git

Step 3. Install MaskCLIP.

git clone https://github.com/chongzhou96/MaskCLIP.git
cd MaskCLIP
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in editable mode,
# thus any local modifications made to the code will take effect without reinstallation.

Dataset Preparation

Please refer to dataset_prepare.md. In our paper, we experiment with Pascal VOC, Pascal Context, and COCO Stuff 164k.

MaskCLIP

MaskCLIP doesn't require any training. We only need to (1) download and convert the CLIP model and (2) prepare the text embeddings of the objects of interest.
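The training-free pipeline boils down to one operation: each dense image feature is classified against the per-class text embeddings by similarity. The sketch below illustrates that idea with toy vectors; the function and variable names, dimensions, and use of plain cosine similarity are illustrative, not this repo's actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify_pixels(pixel_features, text_embeddings):
    """Assign each pixel feature the class whose text embedding is most similar."""
    labels = []
    for feat in pixel_features:
        scores = [cosine(feat, t) for t in text_embeddings]
        labels.append(scores.index(max(scores)))
    return labels

# Toy example: 2 classes, 3 "pixels".
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]            # e.g. "dog", "sky"
pixel_features = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
print(classify_pixels(pixel_features, text_embeddings))  # [0, 1, 0]
```

In MaskCLIP itself, the dense features come from the modified CLIP image encoder and the text embeddings from the prompt-engineering step below; the matching is done batched on GPU rather than pixel by pixel.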

Step 0. Download and convert the CLIP models, e.g.,

mkdir -p pretrain
python tools/maskclip_utils/convert_clip_weights.py --model ViT16 --backbone
# Other options for model: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT32, ViT16, ViT14

Step 1. Prepare the text embeddings of the objects of interest, e.g.,

python tools/maskclip_utils/prompt_engineering.py --model ViT16 --class-set context
# Other options for model: RN50, RN101, RN50x4, RN50x16, ViT32, ViT16
# Other options for class-set: voc, context, stuff
# Actually, we've played around with many more interesting target classes. (See prompt_engineering.py)
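Prompt engineering here means building one embedding per class by inserting the class name into many prompt templates, encoding each prompt with CLIP's text encoder, and averaging. A toy sketch of that averaging, where `embed` is a deterministic stand-in for the real text encoder (the two templates shown are real CLIP-style prompts, but the rest is illustrative):

```python
def embed(text):
    # Stand-in for CLIP's text encoder: a deterministic toy 2-D "embedding".
    h = sum(ord(c) for c in text)
    return [h % 7, h % 5]

TEMPLATES = ["a photo of a {}.", "a photo of the {}."]

def class_embedding(name):
    """Average the embeddings of the class name across all prompt templates."""
    vecs = [embed(t.format(name)) for t in TEMPLATES]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

print(class_embedding("dog"))
```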

Step 2. Get quantitative results (mIoU):

python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --eval mIoU
# e.g., python tools/test.py configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth --eval mIoU
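Behind `--eval mIoU`, mmsegmentation reports mean intersection-over-union: per-class IoU averaged over classes. A minimal sketch of the metric on flat label lists (real evaluation runs on 2-D label maps and handles an ignore index):

```python
def miou(pred, gt, num_classes):
    """Mean IoU over classes that appear in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1]
gt   = [0, 1, 1, 1]
print(miou(pred, gt, num_classes=2))  # (1/2 + 2/3) / 2 ≈ 0.583
```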

Step 3. (optional) Get qualitative results:

python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --show-dir ${OUTPUT_DIR}
# e.g., python tools/test.py configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth --show-dir output/

MaskCLIP+

MaskCLIP+ trains another segmentation model with pseudo labels extracted from MaskCLIP.
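A common way to keep such pseudo labels clean is to trust MaskCLIP's prediction only where it is confident and mark everything else as "ignore" so the student model is not trained on noisy pixels. The sketch below illustrates that general scheme; the threshold value and the simple max-score criterion are illustrative assumptions, not necessarily this repo's exact configuration (255 is the conventional ignore label in mmsegmentation).

```python
IGNORE_INDEX = 255  # conventional "ignore" label in mmsegmentation

def pseudo_labels(score_maps, threshold=0.9):
    """score_maps: per-pixel lists of class probabilities.
    Returns the argmax label per pixel, or IGNORE_INDEX if the top score is low."""
    labels = []
    for scores in score_maps:
        top = max(scores)
        labels.append(scores.index(top) if top >= threshold else IGNORE_INDEX)
    return labels

scores = [[0.95, 0.05], [0.55, 0.45], [0.1, 0.9]]
print(pseudo_labels(scores))  # [0, 255, 1]
```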

Step 0. Download and convert the CLIP models, e.g.,

mkdir -p pretrain
python tools/maskclip_utils/convert_clip_weights.py --model ViT16
# Other options for model: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT32, ViT16, ViT14

Step 1. Prepare the text embeddings of the target dataset, e.g.,

python tools/maskclip_utils/prompt_engineering.py --model ViT16 --class-set context
# Other options for model: RN50, RN101, RN50x4, RN50x16, ViT32, ViT16
# Other options for class-set: voc, context, stuff

Train. Depending on your setup (single or multiple GPUs, one or more machines), the training command differs. Here is an example with multiple GPUs on a single machine. For more information, please refer to train.md.

sh tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
# e.g., sh tools/dist_train.sh configs/maskclip_plus/zero_shot/maskclip_plus_r50_deeplabv3plus_r101-d8_480x480_40k_pascal_context.py 4

Inference. See step 2 and step 3 under the MaskCLIP section. (We will release the trained models soon.)

Citation

If you use MaskCLIP or this code base in your work, please cite

@InProceedings{zhou2022maskclip,
    author = {Zhou, Chong and Loy, Chen Change and Dai, Bo},
    title = {Extract Free Dense Labels from CLIP},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2022}
}

Contact

For questions about our paper or code, please contact Chong Zhou.
