
clipseg's Introduction

Image Segmentation Using Text and Image Prompts

This repository contains the code used in the paper "Image Segmentation Using Text and Image Prompts".

November 2022: CLIPSeg has been integrated into the HuggingFace Transformers library. Thank you, NielsRogge!
September 2022: We released new weights for fine-grained predictions (see below for details).
March 2022: The paper has been accepted to CVPR 2022!


The system allows you to create segmentation models without training, based on:

  • An arbitrary text query
  • Or an image with a mask highlighting stuff or an object.

Quick Start

In the Quickstart.ipynb notebook we provide the code for using a pre-trained CLIPSeg model. If you run the notebook locally, make sure you have downloaded the rd64-uni.pth weights, either manually or via the git lfs extension. The notebook can also be used interactively via MyBinder (note that the VM does not provide a GPU, so inference takes a few seconds). A minimal version of what the notebook does is sketched below.
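
For reference, a minimal sketch along the lines of the notebook, assuming the repository root is on the Python path, the weights are at weights/rd64-uni.pth, and example_image.jpg is a placeholder for your own image:

import torch
from PIL import Image
from torchvision import transforms
from models.clipseg import CLIPDensePredT

# load the decoder weights; the CLIP backbone is fetched by the clip package itself
model = CLIPDensePredT(version='ViT-B/16', reduce_dim=64)
model.eval()
model.load_state_dict(torch.load('weights/rd64-uni.pth', map_location=torch.device('cpu')), strict=False)

# preprocess the input image to 352x352, as in the notebook
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.Resize((352, 352)),
])
img = transform(Image.open('example_image.jpg')).unsqueeze(0)

# one forward pass per text prompt; the first return value contains the logits
prompts = ['a glass', 'something to fill', 'wood', 'a jar']
with torch.no_grad():
    preds = model(img.repeat(len(prompts), 1, 1, 1), prompts)[0]

masks = torch.sigmoid(preds)  # (len(prompts), 1, 352, 352) probability maps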

Dependencies

This code base depends on pytorch, torchvision and clip (pip install git+https://github.com/openai/CLIP.git). Additional dependencies are hidden for double blind review.

Datasets

  • PhraseCut and PhraseCutPlus: Referring expression dataset
  • PFEPascalWrapper: Wrapper class for PFENet's Pascal-5i implementation
  • PascalZeroShot: Wrapper class for PascalZeroShot
  • COCOWrapper: Wrapper class for COCO.

Models

  • CLIPDensePredT: CLIPSeg model with transformer-based decoder.
  • ViTDensePredT: variant that uses a plain (ImageNet-trained) ViT backbone instead of CLIP, with the same transformer-based decoder.

Third Party Dependencies

For some of the datasets, third-party dependencies are required. Run the following commands in the third_party folder.

git clone https://github.com/cvlab-yonsei/JoEm
git clone https://github.com/Jia-Research-Lab/PFENet.git
git clone https://github.com/ChenyunWu/PhraseCutDataset.git
git clone https://github.com/juhongm999/hsnet.git

Weights

The MIT license does not apply to these weights.

We provide three model weights: two for D=64 (~4MB each) and one for D=16 (~1MB).

wget https://owncloud.gwdg.de/index.php/s/ioHbRzFx6th32hn/download -O weights.zip
unzip -d weights -j weights.zip

New Fine-grained Weights

We introduced a more complex module for transforming tokens into predictions, which allows for more refined predictions (in contrast to the square-like predictions of the other weights). The corresponding weights are part of the weight download above and are called rd64-uni-refined.pth. They can be loaded as follows:

import torch
from models.clipseg import CLIPDensePredT

model = CLIPDensePredT(version='ViT-B/16', reduce_dim=64, complex_trans_conv=True)
model.load_state_dict(torch.load('weights/rd64-uni-refined.pth'), strict=False)

See below for a direct comparison of the new fine-grained weights (top) and the old weights (bottom).
[comparison figure]

Training and Evaluation

To train, use the training.py script with an experiment file and an experiment id as parameters. E.g. python training.py phrasecut.yaml 0 will train the first PhraseCut experiment, which is defined by configuration and the first entry of individual_configurations in phrasecut.yaml. Model weights will be written to logs/.

For evaluation, use score.py. E.g. python score.py phrasecut.yaml 0 0 will evaluate the PhraseCut experiment using the first test_configuration and the first configuration in individual_configurations.

Usage of PFENet Wrappers

In order to use the dataset and model wrappers for PFENet, the PFENet repository needs to be cloned to the root folder: git clone https://github.com/Jia-Research-Lab/PFENet.git

License

The source code files in this repository (excluding model weights) are released under MIT license.

Citation

@InProceedings{lueddecke22_cvpr,
    author    = {L\"uddecke, Timo and Ecker, Alexander},
    title     = {Image Segmentation Using Text and Image Prompts},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {7086-7096}
}

clipseg's People

Contributors

mikljohansson, stephenbach, timojl


clipseg's Issues

UnpicklingError: invalid load key, 'v'.

I was running the Quickstart notebook locally, but I got the following error.


---> model.load_state_dict(torch.load('weights/rd16-uni.pth', map_location=torch.device('cpu')), strict=False)

File /scratch/miniconda3/envs/clipseg-environment/lib/python3.10/site-packages/torch/serialization.py:713, in load(f, map_location, pickle_module, **pickle_load_args)
    711             return torch.jit.load(opened_file)
    712         return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 713 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)

File /scratch/miniconda3/envs/clipseg-environment/lib/python3.10/site-packages/torch/serialization.py:920, in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    914 if not hasattr(f, 'readinto') and (3, 8, 0) <= sys.version_info < (3, 8, 2):
    915     raise RuntimeError(
    916         "torch.load does not work with file-like objects that do not implement readinto on Python 3.8.0 and 3.8.1. "
    917         f"Received object of type \"{type(f)}\". Please update to Python 3.8.2 or newer to restore this "
    918         "functionality.")
--> 920 magic_number = pickle_module.load(f, **pickle_load_args)
    921 if magic_number != MAGIC_NUMBER:
    922     raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, 'v'.
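
A note for context: this error usually means the .pth file is a git-lfs pointer (a small text file starting with "version https://git-lfs.github.com/...") rather than the actual weights. A hedged, generic check (not part of the repository):

# check whether the downloaded file is a git-lfs pointer instead of real weights
with open('weights/rd16-uni.pth', 'rb') as f:
    head = f.read(64)
if head.startswith(b'version https://git-lfs.github.com'):
    print('This is an LFS pointer file; fetch the real weights (git lfs pull, or the weights.zip link above).')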

About zero-shot experiment

Hi @timojl, thanks for sharing this! I was wondering if you were planning to add the .yaml of the zero-shot experiment to the repository. Thanks for considering!

table 2

Dear author,
How can I obtain the results in Table 2?

import error in clipseg.py

On line 21 of models/clipseg.py, I get an import error: from models.clip_prompts import imagenet_templates
In fact, there is no file or module called clip_prompts in the models directory.
Thank you very much!

Evaluated on referring image seg datasets on RefCOCO, RefCOCO+ , and G-Ref

@timojl
Nice work!
In the field of referring image segmentation, RefCOCO, RefCOCO+, and G-Ref are widely used datasets for evaluating a proposed method and comparing it with others, e.g. the new CVPR 2022 paper CRIS: CLIP-Driven Referring Image Segmentation, which is also based on CLIP. Do you plan to report results on these datasets, or to provide corresponding scripts in your repo so that readers can obtain them themselves?

Missing file category_info.json

Hi, thanks for your great work! Could you please provide details on the structure of the LVIS_OneShot3b.tar file, or guidance on how to obtain the missing "category_info.json" file? Your assistance in this matter would be greatly appreciated!

Object Not Present

Hi! Well done for this very interesting work!
Do you know if there is a particular way of detecting whether the object is not present in the image (by "detecting" I mean with a threshold, for example)?
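
The repository does not ship a dedicated "object present" flag; one hedged heuristic is to apply a sigmoid to the predicted logits and compare the maximum activation against a manually tuned threshold:

import torch

def object_present(logits: torch.Tensor, threshold: float = 0.5) -> bool:
    """Hedged heuristic: treat the queried object as present if any pixel's
    sigmoid probability exceeds a manually tuned threshold (0.5 is only a starting point)."""
    return torch.sigmoid(logits).max().item() > threshold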

Can you share the LVIS subset?

Hello, I saw you are using a subset of LVIS for analysis of generalized prompts. Could you share the file LVIS_OneShot3b.tar? Thanks!

Is there a way to set a seed?

Trying to get the same output consistently for frame-by-frame video use. I can't find any sort of parameter/argument/function to call to set a custom seed...
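
The repository itself does not expose a seed parameter; a generic PyTorch seeding helper (an assumption about what is wanted, not part of this code base) would look like this:

import random
import numpy as np
import torch

def set_seed(seed: int = 0):
    # fix the seeds of Python, NumPy and PyTorch (CPU and GPU) and disable non-deterministic cuDNN kernels
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False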

Binder demo does not start

Thanks for sharing your work. I wanted to play around with the binder demo, but the starting process failed:

Waiting for build to start...
Picked Git content provider.
Cloning into '/tmp/repo2dockerxd2n4z3m'...
HEAD is now at c72775e Update Tables.ipynb
Building conda environment for python=
Using CondaBuildPack builder
Building conda environment for python=
Building conda environment for python=
Step 1/50 : FROM buildpack-deps:bionic
 ---> aeecfa359fe2
Step 2/50 : ENV DEBIAN_FRONTEND=noninteractive
 ---> Using cache
 ---> 15d7fb71e427
Step 3/50 : RUN apt-get -qq update &&     apt-get -qq install --yes --no-install-recommends locales > /dev/null &&     apt-get -qq purge &&     apt-get -qq clean &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> efcacf39cb6c
Step 4/50 : RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen &&     locale-gen
 ---> Using cache
 ---> e97e83d6e6fe
Step 5/50 : ENV LC_ALL=en_US.UTF-8     LANG=en_US.UTF-8     LANGUAGE=en_US.UTF-8
 ---> Using cache
 ---> 313d9271d896
Step 6/50 : ENV SHELL=/bin/bash
 ---> Using cache
 ---> f9506f76bb30
Step 7/50 : ARG NB_USER
 ---> Using cache
 ---> 2e15943b81bd
Step 8/50 : ARG NB_UID
 ---> Using cache
 ---> e4ffad3b472e
Step 9/50 : ENV USER=${NB_USER}     HOME=/home/${NB_USER}
 ---> Using cache
 ---> 18a4ca813c0d
Step 10/50 : RUN groupadd         --gid ${NB_UID}         ${NB_USER} &&     useradd         --comment "Default user"         --create-home         --gid ${NB_UID}         --no-log-init         --shell /bin/bash         --uid ${NB_UID}         ${NB_USER}
 ---> Using cache
 ---> 5ba210baff9f
Step 11/50 : RUN apt-get -qq update &&     apt-get -qq install --yes --no-install-recommends        less        unzip        > /dev/null &&     apt-get -qq purge &&     apt-get -qq clean &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> ea907f656878
Step 12/50 : EXPOSE 8888
 ---> Using cache
 ---> 644fa9523dd9
Step 13/50 : ENV APP_BASE=/srv
 ---> Using cache
 ---> d5074b9e0ab6
Step 14/50 : ENV CONDA_DIR=${APP_BASE}/conda
 ---> Using cache
 ---> 1c0a8568dbed
Step 15/50 : ENV NB_PYTHON_PREFIX=${CONDA_DIR}/envs/notebook
 ---> Using cache
 ---> 99907061c0da
Step 16/50 : ENV NPM_DIR=${APP_BASE}/npm
 ---> Using cache
 ---> f63a59ad3fe2
Step 17/50 : ENV NPM_CONFIG_GLOBALCONFIG=${NPM_DIR}/npmrc
 ---> Using cache
 ---> 73584243829b
Step 18/50 : ENV NB_ENVIRONMENT_FILE=/tmp/env/environment.lock
 ---> Using cache
 ---> 6c59dbc22fde
Step 19/50 : ENV MAMBA_ROOT_PREFIX=${CONDA_DIR}
 ---> Using cache
 ---> f25ddeaab3e1
Step 20/50 : ENV MAMBA_EXE=${CONDA_DIR}/bin/mamba
 ---> Using cache
 ---> 799e495bfcf1
Step 21/50 : ENV CONDA_PLATFORM=linux-64
 ---> Using cache
 ---> f4ccf5e2c06f
Step 22/50 : ENV KERNEL_PYTHON_PREFIX=${NB_PYTHON_PREFIX}
 ---> Using cache
 ---> 78237aeff65f
Step 23/50 : ENV PATH=${NB_PYTHON_PREFIX}/bin:${CONDA_DIR}/bin:${NPM_DIR}/bin:${PATH}
 ---> Using cache
 ---> c2f559bdd37c
Step 24/50 : COPY --chown=1000:1000 build_script_files/-2fusr-2flib-2fpython3-2e10-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2factivate-2dconda-2esh-44e764 /etc/profile.d/activate-conda.sh
 ---> Using cache
 ---> 12a3765b4979
Step 25/50 : COPY --chown=1000:1000 build_script_files/-2fusr-2flib-2fpython3-2e10-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2fenvironment-2dlinux-2d64-2elock-38eeb7 /tmp/env/environment.lock
 ---> Using cache
 ---> d519cdb7c161
Step 26/50 : COPY --chown=1000:1000 build_script_files/-2fusr-2flib-2fpython3-2e10-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2finstall-2dbase-2denv-2ebash-e5509f /tmp/install-base-env.bash
 ---> Using cache
 ---> 67bce1700764
Step 27/50 : RUN TIMEFORMAT='time: %3R' bash -c 'time /tmp/install-base-env.bash' && rm -rf /tmp/install-base-env.bash /tmp/env
 ---> Using cache
 ---> 3a52f8523105
Step 28/50 : RUN mkdir -p ${NPM_DIR} && chown -R ${NB_USER}:${NB_USER} ${NPM_DIR}
 ---> Using cache
 ---> 52d35efb3f2f
Step 29/50 : USER root
 ---> Using cache
 ---> 47252e1fb43b
Step 30/50 : ARG REPO_DIR=${HOME}
 ---> Using cache
 ---> 736e7fb73725
Step 31/50 : ENV REPO_DIR=${REPO_DIR}
 ---> Using cache
 ---> 3dd5cd37b037
Step 32/50 : RUN if [ ! -d "${REPO_DIR}" ]; then         /usr/bin/install -o ${NB_USER} -g ${NB_USER} -d "${REPO_DIR}";     fi
 ---> Using cache
 ---> 2367cd4be48d
Step 33/50 : WORKDIR ${REPO_DIR}
 ---> Using cache
 ---> f77208a02424
Step 34/50 : RUN chown ${NB_USER}:${NB_USER} ${REPO_DIR}
 ---> Using cache
 ---> 3bf86e627f24
Step 35/50 : ENV PATH=${HOME}/.local/bin:${REPO_DIR}/.local/bin:${PATH}
 ---> Using cache
 ---> 24af72d4851c
Step 36/50 : ENV CONDA_DEFAULT_ENV=${KERNEL_PYTHON_PREFIX}
 ---> Using cache
 ---> 6997463f8c6b
Step 37/50 : COPY --chown=1000:1000 src/environment.yml ${REPO_DIR}/environment.yml
 ---> a6203be41c92
Step 38/50 : USER ${NB_USER}
 ---> Running in 31cd6a6a2242
Removing intermediate container 31cd6a6a2242
 ---> 0c18febcc925
Step 39/50 : RUN TIMEFORMAT='time: %3R' bash -c 'time ${MAMBA_EXE} env update -p ${NB_PYTHON_PREFIX} --file "environment.yml" && time ${MAMBA_EXE} clean --all -f -y && ${MAMBA_EXE} list -p ${NB_PYTHON_PREFIX} '
 ---> Running in 563f09fe0407
Transaction

  Prefix: /srv/conda/envs/notebook

  Updating specs:

   - numpy
   - scipy
   - matplotlib-base
   - pip


  Package            Version  Build                Channel                   Size
───────────────────────────────────────────────────────────────────────────────────
  Install:
───────────────────────────────────────────────────────────────────────────────────

  + brotli             1.0.9  h166bdaf_8           conda-forge/linux-64      19kB
  + brotli-bin         1.0.9  h166bdaf_8           conda-forge/linux-64      20kB
  + contourpy          1.0.7  py310hdf3cbec_0      conda-forge/linux-64     216kB
  + cycler            0.11.0  pyhd8ed1ab_0         conda-forge/noarch        10kB
  + fonttools         4.39.2  py310h1fa729e_0      conda-forge/linux-64       2MB
  + freetype          2.12.1  hca18f0e_1           conda-forge/linux-64     626kB
  + kiwisolver         1.4.4  py310hbf28c38_1      conda-forge/linux-64      77kB
  + lcms2               2.15  haa2dc70_1           conda-forge/linux-64     242kB
  + lerc               4.0.0  h27087fc_0           conda-forge/linux-64     282kB
  + libblas            3.9.0  16_linux64_openblas  conda-forge/linux-64      13kB
  + libbrotlicommon    1.0.9  h166bdaf_8           conda-forge/linux-64      67kB
  + libbrotlidec       1.0.9  h166bdaf_8           conda-forge/linux-64      34kB
  + libbrotlienc       1.0.9  h166bdaf_8           conda-forge/linux-64     295kB
  + libcblas           3.9.0  16_linux64_openblas  conda-forge/linux-64      13kB
  + libdeflate          1.17  h0b41bf4_0           conda-forge/linux-64      65kB
  + libgfortran-ng    12.2.0  h69a702a_19          conda-forge/linux-64      23kB
  + libgfortran5      12.2.0  h337968e_19          conda-forge/linux-64       2MB
  + libjpeg-turbo    2.1.5.1  h0b41bf4_0           conda-forge/linux-64     491kB
  + liblapack          3.9.0  16_linux64_openblas  conda-forge/linux-64      13kB
  + libopenblas       0.3.21  pthreads_h78a6416_3  conda-forge/linux-64      11MB
  + libpng            1.6.39  h753d276_0           conda-forge/linux-64     283kB
  + libtiff            4.5.0  hddfeb54_5           conda-forge/linux-64     407kB
  + libwebp-base       1.3.0  h0b41bf4_0           conda-forge/linux-64     357kB
  + libxcb              1.13  h7f98852_1004        conda-forge/linux-64     400kB
  + matplotlib-base    3.7.1  py310he60537e_0      conda-forge/linux-64       7MB
  + munkres            1.1.4  pyh9f0ad1d_0         conda-forge/noarch        12kB
  + numpy             1.24.2  py310h8deb116_0      conda-forge/linux-64       7MB
  + openjpeg           2.5.0  hfec8fc6_2           conda-forge/linux-64     352kB
  + pillow             9.4.0  py310h065c6d2_2      conda-forge/linux-64      46MB
  + pooch              1.7.0  pyhd8ed1ab_0         conda-forge/noarch        51kB
  + pthread-stubs        0.4  h36c2ea0_1001        conda-forge/linux-64       6kB
  + pyparsing          3.0.9  pyhd8ed1ab_0         conda-forge/noarch        81kB
  + scipy             1.10.1  py310h8deb116_0      conda-forge/linux-64      25MB
  + unicodedata2      15.0.0  py310h5764c6d_0      conda-forge/linux-64     512kB
  + xorg-libxau        1.0.9  h7f98852_0           conda-forge/linux-64      13kB
  + xorg-libxdmcp      1.1.3  h7f98852_0           conda-forge/linux-64      19kB
  + zstd               1.5.2  h3eb15da_6           conda-forge/linux-64     420kB

  Upgrade:
───────────────────────────────────────────────────────────────────────────────────

  - pip                 23.0  pyhd8ed1ab_0         conda-forge                   
  + pip               23.0.1  pyhd8ed1ab_0         conda-forge/noarch         1MB

  Summary:

  Install: 37 packages
  Upgrade: 1 packages

  Total download: 106MB

───────────────────────────────────────────────────────────────────────────────────




Looking for: ['numpy', 'scipy', 'matplotlib-base', 'pip']


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Installing pip dependencies: ...working... Pip subprocess error:
  Running command git clone --filter=blob:none --quiet https://github.com/openai/CLIP.git /tmp/pip-req-build-ev5i_0v5
ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cpu (from versions: 1.11.0, 1.11.0+cpu, 1.11.0+cu102, 1.11.0+cu113, 1.11.0+cu115, 1.11.0+rocm4.3.1, 1.11.0+rocm4.5.2, 1.12.0, 1.12.0+cpu, 1.12.0+cu102, 1.12.0+cu113, 1.12.0+cu116, 1.12.0+rocm5.0, 1.12.0+rocm5.1.1, 1.12.1, 1.12.1+cpu, 1.12.1+cu102, 1.12.1+cu113, 1.12.1+cu116, 1.12.1+rocm5.0, 1.12.1+rocm5.1.1, 1.13.0, 1.13.0+cpu, 1.13.0+cu116, 1.13.0+cu117, 1.13.0+cu117.with.pypi.cudnn, 1.13.0+rocm5.1.1, 1.13.0+rocm5.2, 1.13.1, 1.13.1+cpu, 1.13.1+cu116, 1.13.1+cu117, 1.13.1+cu117.with.pypi.cudnn, 1.13.1+rocm5.1.1, 1.13.1+rocm5.2, 2.0.0, 2.0.0+cpu, 2.0.0+cpu.cxx11.abi, 2.0.0+cu117, 2.0.0+cu117.with.pypi.cudnn, 2.0.0+cu118, 2.0.0+rocm5.3, 2.0.0+rocm5.4.2)
ERROR: No matching distribution found for torch==1.10.0+cpu

Ran pip subprocess with arguments:
['/srv/conda/envs/notebook/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/jovyan/condaenv.mi4k40vy.requirements.txt']
Pip subprocess output:
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting git+https://github.com/openai/CLIP.git (from -r /home/jovyan/condaenv.mi4k40vy.requirements.txt (line 5))
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-ev5i_0v5
  Resolved https://github.com/openai/CLIP.git to commit a9b1bf5920416aaeaec965c25dd9e8f98c864f16
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'

failed
CondaEnvException: Pip failed

time: 62.393
Removing intermediate container 563f09fe0407
The command '/bin/sh -c TIMEFORMAT='time: %3R' bash -c 'time ${MAMBA_EXE} env update -p ${NB_PYTHON_PREFIX} --file "environment.yml" && time ${MAMBA_EXE} clean --all -f -y && ${MAMBA_EXE} list -p ${NB_PYTHON_PREFIX} '' returned a non-zero code: 1

The important part being: ERROR: No matching distribution found for torch==1.10.0+cpu

Maybe you can easily fix this.

import file missing

In phrasecut.py, line 122 and line 132:
from datasets.generate_lvis_oneshot import
I cannot find the generate_lvis_oneshot.py file. Thanks!

Expand the size of CLIP

Hello, thank you very much for your work. I would like to ask whether expanding the scale of the pre-trained CLIP model helps improve the final results, for example changing ViT-B/16 to ViT-L/14. After I changed CLIP to ViT-L/14, the results did not improve, so I don't know whether there is a problem with my change or whether this change is methodologically useless.

Weights are missing

Hi, I think the weights directory or the weight files are missing from the repo now (weights/rd64-uni.pth etc.); you might want to check.

image with mask prompt for one shot

Hi, great job! I really love this CLIP-based segmentation.
I am not so familiar with the code right now; I am just wondering how I can use an image as a prompt?
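
A rough sketch, assuming the conditional argument of CLIPDensePredT.forward also accepts a batch of images (as used for the paper's visual prompts); the specific prompt engineering shown here, masking out the background, is only one of the variants discussed in the paper:

import torch

def one_shot_predict(model, query_img, support_img, support_mask):
    """Hypothetical helper: segment query_img using an image+mask prompt.
    query_img, support_img: (1, 3, 352, 352) normalized tensors; support_mask: (1, 1, 352, 352) binary tensor.
    Assumes CLIPDensePredT.forward accepts an image tensor as the conditional."""
    visual_prompt = support_img * support_mask      # one prompt-engineering variant: black out the background
    with torch.no_grad():
        preds = model(query_img, visual_prompt)[0]
    return torch.sigmoid(preds[0, 0])               # (352, 352) probability map for the query image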

import error

When I was running the code, something went wrong in models/clipseg.py (line 10 and line 17). Please correct the file import errors. Thank you for your excellent work!

How can I store datasets while using training.py?

Dear timojl,

I have a question regarding the use of the training.py script. Specifically, I am wondering how to store my training datasets. I am encountering an error that says there is no "dataset_repository" folder. Could you please provide guidance on how to properly store my training datasets, including PhraseCut, COCO, and so on?
Thank you for your time and assistance.

Best regards,
Cai

Increasing the resolution of the mask

So when I run the code I get a mask that seems to be made of large 32x32 squares over the whole 512x512 pixel image.

Is there a way to increase the resolution of the prediction and thus the mask generated?

Thanks a lot in advance.
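
Two hedged suggestions based on the README above, not an official answer: load the refined weights, which were released precisely for less blocky predictions, and upsample the 352x352 logits back to the input resolution with bilinear interpolation:

import torch
import torch.nn.functional as F
from models.clipseg import CLIPDensePredT

def predict_fullres(img, prompt, out_size=(512, 512)):
    """Sketch: refined weights plus bilinear upsampling of the logits.
    img: (1, 3, 352, 352) normalized image tensor; prompt: text query string."""
    model = CLIPDensePredT(version='ViT-B/16', reduce_dim=64, complex_trans_conv=True)
    model.eval()
    model.load_state_dict(torch.load('weights/rd64-uni-refined.pth', map_location='cpu'), strict=False)
    with torch.no_grad():
        preds = model(img, [prompt])[0]                                   # (1, 1, 352, 352) logits
    up = F.interpolate(preds, size=out_size, mode='bilinear', align_corners=False)
    return torch.sigmoid(up)[0, 0]                                        # (512, 512) probability map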

Weights for ViT-B/32 version of CLIPDensePredT

Hi @timojl, amazing paper! Had a quick question about whether pretrained weights for the CLIPSeg decoder are released for the ViT-B/32 version of CLIP. I see many tutorials with the following code:

os.system("wget https://owncloud.gwdg.de/index.php/s/ioHbRzFx6th32hn/download -O weights.zip")
os.system("unzip -d weights -j weights.zip")
model = CLIPDensePredT(version='ViT-B/16', reduce_dim=64)
model.load_state_dict(torch.load('weights/rd64-uni.pth', map_location=torch.device('cpu')), strict=False)

But I can't find any corresponding weights for the ViT-B/32 version.
Would really appreciate help on this. Thanks!

Difference in PhraseCut images number

Hi @timojl,

The integrity check in the PhraseCut dataset code fails for me, because the number of PhraseCut images I downloaded is 78802 (vs. 108250 as stated in the code). Do you know how that difference came about?

To download the PhraseCut dataset I used the download_dataset script from the PhraseCutDataset repository, just like you mentioned in one of the closed issues.

Quickstart notebook and refined weights

Hi, thank you for your work.
When I try to modify the Quickstart notebook to load the refined weights:
model.load_state_dict(torch.load('weights/rd64-uni-refined.pth', map_location=torch.device('cpu')), strict=False);
I get weird results.
How do I modify the code when using refined weights?
Thanks
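
For reference, the fine-grained weights section above indicates that rd64-uni-refined.pth requires the model to be constructed with complex_trans_conv=True. Loading it into the default architecture (as in the original notebook) means the refined upsampling parameters do not match and are silently ignored under strict=False, which is a plausible cause of the odd results:

model = CLIPDensePredT(version='ViT-B/16', reduce_dim=64, complex_trans_conv=True)
model.load_state_dict(torch.load('weights/rd64-uni-refined.pth', map_location=torch.device('cpu')), strict=False)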

Fine-tune Clipseg

Thanks for the great work! I am writing my master's thesis and would like to use your model for my work, as I have already evaluated it on the COCO val2017 dataset and the results are good. I would like to fine-tune the model "CIDAS/clipseg-rd64-refined" for my work. Do you have any tutorial on how I can fine-tune the model, or some tips for getting started?
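
There is no official fine-tuning tutorial in this repository. Purely as a hedged sketch, fine-tuning the Hugging Face checkpoint with a plain BCE loss on the logits could look like the following; "batches" is a placeholder for your own data pipeline, and exposing the backbone as model.clip is an assumption about the Transformers implementation:

import torch
import torch.nn.functional as F
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

def finetune(batches, lr=1e-4):
    """batches: placeholder iterable yielding (list of PIL images, list of prompt strings, (B, H, W) float mask tensor)."""
    processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
    model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

    # freeze the CLIP backbone and train only the remaining (decoder) parameters,
    # mirroring the paper's setup; assumes the backbone is exposed as model.clip
    for p in model.clip.parameters():
        p.requires_grad = False
    optimizer = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=lr)

    model.train()
    for images, prompts, masks in batches:
        inputs = processor(text=prompts, images=images, padding="max_length", return_tensors="pt")
        logits = model(**inputs).logits                                    # (batch, 352, 352) segmentation logits
        targets = F.interpolate(masks.unsqueeze(1), size=logits.shape[-2:], mode="nearest")[:, 0]
        loss = F.binary_cross_entropy_with_logits(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model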

How can we run training?

Thank you for releasing the great code! I enjoyed the notebook a lot.

By the way, can we train a model from scratch or fine-tune one?
Although I read experiment_setup.py, I didn't find how to run the training directly.

Error Running scope.py: 'config.json' File Not Found in Logs Directory

Error Description:
I followed all the setup instructions as mentioned in the readme file. However, when I attempt to run python scope.py coco.yml 0 0, I encounter an error indicating that the 'config.json' file is missing from the 'logs' directory.

Steps to Reproduce:

  1. Set up the environment according to the guidelines.
  2. Execute the command python scope.py coco.yml 0 0.

Expected Result:
The program should run without encountering any errors.

Actual Result:
An error message appears stating that the 'config.json' file cannot be found in the 'logs' directory.

Additional Notes:
I have searched through the documentation and previous issues but could not find any references to this 'config.json' file. Please provide guidance on how to resolve this issue.

Unable to download weights

When I run:
"! git clone https://github.com/timojl/clipseg"
in google colab,
I get this error:
Error downloading object: weights/rd16-uni.pth (61545cd): Smudge error: Error downloading weights/rd16-uni.pth (61545cdb3a28f99d33d457c64a9721ade835a9dfbda604c459de6831c504167a): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Here is the FULL ERROR:
Cloning into 'clipseg'...
remote: Enumerating objects: 168, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 168 (delta 36), reused 39 (delta 16), pack-reused 91
Receiving objects: 100% (168/168), 1.21 MiB | 4.54 MiB/s, done.
Resolving deltas: 100% (77/77), done.
Downloading weights/rd16-uni.pth (1.1 MB)
Error downloading object: weights/rd16-uni.pth (61545cd): Smudge error: Error downloading weights/rd16-uni.pth (61545cdb3a28f99d33d457c64a9721ade835a9dfbda604c459de6831c504167a): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to /content/clipseg/clipseg/clipseg/.git/lfs/objects/logs/20220922T222731.926515854.log
Use git lfs logs last to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: weights/rd16-uni.pth: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Could you please share "all_splits.json" file?

Thanks for your contribution. When using the LVIS db, "lvis_sample_areas.pickle" is required; could you please share that file with me, or share an example of its data structure?
Thanks in advance.

only image with mask

Hi, I am interested in your work and I want to know: if the input is only an image with a mask, can it output the masked object with the original image? Also, I want to know where the lvis_oneshot dataset is. Thanks!

Missing __init__.py file causes import errors

Hey! Great package.

I am using this with the Masquerade-Nodes for ComfyUI, but on install it complains: "clipseg is not a module".

I found that the clipseg directory doesn't have an __init__.py file in it. Adding this fixed the import issue.


Can you provide an example script for this task - extracting clothes

We give an input image and a mask text, and then it should extract that masked area from the image and save it into a folder.

I have used InvokeAI, but the results are not what I am expecting.

input

a.jpg

command

!mask "C:\Users\King\Downloads\a.jpg" -tm "clothes" 0.99

outputs

(attached result screenshots)

my expected output

(attached screenshot)

cv2.error: OpenCV(4.9.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\resize.cpp:3789: error: (-215:Assertion failed) !dsize.empty() in function 'cv::hal::resize'

!!! Exception during processing !!!
Traceback (most recent call last):
File "D:\Blender_ComfyUI\ComfyUI\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Blender_ComfyUI\ComfyUI\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Blender_ComfyUI\ComfyUI\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Blender_ComfyUI\ComfyUI\custom_nodes\clipseg.py", line 154, in segment_image
heatmap_resized = resize_image(heatmap, dimensions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Blender_ComfyUI\ComfyUI\custom_nodes\clipseg.py", line 44, in resize_image
return cv2.resize(image, dimensions, interpolation=cv2.INTER_LINEAR)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Adding CLIPSeg to HuggingFace Transformers 🤗

Hi,

Thanks for this awesome work. As I really liked the approach of adapting CLIP for zero and one-shot image segmentation, I implemented your model as a branch of 🤗 Transformers.

The model is soon going to be added to the main library (see huggingface/transformers#20066). Here's a Colab notebook to showcase usage: https://colab.research.google.com/drive/1ijnW67ac6bMnda4D_XkdUbZfilyVZDOh?usp=sharing.

Would you like to create an organization on the hub, under which the checkpoints can be hosted?

Currently I host them under my own username: https://huggingface.co/models?other=clipseg.

Thanks!

Niels,
ML Engineer @ HF
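
For readers landing here, a minimal sketch of the Transformers usage, assuming the CIDAS/clipseg-rd64-refined checkpoint mentioned above; example_image.jpg is a placeholder:

import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("example_image.jpg")
prompts = ["a glass", "something to fill", "wood", "a jar"]

# one image copy per text prompt, as in the model card example
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

masks = torch.sigmoid(outputs.logits)   # (len(prompts), 352, 352) probability maps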

New Dataset

Given a dataset of images, how does one create a dataset to train a model on?
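
The exact sample format expected by the training code is defined by the repository's own dataset wrappers (e.g. the PhraseCut classes listed above). Purely as an illustration of the general idea, a hypothetical custom dataset would pair each image with a mask and a text prompt:

import json
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class MySegmentationDataset(Dataset):
    """Hypothetical dataset: one JSON entry per sample with image path, mask path and text prompt.
    The return format here is illustrative only and may differ from what training.py expects."""
    def __init__(self, annotation_file, image_size=352):
        self.samples = json.load(open(annotation_file))
        self.to_tensor = transforms.Compose([
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        s = self.samples[i]
        image = self.to_tensor(Image.open(s['image']).convert('RGB'))   # normalization omitted for brevity
        mask = self.to_tensor(Image.open(s['mask']).convert('L'))
        return image, s['prompt'], mask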

Import error in pfe_dataset.py

In pfe_dataset.py, line 5:
from datasets.lvis_oneshot3 import blend_image_segmentation
I cannot find the lvis_oneshot3.py file. Thanks!
Besides, how can I run the zero-shot experiment?
