pix2gestalt: Amodal Segmentation by Synthesizing Wholes

CVPR 2024 (Highlight)

Ege Ozguroglu¹, Ruoshi Liu¹, Dídac Surís¹, Dian Chen², Achal Dave², Pavel Tokmakov², Carl Vondrick¹
¹Columbia University, ²Toyota Research Institute

Updates

  • We have released our training script, dataset, and Gradio demo with inference instructions.
  • Custom training & fine-tuning instructions coming soon. Beyond amodal perception, our repository can also be used to fine-tune Stable Diffusion in an image-conditioned manner with spatial prompts, such as binary masks.
  • Pretrained models are released on Hugging Face; see Inference and Weights for details.
  • pix2gestalt was accepted to CVPR 2024 and is available on arXiv!

Installation

conda create -n pix2gestalt python=3.9
conda activate pix2gestalt
cd pix2gestalt
pip install -r requirements.txt
git clone https://github.com/CompVis/taming-transformers.git
pip install -e taming-transformers/
git clone https://github.com/openai/CLIP.git
pip install -e CLIP/

Note: We tested the installation process on Ubuntu 20.04 with NVIDIA GPUs based on the Ampere architecture.
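
After installation, a quick import check can catch a missing dependency before launching anything heavier. The sketch below uses only the standard library. The module names `torch`, `taming`, and `clip` are assumptions about what the steps above install (they are not read from requirements.txt), so adjust the list to your environment.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names; not taken from the repository's requirements.txt.
    expected = ["torch", "taming", "clip"]
    missing = missing_modules(expected)
    print("All modules found." if not missing else f"Missing: {missing}")
```

An empty result only means the modules are discoverable; it does not verify versions or GPU support.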

Inference and Weights

First, download the pix2gestalt weights into pix2gestalt/ckpt from one of the following sources:

https://huggingface.co/cvlab/pix2gestalt-weights/tree/main

wget -c -P ./ckpt https://gestalt.cs.columbia.edu/assets/epoch=000005.ckpt

Note that we have released two checkpoints: epoch=000005.ckpt and epoch=000010.ckpt. The default, epoch=000005.ckpt, is the checkpoint after fine-tuning for 5 epochs on our dataset; epoch=000010.ckpt was trained for 10 epochs. The 10-epoch checkpoint can be preferable for synthetic occlusion settings (given how our dataset is constructed), though it may generalize less well in zero-shot settings than the default model.

Download SAM checkpoints:

wget -c -P ./ckpt https://gestalt.cs.columbia.edu/assets/sam_vit_{b,h,l}.pth
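
The `{b,h,l}` in the command above relies on bash brace expansion; shells without it pass the braces through literally. As a shell-independent alternative, the following sketch expands the three URLs explicitly in Python. The function names and the default `./ckpt` destination are illustrative choices; only the base URL and file names come from the command above.

```python
import os
import urllib.request

BASE_URL = "https://gestalt.cs.columbia.edu/assets"


def sam_checkpoint_urls(variants=("b", "h", "l")):
    """Expand sam_vit_{b,h,l}.pth into one URL per SAM variant."""
    return [f"{BASE_URL}/sam_vit_{v}.pth" for v in variants]


def download_all(urls, dest_dir="./ckpt"):
    """Download each URL into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)


if __name__ == "__main__":
    # Print the expanded URLs; call download_all(...) to actually fetch them.
    for url in sam_checkpoint_urls():
        print(url)
```

Calling `download_all(sam_checkpoint_urls())` fetches all three files. Unlike `wget -c`, this sketch does not resume partial downloads, so delete any incomplete file before retrying.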

Run our Gradio demo for amodal completion and segmentation:

python app.py

Note that this app uses 22-28 GB of VRAM, so it may not run on every GPU.
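
To check in advance whether a GPU is likely large enough, one option is to query total memory through `nvidia-smi`. This is a minimal sketch, not part of the repository: the helper names are hypothetical, the 28 threshold is taken from the upper end of the range above, and treating that figure as GiB is an assumption.

```python
import subprocess


def parse_memory_mib(smi_output):
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_with_enough_vram(totals_mib, required_gib=28):
    """Return indices of GPUs whose total memory meets the requirement."""
    return [i for i, mib in enumerate(totals_mib) if mib >= required_gib * 1024]


def query_total_memory_mib():
    """Ask nvidia-smi for each GPU's total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_mib(result.stdout)
```

On a machine with NVIDIA drivers installed, `gpus_with_enough_vram(query_total_memory_mib())` lists the candidate device indices. Total memory is an upper bound; other processes on the same GPU reduce what is actually free.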

For inference without the Gradio demo, we provide standalone functionality for each component, encapsulated by the run_pix2gestalt method. It accepts either predicted modal masks from SAM (as in our demo) or ground-truth modal masks.

Training

Download the image-conditioned Stable Diffusion checkpoint released by Lambda Labs:

wget -c -P ./ckpt https://gestalt.cs.columbia.edu/assets/sd-image-conditioned-v2.ckpt

Then, download our fine-tuning dataset by following the instructions in the Dataset section and update its path (see data:params:root_dir) in our config.
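
The notation `data:params:root_dir` reads as a path of nested keys in the YAML config. The sketch below illustrates that nesting on a plain dictionary. The helper name, the placeholder paths, and the dictionary contents are hypothetical; the real config file contains other keys and is normally edited directly.

```python
def set_by_path(config, path, value, sep=":"):
    """Set a value in a nested dict using a separator-delimited key path."""
    *parents, leaf = path.split(sep)
    node = config
    for key in parents:
        # Create intermediate levels if they do not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


# Hypothetical stand-in for the parsed YAML config.
config = {"data": {"params": {"root_dir": "/path/to/old/location"}}}
set_by_path(config, "data:params:root_dir", "/path/to/dataset")
print(config["data"]["params"]["root_dir"])
```

In the YAML file itself this corresponds to a `root_dir` entry nested under `params`, which is nested under `data`.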

Run the training command:

python main.py \
    -t \
    --base configs/sd-finetune-pix2gestalt-c_concat-256.yaml \
    --gpus 0,1,2,3,4,5,6,7 \
    --scale_lr False \
    --num_nodes 1 \
    --seed 42 \
    --check_val_every_n_epoch 2 \
    --finetune_from ckpt/sd-image-conditioned-v2.ckpt

Note that this training script is configured for an 8-GPU system with 80 GB of VRAM per GPU. Empirically, a large batch size is very important for "stably" fine-tuning Stable Diffusion in an image-conditioned manner. If you have smaller GPUs, consider using smaller batch sizes with gradient accumulation to obtain a similar effective batch size.
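
The effective batch size per optimizer update is the per-GPU batch size times the number of GPUs times the number of accumulation steps. The arithmetic can be sketched as follows. All numbers here are hypothetical (the batch size in the released config is not stated in this README), and whether the config exposes PyTorch Lightning's `accumulate_grad_batches` trainer option is an assumption to verify.

```python
import math


def effective_batch(per_gpu_batch, num_gpus, steps):
    """Number of samples contributing to each optimizer update."""
    return per_gpu_batch * num_gpus * steps


def accumulation_steps(target_effective_batch, per_gpu_batch, num_gpus):
    """Smallest number of accumulation steps that reaches the target batch size."""
    per_step = per_gpu_batch * num_gpus
    return math.ceil(target_effective_batch / per_step)


# Hypothetical reference run: 8 GPUs x 24 samples per GPU, no accumulation.
target = effective_batch(per_gpu_batch=24, num_gpus=8, steps=1)
# Hypothetical smaller machine: 2 GPUs that each fit 12 samples.
steps = accumulation_steps(target, per_gpu_batch=12, num_gpus=2)
print(steps, effective_batch(12, 2, steps))
```

When the target is not an exact multiple of the per-step batch, the helper rounds up, so the effective batch size ends up slightly larger than the target rather than smaller.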

Dataset

Download and extract our dataset of occluded objects & their whole counterparts with:

wget https://gestalt.cs.columbia.edu/assets/pix2gestalt_occlusions_release.tar.gz

tar -xvf pix2gestalt_occlusions_release.tar.gz
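
On systems without `tar`, the archive can also be unpacked with Python's standard library. This is a minimal sketch; the function name is hypothetical, and it makes no assumption about the archive's internal layout beyond reporting the top-level entries it finds.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive and return its sorted top-level entry names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Use the safer "data" filter on Python versions that support it.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    top_level = {os.path.normpath(name).split(os.sep)[0] for name in names}
    top_level.discard(".")
    return sorted(top_level)
```

For example, `extract_archive("pix2gestalt_occlusions_release.tar.gz")` unpacks into the current directory and returns the names of the extracted top-level entries, which helps when setting the dataset path in the training config.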

Disclaimer: The source images are from the Segment Anything-1B (SA-1B) dataset, in which faces and license plates are de-identified. For amodal perception targeting such domains specifically, we recommend re-training or fine-tuning pix2gestalt via our custom training instructions.

The dataset is intended for research purposes only. The source images are released under the same license as in SA-1B.

Amodal Recognition and 3D Reconstruction

Since we synthesize RGB images of whole objects (amodal completion), our approach makes it straightforward to equip various computer vision methods with the ability to handle occlusions, beyond amodal segmentation.

For recognition, we use CLIP as the base open-vocabulary classifier. For novel view synthesis and 3D reconstruction, we use SyncDreamer. Refer to our paper and supplementary material for more details.

Citation

If you use this code, please consider citing the paper as:

@article{ozguroglu2024pix2gestalt,
        title={pix2gestalt: Amodal Segmentation by Synthesizing Wholes},
        author={Ege Ozguroglu and Ruoshi Liu and D{\'i}dac Sur{\'i}s and Dian Chen and Achal Dave and Pavel Tokmakov and Carl Vondrick},
        journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        year={2024}
}

Acknowledgement

This research is based on work partially supported by the Toyota Research Institute, the DARPA MCS program under Federal Agreement No. N660011924032, the NSF NRI Award #1925157, and the NSF AI Institute for Artificial and Natural Intelligence Award #2229929. DS is supported by the Microsoft PhD Fellowship.

pix2gestalt's Issues

Demo or Inference code

May I ask how we can run the demo or do inference on a custom image (that is, our own image)?

Also, I would like to ask what the computational burden is for running inference with the model on a GPU.

When will the code be released?

Dear author.

Thank you for your awesome work, and congratulations on your work being accepted by CVPR 2024.

I wonder when the entire code will be released. I am very interested in your work and hope to replicate it.

Thank you!

Error when running the Gradio demo for amodal completion and segmentation

When I run the command python app.py, I get the following error:

Loading model from ./ckpt/epoch=000005.ckpt
Global Step: 3200
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.54 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py:101: UserWarning: Failed to download model from HuggingFaceCompatibleDownloader downloader. Trying to download from HuggingFaceCompatibleDownloader downloader.
  warnings.warn(
/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py:107: UserWarning: Failed to download model from HuggingFaceCompatibleDownloader downloader. No fallback downloader available.
  warnings.warn(
Traceback (most recent call last):
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f18e4694070>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Carve/tracer_b7/resolve/d8a8fd9e7b3fa0d2f1506fe7242966b34381e9c5/tracer_b7.pth (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f18e4694070>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py", line 170, in download_model_base
    r = requests.get(hugging_face_url, stream=True)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/sessions.py", line 725, in send
    history = [resp for resp in gen]
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/sessions.py", line 725, in <listcomp>
    history = [resp for resp in gen]
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/sessions.py", line 266, in resolve_redirects
    resp = self.send(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Carve/tracer_b7/resolve/d8a8fd9e7b3fa0d2f1506fe7242966b34381e9c5/tracer_b7.pth (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f18e4694070>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py", line 98, in download_model
    return self.download_model_base(file_name)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py", line 190, in download_model_base
    raise ConnectionError(
ConnectionError: Exception caught when downloading model! Model name: tracer_b7.pth. Exception: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Carve/tracer_b7/resolve/d8a8fd9e7b3fa0d2f1506fe7242966b34381e9c5/tracer_b7.pth (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f18e4694070>: Failed to establish a new connection: [Errno 101] Network is unreachable')).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f18f0108c40>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Carve/tracer_b7/resolve/d8a8fd9e7b3fa0d2f1506fe7242966b34381e9c5/tracer_b7.pth (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f18f0108c40>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py", line 170, in download_model_base
    r = requests.get(hugging_face_url, stream=True)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Carve/tracer_b7/resolve/d8a8fd9e7b3fa0d2f1506fe7242966b34381e9c5/tracer_b7.pth (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f18f0108c40>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/chenhx/software/pix2gestalt/pix2gestalt/app.py", line 307, in <module>
    fire.Fire(run_demo)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/chenhx/software/pix2gestalt/pix2gestalt/app.py", line 208, in run_demo
    interface = create_carvekit_interface()
  File "/home/chenhx/software/pix2gestalt/pix2gestalt/ldm/util.py", line 48, in create_carvekit_interface
    interface = HiInterface(object_type="object",
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/api/high.py", line 56, in __init__
    self.u2net = TracerUniversalB7(
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/ml/wrap/tracer_b7.py", line 50, in __init__
    model_path = tracer_b7_pretrained()
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/ml/files/models_loc.py", line 53, in tracer_b7_pretrained
    return downloader("tracer_b7.pth")
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py", line 118, in __call__
    return self.download_model(file_name)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py", line 105, in download_model
    return self.fallback_downloader.download_model(file_name)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py", line 111, in download_model
    raise e
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py", line 98, in download_model
    return self.download_model_base(file_name)
  File "/home/chenhx/miniconda3/envs/pix2gestalt/lib/python3.9/site-packages/carvekit/utils/download_models.py", line 190, in download_model_base
    raise ConnectionError(
ConnectionError: Exception caught when downloading model! Model name: tracer_b7.pth. Exception: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Carve/tracer_b7/resolve/d8a8fd9e7b3fa0d2f1506fe7242966b34381e9c5/tracer_b7.pth (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f18f0108c40>: Failed to establish a new connection: [Errno 101] Network is unreachable')).

Question of the app.py

Hi, thanks for your great work.
My question is: in your paper you add a text prompt (such as bench, person, etc.), but in app.py I cannot find where the text prompt is given. The only prompt is the click prompt. Did I misunderstand your paper?

Thank you.

Want to download the source code

We are doing agricultural research. Crops fold over one another more as they grow in the field, so we need algorithms that can segment the shaded (occluded) parts. I noticed your project and would like to try the source code to see whether it can be applied to our agricultural research.
