This repository is a fork of the original Grounded Segment Anything: From Objects to Parts. The only change in this fork is the integration of MobileSAM.
In this repo, we expand the Segment Anything Model (SAM) to support text prompt input. The text prompt can be object-level :full_moon: (e.g., dog) or part-level :last_quarter_moon: (e.g., dog head). Furthermore, we build a Visual ChatGPT-based dialogue system that flexibly calls various segmentation models when it receives instructions in natural language.
- 2023/04/14: Edit anything at more fine-grained part-level.
- 2023/04/11: Initial code release.
Part Prompt: "dog body"; Edit Prompt: "zebra"
Part Prompt: "cat head"; Edit Prompt: "tiger"
Part Prompt: "chair seat"; Edit Prompt: "cholocate"
Part Prompt: "person head"; Edit Prompt: "combover hairstyle"
Beyond class-agnostic mask segmentation, this repo contains:
- Grounded segment anything at both object level and part level.
- Interacting with models in the form of natural language.
These abilities come from a series of models, including:
Model | Function |
---|---|
Segment Anything | Segment anything from prompt |
GLIP | Grounded language-image pre-training |
Visual ChatGPT | Connects ChatGPT and segmentation foundation models |
VLPart | Going denser with open-vocabulary part segmentation |
Q: When will the VLPart paper be released?
A: The VLPart paper has been released.
Q: What is the difference between Grounded SAM and this project?
A: Grounded SAM is Grounding DINO + SAM, while this project is GLIP/VLPart + SAM. We believe any open-vocabulary (text-prompt) object detection model can be combined with SAM.
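To make this design concrete, here is a minimal, illustrative sketch of feeding boxes from a text-prompted detector into SAM as box prompts. `run_open_vocab_detector` is a hypothetical placeholder for a detector such as GLIP or VLPart (not a function in this repo), and the SAM checkpoint path is an assumption:

```python
import cv2
import torch
from segment_anything import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load SAM (any segment-anything checkpoint; vit_h is assumed here).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("assets/twodogs.jpeg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Hypothetical helper: any open-vocabulary detector (GLIP, VLPart, ...) that
# maps a text prompt to bounding boxes in XYXY format, shape (N, 4).
boxes = run_open_vocab_detector(image, text_prompt="dog head")

# Use each detected box as a prompt for SAM, one mask per box.
for box in boxes:
    masks, scores, _ = predictor.predict(box=box[None, :], multimask_output=False)
```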
See installation instructions.
python demo_part_edit.py
# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
python chatbot.py --load "ImageCaptioning_cuda:0, SegmentAnything_cuda:1, PartPromptSegmentAnything_cuda:1, ObjectPromptSegmentAnything_cuda:0"
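The `--load` argument assigns each tool to a CUDA device. If you only have one GPU, you can presumably place every tool on `cuda:0` instead; this is the same command as above with only the device suffixes changed, not a separately tested configuration:

```bash
python chatbot.py --load "ImageCaptioning_cuda:0, SegmentAnything_cuda:0, PartPromptSegmentAnything_cuda:0, ObjectPromptSegmentAnything_cuda:0"
```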
wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/swinbase_part_0a0000.pth
wget https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/weights/mobile_sam.pt
python demo_vlpart_sam.py --input_image assets/twodogs.jpeg --output_dir outputs_demo --text_prompt "dog head"
Result:
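Since this fork integrates MobileSAM, the `mobile_sam.pt` checkpoint downloaded above can also be used directly through the MobileSAM Python API. Below is a minimal sketch, assuming the `mobile_sam` package from the MobileSAM repository is installed; the demo script above may wire this up differently:

```python
import cv2
import numpy as np
import torch
from mobile_sam import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# MobileSAM registers its lightweight ViT-Tiny image encoder under "vit_t".
mobile_sam = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt")
mobile_sam.to(device=device)
mobile_sam.eval()

predictor = SamPredictor(mobile_sam)
image = cv2.cvtColor(cv2.imread("assets/twodogs.jpeg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground point prompt (example coordinates); box prompts work
# the same way as in the original SAM.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
```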
wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/glip_large.pth
python demo_glip_sam.py --input_image assets/demo2.jpeg --output_dir outputs_demo --text_prompt "frog"
Result:
For multiple prompts, separate each prompt with `.`; for example, `--text_prompt "dog head. dog nose"`.
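For example, the VLPart + SAM demo above should accept two part prompts at once (an untested variant of the command shown earlier):

```bash
python demo_vlpart_sam.py --input_image assets/twodogs.jpeg --output_dir outputs_demo --text_prompt "dog head. dog nose"
```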
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
A large part of the code is borrowed from segment-anything, EditAnything, CLIP, GLIP, Grounded-Segment-Anything, and Visual ChatGPT. Many thanks for their wonderful work.
If you find this project helpful for your research, please consider citing the following BibTeX entries.
@misc{segrec2023,
title = {Grounded Segment Anything: From Objects to Parts},
author = {Sun, Peize and Chen, Shoufa and Luo, Ping},
howpublished = {\url{https://github.com/Cheems-Seminar/grounded-segment-any-parts}},
year = {2023}
}
@article{vlpart2023,
title = {Going Denser with Open-Vocabulary Part Segmentation},
author = {Sun, Peize and Chen, Shoufa and Zhu, Chenchen and Xiao, Fanyi and Luo, Ping and Xie, Saining and Yan, Zhicheng},
journal = {arXiv preprint arXiv:2305.11173},
year = {2023}
}