
mobile_grounded-segment-any-parts's Introduction

This repository is a fork of the original Grounded Segment Anything: From Objects to Parts. The only change in this fork is the integration of MobileSAM.

Cheems Seminar

Grounded Segment Anything: From Objects to Parts

In this repo, we extend the Segment Anything Model (SAM) to support text prompt input. The text prompt can be object-level 🌕 (e.g., dog) or part-level 🌗 (e.g., dog head). Furthermore, we build a Visual ChatGPT-based dialogue system 🤖💬 that flexibly calls various segmentation models when it receives instructions in natural language.

News

  • 2023/04/14: Edit anything at more fine-grained part-level.
  • 2023/04/11: Initial code release.

🚀New🚀 Edit on Part-Level

  • Part Prompt: "dog body"; Edit Prompt: "zebra"
  • Part Prompt: "cat head"; Edit Prompt: "tiger"
  • Part Prompt: "chair seat"; Edit Prompt: "chocolate"
  • Part Prompt: "person head"; Edit Prompt: "combover hairstyle"

✨✨ Highlights ✨✨

Beyond class-agnostic mask segmentation, this repo contains:

  • Grounded segment anything at both object level and part level.
  • Interacting with models in the form of natural language.

These abilities come from a series of models, including:

Model            | Function
Segment Anything | Segment anything from prompt
GLIP             | Grounded language-image pre-training
Visual ChatGPT   | Connects ChatGPT and segmentation foundation models
⭐VLPart⭐        | Going denser with open-vocabulary part segmentation

FAQ

Q: When will the VLPart paper be released?

A: The VLPart paper has been released. 🚀🚀🚀

Q: What is the difference between Grounded SAM and this project?

A: Grounded SAM is Grounding DINO + SAM, whereas this project is GLIP/VLPart + SAM. We believe any open-vocabulary (text-prompted) object detection model can be combined with SAM.
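The combination described in this answer can be sketched as a two-stage pipeline: a text-prompted detector proposes boxes, and each box is handed to a promptable segmenter. This is a minimal sketch and not this repository's code. `Box`, `ground_and_segment`, `stub_detector`, and `stub_segmenter` are hypothetical names, and the stubs only stand in for a real detector (GLIP or VLPart) and a real segmenter (SAM or MobileSAM).

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration; the repository's own data structures may differ.
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels
Image = List[List[int]]           # toy grayscale image as nested lists
Mask = List[List[bool]]           # per-pixel foreground mask

Detector = Callable[[Image, str], List[Box]]   # stands in for GLIP / VLPart
Segmenter = Callable[[Image, Box], Mask]       # stands in for SAM / MobileSAM


def ground_and_segment(image: Image, text_prompt: str,
                       detect: Detector, segment: Segmenter):
    """Detect boxes for the text prompt, then segment inside each box."""
    boxes = detect(image, text_prompt)
    return [(box, segment(image, box)) for box in boxes]


def stub_detector(image: Image, text_prompt: str) -> List[Box]:
    # Placeholder: always returns one box covering the top-left 2x2 region.
    return [(0, 0, 2, 2)]


def stub_segmenter(image: Image, box: Box) -> Mask:
    # Placeholder: marks every pixel inside the box as foreground.
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(len(row))]
            for y, row in enumerate(image)]


toy_image = [[0] * 4 for _ in range(3)]   # 3 rows, 4 columns
results = ground_and_segment(toy_image, "dog head", stub_detector, stub_segmenter)
```

Because the detector and segmenter are passed in as plain callables, swapping one open-vocabulary detector for another leaves the rest of the pipeline unchanged, which is the point the answer makes.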

Usage

Install

See installation instructions.

Edit

python demo_part_edit.py

🤖💬 Integration with Visual ChatGPT

# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
python chatbot.py --load "ImageCaptioning_cuda:0, SegmentAnything_cuda:1, PartPromptSegmentAnything_cuda:1, ObjectPromptSegmentAnything_cuda:0"
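The `--load` value in the command above is a comma-separated list of `ModelName_device` pairs. The helper below is a minimal sketch of how such a string can be split into a model-to-device mapping, which can be handy for checking a configuration before launching. `parse_load_spec` is a hypothetical name, and `chatbot.py` may parse the argument differently.

```python
def parse_load_spec(spec: str) -> dict:
    """Split 'Name_device, Name_device, ...' into a {name: device} mapping."""
    mapping = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last underscore so the device part (e.g. 'cuda:0') stays intact.
        name, sep, device = entry.rpartition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected 'Name_device', got {entry!r}")
        mapping[name] = device
    return mapping


load_spec = ("ImageCaptioning_cuda:0, SegmentAnything_cuda:1, "
             "PartPromptSegmentAnything_cuda:1, ObjectPromptSegmentAnything_cuda:0")
devices = parse_load_spec(load_spec)
```

In this example the captioning and object-prompt models are placed on `cuda:0`, and the other two models are placed on `cuda:1`.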

🌗 Prompt Segment Anything at Part Level

wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/swinbase_part_0a0000.pth
wget https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/weights/mobile_sam.pt

python demo_vlpart_sam.py --input_image assets/twodogs.jpeg --output_dir outputs_demo --text_prompt "dog head"

Result:

🌕 Prompt Segment Anything at Object Level

wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/glip_large.pth

python demo_glip_sam.py --input_image assets/demo2.jpeg --output_dir outputs_demo --text_prompt "frog"

Result:

🍭 Multi-Prompt

For multiple prompts, separate each prompt with a period (.), for example, --text_prompt "dog head. dog nose"
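When the prompts come from a list, the period-separated value and the full command line can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this README (`--input_image`, `--output_dir`, `--text_prompt`). `join_prompts` and `build_demo_command` are hypothetical helper names, not part of the repository.

```python
import shlex
from typing import List


def join_prompts(prompts: List[str]) -> str:
    """Join individual prompts into one period-separated --text_prompt value."""
    cleaned = [p.strip(" .") for p in prompts]   # drop stray spaces and periods
    return ". ".join(p for p in cleaned if p)    # skip prompts that end up empty


def build_demo_command(script: str, input_image: str,
                       output_dir: str, prompts: List[str]) -> List[str]:
    """Assemble an argument list suitable for subprocess.run."""
    return ["python", script,
            "--input_image", input_image,
            "--output_dir", output_dir,
            "--text_prompt", join_prompts(prompts)]


command = build_demo_command("demo_vlpart_sam.py", "assets/twodogs.jpeg",
                             "outputs_demo", ["dog head", "dog nose"])
printable = shlex.join(command)   # shell-quoted form, for logging or copy-paste
```

Passing `command` to `subprocess.run` as a list avoids shell-quoting problems with prompts that contain spaces.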

Model Checkpoints

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Acknowledgement

A large part of the code is borrowed from segment-anything, EditAnything, CLIP, GLIP, Grounded-Segment-Anything, and Visual ChatGPT. Many thanks for their wonderful work.

Citation

If you find this project helpful for your research, please consider citing the following BibTeX entry.

@misc{segrec2023,
  title =        {Grounded Segment Anything: From Objects to Parts},
  author =       {Sun, Peize and Chen, Shoufa and Luo, Ping},
  howpublished = {\url{https://github.com/Cheems-Seminar/grounded-segment-any-parts}},
  year =         {2023}
}

@article{vlpart2023,
  title   =  {Going Denser with Open-Vocabulary Part Segmentation},
  author  =  {Sun, Peize and Chen, Shoufa and Zhu, Chenchen and Xiao, Fanyi and Luo, Ping and Xie, Saining and Yan, Zhicheng},
  journal =  {arXiv preprint arXiv:2305.11173},
  year    =  {2023}
}
