chongzhou96 / edgesam Goto Github PK


Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"

Home Page: https://mmlab-ntu.com/project/edgesam/

License: Other

Jupyter Notebook 98.28% Python 1.72%

on-device-ai segment-anything coreml

edgesam's Introduction

EdgeSAM

Prompt-In-the-Loop Distillation for On-Device Deployment of SAM

Chong Zhou¹, Xiangtai Li¹, Chen Change Loy¹*, Bo Dai²

(*corresponding author)

¹S-Lab, Nanyang Technological University, ²Shanghai Artificial Intelligence Laboratory

[Paper] [Project Page] [Hugging Face Demo] [iOS App (TBA)]

teaser.mp4

Watch the full live demo video: [YouTube] [Bilibili]

Updates

2024/01/01: EdgeSAM is integrated into X-AnyLabeling.
2023/12/19: EdgeSAM is now supported in ISAT, a segmentation labeling tool.
2023/12/16: EdgeSAM is now supported in Grounded-Segment-Anything. Check out the grounded-edge-sam demo. Thanks to the IDEA Research team!
2023/12/14: autodistill-grounded-edgesam combines Grounding DINO and EdgeSAM to create Grounded EdgeSAM [blog]. Thanks to the Roboflow team!
2023/12/13: Add ONNX export and speed up the web demo with ONNX as the backend.

Overview

EdgeSAM is an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance. It achieves a 40-fold speed increase compared to the original SAM, and outperforms MobileSAM, being 14 times as fast when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3 and 3.2 respectively. EdgeSAM is also the first SAM variant that can run at over 30 FPS on an iPhone 14.

In this figure, we show the encoder throughput of EdgeSAM compared with SAM and MobileSAM as well as the mIoU performance on the SA-1K dataset (sampled from SA-1B) with box and point prompts.

Approach

Our approach involves distilling the original ViT-based SAM image encoder into a purely CNN-based architecture, better suited for edge devices. We carefully benchmark various distillation strategies and demonstrate that task-agnostic encoder distillation fails to capture the full knowledge embodied in SAM. To overcome this bottleneck, we include both the prompt encoder and mask decoder in the distillation process, with box and point prompts in the loop, so that the distilled model can accurately capture the intricate dynamics between user input and mask generation.

Performance

Method	Train Set	COCO AP	COCO AP_s	COCO AP_m	COCO AP_l	GFLOPs	Params (M)	FPS iPhone 14	FPS 2080 Ti	FPS 3090
SAM	SA-1B	46.1	33.6	51.9	57.7	2734.8	641.1	-	4.3	-
FastSAM	2% SA-1B	37.9	23.9	43.4	50.0	887.6	68.2	-	-	25.0*
MobileSAM	1% SA-1B	39.4	26.9	44.4	52.2	38.2	9.8	4.9	103.5	100.0*
EdgeSAM	1% SA-1B	42.2	29.6	47.6	53.9	22.1	9.6	38.7	164.3	-
EdgeSAM-3x	3% SA-1B	42.7	30.0	48.6	54.5	22.1	9.6	38.7	164.3	-
EdgeSAM-10x	10% SA-1B	43.0	30.3	48.9	55.1	22.1	9.6	38.7	164.3	-

In this table, we report the mask mAP on the COCO dataset. ViTDet-H is used as the detector, whose box mAP is 58.7, to provide box prompts. For speed benchmarking, we infer both the encoder and decoder (with a single prompt). FLOPs are calculated based on the 1024x1024 input resolution. Numbers denoted by * are copied from MobileSAM. 3x and 10x represent training with more data. Here, we do not apply an additional mask refinement iteration per the setting of the original SAM paper.
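As a quick sanity check on the table, the end-to-end speed ratios can be computed directly from the FPS columns. The snippet below is only arithmetic on the reported numbers; the dictionary and function names are illustrative and not part of the repository. Because these figures cover the encoder and decoder together with a single prompt, they are not directly comparable to throughput measured under other settings.

```python
# FPS values copied from the performance table above
# (encoder + decoder, single prompt).
fps = {
    "SAM": {"2080 Ti": 4.3},
    "MobileSAM": {"iPhone 14": 4.9, "2080 Ti": 103.5},
    "EdgeSAM": {"iPhone 14": 38.7, "2080 Ti": 164.3},
}


def speedup(fast: str, slow: str, device: str) -> float:
    """Return how many times faster `fast` runs than `slow` on `device`."""
    return fps[fast][device] / fps[slow][device]


def latency_ms(model: str, device: str) -> float:
    """Convert frames per second into milliseconds per frame."""
    return 1000.0 / fps[model][device]


if __name__ == "__main__":
    print(f"EdgeSAM vs SAM (2080 Ti): {speedup('EdgeSAM', 'SAM', '2080 Ti'):.1f}x")
    print(f"EdgeSAM vs MobileSAM (iPhone 14): "
          f"{speedup('EdgeSAM', 'MobileSAM', 'iPhone 14'):.1f}x")
    print(f"EdgeSAM latency (iPhone 14): {latency_ms('EdgeSAM', 'iPhone 14'):.1f} ms")
```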

Table of Contents

Installation
Usage
Web Demo
CoreML / ONNX Export
Checkpoints
iOS App
Acknowledgements
Citation
License

Installation

The code requires python>=3.8 and we use torch==2.0.0 and torchvision==0.15.1. Please refer to the official PyTorch installation instructions.

Clone the repository locally:

git clone https://github.com/chongzhou96/EdgeSAM.git && cd EdgeSAM

Install additional dependencies:

pip install -r requirements.txt

Install EdgeSAM:

pip install -e .

Usage

Download checkpoints (please refer to Checkpoints for more details about the PyTorch and CoreML checkpoints):

mkdir weights
wget -P weights/ https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam.pth
wget -P weights/ https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x.pth

You can easily incorporate EdgeSAM into your Python code with the following lines:

from edge_sam import SamPredictor, sam_model_registry
sam = sam_model_registry["edge_sam"](checkpoint="<path/to/checkpoint>")
predictor = SamPredictor(sam)
predictor.set_image(<your_image>)
masks, _, _ = predictor.predict(<input_prompts>)

Since EdgeSAM follows the same encoder-decoder architecture as SAM, their usages are very similar. One minor difference is that EdgeSAM allows outputting 1, 3, and 4 mask candidates for each prompt, while SAM yields either 1 or 3 masks. For more details, please refer to the example Jupyter Notebook.
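To make the template above concrete, here is a minimal sketch of a single-click prediction. It assumes that EdgeSAM's predict method accepts the same point_coords and point_labels keyword arguments as SAM's SamPredictor.predict, which the text suggests but does not spell out. The helper name segment_with_point and the default checkpoint path are illustrative only.

```python
def segment_with_point(image, x, y, checkpoint="weights/edge_sam.pth"):
    """Return mask candidates for a single positive click at pixel (x, y).

    `image` is assumed to be an HxWx3 uint8 RGB array, as SAM's predictor expects.
    """
    # Imported lazily so this helper can be defined without EdgeSAM installed.
    import numpy as np
    from edge_sam import SamPredictor, sam_model_registry

    sam = sam_model_registry["edge_sam"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image)

    # Assumption: keyword names and the (x, y) point order match SAM's predictor.
    point_coords = np.array([[x, y]], dtype=np.float32)
    point_labels = np.array([1], dtype=np.int32)  # 1 marks a positive point
    masks, _, _ = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
    )
    return masks
```

The example notebook mentioned above remains the authoritative reference for the exact arguments, including how to request 1, 3, or 4 mask candidates.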

Web Demo

After installing EdgeSAM and downloading the checkpoints, you can start an interactive web demo with the following command:

python web_demo/gradio_app.py

By default, the demo is hosted on http://0.0.0.0:8080/ and expects edge_sam_3x.pth to be stored in the weights/ folder. You can change the default behavior by:

python web_demo/gradio_app.py --checkpoint [CHECKPOINT] --server-name [SERVER_NAME] --port [PORT]

Since EdgeSAM can run smoothly on a mobile phone, it's fine if you don't have a GPU.

We've deployed the same web demo in the Hugging Face Space [link]. ~~However, since it uses the CPU as the backend and is shared by all users, the experience might not be as good as a local deployment.~~ We really appreciate the Hugging Face team for supporting us with the GPU!

Speed up the web demo with ONNX backend

Install ONNX Runtime with pip install onnxruntime if your machine doesn't have a GPU, or with pip install onnxruntime-gpu if it does (don't install both). Our implementation is tested with version 1.16.3.
Download the ONNX models to the weights/ folder:

wget -P weights/ https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x_encoder.onnx
wget -P weights/ https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x_decoder.onnx

Start the demo:

python web_demo/gradio_app.py --enable-onnx

Navigate to http://0.0.0.0:8080 in your browser.

CoreML / ONNX Export

CoreML

We provide a script that can export a trained EdgeSAM PyTorch model to two CoreML model packages, one for the encoder and another for the decoder. You can also download the exported CoreML models at Checkpoints.

For encoder:

python scripts/export_coreml_model.py [CHECKPOINT]

For decoder:

python scripts/export_coreml_model.py [CHECKPOINT] --decoder --use-stability-score

Since EdgeSAM doesn't perform knowledge distillation on the IoU token of the original SAM, its IoU predictions might not be reliable. Therefore, we use the stability score for mask selection instead. You can stick to the IoU predictions by removing --use-stability-score.
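For readers unfamiliar with the stability score, the sketch below follows the definition used in the original SAM code: the IoU between the mask obtained with a raised threshold and the mask obtained with a lowered threshold. It is written in plain Python over nested lists for illustration. The default threshold of 0.0 and offset of 1.0 are SAM's defaults and are assumed, not confirmed, to apply to EdgeSAM's export script. The function names are hypothetical.

```python
from typing import List, Sequence


def stability_score(
    logits: Sequence[Sequence[float]],
    mask_threshold: float = 0.0,
    threshold_offset: float = 1.0,
) -> float:
    """IoU between the mask thresholded high and the mask thresholded low.

    The high-threshold mask is always contained in the low-threshold mask,
    so the IoU reduces to a ratio of pixel counts.
    """
    high = mask_threshold + threshold_offset
    low = mask_threshold - threshold_offset
    intersection = sum(1 for row in logits for value in row if value > high)
    union = sum(1 for row in logits for value in row if value > low)
    return intersection / union if union else 0.0


def select_most_stable(candidates: List[Sequence[Sequence[float]]]) -> int:
    """Return the index of the candidate mask with the highest stability score."""
    scores = [stability_score(candidate) for candidate in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

A mask whose logits sit far from the threshold keeps the same shape under both cutoffs and scores close to 1, while a mask with many borderline pixels scores lower.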

The following shows the performance reports of the EdgeSAM CoreML models measured by Xcode on an iPhone 14 (left: encoder, right: decoder):

Known issues and model descriptions

As of coremltools==7.1, you may encounter an assertion error during the export, e.g., assert len(inputs) <= 3 or inputs[3] is None. One workaround is to comment out this assertion following the traceback path, e.g., /opt/anaconda3/envs/EdgeSAM/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py line 1573.

Since CoreML doesn't support interpolation with dynamic target sizes, the converted CoreML models do not contain the pre-processing, i.e., resize-norm-pad, and the post-processing, i.e., resize back to the original size.

The encoder takes a 1x3x1024x1024 image as the input and outputs a 1x256x64x64 image embedding. The decoder then takes the image embedding together with point coordinates and point labels as the input. The point coordinates follow the (height, width) format, with the top-left corner as (0, 0). The choices of point labels are 0: negative point, 1: positive point, 2: top-left corner of box, and 3: bottom-right corner of box.
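Since the converted models leave pre-processing to the caller, the sketch below shows the geometry involved and how prompts could be encoded with the label scheme described above. It assumes EdgeSAM follows SAM's convention of resizing the longest side to 1024 and padding at the bottom and right. Normalization is omitted because the constants are not stated here. All function names, and the choice to append box corners after click points, are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

TARGET = 1024  # side length of the encoder input
NEGATIVE, POSITIVE, BOX_TOP_LEFT, BOX_BOTTOM_RIGHT = 0, 1, 2, 3


def resized_shape(height: int, width: int, target: int = TARGET) -> Tuple[int, int]:
    """Scale so the longest side equals `target`, keeping the aspect ratio."""
    scale = target / max(height, width)
    return int(height * scale + 0.5), int(width * scale + 0.5)


def padding(height: int, width: int, target: int = TARGET) -> Tuple[int, int]:
    """Rows and columns of padding needed at the bottom and right."""
    new_h, new_w = resized_shape(height, width, target)
    return target - new_h, target - new_w


def to_model_coords(y: float, x: float, height: int, width: int,
                    target: int = TARGET) -> Tuple[float, float]:
    """Map a (height, width) point from the original image to the resized one."""
    new_h, new_w = resized_shape(height, width, target)
    return y * new_h / height, x * new_w / width


def encode_prompts(
    points: Sequence[Tuple[float, float, bool]] = (),
    box: Optional[Tuple[float, float, float, float]] = None,
) -> Tuple[List[Tuple[float, float]], List[int]]:
    """Flatten clicks and an optional box into coordinate and label lists.

    `points` holds (y, x, is_positive) tuples; `box` is (top, left, bottom, right).
    """
    coords: List[Tuple[float, float]] = []
    labels: List[int] = []
    for y, x, is_positive in points:
        coords.append((y, x))
        labels.append(POSITIVE if is_positive else NEGATIVE)
    if box is not None:
        top, left, bottom, right = box
        coords += [(top, left), (bottom, right)]
        labels += [BOX_TOP_LEFT, BOX_BOTTOM_RIGHT]
    return coords, labels
```

For a 540x720 image, for example, the resized shape is 768x1024 and 256 rows of padding are added at the bottom. Masks returned by the decoder would then need the inverse steps (crop the padding, resize back) to match the original image.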

ONNX

Similar to the CoreML export, you can use the following commands to export the encoder and the decoder to ONNX models respectively:

For encoder:

python scripts/export_onnx_model.py [CHECKPOINT]

For decoder:

python scripts/export_onnx_model.py [CHECKPOINT] --decoder --use-stability-score
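Once exported, the encoder can be smoke-tested with ONNX Runtime. The sketch below is a rough check rather than the repository's own inference code. It assumes the ONNX encoder takes the same 1x3x1024x1024 float32 input described for the CoreML model, and it queries the input name from the session instead of guessing it. The function name and default path are illustrative.

```python
def run_encoder(onnx_path="weights/edge_sam_3x_encoder.onnx"):
    """Run the exported encoder once on a blank image and return the output shape."""
    # Imported lazily so the helper can be defined without these packages installed.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # queried, not assumed

    # Assumption: same input layout as the CoreML encoder (1x3x1024x1024, float32).
    image = np.zeros((1, 3, 1024, 1024), dtype=np.float32)
    outputs = session.run(None, {input_name: image})
    return outputs[0].shape
```

If the assumption about the input layout holds, the returned shape should match the 1x256x64x64 embedding described for the CoreML encoder.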

Checkpoints

Please download the checkpoints of EdgeSAM from its Hugging Face Space (all the EdgeSAM variants only differ in the number of training images):

Model	COCO mAP	PyTorch	CoreML	ONNX
SAM	46.1	-	-	-
EdgeSAM	42.1	Download	[Encoder] [Decoder]	[Encoder] [Decoder]
EdgeSAM-3x	42.7	Download	[Encoder] [Decoder]	[Encoder] [Decoder]
EdgeSAM-10x	43.0	TBA	TBA	TBA

Note: You need to unzip the CoreML model packages before usage.

iOS App

We are planning to release the iOS app that we used in the live demo to the App Store. Please stay tuned!

Acknowledgements

This study is supported under the RIE2020 Industry Alignment Fund Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). We are grateful to Han Soong Chong for his effort in the demonstration application.

We appreciate the following projects, which enable EdgeSAM: SAM, MobileSAM, FastSAM, TinyViT, and RepViT.

Citation

@article{zhou2023edgesam,
  title={EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM},
  author={Zhou, Chong and Li, Xiangtai and Loy, Chen Change and Dai, Bo},
  journal={arXiv preprint arXiv:2312.06660},
  year={2023}
}

License

This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.

edgesam's Issues

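TensorRT, RKNN, and Atlas inference

Thanks to EdgeSAM for providing a SAM solution for edge and mobile devices. I would like to deploy it and run inference with TensorRT, RKNN, and Atlas. Are there any reference materials for this?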

Is it possible to export an onnx model with input size 1024*720?

As the title asks: is this possible, and if so, how?

An MNN deployment of EdgeSAM that may help

https://github.com/slz929/EdgeSAM-MNN

whole image

Hello, how can I predict the whole picture without using a mouse click to determine the segmentation coordinate point?

Request for NCNN Integration in EdgeSAM Project

Hello EdgeSAM Team,

I hope this message finds you well. I am reaching out to request the integration of NCNN into the EdgeSAM project. NCNN, known for its high performance in neural network inference on mobile devices, seems like a perfect fit for EdgeSAM's objectives in edge computing scenarios.

The inclusion of NCNN would provide several benefits, including:

Enhanced Performance: Leveraging NCNN's optimized routines could improve the efficiency and speed of EdgeSAM's computations, especially on devices with limited resources.
Broader Compatibility: With NCNN's support for multiple platforms, EdgeSAM could extend its usability across a wider range of mobile and embedded devices.
Community Engagement: Integrating popular and widely-used frameworks like NCNN can attract more contributors and users to the EdgeSAM project, fostering a larger community and encouraging collaborative development.

I believe that the synergy between EdgeSAM and NCNN could lead to significant advancements in edge computing applications. I am looking forward to your thoughts on this suggestion and hope to see a collaboration that benefits both projects and their respective communities.

Thank you for considering my request.

Best regards,
Umit

torch.jit support

Does your model not support torch.jit? I encounter an error when trying to convert the model with the following code. If there is a solution, please let me know.

import copy

import torch
from edge_sam import SamPredictor, sam_model_registry

sam_checkpoint = "./weights/edge_sam.pth"
model_type = "edge_sam"

# Load the model, then try to script a deep copy of it with TorchScript
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam_jit = torch.jit.script(copy.deepcopy(sam))

How would you segment everything in an image and output masks

I don't want to set points or boxes manually; I want to segment all the objects in the image directly.

Segment Anything CPP Wrapper for macOS

Thanks for your great work.
Our C++ code now supports your EdgeSAM models.

Here is the Segment Anything CPP Wrapper for macOS.

We would like to ask about your license.
We are going to implement the EdgeSAM models on RectLabel macOS apps.
Our total revenue is about $1K/month from RectLabel apps on Mac App Store.
Can we implement the EdgeSAM models on our apps?
Please let us know your feedback.

Unable to download the model?

wget -P weights/ https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x_encoder.onnx
This URL cannot be reached. Where else can I download the model files?

Decoder CoreML running time problem

When running the decoder with the same number of points time after time, the running time may approach the number reported by the performance analysis of Xcode. However, this is not the usual use case of this model.
Usually, one would select points iteratively, such that the size of the model input constantly changes (first 256x64x64, 1x1x2, 1x1, then 256x64x64, 1x2x2, 1x2, and so on).
Every time the model is used with a different size, some internal CoreML state is discarded, and the running time is that of a first-run (which is ~10x slower!).

If the model can be designed such that it is constantly run with the same number of points (16), with some of the points being ignored, perhaps it could help resolve this issue (but I really have no idea if that's possible).
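The padding idea could look roughly like the sketch below, which pads every prompt to a constant length so the decoder always sees the same input shape. It borrows the label -1 that SAM's prompt encoder treats as "not a point"; whether the exported EdgeSAM CoreML decoder honors that label is an assumption that has not been verified here. The names pad_prompts, PAD_LABEL, and MAX_POINTS are hypothetical.

```python
from typing import List, Sequence, Tuple

PAD_LABEL = -1   # assumption: ignored by the decoder, as in SAM's prompt encoder
MAX_POINTS = 16  # the fixed size suggested in this issue


def pad_prompts(
    coords: Sequence[Tuple[float, float]],
    labels: Sequence[int],
    max_points: int = MAX_POINTS,
) -> Tuple[List[Tuple[float, float]], List[int]]:
    """Pad coordinate and label lists to a constant length."""
    if len(coords) != len(labels):
        raise ValueError("coords and labels must have the same length")
    if len(coords) > max_points:
        raise ValueError(f"at most {max_points} points are supported")
    missing = max_points - len(coords)
    padded_coords = list(coords) + [(0.0, 0.0)] * missing
    padded_labels = list(labels) + [PAD_LABEL] * missing
    return padded_coords, padded_labels
```

With inputs shaped this way, every call would use the same 1x16x2 and 1x16 prompt tensors, which is the condition under which the issue reports the fast steady-state running time.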

training code

Hello! I'm interested in following your work. May I kindly inquire whether the code you use for training will be made open source? Thank you.

could you provide the train code?

Could you provide the training code? Thank you very much.

License

Hi, thanks for your work. I would like to ask whether the license also applies to the model itself (that is, the weight values) or only to the code. From the license file, this is not clear to me with respect to commercial use.

Thanks

Training Code

Will you release it?

Grounded-Edge-SAM demo support

Hello! Thanks a lot for your great work! We now support the grounded-edge-sam demo in Grounded-Segment-Anything!

Segmentation labeling tool ISAT has supported EdgeSAM.

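About the RPN Module

Hello, this is excellent work!
However, the RPN module mentioned in the paper does not seem to be implemented in the code, and its weights have not been released either. What was the reasoning behind this? Looking forward to your reply.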

Finetuning

I'd like to finetune EdgeSAM for a specific task.
Is there code already available?
Do you have experience in finetuning EdgeSAM?

Query on ONNX Encoder Inference Time: Samsung A20 with Android 11

The performance demonstrated in the paper for the iPhone 14 is remarkable, and I'm currently attempting to evaluate the model on an entry-level Android phone.

In a basic implementation, the encoder_session.run() operation takes approximately 3,000-4,000 ms for 720x540 images on a Samsung A20 (using the CPU execution provider) as the official ONNX model files do not support NNAPI. NNAPI is the Neural Network API leveraging the GPU on Android.

I'm curious about the significant difference in encoder performance between the iPhone 14, achieving 70 FPS and 14 ms per image, and the Samsung A20, taking 3,000-4,000 ms per image.

Could you please provide some advice on how to address this performance gap?

[Environment]
My testing setup involves ONNX files from the official link: EdgeSAM Encoder and EdgeSAM Decoder.
The tests are conducted using ONNX Runtime version 1.16.3 on a Samsung A20 with Android 11 (CPU: Exynos 7884, GPU: Mali-G71 MP2).

I appreciate your assistance and thank you for your valuable contributions.

X-AnyLabeling-EdgeSAM demo support

Hi, @chongzhou96, thanks a lot for your outstanding work!

I've successfully integrated EdgeSAM into X-AnyLabeling, and your open-source contribution has been invaluable for achieving efficient execution on devices with limited computational resources.
