sam.cpp's Introduction

SAM.cpp

Inference of Meta's Segment Anything Model in pure C/C++

Quick start

git clone --recursive https://github.com/YavorGIvanov/sam.cpp
cd sam.cpp

Note: First download the model checkpoint (sam_vit_b_01ec64.pth) and place it in the checkpoints folder.

# Convert PTH model to ggml. Requires python3, torch and numpy
python convert-pth-to-ggml.py checkpoints/sam_vit_b_01ec64.pth . 1

# You need CMake and SDL2
# SDL2 is used for GUI windows and input: https://www.libsdl.org

[Ubuntu]
$ sudo apt install libsdl2-dev

[Mac OS with brew]
$ brew install sdl2

[MSYS2]
$ pacman -S git cmake make mingw-w64-x86_64-dlfcn mingw-w64-x86_64-gcc mingw-w64-x86_64-SDL2

# Build sam.cpp.
mkdir build && cd build
cmake .. && make -j4

# run inference
./bin/sam -t 16 -i ../img.jpg -m ../checkpoints/ggml-model-f16.bin

Note: The optimal value of the threads parameter ("-t") depends on the machine running the inference, so select it manually.
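
A reasonable starting point for "-t" is the number of hardware threads reported by the C++ standard library. The sketch below is only an illustration: `suggest_thread_count` is a hypothetical helper, not part of sam.cpp, and the fallback of 4 is an arbitrary assumption. The fastest setting may well be lower than the reported count, so it is still worth timing a few values.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper (not part of sam.cpp): suggest a starting value for -t.
// std::thread::hardware_concurrency() may return 0 when the count is unknown,
// in which case the caller-provided fallback is used instead.
int suggest_thread_count(int fallback = 4) {
    assert(fallback > 0);
    const unsigned int hw = std::thread::hardware_concurrency();
    return hw == 0 ? fallback : static_cast<int>(hw);
}
```

The returned value can be passed as the "-t" argument and then adjusted up or down while comparing the reported total time.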

Note: If you have problems with the Windows build, you can check this issue for more details

Downloading and converting the model checkpoints

You can download a model checkpoint and convert it to ggml format using the script convert-pth-to-ggml.py:

# Convert PTH model to ggml
python convert-pth-to-ggml.py sam_vit_b_01ec64.pth . 1

Example output on M2 Ultra

 $ ▶ make -j sam && time ./bin/sam -t 8 -i img.jpg
[ 28%] Built target common
[ 71%] Built target ggml
[100%] Built target sam
main: seed = 1693224265
main: loaded image 'img.jpg' (680 x 453)
sam_image_preprocess: scale = 0.664062
main: preprocessed image (1024 x 1024)
sam_model_load: loading model from 'models/sam-vit-b/ggml-model-f16.bin' - please wait ...
sam_model_load: n_enc_state      = 768
sam_model_load: n_enc_layer      = 12
sam_model_load: n_enc_head       = 12
sam_model_load: n_enc_out_chans  = 256
sam_model_load: n_pt_embd        = 4
sam_model_load: ftype            = 1
sam_model_load: qntvr            = 0
operator(): ggml ctx size = 202.32 MB
sam_model_load: ...................................... done
sam_model_load: model size =   185.05 MB / num tensors = 304
embd_img
dims: 64 64 256 1 f32
First & Last 10 elements:
-0.05117 -0.06408 -0.07154 -0.06991 -0.07212 -0.07690 -0.07508 -0.07281 -0.07383 -0.06779
0.01589 0.01775 0.02250 0.01675 0.01766 0.01661 0.01811 0.02051 0.02103 0.03382
sum:  12736.272313

Skipping mask 0 with iou 0.705935 below threshold 0.880000
Skipping mask 1 with iou 0.762136 below threshold 0.880000
Mask 2: iou = 0.947081, stability_score = 0.955437, bbox (371, 436), (144, 168)


main:     load time =    51.28 ms
main:    total time =  2047.49 ms

real	0m2.068s
user	0m16.343s
sys	0m0.214s
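
The `scale = 0.664062` line in the log above is consistent with dividing the longer side of the 680 x 453 input by the 1024-pixel model input size. A minimal sketch of that arithmetic, assuming this is how the preprocessing scale is derived (`preprocess_scale` is a hypothetical name, not a sam.cpp function):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: ratio of the longer image side to the model input size.
// The default target of 1024 is taken from the "preprocessed image (1024 x 1024)" log line.
float preprocess_scale(int width, int height, int target = 1024) {
    assert(width > 0 && height > 0 && target > 0);
    return static_cast<float>(std::max(width, height)) / static_cast<float>(target);
}
```

For the example image, `preprocess_scale(680, 453)` evaluates to 680 / 1024 = 0.6640625.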

Input point is (414.375, 162.796875) (currently hardcoded)

Input image:

llamas

Output mask (mask_out_2.png in build folder):

mask_glasses

Next steps

  • Reduce memory usage by utilizing the new ggml-alloc
  • Remove redundant graph nodes
  • Fix the difference in output masks compared to the PyTorch implementation
  • Filter masks based on stability score
  • Add support for point user input
  • Support bigger model checkpoints
  • Make inference faster
  • Support F16 for heavy F32 ops
  • Test quantization
  • Add support for mask and box input + #14
  • GPU support
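
As a rough sketch of the stability-score filtering item above: assuming the score is defined as in the reference PyTorch implementation (the IoU between the mask binarized at a slightly higher and a slightly lower cutoff), it could be computed and applied as follows. The function names, the flat `logits` layout, and all threshold values are assumptions made for illustration, not sam.cpp's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: IoU between a "high" and a "low" binarization of the mask logits.
// Because the high mask is a subset of the low mask, the intersection is the high-mask
// pixel count and the union is the low-mask pixel count.
float stability_score(const std::vector<float> & logits, float threshold, float offset) {
    assert(offset >= 0.0f);
    std::size_t intersection_count = 0; // pixels above threshold + offset
    std::size_t union_count = 0;        // pixels above threshold - offset
    for (float v : logits) {
        if (v > threshold + offset) ++intersection_count;
        if (v > threshold - offset) ++union_count;
    }
    if (union_count == 0) return 0.0f;  // empty mask: treat as unstable
    return static_cast<float>(intersection_count) / static_cast<float>(union_count);
}

// Hypothetical filter: keep a mask only if its score reaches the chosen minimum.
bool keep_mask(float score, float min_score) {
    return score >= min_score;
}
```

With a minimum of, say, 0.95, the mask from the example output (stability_score = 0.955437) would be kept.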

sam.cpp's Issues

windows version

Hi, thank you for providing your implementation. Really impressive work! I am wondering whether it can be built and deployed on a Windows machine using the Visual Studio compiler. Thanks

About unit test inference

Thank you for the great work!

I would like to ask whether there is a unit test that lets me run inference on a single image (just the function).
If not, can I contribute a PR?

Also, does sam.cpp support the vit_h model, or only the vit_b model?
Thank you very much.

error for cmake

-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Linux detected
-- Configuring done
CMake Error at examples/third-party/imgui/CMakeLists.txt:51 (add_library):
  Cannot find source file:

    imgui/backends/imgui_impl_sdl2.cpp

  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .m .M .mm .h .hh .h++ .hm
  .hpp .hxx .in .txx


CMake Error at examples/third-party/imgui/CMakeLists.txt:7 (add_library):
  Cannot find source file:

    imgui/imgui.cpp

  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .m .M .mm .h .hh .h++ .hm
  .hpp .hxx .in .txx


CMake Error at examples/third-party/imgui/CMakeLists.txt:51 (add_library):
  No SOURCES given to target: imgui-sdl2


CMake Error at examples/third-party/imgui/CMakeLists.txt:7 (add_library):
  No SOURCES given to target: imgui


CMake Generate step failed.  Build files cannot be regenerated correctly.

sam_ggml_model_load: sam_ggml_model_load: unknown tensor

Hi, dear author, I have a question about using ggml-model-f16.bin.
I built sam.cpp linked against ggml
and used the Python script to convert the sam_vit_b_01ec64.pth model to a ggml bin file,
but when I run the code, I get the error in the title.

Build Env: VS2015

Do you have any solutions?

Thank you very much.

Add support for images with 4 channels

Currently the load_image_from_file(..) function is naively written to support only 3 channels and returns an error otherwise. This is very limiting, because many images carry an additional alpha channel and we cannot test SAM with them.

static bool load_image_from_file(const std::string & fname, sam_image_u8 & img) {
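
One possible approach, sketched under the assumption that pixel data is stored interleaved as 8-bit RGBA, is to drop the alpha channel before handing the data to the existing 3-channel path. `rgba_to_rgb` is a hypothetical helper and does not exist in sam.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert interleaved RGBA bytes to interleaved RGB bytes
// by copying the first three channels of each pixel and discarding alpha.
std::vector<std::uint8_t> rgba_to_rgb(const std::vector<std::uint8_t> & rgba) {
    assert(rgba.size() % 4 == 0); // input must contain whole RGBA pixels
    std::vector<std::uint8_t> rgb;
    rgb.reserve(rgba.size() / 4 * 3);
    for (std::size_t i = 0; i < rgba.size(); i += 4) {
        rgb.push_back(rgba[i]);     // R
        rgb.push_back(rgba[i + 1]); // G
        rgb.push_back(rgba[i + 2]); // B
    }
    return rgb;
}
```

If the image is loaded with stb_image (an assumption about the loader), passing 3 as the desired-channels argument of `stbi_load` should give the same result without a separate conversion step.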

Does not start on Windows

Nothing happens except for the two prints. Also, what license does this project have? Cheers!

# python 3.11
python -m pip install torch numpy
# downloaded sam_vit_b_01ec64.pth  and put it in checkpoints
python convert-pth-to-ggml.py checkpoints/sam_vit_b_01ec64.pth . 1
mkdir build && cd build 
# PS C:\vcpkg> .\vcpkg.exe install sdl2:x64-windows
cmake .. -DCMAKE_TOOLCHAIN_FILE="C:\vcpkg\scripts\buildsystems\vcpkg.cmake"
cmake --build . --config Release
# contains sam.exe and SDL2.dll
cd .\bin\Release\
 C:\Code\cpp\sam.cpp\build\bin\Release> .\sam.exe -t 16 -i ../../../img.jpg -m ../../../checkpoints/ggml-model-f16.bin
 
SDL_main: seed = 1693989358
SDL_main: loaded image '../../../img.jpg' (680 x 453)
PS C:\Code\cpp\sam.cpp\build\bin\Release>

Built an x86_64 executable

I'm on an M1 Max and followed the README and it built the wrong architecture.

zsh: illegal hardware instruction  ./bin/sam -t 16 -i ../img.jpg -m ../ggml-model-f16.bin
(base) sam@m1macbookpro build % file bin/sam
bin/sam: Mach-O 64-bit executable x86_64
(base) sam@m1macbookpro build % 

There is no license for this project

Subject says it all.

I'm partial to MIT or Apache, but perhaps you are partial to copyleft licenses. Either way, not having any license is sort of a deal breaker for purposes other than running the demo.

Followed instructions, does not build

➜  sam.cpp git:(master) mkdir build && cd build
➜  build git:(master) cmake .. && make -j4

-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:21 (add_subdirectory):
  The source directory

    /media/user/home/Tools/06_MachineLearning/Segmentation/sam.cpp/ggml

  does not contain a CMakeLists.txt file.


-- Found OpenGL: /usr/lib/x86_64-linux-gnu/libOpenGL.so   
-- Configuring incomplete, errors occurred!
