segment_anything_tuning's Introduction

Finetune SAM

The main goal of this project is to fine-tune the SAM model to segment your custom data from prompt bounding boxes. The tool can also auto-generate segmentation masks for unlabeled data via the following pipeline.

[Figure: pipeline diagram]

Installation

git clone https://github.com/everguard-inc/segment_anything_tuning
cd segment_anything_tuning
conda create -n "fineSAM" python=3.9 -y
conda activate fineSAM
pip install -r requirements.txt

Quick Start

  1. Prepare your custom dataset. The recommended structure is the following:

    ├── train
    │   ├── img
    │   │   ├── train_img1.jpeg
    │   │   ...
    │   └── masks
    │       ├── train_mask1.png
    │       ...
    ├── val
    │   ├── img
    │   │   ├── val_img1.jpeg
    │   │   ...
    │   └── masks
    │       ├── val_mask1.png
    │       ...
    

    Any other layout also works, but you must set the path to each of the four directories in finetune_sam/config.py.

  2. Edit finetune_sam/config.py with your dataset paths. Also check the other parameters: CUDA device IDs, output path, etc.

  3. Run finetune_sam/train.py

  4. To run inference on eg_dataset (from the segment_anything_tuning folder, with the fineSAM environment activated):

pwd # returns ..../segment_anything_tuning
export PYTHONPATH="${PYTHONPATH}:$(pwd)/finetune_sam"
python inference.py
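
Before editing finetune_sam/config.py it can help to confirm that the four dataset directories exist and that each split has as many masks as images. The following is a minimal sketch, not part of the repository: `check_dataset` is a hypothetical helper, and it assumes the recommended `train`/`val` layout with `img` and `masks` subfolders. It only compares file counts, because the source does not state how image and mask file names are matched.

```python
from pathlib import Path


def check_dataset(root, splits=("train", "val"), subdirs=("img", "masks")):
    """Return a list of human-readable problems found under `root`.

    An empty list means every expected directory exists and each split
    holds the same number of files in all of its subdirectories.
    Hypothetical helper: names and layout are assumptions.
    """
    problems = []
    for split in splits:
        counts = {}
        for sub in subdirs:
            folder = Path(root) / split / sub
            if not folder.is_dir():
                problems.append(f"missing directory: {folder}")
                continue
            counts[sub] = sum(1 for p in folder.iterdir() if p.is_file())
        # Compare counts only when every subdirectory of the split was found.
        if len(counts) == len(subdirs) and len(set(counts.values())) > 1:
            problems.append(f"{split}: file counts differ {counts}")
    return problems
```

Calling `check_dataset("path/to/dataset")` returns an empty list when the layout is consistent; otherwise each entry describes one missing directory or one split with mismatched counts.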

Features

  • Supports custom datasets
  • Caches image embeddings:
    • If embedding is not cached: 1-2 sec/img
    • If embedding cached: 0.2-0.4 sec/img
  • Image preprocessing is encapsulated in the model. As input you must pass the raw RGB image and the prompt bounding boxes for each image, both in batched form.
  • Neptune is set up for tracking training progress.
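
The speed-up from embedding caching comes from computing each image embedding once and reloading it afterwards. The sketch below illustrates that idea with the standard library only. It is an assumption about how such a cache could look, not the repository's implementation: `EmbeddingCache` is a hypothetical class and `encode` stands in for the SAM image encoder.

```python
import hashlib
import pickle
from pathlib import Path


class EmbeddingCache:
    """Disk cache keyed by image path (hypothetical helper, not the repo's API)."""

    def __init__(self, cache_dir, encode):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.encode = encode  # callable: image path -> embedding

    def _entry(self, image_path):
        # Hash the path so any file name maps to a safe cache file name.
        key = hashlib.sha1(str(image_path).encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.pkl"

    def get(self, image_path):
        entry = self._entry(image_path)
        if entry.exists():
            # Cache hit: skip the expensive encoder call.
            with entry.open("rb") as fh:
                return pickle.load(fh)
        # Cache miss: compute once, then store for later calls.
        embedding = self.encode(image_path)
        with entry.open("wb") as fh:
            pickle.dump(embedding, fh)
        return embedding
```

Keying on the path alone keeps the sketch short, but it will not notice an image that changes on disk; hashing the file contents would be a more robust (and slower) choice.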

Train plots

  • train_dice_loss
  • train_focal_loss
  • train_iou_loss
  • val_f1
  • val_iou

Results

Trained on dataset of tracks (~3500 images in the train set, ~700 images in the test set). After the 20th epoch the metrics are mostly stable. Interestingly, the model does not overfit even after a large number of epochs (100).

Class   IoU      Dice     Epoch
------  -------  -------  -----
crane   0.3520   0.2905   0
tent    0.2966   0.2663   0
truck   0.6643   0.7808   0
net     0.4888   0.4572   0
total   0.6000   0.5659   0
------  -------  -------  -----
crane   0.4848   0.4724   10
tent    0.7539   0.7163   10
truck   0.9412   0.9166   10
net     0.2203   0.1581   10
total   0.6000   0.5659   10
------  -------  -------  -----
crane   0.4978   0.4873   20
tent    0.8469   0.8120   20
truck   0.9586   0.9428   20
net     0.1576   0.1181   20
total   0.6152   0.5900   20
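
For reference, the textbook definitions of IoU and Dice for a single pair of binary masks can be sketched as follows. This is an illustration only: `iou_and_dice` is a hypothetical function, masks are assumed to be nested lists of 0/1 values, and the source does not say how the repository computes or aggregates the numbers in the table, so its metrics may be defined differently.

```python
def iou_and_dice(pred, target):
    """Compute IoU and Dice for two binary masks given as nested lists of 0/1."""
    flat_pred = [bool(v) for row in pred for v in row]
    flat_target = [bool(v) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(p and t for p, t in zip(flat_pred, flat_target))
    pred_area = sum(flat_pred)
    target_area = sum(flat_target)
    union = pred_area + target_area - intersection

    # Convention assumed here: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    print(iou_and_dice(pred, target))
```

In the example, the masks share 2 pixels and each covers 3, so the union is 4 pixels, giving an IoU of 2/4 and a Dice of 4/6.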

TODO

  • To generate bounding boxes from text and pass them to SAM as prompts, see lang-segment-anything.
  • Add noise to prompt boxes so the model adapts better to low-quality prompts.
  • Add simultaneous training on multiple classes.
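
The box-noise item is not implemented in the project; one possible shape for it is sketched below. Everything here is an assumption: `jitter_box` is a hypothetical function, boxes are taken to be `(x_min, y_min, x_max, y_max)` in pixels, and the noise is a uniform shift of each coordinate by up to `max_shift` times the box width or height, clamped to the image.

```python
import random


def jitter_box(box, image_size, max_shift=0.1, rng=random):
    """Perturb an (x_min, y_min, x_max, y_max) box by up to `max_shift` of its size.

    `image_size` is (width, height). Keep `max_shift` below 0.5 so the
    corners cannot cross. Hypothetical helper, not part of the repository.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    box_w, box_h = x_max - x_min, y_max - y_min

    # Shift every coordinate independently, scaled by the box dimensions.
    new_x_min = x_min + rng.uniform(-max_shift, max_shift) * box_w
    new_y_min = y_min + rng.uniform(-max_shift, max_shift) * box_h
    new_x_max = x_max + rng.uniform(-max_shift, max_shift) * box_w
    new_y_max = y_max + rng.uniform(-max_shift, max_shift) * box_h

    # Clamp to the image so the prompt never leaves the picture.
    return (
        min(max(new_x_min, 0.0), width),
        min(max(new_y_min, 0.0), height),
        min(max(new_x_max, 0.0), width),
        min(max(new_y_max, 0.0), height),
    )
```

Passing a seeded generator, for example `rng=random.Random(0)`, makes the perturbation reproducible across runs.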

License

This project is released under the same license as the SAM model.

Notes

  • Uses the original implementation of SAM.
  • Loss is calculated as stated in the paper (20 * focal loss + dice loss + MSE loss).
  • Only supports bounding box input prompts.
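
The weighting in the loss note can be illustrated with a small, dependency-free sketch over flattened masks. It is not the repository's code: the function names are hypothetical, `alpha`, `gamma`, and `smooth` are assumed default values, and the MSE term is assumed to compare the model's predicted IoU score with the actual IoU of the predicted mask, as described in the SAM paper.

```python
import math


def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; `probs` are foreground probabilities, `targets` are 0/1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        p_t = p if t == 1 else 1.0 - p
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 minus the smoothed Dice coefficient."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def iou_mse_loss(predicted_iou, true_iou):
    """Squared error between the predicted IoU score and the actual IoU."""
    return (predicted_iou - true_iou) ** 2


def total_loss(probs, targets, predicted_iou, true_iou):
    """Combine the three terms with the 20:1:1 weighting from the note above."""
    return (
        20.0 * focal_loss(probs, targets)
        + dice_loss(probs, targets)
        + iou_mse_loss(predicted_iou, true_iou)
    )
```

A real training loop would compute the same terms on tensors (for example with PyTorch) so that gradients can flow; plain lists are used here only to keep the arithmetic visible.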

Contributors

luca-medeiros, alexuvarovskyi, fox-flex, wiekern
