nanosam's Introduction

NanoSAM

👍 Usage - ⏱️ Performance - 🛠️ Setup - 🤸 Examples - 🏋️ Training - 🧐 Evaluation - 👏 Acknowledgement - 🔗 See also

NanoSAM is a Segment Anything (SAM) model variant that is capable of running in 🔥 real-time 🔥 on NVIDIA Jetson Orin Platforms with NVIDIA TensorRT.

NanoSAM is trained by distilling the MobileSAM image encoder on unlabeled images. For an introduction to knowledge distillation, we recommend checking out this tutorial.

👍 Usage

Using NanoSAM from Python looks like this:

import PIL.Image
import numpy as np

from nanosam.utils.predictor import Predictor

# Instantiate the predictor from the TensorRT engine files built in the Setup section
predictor = Predictor(
    image_encoder="data/resnet18_image_encoder.engine",
    mask_decoder="data/mobile_sam_mask_decoder.engine"
)

image = PIL.Image.open("dog.jpg")

predictor.set_image(image)

# (x, y) is a prompt point in image pixel coordinates; label 1 marks it as foreground
mask, _, _ = predictor.predict(np.array([[x, y]]), np.array([1]))
Notes: the point labels may be one of the following.
Point Label Description
0 Background point
1 Foreground point
2 Bounding box top-left
3 Bounding box bottom-right
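
For example, a bounding box prompt is expressed as its two corners, labeled 2 (top-left) and 3 (bottom-right). Below is a minimal sketch continuing the snippet above; the box coordinates are placeholders.

bbox = [100, 100, 850, 759]  # placeholder (x0, y0, x1, y1) in image pixel coordinates

points = np.array([
    [bbox[0], bbox[1]],  # top-left corner
    [bbox[2], bbox[3]],  # bottom-right corner
])
point_labels = np.array([2, 3])

mask, _, _ = predictor.predict(points, point_labels)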

Follow the instructions below for how to build the engine files.

⏱️ Performance

NanoSAM runs in real time on Jetson Orin Nano.

Model † | ⏱️ Jetson Orin Nano: Image Encoder (ms) | ⏱️ Jetson Orin Nano: Full Pipeline (ms) | ⏱️ Jetson AGX Orin: Image Encoder (ms) | ⏱️ Jetson AGX Orin: Full Pipeline (ms) | 🎯 mIoU: All ‡ | 🎯 mIoU: Small | 🎯 mIoU: Medium | 🎯 mIoU: Large
MobileSAM | TBD | 146 | 35 | 39 | 0.728 | 0.658 | 0.759 | 0.804
NanoSAM (ResNet18) | TBD | 27 | 4.2 | 8.1 | 0.706 | 0.624 | 0.738 | 0.796
Notes

† The MobileSAM image encoder is optimized with FP32 precision because it produced erroneous results when built for FP16 precision with TensorRT. The NanoSAM image encoder is built with FP16 precision, as we did not notice a significant accuracy degradation. Both pipelines use the same mask decoder, which is built with FP32 precision. For all models, the accuracy is reported using the same model configuration used to measure latency.

‡ Accuracy is computed by prompting SAM with ground-truth object bounding box annotations from the COCO 2017 validation dataset. The IoU is then computed between the mask output of the SAM model for the object and the ground-truth COCO segmentation mask for the object. The mIoU is the average IoU over all objects in the COCO 2017 validation set matching the target object size (small, medium, large).

🛠️ Setup

NanoSAM is fairly easy to get started with.

  1. Install the dependencies

    1. Install PyTorch

    2. Install torch2trt

    3. Install NVIDIA TensorRT

    4. (optional) Install TRTPose - For the pose example.

      git clone https://github.com/NVIDIA-AI-IOT/trt_pose
      cd trt_pose
      python3 setup.py develop --user
    5. (optional) Install the Transformers library - For the OWL-ViT example.

      python3 -m pip install transformers
  2. Install the NanoSAM Python package

    git clone https://github.com/NVIDIA-AI-IOT/nanosam
    cd nanosam
    python3 setup.py develop --user
  3. Build the TensorRT engine for the mask decoder

    1. Export the MobileSAM mask decoder ONNX file (or download directly from here)

      python3 -m nanosam.tools.export_sam_mask_decoder_onnx \
          --model-type=vit_t \
          --checkpoint=assets/mobile_sam.pt \
          --output=data/mobile_sam_mask_decoder.onnx
    2. Build the TensorRT engine

      trtexec \
          --onnx=data/mobile_sam_mask_decoder.onnx \
          --saveEngine=data/mobile_sam_mask_decoder.engine \
          --minShapes=point_coords:1x1x2,point_labels:1x1 \
          --optShapes=point_coords:1x1x2,point_labels:1x1 \
          --maxShapes=point_coords:1x10x2,point_labels:1x10

      This assumes the mask decoder ONNX file was exported or downloaded to data/mobile_sam_mask_decoder.onnx

      Notes This command builds the engine to support up to 10 keypoints. You can increase this limit as needed by specifying a different max shape (for example, --maxShapes=point_coords:1x20x2,point_labels:1x20 to support up to 20 points).
  4. Build the TensorRT engine for the NanoSAM image encoder

    1. Download the image encoder: resnet18_image_encoder.onnx

    2. Build the TensorRT engine

      trtexec \
          --onnx=data/resnet18_image_encoder.onnx \
          --saveEngine=data/resnet18_image_encoder.engine \
          --fp16
  5. Run the basic usage example

    python3 examples/basic_usage.py \
        --image_encoder=data/resnet18_image_encoder.engine \
        --mask_decoder=data/mobile_sam_mask_decoder.engine
    

    This outputs a result to data/basic_usage_out.jpg

That's it! From there, you can read the example code to see how to use NanoSAM from Python, or try running the more advanced examples below.

🤸 Examples

NanoSAM can be applied in many creative ways.

Example 1 - Segment with bounding box

This example uses a known image with a fixed bounding box to control NanoSAM segmentation.

To run the example, call

python3 examples/basic_usage.py \
    --image_encoder="data/resnet18_image_encoder.engine" \
    --mask_decoder="data/mobile_sam_mask_decoder.engine"

Example 2 - Segment with bounding box (using OWL-ViT detections)

This example demonstrates using OWL-ViT to detect objects with one or more text prompts, and then segmenting those objects using NanoSAM.

To run the example, call

python3 examples/segment_from_owl.py \
    --prompt="A tree" \
    --image_encoder="data/resnet18_image_encoder.engine" \
    --mask_decoder="data/mobile_sam_mask_decoder.engine
Notes - While OWL-ViT does not run in real time on Jetson Orin Nano (~3 s/img), it is nice for experimentation because it allows you to detect a wide variety of objects. You could substitute any other real-time pre-trained object detector to take full advantage of NanoSAM's speed.
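
For orientation, the flow in this example is roughly: detect boxes for the text prompt, then hand each box to NanoSAM as a two-corner prompt. The sketch below illustrates that flow using the Hugging Face transformers OWL-ViT API directly; the model name, image path, and threshold are placeholder assumptions, and the actual example script may use a different wrapper.

import PIL.Image
import numpy as np
import torch
from transformers import OwlViTProcessor, OwlViTForObjectDetection
from nanosam.utils.predictor import Predictor

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
detector = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

predictor = Predictor(
    image_encoder="data/resnet18_image_encoder.engine",
    mask_decoder="data/mobile_sam_mask_decoder.engine"
)

image = PIL.Image.open("dog.jpg")  # placeholder image path
inputs = processor(text=[["a tree"]], images=image, return_tensors="pt")

with torch.no_grad():
    outputs = detector(**inputs)

target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs=outputs, target_sizes=target_sizes, threshold=0.1
)[0]

predictor.set_image(image)
for box in detections["boxes"]:
    x0, y0, x1, y1 = box.tolist()
    points = np.array([[x0, y0], [x1, y1]])   # box corners
    point_labels = np.array([2, 3])           # 2 = top-left, 3 = bottom-right
    mask, _, _ = predictor.predict(points, point_labels)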

Example 3 - Segment with keypoints (offline using TRTPose detections)

This example demonstrates how to use human pose keypoints from TRTPose to control NanoSAM segmentation.

To run the example, call

python3 examples/segment_from_pose.py

This will save an output figure to data/segment_from_pose_out.png.
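
Conceptually, pose keypoints map directly onto NanoSAM's point-prompt interface: keypoints on the target region become foreground points (label 1) and other keypoints become background points (label 0). The snippet below is an illustrative sketch only; the coordinates are placeholders, and the actual example derives them from the TRTPose output.

import numpy as np

# Placeholder keypoints; in segment_from_pose.py these come from the TRTPose model.
torso_keypoints = [[320, 240], [360, 250]]   # assumed foreground (e.g. shoulders)
other_keypoints = [[300, 120], [340, 460]]   # assumed background (e.g. nose, knee)

points = np.array(torso_keypoints + other_keypoints)
point_labels = np.array([1] * len(torso_keypoints) + [0] * len(other_keypoints))

mask, _, _ = predictor.predict(points, point_labels)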

Example 4 - Segment with keypoints (online using TRTPose detections)

This example demonstrates how to use human pose to control segmentation on a live camera feed. This example requires an attached display and camera.

To run the example, call

python3 examples/demo_pose_tshirt.py

Example 5 - Segment and track (experimental)

This example demonstrates rudimentary segmentation tracking with NanoSAM. This example requires an attached display and camera.

To run the example, call

python3 examples/demo_click_segment_track.py <image_encoder_engine> <mask_decoder_engine>

Once the example is running, double-click an object you want to track.

Notes This tracking method is very simple and can get lost easily. It is intended to demonstrate creative ways you can use NanoSAM, but would likely be improved with more work.
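
One simple way to build this kind of tracker (a sketch of the general idea, not necessarily how demo_click_segment_track.py is implemented) is to re-prompt the predictor on every frame with a point derived from the previous frame's mask, for example its centroid:

import numpy as np

def mask_centroid(binary_mask):
    ys, xs = np.nonzero(binary_mask)
    return np.array([[xs.mean(), ys.mean()]])

point = np.array([[click_x, click_y]])      # initial double-click location (assumed given)
for frame in video_frames:                  # assumed iterable of PIL images from the camera
    predictor.set_image(frame)
    mask, _, _ = predictor.predict(point, np.array([1]))
    m = mask[0, 0] > 0                      # first mask channel (layout assumed as in basic_usage.py)
    binary_mask = m.detach().cpu().numpy() if hasattr(m, "detach") else np.asarray(m)
    if binary_mask.any():
        point = mask_centroid(binary_mask)  # follow the object into the next frame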

🏋️ Training

You can train NanoSAM on a single GPU

  1. Download and extract the COCO 2017 train images

    mkdir -p data/coco
    cd data/coco
    wget http://images.cocodataset.org/zips/train2017.zip
    unzip train2017.zip
    cd ../..
  2. Build the MobileSAM image encoder (used as the teacher model)

    1. Export to ONNX

      python3 -m nanosam.tools.export_sam_image_encoder_onnx \
          --checkpoint="assets/mobile_sam.pt" \
          --output="data/mobile_sam_image_encoder_bs16.onnx" \
          --model_type=vit_t \
          --batch_size=16
    2. Build the TensorRT engine with batch size 16

      trtexec \
          --onnx=data/mobile_sam_image_encoder_bs16.onnx \
          --shapes=image:16x3x1024x1024 \
          --saveEngine=data/mobile_sam_image_encoder_bs16.engine
  3. Train the NanoSAM image encoder by distilling MobileSAM

    python3 -m nanosam.tools.train \
        --images=data/coco/train2017 \
        --output_dir=data/models/resnet18 \
        --model_name=resnet18 \
        --teacher_image_encoder_engine=data/mobile_sam_image_encoder_bs16.engine \
        --batch_size=16
    Notes Once training starts, visualizations of progress and checkpoints will be saved to the specified output directory. You can stop training and resume from the last saved checkpoint if needed.

    For a list of arguments, you can type

    python3 -m nanosam.tools.train --help
  4. Export the trained NanoSAM image encoder to ONNX

    python3 -m nanosam.tools.export_image_encoder_onnx \
        --model_name=resnet18 \
        --checkpoint="data/models/resnet18/checkpoint.pth" \
        --output="data/resnet18_image_encoder.onnx"

You can then build the TensorRT engine as detailed in the getting started section.
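
For intuition, the distillation objective in step 3 is essentially feature regression: the ResNet18 student is trained to reproduce the MobileSAM teacher's image embeddings on unlabeled COCO images. The snippet below is a minimal sketch of that idea with illustrative names; the loss type and embedding shape are assumptions, and the actual implementation lives in nanosam.tools.train.

import torch
import torch.nn.functional as F

def distillation_step(student, images, teacher_embedding, optimizer):
    # teacher_embedding: image features produced by the MobileSAM TensorRT engine
    # built above (shape assumed to be (batch, 256, 64, 64)).
    student_embedding = student(images)
    loss = F.mse_loss(student_embedding, teacher_embedding)  # feature-matching loss (MSE assumed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()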

🧐 Evaluation

You can reproduce the accuracy results above by evaluating against the COCO ground-truth masks.

  1. Download and extract the COCO 2017 validation set.

    # mkdir -p data/coco  # uncomment if it doesn't exist
    cd data/coco
    wget http://images.cocodataset.org/zips/val2017.zip
    wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
    unzip val2017.zip
    unzip annotations_trainval2017.zip
    cd ../..
  2. Compute the IoU of NanoSAM mask predictions against the ground truth COCO mask annotation.

    python3 -m nanosam.tools.eval_coco \
        --coco_root=data/coco/val2017 \
        --coco_ann=data/coco/annotations/instances_val2017.json \
        --image_encoder=data/resnet18_image_encoder.engine \
        --mask_decoder=data/mobile_sam_mask_decoder.engine \
        --output=data/resnet18_coco_results.json

    This uses the COCO ground-truth bounding boxes as inputs to NanoSAM

  3. Compute the average IoU over a selected category or size

    python3 -m nanosam.tools.compute_eval_coco_metrics \
        data/resnet18_coco_results.json \
        --size="all"
    Notes For all options, type python3 -m nanosam.tools.compute_eval_coco_metrics --help.

    To compute the mIoU for a specific category id, call

    python3 -m nanosam.tools.compute_eval_coco_metrics \
        data/resnet18_coco_results.json \
        --category_id=1
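
For reference, the per-object metric behind these numbers is simply the IoU between the predicted binary mask and the COCO ground-truth mask, averaged over the selected objects. A small illustrative sketch (not the eval tool itself):

import numpy as np

def mask_iou(pred_mask, gt_mask):
    # Both masks are binary arrays of the same shape.
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    intersection = np.logical_and(pred, gt).sum()
    return intersection / union

# mIoU over a set of (prediction, ground truth) pairs, e.g. all "small" objects:
# miou = np.mean([mask_iou(p, g) for p, g in results])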

👏 Acknowledgement

This project is enabled by the great projects below.

  • SAM - The original Segment Anything model.
  • MobileSAM - The distilled Tiny ViT Segment Anything model.

🔗 See also

nanosam's People

Contributors

chaoningzhang, dhkim2810, dongshenhan, jaybdub, killian31, ksugar, qiaoyu1002


nanosam's Issues

trtexec Failed for mobile_sam_mask_decoder.onnx

Thanks for this fantastic work!
When I follow the instructions, the errors below occurred:

Warning: Slice op /Slice_slice cannot slice along a uniform dimension.
Warning: Slice op /Slice_slice cannot slice along a uniform dimension.
[09/20/2023-07:55:09] [E] Error[10]: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_1729 + (Unnamed Layer* 316) [Shuffle]...(Unnamed Layer* 1451) [Shuffle]]}.)
[09/20/2023-07:55:09] [E] Error[2]: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[09/20/2023-07:55:09] [E] Engine could not be created from network
[09/20/2023-07:55:09] [E] Building engine failed
[09/20/2023-07:55:09] [E] Failed to create engine from model.
[09/20/2023-07:55:09] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=data/mobile_sam_mask_decoder.onnx --saveEngine=data/mobile_sam_mask_decoder.engine --minShapes=point_coords:1x1x2,point_labels:1x1 --optShapes=point_coords:1x1x2,point_labels:1x1 --maxShapes=point_coords:1x10x2,point_labels:1x10

TensorRT version: 8.2.1.8
JetPack: 4.6.1

Not an issue!

Just wanted to let you know that I'm excited to see the project, and try it out 😄

Device memory is insufficient to use tactic for jetson NANO

Hey,
I was trying to execute the encoder engine using trtexec and got this error:
Tactic Device request: 4248MB Available: 2507MB. Device memory is insufficient to use tactic.

I guess the model is too big for the Jetson Nano GPU.
Has anyone else had this error? Is there any solution?

thank you

Issue with building TensorRT engine for the mask decoder

I am trying to set up nanosam using the NGC Docker image nvcr.io/nvidia/pytorch:23.10-py3

At the step of building the TensorRT engine for the mask decoder, I encounter this error message:

&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=data/mobile_sam_mask_decoder.onnx --saveEngine=data/mobile_sam_mask_decoder.engine --minShapes=point_coords:1x1x2,point_labels:1x1 --optShapes=point_coords:1x1x2,point_labels:1x1 --maxShapes=point_coords:1x10x2,point_labels:1x10
[11/21/2023-02:04:10] [I] === Model Options ===
[11/21/2023-02:04:10] [I] Format: ONNX
[11/21/2023-02:04:10] [I] Model: data/mobile_sam_mask_decoder.onnx
[11/21/2023-02:04:10] [I] Output:
[11/21/2023-02:04:10] [I] === Build Options ===
[11/21/2023-02:04:10] [I] Max batch: explicit batch
[11/21/2023-02:04:10] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/21/2023-02:04:10] [I] minTiming: 1
[11/21/2023-02:04:10] [I] avgTiming: 8
[11/21/2023-02:04:10] [I] Precision: FP32
[11/21/2023-02:04:10] [I] LayerPrecisions: 
[11/21/2023-02:04:10] [I] Layer Device Types: 
[11/21/2023-02:04:10] [I] Calibration: 
[11/21/2023-02:04:10] [I] Refit: Disabled
[11/21/2023-02:04:10] [I] Version Compatible: Disabled
[11/21/2023-02:04:10] [I] TensorRT runtime: full
[11/21/2023-02:04:10] [I] Lean DLL Path: 
[11/21/2023-02:04:10] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[11/21/2023-02:04:10] [I] Exclude Lean Runtime: Disabled
[11/21/2023-02:04:10] [I] Sparsity: Disabled
[11/21/2023-02:04:10] [I] Safe mode: Disabled
[11/21/2023-02:04:10] [I] Build DLA standalone loadable: Disabled
[11/21/2023-02:04:10] [I] Allow GPU fallback for DLA: Disabled
[11/21/2023-02:04:10] [I] DirectIO mode: Disabled
[11/21/2023-02:04:10] [I] Restricted mode: Disabled
[11/21/2023-02:04:10] [I] Skip inference: Disabled
[11/21/2023-02:04:10] [I] Save engine: data/mobile_sam_mask_decoder.engine
[11/21/2023-02:04:10] [I] Load engine: 
[11/21/2023-02:04:10] [I] Profiling verbosity: 0
[11/21/2023-02:04:10] [I] Tactic sources: Using default tactic sources
[11/21/2023-02:04:10] [I] timingCacheMode: local
[11/21/2023-02:04:10] [I] timingCacheFile: 
[11/21/2023-02:04:10] [I] Heuristic: Disabled
[11/21/2023-02:04:10] [I] Preview Features: Use default preview flags.
[11/21/2023-02:04:10] [I] MaxAuxStreams: -1
[11/21/2023-02:04:10] [I] BuilderOptimizationLevel: -1
[11/21/2023-02:04:10] [I] Input(s)s format: fp32:CHW
[11/21/2023-02:04:10] [I] Output(s)s format: fp32:CHW
[11/21/2023-02:04:10] [I] Input build shape: point_coords=1x1x2+1x1x2+1x10x2
[11/21/2023-02:04:10] [I] Input build shape: point_labels=1x1+1x1+1x10
[11/21/2023-02:04:10] [I] Input calibration shapes: model
[11/21/2023-02:04:10] [I] === System Options ===
[11/21/2023-02:04:10] [I] Device: 0
[11/21/2023-02:04:10] [I] DLACore: 
[11/21/2023-02:04:10] [I] Plugins:
[11/21/2023-02:04:10] [I] setPluginsToSerialize:
[11/21/2023-02:04:10] [I] dynamicPlugins:
[11/21/2023-02:04:10] [I] ignoreParsedPluginLibs: 0
[11/21/2023-02:04:10] [I] 
[11/21/2023-02:04:10] [I] === Inference Options ===
[11/21/2023-02:04:10] [I] Batch: Explicit
[11/21/2023-02:04:10] [I] Input inference shape: point_labels=1x1
[11/21/2023-02:04:10] [I] Input inference shape: point_coords=1x1x2
[11/21/2023-02:04:10] [I] Iterations: 10
[11/21/2023-02:04:10] [I] Duration: 3s (+ 200ms warm up)
[11/21/2023-02:04:10] [I] Sleep time: 0ms
[11/21/2023-02:04:10] [I] Idle time: 0ms
[11/21/2023-02:04:10] [I] Inference Streams: 1
[11/21/2023-02:04:10] [I] ExposeDMA: Disabled
[11/21/2023-02:04:10] [I] Data transfers: Enabled
[11/21/2023-02:04:10] [I] Spin-wait: Disabled
[11/21/2023-02:04:10] [I] Multithreading: Disabled
[11/21/2023-02:04:10] [I] CUDA Graph: Disabled
[11/21/2023-02:04:10] [I] Separate profiling: Disabled
[11/21/2023-02:04:10] [I] Time Deserialize: Disabled
[11/21/2023-02:04:10] [I] Time Refit: Disabled
[11/21/2023-02:04:10] [I] NVTX verbosity: 0
[11/21/2023-02:04:10] [I] Persistent Cache Ratio: 0
[11/21/2023-02:04:10] [I] Inputs:
[11/21/2023-02:04:10] [I] === Reporting Options ===
[11/21/2023-02:04:10] [I] Verbose: Disabled
[11/21/2023-02:04:10] [I] Averages: 10 inferences
[11/21/2023-02:04:10] [I] Percentiles: 90,95,99
[11/21/2023-02:04:10] [I] Dump refittable layers:Disabled
[11/21/2023-02:04:10] [I] Dump output: Disabled
[11/21/2023-02:04:10] [I] Profile: Disabled
[11/21/2023-02:04:10] [I] Export timing to JSON file: 
[11/21/2023-02:04:10] [I] Export output to JSON file: 
[11/21/2023-02:04:10] [I] Export profile to JSON file: 
[11/21/2023-02:04:10] [I] 
[11/21/2023-02:04:10] [I] === Device Information ===
[11/21/2023-02:04:10] [I] Selected Device: Quadro RTX 4000 with Max-Q Design
[11/21/2023-02:04:10] [I] Compute Capability: 7.5
[11/21/2023-02:04:10] [I] SMs: 40
[11/21/2023-02:04:10] [I] Device Global Memory: 7802 MiB
[11/21/2023-02:04:10] [I] Shared Memory per SM: 64 KiB
[11/21/2023-02:04:10] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/21/2023-02:04:10] [I] Application Compute Clock Rate: 1.38 GHz
[11/21/2023-02:04:10] [I] Application Memory Clock Rate: 6.001 GHz
[11/21/2023-02:04:10] [I] 
[11/21/2023-02:04:10] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[11/21/2023-02:04:10] [I] 
[11/21/2023-02:04:10] [I] TensorRT version: 8.6.1
[11/21/2023-02:04:10] [I] Loading standard plugins
[11/21/2023-02:04:10] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 22, GPU 114 (MiB)
[11/21/2023-02:04:15] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +889, GPU +174, now: CPU 987, GPU 288 (MiB)
[11/21/2023-02:04:15] [I] Start parsing network model.
[11/21/2023-02:04:15] [I] [TRT] ----------------------------------------------------------------
[11/21/2023-02:04:15] [I] [TRT] Input filename:   data/mobile_sam_mask_decoder.onnx
[11/21/2023-02:04:15] [I] [TRT] ONNX IR version:  0.0.8
[11/21/2023-02:04:15] [I] [TRT] Opset version:    16
[11/21/2023-02:04:15] [I] [TRT] Producer name:    pytorch
[11/21/2023-02:04:15] [I] [TRT] Producer version: 2.1.0
[11/21/2023-02:04:15] [I] [TRT] Domain:           
[11/21/2023-02:04:15] [I] [TRT] Model version:    0
[11/21/2023-02:04:15] [I] [TRT] Doc string:       
[11/21/2023-02:04:15] [I] [TRT] ----------------------------------------------------------------
[11/21/2023-02:04:15] [I] Finished parsing network model. Parse time: 0.0541873
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=data/mobile_sam_mask_decoder.onnx --saveEngine=data/mobile_sam_mask_decoder.engine --minShapes=point_coords:1x1x2,point_labels:1x1 --optShapes=point_coords:1x1x2,point_labels:1x1 --maxShapes=point_coords:1x10x2,point_labels:1x10

I am not sure how to proceed; in fact, trying to set it up (1) with the installation instructions in the README and (2) using the containers from jetson-containers also did not work out well for me. I suspect the jetson containers only work on Jetson, as I got a hardware error message; my GPU is a Quadro RTX 4000.

Would appreciate some help in my attempt to setup a running nanosam, thanks!

How can I write a generator based on this foundation

Hi,
Thanks for this good job!!
I have completed the use of Predictor; now I want to implement a Generator,
just like the code in MobileSAM.
But I don't know how to do it, and I need some help.
Thanks a lot.
# Like this: I want to get all masks from a Generator
def get_mask(srcImg, model):
    mask_generator = SamAutomaticMaskGenerator(model)
    masks = mask_generator.generate(srcImg)
    return masks

Usage problem in PC : )

hi
I'm more than happy to try this excellent work on my nano machine, and I also wanna try this work on my PC device.
But a problem came up when I tried to follow the guide to run basic_usage.py; the detailed information of the problem was:

**File "C:\Users\Administrator\Desktop\nanosam-main\examples\basic_usage.py", line 50, in
mask, _, _ = predictor.predict(points, point_labels)
File "c:\users\administrator\desktop\nanosam-main\nanosam\utils\predictor.py", line 164, in predict
mask_iou, low_res_mask = run_mask_decoder(
File "c:\users\administrator\desktop\nanosam-main\nanosam\utils\predictor.py", line 114, in run_mask_decoder
iou_predictions, low_res_masks = mask_decoder_engine(
File "C:\Program Files\anaconda\envs\nanosam\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Program Files\anaconda\envs\nanosam\lib\site-packages\torch2trt-0.4.0-py3.8.egg\torch2trt\torch2trt.py", line 616, in forward
self.context.set_binding_shape(idx, shape)
AttributeError: 'NoneType' object has no attribute 'set_binding_shape'

I have tried my best to solve this problem, but I couldn't find a clue. Can you help me?

THANKS!!

Reimplemented nanosam in cpp

Hello, thank you for providing the source. This is not an issue. I reimplemented it in C++ in Visual Studio and would appreciate it if you could provide feedback on my implementation. Thank you: nanosam-cpp

Can this model run on RTX GPUs?

Hello guys,

Congratulations for this new repo.

I understand this model is able to run on Jetson devices, but I would like to know if it can run on RTX GPU for testing purposes.

Thanks again for this fantastic job.

question about mask dimension

Thank you for your work! The output mask shape is 4x256x256 where 4 is (I guess) the number of labels and 256x256 is mask's height x width dimensions. I wonder how to get 1x256x256 mask from the output?

Can you provide the torch model of ResNet18?

Thanks for your great work!
I cannot convert the ONNX model provided by you into a Torch model.
Can you provide the torch model of ResNet18?
Thanks again for this fantastic job.

Compiled against cuBLASLt 11.10.3.0 but running against cuBLASLt 11.5.1.0. help me please..

Namespace(image_encoder='/data/wl/code/nanosam/data/resnet18_image_encoder.engine', mask_decoder='/data/wl/code/nanosam/data/mobile_sam_mask_decoder.engine')
[01/02/2024-19:26:41] [TRT] [E] 1: [raiiMyelinGraph.h::RAIIMyelinGraph::24] Error Code 1: Myelin (Compiled against cuBLASLt 11.10.3.0 but running against cuBLASLt 11.5.1.0.)
/data/wl/code/nanosam/nanosam/utils/predictor.py:84: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352660876/work/torch/csrc/utils/tensor_numpy.cpp:172.)
image_torch_resized = torch.from_numpy(image_np_resized).permute(2, 0, 1)
<tensorrt.tensorrt.IExecutionContext object at 0x7f64a2078ab0>
/data/wl/code/nanosam/nanosam/utils/predictor.py:103: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352660876/work/torch/csrc/utils/tensor_new.cpp:201.)
image_point_coords = torch.tensor([points]).float().cuda()
None
Traceback (most recent call last):
File "/data/wl/code/nanosam/examples/basic_usage.py", line 51, in
mask, _, _ = predictor.predict(points, point_labels)
File "/data/wl/code/nanosam/nanosam/utils/predictor.py", line 164, in predict
mask_iou, low_res_mask = run_mask_decoder(
File "/data/wl/code/nanosam/nanosam/utils/predictor.py", line 113, in run_mask_decoder
iou_predictions, low_res_masks = mask_decoder_engine(
File "/home/lab-10/miniconda3/envs/nanosam/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lab-10/miniconda3/envs/nanosam/lib/python3.9/site-packages/torch2trt-0.4.0-py3.9.egg/torch2trt/torch2trt.py", line 618, in forward
self.context.set_binding_shape(idx, shape)
AttributeError: 'NoneType' object has no attribute 'set_binding_shape'

I got the error above when I run the demo:

python3 examples/basic_usage.py \
    --image_encoder="data/resnet18_image_encoder.engine" \
    --mask_decoder="data/mobile_sam_mask_decoder.engine"

Could you please help me? Thanks!

Good work!

What is the difference between NanoSAM and EdgeSAM?
Where can we get the C/C++ version?

Parameters tuning

Hi,

Like Segment-Anything, I’d like to tune parameters like number of points per side or predicted iou threshold. In the examples that are provided, a mask generator is not configured. Can anyone please guide how should one go about it? I see some relevant code in the SamAutomaticMaskGenerator under MobileSam in NanoSam, but if my understanding is correct, there’s no example code that utilizes these, and I am a bit confused how to best go about configuring. Thank you in advance.

trtexec fails to build /mobile_sam_mask_decoder.onnx

Trying to run:

trtexec --onnx=data/mobile_sam_mask_decoder.onnx --saveEngine=data/mobile_sam_mask_decoder.engine --minShapes=point_coords:1x1x2,point_labels:1x1 --optShapes=point_coords:1x1x2,point_labels:1x1 --maxShapes=point_coords:1x10x2,point_labels:1x10

after successfully exporting mobile_sam_mask_decoder.onnx with:
python3 -m nanosam.tools.export_sam_mask_decoder_onnx --model-type=vit_t --checkpoint=assets/mobile_sam.pt --output=/mnt/e/data/mobile_sam_mask_decoder.onnx

resulting in this error:

onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/18/2023-11:39:43] [E] Error[4]: [graph.cpp::symbolicExecute::539] Error Code 4: Internal Error (/OneHot: an IIOneHotLayer cannot be used to compute a shape tensor)
[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:771: While parsing node number 146 [Tile -> "/Tile_output_0"]:
[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:772: --- Begin node ---
[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:773: input: "/Unsqueeze_3_output_0"
input: "/Reshape_2_output_0"
output: "/Tile_output_0"
name: "/Tile"
op_type: "Tile"

[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:774: --- End node ---
[12/18/2023-11:39:43] [E] [TRT] ModelImporter.cpp:777: ERROR: ModelImporter.cpp:195 In function parseGraph:
[6] Invalid Node - /Tile
[graph.cpp::symbolicExecute::539] Error Code 4: Internal Error (/OneHot: an IIOneHotLayer cannot be used to compute a shape tensor)
[12/18/2023-11:39:43] [E] Failed to parse onnx file
[12/18/2023-11:39:43] [I] Finished parsing network model. Parse time: 0.32614
[12/18/2023-11:39:43] [E] Parsing model failed
[12/18/2023-11:39:43] [E] Failed to create engine from model or file.
[12/18/2023-11:39:43] [E] Engine set up failed

RuntimeError: Numpy is not available

Hi, thanks for the work. It looks very promising.

I am having some problems with the basic example, and actually with others as well. I get a runtime error on NumPy, as below:

python3 examples/basic_usage.py \
    --image_encoder="data/resnet18_image_encoder.engine" \
    --mask_decoder="data/mobile_sam_mask_decoder.engine"

/home/dbox1028/.local/lib/python3.8/site-packages/torchvision/models/detection/anchor_utils.py:63: UserWarning: Failed to initialize NumPy: module compiled against API version 0x10 but this version of numpy is 0xd . Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem . (Triggered internally at /root/pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
device: torch.device = torch.device("cpu"),
Traceback (most recent call last):
File "examples/basic_usage.py", line 38, in
predictor.set_image(image)
File "/home/dbox1028/nanosam/nanosam/utils/predictor.py", line 154, in set_image
self.image_tensor = preprocess_image(image, self.image_encoder_size)
File "/home/dbox1028/nanosam/nanosam/utils/predictor.py", line 84, in preprocess_image
image_torch_resized = torch.from_numpy(image_np_resized).permute(2, 0, 1)
RuntimeError: Numpy is not available

how can i get a segment mask

Awesome work!
I want to segment everything and get a full segmentation mask
from an image using NanoSAM, like DeepLabV3.
What should I do? Thank you very much.

Sent from PPHub

Container build doesn't seem to work

Hey there,

Thanks for the work on nanosam! I'm working on a project, and trying to run nanosam from the included Docker image.

I've added the line:

    --runtime nvidia \

To the docker/23-01/run.sh so it'll run on Jetpack 6 Developer Preview.

When I add the resnet18_image_encoder.onnx and the mobile_sam_mask_decoder.onnx for training, I get the same error:

$ trtexec --onnx=data/mobile_sam_mask_decoder.onnx     --saveEngine=data/mobile_sam_mask_decoder.engine     --minShapes=point_coords:1x1x2,point_labels:1x1     --optShapes=point_coords:1x1x2,point_labels:1x1     --maxShapes=point_coords:1x10x2,point_labels:1x10

RUNNING TensorRT.trtexec [TensorRT v8502] # trtexec --onnx=data/mobile_sam_mask_decoder.onnx --saveEngine=data/mobile_sam_mask_decoder.engine --minShapes=point_coords:1x1x2,point_labels:1x1 --optShapes=point_coords:1x1x2,point_labels:1x1 --maxShapes=point_coords:1x10x2,point_labels:1x10
[03/29/2024-17:22:18] [I] === Model Options ===
[03/29/2024-17:22:18] [I] Format: ONNX
[03/29/2024-17:22:18] [I] Model: data/mobile_sam_mask_decoder.onnx
[03/29/2024-17:22:18] [I] Output:
[03/29/2024-17:22:18] [I] === Build Options ===
[03/29/2024-17:22:18] [I] Max batch: explicit batch
[03/29/2024-17:22:18] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/29/2024-17:22:18] [I] minTiming: 1
[03/29/2024-17:22:18] [I] avgTiming: 8
[03/29/2024-17:22:18] [I] Precision: FP32
[03/29/2024-17:22:18] [I] LayerPrecisions:
[03/29/2024-17:22:18] [I] Calibration:
[03/29/2024-17:22:18] [I] Refit: Disabled
[03/29/2024-17:22:18] [I] Sparsity: Disabled
[03/29/2024-17:22:18] [I] Safe mode: Disabled
[03/29/2024-17:22:18] [I] DirectIO mode: Disabled
[03/29/2024-17:22:18] [I] Restricted mode: Disabled
[03/29/2024-17:22:18] [I] Build only: Disabled
[03/29/2024-17:22:18] [I] Save engine: data/mobile_sam_mask_decoder.engine
[03/29/2024-17:22:18] [I] Load engine:
[03/29/2024-17:22:18] [I] Profiling verbosity: 0
[03/29/2024-17:22:18] [I] Tactic sources: Using default tactic sources
[03/29/2024-17:22:18] [I] timingCacheMode: local
[03/29/2024-17:22:18] [I] timingCacheFile:
[03/29/2024-17:22:18] [I] Heuristic: Disabled
[03/29/2024-17:22:18] [I] Preview Features: Use default preview flags.
[03/29/2024-17:22:18] [I] Input(s)s format: fp32:CHW
[03/29/2024-17:22:18] [I] Output(s)s format: fp32:CHW
[03/29/2024-17:22:18] [I] Input build shape: point_coords=1x1x2+1x1x2+1x10x2
[03/29/2024-17:22:18] [I] Input build shape: point_labels=1x1+1x1+1x10
[03/29/2024-17:22:18] [I] Input calibration shapes: model
[03/29/2024-17:22:18] [I] === System Options ===
[03/29/2024-17:22:18] [I] Device: 0
[03/29/2024-17:22:18] [I] DLACore:
[03/29/2024-17:22:18] [I] Plugins:
[03/29/2024-17:22:18] [I] === Inference Options ===
[03/29/2024-17:22:18] [I] Batch: Explicit
[03/29/2024-17:22:18] [I] Input inference shape: point_labels=1x1
[03/29/2024-17:22:18] [I] Input inference shape: point_coords=1x1x2
[03/29/2024-17:22:18] [I] Iterations: 10
[03/29/2024-17:22:18] [I] Duration: 3s (+ 200ms warm up)
[03/29/2024-17:22:18] [I] Sleep time: 0ms
[03/29/2024-17:22:18] [I] Idle time: 0ms
[03/29/2024-17:22:18] [I] Streams: 1
[03/29/2024-17:22:18] [I] ExposeDMA: Disabled
[03/29/2024-17:22:18] [I] Data transfers: Enabled
[03/29/2024-17:22:18] [I] Spin-wait: Disabled
[03/29/2024-17:22:18] [I] Multithreading: Disabled
[03/29/2024-17:22:18] [I] CUDA Graph: Disabled
[03/29/2024-17:22:18] [I] Separate profiling: Disabled
[03/29/2024-17:22:18] [I] Time Deserialize: Disabled
[03/29/2024-17:22:18] [I] Time Refit: Disabled
[03/29/2024-17:22:18] [I] NVTX verbosity: 0
[03/29/2024-17:22:18] [I] Persistent Cache Ratio: 0
[03/29/2024-17:22:18] [I] Inputs:
[03/29/2024-17:22:18] [I] === Reporting Options ===
[03/29/2024-17:22:18] [I] Verbose: Disabled
[03/29/2024-17:22:18] [I] Averages: 10 inferences
[03/29/2024-17:22:18] [I] Percentiles: 90,95,99
[03/29/2024-17:22:18] [I] Dump refittable layers:Disabled
[03/29/2024-17:22:18] [I] Dump output: Disabled
[03/29/2024-17:22:18] [I] Profile: Disabled
[03/29/2024-17:22:18] [I] Export timing to JSON file:
[03/29/2024-17:22:18] [I] Export output to JSON file:
[03/29/2024-17:22:18] [I] Export profile to JSON file:
[03/29/2024-17:22:18] [I]
Cuda failure: CUDA driver version is insufficient for CUDA runtime version
Aborted (core dumped)

It appears both of these fail with the same "CUDA driver version is insufficient" error.

When I try to do a:

$ pip install --upgrade tensorrt

In the container, it just fails:

$  pip install --upgrade tensorrt
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: tensorrt in /usr/local/lib/python3.8/dist-packages (8.5.2.2)
Collecting tensorrt
  Downloading tensorrt-8.6.1.post1.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: tensorrt
  Building wheel for tensorrt (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [64 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib
      creating build/lib/tensorrt
      copying tensorrt/__init__.py -> build/lib/tensorrt
      running egg_info
      writing tensorrt.egg-info/PKG-INFO
      writing dependency_links to tensorrt.egg-info/dependency_links.txt
      writing requirements to tensorrt.egg-info/requires.txt
      writing top-level names to tensorrt.egg-info/top_level.txt
      reading manifest file 'tensorrt.egg-info/SOURCES.txt'
      adding license file 'LICENSE.txt'
      writing manifest file 'tensorrt.egg-info/SOURCES.txt'
      /usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      installing to build/bdist.linux-aarch64/wheel
      running install
      Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com, https://pypi.nvidia.com
      ERROR: Could not find a version that satisfies the requirement tensorrt_libs==8.6.1 (from versions: 9.0.0.post11.dev1, 9.0.0.post12.dev1, 9.0.1.post11.dev4, 9.0.1.post12.dev4, 9.1.0.post11.dev4, 9.1.0.post12.dev4, 9.2.0.post11.dev5, 9.2.0.post12.dev5, 9.3.0.post11.dev1, 9.3.0.post12.dev1)
      ERROR: No matching distribution found for tensorrt_libs==8.6.1
      Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com, https://pypi.nvidia.com
      ERROR: Could not find a version that satisfies the requirement tensorrt_libs==8.6.1 (from versions: 9.0.0.post11.dev1, 9.0.0.post12.dev1, 9.0.1.post11.dev4, 9.0.1.post12.dev4, 9.1.0.post11.dev4, 9.1.0.post12.dev4, 9.2.0.post11.dev5, 9.2.0.post12.dev5, 9.3.0.post11.dev1, 9.3.0.post12.dev1)
      ERROR: No matching distribution found for tensorrt_libs==8.6.1
      Traceback (most recent call last):
        File "/tmp/pip-install-yb_w1k5i/tensorrt_7e4f3d0260464b37877fc72585ffd270/setup.py", line 40, in run_pip_command
          return call_func([sys.executable, "-m", "pip"] + args, env=env)
        File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/usr/bin/python', '-m', 'pip', 'install', '--extra-index-url', 'https://pypi.nvidia.com', 'tensorrt_libs==8.6.1', 'tensorrt_bindings==8.6.1']' returned non-zero exit status 1.

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-yb_w1k5i/tensorrt_7e4f3d0260464b37877fc72585ffd270/setup.py", line 110, in <module>
          setup(
        File "/usr/local/lib/python3.8/dist-packages/setuptools/__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/usr/local/lib/python3.8/dist-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/local/lib/python3.8/dist-packages/wheel/bdist_wheel.py", line 360, in run
          self.run_command("install")
        File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.8/dist-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-yb_w1k5i/tensorrt_7e4f3d0260464b37877fc72585ffd270/setup.py", line 62, in run
          run_pip_command(
        File "/tmp/pip-install-yb_w1k5i/tensorrt_7e4f3d0260464b37877fc72585ffd270/setup.py", line 56, in run_pip_command
          return call_func([pip_path] + args, env=env)
        File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/usr/local/bin/pip', 'install', '--extra-index-url', 'https://pypi.nvidia.com', 'tensorrt_libs==8.6.1', 'tensorrt_bindings==8.6.1']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tensorrt
  Running setup.py clean for tensorrt
Failed to build tensorrt
ERROR: Could not build wheels for tensorrt, which is required to install pyproject.toml-based projects

Any tips for further debugging? Should we rebuild containers for Jetpack 6?

definition of point label

Hello, thank you for your wonderful work! I would like to know the definition of the point labels (0, 1, 2, etc.). What are they? Are they the object class,
or a specific object id to distinguish one object from other segments? Thanks.
