
Comments (14)

ChaoningZhang commented on August 29, 2024

Even after adding device = "cuda"; mobile_sam.to(device=device) in the code, MobileSAM takes half the time of SAM, which is quite different from the speed claimed in the paper, and is much slower than FastSAM. I don't know what the problem is.

SAM time: 2.2856764793395996 s
150 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
LR SCALES: [0.08589934592000005, 0.10737418240000006, 0.13421772800000006, 0.1677721600000001, 0.20971520000000007, 0.2621440000000001, 0.3276800000000001, 0.4096000000000001, 0.5120000000000001, 0.6400000000000001, 0.8, 1.0]
MobileSAM time: 1.4033191204071045 s
97 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])

May I ask whether you chose anything mode or everything mode?

I re-read the paper and the code, and I'm running in 'segment everything' mode. FastSAM took 0.0546329 seconds; MobileSAM took 1.4033191 seconds.

Thanks for your interest in our work. Note that MobileSAM makes the image encoder lightweight without changing the decoder (roughly 8 ms for the encoder and 4 ms for the decoder). We mainly target anything mode (one encoder pass and one decoder pass) rather than everything mode (one encoder pass and 32x32 decoder passes); see the paper for the definitions (anything mode is the foundation task, while everything mode is just a downstream task, as indicated in the original SAM paper). For everything mode, even though our encoder is much faster than that of the original SAM (which takes close to 500 ms), it cannot save much time for the whole pipeline, since most of the time is spent on the 32x32 decoder passes. One way to mitigate this is to use a smaller grid (such as 10x10 or 5x5) so the decoder consumes less time, since many redundant masks are generated with a 32x32 grid; see the sketch below. I hope this addresses your issues; otherwise, please kindly let us know. We are also currently trying to make the mask decoder more lightweight by distilling it into a smaller one, as we did for the image encoder. Stay tuned for our progress. If you have more issues, please kindly let us know; we may not be able to respond in a timely manner, but we will try our best.
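
The smaller-grid suggestion maps directly onto the points_per_side parameter of the official segment_anything SamAutomaticMaskGenerator. A minimal sketch, assuming mobile_sam has already been loaded and moved to cuda as in the snippets below:

    # Shrink the prompt grid from the default 32x32 to 10x10, so
    # everything mode runs 100 decoder passes instead of 1024.
    from segment_anything import SamAutomaticMaskGenerator

    mask_generator = SamAutomaticMaskGenerator(mobile_sam, points_per_side=10)
    masks = mask_generator.generate(img)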


fujianhai commented on August 29, 2024

This work is really great: inference for a single point prompt takes a bit over 10 ms, but a full image is not much faster. On our GPU, a full image still takes about 2~3 s. After all, the decoder network has not changed, so the full-image (everything mode) time cannot be significantly improved.


garbe-github-support commented on August 29, 2024

Me too, it's even slower than SAM.
My code:

import cv2
import numpy as np
import matplotlib.pyplot as plt
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def show_anns(anns):
    # Overlay each mask on the current axes, largest first, in a random color.
    if len(anns) == 0:
        return
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)

    img = np.ones((sorted_anns[0]['segmentation'].shape[0], sorted_anns[0]['segmentation'].shape[1], 4))
    img[:, :, 3] = 0  # fully transparent background

    for ann in sorted_anns:
        m = ann['segmentation']  # boolean mask
        color_mask = np.concatenate([np.random.random(3), [1]])  # random RGB, opaque
        img[m] = color_mask

    ax.imshow(img)

def runSam(path):
    sam = sam_model_registry["vit_h"](checkpoint=r"E:\model_dataset\sam_vit_h_4b8939.pth")
    device = "cuda"
    sam.to(device)
    mask_generator = SamAutomaticMaskGenerator(sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    masks = mask_generator.generate(img)

    return masks, img

def runMobileSam(path):
    from mobile_encoder.setup_mobile_sam import setup_model
    checkpoint = torch.load(r'D:\tools\MobileSAM\weights\mobile_sam.pt')
    mobile_sam = setup_model()
    mobile_sam.load_state_dict(checkpoint, strict=True)

    mask_generator = SamAutomaticMaskGenerator(mobile_sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    masks = mask_generator.generate(img)
    return masks, img


def showRet(masks, img):
    print(len(masks))
    print(masks[0].keys())

    plt.figure(figsize=(20, 20))
    plt.imshow(img)
    show_anns(masks)
    plt.axis('off')
    plt.show()


if __name__ == '__main__':
    path = r'C:\Users\Admin\Desktop\test_img\2033CD8A29F6C011006F8452C53A4D89.jpg'
    masks, img = runSam(path)
    # masks, img = runMobileSam(path)
    showRet(masks, img)

My environment: Windows, PyTorch 2.0.1, CUDA 11.7, RTX 4070.


newcoder0531 commented on August 29, 2024

Me too, it's even slower than SAM. My code: (code and environment quoted above)

It seems that your MobileSAM does not use CUDA, but SAM does.
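
The fix is simply to move the MobileSAM model to the GPU before building the mask generator, mirroring what runSam already does for SAM. A minimal sketch based on the code above (the mobile_sam.eval() call is a standard extra step, not shown in the thread):

    # Move MobileSAM to the GPU, as runSam does for SAM; without this the
    # mobile encoder runs on CPU and the speed comparison is unfair.
    device = "cuda"
    mobile_sam.to(device=device)
    mobile_sam.eval()
    mask_generator = SamAutomaticMaskGenerator(mobile_sam)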


ChaoningZhang commented on August 29, 2024

It took 3000 ms on my computer; I don't know what is wrong.

Without more details, it is difficult for us to help you debug.


garbe-github-support commented on August 29, 2024

Me too, it's even slower than SAM. My code: (code and environment quoted above)

It seems that your MobileSAM does not use CUDA, but SAM does.

Thank you, you are right. Now my time is half that of SAM.


chenzx2 commented on August 29, 2024

It took 3000 ms on my computer; I don't know what is wrong.

Without more details, it is difficult for us to help you debug.

Here is my code; my environment: Ubuntu 18, torch 2.0.0+cu117.
[code screenshot attached: 企业微信截图_16880289877752]


ChaoningZhang commented on August 29, 2024

Me too, it's even slower than SAM. My code: (code and environment quoted above)

It seems that your MobileSAM does not use CUDA, but SAM does.

Thank you, you are right. Now my time is half that of SAM.

It seems that your issues are addressed. Thanks for your interest in our work.


ChaoningZhang commented on August 29, 2024

It took 3000 ms on my computer; I don't know what is wrong.

Without more details, it is difficult for us to help you debug.

Here is my code; my environment: Ubuntu 18, torch 2.0.0+cu117. (code screenshot quoted above)

May I ask whether you chose anything mode or everything mode?


SongYii commented on August 29, 2024

Even after adding the following in the code:

    device = "cuda"
    mobile_sam.to(device=device)

MobileSAM takes half the time of SAM, which is quite different from the speed claimed in the paper, and is much slower than FastSAM. I don't know what the problem is.

SAM time: 2.2856764793395996 s
150 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
LR SCALES: [0.08589934592000005, 0.10737418240000006, 0.13421772800000006, 0.1677721600000001, 0.20971520000000007, 0.2621440000000001, 0.3276800000000001, 0.4096000000000001, 0.5120000000000001, 0.6400000000000001, 0.8, 1.0]
MobileSAM time: 1.4033191204071045 s
97 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
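
One caveat with wall-clock numbers like these: CUDA kernels launch asynchronously, so a fair measurement should synchronize the device around the timed call and discard a warm-up run. A minimal sketch, assuming mask_generator and img from the snippets in this thread:

    import time
    import torch

    mask_generator.generate(img)   # warm-up run (CUDA init, lazy allocations)
    torch.cuda.synchronize()       # wait for all pending GPU work
    start = time.time()
    masks = mask_generator.generate(img)
    torch.cuda.synchronize()       # make sure all kernels have finished
    print(f"time: {time.time() - start:.4f} s, {len(masks)} masks")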


chenzx2 commented on August 29, 2024

FastSAM is fine; I ran the notebook code.


ChaoningZhang commented on August 29, 2024

Even after adding device = "cuda"; mobile_sam.to(device=device), MobileSAM takes half the time of SAM and is much slower than FastSAM. (full timings quoted above)

May I ask whether you chose anything mode or everything mode?


SongYii commented on August 29, 2024

Even after adding device = "cuda"; mobile_sam.to(device=device), MobileSAM takes half the time of SAM and is much slower than FastSAM. (full timings quoted above)

May I ask whether you chose anything mode or everything mode?

I re-read the paper and the code, and I'm running in 'segment everything' mode. FastSAM took 0.0546329 seconds; MobileSAM took 1.4033191 seconds.


ChaoningZhang commented on August 29, 2024

This work is really great: inference for a single point prompt takes a bit over 10 ms, but a full image is not much faster. On our GPU, a full image still takes about 2~3 s. After all, the decoder network has not changed, so the full-image (everything mode) time cannot be significantly improved.

Thanks for your interest in our work. Please check our replies to others on how to mitigate this issue. Yet another way to speed it up on GPU is to run batch inference in the decoder over the 32x32 grid of prompt points. You are welcome to implement it and open a pull request here if you complete it. We will also implement it ourselves, but it may take a while.
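
A rough sketch of the batched-decoder idea, assuming the official segment_anything SamPredictor API (predict_torch accepts prompts batched as (B, N, 2)); the grid size and chunk size are illustrative:

    # One encoder pass, then the decoder on batches of point prompts
    # instead of one point at a time.
    import torch
    from segment_anything import SamPredictor

    predictor = SamPredictor(mobile_sam)   # model already loaded and on cuda
    predictor.set_image(img)               # single image-encoder pass

    side = 32                              # 32x32 grid, as in everything mode
    ys, xs = torch.meshgrid(
        torch.linspace(0, img.shape[0] - 1, side),
        torch.linspace(0, img.shape[1] - 1, side),
        indexing="ij",
    )
    points = torch.stack([xs.reshape(-1), ys.reshape(-1)], dim=-1)  # (1024, 2), (x, y)

    all_masks = []
    for chunk in points.split(256):        # 256 prompts per decoder pass
        coords = chunk[:, None, :].to(predictor.device)              # (B, 1, 2)
        labels = torch.ones(coords.shape[:2], dtype=torch.int,
                            device=predictor.device)                 # foreground labels
        coords = predictor.transform.apply_coords_torch(coords, img.shape[:2])
        masks, scores, _ = predictor.predict_torch(
            point_coords=coords, point_labels=labels, multimask_output=True,
        )
        all_masks.append(masks)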


