
Comments (14)

ChaoningZhang commented on August 29, 2024

Even after adding device = "cuda"; mobile_sam.to(device=device) in the code, MobileSAM takes half the time of SAM, which is quite different from the speed claimed in the paper, and is much slower than FastSAM. I don't know what the problem is.

SAM time: 2.2856764793395996 s
150 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
LR SCALES: [0.08589934592000005, 0.10737418240000006, 0.13421772800000006, 0.1677721600000001, 0.20971520000000007, 0.2621440000000001, 0.3276800000000001, 0.4096000000000001, 0.5120000000000001, 0.6400000000000001, 0.8, 1.0]
MobileSAM time: 1.4033191204071045 s
97 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])

May I ask whether you chose anything mode or everything mode?

I re-read the paper and the code, and I'm running in 'segment everything' mode. FastSAM took 0.0546329 seconds; MobileSAM took 1.4033191 seconds.

Thanks for your interest in our work. Note that MobileSAM makes the image encoder lightweight without changing the decoder (roughly 8 ms for the encoder and 4 ms for the decoder). We mainly target anything mode (one encoder pass and one decoder pass) rather than everything mode (one encoder pass and 32x32 decoder passes); see the paper for the definitions (anything mode is the foundation task, while everything mode is just a downstream task, as indicated in the original SAM paper). For everything mode, even though our encoder is much faster than that of the original SAM (which takes close to 500 ms), it cannot save much time for the whole pipeline, since most of the time is spent on the 32x32 decoder passes. One way to mitigate this is to use a smaller grid (such as 10x10 or 5x5) so the decoder consumes less time, since many redundant masks are generated with a 32x32 grid; see the sketch below. I hope this addresses your issues; otherwise, please kindly let us know. We are also currently trying to make the mask decoder more lightweight by distilling it into a smaller one, as we did for the image encoder. Stay tuned for our progress. If you have more issues, please kindly let us know; we may not be able to respond in a timely manner, but we will try our best.
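
The smaller-grid suggestion maps directly onto the points_per_side parameter of the official segment_anything SamAutomaticMaskGenerator. A minimal sketch, assuming mobile_sam has already been loaded and moved to cuda as in the snippets below:

    # Shrink the prompt grid from the default 32x32 to 10x10, so
    # everything mode runs 100 decoder passes instead of 1024.
    from segment_anything import SamAutomaticMaskGenerator

    mask_generator = SamAutomaticMaskGenerator(mobile_sam, points_per_side=10)
    masks = mask_generator.generate(img)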


fujianhai commented on August 29, 2024

This work is really great: inference for a single point prompt takes a bit over 10 ms, but a full image is not much faster. On our GPU, a full image still takes about 2~3 s. After all, the decoder network has not changed, so the full-image (everything mode) time cannot be significantly improved.


garbe-github-support commented on August 29, 2024

Me too, it's even slower than SAM.
My code:

import cv2
import numpy as np
import matplotlib.pyplot as plt
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def show_anns(anns):
    # Overlay each mask on the current axes, largest first, in a random color.
    if len(anns) == 0:
        return
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)

    img = np.ones((sorted_anns[0]['segmentation'].shape[0], sorted_anns[0]['segmentation'].shape[1], 4))
    img[:, :, 3] = 0  # fully transparent background

    for ann in sorted_anns:
        m = ann['segmentation']  # boolean mask
        color_mask = np.concatenate([np.random.random(3), [1]])  # random RGB, opaque
        img[m] = color_mask

    ax.imshow(img)

def runSam(path):
    sam = sam_model_registry["vit_h"](checkpoint=r"E:\model_dataset\sam_vit_h_4b8939.pth")
    device = "cuda"
    sam.to(device)
    mask_generator = SamAutomaticMaskGenerator(sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    masks = mask_generator.generate(img)

    return masks, img

def runMobileSam(path):
    from mobile_encoder.setup_mobile_sam import setup_model
    checkpoint = torch.load(r'D:\tools\MobileSAM\weights\mobile_sam.pt')
    mobile_sam = setup_model()
    mobile_sam.load_state_dict(checkpoint, strict=True)

    mask_generator = SamAutomaticMaskGenerator(mobile_sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    masks = mask_generator.generate(img)
    return masks, img


def showRet(masks, img):
    print(len(masks))
    print(masks[0].keys())

    plt.figure(figsize=(20, 20))
    plt.imshow(img)
    show_anns(masks)
    plt.axis('off')
    plt.show()


if __name__ == '__main__':
    path = r'C:\Users\Admin\Desktop\test_img\2033CD8A29F6C011006F8452C53A4D89.jpg'
    masks, img = runSam(path)
    # masks, img = runMobileSam(path)
    showRet(masks, img)

My environment: Windows, PyTorch 2.0.1, CUDA 11.7, RTX 4070.


newcoder0531 commented on August 29, 2024

Me too, it's even slower than SAM. My code: (code and environment quoted above)

It seems that your MobileSAM does not use CUDA, but SAM does.
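
The fix is simply to move the MobileSAM model to the GPU before building the mask generator, mirroring what runSam already does for SAM. A minimal sketch based on the code above (the mobile_sam.eval() call is a standard extra step, not shown in the thread):

    # Move MobileSAM to the GPU, as runSam does for SAM; without this the
    # mobile encoder runs on CPU and the speed comparison is unfair.
    device = "cuda"
    mobile_sam.to(device=device)
    mobile_sam.eval()
    mask_generator = SamAutomaticMaskGenerator(mobile_sam)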


ChaoningZhang commented on August 29, 2024

It took 3000 ms on my computer; I don't know what is wrong.

Without more details, it is difficult for us to help you debug.


garbe-github-support commented on August 29, 2024

Me too, it's even slower than SAM. My code: (code and environment quoted above)

It seems that your MobileSAM does not use CUDA, but SAM does.

Thank you, you are right. Now my time is half that of SAM.


chenzx2 commented on August 29, 2024

It took 3000 ms on my computer; I don't know what is wrong.

Without more details, it is difficult for us to help you debug.

Here is my code; my environment: Ubuntu 18, torch 2.0.0+cu117.
[code screenshot attached: 企业微信截图_16880289877752]


ChaoningZhang commented on August 29, 2024

Me too, it's even slower than SAM. My code: (code and environment quoted above)

It seems that your MobileSAM does not use CUDA, but SAM does.

Thank you, you are right. Now my time is half that of SAM.

It seems that your issues are addressed. Thanks for your interest in our work.


ChaoningZhang commented on August 29, 2024

It took 3000 ms on my computer; I don't know what is wrong.

Without more details, it is difficult for us to help you debug.

Here is my code; my environment: Ubuntu 18, torch 2.0.0+cu117. (code screenshot quoted above)

May I ask whether you chose anything mode or everything mode?


SongYii commented on August 29, 2024

Even after adding the following in the code:

    device = "cuda"
    mobile_sam.to(device=device)

MobileSAM takes half the time of SAM, which is quite different from the speed claimed in the paper, and is much slower than FastSAM. I don't know what the problem is.

SAM time: 2.2856764793395996 s
150 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
LR SCALES: [0.08589934592000005, 0.10737418240000006, 0.13421772800000006, 0.1677721600000001, 0.20971520000000007, 0.2621440000000001, 0.3276800000000001, 0.4096000000000001, 0.5120000000000001, 0.6400000000000001, 0.8, 1.0]
MobileSAM time: 1.4033191204071045 s
97 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
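
One caveat with wall-clock numbers like these: CUDA kernels launch asynchronously, so a fair measurement should synchronize the device around the timed call and discard a warm-up run. A minimal sketch, assuming mask_generator and img from the snippets in this thread:

    import time
    import torch

    mask_generator.generate(img)   # warm-up run (CUDA init, lazy allocations)
    torch.cuda.synchronize()       # wait for all pending GPU work
    start = time.time()
    masks = mask_generator.generate(img)
    torch.cuda.synchronize()       # make sure all kernels have finished
    print(f"time: {time.time() - start:.4f} s, {len(masks)} masks")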


chenzx2 commented on August 29, 2024

FastSAM is fine; I ran the notebook code.


ChaoningZhang commented on August 29, 2024

Even after adding device = "cuda"; mobile_sam.to(device=device), MobileSAM takes half the time of SAM and is much slower than FastSAM. (full timings quoted above)

May I ask whether you chose anything mode or everything mode?


SongYii commented on August 29, 2024

Even after adding device = "cuda"; mobile_sam.to(device=device), MobileSAM takes half the time of SAM and is much slower than FastSAM. (full timings quoted above)

May I ask whether you chose anything mode or everything mode?

I re-read the paper and the code, and I'm running in 'segment everything' mode. FastSAM took 0.0546329 seconds; MobileSAM took 1.4033191 seconds.


ChaoningZhang commented on August 29, 2024

This work is really great: inference for a single point prompt takes a bit over 10 ms, but a full image is not much faster. On our GPU, a full image still takes about 2~3 s. After all, the decoder network has not changed, so the full-image (everything mode) time cannot be significantly improved.

Thanks for your interest in our work. Please check our replies to others on how to mitigate this issue. Yet another way to speed it up on GPU is to run batch inference in the decoder over the 32x32 grid of prompt points. You are welcome to implement it and open a pull request here if you complete it. We will also implement it ourselves, but it may take a while.
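
A rough sketch of the batched-decoder idea, assuming the official segment_anything SamPredictor API (predict_torch accepts prompts batched as (B, N, 2)); the grid size and chunk size are illustrative:

    # One encoder pass, then the decoder on batches of point prompts
    # instead of one point at a time.
    import torch
    from segment_anything import SamPredictor

    predictor = SamPredictor(mobile_sam)   # model already loaded and on cuda
    predictor.set_image(img)               # single image-encoder pass

    side = 32                              # 32x32 grid, as in everything mode
    ys, xs = torch.meshgrid(
        torch.linspace(0, img.shape[0] - 1, side),
        torch.linspace(0, img.shape[1] - 1, side),
        indexing="ij",
    )
    points = torch.stack([xs.reshape(-1), ys.reshape(-1)], dim=-1)  # (1024, 2), (x, y)

    all_masks = []
    for chunk in points.split(256):        # 256 prompts per decoder pass
        coords = chunk[:, None, :].to(predictor.device)              # (B, 1, 2)
        labels = torch.ones(coords.shape[:2], dtype=torch.int,
                            device=predictor.device)                 # foreground labels
        coords = predictor.transform.apply_coords_torch(coords, img.shape[:2])
        masks, scores, _ = predictor.predict_torch(
            point_coords=coords, point_labels=labels, multimask_output=True,
        )
        all_masks.append(masks)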


