Hello, I am using ModelScope to test image captioning with the code below, but it cannot use the GPU. The log…


Cannot use the GPU when testing image captioning with modelscope (modelscope issue, 8 comments, CLOSED)

xiapeng1110 commented on July 18, 2024

Cannot use the GPU when testing image captioning with modelscope.


Comments (8)

wenmengzhou commented on July 18, 2024

Run torch.cuda.is_available() and check what it returns.
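
A minimal sketch of that check, assuming PyTorch is the framework the pipeline runs on. The helper name `cuda_report` is hypothetical and not part of modelscope; it only reads `torch.__version__` and `torch.version.cuda` and calls `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
def cuda_report():
    """Collect basic facts about the local PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this torch build was compiled against;
        # None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` prints `None`, the installed wheel is a CPU-only build, which would be consistent with a torch version problem.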


xiapeng1110 commented on July 18, 2024

Thanks! It turned out to be a torch version problem.


xiapeng1110 commented on July 18, 2024

Hello, after switching environments torch can now use CUDA, but running the same code as above produces a new problem.

2022-11-14 03:31:30,992 - modelscope - INFO - PyTorch version 1.13.0 Found.
2022-11-14 03:31:30,997 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2022-11-14 03:31:31,018 - modelscope - INFO - Loading done! Current index file version is 1.0.3, with md5 122d6b7767ca662025493ae857fab95e
2022-11-14 03:31:34 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2022-11-14 03:31:34,899 - modelscope - INFO - initiate model from ./ofa_image-caption_coco_large_en
2022-11-14 03:31:34 | INFO | modelscope | initiate model from ./ofa_image-caption_coco_large_en
2022-11-14 03:31:34,900 - modelscope - INFO - initiate model from location ./ofa_image-caption_coco_large_en.
2022-11-14 03:31:34 | INFO | modelscope | initiate model from location ./ofa_image-caption_coco_large_en.
2022-11-14 03:31:34,903 - modelscope - INFO - initialize model from ./ofa_image-caption_coco_large_en
2022-11-14 03:31:34 | INFO | modelscope | initialize model from ./ofa_image-caption_coco_large_en
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/modelscope/utils/registry.py", line 209, in build_from_cfg
    return obj_cls._instantiate(**args)
  File "/usr/local/lib/python3.8/dist-packages/modelscope/models/base/base_model.py", line 61, in _instantiate
    return cls(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/modelscope/models/multi_modal/ofa_for_all_tasks.py", line 44, in __init__
    model = OFAModel.from_pretrained(model_dir)
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2230, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/modelscope/models/multi_modal/ofa/modeling_ofa.py", line 1954, in __init__
    self.encoder = OFAEncoder(config, shared)
  File "/usr/local/lib/python3.8/dist-packages/modelscope/models/multi_modal/ofa/modeling_ofa.py", line 853, in __init__
    self.layernorm_embedding = LayerNorm(embed_dim)
  File "/usr/local/lib/python3.8/dist-packages/modelscope/models/multi_modal/ofa/modeling_ofa.py", line 92, in LayerNorm
    return FusedLayerNorm(normalized_shape, eps, elementwise_affine)
  File "/usr/local/lib/python3.8/dist-packages/apex/normalization/fused_layer_norm.py", line 166, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1101, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.8/dist-packages/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7optionsEv

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/modelscope/utils/registry.py", line 211, in build_from_cfg
    return obj_cls(**args)
  File "/usr/local/lib/python3.8/dist-packages/modelscope/pipelines/multi_modal/image_captioning_pipeline.py", line 32, in __init__
    super().__init__(model=model)
  File "/usr/local/lib/python3.8/dist-packages/modelscope/pipelines/base.py", line 89, in __init__
    self.model = self.initiate_single_model(model)
  File "/usr/local/lib/python3.8/dist-packages/modelscope/pipelines/base.py", line 50, in initiate_single_model
    return Model.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/modelscope/models/base/base_model.py", line 122, in from_pretrained
    model = build_model(
  File "/usr/local/lib/python3.8/dist-packages/modelscope/models/builder.py", line 30, in build_model
    return build_from_cfg(
  File "/usr/local/lib/python3.8/dist-packages/modelscope/utils/registry.py", line 214, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
ImportError: OfaForAllTasks: /usr/local/lib/python3.8/dist-packages/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7optionsEv

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "generate_captions.py", line 18, in <module>
    img_caption_1 = pipeline(Tasks.image_captioning, model=path_1, device='gpu:0') #, device='gpu:5'
  File "/usr/local/lib/python3.8/dist-packages/modelscope/pipelines/builder.py", line 325, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "/usr/local/lib/python3.8/dist-packages/modelscope/pipelines/builder.py", line 242, in build_pipeline
    return build_from_cfg(
  File "/usr/local/lib/python3.8/dist-packages/modelscope/utils/registry.py", line 214, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
ImportError: ImageCaptioningPipeline: OfaForAllTasks: /usr/local/lib/python3.8/dist-packages/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7optionsEv

I hope someone can help. Thank you!


wenmengzhou commented on July 18, 2024

fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7optionsEv

It looks like apex was not installed correctly. Could you try cloning the repository and installing it from source?
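
An undefined-symbol error like this one commonly indicates that the compiled extension was built against a different PyTorch build than the one now installed. A small sketch for checking whether the extension loads at all, before building the whole pipeline, follows. The helper name `try_import_extension` is hypothetical; the module name `fused_layer_norm_cuda` is taken from the traceback above.

```python
import importlib


def try_import_extension(name="fused_layer_norm_cuda"):
    """Return (ok, message) for importing a compiled extension module."""
    try:
        # Compiled torch extensions link against torch's shared libraries,
        # so torch is imported first when it is available.
        importlib.import_module("torch")
    except ImportError:
        pass
    try:
        importlib.import_module(name)
    except ImportError as exc:
        # Covers both a missing module and a shared object that fails to
        # load, for example because of an undefined symbol.
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, message = try_import_extension()
    print("extension loads" if ok else f"extension failed to load: {message}")
```

Running this after each reinstall of apex gives a quick signal on whether the rebuild matched the installed torch, without loading any model weights.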


xiapeng1110 commented on July 18, 2024

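Fixing this required reinstalling apex. When installing apex, the CUDA and cudatoolkit versions must match, so I changed my CUDA version to 11.1, then reinstalled and recompiled apex. When I tested again there was no error, but the log still shows that the GPU cannot be used. The log and the corresponding code are as follows: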

2022-11-15 04:36:49,671 - modelscope - INFO - PyTorch version 1.10.0+cu111 Found.
2022-11-15 04:36:49,780 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2022-11-15 04:36:49,780 - modelscope - INFO - No valid ast index found from /root/.cache/modelscope/ast_indexer, rebuilding ast index!
2022-11-15 04:36:49,789 - modelscope - INFO - AST-Scaning the path "/usr/local/lib/python3.8/dist-packages/modelscope" with the following sub folders ['models', 'metrics', 'pipelines', 'preprocessors', 'trainers', 'msdatasets']
2022-11-15 04:37:19,149 - modelscope - INFO - Scaning done! A number of 427 files indexed! Time consumed 29.360363721847534s
2022-11-15 04:37:19,175 - modelscope - INFO - Loading done! Current index file version is 1.0.3, with md5 122d6b7767ca662025493ae857fab95e
2022-11-15 04:37:23 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2022-11-15 04:37:23,920 - modelscope - INFO - initiate model from ./ofa_image-caption_coco_large_en
2022-11-15 04:37:23 | INFO | modelscope | initiate model from ./ofa_image-caption_coco_large_en
2022-11-15 04:37:23,922 - modelscope - INFO - initiate model from location ./ofa_image-caption_coco_large_en.
2022-11-15 04:37:23 | INFO | modelscope | initiate model from location ./ofa_image-caption_coco_large_en.
2022-11-15 04:37:23,936 - modelscope - INFO - initialize model from ./ofa_image-caption_coco_large_en
2022-11-15 04:37:23 | INFO | modelscope | initialize model from ./ofa_image-caption_coco_large_en
2022-11-15 04:37:36,234 - modelscope - INFO - cuda is not available, using cpu instead.
2022-11-15 04:37:36 | INFO | modelscope | cuda is not available, using cpu instead.
2022-11-15 04:37:36,234 - modelscope - INFO - initialize model from ./ofa_image-caption_coco_large_en
2022-11-15 04:37:36 | INFO | modelscope | initialize model from ./ofa_image-caption_coco_large_en
2022-11-15 04:38:04,987 - modelscope - INFO - cuda is not available, using cpu instead.
2022-11-15 04:38:04 | INFO | modelscope | cuda is not available, using cpu instead.
2022-11-15 04:38:05,131 - modelscope - INFO - initiate model from ./ofa_image-caption_coco_huge_en
2022-11-15 04:38:05 | INFO | modelscope | initiate model from ./ofa_image-caption_coco_huge_en
2022-11-15 04:38:05,133 - modelscope - INFO - initiate model from location ./ofa_image-caption_coco_huge_en.
2022-11-15 04:38:05 | INFO | modelscope | initiate model from location ./ofa_image-caption_coco_huge_en.
2022-11-15 04:38:05,138 - modelscope - INFO - initialize model from ./ofa_image-caption_coco_huge_en
2022-11-15 04:38:05 | INFO | modelscope | initialize model from ./ofa_image-caption_coco_huge_en
2022-11-15 04:38:39,875 - modelscope - INFO - cuda is not available, using cpu instead.
2022-11-15 04:38:39 | INFO | modelscope | cuda is not available, using cpu instead.
2022-11-15 04:38:39,877 - modelscope - INFO - initialize model from ./ofa_image-caption_coco_huge_en
2022-11-15 04:38:39 | INFO | modelscope | initialize model from ./ofa_image-caption_coco_huge_en
2022-11-15 04:40:18,908 - modelscope - INFO - cuda is not available, using cpu instead.
2022-11-15 04:40:18 | INFO | modelscope | cuda is not available, using cpu instead.

import os

from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Local directories holding the two OFA image-captioning checkpoints.
path_1 = './ofa_image-caption_coco_large_en'
path_2 = './ofa_image-caption_coco_huge_en'

# Directory of images to caption.
img_dir = './val2014/'

# Build one captioning pipeline per model, both requested on GPU 5.
img_caption_1 = pipeline(Tasks.image_captioning, model=path_1, device='gpu:5')
img_caption_2 = pipeline(Tasks.image_captioning, model=path_2, device='gpu:5')

list_caption_1 = []
list_caption_2 = []

# Caption every image with both models, one image at a time.
for name in os.listdir(img_dir):
    img = os.path.join(img_dir, name)
    list_caption_1.append([img, img_caption_1(img)[OutputKeys.CAPTION][0]])
    list_caption_2.append([img, img_caption_2(img)[OutputKeys.CAPTION][0]])

# Write one "<image path> <caption> " line per image for the first model.
with open('./1.txt', encoding='utf-8', mode='w') as f1:
    for i in list_caption_1:
        for j in i:
            f1.write(str(j) + ' ')
        f1.write('\n')

# Same output format for the second model.
with open('./2.txt', encoding='utf-8', mode='w') as f2:
    for i in list_caption_2:
        for j in i:
            f2.write(str(j) + ' ')
        f2.write('\n')
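
The thread does not record what finally resolved the "cuda is not available" message. As one diagnostic sketch under that caveat, the requested device string can be compared with the number of GPUs PyTorch actually sees. The helper `check_device_request` is hypothetical and does not reflect how modelscope itself parses `device`; it only assumes the `gpu:<index>` form used in the script above.

```python
def check_device_request(device, visible_gpu_count):
    """Return (ok, reason) for a 'gpu:<index>' style device string."""
    kind, _, index = device.partition(":")
    if kind != "gpu":
        return False, f"'{device}' does not request a GPU"
    idx = int(index) if index else 0
    if visible_gpu_count == 0:
        return False, "no GPU is visible to this process"
    if idx >= visible_gpu_count:
        return False, (
            f"index {idx} is out of range for "
            f"{visible_gpu_count} visible GPU(s)"
        )
    return True, f"GPU {idx} is within the visible range"


if __name__ == "__main__":
    try:
        import torch
        count = torch.cuda.device_count()
    except ImportError:
        # Without PyTorch there is nothing to count.
        count = 0
    print(check_device_request("gpu:5", count))
```

Note that `torch.cuda.device_count()` reflects any `CUDA_VISIBLE_DEVICES` setting, so a machine with eight physical GPUs can still report fewer.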


xiapeng1110 commented on July 18, 2024

It is solved now, thank you. One more question: does the pipeline support batch input? Running inference on one sample at a time is slow.


zhangyichang commented on July 18, 2024

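It is solved now, thank you. One more question: does the pipeline support batch input? Running inference on one sample at a time is slow.

We will release an interface that supports batch input soon, so please stay tuned. If you want to modify it yourself, you can refer to how the ofa trainer uses the dataset.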


7carry7 commented on July 18, 2024

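modelscope - INFO - cuda is not available, using cpu instead.
When I use a modelscope model for OCR, TensorFlow can access the cpu (tf is 1.15.0, paired with cuda 10.0), but the message above appears when modelscope is loaded. Why is that? Do I need to install a different CUDA version? I could not find any documentation online on which CUDA versions correspond to modelscope.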

