Describe the issue: MoE unit tests fail on older architectures.


[Build] MoE related unit tests fail for older architectures such as Pascal when building dev debug

yuslepukhin commented on July 18, 2024

[Build] MoE related unit tests fail for older architectures such as Pascal when building dev debug

from onnxruntime.

Comments (4)

tianleiwu commented on July 18, 2024

@wangyems, in MoeGemmRunner::dispatch_to_arch, I see that it only dispatches for SM 70 to 89. We should skip the MoE tests on other GPUs (SM < 70 and SM >= 90).

BTW, could you test MoE on a V100 to see whether it runs on SM 70? I saw that some features require SM >= 80:

onnxruntime/onnxruntime/contrib_ops/cuda/moe/cutlass_extensions/gemm/warp/mma_tensorop_compute_B_with_f16.h

Line 135 in 0996d6e

"MmaTensorOpCvtBToA only supports Fp16 A or Bf16 A on Ampere+");

I think the requirements are:
float: SM 70 to 89 (in theory it should support all GPUs; the range is only a limitation of MoeGemmRunner::dispatch_to_arch)
float16: SM 70 to 89
bfloat16: SM 80 to 89
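
The ranges above could be turned into a skip check for the tests. The following is only a minimal sketch: `MoeDataType` and `ShouldSkipMoeTest` are hypothetical names that do not exist in onnxruntime, and the bounds are copied from the list above rather than read from MoeGemmRunner::dispatch_to_arch. It covers only the three types listed, not the quantized tests.

```cpp
#include <cassert>

// Hypothetical enum for the three input types listed above.
enum class MoeDataType { kFloat, kFloat16, kBFloat16 };

// Returns true when a MoE test should be skipped on a device whose compute
// capability is `sm` (major * 10 + minor, e.g. 70 for V100, 61 for Pascal).
// Assumption: the supported ranges are float/float16 on SM 70..89 and
// bfloat16 on SM 80..89, as stated in the comment above.
inline bool ShouldSkipMoeTest(int sm, MoeDataType dtype) {
  assert(sm > 0);             // a real device never reports SM <= 0
  constexpr int kMaxSm = 89;  // upper end of the dispatched range
  const int min_sm = (dtype == MoeDataType::kBFloat16) ? 80 : 70;
  return sm < min_sm || sm > kMaxSm;
}
```

In a test, `sm` would typically come from the `major` and `minor` fields returned by `cudaGetDeviceProperties`, and a true result could lead to a `GTEST_SKIP()` instead of a failure.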


snnn commented on July 18, 2024

I tested it on V100 on Linux with CUDA 11.8.

[==========] 4 tests from 1 test suite ran. (10758 ms total)
[  PASSED  ] 3 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] MoETest.QMoETest_Mixtral_Int4


snnn commented on July 18, 2024

Our Linux training GPU machine pools use V100. Why didn't they catch this error?


wangyems commented on July 18, 2024

checking..
