I observed that in the is_fp16_supported function in cuda_accelerator.py, the method t


Questions about whether DeepSpeed supports fp16

guoyunqingyue commented on September 20, 2024

Questions about whether DeepSpeed supports fp16


Comments (1)

benprofessionaledition commented on September 20, 2024

Hi @guoyunqingyue - Pascal (compute capability 6.x) predates Tensor Cores (fp16 ops with fp32 accumulate), which first appear in CC 7.0 (Volta). So while you can, in theory, do fp16 math on CC 6.x, there isn't really hardware support for it on consumer Pascal parts: fp16 runs at only 1/64 the rate of fp32 on a GP104, and the same holds for the GP102 in your 1080 Ti (CC 6.1). The bigger problem for DeepSpeed and similar libraries is that fp16 on these chips uses different CUDA instructions (and data types) than the Tensor Core math in CC 7.0+.

I would double-check your sources on 1080 Ti support for fp16 and see if there's some fine print. These are NVIDIA's own words on fp16 throughput in Pascal (GP100 = Tesla P100; GP104 = GTX 1080 and Tesla P4; the 1080 Ti uses the closely related GP102):

GP100, designed with training deep neural networks in mind, provides FP16 throughput up to 2x that of FP32 arithmetic. On GP104, FP16 throughput is lower, 1/64th that of FP32. However, compensating for reduced FP16 throughput, GP104 provides additional high-throughput INT8 support not available in GP100.
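If it helps, here is a minimal sketch of how you could classify a card by compute capability along the lines described above. This is not DeepSpeed's `is_fp16_supported` logic; the function names and tier labels are made up for illustration, and the table only covers the parts mentioned in this thread. The only library call it relies on is `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple.

```python
from typing import Tuple

# Assumption: thresholds come from the discussion in this thread,
# not from DeepSpeed's source code.
TENSOR_CORE_MIN_CC = (7, 0)  # Volta and newer


def fp16_tier(cc: Tuple[int, int]) -> str:
    """Classify fp16 support for a (major, minor) compute capability.

    The tier names are hypothetical labels used only in this sketch.
    """
    if cc >= TENSOR_CORE_MIN_CC:
        return "tensor-core"      # fp16 multiply with fp32 accumulate
    if cc == (6, 0):
        return "native-2x"        # GP100: fp16 up to 2x fp32 throughput
    if cc == (6, 1):
        return "slow-1/64"        # GP102/GP104: fp16 at 1/64 of fp32
    return "unclassified"         # not covered by this thread


def current_device_tier(device_index: int = 0) -> str:
    """Look up the tier for a local GPU (requires PyTorch with CUDA)."""
    import torch  # imported lazily so fp16_tier works without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA device visible to PyTorch")
    return fp16_tier(torch.cuda.get_device_capability(device_index))
```

On a 1080 Ti, `current_device_tier()` should fall into the `"slow-1/64"` branch, since that card reports compute capability 6.1.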
