Hi @guoyunqingyue, Pascal (compute capability 6.x) predates Tensor Cores (fp16 matrix math with fp32 accumulate), which arrived with CC 7.0 (Volta). So yes, you can in theory do fp16 math on CC 6.x, but the consumer Pascal parts don't really have hardware support for it: on a CC 6.1 chip like the GP102 in your 1080Ti, fp16 arithmetic runs at only 1/64 the fp32 rate. The bigger problem for DeepSpeed etc. is that fp16 on Pascal uses different CUDA instructions (and data types) than the Tensor Core math in CC 7.0+ (Volta and later).
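If you want to guard against this in a training script, here's a rough sketch that reads the GPU's compute capability and only turns on fp16 when Tensor Cores are plausibly present. The function names are made up for illustration, and the CC 7.0 cutoff is a heuristic from the reasoning above, not a rule DeepSpeed enforces. The `{"fp16": {"enabled": ...}}` shape mirrors the fp16 section of a DeepSpeed config.

```python
from typing import Tuple


def may_have_tensor_cores(capability: Tuple[int, int]) -> bool:
    """Tensor Cores first appeared at compute capability 7.0 (Volta).

    CC >= 7.0 is necessary but not sufficient: e.g. GTX 16xx Turing cards
    report CC 7.5 but ship without Tensor Cores.
    """
    major, _minor = capability
    return major >= 7


def fp16_config_fragment(capability: Tuple[int, int]) -> dict:
    """Hypothetical helper: a DeepSpeed-style fp16 section that enables
    fp16 only when the device may have Tensor Cores."""
    return {"fp16": {"enabled": may_have_tensor_cores(capability)}}


if __name__ == "__main__":
    try:
        import torch

        if torch.cuda.is_available():
            # Returns (major, minor), e.g. (6, 1) for a GTX 1080 Ti.
            cap = torch.cuda.get_device_capability(0)
            print(torch.cuda.get_device_name(0), cap, fp16_config_fragment(cap))
        else:
            print("No CUDA device visible")
    except ImportError:
        print("PyTorch not installed")
```

On a 1080Ti this would report `(6, 1)` and leave fp16 disabled; a V100 reports `(7, 0)` and would enable it.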
I would double-check your sources on 1080Ti fp16 support and look for fine print. These are NVIDIA's own words on fp16 support in Pascal (GP100 is the Tesla P100; GP104 is the GTX 1080 and Tesla P4; the 1080Ti uses GP102, which, like GP104, is a CC 6.1 part with the same reduced fp16 rate):
GP100, designed with training deep neural networks in mind, provides FP16 throughput up to 2x that of FP32 arithmetic. On GP104, FP16 throughput is lower, 1/64th that of FP32. However, compensating for reduced FP16 throughput, GP104 provides additional high-throughput INT8 support not available in GP100.
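To put those two ratios side by side, here's a back-of-the-envelope sketch that scales an fp32 peak by each chip's fp16:fp32 ratio from the quote. The 10 TFLOPS fp32 figure is hypothetical, picked only to keep the arithmetic readable; it isn't the spec of any particular card, and real throughput also depends on memory bandwidth and kernel efficiency.

```python
from fractions import Fraction

# fp16:fp32 throughput ratios as stated in the NVIDIA quote above.
FP16_TO_FP32_RATIO = {
    "GP100": Fraction(2, 1),   # "up to 2x that of FP32"
    "GP104": Fraction(1, 64),  # "1/64th that of FP32"
}


def fp16_peak(fp32_peak_tflops: float, chip: str) -> float:
    """Scale an fp32 peak by the chip's fp16:fp32 ratio (upper-bound estimate)."""
    return float(fp32_peak_tflops * FP16_TO_FP32_RATIO[chip])


# Hypothetical 10 TFLOPS fp32 peak, for illustration only.
for chip in FP16_TO_FP32_RATIO:
    print(chip, fp16_peak(10.0, chip))
```

With the same fp32 peak, the GP100 estimate comes out 128x higher than the GP104 one (20.0 vs 0.15625 TFLOPS), which is why fp16 training on a GP104/GP102 card is a non-starter.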