Comments (2)
I think the distinction here is "supported by software" (i.e. emulation) vs. "supported by hardware". `torch.cuda.is_bf16_supported()` returning False tells you that your GPU hardware has no native bf16 instructions, but software can easily emulate some bf16 operations by shifting the input values to the left and then running the computation in float32; it will just be slower.

Thanks for your answer! I tried bfloat16 mixed-precision training on a V100 GPU, and the time cost is almost the same as full fp32 training (even a little slower).
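The "shift left, compute in float32" emulation works because bfloat16 is simply the top 16 bits of an IEEE-754 float32 (same sign bit and 8-bit exponent, with the mantissa truncated to 7 bits). A minimal pure-Python sketch of that bit-level relationship, using only the standard `struct` module (the helper names here are illustrative, not a PyTorch API):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Truncate a float32 to its top 16 bits: bfloat16 storage."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_f32(b: int) -> float:
    """Emulate a bf16 load: shift the 16 stored bits left back into a float32."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# 1.0 survives the round trip exactly (its mantissa fits in 7 bits).
assert bf16_bits_to_f32(f32_to_bf16_bits(1.0)) == 1.0

# 3.14159 loses low mantissa bits: the round trip is close, not exact.
approx = bf16_bits_to_f32(f32_to_bf16_bits(3.14159))
assert abs(approx - 3.14159) < 0.01
```

This also illustrates why the V100 result above is unsurprising: the emulated path still does all arithmetic in float32, so bf16 autocast on hardware without native bf16 units saves memory bandwidth at best and adds conversion overhead, rather than speeding up compute.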
Related Issues (20)
- [CUDA][Complex] `test_reference_numerics_large_jiterator_unary_cuda_complex64` broken after updating to `numpy >= 1.25.0` HOT 3
- DISABLED test_unused_output (__main__.TestAutogradWithCompiledAutograd) HOT 1
- DISABLED test_type_conversions (__main__.TestAutogradWithCompiledAutograd) HOT 1
- [NT] Implementing Multi-Head Attention with NestedTensors HOT 1
- torch.inference_mode documentation not available HOT 1
- MaxPool2D memory leakage on device MPS HOT 2
- DISABLED test_perfect_match_on_sequence_and_bool_attributes (__main__.TestFxToOnnx) HOT 2
- DISABLED test_inplace_grad_update (__main__.TestCompiledAutograd) HOT 1
- DISABLED test_var_mean_differentiable (__main__.TestAutogradWithCompiledAutograd) HOT 3
- torch.uniform_() is single-threaded on CPU HOT 1
- Strange behavior of randint using device=cuda
- DISABLED test_issue106555 (__main__.TestCompiledAutograd) HOT 2
- DISABLED test_variable_traverse (__main__.TestAutogradWithCompiledAutograd) HOT 3
- ROCm: `fatal error: aotriton/flash.h: No such file or directory` when building with `USE_ROCM=1`
- torch.Library can easily cause segfault on loading/unloading HOT 1
- [Inductor] [Distributed] DDP torch.compile model hangs on exit (python 3.8/3.9) HOT 1
- torch.no_grad() is not working for dynamo inductor backend HOT 2
- Improved strategy for dealing with deterministically flaky tests which are order sensitive HOT 2
- DISABLED test_bmm_multithreaded (__main__.TestTorch) HOT 1
- omp.h not found on macOS during install from source at github repo HOT 1