transformers-benchmarks's People

Contributors

mli

transformers-benchmarks's Issues

About the theoretical value of the GPU

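A question for Mu Li:

  1. In the wiki linked from the notebook, is the theoretical value of 40 for the 3090 Ti taken from the core boosted value (39.997) in the table?
  2. On several of my own 3090 cards, with CUDA 11.7 and NVIDIA driver 525, I measure only about 24 TFLOPS, which falls short of both the base (29.3) and boost (35.6) theoretical values. Was the 3090 Ti in the notebook overclocked when its TFLOPS were measured? What settings are needed to get close to the theoretical FLOPS?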
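For reference, the theoretical values quoted in the question can be reproduced with the usual peak-throughput arithmetic: CUDA cores × 2 FLOPs per cycle (one fused multiply-add) × clock. The sketch below is only an illustration. The core counts and clock speeds are assumptions taken from NVIDIA's published specifications, not from this thread, and `peak_fp32_tflops` is a hypothetical helper name. The question does not say which dtype the 24 TFLOPS measurement refers to, so the final ratio is indicative only.

```python
# Peak FP32 throughput = CUDA cores x 2 FLOPs per cycle (one FMA) x clock.
# ASSUMPTION: core counts and clocks are published spec values,
# not numbers stated in the issue itself.

def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 TFLOPS, counting one FMA (2 FLOPs) per core per cycle."""
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12

RTX_3090_CORES = 10496      # assumed spec value
RTX_3090_TI_CORES = 10752   # assumed spec value

base_3090 = peak_fp32_tflops(RTX_3090_CORES, 1395)         # base clock, MHz
boost_3090 = peak_fp32_tflops(RTX_3090_CORES, 1695)        # boost clock, MHz
boost_3090_ti = peak_fp32_tflops(RTX_3090_TI_CORES, 1860)  # boost clock, MHz

measured_3090 = 24.0  # TFLOPS reported in the question

print(f"3090 base     : {base_3090:.3f} TFLOPS")
print(f"3090 boost    : {boost_3090:.3f} TFLOPS")
print(f"3090 Ti boost : {boost_3090_ti:.3f} TFLOPS")
print(f"measured / 3090 boost: {measured_3090 / boost_3090:.0%}")
```

Under these assumptions the results round to 29.3, 35.6, and 39.997, matching the figures in the question. The peak assumes every core issues an FMA on every cycle at the stated clock, so a sustained measurement would normally land below it.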

Exceptionally high memory bandwidth

I ran the benchmark program on an RTX 4090 workstation and got abnormally high memory bandwidth figures:

Pytorch version : 1.14.0a0+44dac51
CUDA version    : 12.0
GPU             : NVIDIA GeForce RTX 4090
Matrix Multiplication:
               n=128   n=512   n=2048   n=8192
torch.float32  1.048  29.653   82.788   86.676
torch.float16  1.304  46.890  167.112  158.596

Memory Bandwidth:
        65536    262144    1048576   4194304
TFLOPS    0.025    0.099     0.324     0.484
GB/s    196.343  792.595  2590.594  3868.374

As shown, the measured memory bandwidth is 3868 GB/s, whereas the theoretical memory bandwidth I found for the 4090 is around 1000 GB/s.

By contrast, running the benchmark program on an A800 server gives normal results:

Pytorch version : 2.0.0a0+1767026
CUDA version    : 12.1
GPU             : NVIDIA A800 80GB PCIe
Matrix Multiplication:
               n=128   n=512   n=2048   n=8192
torch.float32  0.464  25.947   82.386  105.973
torch.float16  0.343  31.456  192.540  215.333

Memory Bandwidth:
        65536    262144    1048576   4194304
TFLOPS    0.009    0.036     0.143     0.216
GB/s     72.026  288.159  1143.973  1727.486

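Here the measured memory bandwidth of 1727 GB/s is below the theoretical limit of 1935 GB/s.

As a result, the measured memory bandwidth of the 4090 is far higher than that of the A800, and in my actual training the 4090 was also faster.
Is such a high bandwidth on the 4090 normal? If not, what might be causing it?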
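As a sanity check on the numbers, the two rows of each Memory Bandwidth table can be compared directly. The sketch below uses only the figures posted above. It suggests the GB/s row is the TFLOPS row scaled by about 8 bytes per operation, which would fit an element-wise float32 operation that reads 4 bytes and writes 4 bytes per element. That interpretation, the float32 buffer size, and the helper names `bytes_per_flop` and `buffer_mib` are assumptions for illustration, not something confirmed by the benchmark's source.

```python
# Figures copied from the two Memory Bandwidth tables in this issue.
sizes = [65536, 262144, 1048576, 4194304]
tables = {
    "RTX 4090": {
        "tflops": [0.025, 0.099, 0.324, 0.484],
        "gbps": [196.343, 792.595, 2590.594, 3868.374],
        "theoretical_gbps": 1000.0,  # "around 1000 GB/s" per the question
    },
    "A800": {
        "tflops": [0.009, 0.036, 0.143, 0.216],
        "gbps": [72.026, 288.159, 1143.973, 1727.486],
        "theoretical_gbps": 1935.0,  # limit quoted in the question
    },
}

def bytes_per_flop(tflops: float, gbps: float) -> float:
    """Bytes moved per floating-point operation implied by one table column."""
    return (gbps * 1e9) / (tflops * 1e12)

def buffer_mib(n_elements: int, bytes_per_element: int = 4) -> float:
    """Size of one tensor in MiB. ASSUMPTION: float32 elements (4 bytes)."""
    return n_elements * bytes_per_element / 2**20

ratios = {
    name: [bytes_per_flop(t, g) for t, g in zip(tab["tflops"], tab["gbps"])]
    for name, tab in tables.items()
}
peak_fraction = {
    name: tab["gbps"][-1] / tab["theoretical_gbps"] for name, tab in tables.items()
}

for name in tables:
    print(name, [round(r, 2) for r in ratios[name]], f"{peak_fraction[name]:.2f}x of theoretical")
print("largest buffer:", buffer_mib(sizes[-1]), "MiB")
```

If the float32 assumption holds, the largest tensor in the test is only 16 MiB. One thing that may be worth checking is whether a buffer that small is being served from on-chip cache rather than DRAM; rerunning with much larger sizes would show whether the reported figure drops toward the theoretical value. This is a possibility to test, not a confirmed diagnosis.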
