Comments (6)

shadidashmiz avatar shadidashmiz commented on August 28, 2024

It looks like your test time increases linearly with the number of buffers you allocate, so this does not look like a hipStream issue.

from hip.

jinz2014 avatar jinz2014 commented on August 28, 2024

In the test, the two datacenter GPUs are not installed on the same host, and I am not sure whether running on different hosts affects the execution time.
So I ran the CUDA and HIP programs on a desktop computer that has both an NVIDIA and an AMD GPU.

RTX3090

Create+Copy+Synchronize+Destroy time for 1 streams and 1 buffers and 128 iterations 0.0128745 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 1 buffers and 64 iterations 0.00757924 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 1 buffers and 32 iterations 0.0062003 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 1 buffers and 16 iterations 0.0063163 (ms)
Create+Copy+Synchronize+Destroy time for 1 streams and 100 buffers and 64 iterations 0.270007 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 100 buffers and 32 iterations 0.255549 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 100 buffers and 16 iterations 0.27291 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 100 buffers and 8 iterations 0.267216 (ms)
Create+Copy+Synchronize+Destroy time for 1 streams and 1000 buffers and 32 iterations 2.53417 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 1000 buffers and 16 iterations 2.52819 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 1000 buffers and 8 iterations 2.52339 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 1000 buffers and 4 iterations 2.52614 (ms)
Create+Copy+Synchronize+Destroy time for 1 streams and 5000 buffers and 16 iterations 12.7661 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 5000 buffers and 8 iterations 12.7234 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 5000 buffers and 4 iterations 12.7502 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 5000 buffers and 2 iterations 12.7159 (ms)

gfx1030

Create+Copy+Synchronize+Destroy time for 1 streams and 1 buffers and 128 iterations 1.99878 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 1 buffers and 64 iterations 0.574584 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 1 buffers and 32 iterations 0.610492 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 1 buffers and 16 iterations 0.587304 (ms)
Create+Copy+Synchronize+Destroy time for 1 streams and 100 buffers and 64 iterations 1.39792 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 100 buffers and 32 iterations 1.39171 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 100 buffers and 16 iterations 1.41488 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 100 buffers and 8 iterations 1.43967 (ms)
Create+Copy+Synchronize+Destroy time for 1 streams and 1000 buffers and 32 iterations 9.0404 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 1000 buffers and 16 iterations 9.03053 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 1000 buffers and 8 iterations 9.05028 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 1000 buffers and 4 iterations 9.15136 (ms)
Create+Copy+Synchronize+Destroy time for 1 streams and 5000 buffers and 16 iterations 43.0856 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 5000 buffers and 8 iterations 43.0919 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 5000 buffers and 4 iterations 43.1138 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 5000 buffers and 2 iterations 43.166 (ms)

bdenhollander avatar bdenhollander commented on August 28, 2024

I profiled your code on Windows on gfx1032. The majority of the time was spent in memcpy rather than in creating and destroying streams. This code may be more of a host-to-device copy benchmark.
[screenshot of profiler output]

jinz2014 avatar jinz2014 commented on August 28, 2024

Yes, most of the time is spent on data copies. I updated the summary of the issue.

jinz2014 avatar jinz2014 commented on August 28, 2024

@bdenhollander Which profiler did you use?

bdenhollander avatar bdenhollander commented on August 28, 2024

The screenshot is from Visual Studio 2019's built-in profiler.
