
Comments (3)

kpet commented on June 12, 2024

TSAN is reporting several issues:

  1. The queue allocation code in clvk is not thread safe. Fixed by #597.
  2. The init code for the random number generator used by the OpenCL CTS is not thread safe. Fixed by KhronosGroup/OpenCL-CTS#1797.
  3. It seems there are data races in Mesa's implementation of pipeline caches. Created https://gitlab.freedesktop.org/mesa/mesa/-/issues/9491.
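Item 2 is a one-time initialisation that many test threads can reach at once. One common C++ pattern for making that safe is `std::call_once`. The sketch below is only an illustration of that pattern under assumed names (`hypothetical::g_seeds`, `init_seeds_once`, `make_thread_generator` and `run_concurrent_init` are invented for this example); it is not the OpenCL CTS code and not necessarily what KhronosGroup/OpenCL-CTS#1797 changed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical example, not the OpenCL CTS code: a shared seed table that
// must be filled exactly once, even when many test threads start at once.
namespace hypothetical {

std::once_flag g_seed_once;
std::array<std::uint32_t, 16> g_seeds{};
int g_init_count = 0;  // only written inside call_once; lets us check "exactly once"

// std::call_once runs the lambda exactly once; concurrent callers block
// until it has finished, so nobody observes a half-filled table.
void init_seeds_once() {
    std::call_once(g_seed_once, [] {
        std::mt19937 gen(12345u);  // fixed seed for reproducible runs
        for (auto& s : g_seeds) s = gen();
        ++g_init_count;
    });
}

// Read only after initialisation has completed (e.g. after joining threads).
int init_count() { return g_init_count; }

// Each thread builds its own generator from the shared, read-only table,
// so no generator state is shared between threads after initialisation.
std::mt19937 make_thread_generator(std::size_t thread_index) {
    init_seeds_once();
    return std::mt19937(g_seeds[thread_index % g_seeds.size()]);
}

// Starts `n` threads that all race to initialise the table.
void run_concurrent_init(int n) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) threads.emplace_back(init_seeds_once);
    for (auto& t : threads) t.join();
}

}  // namespace hypothetical
```

Keeping one generator per thread, rather than one shared generator behind a lock, means only the initialisation step needs synchronisation.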

With 1 and 2 fixed and a workaround for 3 (globally serialising all pipeline creations; not committed for now), I can get a clean TSAN run of the test code itself. I am still seeing a race on teardown, though, when Mesa destroys an internal thread pool.
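The workaround for 3 amounts to taking one process-wide lock around every pipeline creation. A minimal sketch of that idea, assuming a hypothetical `serialised_create` wrapper; `fake_create_pipeline` stands in for a real creation call such as `vkCreateComputePipelines` so the example runs without a Vulkan driver. None of these names come from clvk.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch of "globally serialise all pipeline creations".
// None of these names come from clvk or Vulkan.
namespace hypothetical {

// A single process-wide lock shared by every pipeline creation.
std::mutex g_pipeline_creation_mutex;

// Runs `create` while holding the global lock, so no two creations can
// touch shared driver state (such as a pipeline cache) concurrently.
template <typename CreateFn>
auto serialised_create(CreateFn&& create) {
    std::lock_guard<std::mutex> lock(g_pipeline_creation_mutex);
    return std::forward<CreateFn>(create)();
}

// Stand-in for a real creation call. It records how many creations overlap;
// these counters are only touched while the global lock is held.
int g_in_flight = 0;
int g_max_in_flight = 0;

int fake_create_pipeline(int id) {
    ++g_in_flight;
    if (g_in_flight > g_max_in_flight) g_max_in_flight = g_in_flight;
    --g_in_flight;
    return id;
}

// Launches `n` threads that all create a "pipeline" at the same time and
// returns the largest number of overlapping creations observed.
int run_creations(int n) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([i] {
            serialised_create([i] { return fake_create_pipeline(i); });
        });
    }
    for (auto& t : threads) t.join();
    return g_max_in_flight;
}

}  // namespace hypothetical
```

This trades away all parallelism in pipeline creation to avoid races inside the driver, which is why it only makes sense as a diagnostic workaround rather than a fix.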

from clvk.

dneto0 commented on June 12, 2024

The #597 fix has been in our system for almost a week, and I have not seen any crashes.
I retested several times across different machines, against both SwiftShader and NVIDIA, and still have not seen any crashes.

I have seen some resource exhaustion cases, e.g.

ERROR: clEnqueueReadBuffer failed! (CL_OUT_OF_RESOURCES from /usr/local/google/home/dneto/project/opencl-cts/test_conformance/integer_ops/test_int_basic_ops.cpp:677)
Thread c (job 12) failed test_integer_ops with result fffffffb
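The `result fffffffb` on the second line is consistent with the first, assuming the harness prints the raw `cl_int` status in hex: 0xfffffffb is 2^32 - 5, i.e. -5 as a signed 32-bit value, and -5 is `CL_OUT_OF_RESOURCES` in the OpenCL headers. A small sketch of that decoding (`decode_cl_result` is a hypothetical helper, not part of the CTS):

```cpp
#include <cstdint>

// Value of CL_OUT_OF_RESOURCES from the OpenCL headers (CL/cl.h), repeated
// here so the sketch compiles without them.
constexpr std::int32_t kClOutOfResources = -5;

// Hypothetical helper: interprets the 32-bit hex value printed by the test
// harness as a signed, two's-complement cl_int. Written without a narrowing
// cast so the result is well-defined before C++20 as well.
constexpr std::int32_t decode_cl_result(std::uint32_t printed) {
    return printed <= 0x7fffffffu
               ? static_cast<std::int32_t>(printed)
               : -static_cast<std::int32_t>(~printed) - 1;
}

// 0xfffffffb == 2^32 - 5, i.e. -5 as a signed 32-bit value.
static_assert(decode_cl_result(0xfffffffbu) == kClOutOfResources,
              "fffffffb decodes to CL_OUT_OF_RESOURCES");
```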

But I think that's ok.

  • Vulkan can tell you the maximum queue count for a given queue family; on Pixel 6 it's 2; on a moderate desktop GPU it's 16. This test can easily consume all of those and more.
  • I could not find a limit in the OpenCL spec for the minimum number of command queues an implementation must support.
    At best this might turn into an OpenCL CTS issue to improve the test.

But none of these are crashes, so I consider this problem solved.

Thank you!


kpet commented on June 12, 2024

Good to hear that your issue is fixed (and thanks for letting us know). I think it's unlikely that the CL_OUT_OF_RESOURCES error you're seeing is caused by clvk exhausting the available Vulkan queues. clvk simply maps each CL queue onto one of the suitable Vulkan queues in a round-robin fashion. This may lead to performance issues, but it should scale to an arbitrary (within reason) number of CL queues.

clvk returns CL_OUT_OF_RESOURCES in many cases where something failed that has no error condition specified by OpenCL. Along with CL_OUT_OF_HOST_MEMORY, it's the only catch-all error code that most entry points are allowed to return. Seeing this error code doesn't mean much more than "something went wrong".
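A round-robin mapping like the one described needs no upper bound on the number of CL queues: indices just wrap around the available Vulkan queues. A minimal sketch, assuming a hypothetical `RoundRobinQueuePool` class (clvk's actual code may be structured quite differently); in a real driver `num_vk_queues` would come from `VkQueueFamilyProperties::queueCount`, e.g. 2 on the Pixel 6 mentioned earlier.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

namespace hypothetical {

// Hands out indices of Vulkan queues to newly created CL command queues in
// round-robin order. The atomic counter makes concurrent allocation safe
// without a lock.
class RoundRobinQueuePool {
public:
    explicit RoundRobinQueuePool(std::size_t num_vk_queues)
        : num_vk_queues_(num_vk_queues) {
        assert(num_vk_queues_ > 0);
    }

    // Returns the index of the Vulkan queue the next CL queue should share.
    // Any number of CL queues can be created; indices simply wrap around.
    std::size_t next() {
        return next_.fetch_add(1, std::memory_order_relaxed) % num_vk_queues_;
    }

private:
    std::size_t num_vk_queues_;
    std::atomic<std::size_t> next_{0};
};

}  // namespace hypothetical
```

Several CL queues may end up sharing one Vulkan queue, which is where the possible performance cost comes from; running out of queues is not a failure mode.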
