Comments (16)
Excellent. I'm merging this into dev then main at the moment.
from enkits.
I have a small example close to what I was doing in the main app I can share if it would be useful (though I, unfortunately, couldn't reproduce the problem with this).
It might be useful to see this, even if it can't reproduce the problem.
One thing to check is that you haven't deadlocked. There are a number of ways this can occur, but the most common is a WaitForTask
running a task which calls WaitForTask
on the parent. This can be avoided by using task priorities (in both the tasks and also in the WaitForTask
call to prevent running lower priority tasks) or dependencies (dependencies are much safer and also better performance in general). You should normally be able to see this in a debugger by checking the thread callstacks.
Using std::memory_order_seq_cst
might have fixed this by changing the performance in such a way that the deadlock doesn't happen. However I'll also go over the m_WaitingForTaskCount
memory order semantics and check whether acquire/release isn't sufficient here.
from enkits.
Thanks - it does seem from the above that there isn't a deadlock in your code.
from enkits.
Many thanks for running this test!
I am a little puzzled as memory_order_seq_cst
is only sequentially consistent with other memory_order_seq_cst
operations on other threads - yet in this case there no other such operations on other threads except the threadState variable, which is unused by the TaskCompletion thread. So in this case memory_order_seq_cst
and memory_order_acquire
should be the same (since the operation is a load).
So I think memory_order_seq_cst
fixing the problem likely as a side effect of the code the compiler creates for this.
I've pushed one further change which adds changes the fetch_sub
on m_RunningCount
read-modify-write operations to use memory_order_acq_rel
. This should ensure that when m_WaitingForTaskCount
is read by the thread completing the task, m_RunningCount
has been decremented and is visible in the WaitForTaskCompletion
thread in the correct order.
from enkits.
Thanks for this issue report. I expect this is likely due to an ARM specific issue - getting memory order correct is harder on this platform.
The acquire / release semantics should be all we need here, but I will investigate.
I wouldn't change DISPATCH_TIME_FOREVER
to DISPATCH_TIME_NOW
as this won't wait on the semaphore, resulting in the idle CPU threads spinning which wastes power and could lower clock rates.
from enkits.
Hi @dougbinks!
No problem at all and thank you for the speedy reply! 😅 Much appreciated 🙂
I see interesting! Thanks for letting me know, I'd be interested to hear what you find.
Sure thing, that was just a dumb experiment to see what would happen, I have reverted this now 😝
I have a small example close to what I was doing in the main app I can share if it would be useful (though I, unfortunately, couldn't reproduce the problem with this).
Thanks again!
from enkits.
Sure thing, please see attached files below.
This is the example I was working on (it's doing some extra stuff you can probably ignore in the reduction step but it hopefully might be useful).
Just rename the main.cpp.txt
file to main.cpp
(GitHub doesn't like the .cpp
extension) and to build and run you can do cmake -B build
and then cmake --build build
.
If you're using a single-config generator add -DCMAKE_BUILD_TYPE=Release
etc... in the first step or if you're using a multi-config generator pass --config Release
(or some other build type like RelWithDebInfo
) to the second command.
Just for some added context this code is based on a filtering algorithm for points in an area. Basically I have a bunch of control points and only some of them are visible (some can be hidden) but I don't want to actually remove them. The ones I draw are the 'filtered' points. I have a mapping from control_points_to_filtered_points
and filtered_points_to_control points
. The algorithm I was speeding up was the intersection test (which in the snippet here is just control_point_index % 2 == 0
). Locally I'd tried playing around adding different sleep amounts (e.g. std::this_thread::sleep_for(...);
to simulate the time it really took.
It's from this little application - https://pollardinefarm.co.uk/farming/a-wood-pasture-tool/
Thanks!
from enkits.
I was able to go back to the earlier commit and the latest version of enkiTS
without my change and I still can get the repro to hit, the call stack looks like this:
from enkits.
The other worker threads look like this:
Let me know if there's any more info I could share that might be useful. Thanks!
from enkits.
I have created an alternative potential fix for the issue on branch dev_issue_91:
https://github.com/dougbinks/enkiTS/tree/dev_issue_91
Since TaskComplete
is called after every task finishes I would prefer not to add a potentially more expensive atomic operation there. WaitForTaskCompletion
however is only called when there is a thread with no work to run waiting on another task to complete.
So I've changed the read-modify-write operations to in WaitForTaskCompletion
to use memory_order_acq_rel
. I don't have an ARM MacOS system to test on though. If you were able to check if this solves the problem that would be very helpful.
from enkits.
Hi @dougbinks, thanks for taking a look at this.
I just did a quick test and my repro unfortunately still occurs 😖
from enkits.
See the attached video for what happens
enkits.mp4
from enkits.
Thanks for the detailed description! Super interesting, I'll aim to test this later this evening and will get back to you as soon as I can 👍
from enkits.
@dougbinks that worked! 😄 🥳 I can confirm the lock-up no longer happens with commit d1752d0.
Thank you very much for looking into this and finding a fix, much appreciated!
from enkits.
Closing as fixed with d1752d0
Please re-open if you encounter this issue again.
from enkits.
Closing as fixed with d1752d0
Please re-open if you encounter this issue again.
Awesome! 🥳
Absolutely 👍 Thank you!
from enkits.
Related Issues (20)
- Macro redefinition for WIN32_LEAN_AND_MEAN HOT 1
- Conversion warnings HOT 1
- Wrong asserts used HOT 1
- Suggestions for propagating user data into tasks? Client+Server in same exe HOT 3
- Rejecting a task and adding it back to the queue? HOT 8
- Add soname to shared library HOT 1
- Put header files in a subdirectory HOT 3
- Put header files in a subdirectory in the main repository HOT 2
- Where is the proper position to add PreRun and PostRun? HOT 2
- Race Condition? HOT 3
- The threads responsible for executing the tasks HOT 2
- CMake version HOT 2
- Address Sanitizer finding HOT 1
- Deadlock with WaitForNewPinnedTasks and WaitforAllAndShutdown HOT 4
- Feature Request: Shutdown HOT 7
- Make gtl_threadNum invalid by default. HOT 9
- Run all tasks of given priority (or higher) HOT 3
- More platforms supported in enki::GetNumHardwareThreads HOT 5
- Stuttering on Intel hybrid CPUs HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from enkits.