Comments (9)
Okay - that is going to be a good example for a talk... moving the notify_one()
call under the lock fixes the issue. This is normally a pessimization (but with well defined behavior) because the code calling wait()
will wake but immediately block re-acquiring the mutex. However, calling notify_one() from under the lock ensures the function can't exit (or destruct the condition variable) before notify_one() is called... I'll get a PR together in the morning... Wow... I've never hit that issue. Going to have to review my use of condition variables.
from libraries.
So far this is not a bug in the library, but a problem of the code. In the way as you call blocking get on all the futures, you can come into a dead lock situation on the thread pool. Sean Parent goes in detail into this problem in his talk: https://www.youtube.com/watch?v=QIHy8pXbneI at timestamp 1:18:00
The general recommendation is: Use never blocking_get! Use continuations wherever possible. I use blocking_get() only in test applications to ensure, that a single, final calculation is done and the program does not end, before that value is calculated.
I recommend to change the code into the following by using the joining operation of when_all().
#include
#include
#include <stlab/concurrency/concurrency.hpp>
using namespace std;
using namespace stlab;
int main() {
std::vector<future> allFutures;
for(int i = 0; i < 32; ++i)
allFutures.push_back(async(default_executor, [](int i) {
for(int j = 0; j < 16'000'000; ++j)
i *= (2*j+1);
std::cout << "Got " << i << std::endl;
return i;
}, i));
auto end = when_all(default_executor, [](auto&& results) {
for (const auto& r: results)
cout << "R: " << r << "\n";
}, std::make_pair(allFutures.begin(), allFutures.end()));
blocking_get(end);
return 0;
}
from libraries.
I disagree.
- Mac OS X is not single threaded (the video, at 1:18 talks about being able to bring it down to 1 thread).
- Even in a single threaded system, each 'blocking_get' should be calling the task's lambda and there should not be a deadlock.
- The typical output when it gets stuck has all the "Got" 32 times. So all the tasks have executed and finished. For a deadlock, you need that two parts wait for the other to take action. Yet the task on whose future the blocking_get does not return has finished. The task contains no lock whatsoever.
I had tested the when_all before doing this more simple test. I know it works, but I wanted to do different tests. I still think this is a bug.
from libraries.
I believe you are correct @FunMiles that this example is a bug. I'll try and investigate further this evening.
However, we currently do not do task-promotion (immediate execution of the task on a get()). Task promotion can only be done for tasks that are scheduled on a non-serialized executor, and have no other tasks upon which they depend, unless all of those tasks can be promoted (for example, a task associated with a continuation has to also promote the task it is continuing). This is one reason why we don't have a generalized .get() but instead provide a blocking_get(). In order to use a blocking get (or wait) safely you have to know about the tasks that provide data to the future and we are trying to avoid systems that require non-local reasoning. We may add some form of promotion in the future, but only if we can do so without cost or through a more explicit API (make_promotable...).
from libraries.
Status update -
This bug is very reproducible. Nothing shows up running thread sanitizer (except a race using cout
but not calling cout
from the thread fixes that without changing the hang).
We hang waiting on the condition variable inside blocking_get()
except the recover continuation has already fired and set == true
. Which makes zero sense. I stepped through the assembly for the code and it looks like the set = true
in the lambda is correctly executing under the lock, there is appropriate memory barriers, and the notify_one() executes. The other side for the wait() looks equally legit. However, by putting in an atomic counters I have verified that the hang occurs when the loop is entered simultaneously with the completion executing - which would indicate a race but I'm not seeing where.
I've tried a few shots in the dark, setting set
to true through a pointer instead of a reference, making set
an atomic (in case it was a bug with the memory barrier logic in the compiler) but it doesn't change the behavior. Still looking but at the moment I'm stumped...
from libraries.
Another odd data point - pausing in the debugger when it is stuck, and then continuing, will cause it to complete successfully (might cause a "spurious" wake on the condition variable - but still odd).
from libraries.
Ahh... I believe I see the problem.
After the lambda sets the flag to true, the function may exit before the condition variable is notified. In which case we notify a dead condition variable. However, because we are looping I think we end up notifying the condition variable that lands on the stack the next time around and then we are in a bad state... Looking into fixes.
from libraries.
Great. Looking forward to pulling the fix.
@sean-parent, this makes your 'no raw synchronization primitives' all the more relevant. I'm looking forward to your new talk.
Michel Lesoinne
from libraries.
Fixed in 1.2.0
from libraries.
Related Issues (20)
- Create GitHub PR Style Guide
- CMake format enforcement
- Create unit tests for main executors
- Identify and fix any future reduction w/ noncopyable types issues.
- Version in CMakeLists.txt is not correct
- Some files are missing in ./stlab/CMakeLists.txt HOT 1
- Build Error for Portable Executor for c++14 HOT 1
- Executor of packaged_task not used in reduction HOT 7
- Extra Copy of Object in Future Somewhere
- wasm main executor object not defined
- Upgrade to 1.7.1 now requires Qt and Boost HOT 20
- Is documentation generation broken? HOT 12
- No build output HOT 2
- Generated config.hpp Disables Ability to Use Same Source For Multiple Platforms HOT 1
- main_executor_type lost in main_executor.hpp for EMSCRIPTEN HOT 1
- Problem with main_executor for Qt on application exit HOT 3
- WASM main_executor improvements
- Broken compatibility with Qt5 HOT 1
- Handling more then one priority_task_system instance HOT 6
- stlab documentation, best practice and performance HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from libraries.