Comments (7)
it seems i was wrong (but please read on, it gets worse). looking at the latest C17 draft [1], it needs to be `volatile sig_atomic_t`, and purely because of the signal handling:
"When the processing of the abstract machine is interrupted by receipt of a signal, the values of
objects that are neither lock-free atomic objects nor of type volatile sig_atomic_t are unspecified,
as is the state of the floating-point environment. The value of any object modified by the handler
that is neither a lock-free atomic object nor of type volatile sig_atomic_t becomes indeterminate
when the handler exits, as does the state of the floating-point environment if it is modified by the
handler and not restored to its original state."
[1] http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
the above also means that the signal handlers are theoretically broken, because they dereference objects that are not of type volatile sig_atomic_t. furthermore, i think a lot of the stuff that is done in the signal handler is theoretically not generally safe to do there:
"If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if [...] the signal handler calls any function in the standard library other than [abort, _Exit, quick_exit, functions in stdatomic.h, atomic_is_lock_free, signal]"
SUMMARY
- having a volatile that is very frequently read from all the threads introduces a memory bottleneck / scalability issue. rt-app should be as scalable as possible, and bottlenecks should be created intentionally.
- installing a signal handler and terminating via it requires the flag to be of type volatile sig_atomic_t
- the current signal handler is invalid according to the standard
PROPOSAL
how about using a signalfd to poll for the termination signals, then call the shutdown fn from there?
instead of blindly pthread_join()ing the threads, have a flag per thread that is similarly (seldom) polled, or, what seems a nicer design, a pipe per thread to read from; polling could then be done on an fd set including:
- the signalfd
- the individual thread fds
- a timerfd to sleep for opts.duration (or just poll the above two periodically and sleep in between)
this would solve all issues above.
there should be no need for a write barrier after continue_running is modified ("There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call."): since there are non-inlineable functions, we have sequence points in the right places. superfluous barriers wouldn't hurt, though.
would you like to have a PR for this?
from rt-app.
> my pLoad is about a third of the previous value
While the description of the problem and the proposal are certainly interesting, I'm having trouble understanding how continue_running affects p_load calculation, as the calibration phase happens before the threads are created. Could you explain?
Also, do you have some sort of benchmark or example json that highlights the problem (and might show benefit of the proposed solution)?
1. the bottleneck i was assuming to exist is either not there, or in any case i couldn't measure it. i think i misread something, got confused and assumed `continue_running` would be read every time the work loop was executed, which is obviously not the case. see below (4.) for some data corroborating that the scalability is ok.
2. as you wrote, continue_running shouldn't have anything to do with the calibration, and i was curious how the different value came about. i couldn't reproduce it anymore, but found the executable. it is curiously statically linked against libjson-c and dynamically against everything else, and, looking at the disassembly, the actual workload is done differently: there are only 4 calls to ldexp in the faster executable (vs the expected 12 in the others). the faster version doesn't seem to do any explicit conversion of the return values from ldexp, whereas the regular one is peppered with cvttsd2si instructions. i'm assuming i accidentally used different compiler options (some optimisation seems to have been applied), which then led to the confusion. my apologies.
3. the signal handler issue is probably largely theoretical, but it seems to be invoking undefined behaviour as outlined above.
4. some data:
- using `doc/examples/template.json` but setting `"run" : 50000` to make it a little bit busier
- number of threads: 1, 2, 4, 8, 16
- machine is an 8 core hyperthreaded xeon skylake
- first 8...10 calibration runs (also gathering single-threaded data)
- then the remaining ones with the lowest result from the calibration run
- there is likely quite a bit of noise in the data (machine was under very light load)
looking at the actual run time the phase took (column 3 in the log; i hope that is a good metric, is it?), you can see that it scales very well from 1...8 threads, after which the run time increases (16 hyperthreads on 8 physical cores, so i'm inferring that there might be one FP unit per core)
pLoad in [ns], t_run quantiles in [us]:

| type | cores | pLoad | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|---|---|
| dynamic without volatile | 1 | 45 | 50780 | 70259 | 71399 | 70775 | 72396 | 75017 |
| dynamic without volatile | 2 | | 54283 | 70788 | 72001 | 70961 | 72929 | 78027 |
| dynamic without volatile | 4 | | 52209 | 70662 | 72200 | 70698 | 73265 | 107469 |
| dynamic without volatile | 8 | | 51925 | 63860 | 71114 | 68986 | 72860 | 123700 |
| dynamic without volatile | 16 | | 77181 | 91548 | 94234 | 95206 | 95403 | 148676 |
| dynamic with volatile | 1 | 45 | 40892 | 69514 | 71290 | 68744 | 72438 | 75206 |
| dynamic with volatile | 2 | | 51594 | 70687 | 71942 | 70712 | 72875 | 104465 |
| dynamic with volatile | 4 | | 52436 | 71180 | 72412 | 71391 | 73398 | 103340 |
| dynamic with volatile | 8 | | 52109 | 63560 | 70558 | 68612 | 72766 | 122087 |
| dynamic with volatile | 16 | | 64546 | 90148 | 93774 | 92515 | 94868 | 129300 |
| static without volatile | 1 | 36 | 32902 | 59491 | 71079 | 66288 | 72362 | 75172 |
| static without volatile | 2 | | 52790 | 69560 | 70800 | 69035 | 71869 | 92948 |
| static without volatile | 4 | | 51630 | 70468 | 71649 | 70353 | 72690 | 108864 |
| static without volatile | 8 | | 51139 | 60098 | 70367 | 67226 | 72285 | 130469 |
| static without volatile | 16 | | 98007 | 103075 | 103100 | 103555 | 103157 | 136111 |
| static with volatile | 1 | 36 | 35767 | 52928 | 70204 | 64570 | 71867 | 94839 |
| static with volatile | 2 | | 53156 | 69918 | 71081 | 69723 | 72062 | 74155 |
| static with volatile | 4 | | 54037 | 70516 | 71640 | 70382 | 72728 | 74785 |
| static with volatile | 8 | | 51025 | 58962 | 69852 | 66360 | 71930 | 124560 |
| static with volatile | 16 | | 91108 | 103074 | 103098 | 103568 | 103158 | 174910 |
| mostly dynamic fast | 1 | 13 | 57059 | 70934 | 73026 | 72532 | 74410 | 76624 |
| mostly dynamic fast | 2 | | 56752 | 73190 | 74501 | 73115 | 75481 | 80892 |
| mostly dynamic fast | 4 | | 55133 | 73632 | 75126 | 73782 | 76191 | 101513 |
| mostly dynamic fast | 8 | | 54644 | 66822 | 74342 | 71988 | 75873 | 116326 |
| mostly dynamic fast | 16 | | 76146 | 84872 | 87182 | 87325 | 89265 | 113314 |
as in, it seems to take a bit longer for the under-utilised case (could it be that the loop is so short that the FP pipeline is not kept full?) but scales quite a bit better.
the distributions are roughly gaussian, some slightly bi-modal. i blame the noise. not sure whether that is interesting, but i could upload the plots.
again, apologies for the hassle, i thought i actually found something interesting, but i think the data tell us that there isn't a scalability issue.
and the signal handler issue seems mostly theoretical, so, shall we close this issue?
Regarding the signal handler issue, I agree with the theory, but I think we are safe in our case because we are not doing any read-modify-write, only read xor write.
you mean wrt continue_running? yes, safe currently, because of the type. wrt the other variables? i read it the way that the contents of `opt`, `threads`, `ft_data` and implicit accesses to `log_level` would be unspecified, so everything that is done with them is potentially problematic.
and i read the second paragraph the way that the various calls to the standard library make the app's shutdown behaviour (hopefully not the whole app's behaviour) undefined.
not sure it's worth rewriting the shutdown behaviour just because of this, though.
> not sure it's worth rewriting the shutdown behaviour just because of this, though
Mmm, right. I'm tempted to say that, since after all we are shutting everything down, even if we stumble upon some undefined behavior, it shouldn't really matter.
Thanks a lot for your experiments above!!!
closing it, then. iiuc, the worst that can happen is that something might break at shutdown, and there don't seem to be any observations of this occurring.