Comments (14)
Windows' GetQueuedCompletionStatus(), which is used to poll completion events and pause, supports only millisecond resolution, so how do you propose to use microsecond and nanosecond durations in that case?
I understand the problem with the monotonic clock, but the library currently uses the most performant primitives, which are 5x faster than their monotonic alternatives.
from nim-chronos.
The point is to allow the user to specify the time in an unambiguous way - the precision you get is whatever the platform offers you at that point. By fixing ad2, we can also offer the same usability advantage in downstream libraries.
When you use these functions, you necessarily have to take into account that these are estimates. For example, if you just blindly repeat a call with a 100ms delay, you'll have fewer than 10 calls per second regardless of the precision of the underlying clock - for example because other events are running at the same time.
I'm also curious about that 5x claim, and what share of the total time of a call to, for example, GetQueuedCompletionStatus the extra 4x makes up.
If you check the source code of the poll() function (https://github.com/status-im/nim-asyncdispatch2/blob/master/asyncdispatch2/asyncloop.nim#L316), you will find that fastEpochTime() is called twice.
The first call is used to handle already-expired callbacks and to calculate the timeout for the system queue waiter; the second call is used to process callbacks that expired during the wait. The second call is required because you don't know how much time was spent in the system queue waiter (it can be equal to the timeout value, but if no timeout was set, it will wait indefinitely for FD events).
I understand that the function is called - I'm asking two things:
- compared to other calls in the loop, what % of time is taken by the clock call? for example, if the clock call takes 0.01% of the total loop time and 99.99% is spent in GetQueuedCompletionStatus, it really doesn't matter if one clock takes 5x the other - for the above question, also take into account that this is mostly relevant when there is actually work to do, so you would need to measure under load.. otherwise, if the loop is idling, again, it doesn't matter
- where did you get the 5x performance figure? on linux, CLOCK_REALTIME and CLOCK_MONOTONIC are near identical, measured in cycles
once you have the answers to those questions, you have to weigh that against the massive improvement in ergonomics:
- you cannot make a mistake with the unit of time
- on platforms with finer grained clocks you get access to shorter loop times if you want to (no artificial timing barrier)
- applications that actually make use of the std lib are not penalized by ugly casts - we're having this problem downstream in nim-eth-p2p and nim-beacon-chain, where we need to be careful with clocks or the system breaks down. imagine the embarrassment and massive cost of nimbus freezing on a leap second because we wanted to save a few cycles in an idle loop.
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/* Read the CPU time-stamp counter (x86 only). */
uint64_t rdtsc(void) {
    unsigned int lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

const int iters = 1024 * 1024;

int main(void) {
    while (1) {
        uint64_t start, mid, end;
        uint64_t t = 0;
        struct timespec tt;

        start = rdtsc();
        for (int i = 0; i < iters; ++i)
            t += clock_gettime(CLOCK_REALTIME, &tt);
        mid = rdtsc();
        for (int i = 0; i < iters; ++i)
            t += clock_gettime(CLOCK_MONOTONIC, &tt);
        end = rdtsc();
        (void)t; /* accumulate return values so the loops aren't optimized away */

        /* average cycles per call for each clock */
        printf("%" PRIu64 " %" PRIu64 "\n",
               (mid - start) / iters, (end - mid) / iters);
    }
}
on my crappy laptop, I see more variation due to thermal throttling than between the two calls..
@arnetheduck first of all, I'm not making asyncdispatch for Linux only, so if Linux has an advantage in some behavior, it doesn't matter for the library, because it must produce equal behavior on all OSes.
compared to other calls in the loop, what % of time is taken by the clock call? for example, if the clock call takes 0.01% of the total loop time and 99.99% is spent in GetQueuedCompletionStatus, it really doesn't matter if one clock takes 5x the other
These are my old benchmarks, but the data can still be used to understand the timer's impact on performance.
The benefit is in the hardware, not the operating system, and the point is that you need to use the correct clock.
On windows, that's QueryPerformanceCounter for monotonic and fast relative time - windows clocks are a bit of a mystery because they tend to store hardware time in local time and generally use 100ns increments in their time structures. No idea what kind of conversions happen in the background when you call that filetime function (which is UTC) - but notably, your fastEpochTime and epochTime in nim call the same underlying windows function - clearly the performance difference on windows is not in the clock itself.
However, the benchmarks I see in that post for timers look mostly irrelevant - they measure timing without optimizations (without -d:release). The core difference on windows between your timer and the upstream timer is the float conversion - that conversion is unnecessary any way you look at it, and you'll notice a Duration doesn't use it either - and it's the kind of thing made worse by the lack of optimizations.
The second thing is that the incoming value in the API can be converted to whatever underlying clock you want - in fact, it should be. The key is to have a clear, powerful and unambiguous ABI followed up by an efficient implementation that makes the best use of the hardware available. Using milliseconds is strange (the SI unit for time is the second) and suboptimal any way you look at it.
The actual benefit is not in the hardware; I can't use rdtsc because it is not cross-platform. It depends on how the OS handles time: whether it switches to kernel mode to read counters or not, and whether it performs some heavy calculations or not.
I'm not saying we should use RDTSC - the proposal is to use Duration in the API (for developer friendliness) and a monotonic clock in the implementation (for correctness): clock_gettime(CLOCK_MONOTONIC, &tt) or QueryPerformanceCounter - there's no perf difference between these and whatever you're using - probably, there's even an improvement on windows.
I'm also claiming that we could use just a normal slow clock and probably wouldn't notice the difference, because the benchmark is kind of irrelevant - when there's no load, it doesn't really matter how many loops per second you can do because you're sleeping most of the time, and when there is load, the timing part is usually dwarfed by actual work (ie a beacon node packet arriving, in our case).. thus I'd focus on correctness and the use of good data structures (I see for example we're using a heap for the timers, which seems much more sane than a seq or a linked list in general) before worrying too much about micro-benchmarks
Now I will show my measurements; for the benchmark I used this code: https://gist.github.com/cheatfate/d184c9f11fd49e9d0c75d166bd2d2b05.
So the Linux benchmark (compiled with nim c -d:release) is:
100_000_000 calls to fastEpochTimeMono() takes 1545765406ns
100_000_000 calls to fastEpochTime() takes 1537696307ns
100_000_000 calls to fastEpochTimeRaw() takes 23333947291ns
100_000_000 calls to fastEpochTimeMono() takes 1538502155ns
100_000_000 calls to fastEpochTime() takes 1537760239ns
100_000_000 calls to fastEpochTimeRaw() takes 23364704565ns
100_000_000 calls to fastEpochTimeMono() takes 1537733110ns
100_000_000 calls to fastEpochTime() takes 1538136763ns
100_000_000 calls to fastEpochTimeRaw() takes 23364710793ns
100_000_000 calls to fastEpochTimeMono() takes 1540042299ns
100_000_000 calls to fastEpochTime() takes 1538359638ns
100_000_000 calls to fastEpochTimeRaw() takes 23374689710ns
100_000_000 calls to fastEpochTimeMono() takes 1539592896ns
100_000_000 calls to fastEpochTime() takes 1538834740ns
100_000_000 calls to fastEpochTimeRaw() takes 23360426822ns
100_000_000 calls to fastEpochTimeMono() takes 1538338091ns
100_000_000 calls to fastEpochTime() takes 1537669102ns
100_000_000 calls to fastEpochTimeRaw() takes 23369929946ns
100_000_000 calls to fastEpochTimeMono() takes 1538175000ns
100_000_000 calls to fastEpochTime() takes 1538563607ns
100_000_000 calls to fastEpochTimeRaw() takes 23368758867ns
100_000_000 calls to fastEpochTimeMono() takes 1540537125ns
100_000_000 calls to fastEpochTime() takes 1546923064ns
100_000_000 calls to fastEpochTimeRaw() takes 23364539543ns
100_000_000 calls to fastEpochTimeMono() takes 1538087034ns
100_000_000 calls to fastEpochTime() takes 1537477042ns
100_000_000 calls to fastEpochTimeRaw() takes 23364063367ns
100_000_000 calls to fastEpochTimeMono() takes 1539868275ns
100_000_000 calls to fastEpochTime() takes 1539015094ns
100_000_000 calls to fastEpochTimeRaw() takes 23364882319ns
The Windows benchmark is:
100_000_000 calls to fastEpochTimeMono() takes 1799527000ns
100_000_000 calls to fastEpochTime() takes 346997100ns
100_000_000 calls to fastEpochTimeMono() takes 1766980600ns
100_000_000 calls to fastEpochTime() takes 344437400ns
100_000_000 calls to fastEpochTimeMono() takes 1791302400ns
100_000_000 calls to fastEpochTime() takes 342270300ns
100_000_000 calls to fastEpochTimeMono() takes 1784696100ns
100_000_000 calls to fastEpochTime() takes 337763600ns
100_000_000 calls to fastEpochTimeMono() takes 1805359700ns
100_000_000 calls to fastEpochTime() takes 337840100ns
100_000_000 calls to fastEpochTimeMono() takes 1755878000ns
100_000_000 calls to fastEpochTime() takes 343742100ns
100_000_000 calls to fastEpochTimeMono() takes 1756078800ns
100_000_000 calls to fastEpochTime() takes 341183100ns
100_000_000 calls to fastEpochTimeMono() takes 1763183100ns
100_000_000 calls to fastEpochTime() takes 336385800ns
100_000_000 calls to fastEpochTimeMono() takes 1773583000ns
100_000_000 calls to fastEpochTime() takes 336432200ns
100_000_000 calls to fastEpochTimeMono() takes 1769893700ns
100_000_000 calls to fastEpochTime() takes 333117400ns
As you can see, on Windows fastEpochTimeMono() is 500% slower than fastEpochTime().
Now about the Linux results - this is a quote from man clock_gettime:
CLOCK_MONOTONIC
Clock that cannot be set and represents monotonic time since some unspecified starting point. This clock is not affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the clock), but is affected by the incremental adjustments performed by adjtime(3) and NTP.
CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based time that is not subject to NTP adjustments or the incremental adjustments performed by adjtime(3).
As you can see, CLOCK_MONOTONIC is also affected by adjustments, which can be made silently by the ntpd daemon. CLOCK_MONOTONIC_RAW is not affected by any of these changes, but its speed for some reason is 1500+% slower.
So the usage of the MONOTONIC clock you are proposing is also not entirely correct.
so from adjtime:
NOTES
The adjustment that adjtime() makes to the clock is carried out in such a manner that the clock is always monotonically increasing. Using adjtime() to adjust the time prevents the problems that can be caused for certain applications (e.g., make(1)) by abrupt positive or negative jumps in the system time.
the two work together to do (mostly) the right thing. specifically, they don't jump back in time, and don't cause large disruptions in timing, beyond what normally would happen anyway with non-realtime OS's.
for windows, you can try GetTickCount64, or you can try moving the frequency multiplication elsewhere.
all that said, this is not an argument about performance, primarily. you will be calling the clock function only when it doesn't matter, relatively speaking. it's possible to construct benchmarks that focus on the timer, but these will be far removed from reality.
Fixed in #24