
TSCNS 2.0

What's the problem with clock_gettime/gettimeofday/std::chrono::XXX_clock?

Although current Linux systems use the VDSO to implement clock_gettime/gettimeofday/std::chrono::XXX_clock, they still have a non-negligible overhead, with latency ranging from 20 to 100 ns. The problem is even worse on Windows, where the latency is more unstable and can be as high as 1 us; in addition, the Windows high-resolution clock has only 100 ns precision.

These problems are bad for time-critical tasks that require high-precision timestamps and where the latency of taking the timestamp itself must be minimized.

How is TSCNS different?

TSCNS uses the rdtsc instruction and simple arithmetic operations to implement a thread-safe clock with 1 ns precision. It is much faster and more stable in latency, taking less than 10 ns in total: the latency of rdtsc (4 to 7 ns depending on the platform) plus calculations that take less than 1 ns.
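The idea can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the names TscClock, base_tsc, base_ns, and ns_per_tsc are hypothetical, and the one-shot frequency estimate here is deliberately crude compared to TSCNS's real calibration.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <x86intrin.h>  // __rdtsc

// Hypothetical clock parameters captured at calibration time.
struct TscClock {
  int64_t base_tsc;   // tsc counter value at calibration
  int64_t base_ns;    // system clock (ns) at calibration
  double ns_per_tsc;  // nanoseconds per tsc tick
};

static int64_t sys_ns() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(system_clock::now().time_since_epoch()).count();
}

// Crude one-shot calibration: measure how many tsc ticks elapse
// over a known wall-clock interval to estimate the tsc frequency.
TscClock calibrate_once() {
  int64_t ns0 = sys_ns();
  int64_t tsc0 = (int64_t)__rdtsc();
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  int64_t ns1 = sys_ns();
  int64_t tsc1 = (int64_t)__rdtsc();
  return {tsc1, ns1, (double)(ns1 - ns0) / (double)(tsc1 - tsc0)};
}

// Converting a tsc reading to nanoseconds is one multiply and one add.
int64_t tsc2ns(const TscClock& c, int64_t tsc) {
  return c.base_ns + (int64_t)((double)(tsc - c.base_tsc) * c.ns_per_tsc);
}
```

The fast path is only the final multiply-add; all the expensive system clock reads happen at calibration time.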

Also, it can be closely synchronized with the system clock, which makes it a good alternative to standard system clocks. Real-time synchronization does require the clock to be calibrated at a proper interval, but that is an easy and cheap job to do.

Usage

Initialization:

TSCNS tscns;
tscns.init();

Getting nanosecond timestamp in a single step:

int64_t ns = tscns.rdns();

Or just recording a tsc in some time-critical tasks and converting it to ns in jobs that can be delayed:

// in time-critical task
int64_t tsc = tscns.rdtsc();
...
// in logging task
int64_t ns = tscns.tsc2ns(tsc);

Calibration with some interval in the background:

while(running) {
  tscns.calibrate();
  std::this_thread::sleep_for(std::chrono::seconds(1));
}
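Since calibrate() must only be called from a single thread, a common pattern is to run this loop on a dedicated background thread. Below is a minimal self-contained sketch of that pattern; the tscns.calibrate() call is stood in by a counter so the snippet compiles without the library.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Runs a dedicated calibration thread for `duration_ms` milliseconds,
// waking every `interval_ms` ms. Returns how many calibrations ran.
// In real code the loop body would call tscns.calibrate() instead.
int run_calibrator(int duration_ms, int interval_ms) {
  std::atomic<bool> running{true};
  std::atomic<int> calibrations{0};

  std::thread calibrator([&] {
    while (running.load(std::memory_order_relaxed)) {
      calibrations.fetch_add(1);  // stand-in for tscns.calibrate()
      std::this_thread::sleep_for(std::chrono::milliseconds(interval_ms));
    }
  });

  std::this_thread::sleep_for(std::chrono::milliseconds(duration_ms));
  running.store(false);
  calibrator.join();
  return calibrations.load();
}
```

The atomic flag gives the main thread a clean way to stop the calibrator at shutdown.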

More about calibration

Actually the init function has two optional parameters, void init(int64_t init_calibrate_ns, int64_t calibrate_interval_ns): the initial calibration wait time and the subsequent calibration interval. The initial calibration is used to find a proper tsc frequency to start with; because it blocks inside tscns.init(), the default wait time is set to a small value: 20 ms. The user can choose to wait longer for a more precise initial calibration, e.g. 1 second.

calibrate_interval_ns sets the minimum calibration interval that keeps tscns synced with the system clock; the default value is 3 seconds. The user needs to call the calibrate() function at an interval no larger than calibrate_interval_ns to trigger calibration. The calibrate() function is non-blocking and cheap to call, but it is not thread-safe, so only one thread should call it. Calibrations adjust the tsc frequency used by the library to track that of the system clock, keeping timestamp divergence at a minimum. During calibration, rdns() results in other threads are guaranteed to be continuous: there won't be jumps in values, and in particular timestamps won't go backwards. The picture below shows how these routine calibrations suppress the timestamp error caused by the initial coarse calibration and by system clock speed corrections. The user can also choose not to calibrate after initialization: just don't call calibrate(), and tscns will keep using the initial tsc frequency.

[figure: timestamp error over time, suppressed by routine calibrations]

Differences with TSCNS 1.0

  • TSCNS 2.0 supports routine calibrations in addition to the initial-only calibration in 1.0, so time drift away from the system clock can be radically eliminated. Also, tsc_ghz can no longer be set by the user, and the cheat method in 1.0 is also obsolete. In 2.0, tsc2ns() adds a sequence lock to protect against parameter changes caused by calibrations; the added performance cost is less than 0.5 ns.
  • Windows is supported now. We believe Windows applications will benefit much more from TSCNS because of the drawbacks of the system clock we mentioned at the beginning.
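The sequence-lock technique mentioned above can be sketched as follows. This is a minimal illustration with hypothetical names, not the library's actual code, and it glosses over the formal data-race subtleties of seqlocks in the C++ memory model: the writer bumps a sequence counter to odd before updating the parameters and back to even after, and readers retry if they see an odd or changed counter.

```cpp
#include <atomic>
#include <cstdint>

// Clock parameters protected by a sequence lock (hypothetical names).
struct ClockParams {
  int64_t base_tsc;
  int64_t base_ns;
  double ns_per_tsc;
};

std::atomic<uint32_t> seq{0};
ClockParams params{0, 0, 1.0};

// Writer (the single calibration thread): an odd seq means
// "update in progress", so readers know to wait and retry.
void write_params(const ClockParams& p) {
  seq.fetch_add(1, std::memory_order_acquire);  // even -> odd
  params = p;
  seq.fetch_add(1, std::memory_order_release);  // odd -> even
}

// Readers retry until they observe a stable, even sequence number.
int64_t tsc2ns(int64_t tsc) {
  while (true) {
    uint32_t s0 = seq.load(std::memory_order_acquire);
    if (s0 & 1) continue;               // writer is mid-update
    ClockParams p = params;             // racy copy, validated below
    uint32_t s1 = seq.load(std::memory_order_acquire);
    if (s0 == s1)                       // no update during the copy
      return p.base_ns + (int64_t)((double)(tsc - p.base_tsc) * p.ns_per_tsc);
  }
}
```

Because readers never write shared state, the uncontended read path costs only two atomic loads on top of the arithmetic, which is consistent with the sub-0.5 ns overhead claimed above.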

tscns's People

Contributors

mengrao


tscns's Issues

A simple solution when high precision is not required

The idea is to use a thread-local cache:

#include <chrono>
#include <cstdint>

// Returns a cached millisecond timestamp, refreshed only when the tsc
// has advanced far enough since the last system clock read.
int64_t steady_time_now()
{
    using namespace std::chrono;
    static thread_local int64_t tsc = 0;
    static thread_local int64_t now = 0;

    int64_t new_tsc = __builtin_ia32_rdtsc();
    if (new_tsc - tsc > 1000 * 1000)  // refresh threshold, tuned to the precision needed
    {
        tsc = new_tsc;
        now = duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
    }

    return now;
}

Concurrency safe?

Hi! I wanted to know if this library is safe to use in a concurrent environment. I have multiple applications running that send messages to each other, so I wanted to know if this library could be used to calculate the time taken by each thread to finish a task after receiving some data from another thread. Also, would it be possible to use tscns to calculate startup and shutdown time for the whole application (including all the threads)?

A question about the error range

On my server, whether the program calibrates itself or uses cheat, the error grows to the microsecond or even millisecond level within one minute. The CPU is a dual-socket Xeon 8255C that supports constant_tsc.

I also tested on my home machine with an 8700K and got the same result; both systems run Ubuntu 20.04.1 LTS.

The error here is measured as rdtsc -> clock_gettime -> rdtsc, then taking the difference between the average of the two rdtsc results and the clock_gettime result.

Sleep affects timing accuracy

If I comment out std::this_thread::sleep_for(std::chrono::seconds(1)); the results are completely different. On my machine (gcc 4.5.8, centos 7.4), rdsysns_latency is mostly around 35 ns, occasionally spiking to 100+.
It looks like the call frequency affects the function's execution time?

do it like the kernel

What do you think of this to overcome NTP issues:

A process which opens a shared memory segment, recalibrates periodically using syncTime, and writes base_ns, base_tsc, and ghz to the shared memory segment. A process can then use rdtsc and read base_ns, base_tsc, and ghz from the shared memory to compute the timestamp. This in fact replicates the kernel's VDSO setup, but we can use a finer-grained ghz value. It has the advantage that we track the wall clock more closely, so we don't drift over time due to NTP slewing. My problem is the NTP drift as tscns currently handles it, and I don't want to recalibrate in my fast path.
Any flaws in this approach?
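The layout described above could be sketched like this. It is a hypothetical illustration: the struct and field names are assumptions, and the actual shared-memory plumbing (e.g. shm_open plus mmap) is omitted.

```cpp
#include <cstdint>

// Hypothetical parameter block that a calibrator process would publish
// in shared memory (e.g. via shm_open + mmap); consumer processes only
// read it. In practice the fields would also need a sequence lock so
// readers never see a half-written update.
struct SharedClock {
  int64_t base_tsc;   // tsc value at the last calibration
  int64_t base_ns;    // wall clock (ns) at the last calibration
  double ns_per_tsc;  // finer-grained than an integer GHz value
};

// Reader side: one load of the parameters plus a multiply-add,
// with no recalibration work on the fast path.
int64_t shared_tsc2ns(const SharedClock& c, int64_t tsc) {
  return c.base_ns + (int64_t)((double)(tsc - c.base_tsc) * c.ns_per_tsc);
}
```

With this split, only the calibrator process ever touches the system clock; every consumer's fast path is pure arithmetic on the shared parameters.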

Need mfence after __builtin_ia32_rdtscp

According to intel_doc
and rdtscp, one needs to call mfence after __builtin_ia32_rdtscp in case of CPU reordering.
Using rdtsc without cpuid is very bad; the timestamp is almost useless because instructions before and after it can get reordered by the CPU.
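A hedged sketch of a serialized read follows. Note that Intel's common guidance is to pair rdtscp with a trailing lfence rather than mfence; this illustrates the fencing pattern under that assumption, not the library's actual code.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtscp, _mm_lfence

// Read the tsc such that it cannot be reordered with surrounding code:
// rdtscp waits for prior instructions to complete before reading the
// counter, and the trailing lfence keeps later instructions from
// starting before the read itself completes.
int64_t rdtsc_serialized() {
  unsigned aux;                     // receives IA32_TSC_AUX (cpu/node id)
  uint64_t tsc = __rdtscp(&aux);
  _mm_lfence();
  return (int64_t)tsc;
}
```

The extra fencing costs a few cycles, which is why libraries aimed at timestamping (rather than micro-benchmarking a specific instruction window) often accept plain rdtsc.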

TSC timing problems on overclocked servers

A question: on an overclocked machine, the CPU frequency computed by the timer is wrong. For example, the CPU base frequency is 3.0 GHz, overclocked to 5.0 GHz, but the current frequency computed by the timer is still 3.0 GHz. Is something wrong with TSC timing? The CPU supports invariant TSC.
[xtrader@XXXX ~]$ dmesg | grep "tsc: Refined TSC clocksource calibration"
[ 1.851706] tsc: Refined TSC clocksource calibration: 3000.001 MHz
