Timer resolution part 2
In a previous post I looked at the lowest practical resolution available, using the rdtsc instruction.
What about more common-or-garden methods like clock_gettime(), or even the humble gettimeofday()?
We can turn the rdtsc method developed earlier on the timing system calls themselves. In this post I'll use the same test harness to measure the practical resolution of those calls.
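As a quick reminder, the rdtsc method boils down to reading the CPU's time stamp counter directly. A minimal sketch of the helper, assuming x86 and GCC-style inline assembly:

    #include <stdint.h>

    /* Read the time stamp counter (x86, GCC/Clang inline assembly). */
    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }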
We're mainly interested in four calls, three of which are variants of clock_gettime():
- clock_gettime(CLOCK_MONOTONIC, struct timespec *t)
- clock_gettime(CLOCK_REALTIME, struct timespec *t)
- clock_gettime(CLOCK_PROCESS_CPUTIME_ID, struct timespec *t)
- gettimeofday(struct timeval *t, struct timezone *tz)
clock_gettime uses a struct timespec:

    struct timespec {
        time_t tv_sec;   /* seconds */
        long   tv_nsec;  /* nanoseconds */
    };
Which looks good. Nanoseconds!
gettimeofday uses a struct timeval:

    struct timeval {
        time_t      tv_sec;   /* seconds */
        suseconds_t tv_usec;  /* microseconds */
    };
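Reading both side by side looks like this (a minimal sketch; error checking omitted, and on older glibc you may need to link with -lrt for clock_gettime):

    #include <stdio.h>
    #include <time.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timespec ts;
        struct timeval  tv;

        clock_gettime(CLOCK_MONOTONIC, &ts);  /* nanosecond field */
        gettimeofday(&tv, NULL);              /* microsecond field */

        printf("clock_gettime: %ld s %ld ns\n", (long)ts.tv_sec, ts.tv_nsec);
        printf("gettimeofday:  %ld s %ld us\n", (long)tv.tv_sec, (long)tv.tv_usec);
        return 0;
    }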
Using clock_getres we can get an idea of the "official" resolution (precision) of the clock_gettime clocks:
    struct timespec res;
    clock_getres(CLOCK_REALTIME, &res);
    printf("CLOCK_REALTIME resolution = %ld s %ld ns\n", (long)res.tv_sec, res.tv_nsec);
which gives us this:
    CLOCK_REALTIME resolution = 0 s 1 ns
    CLOCK_MONOTONIC resolution = 0 s 1 ns
    CLOCK_PROCESS_CPUTIME_ID resolution = 0 s 1 ns
One nanosecond resolution (precision). Fantastic! Just what we need.
To be fair to the man page, it does say:
    The function clock_getres() finds the resolution (precision) of the
    specified clock clk_id, and, if res is non-NULL, stores it in the
    struct timespec pointed to by res. The resolution of clocks depends on
    the implementation and cannot be configured by a particular process.
We'll run 100 calls in a tight loop, using the rdtsc method from the previous post to tell us how many picoseconds elapse per call. We'll also use the timing results from the calls themselves to give us an idea of the internal mean and standard deviation. A sketch of the measurement loop follows.
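Here's roughly what that harness boils down to for CLOCK_MONOTONIC (a sketch: NSAMPLES and PS_PER_TICK are placeholders of my own, the ~378 ps tick length being derived from the E3500 numbers below; calibrate it for your own CPU):

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>
    #include <math.h>

    #define NSAMPLES    100
    #define PS_PER_TICK 378.0  /* assumed tick length; calibrate per CPU */

    /* rdtsc() helper as sketched earlier */
    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        struct timespec t[NSAMPLES];

        /* time the whole loop externally with rdtsc ... */
        uint64_t start = rdtsc();
        for (int i = 0; i < NSAMPLES; i++)
            clock_gettime(CLOCK_MONOTONIC, &t[i]);
        uint64_t end = rdtsc();

        /* ... and use the samples themselves for the internal statistics */
        double sum = 0.0, sumsq = 0.0;
        for (int i = 1; i < NSAMPLES; i++) {
            double d = (t[i].tv_sec - t[i-1].tv_sec) * 1e9
                     + (t[i].tv_nsec - t[i-1].tv_nsec);
            sum   += d;
            sumsq += d * d;
        }
        double mean  = sum / (NSAMPLES - 1);
        double sigma = sqrt(sumsq / (NSAMPLES - 1) - mean * mean);

        printf("mean = %.0f ns, 1 sigma = %.0f ns, 3 sigma = %.0f ns\n",
               mean, sigma, 3 * sigma);
        printf("picos per call (rdtsc) = %.0f\n",
               (double)(end - start) * PS_PER_TICK / NSAMPLES);
        return 0;
    }

Running that for each clock gives the following: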
| Clock | Measured mean | Measured 1σ | Measured 3σ | Picos per call (rdtsc) |
|---|---|---|---|---|
| clock_gettime CLOCK_MONOTONIC | 464 ns | 34 ns | 104 ns | 494690 |
| clock_gettime CLOCK_REALTIME | 467 ns | 35 ns | 105 ns | 466940 |
| clock_gettime CLOCK_PROCESS_CPUTIME_ID | 244 ns | 31 ns | 94 ns | 256040 |
| gettimeofday | 0 μs | 0.7 μs | 2 μs | 490250 |
All times were measured on a 2.7 GHz Celeron E3500, which has the following rdtsc resolution:

    RDTSC mean = 35 ticks, 13219 picos, 1 sigma = 2392.573145, 3 sigma = 7177.719434

Stability-wise, subsequent runs tend to differ by a couple of nanoseconds, as you'd expect from the rdtsc numbers. All runs are made as root, at high priority and with CPU affinity set (using taskset -c 1 chrt -f 99).
So, the practical limit is 250 ns on this machine, using the clock_gettime clock ID that comes with the same warning as the rdtsc method. Again from the clock_gettime man page:
    The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are
    realized on many platforms using timers from the CPUs (TSC on i386,
    AR.ITC on Itanium). These registers may differ between CPUs and as a
    consequence these clocks may return bogus results if a process is
    migrated to another CPU.
(Another reason for running under taskset.) The more bullet-proof variants give us a practical resolution of 500 ns, or half a microsecond, and that's with vsyscall on. Also of interest: CLOCK_REALTIME consistently seems to take about 30 ns less to run in reality than CLOCK_MONOTONIC.
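If you'd rather pin the process from inside the program than wrap it in taskset, sched_setaffinity(2) does the same job; a minimal sketch:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(1, &set);   /* pin to CPU 1, as taskset -c 1 does */
        if (sched_setaffinity(0, sizeof(set), &set) != 0)
            perror("sched_setaffinity");

        /* ... timing loop goes here ... */
        return 0;
    }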
Without vsyscall64 on (i.e. without using the vdso library described in a previous post):
| Clock | Measured mean | Measured 1σ | Measured 3σ | Picos per call (rdtsc) |
|---|---|---|---|---|
| clock_gettime CLOCK_MONOTONIC | 561 ns | 14 ns | 41 ns | 595330 |
| clock_gettime CLOCK_REALTIME | 560 ns | 12 ns | 36 ns | 563880 |
| clock_gettime CLOCK_PROCESS_CPUTIME_ID | 240 ns | 27 ns | 82 ns | 249380 |
| gettimeofday | 0 μs | 0.7 μs | 2 μs | 580530 |
Which adds surprisingly little: 100 ns or so for the full system call. Again, CLOCK_REALTIME is consistently about 30 ns faster than CLOCK_MONOTONIC according to the time stamp counters.
So the moral of the story is: just because clock_getres says the precision is 1 ns, that's not necessarily so. In practice on this system, it's more like 500 ns.
(And yes, I do need to spend a few nanoseconds myself on fixing the CSS for tables ...)