Timer resolution part 2
In a previous post I looked at the lowest practical resolution available, using the rdtsc instruction.
What about more common-or-garden methods like clock_gettime(), or even the humble gettimeofday()?
We can turn the rdtsc method developed earlier on the timing system calls themselves. In this post I'll use the same test harness to measure the practical resolution of those calls.
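As a quick reminder, the rdtsc method boils down to reading the CPU's time stamp counter directly. A minimal sketch of the helper, assuming x86 and GCC-style inline assembly:

    #include <stdint.h>

    /* Read the time stamp counter (x86, GCC/Clang inline assembly). */
    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }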
We're mainly interested in four calls, three of which are variants of clock_gettime():
- clock_gettime(CLOCK_MONOTONIC, struct timespec *t)
- clock_gettime(CLOCK_REALTIME, struct timespec *t)
- clock_gettime(CLOCK_PROCESS_CPUTIME_ID, struct timespec *t)
- gettimeofday(struct timeval *t, struct timezone *tz)
clock_gettime uses a struct timespec:

    struct timespec {
        time_t tv_sec;   /* seconds */
        long   tv_nsec;  /* nanoseconds */
    };
Which looks good. Nanoseconds!
gettimeofday uses a struct timeval:

    struct timeval {
        time_t      tv_sec;   /* seconds */
        suseconds_t tv_usec;  /* microseconds */
    };
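Reading both side by side looks like this (a minimal sketch; error checking omitted, and on older glibc you may need to link with -lrt for clock_gettime):

    #include <stdio.h>
    #include <time.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timespec ts;
        struct timeval  tv;

        clock_gettime(CLOCK_MONOTONIC, &ts);  /* nanosecond field */
        gettimeofday(&tv, NULL);              /* microsecond field */

        printf("clock_gettime: %ld s %ld ns\n", (long)ts.tv_sec, ts.tv_nsec);
        printf("gettimeofday:  %ld s %ld us\n", (long)tv.tv_sec, (long)tv.tv_usec);
        return 0;
    }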
Using clock_getres we can get an idea of the "official" resolution (precision) of the clock_gettime clocks:
    struct timespec res;
    clock_getres(CLOCK_REALTIME, &res);
    printf("CLOCK_REALTIME resolution = %ld s %ld ns\n", (long)res.tv_sec, res.tv_nsec);
which gives us this:
    CLOCK_REALTIME resolution = 0 s 1 ns
    CLOCK_MONOTONIC resolution = 0 s 1 ns
    CLOCK_PROCESS_CPUTIME_ID resolution = 0 s 1 ns
One nanosecond resolution (precision). Fantastic! Just what we need.
To be fair to the man page, it does say:
    The function clock_getres() finds the resolution (precision) of the
    specified clock clk_id, and, if res is non-NULL, stores it in the
    struct timespec pointed to by res. The resolution of clocks depends on
    the implementation and cannot be configured by a particular process.
We'll run 100 calls in a tight loop, using the rdtsc method from the previous post to tell us how many picoseconds elapse per call. We'll also use the timing results from the calls themselves to give us an idea of the internal mean and standard deviation. A sketch of the measurement loop follows.
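Here's roughly what that harness boils down to for CLOCK_MONOTONIC (a sketch: NSAMPLES and PS_PER_TICK are placeholders of my own, the ~378 ps tick length being derived from the E3500 numbers below; calibrate it for your own CPU):

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>
    #include <math.h>

    #define NSAMPLES    100
    #define PS_PER_TICK 378.0  /* assumed tick length; calibrate per CPU */

    /* rdtsc() helper as sketched earlier */
    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        struct timespec t[NSAMPLES];

        /* time the whole loop externally with rdtsc ... */
        uint64_t start = rdtsc();
        for (int i = 0; i < NSAMPLES; i++)
            clock_gettime(CLOCK_MONOTONIC, &t[i]);
        uint64_t end = rdtsc();

        /* ... and use the samples themselves for the internal statistics */
        double sum = 0.0, sumsq = 0.0;
        for (int i = 1; i < NSAMPLES; i++) {
            double d = (t[i].tv_sec - t[i-1].tv_sec) * 1e9
                     + (t[i].tv_nsec - t[i-1].tv_nsec);
            sum   += d;
            sumsq += d * d;
        }
        double mean  = sum / (NSAMPLES - 1);
        double sigma = sqrt(sumsq / (NSAMPLES - 1) - mean * mean);

        printf("mean = %.0f ns, 1 sigma = %.0f ns, 3 sigma = %.0f ns\n",
               mean, sigma, 3 * sigma);
        printf("picos per call (rdtsc) = %.0f\n",
               (double)(end - start) * PS_PER_TICK / NSAMPLES);
        return 0;
    }

Running that for each clock gives the following: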
| Clock | Measured mean | Measured 1σ | Measured 3σ | Picos per call (rdtsc) |
|---|---|---|---|---|
| clock_gettime CLOCK_MONOTONIC | 464 ns | 34 ns | 104 ns | 494690 |
| clock_gettime CLOCK_REALTIME | 467 ns | 35 ns | 105 ns | 466940 |
| clock_gettime CLOCK_PROCESS_CPUTIME_ID | 244 ns | 31 ns | 94 ns | 256040 |
| gettimeofday | 0 μs | 0.7 μs | 2 μs | 490250 |
All times were measured on a 2.7 GHz Celeron E3500, which has the following rdtsc resolution:

    RDTSC mean = 35 ticks, 13219 picos, 1 sigma = 2392.573145, 3 sigma = 7177.719434

Stability-wise, subsequent runs tend to differ by a couple of nanoseconds, as you'd expect from the rdtsc numbers. All runs are made as root, at high priority and with CPU affinity set (using taskset -c 1 chrt -f 99).
So, the practical limit is 250 ns on this machine, using the clock_gettime clock ID that comes with the same warning as the rdtsc method. Again from the clock_gettime man page:
    The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are
    realized on many platforms using timers from the CPUs (TSC on i386,
    AR.ITC on Itanium). These registers may differ between CPUs and as a
    consequence these clocks may return bogus results if a process is
    migrated to another CPU.
(Another reason for running under taskset.) The more bullet-proof variants give us a practical resolution of 500 ns, or half a microsecond, and that's with vsyscall on. Also of interest: CLOCK_REALTIME consistently seems to take about 30 ns less to run in reality than CLOCK_MONOTONIC.
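If you'd rather pin the process from inside the program than wrap it in taskset, sched_setaffinity(2) does the same job; a minimal sketch:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(1, &set);   /* pin to CPU 1, as taskset -c 1 does */
        if (sched_setaffinity(0, sizeof(set), &set) != 0)
            perror("sched_setaffinity");

        /* ... timing loop goes here ... */
        return 0;
    }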
Without vsyscall64 on (i.e. without using the vdso library described in a previous post):
| Clock | Measured mean | Measured 1σ | Measured 3σ | Picos per call (rdtsc) |
|---|---|---|---|---|
| clock_gettime CLOCK_MONOTONIC | 561 ns | 14 ns | 41 ns | 595330 |
| clock_gettime CLOCK_REALTIME | 560 ns | 12 ns | 36 ns | 563880 |
| clock_gettime CLOCK_PROCESS_CPUTIME_ID | 240 ns | 27 ns | 82 ns | 249380 |
| gettimeofday | 0 μs | 0.7 μs | 2 μs | 580530 |
Which adds surprisingly little: 100 ns or so for the full system call. Again, CLOCK_REALTIME is consistently about 30 ns faster than CLOCK_MONOTONIC according to the time stamp counters.
So the moral of the story is: just because clock_getres says the precision is 1 ns, that's not necessarily so. In practice on this system, it's more like 500 ns.
(And yes, I do need to spend a few nanoseconds myself on fixing the CSS for tables ...)