atp

Atp's external memory

Timer resolution part 2

In a previous post I looked at the lowest practical resolution you can get using the rdtsc instruction.

What about more common or garden ways like clock_gettime() or even the humble gettimeofday()?

We can use the rdtsc method we developed earlier to measure the timing system calls. In this post I'll use the same test harness to measure the practical resolution of timing calls.

We're mainly interested in 4 calls, three of which are variants of clock_gettime():

  • clock_gettime(CLOCK_MONOTONIC, struct timespec *t)
  • clock_gettime(CLOCK_REALTIME, struct timespec *t)
  • clock_gettime(CLOCK_PROCESS_CPUTIME_ID, struct timespec *t)
  • gettimeofday(struct timeval *t, struct timezone *tz)

clock_gettime uses a struct timespec

           struct timespec {
               time_t   tv_sec;        /* seconds */
               long     tv_nsec;       /* nanoseconds */
           };

Which looks good. Nanoseconds!

gettimeofday uses a struct timeval

           struct timeval {
               time_t      tv_sec;     /* seconds */
               suseconds_t tv_usec;    /* microseconds */
           };

Using clock_getres we can get an idea of the "official" resolution, or precision, of the clock_gettime clocks.

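struct timespec res;
/* Ask for the advertised resolution of CLOCK_REALTIME. */
clock_getres(CLOCK_REALTIME, &res);
/* tv_sec is a time_t and tv_nsec is a long, so print both as signed longs. */
printf("CLOCK_REALTIME resolution = %ld s %ld ns\n", (long)res.tv_sec, res.tv_nsec);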

Repeating that for each of the three clocks gives us this:

CLOCK_REALTIME resolution = 0 s 1 ns
CLOCK_MONOTONIC resolution = 0 s 1 ns
CLOCK_PROCESS_CPUTIME_ID resolution = 0 s 1 ns

One nanosecond resolution (precision). Fantastic! Just what we need.

To be fair to the man page, it does say:

The function clock_getres() finds the resolution (precision) of the
specified clock clk_id, and, if res is non-NULL, stores it in the
struct timespec pointed to by res. The resolution of clocks depends on
the implementation and cannot be configured by a particular process.

We'll run 100 calls in a tight loop, using the rdtsc method from the previous post to tell us how many picoseconds have elapsed per call. We'll also use the timing results returned by the calls themselves to give us an idea of what the internal mean and standard deviation are.

Clock                                     Measured mean   Measured 1σ   Measured 3σ   Picos per call (rdtsc)
clock_gettime CLOCK_MONOTONIC             464 ns          34 ns         104 ns        494690
clock_gettime CLOCK_REALTIME              467 ns          35 ns         105 ns        466940
clock_gettime CLOCK_PROCESS_CPUTIME_ID    244 ns          31 ns         94 ns         256040
gettimeofday                              0 μs            0.7 μs        2 μs          490250

All times were measured on a 2.7 GHz Celeron E3500, which has the following rdtsc resolution:

RDTSC Mean = 35 ticks, 13219 picos, 1 sigma = 2392.573145, 3 sigma = 7177.719434

Stability-wise, subsequent runs tend to differ by a couple of nanoseconds, as you'd expect from the rdtsc numbers. All tests are run as root, at a high priority and with CPU affinity set (using taskset -c 1 chrt -f 99).

So the practical limit is about 250ns on this machine, and that is with CLOCK_PROCESS_CPUTIME_ID, the clock ID that comes with the same warning as the rdtsc method. Again from the clock_gettime man page:

The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are
realized on many platforms using timers from the CPUs (TSC on i386,
AR.ITC on Itanium). These registers may differ between CPUs and as a
consequence these clocks may return bogus results if a process is
migrated to another CPU.

(Another reason for running under taskset.) The more bulletproof variants give us a practical resolution of 500ns, or half a microsecond. And that's with vsyscall on. Also of interest is that CLOCK_REALTIME consistently seems to take about 30ns less to run than CLOCK_MONOTONIC, going by the rdtsc figures.

Without vsyscall64 on (i.e. without using the vDSO library described in a previous post):

Clock                                     Measured mean   Measured 1σ   Measured 3σ   Picos per call (rdtsc)
clock_gettime CLOCK_MONOTONIC             561 ns          14 ns         41 ns         595330
clock_gettime CLOCK_REALTIME              560 ns          12 ns         36 ns         563880
clock_gettime CLOCK_PROCESS_CPUTIME_ID    240 ns          27 ns         82 ns         249380
gettimeofday                              0 μs            0.7 μs        2 μs          580530

Which adds surprisingly little: about 100ns per call for going through the real system call. Again, CLOCK_REALTIME is consistently about 30ns faster than CLOCK_MONOTONIC according to the time stamp counters.

So the moral of the story is that just because clock_getres says the precision is 1ns, that's not necessarily so. In practice on this system, it's more like 500ns.

(and yes, I do need to spend a few nanoseconds myself on fixing the css for tables ...)

Written by atp

Sunday 10 July 2011 at 3:56 pm

Posted in Linux