Re: maybe dump question

From: Willy Tarreau <w#1wt.eu>
Date: Thu, 31 Jan 2008 20:17:53 +0100


On Wed, Jan 30, 2008 at 10:17:59PM +0100, Krzysztof Oledzki wrote:
>
>
> On Wed, 30 Jan 2008, Aleksandar Lazic wrote:
>
> >Hi,
> >
> >sometimes I make a diff from previous_version current_version, for my
> >curiosity ;-)
> >
> >
> >I have done this with haproxy-1.3.14.1 + haproxy-1.3.14.2
> >
> >diff -ru haproxy-1.3.14.1 haproxy-1.3.14.2|less
> >
> >and asked me why you have not combine this statement into a block?
> >
> >I think mybe for performance issue or some other reason?!
> >
> >---- in haproxy-1.3.14.2/src/backend.c
> >
> >srv_count_retry_down()
> >.
> >.
> >if (t->srv)
> > t->srv->cum_sess++;
> >if (t->srv)
> > t->srv->failed_conns++;
> >
> >srv_retryable_connect()
> >.
> >.
> >case SN_ERR_INTERNAL:
> >
> >if (t->srv)
> > t->srv->cum_sess++;
> >if (t->srv)
> > t->srv->failed_conns++;
> >---
>
> Probably to help solving a conflict with
> 25b501a6b12d8f4ca8cabe946b1286aa4020755c
>
> (...)
> - if (t->srv)
> + if (t->srv) {
> t->srv->cum_sess++;
> - if (t->srv)
> t->srv->failed_conns++;
> + t->srv->redispatches++;
> + }
> (...)
>
> Just guess... ;)

No, I remember why. It is because GCC emits crappy sub-optimal code when the two statements are put in a block.

If you write :

   if (a)
       b++;
   if (a)
       c++;

then it emits code looking like this :

   x = b + 1       # 1 cycle, simple trivial addition
   if (a) b = x    # 1 cycle, conditional move, no jump
   x = c + 1       # 1 cycle
   if (a) c = x    # 1 cycle

On all modern archs, the two middle instructions even merge into one cycle because there's no functional dependency between them. So you do all that in 3 cycles without any jump (so you don't hurt the branch predictor).
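To make the sequence above concrete, here is the same thing written out by hand in C. It is only a sketch of the transformation, not actual compiler output and not haproxy code; the names `counters` and `bump_branchless` are made up for the illustration.

```c
/* Hypothetical names, for illustration only (not haproxy code). */
struct counters {
    unsigned int b;
    unsigned int c;
};

/* Hand-written equivalent of the conditional-move sequence:
 * compute the incremented value unconditionally, then select.
 * Whether each select really becomes a conditional move is up
 * to the compiler and the target. */
static void bump_branchless(struct counters *p, int a)
{
    unsigned int x;

    x = p->b + 1;           /* x = b + 1    */
    p->b = a ? x : p->b;    /* if (a) b = x */
    x = p->c + 1;           /* x = c + 1    */
    p->c = a ? x : p->c;    /* if (a) c = x */
}
```

When `a` is zero both fields keep their value, and when it is non-zero both are incremented, which is exactly what the two separate `if (a)` statements do.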

Now with a block :

   if (a)            # 1 cycle (test)
     jump block      # 1 cycle (jump), most often predicted
next:
   ...
   return
block:
   x = b             # 1 cycle for both
   y = c
   b = x + 1         # 1 cycle for both
   c = y + 1
   jump next         # 1 cycle, predicted


The second case is slower, awful and tends to flush pipes, not counting the bad cache efficiency caused by jumping everywhere. So when you have a small number of operations like this to perform (2..3), it's better to write the test 2 or 3 times; the compiler will optimize the duplicate tests away.
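If you want to check this on your own compiler, a small test file like the one below is enough. It is a minimal sketch: the struct `stats` and the function names are invented for the example, and which code you get for each variant depends on the GCC version, the optimization level and the target.

```c
/* Hypothetical test file, names invented for the example. */
struct stats {
    unsigned int b;
    unsigned int c;
};

/* Variant 1: the test is repeated for each statement. */
void count_repeated(struct stats *s, int a)
{
    if (a)
        s->b++;
    if (a)
        s->c++;
}

/* Variant 2: both statements grouped in one block. */
void count_block(struct stats *s, int a)
{
    if (a) {
        s->b++;
        s->c++;
    }
}
```

Compile it with `gcc -O2 -S` and compare the assembly of the two functions. Both variants always compute the same result; only the generated code may differ.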

Cheers,
Willy