Re: BGP / GSLB and HAProxy

From: Willy Tarreau <w@1wt.eu>
Date: Fri, 21 Dec 2007 10:14:01 +0100


Hi John,

On Thu, Dec 20, 2007 at 05:30:15PM -0500, John Marrett wrote:
> > 1) multiple RRs
>
> > But in this case, use BGP and manipulate
> > the weights so that one site can never take over the other
> > one as long as the other one is up.
>
> This is interesting, and something I hadn't considered before. In the
> event that a user's ISP lost connectivity with our primary hosting
> center, they would then be routed to the secondary site as announced by
> BGP. So we would still see some traffic coming to the secondary site?

Logically no, if you add enough AS hops to prevent it. For instance, if you notice that you never cross more than 10 ASes to reach any of your customers, you can pretend that your second site is 10 ASes deeper, so that it never gets used as long as the first one is available. The downside is that you will need at least one AS per site if you want to use both sites simultaneously for different IP networks.
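To make this concrete, the prepending is typically done with a route-map on the session to your transit provider. A minimal sketch in Cisco-style syntax; the AS numbers and addresses (64512, 64496, 192.0.2.0/24, 203.0.113.1) are placeholders, not real values:

    ! backup site: announce the prefix with 10 extra copies of our own AS
    route-map PREPEND-BACKUP permit 10
     set as-path prepend 64512 64512 64512 64512 64512 64512 64512 64512 64512 64512
    !
    router bgp 64512
     network 192.0.2.0 mask 255.255.255.0
     neighbor 203.0.113.1 remote-as 64496
     neighbor 203.0.113.1 route-map PREPEND-BACKUP out

The primary site announces the same prefix without the route-map, so its path always looks shorter as long as it is alive.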

> > 3) TTL
> > as said in 2) you have very little control over what people
> > do with the TTL,
> > and there is even a risk of being blacklisted by some
> > providers if you run
> > with too low TTLs which put a lot of stress on their DNS
> > forwarders (eg: one
> > minute).
>
> I've never heard of this type of policy. By provider, do you mean an
> external DNS provider that is hosting your DNS, or that a customer's
> ISP DNS might blacklist us (rather than just bumping the TTL like some do)?

I mean "ISP" (sorry, in france we abuse the term "provider" to mean "ISP"). I'm aware of at least one who did this to a few domains for which their DNS relays were getting too many requests. And it's something that could be accepted. Imagine if requests to Google were not cacheable by customers, it would be really problematic for their ISP.

> > 1) availability
> > 2) nearest site
> > 3) scalability
>
> An interesting twist on the usual "pick two of three". I think we'll
> concentrate on 1 & 3, and then, for targeted geographic regions, deploy
> a domain-specific solution to address locality. Thankfully, for us, the
> site that a customer connects to has little impact on our scalability.

OK.

> > Also, if you can afford a dedicated link (redundant) between
> > your front routers
> > and set up a local preference for each IP, then you've won,
> > because you can
> > announce your IPs from wherever you want, and the final step
> > will be performed
> > by those routers for customers entering via the wrong site.
>
> Thankfully, I do enjoy a very high capacity, redundant link between
> our primary and secondary site. We have a third site that is not linked
> in such a manner, but we don't necessarily need to route so much front
> end traffic into it, and could instead switch over to it in case of
> catastrophic failure.
>
> By encapsulated, I had meant a GRE, IPSec or similar tunnel over the
> Internet, which would of course break if you lost internet access.

Then if you have a fat pipe between sites, simply connect their frontend routers, announce all the IP addresses using BGP, use local-pref on the routers so that a given IP is always routed to the same site as long as it's up, announce both sites' IPs in DNS round-robin records, and you're done. You will have no trouble with broken TCP sessions and you will not need any DNS trickery when a site goes down. You will even be able to manage the failover on a per-IP basis.
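As an illustration of the local-pref part only (not a drop-in config; the addresses and AS number are invented): on site B's router, prefer site A's path for site A's service addresses for as long as site A keeps announcing them:

    ip prefix-list SITE-A-ADDRS seq 5 permit 192.0.2.0/24
    !
    route-map FROM-SITE-A permit 10
     match ip address prefix-list SITE-A-ADDRS
     set local-preference 200
    route-map FROM-SITE-A permit 20
     set local-preference 100
    !
    router bgp 64512
     ! iBGP session to site A's router over the dedicated link
     neighbor 10.255.0.1 remote-as 64512
     neighbor 10.255.0.1 route-map FROM-SITE-A in

When site A withdraws 192.0.2.0/24, the preferred path disappears and site B's own announcement takes over for that IP, with no DNS change at all.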

> I also have the means to provide secure encapsulation on the backend
> link, so we can have this public network traffic transit there in a
> secure fashion.

That's great.
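For the record, in case it helps other readers: a GRE tunnel between two Linux routers is only a few iproute2 commands (addresses below are placeholders; add IPsec on top if the path crosses untrusted networks):

    # site A: local public address 203.0.113.10, peer 198.51.100.10
    ip tunnel add to-siteb mode gre local 203.0.113.10 remote 198.51.100.10 ttl 255
    ip addr add 10.255.0.1/30 dev to-siteb
    ip link set to-siteb up

    # site B: same commands with the endpoints swapped
    ip tunnel add to-sitea mode gre local 198.51.100.10 remote 203.0.113.10 ttl 255
    ip addr add 10.255.0.2/30 dev to-sitea
    ip link set to-sitea up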

> > If your sites are really far apart, you should avoid layer 2
> > as much as
> > possible. The latencies you can encounter on long distance links may
> > sometimes cause some IPs to appear for a short time on a site then
> > disappear.
>
> None of the sites are really far apart. The third site is 200km away,
> the first two are quite close. Is 200km "really" far apart?

No, that's not far apart at all. 200 km is about 2 ms RTT, which is perfectly acceptable for layer 2. You'd encounter more problems above 100-200 ms.
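For the arithmetic: light propagates in fibre at roughly 200,000 km/s, so 200 km one way takes about 200 / 200000 = 1 ms, hence about 2 ms round trip before any equipment latency.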

> > Yes it is interesting, but unfortunately it is hard to find
> > people to discuss this subject.
>
> Couldn't agree more. On this subject, are you speaking at or
> attending any relevant conventions? Let me / the list know.

No such activity :-)
I know someone with whom I work and who has excellent skills in this area (in fact he's the one who explained the basics of BGP to me while setting it up), but unfortunately he does not practise English often enough to sustain a discussion like this. I will try to push him into the discussion anyway :-)

> > BTW, you may be interested in a short article I
> > wrote last year
> > as an introduction to load balancing. Maybe you will like to read it.
>
> I read it when I was considering different content load balancing
> options. Interesting indeed.

OK.
Best regards,
Willy
