Re: haproxy API patch

From: Jeff Buchbinder <jbuchbinder#ravemobilesafety.com>
Date: Fri, 19 Aug 2011 09:05:53 -0400


On Thu, 2011-08-18 at 15:13 -0500, Willy Tarreau wrote:
> Hi Jeff,
>
> On Sun, Aug 14, 2011 at 04:01:53PM -0400, Jeff Buchbinder wrote:
> > I've been working on an "API" patch, where certain functionality is
> > exposed over the stats HTTP service. The "fork" where I have been
> > working on this is available here:
> >
> > https://github.com/jbuchbinder/haproxy
> >
> > The full patch (which I imported from my old working tree) is here, if
> > you want to see just the changes:
> >
> > https://github.com/jbuchbinder/haproxy/commit/0f924468977fc71f2530837e3e44cf47fc00fd0f
> >
> > Documentation is available here:
> >
> > https://github.com/jbuchbinder/haproxy/blob/master/README.API
> >
> > It was recently suggested that I attempt to get this patch included
> > upstream.
>
> Well, you apparently did a nice amount of work. I'm not opposed to an API
> (after all, we've developed one at Exceliance too), but we need to respect
> a certain amount of basic component behaviour rules. An API will never do
> more than what the component itself is able to do, it's just a convenient
> (or sometimes at least a different) way of making it do something it is
> able to do.
>
> If you look at how other load balancers work (and most network equipment
> too BTW), you generally have multiple interaction levels between the user
> and the internal state :
>
> - monitoring : the user wants to check the current state of the product ;
> - stats : the user wants to check some stats that were aggregated over
> a period, sometimes since last clear or last reboot. Those generally
> are counters ;
> - operational status : the user wants to temporarily change something
> for the current running session, because this can help him make some
> other operations more transparent or better resist an unexpected
> condition (eg: imbalanced servers after a failure).
> - configuration status : the user wants a change to be performed and
> kept until a new configuration change undoes it.
>
> For quite some time, monitoring and stats have been available under various
> forms (http or unix socket). Recently, a few operational status changes were
> brought first on the unix socket and later on the web page. Those are still
> limited (enable/disable of a server, clear a table entry, change a server's
> weight) and even more limited for the web access. Still, that starts to fit
> some use cases.

The API stats calls that I had implemented (pool.content, etc.) are essentially the same, except that they format the data in a way that is much more easily consumed by other services. (JSON-formatted data beats CSV for ease of use in anything other than feeding a spreadsheet or running it through awk/sed/cut/grep.)
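
As a rough sketch of what I mean by "easier to consume" (the ";csv" export is the stock stats page; the "/api/pool.content" path and the JSON field names below are placeholders of mine, not necessarily what README.API specifies):

    import csv, io, json, urllib.request

    BASE = "http://127.0.0.1:8080"  # assumed stats/API listener

    def servers_from_csv():
        # Stock CSV export: strip the leading "# " so the header row
        # parses, then pick out the columns you care about.
        raw = urllib.request.urlopen(BASE + "/haproxy?stats;csv").read().decode()
        rows = csv.DictReader(io.StringIO(raw.lstrip("# ")))
        return [(r["pxname"], r["svname"], r["status"]) for r in rows]

    def servers_from_json():
        # Hypothetical JSON endpoint: fields arrive already named and
        # structured, so no header munging or column counting is needed.
        raw = urllib.request.urlopen(BASE + "/api/pool.content").read().decode()
        return [(e["proxy"], e["server"], e["status"]) for e in json.loads(raw)]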

> All of these accesses are performed for the currently running session. This
> means that if you restart the process, everything is lost, quite similarly
> as what you get if you reboot your router while you temporarily killed a
> BGP session or shut a port. And this is expected, you want those changes
> to be temporary because they're made in the process of something else.
>
> The configuration status has a very different usage pattern : the change
> that is performed must absolutely meet two important requirements :
> - the changes that are performed must be kept across a restart ;
> - what is performed must have the same effect after the restart that
> it had during the hot change.
>
> The first one is crucial : if your process dies and is restarted by a
> monitoring tool without the user knowing it, all changes are lost and
> nobody knows. Also, the process would restart with an old invalid config
> which does not match what was running before the restart, until someone
> pushes the changes again (provided someone is able to determine the diff
> between what's in the file at the moment of restart and what was running
> before it). Worse, some orthogonal changes may be performed live and in
> the config file, making the addition of both incompatible. For instance,
> you would add a server via the live API while, in parallel, someone adds
> a cookie in the config file to all the other servers. If after a restart
> you re-apply the same changes, you'll get a wrong config in which the
> last added server does not have any cookie.
>
> => This means that you need your file-based config to always be in
> sync with the API changes. There is no reliable way of doing so
> in any component if the changes are applied at two distinct
> places at the same time !

It depends what you're using haproxy for. If you're populating the configuration from the API (which is my eventual goal, if possible) for an elastic/dynamic server pool, where servers are brought into the pool on the fly, configuration-file persistence matters much less.
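
As a concrete (if hand-wavy) sketch of that scenario, the management layer would own the pool and drive haproxy through the API; the endpoint and parameter names below are placeholders, not the exact ones from README.API:

    import urllib.parse, urllib.request

    API = "http://127.0.0.1:8080/api"  # assumed stats/API listener

    def call(method, **params):
        qs = urllib.parse.urlencode(params)
        return urllib.request.urlopen(API + "/" + method + "?" + qs).read()

    def on_instance_launched(name, addr):
        # A new VM came up: register it in the web backend.
        call("pool.add", backend="web", name=name, addr=addr, weight=10)

    def on_instance_terminated(name):
        # The VM is going away: drop it from the pool.  Nothing is written
        # to the config file; the whole pool is rebuilt from the management
        # system's inventory after any haproxy restart.
        call("pool.remove", backend="web", name=name)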

> The second point is important too : even if we assume that you find a way
> to more-or-less ensure that your config file gets the equivalent changes
> and is up to date, you must absolutely ensure that what is there will work
> upon a restart and will exhibit the exact same behaviour.
>
> There are a large number of issues that can arise from performing changes
> in a different order than what is done at once upon start-up. Most people
> who had to deal with Alteon LBs for instance, know that sometimes something
> does not behave as expected after a change and a reboot fixes the issue
> (eg: renumbering a large number of filters, or changing health checking
> methods). And there's nothing really wrong with that, it's just that the
> problem by itself is complex. On unix-like systems, many of us have already
> been hit by an issue involving two services bound to the same port, one on
> the real IP and the other one bound to 0.0.0.0. If you bind 0.0.0.0 first,
> on most systems both may bind, while the reverse is not allowed. Back to
> our API case, imagine you have two frontends declared this way :
>
> frontend f1
>     bind 1.2.3.4:8888
>
> frontend f2
>     bind 3.4.5.6:8080
>
> You first change the IP of the second one to become 0.0.0.0:8080. It's fine,
> the change is accepted. Now you change the first one's port to become 8080.
> It's fine too, the address is more precise than 0.0.0.0, the change is
> accepted. You perform the same change in parallel to your file-based config :
>
> frontend f1
>     bind 1.2.3.4:8080
>
> frontend f2
>     bind 0.0.0.0:8080
>
> One Saturday evening you get a power hiccup and your server reboots. Haproxy
> will complain that frontend f2 cannot bind : "Address already in use".
>
> There is only one way to solve these classes of issues, by respecting those
> two rules :
> - the changes must be performed to one single place, which is the reference
> (here the config file)
> - the changes must then be applied using the normal process from this
> reference

I would think it would also be possible to "replay" a list of modifications against the original configuration, which would not require rewriting the original config. Not a perfect solution, but another possibility. (The potential downside is that a change to the original configuration could alter how the replayed actions behave.)
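
As a sketch of the replay idea (the journal path, socket path and one-command-per-connection handling are assumptions of mine):

    import socket

    JOURNAL = "/var/lib/haproxy/changes.log"   # assumed journal location
    STATS_SOCK = "/var/run/haproxy.sock"       # assumed stats socket path

    def record(command):
        # Append every change made through the API to a journal file.
        with open(JOURNAL, "a") as f:
            f.write(command + "\n")

    def replay():
        # After a restart, feed the journal back through the stats socket,
        # one connection per command.
        with open(JOURNAL) as f:
            for command in f:
                s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
                s.connect(STATS_SOCK)
                s.sendall(command.encode())
                s.recv(4096)  # drain the reply before closing
                s.close()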

> What this means is that anything related to changing more than an operational
> status must be performed on the config file first, then propagated to the
> running processes using the same method that is used upon start up (config
> parsing and loading).

That assumes that you're not dealing with a transient configuration (as I mentioned earlier). It's an admirable goal to allow configuration persistence for things like the pool.add and pool.remove methods, since those are, at the moment, the only two that touch the configuration in a way that would seriously break a stored config file.

Also, apart from pool.add and pool.remove, I'm not doing anything conceptually beyond what the "stats" control socket has already been doing. Weight and maintenance mode are not persisted to the configuration file; the only difference is how I'm exposing access to them (disregarding pool.add and pool.remove, of course).
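
For example, those operational changes can already be scripted against the stats socket today; "set weight" and "disable server" are existing socket commands, while the socket path and the backend/server names below are assumptions:

    import socket

    def stats_cmd(command, sock_path="/var/run/haproxy.sock"):
        # Send one command per connection and return the raw reply.
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(sock_path)
        s.sendall((command + "\n").encode())
        reply = s.recv(65536).decode()
        s.close()
        return reply

    stats_cmd("set weight web/srv1 50")   # change a server's weight
    stats_cmd("disable server web/srv2")  # put a server in maintenance mode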

> Right now haproxy is not able to reload a config once it's started. And since
> we chroot it, it will not be able to access the FS afterwards. However we can
> reload a new process with the new config (that's what most of us are currently
> doing).

That's also what I'm doing in our production setup. The importance of an accessible API, though, is that it allows third-party services (for example, a software deployer or cloud management service) to control certain aspects of the proxy without resorting to kludges like using ssh to remotely push commands into a socket with socat. (Which, by the way, works just fine when run locally with a wrapper script, but makes it harder to integrate into a deployment process.)
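
To illustrate the difference from the deployer's side (hostnames, paths and the "/api" URL below are placeholders):

    import subprocess, urllib.request

    def via_ssh_socat(host, command):
        # The kludge: ssh to the LB and pipe the command into socat.
        return subprocess.run(
            ["ssh", host, "socat", "stdio", "/var/run/haproxy.sock"],
            input=(command + "\n").encode(), capture_output=True,
        ).stdout

    def via_api(host, path):
        # The API: one HTTP call, no shell access to the LB required.
        return urllib.request.urlopen("http://" + host + ":8080/api/" + path).read()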

> But I predict that we'll one day reach a point where the restarting process
> involves calling the haproxy program to parse and compile the config file,
> then feed the compiled result to either the already running process or to
> a new one just being forked for that purpose.
>
> That's why I'm very interested in Simon's work concerning the master-worker
> model. After a first positive impression we found a number of structural
> issues that make it very hard to implement quickly in haproxy as it is now,
> but the time we spend trying to understand and solve those issues reinforces
> our belief that it clearly is the way to go for a more dynamic process
> with hot config changes and the like.
>
> With all that, I'd say that I'm very open to having an API to check/change
> monitoring, stats and operational state, but I will not accept an API to
> change the running config without passing via the config file which is
> the only reference we can trust. I would possibly accept extending the
> operational changes to a wider area than what seems reasonable, in order
> to help getting out of silly situations, but I doubt that there would be
> any real justification for doing such specific things via an API (eg:
> changing a running server's IP address, or killing a frontend to release
> the port without stopping the rest).

I hadn't seriously considered changing IP addresses or any backend configuration beyond weighting and maintenance mode; the only changes I was making were the addition or removal of a backend server in a pre-existing proxy server pool, and the only reason I had been trying to implement that was to make it easier for those with dynamic environments (read: cloud/dynamic/elastic clusters) to reconfigure haproxy remotely.

That being said, if we could make it a little safer to add and remove servers dynamically, what would be the harm in conditionally allowing non-persisted changes if it's explicitly noted that those changes will not survive a restart of the server process? (Or enabling a runtime switch to allow hot changes which don't persist.) I'm thinking that there are some use cases where it would be advantageous to be able to build a running config "on the fly" from a software deployment or management system.

> Last, there are a number of remaining issues that are not overcome by the
> current patch, making it risky for you to use it in production :
> - a heavily loaded proxy may lack memory (I'm used to seeing that quite
> often during benchmarks), and a number of malloc() or strdup() are
> not checked. This means that calling the API for a change under
> harsh conditions might cause a process crash. It's not always easy
> to perform atomic changes because you may need to undo many things
> when the failure happens late in the change (especially requests that
> involve changing something which exists). I noticed a significant
> number of strdup() calls that were not needed and that could simply be
> removed rather than fixed, though.

I just went through and removed the superfluous strdup() calls:

https://github.com/jbuchbinder/haproxy/commit/943fb837201ec318af76155c88dfc9773ba0db41

> - resources are allocated upon startup. If you want to add a server,
> you need to reserve a file descriptor for its checks. Not doing so
> will mean that by adding many servers, you'll noticeably reduce the
> number of available FDs, causing connection resets to some clients
> when the limit is reached. Similarly, adding listening sockets to
> frontends will consume some permanent FDs that are deducted from the total
> number of available ones. If you have a 1024 fd limit by default and
> add a binding to a 1024 port range, you'll eat all possible sockets
> and you won't be able to reconnect to disable your change. Adding
> frontends, loggers and servers requires adding FDs so those extra FDs
> reserved for that purpose should be configurable and the limit
> enforced on your API (eg: "too many servers added").

Okay. I could always either disable or remove the pool.add and pool.remove code until there's a sensible way to deal with that.
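
For reference, the kind of budget such a limit would have to be checked against, as back-of-the-envelope arithmetic rather than haproxy's exact ulimit-n computation:

    def fd_budget(maxconn, listeners, checked_servers, spare_servers=0, misc=32):
        # Each proxied connection needs a client-side and a server-side fd;
        # every bind line holds a listening fd and each health-checked server
        # needs one while its check runs.  "spare_servers" is the reserve an
        # API-driven pool.add would draw from; "misc" covers logs, the stats
        # socket, pipes, etc.
        return 2 * maxconn + listeners + checked_servers + spare_servers + misc

    # With a 1024 fd limit there is not much headroom left once maxconn is set:
    print(fd_budget(maxconn=400, listeners=4, checked_servers=30, spare_servers=10))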

> - since you make use of some malloc(), you need to be extremely careful
> about the risk of memory leak, especially in error paths. An API usually
> gets pushed to automated components, and you wouldn't like your LBs to
> eat all their memory in 2 days and need regular reboots. That's why
> malloc() is never used outside the pools. The pools are less convenient
> to use but much more robust, and specifically the care they require from
> the developer ensures that those issues can almost never happen.

I just audited my malloc() calls, and you were right: there were a few places where I forgot to free the allocated memory before bailing out with an error condition.

https://github.com/jbuchbinder/haproxy/commit/6672e59c3855963682162bca815f0909f56e2c57

> Another important point is that an API must be discussed with all its
> adopters. At Exceliance, we discussed ours with interested customers to
> take their ideas into account. It's not because they were customers but
> because they were future adopters. It's very possible that the way you
> designed it perfectly fits your purpose but will be unusable to many other
> people for a variety of reasons. Designing a usable and evolvable API may
> take months of discussion, but it's probably worth it.

I'd love to open a discussion about this. Since the patch hasn't been accepted upstream, it's clearly open to changes and improvements; nothing is "set in stone". I started putting it together based on a wishlist of things I felt it would be ideal to be able to control in a running haproxy instance.

If there's the eventual possibility of upstream integration, I'd be happy to continue maintaining the patch outside of the main distribution until you and a fair portion of the community are satisfied with it. At the moment I'm keeping my tree in sync with yours.

-- 
Jeff Buchbinder
Principal Engineer / Interim Director of Infrastructure
Rave Mobile Safety, Inc
jbuchbinder#ravemobilesafety.com