Re: haproxy API patch

From: Willy Tarreau <w#1wt.eu>
Date: Sun, 28 Aug 2011 09:31:58 +0200


Hi Jeff,

sorry for the late response; your message is one of the few I found unread in my mailbox after moving a lot of ML junk out of it.

On Fri, Aug 19, 2011 at 09:05:53AM -0400, Jeff Buchbinder wrote:
> The API stats (pool.content, etc) calls that I had implemented are
> essentially the same, except that they format the data in a way that is
> much more easily consumed by other services. (JSON formatted data beats
> out CSV in an ease-of-use context for anything other than feeding to a
> spreadsheet or awk/sed/cut/grepping data.)

I'm well aware of this too. That's why I wanted us to use JSON in our API at Exceliance.
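
For illustration, a minimal sketch of that consumability difference: it reads the CSV that the existing stats socket returns for "show stat" and re-emits it as JSON. The socket path is only an assumption, adjust it to whatever your "stats socket" line points to.

import csv
import io
import json
import socket

SOCK_PATH = "/var/run/haproxy.stat"  # assumption: matches the "stats socket" setting

def show_stat_as_json(path=SOCK_PATH):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    s.sendall(b"show stat\n")
    chunks = []
    while True:
        data = s.recv(4096)
        if not data:
            break
        chunks.append(data)
    s.close()
    raw = b"".join(chunks).decode("ascii", "replace")
    # the first line is "# pxname,svname,...": strip the leading "# " so csv sees headers
    raw = raw.lstrip("# ")
    rows = list(csv.DictReader(io.StringIO(raw)))
    return json.dumps(rows, indent=2)

if __name__ == "__main__":
    print(show_stat_as_json())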

(...)
> > => This means that you need your file-based config to always be in
> > sync with the API changes. There is no reliable way of doing so
> > in any component if the changes are applied at two distinct
> > places at the same time !
>
> It depends what you're using haproxy for. If you're populating the
> configuration from the API (which is my eventual goal, if possible) for
> an elastic/dynamic server pool scenario where servers will be brought
> into the pool dynamically, it doesn't matter as much about configuration
> file persistence.

But you still need to populate your config before starting the daemon, otherwise a restart may be fatal simply because the first few seconds before you update its conf break the site.

(...)
> > There is only one way to solve these classes of issues, by respecting those
> > two rules :
> > - the changes must be performed to one single place, which is the reference
> > (here the config file)
> > - the changes must then be applied using the normal process from this
> > reference
>
> I would think it would also be possible to "replay" a list of
> modifications to the original configuration, which would not require
> rewriting the original config. Not a perfect solution, but another
> possibility. (The downside would potentially be that a change to the
> original configuration would change the way that the replayed actions
> would behave.)

Yes, that's the problem. Replaying is only valid in an independent context. That's the problem we have with the "defaults" sections: they're quite handy, but they change a lot of semantics for the sections that depend on them. If your main config file gets a change, it's very possible that replaying your changes will not do the right thing again.
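
To make the point concrete, a toy sketch with made-up settings (nothing here maps to real haproxy keywords): the same recorded change list yields a different result once the base "defaults" are edited underneath it.

# Toy illustration of the replay problem, all settings made up.
defaults_v1 = {"check_interval": 2000, "maxconn": 100}
defaults_v2 = {"check_interval": 2000, "maxconn": 500}  # file later edited with "vi"

replay_log = [("srv1", {"weight": 10})]  # the changes recorded through the API

def rebuild(defaults, log):
    servers = {}
    for name, delta in log:
        servers[name] = {**defaults, **delta}  # sections inherit from defaults
    return servers

print(rebuild(defaults_v1, replay_log))  # srv1: weight=10, maxconn=100
print(rebuild(defaults_v2, replay_log))  # the very same replay now gives maxconn=500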

> > What this means is that anything related to changing more than an operational
> > status must be performed on the config file first, then propagated to the
> > running processes using the same method that is used upon start up (config
> > parsing and loading).
>
> That assumes that you're not dealing with a transient configuration (as
> I had mentioned earlier). It's an admirable goal to allow configuration
> persistence for things like the pool.add and pool.remove methods (since
> those are, at the moment, the only two that touch the configuration in a
> way that would seriously break a stored config file).

As I indicated above, the idea of a transient config file scares me a lot. Either you have no server in it and you serve 503 errors to everyone when you start, until the config is updated, or you have a bunch of old servers and, in environments such as EC2, you send traffic to someone else's servers because they were assigned your previous IPs.

> Also, outside of pool.add and pool.remove, I'm not really doing anything
> conceptually outside of what the "stats" control socket already has been
> doing. Weight and maintenance mode are not persisted to the
> configuration file. The only difference is the way that I'm allowing
> access to it (disregarding pool.add and pool.remove, of course).

Even the weight has different semantics in the config file and on the stats socket. The stats socket controls the effective weight without affecting the configured weight. The reason is that you can set the weight to 100% on the stats socket and you get back the configured weight.
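
To make the distinction concrete, a small sketch using the existing "get weight" / "set weight" socket commands; the backend/server names and the socket path are made-up placeholders.

import socket

SOCK_PATH = "/var/run/haproxy.stat"  # assumption: matches the "stats socket" setting

def stats_cmd(cmd, path=SOCK_PATH):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    s.sendall(cmd.encode("ascii") + b"\n")
    out = b""
    while True:
        data = s.recv(4096)
        if not data:
            break
        out += data
    s.close()
    return out.decode("ascii", "replace").strip()

print(stats_cmd("get weight bk_app/srv1"))       # e.g. "12 (initial 12)"
print(stats_cmd("set weight bk_app/srv1 6"))     # changes the effective weight only
print(stats_cmd("get weight bk_app/srv1"))       # e.g. "6 (initial 12)"
print(stats_cmd("set weight bk_app/srv1 100%"))  # back to 100% of the configured weight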

> > Right now haproxy is not able to reload a config once it's started. And since
> > we chroot it, it will not be able to access the FS afterwards. However we can
> > reload a new process with the new config (that's what most of us are currently
> > doing).
>
> That's also what I'm doing in our production setup. The importance of an
> accessible API, though, is that it allows third party services (for
> example, a software deployer or cloud management service) to control
> certain aspects of the proxy without having to resort to kludges like
> using ssh to remotely push commands into a socket with socat. (Which, by
> the way, works just fine run locally with a wrapper script, but makes it
> more difficult to integrate into a deployment process.)

Oh I know that well too ;-)
At the company, we decided to address precisely this issue with the API we developed: it only affects the config file and never plays with the socket, because right now we have not implemented any operational status changes. However, feeding the config file to push changes is a safe way to ensure those changes are correctly taken into account.

(...)
> I hadn't seriously considered changing IP addresses or any backend
> configuration outside of weighting and maintenance mode ;

You didn't but that's a question that regularly comes on the list :-)

> the only
> changes I was making was the adding or removal of a backend server in a
> pre-existing proxy server pool, and the only reason I had been trying to
> implement that was to make it easier for those with dynamic environments
> (read: cloud/dynamic/elastic clusters) to reconfigure haproxy remotely.

Don't get me wrong, as I said, I *really* understand why you have to do this. I'm just arguing that I don't like it being done *that way*, but I know there is a real need for it.

> That being said, if we could make it a little safer to add and remove
> servers dynamically, what would be the harm in conditionally allowing
> non-persisted changes if it's explicitly noted that those changes will
> not be persisted past a restarting of the server process? (Or enabling a
> runtime switch to allow hot changes which don't persist.) I'm thinking
> that there are some use-cases where it would be advantageous to be able
> to build a running config "on the fly" from a software deployment or
> management system.

Quite honestly, I don't see a valid use of non-persisted changes when it comes to adding/removing servers. At some point you'll have to mix persisted and non-persisted changes, and that's impossible to sort out. And having seen equivalent things in the past, I know how it will end up: people will regularly need to reload their config for some updates, and will be annoyed by the loss of the non-persistent changes. So they will provide patches to add more and more configurable settings for the non-persistent changes in order to avoid a reload, which will push the problem even further, and we'll end up with a huge amount of crap here.

In fact, if we had a way to decide that the non-persistent changes could be dumped into the config file, it would be much different, because the API could then be the entry point for the persistent changes. In short, you'd perform your changes, then you would save. I started with this goal a few years ago, when the config file and line numbers were added to a lot of internal structs. The idea was to be able to dump the whole config as it was parsed. Then this config could survive updates, and it would become the main entry point for config changes. This would address the risk of conflicts between the changes performed on the daemon and the changes performed with "vi" on the config file.

But doing so would only be possible in the master-worker model, otherwise attempting to do this on a multi-process instance would result in a random config.

(...)
> > - resources are allocated upon startup. If you want to add a server,
> > you need to reserve a file descriptor for its checks. Not doing so
> > will mean that by adding many servers, you'll sensibly reduce the
> > number of available FDs, causing connection resets to some clients
> > when the limit is reached. Similarly, adding listening sockets to
> > frontends will consume some permanent FDs that deduce from the total
> > amount of available ones. If you have a 1024 fd limit by default and
> > add a binding to a 1024 port range, you'll eat all possible sockets
> > and you won't be able to reconnect to disable your change ; Adding
> > frontends, loggers and servers requires adding FDs so those extra FDs
> > reserved for that purpose should be configurable and the limit
> > enforced on your API (eg: "too many servers added").
>
> Okay. I could always either disable or remove the pool.add and
> pool.remove code until there's a sensible way to deal with that.

I think that the only way to deal with that is to declare in the configuration how many servers, backends, etc... we plan to support, so that resources are pre-allocated accordingly. It's quite difficult for frontends since we have port ranges. However, if we had an automatic limit that was displayed on startup, maybe it could help understand how many resources are left available. Alternatively, we could decide to have a reserve of XXX fd and YYY megs and enforce that limit on changes.
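
As a purely illustrative back-of-envelope (this is not haproxy's exact accounting, and every number below is made up), here is the kind of budget such a limit would have to enforce:

# Rough fd budget sketch; compare the result against "ulimit -n".
maxconn   = 1000   # each connection may need up to 2 fds (client side + server side)
listeners = 4      # one fd per bound address:port (a 1024-port range = 1024 fds)
servers   = 50     # one fd per server while its health check is running
loggers   = 1      # one fd per log socket
overhead  = 32     # pipes, stats socket, internal use, safety margin

needed = maxconn * 2 + listeners + servers + loggers + overhead
print("approximate fd budget:", needed)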

> > Another important point is that an API must be discussed with all its
> > adopters. At exceliance, we discussed ours with interested customers to
> > take their ideas into account. It's not because they were customers but
> > because they were future adopters. It's very possible that the way you
> > designed it perfectly fits your purpose but will be unusable to many other
> > people for a variety of reasons. Designing a usable and evolutive API may
> > take months of discussions but it's probably worth it.
>
> I'd love to open a discussion regarding this. As the patch hasn't been
> accepted upstream, it's clearly open to changes and improvements, as
> nothing is "set in stone". I started putting it together based on a
> wishlist I had, concerning things that I felt would be ideal to be able
> to control in a running haproxy instance.

Fine, but I'm insisting on the difference between the two types of controls, the persistent changes and non-persistent changes.

> If there's the eventual possibility of upstream integration, I'd be
> happy to continue maintaining the patch outside of the main distribution
> until you and a fair portion of the community are satisfied with it. At
> the moment I'm keeping my tree in sync with yours.

OK. I think that several issues should be addressed separately:

Best regards,
Willy
