Re: Q about http-parser

From: Willy Tarreau <w#1wt.eu>
Date: Wed, 14 Nov 2007 19:04:33 +0100


On Wed, Nov 14, 2007 at 06:23:32PM +0100, Aleksandar Lazic wrote:
> >No, I really think we're close to being able to support basic
> >keep-alive, at least for adjacent content-less requests such as HEAD
> >and GET. When the HTTP processing moves into process_http function, it
> >should get even easier because the underlying states will serve only
> >data transfer.
>
> Hm, do you mean this comment in src/client.c:
>
> /*
> * FIXME: This should move to the STREAM_SOCK code then split into TCP
> * and HTTP.
> */
>
> right ;-)

yes, part of this.
I realized that process_cli and process_srv only differ by the connection establishment part. But this part has nothing to do with the way we want to manage a socket. Right now, we have :

All those functions work for HTTP and TCP with a ton of if/else.

What I want to achieve (and which has begun) is this :

Basically, the tcp_process_session() will only consist in :

  {
    if (client_side_buffer->state != SILENT)
       update_session(client_side_buffer);
    if (server_side_buffer->state != SILENT)
       update_session(server_side_buffer);
    if (client_side == CLOSED && server_side == CLOSED)
       end_of_session;
  }
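
As a rough illustration of the loop above, here is a minimal C sketch. It is not code from the haproxy tree: the struct layouts, the BUF_* states, the CHUNK size and the "pending" byte counter (which stands in for real socket I/O) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types and names, for illustration only. */
enum buf_state { BUF_SILENT, BUF_ACTIVE, BUF_CLOSED };

struct buffer {
    enum buf_state state;
    int pending;              /* bytes still to forward (stand-in for real I/O) */
};

struct session {
    struct buffer client_side;
    struct buffer server_side;
};

#define CHUNK 4096            /* arbitrary per-pass forwarding limit */

/* Forward at most CHUNK bytes; close this side once it is drained. */
static void update_session(struct buffer *b)
{
    assert(b->pending >= 0);
    int n = b->pending < CHUNK ? b->pending : CHUNK;
    b->pending -= n;
    if (b->pending == 0)
        b->state = BUF_CLOSED;
}

/* One pass over both directions.
 * Returns true once both sides are closed (end of session). */
static bool tcp_process_session(struct session *s)
{
    if (s->client_side.state != BUF_SILENT)
        update_session(&s->client_side);
    if (s->server_side.state != BUF_SILENT)
        update_session(&s->server_side);
    return s->client_side.state == BUF_CLOSED &&
           s->server_side.state == BUF_CLOSED;
}
```

A scheduler would call tcp_process_session() each time one of the two sockets has activity, and free the session when it returns true.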

The http_process_session() will have to support the HTTP state in addition to this, so it will look more like this (synthetic) :

  {
    if (client_side_buffer->state != SILENT)
       update_session(client_side_buffer);
    if (server_side_buffer->state != SILENT)
       update_session(server_side_buffer);

    switch (session->state) {
      case REQ_HEADERS:
         filter_request(); rewrite_request();
         capture_headers(); capture_cookies();
         apply_persistence(); select_server();
         break;

      case REP_HEADERS:
         filter_response(); rewrite_response();
         capture_response(); capture_cookies();
         update_persistence();
         break;

      case DATA:
         transfer_at_most_content_length();
    }

    if (client_side == CLOSED && server_side == CLOSED)
       end_of_session;
  }
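
The HTTP part of that loop can be sketched as a small state machine. The state names and transfer_at_most_content_length() come from the outline above; the transitions between states, the struct fields and the "avail" parameter are assumptions made for this example. In particular, going back to REQ_HEADERS once the body is complete is only one possible way to get the basic keep-alive mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

enum http_state { REQ_HEADERS, REP_HEADERS, DATA };

/* Hypothetical session layout, for illustration only. */
struct http_session {
    enum http_state state;
    size_t content_length;    /* body bytes announced by the response */
    size_t transferred;       /* body bytes already forwarded */
};

/* Forward at most what remains of the announced body.
 * Returns the number of bytes accounted for. */
static size_t transfer_at_most_content_length(struct http_session *s,
                                              size_t avail)
{
    assert(s->transferred <= s->content_length);
    size_t left = s->content_length - s->transferred;
    size_t n = avail < left ? avail : left;
    s->transferred += n;
    return n;
}

/* One pass of the HTTP layer; avail = bytes sitting in the response buffer. */
static void http_process_session(struct http_session *s, size_t avail)
{
    switch (s->state) {
    case REQ_HEADERS:
        /* filter, rewrite, capture, persistence and server selection
         * would run here */
        s->state = REP_HEADERS;
        break;

    case REP_HEADERS:
        /* response filtering and capture would run here; a real parser
         * would also fill in content_length at this point */
        s->state = s->content_length ? DATA : REQ_HEADERS;
        break;

    case DATA:
        transfer_at_most_content_length(s, avail);
        if (s->transferred == s->content_length) {
            s->transferred = 0;
            s->state = REQ_HEADERS;   /* assumed: wait for the next request */
        }
        break;
    }
}
```

Capping the transfer at the announced length is what lets the layer know where one response ends, which the DATA case relies on before accepting a new request on the same connection.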

> Do you plan to make a:
>
> ---
> switch(mode):
> case HTTP: process_http; break; <= here you must handle also the
> tcp parts
> case TCP: process_tcp; break;
> case ...: process_...; break;
> ---
>
> a chain like:
>
> ---
> process_tcp
> check_again:
> if http_mode then process_http
> if ssl_mode then process_ssl set http_mode goto: check_again
>
> if ..._mode then process_...
> ----
>
> or a complete different way ;-)?!

Like I described above in fact. It would also become easier to support SSL since we will "just" have to attach the client-side buffer to another buffer which gets decrypted traffic :

                TCP  req_buffer   SSL      req_buffer   HTTP
  client_sock <---->            <-------->            <-------> server_sock
                     rep_buffer            rep_buffer

The "master" processing here is HTTP. The config will inform the proxy that the client side relies on SSL, which itself relies on TCP, so this sort of "stack" will have to be built upon accept(). Sometimes we build above us by calling the upper level accept(), sometimes we call below us by calling the lower level connect(). I think that the chain of protocols should simply be specified in a per-proxy array, and each accept() at level N calls the accept() at level N+1. In the example above, the functions would be called that way :

  tcp_accept()
     ssl_accept()
        http_accept()
          task->process = http_process_session()

  http_process_session() :
     http_connect()
        tcp_connect()
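
To make the accept side of this call order concrete, here is a minimal sketch of a per-proxy array of accept functions where level N hands over to level N+1. Only the function names tcp_accept(), ssl_accept(), http_accept() and http_process_session() come from the description above. The struct layouts, the accept_level() and proxy_accept() helpers and the "order" trace are hypothetical, and no real socket or SSL work is done.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 8

enum proto { PROTO_TCP, PROTO_SSL, PROTO_HTTP };

struct conn;
typedef int (*accept_fn)(struct conn *c, size_t level);

struct proxy {
    const accept_fn *chain;           /* protocol stack, lowest level first */
    size_t depth;
};

struct conn {
    const struct proxy *px;
    void (*process)(struct conn *c);  /* stands in for task->process */
    enum proto order[MAX_LEVELS];     /* call order, kept for inspection */
    size_t calls;
};

static void http_process_session(struct conn *c) { (void)c; }

/* Record this level, then hand over to level N+1 if the chain has one. */
static int accept_level(struct conn *c, size_t level, enum proto p)
{
    assert(c->calls < MAX_LEVELS);
    c->order[c->calls++] = p;
    if (level + 1 < c->px->depth)
        return c->px->chain[level + 1](c, level + 1);
    return 0;
}

static int tcp_accept(struct conn *c, size_t level)
{
    return accept_level(c, level, PROTO_TCP);
}

static int ssl_accept(struct conn *c, size_t level)
{
    return accept_level(c, level, PROTO_SSL);
}

static int http_accept(struct conn *c, size_t level)
{
    c->process = http_process_session;   /* the "master" processing */
    return accept_level(c, level, PROTO_HTTP);
}

/* Entry point: the listener only ever calls level 0. */
static int proxy_accept(const struct proxy *px, struct conn *c)
{
    c->px = px;
    c->calls = 0;
    return px->depth ? px->chain[0](c, 0) : -1;
}
```

With a chain of { tcp_accept, ssl_accept, http_accept } this reproduces the HTTPS order shown above; a plain HTTP proxy would simply use { tcp_accept, http_accept } without any change to the functions themselves.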

We still need to make the distinction between socket-level functions which work on a file-descriptor, and buffer-level functions which work on a buffer. E.g.: tcp_accept() receives a socket and creates a buffer. http_accept() receives a buffer. The buffer will have to "relay" a lot of information between all the parties. For instance, the http_connect() might want to know the client's IP address to add an x-forwarded-for. In case of HTTPS, the first buffer will have a pointer to the client's IP, and the second buffer will relay the pointer. http_process_session() will just rely on the information passed by the buffer it receives. I want to prioritize the relaying of data through pointers (or even values), compared to the object approach where abstraction is performed by calling functions at every level, which is simply not acceptable when you want to keep a high performance level.
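
The x-forwarded-for case can be sketched as follows. This is only an illustration under assumptions: the single-field struct buffer, the function signatures and the peer_ip argument (a real tcp_accept() would obtain the address from the socket) are invented for the example. The point it shows is that the upper buffer carries the same pointer as the lower one, so http_connect() reads the value directly instead of calling down through each layer.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical buffer: only the relayed field is shown. */
struct buffer {
    const char *client_ip;    /* relayed by pointer, never copied */
};

/* Socket level: receives a file descriptor, creates the first buffer. */
static void tcp_accept(int fd, const char *peer_ip, struct buffer *out)
{
    (void)fd;   /* a real version would query the peer address from fd */
    out->client_ip = peer_ip;
}

/* Buffer level: the decrypted-side buffer relays what the lower one knows. */
static void ssl_accept(const struct buffer *lower, struct buffer *upper)
{
    upper->client_ip = lower->client_ip;
}

/* Buffer level: uses the relayed value, with no call down the stack.
 * Returns the header length as reported by snprintf(). */
static int http_connect(const struct buffer *b, char *hdr, size_t len)
{
    assert(b->client_ip != NULL);
    return snprintf(hdr, len, "X-Forwarded-For: %s\r\n", b->client_ip);
}
```

Whether the buffer holds a pointer or a copy of the value is a trade-off between lifetime management and one extra dereference; the sketch uses a pointer because that is the HTTPS example given above.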

The client-side is already quite a bit abstracted since we can support TCP, HTTP and UNIX on the socket without the upper functions being aware of it. The goal is to abstract slightly more, but the "smart" way, by filling the function pointers at the earliest moment so that we can reduce the number of if/else and also reduce the number of mispredicted branches and the length of the jumps (which has an impact on the CPU cache efficiency).
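
A small sketch of what "filling the function pointers at the earliest moment" could look like: the protocol is examined once, at accept time, and the data path then goes through the stored pointer without any if/else on the protocol. All names here (struct stream, stream_accept(), stream_fill(), the counters) are hypothetical, and the read functions are stubs which only count calls.

```c
#include <assert.h>
#include <stddef.h>

enum proto { PROTO_TCP, PROTO_UNIX };

struct stream;
typedef size_t (*read_fn)(struct stream *s, size_t want);

struct stream {
    read_fn read;             /* bound once, at accept time */
    size_t tcp_reads;         /* counters so the stubs do something visible */
    size_t unix_reads;
};

/* Stub readers: pretend the full amount was read. */
static size_t tcp_read(struct stream *s, size_t want)
{
    s->tcp_reads++;
    return want;
}

static size_t unix_read(struct stream *s, size_t want)
{
    s->unix_reads++;
    return want;
}

/* The only place that looks at the protocol. */
static void stream_accept(struct stream *s, enum proto p)
{
    static const read_fn readers[] = {
        [PROTO_TCP]  = tcp_read,
        [PROTO_UNIX] = unix_read,
    };
    assert((size_t)p < sizeof(readers) / sizeof(readers[0]));
    s->read = readers[p];
}

/* Data path: one indirect call, no test on the protocol. */
static size_t stream_fill(struct stream *s, size_t want)
{
    return s->read(s, want);
}
```

The upper layers call stream_fill() and never learn which kind of socket sits underneath, which is the same property the client side already has for TCP, HTTP and UNIX.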

Later (far later), this will lead to more modular code which we might put in external modules to support exotic protocols if needed.

Regards,
Willy