
Streaming mode of operation. Part I: request streaming #1067

Closed

vankoven
Contributor

Obsoletes #1012. The original implementation was full of open questions, and trying to solve them brought me to a new design. I've split the full PR into two parts: streaming of requests and streaming of responses. It's too gigantic for a single PR, and there are tons of very specific changes that are hard to track all together.

I. Per-connection memory limit and receive window steering

The idea is to have a single virtual buffer shared between the receive queue and the requests stored inside Tempesta. The advertised receive window must never exceed the free space in that virtual buffer.

To implement this, the SOCK_RCVBUF_LOCK user lock is set on the client connection socket, and the sk->sk_rcvbuf size is controlled manually by Tempesta. Unlike the userspace behaviour of the SO_RCVBUF option, sk->sk_rcvbuf is not fixed. Instead it is initialized with net.ipv4.tcp_rmem[1], as for a usual TCP connection, and is increased if the client has a good transmission rate. The new client_rmem option replaces net.ipv4.tcp_rmem[2], so client connections have their own memory limit, while the system-wide limit applies to server connections.
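
As an illustration, a minimal sketch of the rcvbuf takeover (the helper name and the way the initial value is passed in are assumptions, not the actual patch code):

    static void tfw_cli_sock_rcvbuf_init(struct sock *sk, int init_rcvbuf)
    {
            lock_sock(sk);
            /*
             * Disable kernel autotuning of the receive buffer, just as a
             * userspace SO_RCVBUF call would do.
             */
            sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
            /*
             * Start from the usual TCP default (net.ipv4.tcp_rmem[1]);
             * Tempesta may grow the value later for well-behaving clients,
             * bounded by client_rmem instead of net.ipv4.tcp_rmem[2].
             */
            sk->sk_rcvbuf = init_rcvbuf;
            release_sock(sk);
    }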

The second thing changed here is per-socket memory tracking. Once an skb is added to the receive queue, the sk->sk_rmem_alloc counter is increased by the skb's true size. The difference between sk->sk_rcvbuf and sk->sk_rmem_alloc is used to calculate the new receive window size. When an skb is detached from the receive queue, it is no longer tracked by sk->sk_rmem_alloc, and the receive window returns to its original size. In the proposed solution, after the skb is orphaned and a new HTTP request is parsed from it, the original size of the HTTP request is added to sk->sk_rmem_alloc to calculate a new, smaller window size. After the response is sent to the client, the request is destroyed and sk->sk_rmem_alloc is decreased, producing a bigger receive window and allowing the client to send more data. No other changes are made in the window calculation, so all the other features still work: the advertised window is always a multiple of MSS, the advertised window is never shrunk, and so on.
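
A sketch of the accounting part, with hypothetical helper names and a hypothetical orig_size field; since the kernel already derives the advertised window from the difference between sk_rcvbuf and sk_rmem_alloc, no extra window-calculation code is needed:

    /* Charge a parsed request against the shared virtual buffer. */
    static void tfw_cli_rmem_charge(struct sock *sk, TfwHttpReq *req)
    {
            /* orig_size: bytes the request occupied on the wire. */
            atomic_add(req->orig_size, &sk->sk_rmem_alloc);
    }

    /*
     * Called when the request is destroyed after its response is sent;
     * the freed space re-opens the advertised receive window.
     */
    static void tfw_cli_rmem_uncharge(struct sock *sk, TfwHttpReq *req)
    {
            atomic_sub(req->orig_size, &sk->sk_rmem_alloc);
    }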

The actual limit is 2*client_rmem bytes. This is done to provide the same behaviour as the well-known SO_RCVBUF socket option. If the receive buffer is depleted, a zero window size is advertised by the kernel code and the client sends window probes; after keepalive_timeout seconds the client is disconnected.

II. Streaming requests to server connections

Basic idea: requests longer than client_msg_buffering bytes are sent in parts, as soon as a new chunk arrives. Headers are always fully buffered, even if the header part is longer than the limit. To reduce extra work, a newly parsed skb is added to the buffered part in full, without breaking it into pieces.
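
In pseudo-C, the buffering decision boils down to something like the following hypothetical helper:

    /*
     * Stream further chunks only after the headers are complete and the
     * buffered part has exceeded the configured limit. A freshly parsed
     * skb is never split between the buffered and streamed parts.
     */
    static bool tfw_http_req_should_stream(bool headers_complete,
                                           unsigned long buffered_bytes,
                                           unsigned long client_msg_buffering)
    {
            if (!headers_complete)
                    return false;
            return buffered_bytes > client_msg_buffering;
    }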

client_msg_buffering must be less than client_rmem. The default configuration must be safe, so the buffering limit is set to 1MB by default.
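
For reference, a hypothetical configuration fragment with the new directives (the client_rmem value is an arbitrary example, not a documented default):

    # Per-client-connection memory limit; replaces net.ipv4.tcp_rmem[2]
    # for client sockets.
    client_rmem 4194304;

    # Buffer at most 1MB of a request, stream the rest to the backend.
    # Must be less than client_rmem.
    client_msg_buffering 1048576;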

Streamed requests bring new problems which don't appear in fully buffered mode:

  • Request processing starts right after the end of the buffered part. The rest of the request may be received and processed after the buffered part has been assigned to a backend connection for transmission.

  • A streamed message can't be re-sent. The streamed part of the request is destroyed once it is sent to the backend, so there is nothing to re-send.

  • If only a part of a request has been sent out to the backend, but an error happens while receiving a new request chunk, the backend connection must be closed to recover.

  • Imagine that the first N requests in the server connection's forward queue were streamed to the backend and the (N+1)-th request is partly streamed. If an error happens while receiving a new chunk of that partly streamed request, the backend connection must be closed. But an immediate close of the backend connection would degrade quality of service, since the previously streamed requests can't be re-sent and would be evicted. To provide a better level of service, we have to wait until all sent requests are responded to. Another option is streaming requests one by one without pipelining, but that would cause extra buffering overhead and an overall performance impact. The implementation provides the first solution.

  • The backend may respond to a streamed request before the request is fully received. E.g. Nginx sends a 200 OK reply to a GET request with a chunked body before the body is received; or the backend may send a 4xx or 5xx error and close the connection before the request is fully received (just like Tempesta in case of attacks or errors). In case of early responses, all the following request chunks must still be forwarded to the backend.

  • A new chunk of a streamed request may be unneeded and should be dropped without processing, if the request is served from the cache, a redirect response is prepared (sticky cookie module), or the backend has already responded.

  • More than one response may be created for a single request, e.g. a response may be built from the cache, but later an error is detected in a new request chunk, and Tempesta is configured to send a reply. The proposed solution doesn't send a response before the request is fully received, and never shares the content with clients that send errors. Responses from the backend are treated as the primary action: if an error happens but the backend has already responded, the backend's response is used.

  • Forwarding from the backend's forward queue and parsing a new request chunk may happen simultaneously, so multiple accesses to the same request may occur and some kind of synchronisation is required. In the proposed solution the streamed message is split into two parts: TfwHttpReq, which mustn't be modified after it is enlisted into the forward queue, and TfwHttpMsgPart, the streamed part of the request, which can contain the rest of the body and the trailer headers and can be modified at any time (see the sketch after this list).

  • In my opinion, client_rmem is highly recommended for use together with client_msg_buffering. Multiple clients may stream to the same backend connection, but Tempesta may stream only one of them at a time. Two options are possible: pause the other clients by sending them zero windows, or buffer their streamed requests until the backend is ready. The latter is implemented.

  • According to the RFC, trailer headers can be excluded from processing by a streaming proxy and simply forwarded. But we aim to check the consistency of every forwarded header, and we provide options to modify headers before forwarding, so no exceptions are made for trailer headers in the streamed message part.
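
A structural sketch of the TfwHttpReq/TfwHttpMsgPart split mentioned in the synchronisation item above (field names are illustrative, not the exact ones from the patch):

    /* Requires linux/skbuff.h and Tempesta's http.h for TfwHttpReq. */
    typedef struct tfw_http_msg_part {
            TfwHttpReq      *req;           /* owning request; must not be
                                             * modified once it is enlisted
                                             * into a forward queue */
            struct sk_buff  *skb_head;      /* body chunks not yet forwarded */
            unsigned long   len;            /* bytes accumulated in this part */
            bool            trailer;        /* parsing trailer headers now */
    } TfwHttpMsgPart;

Only the receiving side updates the TfwHttpMsgPart, while the forwarding path works with the immutable TfwHttpReq, so concurrent forwarding and chunk parsing don't race on the same fields.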

III. Other fixes and improvements.

  • Fix [Cache] Unauthorized clients should be blocked before accessing cache #899: check the sticky cookie before forwarding a request to the cache (the same fix also partly implements Fast sticky session scheduler is broken #1043).

  • Fix Drop client connections on response blocking #962: Drop client connections on response blocking.

  • Block messages with disallowed trailer headers. Partly done: all headers known to Tempesta at this point are prohibited in the trailer part. The list must be extended according to the RFC (a TODO comment is provided).

  • Don't try to send an error message to the client on response parse errors if the client is the Health Monitor subsystem.

  • Drop the request/response and close the connection if splitting an skb into sibling messages has failed. In this case the original message would contain more than one HTTP message.

  • Replace the TFW_HTTP_SUSPECTED request flag with the usual TFW_HTTP_CONN_CLOSE flag to set the correct Connection: header value.

  • Bit operations with message flags are replaced by the common test_bit()/set_bit() API (see the sketch after this list).

  • TfwHttpParser is allocated on a per-connection, not a per-message, basis.

  • Some other minor improvements.
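
As referenced above, a sketch of the new flag handling (the helper names are illustrative, and it assumes the message flags field is an unsigned long bitmap as required by the bitops API):

    #include <linux/bitops.h>

    static inline void tfw_http_msg_set_conn_close(TfwHttpMsg *hm)
    {
            set_bit(TFW_HTTP_CONN_CLOSE, hm->flags);
    }

    static inline bool tfw_http_msg_conn_close(const TfwHttpMsg *hm)
    {
            return test_bit(TFW_HTTP_CONN_CLOSE, hm->flags);
    }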

aleksostapenko and others added 30 commits March 14, 2018 11:48
Split skb if can't allocate a new response
generating lua config from test

Add 1k and 1M test

Test responses

add 1M response test
expect 403 for wrong length

Parsing multiple responses

Expect exactly 1 parsing error

check for garbage after response end

Write special testers for wrong length
Add missing length tests

Close connection after response w/out content-length

Add tests for multiple or invalid content-length
Fix warning:
  /tmp/client/request_1M.lua: /tmp/client/request_1M.lua:5:
    unexpected symbol near '='
Long body in responses and requests
If ss_skb_split() fails, a part of the next HTTP message will be
added to the end of the last parsed message. If that error happens,
there is no recourse; we must close the connection.
The buffered part of the message must be detached from the
connection as early as possible to be forwarded to the receiver.
The streamed part is constantly updated by the sender. If both
parts were kept inside the same object, synchronization would be
required, since the buffered part may be accessed by multiple CPUs.
No error processing in this commit.
Don't try to send error response to Health Monitor subsystem.
It's a locally generated request and it's not assigned to any
client connection.
The same client may stream requests to multiple server connections,
e.g. in full-stream mode. Close the server connections to recover
after sending an incomplete request.