Streaming mode of operation. Part I: request streaming #1067
Closed
- Fix missed semicolon
- 1M backends
- Split skb if can't allocate a new response
- generating lua config from test
- Add 1k and 1M test
- Test responses
- add 1M response test
- expect 403 for wrong length
- Parsing multiple responses
- Expect exactly 1 parsing error
- check for garbage after response end
- Write special testers for wrong length
- Add missing length tests
- Close connection after response w/out content-length
- Add tests for multiple or invalid content-length
- Fix warning: /tmp/client/request_1M.lua:5: unexpected symbol near '='
- Long body in responses and requests
If ss_skb_split() fails, a part of the next HTTP message will be added to the end of the last parsed message. If that error happens, there is no recourse; we must close the connection.

The buffered part of the message must be detached from the connection as early as possible to be forwarded to the receiver. The streamed part is constantly updated by the sender. If both parts are used inside the same object, synchronization will be required, since the buffered part may be accessed by multiple CPUs.

No error processing in this commit.

Don't try to send an error response to the Health Monitor subsystem. It's a locally generated request and it's not assigned to any client connection.

The same client may stream requests to multiple server connections, e.g. in full-stream mode. Close the server connections to recover after sending an incomplete request.
krizhanovsky force-pushed the ik-streaming-redesign branch from c59e6a9 to 6147875 on January 26, 2019 22:04
Obsoletes #1012. The original implementation was full of missed questions, and trying to solve them has brought me to a new design. I've split the full PR in two parts: streaming of requests and streaming of responses. It's too gigantic for a single PR, and there are tons of very specific changes hard to track altogether.
I. Per-connection memory limit and receive window steering
The idea is to have only one virtual buffer shared between the receive queue and the requests stored inside Tempesta. The advertised receive window must not exceed the free space in that virtual buffer.
To implement this, the `SOCK_RCVBUF_LOCK` user lock is set for the client connection socket and the `sk->sk_rcvbuf` size is manually controlled by Tempesta. Unlike the userspace behaviour of the `SO_RCVBUF` option, `sk->sk_rcvbuf` is not fixed. Instead it's initialized with `net.ipv4.tcp_rmem[1]`, as for a usual TCP connection, and is increased if the client has a good transmission rate. The new `client_rmem` option replaces `net.ipv4.tcp_rmem[2]`, so client connections have their own memory limits, while the system-wide limit describes the limits for the server connections.

The second thing changed here is per-socket memory tracking. Once an skb is added to the receive queue, the `sk->sk_rmem_alloc` counter is increased by the skb's true size (`skb->truesize`). The difference between the `sk->sk_rcvbuf` size and `sk->sk_rmem_alloc` is used to calculate a new receive window size. When an skb is detached from the receive queue, it's not tracked by `sk->sk_rmem_alloc` any more, and the receive window returns to its original size. In the proposed solution, after an skb is orphaned and a new HTTP request is parsed from it, the original size of the HTTP request is added to `sk->sk_rmem_alloc` to calculate a new, smaller window size. After the response is sent to the client, the request is destroyed and `sk->sk_rmem_alloc` is decreased to produce a bigger receive window and allow the client to send more data. No other changes are made in the window calculation, so all the other features work as before: the advertised window is always a multiple of the MSS, the advertised window is never shrunk, and so on.

Actually, the limit is `2*client_rmem` bytes. This is done to provide the same behaviour as the well-known `SO_RCVBUF` socket option. If the receive buffer is depleted, a zero window size is advertised by the kernel code and the client sends window probes; after `keepalive_timeout` seconds the client will be disconnected.
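For illustration, here is a minimal sketch of the accounting described above. The helper names (`tfw_cli_rcv_window()` and friends) are hypothetical, not the actual Tempesta API; only the `struct sock` fields are real, and the real window computation stays in the kernel's TCP code:

```c
#include <net/sock.h>

/*
 * Hypothetical sketch: how the advertised receive window could be
 * derived once Tempesta owns sk->sk_rcvbuf. The real window logic
 * remains in the kernel's tcp_select_window() path; only the
 * accounting idea from this PR is shown.
 */
static unsigned int
tfw_cli_rcv_window(struct sock *sk)
{
	/* Free space in the shared virtual buffer: both queued skbs
	 * and HTTP requests held inside Tempesta are charged to
	 * sk->sk_rmem_alloc.
	 */
	int free_space = sk->sk_rcvbuf - atomic_read(&sk->sk_rmem_alloc);

	return free_space > 0 ? free_space : 0;
}

/* A parsed request detached from the receive queue stays charged to
 * the socket, so the advertised window shrinks while the request is
 * held inside Tempesta.
 */
static void
tfw_cli_charge_req(struct sock *sk, unsigned int req_sz)
{
	atomic_add(req_sz, &sk->sk_rmem_alloc);
}

/* Once the response is sent and the request is destroyed, the charge
 * is returned and the window grows back.
 */
static void
tfw_cli_uncharge_req(struct sock *sk, unsigned int req_sz)
{
	atomic_sub(req_sz, &sk->sk_rmem_alloc);
}
```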
II. Streaming requests to server connections
The basic idea: requests longer than `client_msg_buffering` bytes are sent by parts, as soon as a new chunk arrives. Headers are always fully buffered, even if the header part is longer than the limit. To reduce extra work, a newly parsed skb is added to the buffered part in full, without breaking it into pieces. `client_msg_buffering` must be less than `client_rmem`. The default configuration must be safe, so the buffering limit is set to 1 MB by default.

Streamed requests bring new problems which don't appear in fully buffered mode (a sketch of the resulting message layout follows the list below):
- Request processing starts right after the end of the buffered part. The rest of the request may be received and processed after the buffered part is assigned to a backend connection for transmission.
- A streamed message can't be re-sent. The streamed part of the request is destroyed once sent to the backend; there is nothing to re-send.
- If only a part of the request has been sent to the backend, but an error happens while receiving a new request chunk, the backend connection must be closed to recover.
- Imagine that the first N requests in the server connection's forward queue were streamed to the backend and the (N+1)-th request is partly streamed. If an error happens while receiving a new chunk of that partly streamed request, the backend connection must be closed. But an immediate close of the backend connection would degrade the quality of service, since the previous requests can't be re-sent and are evicted. To provide a better level of service, we have to wait until all sent requests are responded to. Another option is streaming requests one-by-one without pipelining, but that would cause extra buffering overhead and an overall performance impact. The implementation provides the first solution.
- The backend may respond to a streamed request before the request is fully received. E.g. Nginx sends a 200 OK reply to a GET request with a chunked body before the body is received; or the backend may send a 4xx or 5xx error and close the connection before the request is fully received (just like Tempesta in case of attacks or errors). In case of early responses, all the following request chunks must be forwarded to the backend.
- A new chunk of a streamed request may be unneeded and should be dropped without processing, if the request is served from the cache, a redirect response is prepared (sticky cookie module), or the backend has already responded.
- More than one response may be created for a single request, e.g. a response may be built from the cache, but later an error is detected in a new request chunk, and Tempesta is configured to send a reply. The proposed solution doesn't send a response before the request is fully received, and never shares content with clients that send errors. But responses from the backend are treated as the primary action: if an error happens but the backend has already responded, the backend response will be used.
- Forwarding from the backend's forward queue and parsing a new request chunk may happen simultaneously, so multiple accesses to the same request may occur and some kind of synchronisation is required. In the proposed solution a streamed message is split into two parts: TfwHttpReq, which mustn't be modified after enlisting into the forward queue, and TfwHttpMsgPart, the streamed part of the request, which can contain the rest of the body and the trailer headers and can be modified at any time.
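A minimal sketch of that two-part layout: the type names `TfwHttpReq` and `TfwHttpMsgPart` come from this PR, while every field and helper below is illustrative rather than the actual Tempesta layout:

```c
#include <linux/skbuff.h>
#include <linux/spinlock.h>

/* Streamed tail of a request: constantly updated as new chunks
 * arrive from the client, so it needs its own synchronisation.
 */
typedef struct {
	struct sk_buff	*skb_head;	/* not-yet-forwarded body chunks */
	unsigned long	len;		/* bytes currently buffered here */
	spinlock_t	lock;		/* sender vs. forwarder access */
} TfwHttpMsgPart;

typedef struct {
	/* ... headers and the buffered body head live here; this part
	 * is immutable once the request is enlisted into the server
	 * connection's forward queue ...
	 */
	TfwHttpMsgPart	*part;		/* streamed tail, NULL for fully
					 * buffered requests */
} TfwHttpReq;

/* Hypothetical check for the buffering threshold described above:
 * stream once the buffered head has grown past client_msg_buffering
 * (headers are always buffered in full, even past the limit).
 */
static inline bool
tfw_http_req_streamed(unsigned long buffered_sz, unsigned long msg_buffering)
{
	return buffered_sz > msg_buffering;
}
```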
As for me, `client_rmem` is highly recommended for use with `client_msg_buffering`. Multiple clients may stream to the same backend connection, but Tempesta may stream only one of them at a time. Two options are possible: pause the other clients by sending them zero windows, or buffer the streamed requests until the backend is ready. The latter is implemented.
According to the RFC, trailer headers can be excluded from processing by a streaming proxy and simply forwarded. But we aim to check the consistency of every forwarded header, and we provide options to modify headers before forwarding, so there are no exclusions for trailer headers in the streamed message part.
III. Other fixes and improvements.
- Fix "[Cache] Unathorized clients should be blocked before accessing cache" #899: check the sticky cookie before forwarding a request to the cache (the same fix also partly implements "Fast sticky session scheduler is broken" #1043).
- Fix "Drop client connections on response blocking" #962: drop client connections on response blocking.
- Block messages with not-allowed trailer headers. Partly done: all headers known to Tempesta at the moment are prohibited in the trailer part. The list must be extended according to the RFC (a TODO comment is provided).
- Don't try to send an error message to the client on response parse errors, if the client is the Health Monitor subsystem.
- Drop the request/response and close the connection if splitting an skb for sibling messages has failed: in this case the original message would contain more than one message.
- Replace the `TFW_HTTP_SUSPECTED` request flag with the usual `TFW_HTTP_CONN_CLOSE` flag to set up the correct `Connection:` header value.
- Replace bit operations with message flags by the common `test_bit()`/`set_bit()` API (a sketch follows the list).
- Allocate `TfwHttpParser` on a per-connection, not a per-message, basis.
- Some other minor improvements.
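For illustration, a minimal sketch of the flag API change: `test_bit()`/`set_bit()` are the standard Linux bitops, while the bit name `TFW_HTTP_B_CONN_CLOSE` and the `flags` field layout are assumptions made for this example:

```c
#include <linux/bitops.h>

/* Assumed: message flags become an unsigned long bitmap and each flag
 * is a bit number rather than a mask constant.
 */
#define TFW_HTTP_B_CONN_CLOSE	0	/* hypothetical bit number */

struct msg {
	DECLARE_BITMAP(flags, BITS_PER_LONG);
};

static void
mark_conn_close(struct msg *m)
{
	/* Before: mask arithmetic such as m->flags |= TFW_HTTP_CONN_CLOSE. */
	set_bit(TFW_HTTP_B_CONN_CLOSE, m->flags);

	if (test_bit(TFW_HTTP_B_CONN_CLOSE, m->flags)) {
		/* "Connection: close" will be set in the forwarded message. */
	}
}
```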