
Streaming mode of operation. Part I: request streaming #1067

Closed

vankoven
Contributor

Obsoletes #1012. The original implementation was full of open questions, and trying to solve them brought me to a new design. I've split the full PR into two parts: streaming of requests and streaming of responses. It's too gigantic for a single PR, and there are tons of very specific changes that are hard to track all together.

I. Per-connection memory limit and receive window steering

The idea is to have a single virtual buffer shared between the receive queue and the requests stored inside Tempesta. The advertised receive window must never exceed the free space in that virtual buffer.

To implement this, the SOCK_RCVBUF_LOCK user lock is set on the client connection socket, and the sk->sk_rcvbuf size is controlled manually by Tempesta. Unlike the userspace behaviour of the SO_RCVBUF option, sk->sk_rcvbuf is not fixed. Instead it is initialized with net.ipv4.tcp_rmem[1], as for a usual TCP connection, and is increased if the client has a good transmission rate. The new client_rmem option replaces net.ipv4.tcp_rmem[2], so client connections have their own memory limit, while the system-wide limit applies to server connections.
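
As an illustration, a minimal sketch of the rcvbuf takeover (the helper name and the way the initial value is passed in are assumptions, not the actual patch code):

    static void tfw_cli_sock_rcvbuf_init(struct sock *sk, int init_rcvbuf)
    {
            lock_sock(sk);
            /*
             * Disable kernel autotuning of the receive buffer, just as a
             * userspace SO_RCVBUF call would do.
             */
            sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
            /*
             * Start from the usual TCP default (net.ipv4.tcp_rmem[1]);
             * Tempesta may grow the value later for well-behaving clients,
             * bounded by client_rmem instead of net.ipv4.tcp_rmem[2].
             */
            sk->sk_rcvbuf = init_rcvbuf;
            release_sock(sk);
    }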

The second thing changed here is per-socket memory tracking. Once an skb is added to the receive queue, the sk->sk_rmem_alloc counter is increased by the skb's true size. The difference between sk->sk_rcvbuf and sk->sk_rmem_alloc is used to calculate the new receive window size. When an skb is detached from the receive queue, it is no longer tracked by sk->sk_rmem_alloc, and the receive window returns to its original size. In the proposed solution, after the skb is orphaned and a new HTTP request is parsed from it, the original size of the HTTP request is added to sk->sk_rmem_alloc to calculate a new, smaller window size. After the response is sent to the client, the request is destroyed and sk->sk_rmem_alloc is decreased, producing a bigger receive window and allowing the client to send more data. No other changes are made in the window calculation, so all the other features still work: the advertised window is always a multiple of MSS, the advertised window is never shrunk, and so on.
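
A sketch of the accounting part, with hypothetical helper names and a hypothetical orig_size field; since the kernel already derives the advertised window from the difference between sk_rcvbuf and sk_rmem_alloc, no extra window-calculation code is needed:

    /* Charge a parsed request against the shared virtual buffer. */
    static void tfw_cli_rmem_charge(struct sock *sk, TfwHttpReq *req)
    {
            /* orig_size: bytes the request occupied on the wire. */
            atomic_add(req->orig_size, &sk->sk_rmem_alloc);
    }

    /*
     * Called when the request is destroyed after its response is sent;
     * the freed space re-opens the advertised receive window.
     */
    static void tfw_cli_rmem_uncharge(struct sock *sk, TfwHttpReq *req)
    {
            atomic_sub(req->orig_size, &sk->sk_rmem_alloc);
    }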

The actual limit is 2*client_rmem bytes. This is done to provide the same behaviour as the well-known SO_RCVBUF socket option. If the receive buffer is depleted, a zero window size is advertised by the kernel code and the client sends window probes; after keepalive_timeout seconds the client is disconnected.

II. Streaming requests to server connections

Basic idea: requests longer than client_msg_buffering bytes are sent in parts, as soon as a new chunk arrives. Headers are always fully buffered, even if the header part is longer than the limit. To reduce extra work, a newly parsed skb is added to the buffered part in full, without breaking it into pieces.
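
In pseudo-C, the buffering decision boils down to something like the following hypothetical helper:

    /*
     * Stream further chunks only after the headers are complete and the
     * buffered part has exceeded the configured limit. A freshly parsed
     * skb is never split between the buffered and streamed parts.
     */
    static bool tfw_http_req_should_stream(bool headers_complete,
                                           unsigned long buffered_bytes,
                                           unsigned long client_msg_buffering)
    {
            if (!headers_complete)
                    return false;
            return buffered_bytes > client_msg_buffering;
    }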

client_msg_buffering must be less than client_rmem. The default configuration must be safe, so the buffering limit is set to 1MB by default.
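
For reference, a hypothetical configuration fragment with the new directives (the client_rmem value is an arbitrary example, not a documented default):

    # Per-client-connection memory limit; replaces net.ipv4.tcp_rmem[2]
    # for client sockets.
    client_rmem 4194304;

    # Buffer at most 1MB of a request, stream the rest to the backend.
    # Must be less than client_rmem.
    client_msg_buffering 1048576;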

Streamed requests bring new problems which don't appear in fully buffered mode:

  • Request processing starts right after the end of the buffered part. The rest of the request may be received and processed after the buffered part has been assigned to a backend connection for transmission.

  • A streamed message can't be re-sent. The streamed part of the request is destroyed once it is sent to the backend, so there is nothing to re-send.

  • If only a part of a request has been sent out to the backend, but an error happens while receiving a new request chunk, the backend connection must be closed to recover.

  • Imagine that the first N requests in the server connection's forward queue were streamed to the backend and the (N+1)-th request is partly streamed. If an error happens while receiving a new chunk of that partly streamed request, the backend connection must be closed. But an immediate close of the backend connection would degrade quality of service, since the previously streamed requests can't be re-sent and would be evicted. To provide a better level of service, we have to wait until all sent requests are responded to. Another option is streaming requests one by one without pipelining, but that would cause extra buffering overhead and an overall performance impact. The implementation provides the first solution.

  • The backend may respond to a streamed request before the request is fully received. E.g. Nginx sends a 200 OK reply to a GET request with a chunked body before the body is received; or the backend may send a 4xx or 5xx error and close the connection before the request is fully received (just like Tempesta in case of attacks or errors). In case of early responses, all the following request chunks must still be forwarded to the backend.

  • A new chunk of a streamed request may be unneeded and should be dropped without processing, if the request is served from the cache, a redirect response is prepared (sticky cookie module), or the backend has already responded.

  • More than one response may be created for a single request, e.g. a response may be built from the cache, but later an error is detected in a new request chunk, and Tempesta is configured to send a reply. The proposed solution doesn't send a response before the request is fully received, and never shares the content with clients that send errors. Responses from the backend are treated as the primary action: if an error happens but the backend has already responded, the backend's response is used.

  • Forwarding from the backend's forward queue and parsing a new request chunk may happen simultaneously, so multiple accesses to the same request may occur and some kind of synchronisation is required. In the proposed solution the streamed message is split into two parts: TfwHttpReq, which mustn't be modified after it is enlisted into the forward queue, and TfwHttpMsgPart, the streamed part of the request, which can contain the rest of the body and the trailer headers and can be modified at any time (see the sketch after this list).

  • In my opinion, client_rmem is highly recommended for use together with client_msg_buffering. Multiple clients may stream to the same backend connection, but Tempesta may stream only one of them at a time. Two options are possible: pause the other clients by sending them zero windows, or buffer their streamed requests until the backend is ready. The latter is implemented.

  • According to the RFC, trailer headers can be excluded from processing by a streaming proxy and simply forwarded. But we aim to check the consistency of every forwarded header, and we provide options to modify headers before forwarding, so no exceptions are made for trailer headers in the streamed message part.
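
A structural sketch of the TfwHttpReq/TfwHttpMsgPart split mentioned in the synchronisation item above (field names are illustrative, not the exact ones from the patch):

    /* Requires linux/skbuff.h and Tempesta's http.h for TfwHttpReq. */
    typedef struct tfw_http_msg_part {
            TfwHttpReq      *req;           /* owning request; must not be
                                             * modified once it is enlisted
                                             * into a forward queue */
            struct sk_buff  *skb_head;      /* body chunks not yet forwarded */
            unsigned long   len;            /* bytes accumulated in this part */
            bool            trailer;        /* parsing trailer headers now */
    } TfwHttpMsgPart;

Only the receiving side updates the TfwHttpMsgPart, while the forwarding path works with the immutable TfwHttpReq, so concurrent forwarding and chunk parsing don't race on the same fields.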

III. Other fixes and improvements.

  • Fix [Cache] Unauthorized clients should be blocked before accessing cache #899: check the sticky cookie before forwarding a request to the cache (the same fix also partly implements Fast sticky session scheduler is broken #1043).

  • Fix Drop client connections on response blocking #962: Drop client connections on response blocking.

  • Block messages with disallowed trailer headers. Partly done: all headers known to Tempesta at this point are prohibited in the trailer part. The list must be extended according to the RFC (a TODO comment is provided).

  • Don't try to send an error message to the client on response parse errors if the client is the Health Monitor subsystem.

  • Drop the request/response and close the connection if splitting an skb into sibling messages has failed. In this case the original message would contain more than one HTTP message.

  • Replace the TFW_HTTP_SUSPECTED request flag with the usual TFW_HTTP_CONN_CLOSE flag to set the correct Connection: header value.

  • Bit operations with message flags are replaced by the common test_bit()/set_bit() API (see the sketch after this list).

  • TfwHttpParser is allocated on a per-connection, not a per-message, basis.

  • Some other minor improvements.
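
As referenced above, a sketch of the new flag handling (the helper names are illustrative, and it assumes the message flags field is an unsigned long bitmap as required by the bitops API):

    #include <linux/bitops.h>

    static inline void tfw_http_msg_set_conn_close(TfwHttpMsg *hm)
    {
            set_bit(TFW_HTTP_CONN_CLOSE, hm->flags);
    }

    static inline bool tfw_http_msg_conn_close(const TfwHttpMsg *hm)
    {
            return test_bit(TFW_HTTP_CONN_CLOSE, hm->flags);
    }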

aleksostapenko and others added 30 commits March 14, 2018 11:48
Split skb if can't allocate a new response
generating lua config from test

Add 1k and 1M test

Test responses

add 1M response test
expect 403 for wrong length

Parsing multiple responses

Expect exactly 1 parsing error

check for garbage after response end

Write special testers for wrong length
Add missing length tests

Close connection after response w/out content-length

Add tests for multiple or invalid content-length
Fix warning:
  /tmp/client/request_1M.lua: /tmp/client/request_1M.lua:5:
    unexpected symbol near '='
Long body in responses and requests
If ss_skb_split() fails, a part of the next HTTP message will be
added to the end of the last parsed message. If that error happens,
there is no recourse; we must close the connection.
The buffered part of the message must be detached from the
connection as early as possible to be forwarded to the receiver.
The streamed part is constantly updated by the sender. If both
parts were kept inside the same object, synchronization would be
required, since the buffered part may be accessed by multiple CPUs.
No error processing in this commit.
Don't try to send error response to Health Monitor subsystem.
It's a locally generated request and it's not assigned to any
client connection.
The same client may stream requests to multiple server connections,
e.g. in full-stream mode. Close the server connections to recover
after sending an incomplete request.