Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/2 implementation: HTTP/2-cache (#309). #1383

Conversation

aleksostapenko
Copy link
Contributor

No description provided.

@aleksostapenko aleksostapenko changed the title HTTP/2 Parser implementation: HTTP/2-cache (#309). HTTP/2 implementation: HTTP/2-cache (#309). Feb 18, 2020
static inline unsigned short
tfw_h2_pseudo_index(unsigned short status)
{
switch (status) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be faster wit array?
Of course will use minimum 512 bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler will do binary search over the labels which is much faster than accessing 4 cache line array. The compiler also generates lookup table (exactly the array you proposed), but for more localized values.

Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did only quick and highlevel review, so I left only several comments and can not say whether it's LGTM or not. A more solid review from @ikoveshnikov is required.

r = write_actor(db, trec, resp, p, s->len, &dc_iter);
if (likely(!r))
*acc_len += dc_iter.acc_len;
goto out;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless goto instead of simple return r

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

* version of this header.
*/
if ((field->flags & TFW_STR_HBH_HDR)
|| hid == TFW_HTTP_HDR_SERVER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we skip Via as well if the response already contains the header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must preserve all the already existing Via headers in the message while forwarding it (https://tools.ietf.org/html/rfc7230#section-5.7.1). The resulting order of the Via headers remains correct, as we add our Via header in the end - after all the response headers are already copied.

TfwH2TransOp op = cache ? TFW_H2_TRANS_EXPAND : TFW_H2_TRANS_ADD;

if (!h_mods)
return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double condition check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

@@ -1529,44 +1978,33 @@ tfw_cache_build_resp_body(TDB *db, TfwHttpResp *resp, TdbVRec *trec,
* TODO use iterator and passed skbs to be called from net_tx_action.
*/
static TfwHttpResp *
tfw_cache_build_resp(TfwHttpReq *req, TfwCacheEntry *ce)
tfw_cache_h2_build_resp(TfwHttpReq *req, TfwCacheEntry *ce, time_t lifetime,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function builds HTTP/1 responses as well as HTTP/2, so why to rename it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected (reverted names).


static inline int
tfw_cache_h2_set_status(TDB *db, TfwCacheEntry *ce, TfwHttpResp *resp,
TdbVRec **trec, char **p, unsigned long *acc_len)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function works for both HTTP versions, so I propose not to use h2 in names of such functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

if (st_index) {
if (tfw_cache_h2_copy_int(&ce->hdr_len, st_index, 0xF,
p, trec, tot_len))
return -ENOMEM;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's cool: now we store all static-table-indexed headers by indexes only in the cache!

It seems if we send a response from the cache, we don't need and should skip the while (!hdrs_end) loop calling tfw_hpack_encode() for all the headers in tfw_h2_resp_adjust_fwd().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - it is exactly how it works now: the response from cache is served through tfw_http_req_cache_service() -> tfw_h2_resp_fwd() flow, bypassing the tfw_h2_resp_adjust_fwd() procedure.

* likely - aggregated into one general-purpose procedure, after HTTP/1.1-parser
* extending (see also TODO-comment for @tfw_http_hdr_split_cp() procedure); for
* now this procedure leaves RWS in the resulting @val and this is not correct
* in context of HTTP/2<=>HTTP/1.1 comparisons, transformations etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leave a comment to remember that the TODO must be fixed in upcoming PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO is fixed.

* TODO: this procedure should be evicted after HTTP/1.1-parser extending
* (and @tfw_http_hdr_split()/@tfw_h2_msg_hdr_length procedures upgrading) in
* order to split headers into separate chunks for name, ':', LWS, value, and
* RWS - during the HTTP-message parsing stage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leave a comment to remember that the TODO must be fixed in upcoming PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO is fixed.

Parse name, colon, LWS, value and RWS of HTTP/1.1-response headers into
separate chunks to facilitate the name/value splitting and colon/OWS
eviction during HTTP/1.1=>HTTP/2 response transformation.
Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and needs an accurate review from @ikoveshnikov

if (unlikely(!idx))
break;
--idx;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is used instead of tfw_h2_msg_hdr_length() (e.g. in __set_etag()) and now the loop body w/o this while looks much better and acceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - there is no need to iterate characters inside the chunks any more.

@@ -302,6 +300,9 @@ enum {
((!hmmsg->conn || TFW_CONN_TYPE(hmmsg->conn) & Conn_Srv) && \
hmmsg->pair && TFW_MSG_H2(hmmsg->pair))

#define H2_STAT_VAL_LEN 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either move the definition close to existing macros in the beginning of the http_msg.c or add a comment here. http.h is too overloaded and messy, additions of a new undocumented macro definitions is not a good choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: moved to http_msg.h with comment.

@@ -302,6 +300,9 @@ enum {
((!hmmsg->conn || TFW_CONN_TYPE(hmmsg->conn) & Conn_Srv) && \
hmmsg->pair && TFW_MSG_H2(hmmsg->pair))

#define H2_STAT_VAL_LEN 3
#define RESP_BUF_LEN 128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro defines size of buffer for some temporal operations, it's not related to http itself and it's better to define it right above buffer definitions (which are static and never exposed outside their translation unit).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

struct sk_buff **skb_head = &resp->msg.skb_head;
TfwStr crlf = { .data = S_CRLF, .len = SLEN(S_CRLF) };

r = tfw_http_msg_expand_data(&mit->iter, skb_head,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, probably we can avoid skb expand operations during serving from cache. We allocate skb big enough to fit all headers stored in cache, but we can add some extra space to put a set-cookie header: we know everything about it: header name size, cookie name, and length of Tempesta sticky cookie. If you're agree, just drop a TODO here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that along with copying headers from cache and addition the known in advance headers (e.g. set-cookie, date etc.) we must also process configured headers corrections (which can be addition/substitution/deletion/appending) and we cannot predict here the resulting size of memory for the skbs to be allocated in the target response.

resp = NULL;
goto out;
if (TFW_MSG_H2(req)) {
id = tfw_h2_stream_id(req);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tfw_h2_stream_id() gets stream id under h2 context lock. Why? The request can't be reassigned to other stream and stream can't be destroyed until request is alive, aren't they?

Copy link
Contributor Author

@aleksostapenko aleksostapenko Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stream can't be destroyed until request is alive

Actually can: during processing of request on http layer, the framing layer can receive the RST_STREAM from the client and stream will be moved to the final closed state (and consequently - will be removed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i will check that, but it seems to be resource management question then. Stream can be removed, but stream object must exist until referencing request exists. Same as http1.1 requests keep connection object alive.

Anyway. no need to change this right now, i was curious for locking reason.

Copy link
Contributor

@vankoven vankoven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Great job!

st_index = hdr->hpack_idx;
h_len = tfw_h2_hdr_size(s_nm.len, s_val.len, st_index);

/* Don't split short stprings. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Don't split short stprings. */
/* Don't split short strings. */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

* version of this header.
*/
if ((field->flags & TFW_STR_HBH_HDR)
|| hid == TFW_HTTP_HDR_SERVER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's skip Set-Cookie header. Actually backend should protect responses with private cookies from caching, but I'll be good not to spread the same cookie among all clients if it mistakenly leaks. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a result of discussion in the chat, it was decided to postpone this change and additionally investigate it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some research and moved the requirement to #1075 which is all about cookies and security.

@aleksostapenko aleksostapenko merged commit 7b4fc72 into ao-309-parser-transformation Mar 4, 2020
@aleksostapenko aleksostapenko deleted the ao-309-parser-transformation-h2-cache branch March 4, 2020 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants