Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix farious frang errors #1200

Merged
merged 36 commits into from
Aug 1, 2019
Merged

Fix farious frang errors #1200

merged 36 commits into from
Aug 1, 2019

Conversation

vankoven
Copy link
Contributor

@vankoven vankoven commented Mar 1, 2019

fix #1028

Fixed:

  • test trailer headers across frang limits after body is processed: maximum header len, don't allow same header in header and trailer parts. Last check requires to know, whether is message is fully parsed.
  • close|block client connection immediately once response breaking code block rules is encountered
  • fix possible null pointer dereferencing while testing response code block rules for Health Monitor originated requests.
  • add missing frang limits names to tempesta_fw.conf
  • TfwHttpReq->frang_st duplicates existing Gfsm state in frang FSM and it was removed. Frang states are hookable now, so advanced analytics can be attached to it.

UPD

Major updates in handling frang directives.

  • Frang directives now are really per-vhost and per-location. Each vhost and location may include frang_limits section now. Previous behaviour: frang limits may be defined only globally and only inside locations (and not inside vhosts!), but limits was listed in locations without grouping into section frang_limits

  • Nested locations and vhosts inherit Frang configuration from parents. frang_limits section now can appear many times on the same level to force new defaults before new bunch of vhosts|locations. Previous behaviour: no inheritance, every location must define it's own frang settings, so less copy-paste now.

  • Frang limits was split into two groups: global and per-vhost. Since messages are received by parts, it's not possible to deduce which per-vhost limits should be applied before the whole message is received. So some limits has no sense in beeing per-vhost, they're global now.

Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a very quick high-level review, a more accurate reviews are required.

tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
tempesta_fw/http_limits.c Show resolved Hide resolved
tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
tempesta_fw/http_limits.c Show resolved Hide resolved
tempesta_fw/http_limits.c Show resolved Hide resolved
tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems not all #1028 requirements are satisfied and some logic should be reworked.

tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
tempesta_fw/gfsm.c Outdated Show resolved Hide resolved
tempesta_fw/gfsm.c Outdated Show resolved Hide resolved
tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
@vankoven vankoven force-pushed the ik-fix-frang-errors branch 2 times, most recently from e10a035 to ce8d74d Compare May 25, 2019 23:27
Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically this version is good, but there are several architectural questions and notes, so I'd like to discuss the notes and review the next changes afterwards.

tempesta_fw/http_limits.c Show resolved Hide resolved
tempesta_fw/http_limits.c Show resolved Hide resolved
tempesta_fw/http_limits.c Show resolved Hide resolved
tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
tempesta_fw/http_limits.h Outdated Show resolved Hide resolved
tempesta_fw/http_limits.h Show resolved Hide resolved
tempesta_fw/http_limits.c Show resolved Hide resolved
tempesta_fw/http_limits.c Show resolved Hide resolved
tempesta_fw/vhost.c Show resolved Hide resolved
Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes since last review, please fix them ASAP.

Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small improvements are still required. The most crucial change is to call Frang on moving from HTTP headers to HTTP body.

tempesta_fw/http_limits.c Outdated Show resolved Hide resolved
if (hdr_tmt) {
req->tm_header = jiffies;
if (!f_cfg->http_trailer_split)
return TFW_PASS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add framg_msg() here to notify about the reason for a request blocking

: req->vhost->frang_gconf;
tfw_cli_conn_close_all_sync((TfwClient *)req->conn->peer);
if (fg_cfg->ip_block)
tfw_filter_block_ip(&FRANG_ACC2CLI(ra)->addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is till not addressed. If we block IP, but leave a client connection, then we can send data to a client, but we won't receive neither ACK not probbing response from it - the connection will spend our resources untill OS closes it.

I believe we should call tfw_filter_block_ip() only with connection closing, i.e. there should be a new API function or tfw_filter_block_ip() just can call connection closing. Next, we can not do normal close here, because on blocked IP, we can not receive FIN-ACK and the connection hangs. We need to use RST closing or just free the connection resources silently.

The proper closing is the subject for #861 and sycnhronous doesn't solve the issue.

: req->vhost->frang_gconf;
tfw_cli_conn_close_all_sync((TfwClient *)req->conn->peer);
if (fg_cfg->ip_block)
tfw_filter_block_ip(&FRANG_ACC2CLI(ra)->addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense just to remove or comment (out?) the synchronous closing with TODO referering #860.

if (r == TFW_PASS && resp_cblk)
r = frang_bad_resp_limit(ra, resp_cblk);
__FRANG_FSM_MOVE(Frang_Req_Hdr_Start);
r = frang_http_req_incomplete_body_check(ra, data, fg_cfg, f_cfg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, make sense. Please write explicitly in the Wiki that we process trailers accordingly the body limits.

tempesta_fw/http_limits.h Show resolved Hide resolved

bool http_ct_required;
bool http_host_required;
bool http_trailer_split;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment the field above

.deflt = "false",
.handler = tfw_cfgop_frang_trailer_split,
.allow_reconfig = true,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wiki and #673 must be updated for the new configuration option.

Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Frang changes are move to #1302 since they depend on #309. So now I'm generally good with the PR and only small changes are required.

Copy link
Contributor

@i-rinat i-rinat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patchset is large so I'll submit my comments so far. From the part I looked into, I found no major issues. Mostly spelling typos, and also some side notes. (Feel free to mark them as resolved at your discretion.)
I'll continue to read the patchset further.

@@ -921,10 +921,14 @@

# TAG: frang_limits
#
# The section containing static limits for the classifier.
# The section containing static limits for the classifier. Can apper at top
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The section containing static limits for the classifier. Can apper at top
# The section containing static limits for the classifier. Can appear at the top

# The section containing static limits for the classifier.
# The section containing static limits for the classifier. Can apper at top
# level, inside 'vhost' directive, inside 'location' directive. Once the
# directive is listed in the configuration it redefine default vaues for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# directive is listed in the configuration it redefine default vaues for
# directive is listed in the configuration it redefines default values for the

* - get vhost from the HTTP session information (by Sticky cookie);
* - get Vhost according to TLS SNI header parsing.
* But vhost returned from all that functions can be very different
* depending on HTTP chain configuration due to it flexibility and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* depending on HTTP chain configuration due to it flexibility and
* depending on HTTP chain configuration due to its flexibility and

* But vhost returned from all that functions can be very different
* depending on HTTP chain configuration due to it flexibility and
* client (attacker) behaviour.
* The most secure and a slow way: compare all of them and reject if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* The most secure and a slow way: compare all of them and reject if
* The most secure and slow way: compare all of them and reject if

* depending on HTTP chain configuration due to it flexibility and
* client (attacker) behaviour.
* The most secure and a slow way: compare all of them and reject if
* at least some doesn't match. Can be too strict, service quality can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* at least some doesn't match. Can be too strict, service quality can
* at least some do not match. Can be too strict, service quality can

@@ -298,6 +310,21 @@ tfw_cli_conn_close_all(void *data)
return tfw_peer_for_each_conn(cli, conn, list, __cli_conn_close_cb);
}

/**
* Close all connections with given client, called on security events. Unlike
* asynchronous function above, this one must guarantee that all the close
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose to change "asynchronous function above" to "tfw_cli_conn_close_all". It's too easy to add another function between these two, and so to accidentally make the comment wrong.

* Close all connections with given client, called on security events. Unlike
* asynchronous function above, this one must guarantee that all the close
* requests will be done. Attackers can spam Tempesta with lot of requests and
* connections, trying to cause work queue overrun and delay of security events
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* connections, trying to cause work queue overrun and delay of security events
* connections, trying to cause a work queue overrun and delay security events

}

if (!READ_ONCE(tfw_tls.fallback_to_dflt_sni) || !vhost->vhost_dflt) {
tfw_vhost_put(vhost);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function name says it "gets", but here it puts the refcounter on vhost. Wouldn't be better to leave putting vhosts to the calling side?

/**
* Find matching vhost according to server name in SNI extension. The function
* is also called if there is no SNI extension and fallback to some default
* configuration is required. In the last case @data is NULL and @len is 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* configuration is required. In the last case @data is NULL and @len is 0.
* configuration is required. In the latter case @data is NULL and @len is 0.

return -TTLS_ERR_BAD_HS_CLIENT_HELLO;

if (data && len)
vhost = tfw_vhost_lookup(&srv_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
vhost = tfw_vhost_lookup(&srv_name);
vhost = tfw_vhost_lookup(&srv_name);

Copy link
Contributor

@i-rinat i-rinat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found no issues with the PR, looks good.

There are some more spelling comments below. And also a comment about locking required for configuration processing. If that turns out to be a real issue and not something I mistakenly saw as a bug, its fix is beyond the scope of this PR anyway.

@@ -625,6 +630,74 @@ static TfwConnHooks tls_conn_hooks = {
.conn_send = tfw_tls_conn_send,
};

static bool
tfw_tls_get_if_configured(TlsCtx *ctx, TfwVhost *vhost)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function only uses ctx to set ctx->peer_conf. This parameter can be easily removed along with making function to return cfg on success or NULL on failure. That way it will be obvious that function won't touch TlsCtx at all, and only needs TfwVhost to make a decision.

ttls_pk_context key;
unsigned long crt_pg_addr;
unsigned int crt_pg_order;
bool allow_unknown_sni;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Allow unknown server name indication" sounds a bit off. Perhaps, "allow_unknown_server_name"?

Also, what does this flag mean? As far as I can understand from the code, it's something like: "If client did not provide SNI extension, or if the name in that extension did not correspond to any known vhosts, use the default one". Is that correct?

unsigned long crt_pg_addr;
unsigned int crt_pg_order;
bool allow_unknown_sni;
bool fallback_to_dflt_sni;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sni" here also contains "indication" part, which doesn't make sense in a flag name.

Also, what does this flag mean? As far as I can tell, it's: "If selected vhost does not contain required TLS configuration, use configuration from the default vhost". Is that correct?

static int
tfw_tls_peer_tls_init(TfwVhost *vhost)
{
TlsConfEntry *conf = (TlsConfEntry *)vhost->tls_cfg.priv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicit cast is not necessary here.

static inline TlsCertConf *
tfw_tls_get_cert_conf(TfwVhost *vhost, unsigned int directive)
{
TlsConfEntry *conf = (TlsConfEntry *)vhost->tls_cfg.priv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type casting is not required here.

return 0;

/*
* Implicit default vhost is still useful even if it never used to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Implicit default vhost is still useful even if it never used to
* Implicit default vhost is still useful even if it's never used to

if (tfw_frang_cfg_inherit(vh_dflt->loc_dflt->frang_cfg,
&tfw_frang_vhost_reconfig))
return -ENOMEM;
sg_def = tfw_sg_lookup_reconfig(TFW_VH_DFT_NAME, SLEN("default"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sg_def = tfw_sg_lookup_reconfig(TFW_VH_DFT_NAME, SLEN("default"));
sg_def = tfw_sg_lookup_reconfig(TFW_VH_DFT_NAME, SLEN(TFW_VH_DFT_NAME));

rcu_read_lock();
vh_list = rcu_dereference(tfw_vhosts);
rcu_read_unlock();
rcu_assign_pointer(tfw_vhosts, tfw_vhosts_reconfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe here is an issue with multiple threads calling this function. For example, if administrator starts Tempesta multiple times in parallel. Then we can get here in two or more threads, get non-NULL vh_list, and then try to release its contents multiple times.

The obvious solution is to wrap rcu_dereference and rcu_assign_pointer with a lock. But since this line uses tfw_vhosts_reconfig which contents is updated in many places, we need a larger scope lock that protect whole configuration sequence.

static TfwCfgSpec tfw_vhost_location_specs[] = {
/*
* Not all Frang specs can be applied to nested locations and can be applied
* only as high-level options. It's possible to provide it's own sets for global
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* only as high-level options. It's possible to provide it's own sets for global
* only as high-level options. It's possible to provide their own sets for global

tls/ttls.h Outdated
* ttls_set_hs_ca_chain() as well as the client
* authentication mode with \c ttls_set_hs_authmode(),
* certificate is found, the callback must fill the
* peer tls configuration part in TLS context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* peer tls configuration part in TLS context,
* peer tls configuration part in the TLS context,

@avbelov23
Copy link
Contributor

avbelov23 commented Jul 15, 2019

Corrected tests, because in this PR blocking by http_resp_code_block occurs at the response stage, and this correct.
tempesta-tech/tempesta-test#90

@@ -252,7 +252,7 @@ typedef struct {
do { \
char abuf[TFW_ADDR_STR_BUF_SIZE] = {0}; \
tfw_addr_fmt(addr, TFW_NO_PORT, abuf); \
TFW_DBG("frang: " fmt_msg, abuf, ##__VA_ARGS__); \
T_DBG("frang: " fmt_msg, abuf, ##__VA_ARGS__); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange space at the end of the line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a tabulation. As there is \ at the end of the line, some spacing is required anyway.

tls/ttls.h Outdated
typedef struct {
/** allowed ciphersuites per version */
const int *ciphersuite_list[4];
/** allowed curves */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange tabs at the end of comments and also below

vankoven added 26 commits July 25, 2019 12:22
…ined

Default vhost stores default policies for the connection or requests.
Remove differences between implicit and explicit 'default' vhosts
and make it possible to assign such vhost to a connection or request.
It was discussed, that all client connections with unknown
or not set sni must always use default vhost. Thus there is no
need in 'tls_fallback_default' directive at all: it would be
always enabled.
# Conflicts:
#	tempesta_fw/vhost.c
@vankoven vankoven force-pushed the ik-fix-frang-errors branch from 6b4f1be to e0e1fb7 Compare July 31, 2019 07:39
@vankoven vankoven merged commit 18f852d into master Aug 1, 2019
@vankoven vankoven deleted the ik-fix-frang-errors branch August 1, 2019 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Frang limits doesn't work properly for messages containing multiple skbs
4 participants