http_parser: missing the obs-text #938

kamyuentse · 2018-03-06T16:41:24Z

This is an awesome work, thanks a lot! I am playing with the code of this repo. But I am confused by the HTTP parser here. tempesta use SIMD to boost up the parser, the most important primitive here is the __tfw_strspn_simd, but I find a problem I am trying to run the following tests, according to the comments in the code, the tfw_match_ctext_vchar SHOULD match 0xff(ctext = .... | obs-text), while it doesn't. The reason here might the usage of _mm256_shuffle_epi8 when the most significant bit = 1, the shuffled result would be 0x00, so when the input char = 0b1xxxxxxx, we not accept it, which means we lost the obs-text?

void test0() {
    char buffer[256];
    int i;
    for (i=0; i< 256; i++)
        buffer[i] = 'A';

    size_t res = tfw_match_ctext_vchar(buffer, 256);

    printf("%d\n", res);
}

void test1() {
    char buffer[256];
    int i;
    for (i=0; i< 256; i++)
        buffer[i] = 0xff;

    size_t res = tfw_match_ctext_vchar(buffer, 256);

    printf("%d\n", res);
}

int main() {
    tfw_str_init_const();

    test0();
    test1();
}

Output:

256
0

The text was updated successfully, but these errors were encountered:

krizhanovsky · 2018-03-07T18:02:18Z

Hi @kamyuentse ,

thanks for the report! I can confirm that there is a bug: the test aa34dc9 fails with:

[ 1260.905205] tfw_test: TEST_RUN(cstr, simd_match_ctext_vchar)
[ 1260.907304] tfw_test: /root/tempesta/tempesta_fw/t/unit/test_tfw_str.c:119: error: 
[ 1260.908574] tfw_test: FAIL:
[ 1260.909115] tfw_test:   test__cstr__simd_match_ctext_vchar()
[ 1260.909960] tfw_test:   EXPECT_TRUE((tfw_match_ctext_vchar(buf, 300) == 300)) => 0x0
[ 1260.911779] tfw_test: /root/tempesta/tempesta_fw/t/unit/test_tfw_str.c:123: error: 
[ 1260.912913] tfw_test: FAIL:
[ 1260.913624] tfw_test:   test__cstr__simd_match_ctext_vchar()
[ 1260.914610] tfw_test:   EXPECT_TRUE((tfw_match_ctext_vchar(buf, 300) == 23 + 0x7f)) => 0x0
[ 1260.917873] tfw_test: /root/tempesta/tempesta_fw/t/unit/test_tfw_str.c:125: error: 
[ 1260.919521] tfw_test: FAIL:
[ 1260.920570] tfw_test:   test__cstr__simd_match_ctext_vchar()
[ 1260.921969] tfw_test:   EXPECT_TRUE((tfw_match_ctext_vchar(buf, 300) == 23 + 0x1f)) => 0x0
[ 1260.925262] tfw_test: /root/tempesta/tempesta_fw/t/unit/test_tfw_str.c:127: error: 
[ 1260.927406] tfw_test: FAIL:
[ 1260.928565] tfw_test:   test__cstr__simd_match_ctext_vchar()
[ 1260.930113] tfw_test:   EXPECT_TRUE((tfw_match_ctext_vchar(buf, 300) == 23 + 0xa)) => 0x0
[ 1260.933639] tfw_test: /root/tempesta/tempesta_fw/t/unit/test_tfw_str.c:129: error: 
[ 1260.935244] tfw_test: FAIL:
[ 1260.936384] tfw_test:   test__cstr__simd_match_ctext_vchar()
[ 1260.937764] tfw_test:   EXPECT_TRUE((tfw_match_ctext_vchar(buf, 300) == 23 + 8)) => 0x0
[ 1260.939958] tfw_test: /root/tempesta/tempesta_fw/t/unit/test_tfw_str.c:131: error: 
[ 1260.941319] tfw_test: FAIL:
[ 1260.942042] tfw_test:   test__cstr__simd_match_ctext_vchar()
[ 1260.942966] tfw_test:   EXPECT_TRUE((tfw_match_ctext_vchar(buf, 300) == 23)) => 0x0

kamyuentse · 2018-03-10T03:11:11Z

@krizhanovsky Thanks for your response, I am looking for a solution for this. Extend the original approach to match 0x00 - 0xFF seems impossible, but specified for HTTP parsing, because obs-text is only one range, we can combine _mm256_cmpgt_epi8 and _mm256_and_si256 to match it, finally, we use or to combine the result.

But, come up with a generalized strspn_simd which support 0x00 - 0xFF is best and valuable.

krizhanovsky · 2018-03-10T14:27:31Z

@kamyuentse , thank you for working on the problem.

Yes this is fundamental problem of the algorithm. It was designed to solve URI processing issues and during the design I forgot about obs-text matching filelds. Actually there is only one matching function which must care about obs-text, tfw_match_ctext_vchar(), - all other matchers work fine. However, the function is used relatively frequently, e.g. for matching User-Agent headers (which can be kilobytes in size) or all other, unknown for Tempesta FW, headers.

Of course, we can just add a range check, e.g. compare all the vector bytes for signed < 0 or unsigned > 0x7f, but it adds additional vector instructions which hurts other matching functions. So we have implement tfw_match_ctext_vchar() separately from generic __tfw_strspn_simd().

Next, tfw_match_ctext_vchar() matches just 3 ranges x9, x20-x7e and x80-xff. Maybe CloudFlare's approach can be optimized to provide more speed closer to current tfw_match_ctext_vchar().

Moreover, we want to provide custom characters set for URI and HTTP headers, so which seems also require two function: the current __tfw_strspn_simd() accepting custom allowed alphabet with all characters less than 0xFF and the second one must work with full 0-256 range and typically just do double work of current __tfw_strspn_simd(). Unfortunately, I don't see any optimization opportunities for the last one since we have no true 32-byte shuffle in AVX2 and even AVX512...

Fix #938. See PoC and benchmarks at tempesta-tech/blog@d0b016d

Backport of the fix of #938

krizhanovsky added a commit that referenced this issue Mar 7, 2018

A test for tfw_match_ctext_vchar(): confirm issue #938

aa34dc9

krizhanovsky added the bug label Mar 7, 2018

krizhanovsky self-assigned this Mar 7, 2018

krizhanovsky added this to the 0.5 alpha milestone Mar 7, 2018

krizhanovsky modified the milestones: 0.5 alpha, 0.6 KTLS Mar 10, 2018

krizhanovsky changed the title ~~A question about the parser: missing the obs-text ?~~ http_parser: missing the obs-text Mar 10, 2018

krizhanovsky added a commit to tempesta-tech/blog that referenced this issue Apr 1, 2018

ctext_vchar matcher, see tempesta-tech/tempesta#938

d0b016d

krizhanovsky closed this as completed in bb88c9e Apr 4, 2018

krizhanovsky added a commit that referenced this issue Apr 4, 2018

Merge pull request #994 from tempesta-tech/ak-938

1f74979

Fix #938. See PoC and benchmarks at tempesta-tech/blog@d0b016d

krizhanovsky added a commit that referenced this issue Apr 4, 2018

Backport of the fix of #938

2b8d0a6

krizhanovsky added a commit that referenced this issue Apr 4, 2018

Merge pull request #998 from tempesta-tech/ak-938-backport-0.5

11b2678

Backport of the fix of #938

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

http_parser: missing the obs-text #938

http_parser: missing the obs-text #938

kamyuentse commented Mar 6, 2018

krizhanovsky commented Mar 7, 2018

kamyuentse commented Mar 10, 2018

krizhanovsky commented Mar 10, 2018

http_parser: missing the obs-text #938

http_parser: missing the obs-text #938

Comments

kamyuentse commented Mar 6, 2018

krizhanovsky commented Mar 7, 2018

kamyuentse commented Mar 10, 2018

krizhanovsky commented Mar 10, 2018