Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: simplify checkInvalidHeaderChar #18381

Closed

Conversation

sethbrenith
Copy link
Contributor

@sethbrenith sethbrenith commented Jan 25, 2018

In the spirit of 17399, we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)
  • http
Benchmark results
                                                                                                improvement confidence      p.value
 http\\check_invalid_header_char.js n=1000000 input="\177"                                        -56.65 %        *** 1.320705e-40
 http\\check_invalid_header_char.js n=1000000 input=""                                            -70.08 %        *** 1.329603e-60
 http\\check_invalid_header_char.js n=1000000 input="\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tFoo bar baz"   198.82 %        *** 1.052799e-76
 http\\check_invalid_header_char.js n=1000000 input="1"                                           -50.00 %        *** 7.800619e-22
 http\\check_invalid_header_char.js n=1000000 input="20091"                                        -6.44 %        *** 7.491287e-04
 http\\check_invalid_header_char.js n=1000000 input="ä,-æ-╪å`¢"                                   -33.16 %        *** 5.648190e-21
 http\\check_invalid_header_char.js n=1000000 input="close"                                        22.88 %        *** 1.497247e-25
 http\\check_invalid_header_char.js n=1000000 input="en-US"                                        24.31 %        *** 4.582121e-46
 http\\check_invalid_header_char.js n=1000000 input="foo\\nbar"                                    -5.84 %        *** 2.581802e-08
 http\\check_invalid_header_char.js n=1000000 input="group_acmeair"                                19.81 %        *** 2.764867e-37
 http\\check_invalid_header_char.js n=1000000 input="gzip"                                          4.54 %          * 1.921983e-02
 http\\check_invalid_header_char.js n=1000000 input="Here is a value that is real..."(truncated)  280.31 %        *** 8.462204e-37
 http\\check_invalid_header_char.js n=1000000 input="keep-alive"                                   80.49 %        *** 4.132616e-53
 http\\check_invalid_header_char.js n=1000000 input="private"                                      41.84 %        *** 1.856953e-15
 http\\check_invalid_header_char.js n=1000000 input="SAMEORIGIN"                                   82.00 %        *** 2.317592e-50
 http\\check_invalid_header_char.js n=1000000 input="Sat, 07 May 2016 16:54:48 GMT"               199.94 %        *** 2.698766e-29
 http\\check_invalid_header_char.js n=1000000 input="text/html; charset=utf-8"                    176.87 %        *** 1.854601e-33
 http\\check_invalid_header_char.js n=1000000 input="text/plain"                                   75.47 %        *** 1.241438e-24

@nodejs-github-bot nodejs-github-bot added the http Issues or PRs related to the http subsystem. label Jan 25, 2018
return true;
}
return false;
return !headerCharRegex.test(val);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Rather than doing !/^[...]*$/.test(val), it might be simpler to do /[^...]/.test(val).
  2. It seems there are some regressions in the benchmark. If they still exist after swapping the conditions as I did above, we might want to add a fast case for things like the empty string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am relatively certain that there will still be a certain amount of regression even with that change. If not: wonderful. But in case the regressions continue to exist: I would like to keep the implementation and add the RegExp for val.length > 5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response, and thanks for the feedback!

  1. Agreed, /[^...]/.test(val) is prettier. I'll update it.
  2. I think regressing some particular strings is okay if the expected overall throughput on realistic data is improved. I don't have real-world data, but I do have data from an AcmeAir run (which uses popular modules in relatively straightforward ways, so might be somewhat representative of real data), and it tends to call this method with longer strings where the regex is faster. The same sort of suggestions came up in the previous change to simplify checkIsHttpToken, and people ended up deciding that the readability win was worth regressing a few benchmark cases. Furthermore, V8 folks like @bmeurer and @psmarshall are committed to ensuring that idiomatic code is performant, and there is already an open bug for improving regex performance on short inputs.

Copy link
Contributor Author

@sethbrenith sethbrenith Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison between iterations (old is !/^[...]*$/.test(val), new is /[^...]/.test(val)):

                                                                                                improvement confidence      p.value
 http\\check_invalid_header_char.js n=1000000 input="\177"                                        -10.43 %        *** 1.145512e-11
 http\\check_invalid_header_char.js n=1000000 input=""                                             20.55 %        *** 5.135039e-20
 http\\check_invalid_header_char.js n=1000000 input="\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tFoo bar baz"    -1.96 %            3.430864e-01
 http\\check_invalid_header_char.js n=1000000 input="1"                                            18.08 %        *** 1.373559e-22
 http\\check_invalid_header_char.js n=1000000 input="20091"                                        13.54 %        *** 4.766386e-09
 http\\check_invalid_header_char.js n=1000000 input="ä,-æ-╪å`¢"                                    -9.09 %        *** 2.598671e-09
 http\\check_invalid_header_char.js n=1000000 input="close"                                        14.75 %        *** 5.152685e-10
 http\\check_invalid_header_char.js n=1000000 input="en-US"                                        15.73 %        *** 4.681282e-09
 http\\check_invalid_header_char.js n=1000000 input="foo\\nbar"                                    -1.83 %            1.630593e-01
 http\\check_invalid_header_char.js n=1000000 input="group_acmeair"                               -13.64 %        *** 8.914585e-15
 http\\check_invalid_header_char.js n=1000000 input="gzip"                                         14.60 %        *** 7.965079e-26
 http\\check_invalid_header_char.js n=1000000 input="Here is a value that is real..."(truncated)   55.78 %        *** 9.914895e-30
 http\\check_invalid_header_char.js n=1000000 input="keep-alive"                                   10.51 %        *** 5.515997e-09
 http\\check_invalid_header_char.js n=1000000 input="private"                                       9.98 %        *** 4.858798e-09
 http\\check_invalid_header_char.js n=1000000 input="SAMEORIGIN"                                    7.52 %        *** 1.855577e-13
 http\\check_invalid_header_char.js n=1000000 input="Sat, 07 May 2016 16:54:48 GMT"                -7.75 %        *** 1.633302e-04
 http\\check_invalid_header_char.js n=1000000 input="text/html; charset=utf-8"                     -8.10 %        *** 2.957433e-07
 http\\check_invalid_header_char.js n=1000000 input="text/plain"                                    7.21 %        *** 2.673265e-11

These changes are generally pretty small compared to the change-versus-baseline numbers. I'll update the table in the description as soon as I push the updated code.

@BridgeAR
Copy link
Member

BridgeAR commented Feb 1, 2018

Ping @sethbrenith

@maclover7 maclover7 added the wip Issues and PRs that are still a work in progress. label Feb 3, 2018
@sethbrenith sethbrenith force-pushed the invalid-header-char-regex branch from e4b1747 to 54f84e9 Compare February 6, 2018 00:09
In the spirit of [17399](nodejs#17399), we
can also simplify checkInvalidHeaderChar to use regex matching instead
of a loop. This makes it faster on long matches and slower on short
matches or non-matches. This change also includes some sample data from
an AcmeAir benchmark run, as a rough proxy for real-world data.
@sethbrenith sethbrenith force-pushed the invalid-header-char-regex branch from 54f84e9 to 1179bab Compare February 8, 2018 16:37
@BridgeAR BridgeAR added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. and removed wip Issues and PRs that are still a work in progress. labels Feb 10, 2018
@BridgeAR
Copy link
Member

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mcollina
Copy link
Member

This should no be backported to 4 or 6, as I suspect the performance might be completely different.

Copy link
Member

@bmeurer bmeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Feb 16, 2018
In the spirit of [17399](nodejs#17399),
we can also simplify checkInvalidHeaderChar to use regex matching
instead of a loop. This makes it faster on long matches and slower
on short matches or non-matches. This change also includes some
sample data from an AcmeAir benchmark run, as a rough proxy for
real-world data.

PR-URL: nodejs#18381
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Benedikt Meurer <benedikt.meurer@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
@BridgeAR
Copy link
Member

Landed in 862389b 🎉

@BridgeAR BridgeAR closed this Feb 16, 2018
@sethbrenith
Copy link
Contributor Author

Thanks everyone!

MylesBorins pushed a commit that referenced this pull request Feb 21, 2018
In the spirit of [17399](#17399),
we can also simplify checkInvalidHeaderChar to use regex matching
instead of a loop. This makes it faster on long matches and slower
on short matches or non-matches. This change also includes some
sample data from an AcmeAir benchmark run, as a rough proxy for
real-world data.

PR-URL: #18381
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Benedikt Meurer <benedikt.meurer@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Feb 21, 2018
MylesBorins pushed a commit that referenced this pull request Feb 21, 2018
In the spirit of [17399](#17399),
we can also simplify checkInvalidHeaderChar to use regex matching
instead of a loop. This makes it faster on long matches and slower
on short matches or non-matches. This change also includes some
sample data from an AcmeAir benchmark run, as a rough proxy for
real-world data.

PR-URL: #18381
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Benedikt Meurer <benedikt.meurer@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
In the spirit of [17399](nodejs#17399),
we can also simplify checkInvalidHeaderChar to use regex matching
instead of a loop. This makes it faster on long matches and slower
on short matches or non-matches. This change also includes some
sample data from an AcmeAir benchmark run, as a rough proxy for
real-world data.

PR-URL: nodejs#18381
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Benedikt Meurer <benedikt.meurer@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
@jasnell
Copy link
Member

jasnell commented Aug 17, 2018

Does not land cleanly on 8.x. Will need a backport PR in order to land.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. http Issues or PRs related to the http subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants