Improve validation in HTTP parser #8074
No functional changes; the old and new re.Pattern[str] still differ, though:

    >>> re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+").pattern
    "[!#$%&'*+\\-.^_`|~0-9A-Za-z]+"
    >>> re.compile("[%s0-9A-Za-z]+" % re.escape("!#$%&'*+-.^_`|~")).pattern
    "[!\\#\\$%\\&'\\*\\+\\-\\.\\^_`\\|\\~0-9A-Za-z]+"

re.escape() escapes characters even when they lose their special meaning inside character classes (as of Python 3.12, anyway).
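A minimal sketch of how the "no functional changes" claim could be checked, assuming it is enough to compare which single characters below code point 256 each pattern accepts. The helper name `accepted_chars` is hypothetical and not part of the PR.

```python
import re

# The two spellings of the token pattern quoted in the commit message.
OLD = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
NEW = re.compile("[%s0-9A-Za-z]+" % re.escape("!#$%&'*+-.^_`|~"))


def accepted_chars(pattern: re.Pattern) -> set:
    """Return every code point below 256 that the pattern accepts on its own."""
    return {chr(cp) for cp in range(256) if pattern.fullmatch(chr(cp))}


# The pattern sources differ, but the accepted character sets are identical.
sources_differ = OLD.pattern != NEW.pattern
same_charset = accepted_chars(OLD) == accepted_chars(NEW)
print(sources_differ, same_charset)  # prints: True True
```

Comparing accepted characters rather than pattern text sidesteps the extra backslashes that re.escape() adds.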
doing this on Unicode strings instead of on bytes only works when we are not accepting *any* 8-bit characters (otherwise, multibyte sequences ruin it)
The PR template demands that changes are documented, and the changelog demands that backwards-incompatible changes (which this PR definitely and intentionally contains) are marked as such.
Co-authored-by: Sviatoslav Sydorenko (Святослав Сидоренко) <sviat@redhat.com>
CI does not like my placeholder, shut it up for now
towncrier-fragments complained
Codecov Report
Additional details and impacted files

    @@            Coverage Diff             @@
    ##           master    #8074      +/-   ##
    ==========================================
    - Coverage   97.44%   97.40%   -0.04%
    ==========================================
      Files         107      107
      Lines       32346    32675     +329
      Branches     3748     3823      +75
    ==========================================
    + Hits        31518    31826     +308
    - Misses        627      641      +14
    - Partials      201      208       +7
Backport to 3.9: 💔 cherry-picking failed, conflicts found
❌ Failed to cleanly apply 33ccdfb on top of patchback/backports/3.9/33ccdfb0a12690af5bb49bda2319ec0907fa7827/pr-8074
Backporting merged PR #8074 into master

Backport to 3.10: 💔 cherry-picking failed, conflicts found
❌ Failed to cleanly apply 33ccdfb on top of patchback/backports/3.10/33ccdfb0a12690af5bb49bda2319ec0907fa7827/pr-8074
Backporting merged PR #8074 into master
Co-authored-by: Paul J. Dorn <pajod@users.noreply.github.com>
Co-authored-by: Sviatoslav Sydorenko (Святослав Сидоренко) <sviat@redhat.com>
(cherry picked from commit 33ccdfb)
def test_http_request_bad_status_line_separator(parser: Any) -> None:
    # single code point, old, multibyte NFKC, multibyte NFKD
Hey @pajod, could you expand on what this comment refers to? Does the `utf8sep` variable contain a value that matches all the listed cases? I'm rather confused. Or did you mean to test different cases but added just one?
That `utf8sep` value has all of those properties (and LTR); it is a popular choice because it combines multiple edge cases in one. None of them is strictly needed for comparing against the literal ASCII dot, as we do here, but I expect some of them to regain relevance in future refactoring.
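As a rough illustration of the properties listed in the test comment, the sketch below profiles a single code point with the standard `unicodedata` module. The `profile` helper and the choice of U+FDFA as a candidate are assumptions made for illustration; the thread does not show which value `utf8sep` actually holds.

```python
import unicodedata
from typing import NamedTuple


class CodePointProfile(NamedTuple):
    utf8_len: int  # bytes needed to encode the code point as UTF-8
    nfkc_len: int  # number of code points after NFKC normalization
    nfkd_len: int  # number of code points after NFKD normalization


def profile(ch: str) -> CodePointProfile:
    """Summarize encoding and normalization sizes for one code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return CodePointProfile(
        utf8_len=len(ch.encode("utf-8")),
        nfkc_len=len(unicodedata.normalize("NFKC", ch)),
        nfkd_len=len(unicodedata.normalize("NFKD", ch)),
    )


# Hypothetical candidate separator; not necessarily the value used in the test.
candidate = "\ufdfa"
print(profile(candidate))
print(profile("."))  # the ASCII dot is one byte and normalizes to itself
```

A separator with a multibyte encoding and a multi-code-point normalization would exercise byte-versus-string handling as well as any normalization step a future refactoring might add.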
@@ -710,6 +808,31 @@ def test_http_request_upgrade(parser: Any) -> None:
    assert tail == b"some raw data"


def test_http_request_parser_utf8_request_line(parser: Any) -> None:
    if not isinstance(response, HttpResponseParserPy):
@pajod FYI it's much cleaner to use the `@pytest.mark.xfail` decorator since it allows pytest to make the decisions earlier in the process.
But that only explains why a llhttp-only code coverage report complains!? There is something much more wrong here. Like, "I completely broke that test" levels of wrong.
Edit: Sorry, I did. Four times. I meant to acknowledge (in two cases, expected to be changed) behaviour differences of the C parser, while keeping my tests parametrized so that both parsers keep running anyway. But each time I copied the `not isinstance(response, HttpResponseParserPy)` line where it should say `not isinstance(parser, HttpRequestParserPy)`. And mypy would have told me, if not for that `Any` and the duplicate use of the `response` identifier (a global-scope function, but also a function-scope variable).
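The mistake described above can be reproduced in isolation. This is a minimal sketch with hypothetical stand-in classes (`RequestParserPy`, `ResponseParserPy`) and helper names; it does not use aiohttp's real parser classes or fixtures.

```python
from typing import Any


class RequestParserPy:
    """Hypothetical stand-in for the pure-Python request parser."""


class ResponseParserPy:
    """Hypothetical stand-in for the pure-Python response parser."""


def response() -> ResponseParserPy:
    """Module-level function that shares its name with a local variable elsewhere."""
    return ResponseParserPy()


def buggy_is_c_parser(parser: Any) -> bool:
    # Bug: inspects the module-level function `response`, not the argument.
    # A function is never a ResponseParserPy instance, so this is always True.
    return not isinstance(response, ResponseParserPy)


def fixed_is_c_parser(parser: Any) -> bool:
    # Inspects the object under test against the matching request-parser class.
    return not isinstance(parser, RequestParserPy)


print(buggy_is_c_parser(RequestParserPy()))  # always True, whatever is passed
print(fixed_is_c_parser(RequestParserPy()))  # False for the pure-Python parser
```

Because the buggy check is always true, any branch guarded by it runs for both parsers, which would explain test lines that never execute for one of them.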
Yeah, I noticed there was no coverage on these tests for some reason. Something to look at later.
Yep. I started looking into the coverage drop and that's how I noticed this thing.
FWIW we really should refactor how the tests with and without extensions are parametrized/generated, like I did in multidict recently. Essentially, there was an import loop in some places in tests that prevented C-extension tests from being executed (aio-libs/multidict#837 / aio-libs/multidict#915 / https://multidict.aio-libs.org/en/latest/changes/#contributor-facing-changes). So I fixed that by having an explicit global option for requiring one mode or the other, with a collection of fixtures reused everywhere and zero magic around import attempts and handling failures in weird ways.
Another thing I configured is the module classification in Codecov with different expected coverage thresholds — the goal should be that tests get 100% coverage in every CI run (from all jobs combined, of course). And the actual project code coverage should be measured as a separate metric. Currently, a global threshold value allows coverage to drop in tests if it's compensated by coverage in the project, meaning that we may be gaining more dead code (read: tests that are never executed), which results in a false sense of things being tested, when they aren't.
This PR contains the following updates:

| Package | Change |
|---|---|
| [aiohttp](https://github.com/aio-libs/aiohttp) | `==3.9.1` -> `==3.9.2` |

### GitHub Vulnerability Alerts

#### [CVE-2024-23829](https://github.com/aio-libs/aiohttp/security/advisories/GHSA-8qpw-xqxj-h4r2)

### Summary

Security-sensitive parts of the *Python HTTP parser* retained minor differences in allowable character sets, that must trigger error handling to robustly match frame boundaries of proxies in order to protect against injection of additional requests. Additionally, validation could trigger exceptions that were not handled consistently with processing of other malformed input.

### Details

These problems are rooted in pattern matching protocol elements, previously improved by PR #3235 and GHSA-gfw2-4jvh-wgfg:

1. The expression `HTTP/(\d).(\d)` lacked another backslash to clarify that the separator should be a literal dot, not just *any* Unicode code point (result: `HTTP/(\d)\.(\d)`).
2. The HTTP version was permitting Unicode digits, where only ASCII digits are standards-compliant.
3. Distinct regular expressions for validating HTTP Method and Header field names were used - though both should (at least) apply the common restrictions of rfc9110 `token`.
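The first two points can be sketched with the standard `re` module. Using the `re.ASCII` flag to restrict `\d` to ASCII digits is an assumption about one way to get that effect, and the `accepts` helper is hypothetical; neither is quoted from the advisory.

```python
import re

# Lenient: the unescaped "." matches any character, and \d matches any
# Unicode decimal digit when the pattern is a str without re.ASCII.
LENIENT = re.compile(r"HTTP/(\d).(\d)")

# Stricter: a literal dot, with re.ASCII limiting \d to the digits 0-9.
STRICT = re.compile(r"HTTP/(\d)\.(\d)", re.ASCII)


def accepts(pattern: re.Pattern, version: str) -> bool:
    """Return True if the whole version string matches the pattern."""
    return pattern.fullmatch(version) is not None


for version in ("HTTP/1.1", "HTTP/1ö1", "HTTP/1.𝟙"):
    print(f"{version!r}: lenient={accepts(LENIENT, version)} "
          f"strict={accepts(STRICT, version)}")
```

The lenient pattern accepts all three inputs, while the stricter one accepts only `HTTP/1.1`.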
### PoC

`GET / HTTP/1ö1`

`GET / HTTP/1.𝟙`

`GET/: HTTP/1.1`

`Content-Encoding?: chunked`

### Impact

Primarily concerns running an aiohttp server without llhttp:

1. **behind a proxy**: Being more lenient than internet standards require could, depending on deployment environment, assist in request smuggling.
2. **directly accessible** or exposed behind proxies relaying malformed input: the unhandled exception could cause excessive resource consumption on the application server and/or its logging facilities.

-----

Patch: https://github.com/aio-libs/aiohttp/pull/8074/files

#### [CVE-2024-23334](https://github.com/aio-libs/aiohttp/security/advisories/GHSA-5h86-8mv2-jq9f)

### Summary

Improperly configuring static resource resolution in aiohttp when used as a web server can result in the unauthorized reading of arbitrary files on the system.

### Details

When using aiohttp as a web server and configuring static routes, it is necessary to specify the root path for static files. Additionally, the option 'follow_symlinks' can be used to determine whether to follow symbolic links outside the static root directory. When 'follow_symlinks' is set to True, there is no validation to check if a given file path is within the root directory. This can lead to directory traversal vulnerabilities, resulting in unauthorized access to arbitrary files on the system, even when symlinks are not present.

i.e. An application is only vulnerable with setup code like:

```
app.router.add_routes([
    web.static("/static", "static/", follow_symlinks=True),  # Remove follow_symlinks to avoid the vulnerability
])
```

### Impact

This is a directory traversal vulnerability with CWE ID 22. When using aiohttp as a web server and enabling static resource resolution with `follow_symlinks` set to True, it can lead to this vulnerability. This vulnerability has been present since the introduction of the `follow_symlinks` parameter.
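The missing validation described above amounts to a containment check. The sketch below is a generic illustration using `pathlib` and is not aiohttp's actual patch; the helper name `is_within_root` and the example paths are hypothetical.

```python
from pathlib import Path


def is_within_root(root: Path, requested: str) -> bool:
    """Return True if `requested`, joined to `root`, stays inside `root`."""
    root_resolved = root.resolve()
    candidate = (root_resolved / requested).resolve()
    # After ".." segments and symlinks are resolved, the candidate must be
    # the root itself or one of its descendants.
    return candidate == root_resolved or root_resolved in candidate.parents


static_root = Path("/srv/static")  # hypothetical static root directory
print(is_within_root(static_root, "css/site.css"))
print(is_within_root(static_root, "../../etc/passwd"))
```

Note that this check resolves symlinks, so it also rejects links that leave the root; a server that deliberately follows such links would need a different rule.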
### Workaround

Even if upgrading to a patched version of aiohttp, we recommend following these steps regardless.

If using `follow_symlinks=True` outside of a restricted local development environment, disable the option immediately. This option is NOT needed to follow symlinks which point to a location _within_ the static root directory, it is _only_ intended to allow a symlink to break out of the static directory. Even with this CVE fixed, there is still a substantial risk of misconfiguration when using this option on a server that accepts requests from remote users.

Additionally, aiohttp has always recommended using a reverse proxy server (such as nginx) to handle static resources and _not_ to use these static resources in aiohttp for production environments. Doing so also protects against this vulnerability, and is why we expect the number of affected users to be very low.

-----

Patch: https://github.com/aio-libs/aiohttp/pull/8079/files

---

### Release Notes

<details>
<summary>aio-libs/aiohttp (aiohttp)</summary>

### [`v3.9.2`](https://github.com/aio-libs/aiohttp/releases/tag/v3.9.2): 3.9.2

[Compare Source](https://github.com/aio-libs/aiohttp/compare/v3.9.1...v3.9.2)

## Bug fixes

- Fixed server-side websocket connection leak.
  *Related issues and pull requests on GitHub:* [#7978](https://github.com/aio-libs/aiohttp/issues/7978).
- Fixed `web.FileResponse` doing blocking I/O in the event loop.
  *Related issues and pull requests on GitHub:* [#8012](https://github.com/aio-libs/aiohttp/issues/8012).
- Fixed double compress when compression enabled and compressed file exists in server file responses.
  *Related issues and pull requests on GitHub:* [#8014](https://github.com/aio-libs/aiohttp/issues/8014).
- Added runtime type check for `ClientSession` `timeout` parameter.
  *Related issues and pull requests on GitHub:* [#8021](https://github.com/aio-libs/aiohttp/issues/8021).
- Fixed an unhandled exception in the Python HTTP parser on header lines starting with a colon -- by :user:`pajod`.
  Invalid request lines with anything but a dot between the HTTP major and minor version are now rejected. Invalid header field names containing question mark or slash are now rejected. Such requests are incompatible with :rfc:`9110#section-5.6.2` and are not known to be of any legitimate use.
  *Related issues and pull requests on GitHub:* [#8074](https://github.com/aio-libs/aiohttp/issues/8074).
- Improved validation of paths for static resources requests to the server -- by :user:`bdraco`.
  *Related issues and pull requests on GitHub:* [#8079](https://github.com/aio-libs/aiohttp/issues/8079).

## Features

- Added support for passing :py:data:`True` to `ssl` parameter in `ClientSession` while deprecating :py:data:`None` -- by :user:`xiangyan99`.
  *Related issues and pull requests on GitHub:* [#7698](https://github.com/aio-libs/aiohttp/issues/7698).

## Breaking changes

- Fixed an unhandled exception in the Python HTTP parser on header lines starting with a colon -- by :user:`pajod`.
  Invalid request lines with anything but a dot between the HTTP major and minor version are now rejected. Invalid header field names containing question mark or slash are now rejected. Such requests are incompatible with :rfc:`9110#section-5.6.2` and are not known to be of any legitimate use.
  *Related issues and pull requests on GitHub:* [#8074](https://github.com/aio-libs/aiohttp/issues/8074).

## Improved documentation

- Fixed examples of `fallback_charset_resolver` function in the :doc:`client_advanced` document. -- by :user:`henry0312`.
  *Related issues and pull requests on GitHub:* [#7995](https://github.com/aio-libs/aiohttp/issues/7995).
- The Sphinx setup was updated to avoid showing the empty changelog draft section in the tagged release documentation builds on Read The Docs -- by :user:`webknjaz`.
  *Related issues and pull requests on GitHub:* [#8067](https://github.com/aio-libs/aiohttp/issues/8067).

## Packaging updates and notes for downstreams

- The changelog categorization was made clearer. The contributors can now mark their fragment files more accurately -- by :user:`webknjaz`.
  The new category tags are:
  * ``bugfix``
  * ``feature``
  * ``deprecation``
  * ``breaking`` (previously, ``removal``)
  * ``doc``
  * ``packaging``
  * ``contrib``
  * ``misc``

  *Related issues and pull requests on GitHub:* [#8066](https://github.com/aio-libs/aiohttp/issues/8066).

## Contributor-facing changes

- Updated :ref:`contributing/Tests coverage <aiohttp-contributing>` section to show how we use `codecov` -- by :user:`Dreamsorcerer`.
  *Related issues and pull requests on GitHub:* [#7916](https://github.com/aio-libs/aiohttp/issues/7916).
- The changelog categorization was made clearer. The contributors can now mark their fragment files more accurately -- by :user:`webknjaz`.
  The new category tags are:
  * ``bugfix``
  * ``feature``
  * ``deprecation``
  * ``breaking`` (previously, ``removal``)
  * ``doc``
  * ``packaging``
  * ``contrib``
  * ``misc``

  *Related issues and pull requests on GitHub:* [#8066](https://github.com/aio-libs/aiohttp/issues/8066).

## Miscellaneous internal changes

- Replaced all `tmpdir` fixtures with `tmp_path` in test suite.
  *Related issues and pull requests on GitHub:* [#3551](https://github.com/aio-libs/aiohttp/issues/3551).

***

</details>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>