Improve validation in HTTP parser #8074

Merged
merged 24 commits into from
Jan 28, 2024

Conversation

Dreamsorcerer
Member

No description provided.

pajod and others added 23 commits December 18, 2023 12:26
No functional changes; old and new re.Pattern[str] differ still:

>>> re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+").pattern
"[!#$%&'*+\\-.^_`|~0-9A-Za-z]+"
>>> re.compile("[%s0-9A-Za-z]+" % re.escape("!#$%&'*+-.^_`|~")).pattern
"[!\\#\\$%\\&'\\*\\+\\-\\.\\^_`\\|\\~0-9A-Za-z]+"

re.escape() escapes characters even when they (as of Python 3.12, anyway)
 lose their special meaning inside character classes anyway.
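
A quick check, not part of the PR, that the two patterns above accept the same ASCII characters even though their `.pattern` strings differ. The names `OLD`, `NEW`, and `TCHAR_PUNCT` are illustrative only.

```python
import re

TCHAR_PUNCT = "!#$%&'*+-.^_`|~"  # punctuation allowed in an rfc9110 token
OLD = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
NEW = re.compile("[%s0-9A-Za-z]+" % re.escape(TCHAR_PUNCT))

# Compare both character classes over every 7-bit code point.
disagreements = [
    chr(cp)
    for cp in range(128)
    if bool(OLD.fullmatch(chr(cp))) != bool(NEW.fullmatch(chr(cp)))
]

print(OLD.pattern == NEW.pattern)  # the pattern strings differ
print(disagreements)               # the accepted characters do not
```

An empty `disagreements` list is what "no functional changes" means here: the extra backslashes added by `re.escape()` do not change which characters the class accepts.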
doing this on Unicode strings instead of on bytes only works when
 we are not accepting *any* 8-bit characters (otherwise, multibyte sequences ruin it)
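
One possible reading of this commit message, sketched under that assumption: an ASCII-only character class gives the same verdict on a `str` and on its UTF-8 bytes, while a pattern that admits arbitrary characters does not, because one code point can be several bytes. All names below are illustrative.

```python
import re

TOKEN_STR = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
TOKEN_BYTES = re.compile(rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

samples = ["GET", "Content-Type", "G\u00c9T", "GET/"]

# ASCII-only class: str and bytes matching agree for every sample.
agreement = [
    bool(TOKEN_STR.fullmatch(s)) == bool(TOKEN_BYTES.fullmatch(s.encode("utf-8")))
    for s in samples
]

# A pattern admitting any character counts code points on str but bytes on bytes.
ANY3_STR = re.compile(r".{3}")
ANY3_BYTES = re.compile(rb".{3}")
word = "G\u00c9T"  # 3 code points, 4 bytes in UTF-8
str_verdict = ANY3_STR.fullmatch(word) is not None
bytes_verdict = ANY3_BYTES.fullmatch(word.encode("utf-8")) is not None

print(agreement)
print(str_verdict, bytes_verdict)
```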
PR template demands changes are documented, and changelog demands backwards incompatible changes - which this definitely + intentionally contains - are marked as such.
Co-authored-by: Sviatoslav Sydorenko (Святослав Сидоренко) <sviat@redhat.com>
CI does not like my placeholder, shut it up for now
towncrier-fragments complained
@Dreamsorcerer Dreamsorcerer added backport-3.9 backport-3.10 Trigger automatic backporting to the 3.10 release branch by Patchback robot labels Jan 28, 2024
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jan 28, 2024

codecov bot commented Jan 28, 2024

Codecov Report

Attention: 30 lines in your changes are missing coverage. Please review.

Comparison is base (5e44ba4) 97.44% compared to head (6b82936) 97.40%.
Report is 35 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| tests/test_http_parser.py | 62.66% | 24 Missing and 4 partials ⚠️ |
| aiohttp/http_parser.py | 85.71% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8074      +/-   ##
==========================================
- Coverage   97.44%   97.40%   -0.04%     
==========================================
  Files         107      107              
  Lines       32346    32675     +329     
  Branches     3748     3823      +75     
==========================================
+ Hits        31518    31826     +308     
- Misses        627      641      +14     
- Partials      201      208       +7     
| Flag | Coverage Δ |
|---|---|
| CI-GHA | 97.31% <66.29%> (-0.04%) ⬇️ |
| OS-Linux | 96.98% <66.29%> (-0.04%) ⬇️ |
| OS-Windows | 95.49% <66.29%> (-0.03%) ⬇️ |
| OS-macOS | 96.80% <66.29%> (-0.05%) ⬇️ |
| Py-3.10.11 | 95.41% <66.29%> (-0.03%) ⬇️ |
| Py-3.10.13 | 96.80% <66.29%> (-0.04%) ⬇️ |
| Py-3.11.6 | ? |
| Py-3.11.7 | 96.45% <66.29%> (-0.09%) ⬇️ |
| Py-3.12.0 | ? |
| Py-3.12.1 | 96.60% <66.29%> (-0.01%) ⬇️ |
| Py-3.8.10 | 95.38% <66.29%> (-0.03%) ⬇️ |
| Py-3.8.18 | 96.73% <66.29%> (-0.04%) ⬇️ |
| Py-3.9.13 | 95.38% <66.29%> (-0.03%) ⬇️ |
| Py-3.9.18 | 96.76% <66.29%> (-0.04%) ⬇️ |
| Py-pypy7.3.13 | ? |
| Py-pypy7.3.15 | 96.32% <66.29%> (?) |
| VM-macos | 96.80% <66.29%> (-0.05%) ⬇️ |
| VM-ubuntu | 96.98% <66.29%> (-0.04%) ⬇️ |
| VM-windows | 95.49% <66.29%> (-0.03%) ⬇️ |


@Dreamsorcerer Dreamsorcerer enabled auto-merge (squash) January 28, 2024 16:24
@Dreamsorcerer Dreamsorcerer merged commit 33ccdfb into master Jan 28, 2024
28 of 32 checks passed
@Dreamsorcerer Dreamsorcerer deleted the thing1 branch January 28, 2024 16:27
Contributor

patchback bot commented Jan 28, 2024

Backport to 3.9: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 33ccdfb on top of patchback/backports/3.9/33ccdfb0a12690af5bb49bda2319ec0907fa7827/pr-8074

Backporting merged PR #8074 into master

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/aio-libs/aiohttp.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.9/33ccdfb0a12690af5bb49bda2319ec0907fa7827/pr-8074 upstream/3.9
  4. Now, cherry-pick PR Improve validation in HTTP parser #8074 contents into that branch:
    $ git cherry-pick -x 33ccdfb0a12690af5bb49bda2319ec0907fa7827
    If it'll yell at you with something like fatal: Commit 33ccdfb0a12690af5bb49bda2319ec0907fa7827 is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x 33ccdfb0a12690af5bb49bda2319ec0907fa7827
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them to preserve the patch from PR Improve validation in HTTP parser #8074 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.9/33ccdfb0a12690af5bb49bda2319ec0907fa7827/pr-8074
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

Contributor

patchback bot commented Jan 28, 2024

Backport to 3.10: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 33ccdfb on top of patchback/backports/3.10/33ccdfb0a12690af5bb49bda2319ec0907fa7827/pr-8074

Backporting merged PR #8074 into master

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/aio-libs/aiohttp.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.10/33ccdfb0a12690af5bb49bda2319ec0907fa7827/pr-8074 upstream/3.10
  4. Now, cherry-pick PR Improve validation in HTTP parser #8074 contents into that branch:
    $ git cherry-pick -x 33ccdfb0a12690af5bb49bda2319ec0907fa7827
    If it'll yell at you with something like fatal: Commit 33ccdfb0a12690af5bb49bda2319ec0907fa7827 is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x 33ccdfb0a12690af5bb49bda2319ec0907fa7827
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them to preserve the patch from PR Improve validation in HTTP parser #8074 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.10/33ccdfb0a12690af5bb49bda2319ec0907fa7827/pr-8074
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

Dreamsorcerer added a commit that referenced this pull request Jan 28, 2024
Co-authored-by: Paul J. Dorn <pajod@users.noreply.github.com>
Co-authored-by: Sviatoslav Sydorenko (Святослав Сидоренко) <sviat@redhat.com>
(cherry picked from commit 33ccdfb)
Dreamsorcerer added a commit that referenced this pull request Jan 28, 2024
Co-authored-by: Paul J. Dorn <pajod@users.noreply.github.com>
Co-authored-by: Sviatoslav Sydorenko (Святослав Сидоренко) <sviat@redhat.com>
(cherry picked from commit 33ccdfb)
Dreamsorcerer added a commit that referenced this pull request Jan 28, 2024
Co-authored-by: Paul J. Dorn <pajod@users.noreply.github.com>
Co-authored-by: Sviatoslav Sydorenko (Святослав Сидоренко)
<sviat@redhat.com>
(cherry picked from commit 33ccdfb)
Dreamsorcerer added a commit that referenced this pull request Jan 28, 2024
Co-authored-by: Paul J. Dorn <pajod@users.noreply.github.com>
Co-authored-by: Sviatoslav Sydorenko (Святослав Сидоренко)
<sviat@redhat.com>
(cherry picked from commit 33ccdfb)


def test_http_request_bad_status_line_separator(parser: Any) -> None:
# single code point, old, multibyte NFKC, multibyte NFKD
Member

Hey @pajod, could you expand what this comment refers to? Does the utf8sep variable contain a value that matches all the listed cases? I'm rather confused. Or did you mean to test different cases but added just one?

Contributor

@pajod pajod Jan 28, 2024

That utf8sep has all of those properties (and is LTR); it is a popular choice because it combines multiple edge cases in one. None of them is strictly needed for comparing against the literal ASCII dot as we do here, but I expect some of them to regain relevance in future refactoring.

@@ -710,6 +808,31 @@ def test_http_request_upgrade(parser: Any) -> None:
assert tail == b"some raw data"


def test_http_request_parser_utf8_request_line(parser: Any) -> None:
if not isinstance(response, HttpResponseParserPy):
Member

@pajod FYI it's much cleaner to use the @pytest.mark.xfail decorator since it allows pytest to make the decisions earlier in the process.

Contributor

@pajod pajod Jan 28, 2024

But that only explains why a llhttp-only code coverage report complains!? There is something much more wrong here. Like, I completely broke that test levels of wrong.

Edit: Sorry, I did. Four times. I meant to acknowledge (in two cases, expected to be changed) behaviour differences of the C parser, while keeping my tests parametrized to keep running both parsers anyway. But each time I copied the not isinstance(response, HttpResponseParserPy) line where it should say not isinstance(parser, HttpRequestParserPy). And mypy would have told me, if not for that Any and the duplicate use of the response identifier (global scope function but also function scope variable)
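
The mix-up described above can be reproduced in isolation. This is a minimal sketch with stand-in classes, not the real test module: `HttpRequestParserC` and both guard functions are hypothetical names, and the module-level `response` function only imitates a name that shadows what the guard was meant to inspect.

```python
from typing import Any


class HttpRequestParserPy:  # stand-in for the pure-Python request parser
    pass


class HttpRequestParserC:  # hypothetical stand-in for the C (llhttp) parser
    pass


class HttpResponseParserPy:  # stand-in for the pure-Python response parser
    pass


def response() -> None:
    """Module-level name, imitating a fixture function in the test module."""


def guard_as_written(parser: Any) -> bool:
    # Bug: inspects the module-level `response` function instead of `parser`.
    # A function is never an HttpResponseParserPy, so this is always True.
    return not isinstance(response, HttpResponseParserPy)


def guard_as_intended(parser: Any) -> bool:
    # Only a non-Python request parser should take the early exit.
    return not isinstance(parser, HttpRequestParserPy)


parsers = (HttpRequestParserPy(), HttpRequestParserC())
exits_buggy = [guard_as_written(p) for p in parsers]
exits_fixed = [guard_as_intended(p) for p in parsers]
print(exits_buggy)
print(exits_fixed)
```

With the guard as written, both parsers take the early exit, so the body of the test never runs for either of them. Because `parser` is typed as `Any` and `response` is a valid name, a type checker has nothing to object to.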

Member Author

Yeah, I noticed there was no coverage on these tests for some reason. Something to look at later.

Member

@webknjaz webknjaz Jan 28, 2024

Yep. I started looking into the coverage drop and that's how I noticed this thing.

FWIW we really should refactor how the tests with and without extensions are parametrized/generated, like I did in multidict recently. Essentially, there was an import loop in some places in tests that prevented C-extension tests from being executed (aio-libs/multidict#837 / aio-libs/multidict#915 / https://multidict.aio-libs.org/en/latest/changes/#contributor-facing-changes). So I fixed that by having an explicit global option for requiring one mode or the other, with a collection of fixtures reused everywhere and zero magic around import attempts and handling failures in weird ways.

Another thing I configured is the module classification in Codecov with different expected coverage thresholds — the goal should be that tests get 100% coverage in every CI run (from all jobs combined, of course). And the actual project code coverage should be measured as a separate metric. Currently, a global threshold value allows coverage to drop in tests if it's compensated by coverage in the project, meaning that we may be gaining more dead code (read: tests that are never executed), which results in a false sense of things being tested, when they aren't.

renovate bot referenced this pull request in allenporter/pyrainbird Feb 1, 2024
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [aiohttp](https://github.com/aio-libs/aiohttp) | `==3.9.1` -> `==3.9.2` | [![age](https://developer.mend.io/api/mc/badges/age/pypi/aiohttp/3.9.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/aiohttp/3.9.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/aiohttp/3.9.1/3.9.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/aiohttp/3.9.1/3.9.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

### GitHub Vulnerability Alerts

#### [CVE-2024-23829](https://github.com/aio-libs/aiohttp/security/advisories/GHSA-8qpw-xqxj-h4r2)

### Summary
Security-sensitive parts of the *Python HTTP parser* retained minor
differences in the character sets they accepted. Such input must trigger
error handling, so that the parser robustly matches the frame boundaries
seen by proxies and protects against injection of additional requests.
Additionally, validation could raise exceptions that were not handled
consistently with the processing of other malformed input.

### Details
These problems are rooted in pattern matching protocol elements,
previously improved by PR #3235 and GHSA-gfw2-4jvh-wgfg:

1. The expression `HTTP/(\d).(\d)` lacked a backslash before the dot, so
the separator matched *any* Unicode code point instead of only a literal
dot (fixed: `HTTP/(\d)\.(\d)`).

2. The HTTP version was permitting Unicode digits, where only ASCII
digits are standards-compliant.

3. Distinct regular expressions for validating HTTP Method and Header
field names were used - though both should (at least) apply the common
restrictions of rfc9110 `token`.
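
The first two points can be reproduced with the standard `re` module. This is a minimal sketch rather than the patched parser code: `LOOSE_VERSION` and `STRICT_VERSION` are illustrative names, and using `re.ASCII` to restrict `\d` to ASCII digits is an assumption about one reasonable way to get that behaviour.

```python
import re

# Unescaped dot, and Unicode-aware \d (the default for str patterns).
LOOSE_VERSION = re.compile(r"HTTP/(\d).(\d)")
# Literal dot; re.ASCII limits \d to the digits 0-9.
STRICT_VERSION = re.compile(r"HTTP/(\d)\.(\d)", re.ASCII)

# "\U0001D7D9" is MATHEMATICAL DOUBLE-STRUCK DIGIT ONE, a Unicode decimal digit.
candidates = ["HTTP/1.1", "HTTP/1\u00f61", "HTTP/1.\U0001D7D9"]

results = {
    version: (
        LOOSE_VERSION.fullmatch(version) is not None,
        STRICT_VERSION.fullmatch(version) is not None,
    )
    for version in candidates
}
for version, (loose, strict) in results.items():
    print(f"{version!r}: loose={loose} strict={strict}")
```

The loose pattern accepts all three strings, while the strict one accepts only `HTTP/1.1`.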

### PoC
`GET / HTTP/1ö1`
`GET / HTTP/1.𝟙`
`GET/: HTTP/1.1`
`Content-Encoding?: chunked`
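
The last two PoC lines hinge on what counts as an rfc9110 `token`. The sketch below checks candidate names against the token character set. `TOKEN` and `is_token` are hypothetical helpers for illustration and are not aiohttp's API.

```python
import re

# rfc9110 token: one or more tchar (ASCII letters, digits, and this punctuation).
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def is_token(name: str) -> bool:
    """Return True if `name` consists solely of token characters."""
    return TOKEN.fullmatch(name) is not None


names = ["GET", "GET/", "Content-Encoding", "Content-Encoding?", ""]
verdicts = {name: is_token(name) for name in names}
for name, ok in verdicts.items():
    print(f"{name!r}: {ok}")
```

Applying one shared check to both methods and header field names means `/` and `?` are rejected in both places.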

### Impact
Primarily concerns running an aiohttp server without llhttp:
1. **behind a proxy**: Being more lenient than internet standards
require could, depending on deployment environment, assist in request
smuggling.
2. **directly accessible** or exposed behind proxies relaying malformed
input: the unhandled exception could cause excessive resource
consumption on the application server and/or its logging facilities.

-----

Patch:
[https://github.com/aio-libs/aiohttp/pull/8074/files](https://github.com/aio-libs/aiohttp/pull/8074/files)

#### [CVE-2024-23334](https://github.com/aio-libs/aiohttp/security/advisories/GHSA-5h86-8mv2-jq9f)

### Summary
Improperly configuring static resource resolution in aiohttp when used
as a web server can result in the unauthorized reading of arbitrary
files on the system.

### Details
When using aiohttp as a web server and configuring static routes, it is
necessary to specify the root path for static files. Additionally, the
option 'follow_symlinks' can be used to determine whether to follow
symbolic links outside the static root directory. When 'follow_symlinks'
is set to True, there is no validation to check if a given file path is
within the root directory. This can lead to directory traversal
vulnerabilities, resulting in unauthorized access to arbitrary files on
the system, even when symlinks are not present.

That is, an application is only vulnerable with setup code like:
```
app.router.add_routes([
    web.static("/static", "static/", follow_symlinks=True),  # Remove follow_symlinks to avoid the vulnerability
])
```
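
To illustrate the kind of check the description says was missing, here is a generic, purely lexical containment test. It is a sketch under stated assumptions and not the code from the aiohttp patch: `stays_inside_root` and `STATIC_ROOT` are hypothetical names, the root is assumed to be an absolute POSIX-style path, and symlinks are deliberately not resolved.

```python
import posixpath


def stays_inside_root(root: str, requested: str) -> bool:
    """Return True if `requested`, joined onto `root`, stays under `root`."""
    root = posixpath.normpath(root)
    # normpath collapses "." and ".." segments without touching the filesystem.
    candidate = posixpath.normpath(posixpath.join(root, requested))
    # commonpath compares whole path components, not string prefixes.
    return posixpath.commonpath([root, candidate]) == root


STATIC_ROOT = "/srv/app/static"  # hypothetical absolute static root
requests = ["css/site.css", "../secrets.txt", "../../../etc/passwd", "/etc/passwd"]
verdicts = {path: stays_inside_root(STATIC_ROOT, path) for path in requests}
for path, ok in verdicts.items():
    print(f"{path!r}: {ok}")
```

Only `css/site.css` passes. The traversal attempts normalize to locations outside the root, and the absolute path replaces the root entirely when joined.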

### Impact
This is a directory traversal vulnerability with CWE ID 22. When using
aiohttp as a web server and enabling static resource resolution with
`follow_symlinks` set to True, it can lead to this vulnerability. This
vulnerability has been present since the introduction of the
`follow_symlinks` parameter.

### Workaround
We recommend following these steps even after upgrading to a patched
version of aiohttp.

If using `follow_symlinks=True` outside of a restricted local
development environment, disable the option immediately. This option is
NOT needed to follow symlinks which point to a location _within_ the
static root directory, it is _only_ intended to allow a symlink to break
out of the static directory. Even with this CVE fixed, there is still a
substantial risk of misconfiguration when using this option on a server
that accepts requests from remote users.

Additionally, aiohttp has always recommended using a reverse proxy
server (such as nginx) to handle static resources and _not_ to use these
static resources in aiohttp for production environments. Doing so also
protects against this vulnerability, and is why we expect the number of
affected users to be very low.

-----

Patch:
[https://github.com/aio-libs/aiohttp/pull/8079/files](https://github.com/aio-libs/aiohttp/pull/8079/files)

---

### Release Notes

<details>
<summary>aio-libs/aiohttp (aiohttp)</summary>

### [`v3.9.2`](https://github.com/aio-libs/aiohttp/releases/tag/v3.9.2): 3.9.2

[Compare Source](https://github.com/aio-libs/aiohttp/compare/v3.9.1...v3.9.2)

## Bug fixes

-   Fixed server-side websocket connection leak.

    *Related issues and pull requests on GitHub:*
    [#7978](https://github.com/aio-libs/aiohttp/issues/7978).

-   Fixed `web.FileResponse` doing blocking I/O in the event loop.

    *Related issues and pull requests on GitHub:*
    [#8012](https://github.com/aio-libs/aiohttp/issues/8012).

-   Fixed double compress when compression enabled and compressed file
    exists in server file responses.

    *Related issues and pull requests on GitHub:*
    [#8014](https://github.com/aio-libs/aiohttp/issues/8014).

-   Added runtime type check for `ClientSession` `timeout` parameter.

    *Related issues and pull requests on GitHub:*
    [#8021](https://github.com/aio-libs/aiohttp/issues/8021).

-   Fixed an unhandled exception in the Python HTTP parser on header lines
    starting with a colon -- by :user:`pajod`.

    Invalid request lines with anything but a dot between the HTTP major and
    minor version are now rejected.
    Invalid header field names containing question mark or slash are now
    rejected.
    Such requests are incompatible with :rfc:`9110#section-5.6.2` and are
    not known to be of any legitimate use.

    *Related issues and pull requests on GitHub:*
    [#8074](https://github.com/aio-libs/aiohttp/issues/8074).

-   Improved validation of paths for static resources requests to the
    server -- by :user:`bdraco`.

    *Related issues and pull requests on GitHub:*
    [#8079](https://github.com/aio-libs/aiohttp/issues/8079).

## Features

-   Added support for passing :py:data:`True` to `ssl` parameter in
    `ClientSession` while deprecating :py:data:`None` -- by
    :user:`xiangyan99`.

    *Related issues and pull requests on GitHub:*
    [#7698](https://github.com/aio-libs/aiohttp/issues/7698).

## Breaking changes

-   Fixed an unhandled exception in the Python HTTP parser on header lines
    starting with a colon -- by :user:`pajod`.

    Invalid request lines with anything but a dot between the HTTP major and
    minor version are now rejected.
    Invalid header field names containing question mark or slash are now
    rejected.
    Such requests are incompatible with :rfc:`9110#section-5.6.2` and are
    not known to be of any legitimate use.

    *Related issues and pull requests on GitHub:*
    [#8074](https://github.com/aio-libs/aiohttp/issues/8074).

## Improved documentation

-   Fixed examples of `fallback_charset_resolver` function in the
    :doc:`client_advanced` document -- by :user:`henry0312`.

    *Related issues and pull requests on GitHub:*
    [#7995](https://github.com/aio-libs/aiohttp/issues/7995).

-   The Sphinx setup was updated to avoid showing the empty
    changelog draft section in the tagged release documentation
    builds on Read The Docs -- by :user:`webknjaz`.

    *Related issues and pull requests on GitHub:*
    [#8067](https://github.com/aio-libs/aiohttp/issues/8067).

## Packaging updates and notes for downstreams

-   The changelog categorization was made clearer. The
    contributors can now mark their fragment files more
    accurately -- by :user:`webknjaz`.

    The new category tags are:

        * ``bugfix``

        * ``feature``

        * ``deprecation``

        * ``breaking`` (previously, ``removal``)

        * ``doc``

        * ``packaging``

        * ``contrib``

        * ``misc``

    *Related issues and pull requests on GitHub:*
    [#8066](https://github.com/aio-libs/aiohttp/issues/8066).

## Contributor-facing changes

-   Updated :ref:`contributing/Tests coverage <aiohttp-contributing>`
    section to show how we use `codecov` -- by :user:`Dreamsorcerer`.

    *Related issues and pull requests on GitHub:*
    [#7916](https://github.com/aio-libs/aiohttp/issues/7916).

-   The changelog categorization was made clearer. The
    contributors can now mark their fragment files more
    accurately -- by :user:`webknjaz`.

    The new category tags are:

        * ``bugfix``

        * ``feature``

        * ``deprecation``

        * ``breaking`` (previously, ``removal``)

        * ``doc``

        * ``packaging``

        * ``contrib``

        * ``misc``

    *Related issues and pull requests on GitHub:*
    [#8066](https://github.com/aio-libs/aiohttp/issues/8066).

## Miscellaneous internal changes

-   Replaced all `tmpdir` fixtures with `tmp_path` in test suite.

    *Related issues and pull requests on GitHub:*
    [#3551](https://github.com/aio-libs/aiohttp/issues/3551).

***

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "" (UTC), Automerge - At any time (no
schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/allenporter/pyrainbird).


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Labels
backport-3.10 Trigger automatic backporting to the 3.10 release branch by Patchback robot bot:chronographer:provided There is a change note present in this PR
3 participants