Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with on_body callback with bare LF #241

Closed
Dreamsorcerer opened this issue Aug 21, 2023 · 17 comments · Fixed by #243
Closed

Problem with on_body callback with bare LF #241

Dreamsorcerer opened this issue Aug 21, 2023 · 17 comments · Fixed by #243

Comments

@Dreamsorcerer
Copy link
Contributor

Dreamsorcerer commented Aug 21, 2023

I'm testing the fix for #236, with a payload based on a user report:

HTTP/1.0 200 OK\nFoo: abc\nBar: def\n\nBODY\n

If I add CR before the body, then it works fine:

HTTP/1.0 200 OK\nFoo: abc\nBar: def\r\n\r\nBODY\n

But, with the original, it appears that the on_headers_complete callback is not happening. The on_body callback happens without a call to on_headers_complete.

This seems like something is broken as we are not getting the headers.
But, also, should we always get on_headers_complete before on_body happens? Currently our code fails with an exception, as it assumes that on_body will only be called after on_headers_complete is called.

@Dreamsorcerer
Copy link
Contributor Author

It appears we have some users who rely on this behaviour to interface with switch hardware. Given that they likely can't rewrite the firmware of the switch, they'll only be able to use this when this issue is fixed and we can parse responses that do not use CRLF at all.

@ShogunPanda
Copy link
Contributor

I see. I'll take a look between today and tomorrow.

@Dreamsorcerer
Copy link
Contributor Author

We're hoping to do a new release next week, is there any chance this might be resolved and released in that timeframe?

@ShogunPanda
Copy link
Contributor

Sorry, I got busy with other things. I'll try to look into it on Mon or Tue next week.

@ShogunPanda
Copy link
Contributor

@Dreamsorcerer As promised, this is now fixed in #243 (a new flag is needed to enable the fix).
This will be included in llhttp 9.1.0.

@Dreamsorcerer
Copy link
Contributor Author

Perfect, if you have time for another small addition, another user has just reported they have a device that responds with whitespace in the chunk sizes (e.g. 37 \r\n): aio-libs/aiohttp#7597 (comment)

I couldn't find any lenient options that allow that to be permitted (I've asked them to file a new issue here if they want it to work).

@ShogunPanda
Copy link
Contributor

I'm sorry but I don't think I will implement that. We can't support any possible uncompliant case and this seems really peculiar.

Not sure how you embed llhttp but I'd rather try to intercept the data before feeding it into llhttp.

@Dreamsorcerer
Copy link
Contributor Author

Well, I'll let you know if any other users report the same problem. They did point out that this is permitted by atleast firefox, python-requests, curl, perls lwp-request, chrome, and previous versions of aiohttp/llhttp.

@ShogunPanda
Copy link
Contributor

@mcollina @RafaelGSS @nodejs/tsc Any opinion on this? Shall we support this specific case?

@Dreamsorcerer
Copy link
Contributor Author

@ShogunPanda Just tested 9.1.1 and I seem to still get no on_headers_complete.

Enabled options are:
llhttp_set_lenient_headers
llhttp_set_lenient_optional_cr_before_lf
llhttp_set_lenient_spaces_after_chunk_size

@ShogunPanda
Copy link
Contributor

Can you post the entire message you are trying to parse?

@Dreamsorcerer
Copy link
Contributor Author

The original post contains the entire message. I'm just using that for a unit test, which is still failing:

def test_http_response_parser_bad_crlf(response: Any) -> None:
    messages, upgrade, tail = response.feed_data(
        b"HTTP/1.0 200 OK\nFoo: abc\nBar: def\n\nBODY\n"
    )
    msg = messages[0][0]

    assert msg.headers["Foo"] == "abc"
    assert msg.headers["Bar"] == "def"

@ShogunPanda
Copy link
Contributor

Sorry, you are right. Despite the message being parsed, the callback is ignored. Working on it as we speak.

@Dreamsorcerer
Copy link
Contributor Author

The chunked whitespace is passing now, so that's good atleast.

@ShogunPanda
Copy link
Contributor

This should have been fixed in 9.1.2. Can you please confirm?

@Dreamsorcerer
Copy link
Contributor Author

Dreamsorcerer commented Sep 13, 2023

Test is passing now, thanks.

Can we expect that on_body will never be called without a call to on_headers_complete first, now? I realised when we testing this, that our code assumes that can't happen, and hits an exception if it does.

@ShogunPanda
Copy link
Contributor

In theory yes.
In practice if it happens then we have a bug :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants