-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing an empty host #79
Comments
Wow, I thought I accounted for this. "host state" should account for this somehow. Making host null would make the serializer unhappy as far as I can tell. So I guess the best solution would be to just not invoke the host parser for the empty string. I.e., change "Let host be the result of host parsing buffer." to "Let host be the empty string if buffer is the empty string, and the result of host parsing buffer otherwise." |
Host state already says (in 2 places):
Maybe a step could be added between 1 and 2: “If buffer is the empty string, let host be an empty domain” (an url’s host is not a string) and old 2 be made an “Otherwise”. |
Yeah, that sounds pretty good. |
Though I think it’s more robust to leave the host state alone and fix the IPv4 parser instead. |
Why? In case something else invokes the IPv4 parser directly with the empty string? |
Yes, but more generally it’s good for the thing that maintain an invariant (such as "this string is non-empty") to be close to the things that rely on that invariant. The URL parser algorithm is full of things like that: one bit of pseudo-code relies on another bit of pseudo-code that doesn’t seem directly related to have had some non-obvious side effect. It’s not wrong, but it makes it harder for humans to read the algorithm without making mistakes. |
Okay, how about we add "Assert: input is not the empty string." to the IPv4 parser and make the host parser return an empty domain if the input is the empty string. |
(And you're right, fixed my way this would have tripped |
@SimonSapin, any thoughts? |
This would work, but: The IPv4 parser already has three different kind of outcomes: return a string (the input unchanged, used when the input is not in the expected format), return failure (used when numbers are too large) and return an IPv4 address. Why introduce a fourth? (Abort with an assertion error, a concept that this spec didn’t have so far.) Doesn’t returning failure for the empty string have the desired effect? |
Here is another way to fix. Currently this is step 3:
It could be changed to:
After that, step 6.1 "If part is the empty string, return input." applies. |
That sounds pretty good. |
Note that the main problem here is being fixed by #185, but it seems good to also fix this in-depth as discussed. |
If I’m reading the parsing algorithm correctly, parsing
some-non-special-scheme:///
causes the host parser to be called with an empty string. Since domain to ASCII sets VerifyDnsLength to false, asciiDomain is also empty (not failure) and IPv4 parsing is called with an empty string. There, parts is first a list of one empty string, then an empty list:Later, numbers stays empty and some of the steps are nonsensical when they talk about the last item of numbers.
So:
The text was updated successfully, but these errors were encountered: