Reject octal zeros in IPv4 addresses #86984

syvb · 2021-07-08T18:27:49Z

This fixes #86964 by rejecting octal zeros in IP addresses, such that 192.168.00.00000000 is rejected with a parse error, since having leading zeros in front of another zero indicates it is a zero written in octal notation, which is not allowed in the strict mode specified by RFC 6943 3.1.1. Octal rejection was implemented in #83652, but due to the way it was implemented octal zeros were still allowed.

rust-highfive · 2021-07-08T18:27:52Z

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

cuviper · 2021-07-08T21:26:43Z

I understand we're currently a bit "loose" here, but is there any practical difference? It's a numeric 0 either way you look at it. So with your change, we would stop accepting octal zeros just for the sake of strictness?

zacknewman · 2021-07-08T23:34:18Z

The documentation for std::net::Ipv4Addr pretty clearly states that the octets are in "decimal notation"; thus I personally don't even understand why the octet 001 is rejected since that is a perfectly reasonable base-10 representation of the integer 1. While that octet can be interpreted in any numeral system (including base-8), the octet 127 can also be interpreted in any numeral system starting from base-8. To me, this is inconsistent and very much confusing. It's like the parser says "127 can be in base-8, base-9, base-10, etc.; but we will default to interpreting it as base-10"; but for some reason, the parser doesn't apply that same logic to the octet 001.

The documentation also clearly cites IETF RFC 6943, but Section 3.1.1. in that RFC defines the "strict" form—the form that forbids other numeral systems (e.g., base-8)—as 4 octets in decimal notation separated by a "." such that each octet is between one and three digits. The current behavior doesn't conform to any obvious standard.

Apparently other languages default to interpreting any integer with padded 0s as base-8, but that is not true in most areas of Rust except here (e.g., assert_eq!("010".parse::<u32>().unwrap(), 10u32);).

To me if for some reason we want to forbid 0-padded octets, then it should be applied consistently for all integers not all integers except 0.

cuviper · 2021-07-09T00:18:04Z

AIUI, it was defined in POSIX that a leading 0 in an IPv4 address should indicate octal. Rust was always parsing as decimal, but this could lead to bugs, even security issues, if you have system that combines Rust and POSIX ideas of a given input address. You might validate "010" in one mode and then use it in the other with a different interpretation. The Rust parser was changed to reject octal to avoid this confusion altogether.

You could debate whether "0" alone really has a leading zero to make it octal rather than decimal, but it usually doesn't matter. Longer strings of zeros are more plausibly octal, I guess, but it still seems pedantic to reject it.

zacknewman · 2021-07-09T01:09:42Z

Hm, maybe this is yet another area where my math background is biting me in the butt because it's rather foreign to me to treat leading 0s as "more likely" to be base-8 than any other base since all numeral systems are based on the same fundamental algorithm. Seems overly restrictive to assume leading 0s means base-8 (with the exception of 0 apparently) too since clearly something like 008 is not base-8 due to the 8 not being a symbol in the base-8 numeral system; and since the documentation states each octet is in base-10, the problem of "how to interpret each octet" is already answered: base-10. I'm clearly in the minority with this point-of-view though, so I'll drop it.

cuviper · 2021-07-09T02:59:09Z

Octal 0... goes back to C, at least: https://en.cppreference.com/w/c/language/integer_constant

decimal-constant is a non-zero decimal digit (1, 2, 3, 4, 5, 6, 7, 8, 9), followed by zero or more decimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

octal-constant is the digit zero (0) followed by zero or more octal digits (0, 1, 2, 3, 4, 5, 6, 7)

(which does mean a lone 0 is technically octal as well in C, not that it matters)

RalfJung · 2021-07-09T06:56:24Z

The documentation for std::net::Ipv4Addr pretty clearly states that the octets are in "decimal notation"; thus I personally don't even understand why the octet 001 is rejected since that is a perfectly reasonable base-10 representation of the integer 1.

The docs should probably be clarified: the octets are in decimal notation, and leading 0s are disallowed (except, as usual, for a plain 0).

This almost matches current behavior, except for 0000. So with this PR, those docs would be correct. Without this PR, the docs should say: the octets are in decimal notation, and leading 0s are disallowed, except for the number 0 itself which may have an arbitrary amount of leading 0s.

(I don't have an opinion between these two options, but this PR does not update the docs so IMO it is incomplete either way.)

Seems overly restrictive to assume leading 0s means base-8

I'm not disagreeing, but this is what other parts of the ecosystem do. It is much better for Rust to reject 011 in IP addresses than for both Rust and POSIX to accept it and interpret it differently -- would you agree to that? Rust does not exist in a vacuum, we have to take into account the surrounding reality here, and reality often uses a leading 0 to indicate octal notation (and yes this makes no mathematical sense; whoever decided that decades ago didn't ask mathematicians it seems, but without a time machine no amount of good arguments is going to change this). We could probably stretch the limits here and reject 011 but accept 008 (since POSIX could not accept the latter), but that seems even more surprising and weird than just rejecting leading 0s outright.

The docs possibly should say that this is why leading 0s are rejected.

zacknewman · 2021-07-09T15:08:52Z

The docs should probably be clarified: the octets are in decimal notation, and leading 0s are disallowed (except, as usual, for a plain 0).

That is perfectly OK to me. While I would prefer something like 001.001.001.001 to be parseable because that is technically valid base-10 (at least mathematically)—I know my Nintendo Switch requires exactly 3 base-10 digits when typing in the IP—I realize that POSIX compatibility is very important.

I'm not disagreeing, but this is what other parts of the ecosystem do. It is much better for Rust to reject 011 in IP addresses than for both Rust and POSIX to accept it and interpret it differently -- would you agree to that? Rust does not exist in a vacuum, we have to take into account the surrounding reality here, and reality often uses a leading 0 to indicate octal notation (and yes this makes no mathematical sense; whoever decided that decades ago didn't ask mathematicians it seems, but without a time machine no amount of good arguments is going to change this). We could probably stretch the limits here and reject 011 but accept 008 (since POSIX could not accept the latter), but that seems even more surprising and weird than just rejecting leading 0s outright.

I absolutely agree. I was not aware POSIX stated that leading 0s meant base-8 until @cuviper pointed that out (i.e., my "complaint" would have been purely about the documentation and nothing else had I known this). I admit that I am probably in the minority when it comes to my background as a Rustacean. I've never written a line of assembly, C, C++, or kernel code in my life—something I'm not proud of, but c'est la vie—so it very well could be the case that "decimal notation" obviously means "mathematical base-10 with the added restriction that leading 0s are not allowed with the exception of 0 itself which is allowed to have any non-negative integer amount of leading 0s" to the majority of other Rustaceans.

Also want to add that what's considered "pedantry" to one may not be "pedantry" to another. While my 30+ years on this planet has shown me that my brain is wired very differently than most, I don't think this is an example of my brain thinking that differently. Case in point, I have a .CSV file of IPv4 addresses. Each row contains exactly 16 UTF-8 code units—with the last Unicode scalar value being a newline (i.e., U+000A). Based on the documentation, I wrote code that used this parsing algorithm; but of course, I was dismayed when I saw it fail. Ideally when one reads the documentation, they should have a good sense of the invariants and what's supposed to work and not. Obviously it didn't take long for me to change my implementation, but it would have been nice if I knew something like 001.001.001.001 would lead to a parsing failure a priori. Honestly even if two examples were added to the documentation that highlighted something like 001.001.001.001 leading to a parsing failure and 0000000.0.0000.00 leading to a parsing success would be enough for someone like me to deduce what "decimal" notation means in this context.

RalfJung · 2021-07-09T16:12:43Z

so it very well could be the case that "decimal notation" obviously means "mathematical base-10 with the added restriction that leading 0s are not allowed with the exception of 0 itself which is allowed to have any non-negative integer amount of leading 0s" to the majority of other Rustaceans.

I don't think it does. :) Leading 0s being code for "base 8" is a weird C thing that Rust did not do for its own integer literals, but that doesn't help us here. And either way the docs should be understandable for people with lots of different backgrounds, so if it is confusing for some, we should fix that! It's near impossible for one person (whoever writes the docs) to foresee all the ways it can be misleading, which is why documentation feedback and bugreports are extremely valuable. :)

So, we are in full agreement that the docs should document this more clearly. Usually, lading 0s are allowed, as you pointed out in your bugreport.

RalfJung · 2021-07-09T16:19:42Z

While I would prefer something like 001.001.001.001 to be parseable because that is technically valid base-10 (at least mathematically)—I know my Nintendo Switch requires exactly 3 base-10 digits when typing in the IP—I realize that POSIX compatibility is very important.

What I could imagine being useful (but I am not a libs person) is a parsing function that explicitly is not POSIX compatible, and that does interpret 010.010.010.010 entirely in decimal. But that should not be the default parsing function. I am not sure if the Rust stdlib is the right place for this, but it might be worth suggesting.

I am honestly quite shocked that POSIX would interpret this as 8.8.8.8, and I did not know that this is the case... but my ping agrees:

$ ping 010.010.010.010
PING 010.010.010.010 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=11.5 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=10.4 ms

The way octal literals are written in IP addresses differs from the way they are written in Rust code, so the way that octal/hex literals in IPs are written is explictly mentioned.

syvb · 2021-07-09T16:35:57Z

I've updated the documentation for Ipv4Addr to be more clear as what octal/hex literals are, and added some examples of which ones are rejected.

Now that there can't be a bunch of leading zeros, parsing can be optimized a bit.

zacknewman · 2021-07-09T17:35:03Z

Now this probably is a pedantic request—thus something I am not that passionate about and OK with not doing—but an example like the parsing of 0000.0.0.0 leading to a successful result despite having leading 0s is informative as that result does not obviously follow from the documentation (also has the property that more than 3 digits are allowed despite going against the "strict" form mentioned in RFC 6943 Section 3.1.1.).

Once I know how to do "pull requests", I won't ask others to do such a thing.

library/std/src/net/parser.rs

marmeladema · 2021-07-11T06:57:25Z

Does this require libs team signoff or an fcp or something? This is another breaking change after the one that rejected the octal notation. I've encountered the regression at work and will probably encounter this one too when its available in stable.

That being said, I think the PR does make things more consistent and coherent.

Mark-Simulacrum · 2021-07-11T14:57:25Z

r? @joshtriplett for libs-api(?) examination of this issue. I think it probably merits an FCP, at least...

library/std/src/net/parser.rs

Co-authored-by: Cheng XU <3105373+xu-cheng@users.noreply.github.com>

joshtriplett · 2021-10-04T03:22:27Z

I think we should go ahead with this change.

@rfcbot merge

rfcbot · 2021-10-04T03:22:28Z

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

BurntSushi · 2021-10-05T12:10:05Z

I was personally somewhat on the side of just interpreting octets as decimal, even with a leading zero, POSIX be damned. But one thing that convinced me otherwise was the fact that, well, those leading zeroes are indeed interpreted as indicating octal base in many other places. So if our routine accepts the same IP but gives a different result than others, that could indeed be quite surprising. So I think it's the right decision to fail here. (Or alternatively, just implement octal support.)

(I am not aware of the history of rejecting octal in this routine, so I may just be re-treading old ground, but wanted to mention it since there was some discussion above.)

rfcbot · 2021-10-05T15:59:14Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2021-10-15T16:03:53Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

m-ou-se · 2021-10-20T18:37:58Z

@bors r+

bors · 2021-10-20T18:38:00Z

📌 Commit 403d269 has been approved by m-ou-se

Reject octal zeros in IPv4 addresses This fixes rust-lang#86964 by rejecting octal zeros in IP addresses, such that `192.168.00.00000000` is rejected with a parse error, since having leading zeros in front of another zero indicates it is a zero written in octal notation, which is not allowed in the strict mode specified by RFC 6943 3.1.1. Octal rejection was implemented in rust-lang#83652, but due to the way it was implemented octal zeros were still allowed.

Rollup of 14 pull requests Successful merges: - rust-lang#86984 (Reject octal zeros in IPv4 addresses) - rust-lang#87440 (Remove unnecessary condition in Barrier::wait()) - rust-lang#88644 (`AbstractConst` private fields) - rust-lang#89292 (Stabilize CString::from_vec_with_nul[_unchecked]) - rust-lang#90010 (Avoid overflow in `VecDeque::with_capacity_in()`.) - rust-lang#90029 (Add test for debug logging during incremental compilation) - rust-lang#90031 (config: add the option to enable LLVM tests) - rust-lang#90048 (Add test for line-number setting) - rust-lang#90071 (Remove hir::map::blocks and use FnKind instead) - rust-lang#90074 (2229 migrations small cleanup) - rust-lang#90077 (Make `From` impls of NonZero integer const.) - rust-lang#90097 (Add test for duplicated sidebar entries for reexported macro) - rust-lang#90098 (Add test to ensure that the missing_doc_code_examples is not triggered on foreign trait implementations) - rust-lang#90099 (Fix MIRI UB in `Vec::swap_remove`) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup

Disallow octal zeros in IPv4 addresses

734bfde

rust-highfive assigned Mark-Simulacrum Jul 8, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 8, 2021

zacknewman mentioned this pull request Jul 9, 2021

std::net::Ipv4Addr parsing violates the "strict" form in IETF RFC 6943 Section 3.1.1. #86964

Closed

Clarify docs on what IPv4 octal addresses are

69de693

The way octal literals are written in IP addresses differs from the way they are written in Rust code, so the way that octal/hex literals in IPs are written is explictly mentioned.

Reject too-long IPs quicker

b9b97bb

Now that there can't be a bunch of leading zeros, parsing can be optimized a bit.

xu-cheng reviewed Jul 10, 2021

View reviewed changes

library/std/src/net/parser.rs Outdated Show resolved Hide resolved

rust-highfive assigned joshtriplett and unassigned Mark-Simulacrum Jul 11, 2021

syvb added 2 commits July 11, 2021 11:33

Simplify leading zero checks

a331e5f

Add example with a bunch of leading zeos

ace518d

camelid added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 29, 2021

xu-cheng reviewed Aug 9, 2021

View reviewed changes

library/std/src/net/parser.rs Outdated Show resolved Hide resolved

Specify maximum IP address length

403d269

Co-authored-by: Cheng XU <3105373+xu-cheng@users.noreply.github.com>

camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 29, 2021

joshtriplett added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 4, 2021

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Oct 4, 2021

joshtriplett added S-waiting-on-fcp Status: PR is in FCP and is awaiting for FCP to complete. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 4, 2021

rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Oct 5, 2021

rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Oct 5, 2021

bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Oct 20, 2021

GuillaumeGomez mentioned this pull request Oct 20, 2021

Rollup of 9 pull requests #90109

Closed

JohnTitor mentioned this pull request Oct 21, 2021

Rollup of 15 pull requests #90118

Closed

JohnTitor mentioned this pull request Oct 21, 2021

Rollup of 14 pull requests #90119

Merged

bors merged commit 09de34c into rust-lang:master Oct 21, 2021

rustbot added this to the 1.58.0 milestone Oct 21, 2021

apiraino removed the to-announce Announce this issue on triage meeting label Oct 21, 2021

GREsau mentioned this pull request Sep 14, 2024

"ipv4" format validator rejects 0.x.y.z IPs Stranger6667/jsonschema#512

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reject octal zeros in IPv4 addresses #86984

Reject octal zeros in IPv4 addresses #86984

syvb commented Jul 8, 2021

rust-highfive commented Jul 8, 2021

cuviper commented Jul 8, 2021

zacknewman commented Jul 8, 2021 •

edited

Loading

cuviper commented Jul 9, 2021

zacknewman commented Jul 9, 2021 •

edited

Loading

cuviper commented Jul 9, 2021

RalfJung commented Jul 9, 2021 •

edited

Loading

zacknewman commented Jul 9, 2021 •

edited

Loading

RalfJung commented Jul 9, 2021 •

edited

Loading

RalfJung commented Jul 9, 2021

syvb commented Jul 9, 2021 •

edited

Loading

zacknewman commented Jul 9, 2021 •

edited

Loading

marmeladema commented Jul 11, 2021

Mark-Simulacrum commented Jul 11, 2021

joshtriplett commented Oct 4, 2021

rfcbot commented Oct 4, 2021 •

edited by m-ou-se

Loading

BurntSushi commented Oct 5, 2021 •

edited

Loading

rfcbot commented Oct 5, 2021

rfcbot commented Oct 15, 2021

m-ou-se commented Oct 20, 2021

bors commented Oct 20, 2021

Reject octal zeros in IPv4 addresses #86984

Reject octal zeros in IPv4 addresses #86984

Conversation

syvb commented Jul 8, 2021

rust-highfive commented Jul 8, 2021

cuviper commented Jul 8, 2021

zacknewman commented Jul 8, 2021 • edited Loading

cuviper commented Jul 9, 2021

zacknewman commented Jul 9, 2021 • edited Loading

cuviper commented Jul 9, 2021

RalfJung commented Jul 9, 2021 • edited Loading

zacknewman commented Jul 9, 2021 • edited Loading

RalfJung commented Jul 9, 2021 • edited Loading

RalfJung commented Jul 9, 2021

syvb commented Jul 9, 2021 • edited Loading

zacknewman commented Jul 9, 2021 • edited Loading

marmeladema commented Jul 11, 2021

Mark-Simulacrum commented Jul 11, 2021

joshtriplett commented Oct 4, 2021

rfcbot commented Oct 4, 2021 • edited by m-ou-se Loading

BurntSushi commented Oct 5, 2021 • edited Loading

rfcbot commented Oct 5, 2021

rfcbot commented Oct 15, 2021

m-ou-se commented Oct 20, 2021

bors commented Oct 20, 2021

zacknewman commented Jul 8, 2021 •

edited

Loading

zacknewman commented Jul 9, 2021 •

edited

Loading

RalfJung commented Jul 9, 2021 •

edited

Loading

zacknewman commented Jul 9, 2021 •

edited

Loading

RalfJung commented Jul 9, 2021 •

edited

Loading

syvb commented Jul 9, 2021 •

edited

Loading

zacknewman commented Jul 9, 2021 •

edited

Loading

rfcbot commented Oct 4, 2021 •

edited by m-ou-se

Loading

BurntSushi commented Oct 5, 2021 •

edited

Loading