Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General floating point formatting in Debug with {:g?} #2729

Closed
wants to merge 4 commits into from

Conversation

ExpHP
Copy link

@ExpHP ExpHP commented Jul 21, 2019

Rendered

Proposes extending {:?} with a general number floating point flag g, similar to {:x?}:

assert_eq!(
    format!("{:g?}", vec![1e2, 1e4, 1e6, 1e8]),
    "[100.0, 10000.0, 1e6, 1e8]",
);

@ExpHP ExpHP changed the title General floating point formatting in Debug with {:g?}: General floating point formatting in 'Debug' with '{:g?}': Jul 21, 2019
@ExpHP ExpHP changed the title General floating point formatting in 'Debug' with '{:g?}': General floating point formatting in Debug with {:g?}: Jul 21, 2019
@ExpHP ExpHP changed the title General floating point formatting in Debug with {:g?}: General floating point formatting in Debug with {:g?} Jul 21, 2019
@Centril Centril added A-debugging Debugging related proposals & ideas A-primitive Primitive types related proposals & ideas T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Jul 21, 2019
@ExpHP
Copy link
Author

ExpHP commented Jul 31, 2019

Note: If people were on board with the much simpler alternative of simply changing the output of {:?} (which I honestly regard as broken), I'd be 100% behind that as well. I mean, strictly speaking, users aren't supposed to be relying on the output of Debug...

(though one major question is how to deal with the precision flag, since changing it to count sig figs would change the meaning of every line of code that uses Debug with a precision; perhaps adding a precision always switches to fixed-point)

@ghost
Copy link

ghost commented Aug 2, 2019

Just braimstorming. As the format invocation and the format string parsing is generated by the compiler, what if instead of providing this single char (radix?) formating, let the user provide some generic trait impl for this functionality:

To format numbers as hex.
'''
format!("{:?Hex}", 12u32);
impl Formatter for Hex {
... //Something simmilar to Debug::fmt
}
'''
If there no "specialization" for a type, it emits error or use the default formatter. Also the question remains what about structures. If the formatter applies to the structure or to the members (primitive types like integers) only.

I don't know if it is possible to implement. It is just an idea that makes formting costomizable by users.

@ExpHP
Copy link
Author

ExpHP commented Aug 3, 2019

Issue where the current behavior of Display actually creates confusion, because it implies that digits are zero when they are, in fact, not:

rust-lang/rust#63171

@hanna-kruppe
Copy link

hanna-kruppe commented Aug 3, 2019

@ExpHP

Note: If people were on board with the much simpler alternative of simply changing the output of {:?} (which I honestly regard as broken), I'd be 100% behind that as well. I mean, strictly speaking, users aren't supposed to be relying on the output of Debug...

This was attempted way back in rust-lang/rust#24612 but unfortunately this format change broke (and likely still breaks) real code, see rust-lang/rust#24612 (comment)

@ExpHP
Copy link
Author

ExpHP commented Aug 3, 2019

Notice that the snippet in rust-lang/rust#24612 (comment) uses Display rather than Debug. My hope is that a change to Debug should not break any legitimate code (and a lot of the "illegitimate" code can be fixed with a crater run).

@hanna-kruppe
Copy link

Notice that the snippet in rust-lang/rust#24612 (comment) uses Display rather than Debug.

This is true but I don't think it's particularly relevant. The two snippets there are examples, not exhaustive: the specific library they're from has become largely irrelevant in the meantime, but the pattern (ensuring .0 is present and doing it in a way that would break if scientific notation is emitted) seems timeless and just as likely to occur with Debug as with Display.

My hope is that a change to Debug should not break any legitimate code (and a lot of the "illegitimate" code can be fixed with a crater run).

I do not think it is useful to try to draw a distinction between "legitimate" and "illegitimate" here.

@gnzlbg
Copy link
Contributor

gnzlbg commented Oct 15, 2019

This is true but I don't think it's particularly relevant. The two snippets there are examples, not exhaustive: the specific library they're from has become largely irrelevant in the meantime, but the pattern (ensuring .0 is present and doing it in a way that would break if scientific notation is emitted) seems timeless and just as likely to occur with Debug as with Display.

Can't such code just switch from Debug to Display instead? It looks to me that this is what this code should be doing anyways if it wants to rely on the format of the output.

@KodrAus KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Jul 29, 2020
@Andlon
Copy link

Andlon commented Jan 12, 2021

I was about to post on the internal threads, but I finally discovered this issue. Thanks for the detailed write-up @ExpHP!

As someone working on numerical code every day, I am encountering this issue all the time. I consider the current Debug implementation for f64 to be fundamentally broken, and I think we should absolutely fix it. The documentation even clearly states that it is not stable, and that you should not rely on it. I think with such a clear warning we simply cannot make accommodations for people who decide to rely on it in their code. Yes, it might break real world code, but only code that is arguably incorrect in the first place. But I do not think that should stop us from fixing something that is entirely broken.

Here's an example from my work today, using proptest to automatically generate some floating point input for tests (I added some manual line breaks so that it doesn't all end up in one line on GitHub):

; minimal failing input: (a, b) = (MockDenseMatrix { data: [26865945789578758000000000000000000000000000000000000000000000000000000000000000000000000.0, 0.00041121048146133344,
0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003088902966225219,
0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001755300981153612, 
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000008487974564642952,
0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002205322390190589],
rows: 2, cols: 3 },
MockDenseMatrix { data: [1958839630278024200000000000000000000000000000000000000000000000000000000000000000000000000000000.0, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006436185012755095, 
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000017683643727271159, 
0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006716059524056942, 
2625148506222154000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0,
 84310111535078340000000000000000000000000000000000000000000000000000000000000000.0],
rows: 2, cols: 3 })

I think it would be nice to get the proposed specifier, but can we no matter what please fix the Debug output?

I would also like to note that having a new format specifier also does not help me, because in many cases (such as the proptest case above), output is only produced as part of some generic machinery with T: Debug. In these cases, the machinery does not know anything about what it's outputting, so only the default specifier is reasonable to use.

@Andlon
Copy link

Andlon commented Jan 12, 2021

For completeness I'll also leave here parts of what I intended to post on the internals thread, before I discovered this RFC. I don't in any way intend any of this to supercede what @ExpHP presented - which is in any case much more thorough and well-researched, but just to serve as some (hopefully helpful) additional feedback.

Let me try to make some hopefully uncontroversial observations:

  • 0.1 is preferable over 1e-1.
  • 1.5 is preferable over 1.5e0.
  • 14.3 is preferable over 1.43e1.
  • 3.4e-8 is preferable over 0.000000034.
  • 1.23e8 is preferable over 123000000.
  • Bigger numbers tend to be easier to parse than smaller numbers. For example: 100000 appears to be easier to parse for a human than 0.00001, even though the former is 1e5 and the latter is 1e-5.

Based on these observations, a simple and effective alternative is to decide whether to use decimal or scientific notation would be to define a certain range for floating-point numbers x, L <= x < U, where L is the lower bound and U the upper bound. If x resides in this interval, then we use decimal format, otherwise we use scientific notation. If this sounds like a reasonable approach, then choosing L and U is where the potential bikeshedding begins. Let's try to keep in mind that we don't need a "perfect" solution here - since this anyway is very subjective and maybe also culturally dependent. I think almost any value of L and U would be an improvement over the current situation, however.

In any case, we need to get the ball rolling, and based on my observations above, I'd say that up to 5 digits before the decimal seems reasonable to parse, and maybe only 3 after the decimal. This would make L = 1e-3 and U = 1e6. To exemplify:

  • 9.99999e-4 (etc.) is the last number outside the range on the lower end. 0.001, 0.002 and so on follows.
  • In the middle we have 1, 10, 100, 1000, 10000, 100000, 999999.999999
  • Starting in the millions we have 1e6, 1.1e6 and so on.

Note that this is actually not quite my personal preference: as a scientist I'd personally prefer U to be lower and L to be higher, but I recognize this probably does not apply to most people.

@ExpHP
Copy link
Author

ExpHP commented Jan 16, 2021

I would also like to note that having a new format specifier also does not help me, because in many cases (such as the proptest case above), output is only produced as part of some generic machinery with T: Debug. In these cases, the machinery does not know anything about what it's outputting, so only the default specifier is reasonable to use.

Yes! This is precisely the kind of stuff that drives me to the solution of replacing Debug!

The documentation even clearly states that it is not stable, and that you should not rely on it. I think with such a clear warning we simply cannot make accommodations for people who decide to rely on it in their code. Yes, it might break real world code, but only code that is arguably incorrect in the first place. But I do not think that should stop us from fixing something that is entirely broken.

So, here, I'm actually going to play a bit of devil's advocate. Because of course, I agree with you; but as much as I like to argue that "people shouldn't be relying on Debug output," in retrospect it's not entirely fair to conclude from this that "therefore, there's no problem." I suppose at this point it is important for us to at least try and understand the reasons why people may be relying on it anyways; and who knows, perhaps discussing this could help us find a way to finally move forward?

I can think of at least a couple of reasons somebody might depend on the output of {:?}:

  • I myself sometimes use {:?} for floats in meaningful output, specifically in places where I require a trailing .0 on integers. Oftentimes I do this in code that generates code, where I just need it to produce a valid floating-point literal in some language at roundtrip precision, in projects that are too small and short-lived for me to care to add a dependency like dtoa. Of course, this is really what {:#} ought to do...
  • The insta crate lets you create tests that take snapshots of some string output, similar to rustc's own tests for warnings and errors. Whenever the output changes, the tests fail and let you review all of the changes to make sure that the new outputs fit with expectation. Notably, the crate provides a assert_debug_snapshot! macro which compares {:#?} outputs; I can imagine that many people may find this attractive as it very easily lets one use insta on arbitrary structs.

My gut feeling is that most breakage is probably unit test breakage like the second bullet, but I don't know for sure.

It might also be worth looking back at the fallout of the change that added .0 to see what other kinds of breakage occurred...

@Andlon
Copy link

Andlon commented Jan 17, 2021

So, here, I'm actually going to play a bit of devil's advocate. Because of course, I agree with you; but as much as I like to argue that "people shouldn't be relying on Debug output," in retrospect it's not entirely fair to conclude from this that "therefore, there's no problem." I suppose at this point it is important for us to at least try and understand the reasons why people may be relying on it anyways; and who knows, perhaps discussing this could help us find a way to finally move forward?

I completely agree - it's not a change that should be made lightly, and it's important to understand current use cases, but I'd also be wary of being overly cautious - in this case we might never see this change implemented, which is ultimately to the detriment of developers.

  • I myself sometimes use {:?} for floats in meaningful output, specifically in places where I require a trailing .0 on integers. Oftentimes I do this in code that generates code, where I just need it to produce a valid floating-point literal in some language at roundtrip precision, in projects that are too small and short-lived for me to care to add a dependency like dtoa. Of course, this is really what {:#} ought to do...

I can understand this sort of code for quick and dirty projects - I could see myself doing this for unimportant or very short-lived code. However, for any serious project, this is exactly the kind of "incorrect" use case I had in mind earlier - if you're actively going against the warnings of the documentation, then you'll have to be able to accept the consequences if the behavior changes in the future.

  • The insta crate lets you create tests that take snapshots of some string output, similar to rustc's own tests for warnings and errors. Whenever the output changes, the tests fail and let you review all of the changes to make sure that the new outputs fit with expectation. Notably, the crate provides a assert_debug_snapshot! macro which compares {:#?} outputs; I can imagine that many people may find this attractive as it very easily lets one use insta on arbitrary structs.

Yes, test code like this will break, that's true. I think, however, that must be considered acceptable breakage.

My gut feeling is that most breakage is probably unit test breakage like the second bullet, but I don't know for sure.

I hope so!

Do you have an idea for an actionable plan for finding out about current use cases for the Debug output, so that we can make progress on the issue of whether improving the default Debug output is justifiable?

@ExpHP
Copy link
Author

ExpHP commented Jan 19, 2021

Hm. Well, while it only finds a relatively small portion of all code that will break, by far the most powerful tool at our disposal for discovering such possible problems is crater. In particular:

  1. We can review the results of the past crater run, when .0 was added.
  2. We can do a new crater run for changing {:?} to go exponential.

For the second bullet, from what I recall, it is not uncommon for experimental PRs to be filed against the Rust repo without the intent to merge, in order to have crater runs performed on them. I don't know if there's any more to the process than this. (Are eRFCs even still a thing? 🤷)


In the meanwhile, I did a bit of looking into number 1 and here's what I found. Here's some notable links:

I would expect to find useful things in the linkbacks to these pages. The PR has very few linkbacks, but the crater issue has a couple, most notably some of the changes people made to fix the errors. These are all of the ones I noticed:

Unsurprisingly, all of these changes appear to simply be fixing failing tests, as evidenced by the fact that they all modify test code without modifying non-test code.

However, I also notice something that I didn't consider before: I expected that these would all simply change the expected output string. In actuality, four out of five of these also changed the test, in order to make it produce the same output in both old and new versions of rust (e.g. using 1.5 instead of 1.0). I imagine this is because these these crates either have MSRVs, or they at least probably test both stable and nightly rust in CI testing. Nonetheless, it does tell us that people will likely need some way to write tests in a way that produces identical output for both old and new rust.

For the one crate that only changed expected outputs instead of tests (turtle), it's author also left this comment on the issue:

[ ... ]

In this case, for should_panic, we only get to pass in a naive string, so of course changes like this will break a test like that. We don't even get to set formatting options so any of the workarounds suggested will not work. I'm including the debug representation in what I am testing because I want to make sure that its values match what I expect. Changing is a big deal and it broke my code in a way I really couldn't have prevented.

[ ... ]

...and I feel like this theme of "prevention" is similar to what I remarked above.

@Andlon
Copy link

Andlon commented Jan 19, 2021

Thanks for digging through this @ExpHP! That's some very useful insight. It strikes me that most people who rely on the current floating-point Debug implementation probably do so for numbers that would remain unaffected with the proposed changes. For example, if we assume that most usage is in tests, then unit tests are probably likely to use "human-sized" floating point numbers (e.g. 1.5, 1.0 or similar), not tiny or very large numbers.

This is for example the case with the turtle changes. Of course - that was for a different issue, so it is not representative. It would be interesting to attempt to test this hypothesis with a crater run...

@yaahc
Copy link
Member

yaahc commented Jun 2, 2021

Hi @ExpHP, we discussed this RFC in today's libs team meeting and we agreed with the problem and solutions you've described. In particular we were most interested in the second alternative proposal you suggested in the RFC and in this comment:

Note: If people were on board with the much simpler alternative of simply changing the output of {:?} (which I honestly regard as broken), I'd be 100% behind that as well. I mean, strictly speaking, users aren't supposed to be relying on the output of Debug...

We're much more apprehensive about adding a new formatting flags than we are about changing the Debug format, given that our Debug stability policy explicitly reserves the right to change Debug output. As you noted we have already broken debug output before, and the process of resolving breakages is exactly as you described in #2729 (comment). We will run crater to find crates whose tests are broken by depending on Debug formats (assuming we only break tests) and work with them to get their tests updated.

As such we'd like to see an implementation of the proposed general float format as part of the default output of {:?}. There's no need for this to continue to be managed as an RFC, so please don't worry about updating the RFC to focus on the alternative proposal (unless you really want to :D). The next steps will be to implement the proposed change in a PR which we will approve via FCP directly.

Also, our discussion did raise a few concerns that need to be resolved before we can merge the aforementioned PR:

  • investigate breakage (aka run crater)
  • Make sure no precision is lost. Should round-trip with parse().

Thanks again for the proposal and well written RFC!

@yaahc yaahc removed the I-nominated label Jun 2, 2021
I will be referring to certain parts of this RFC from a rust-lang/rust PR (in particular the research on other languages) and wanted to make sure these details were present
@ExpHP
Copy link
Author

ExpHP commented Jun 20, 2021

I am now working on a PR to change Debug, but for the moment I'm going to post a bit about a consideration that is different for {:?} than it was for {:g?}. This comment is intended to help bridge between some of the arguments made in the RFC, and some of the decisions I'm making in the PR (and I'll be linking to it from there).


Choice of Thresholds for {:?}

In the RFC, I chose 1e-4 and 1e6 as the thresholds for {:g?}, based on the very widespread prevalence of these thresholds for %g in other languages. However, {:g?} was something that you opt into. 1e6 seems pretty aggressive for {:?}, and from the survey in the RFC it seems clear that most other languages with Debug analogues agree.

Initially in my PR I am choosing 1e-4 and 1e16. These happen to coincide with the thresholds of nim.

Why such an asymmetric interval? We want a large upper threshold to avoid being overzealous, and numbers up to at least 2**52 for f64 make effective use of all of the digits in their non-exponential representation. In contrast, numbers less than 1 always begin with a leading string of zeros that serves no purpose other than to indicate their magnitude.

Why 1e16? Really, any large limit would do, but this is a "convenient" threshold because, for f64, it dodges cases like format!(2e16 + 8.0) == 20000000000000010.0" where dummy zeroes left of the decimal point on integers look absurd due to the additional precision suggested by the .0.

Important: this is only a bonus feature of this limit. f32 will still have examples of the dummy-zero phenomenon, and I don't think this alone is enough to justify assigning a different threshold to f32. (That said, I did look at nim's float32, and while it uses the same thresholds, it doesn't print dummy zeros for exact integers (contrast with rust). Make of that what you will...)

@ExpHP
Copy link
Author

ExpHP commented Oct 19, 2021

Status: rust-lang/rust#86479 has made it through FCP. That PR makes automatic exponential the default output of Debug (suppressed when a precision is provided), which adequately addresses the motivations of this RFC.

Thus, I'll be closing this for now as it seems obsolete.

@ExpHP ExpHP closed this Oct 19, 2021
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Oct 19, 2021
…, r=yaahc

Automatic exponential formatting in Debug

Context: See [this comment from the libs team](rust-lang/rfcs#2729 (comment))

---

Makes `"{:?}"` switch to exponential for floats based on magnitude. The libs team suggested exploring this idea in the discussion thread for RFC rust-lang/rfcs#2729. (**note:** this is **not** an implementation of the RFC; it is an implementation of one of the alternatives)

Thresholds chosen were 1e-4 and 1e16.  Justification described [here](rust-lang/rfcs#2729 (comment)).

**This will require a crater run.**

---

As mentioned in the commit message of 8731d4d, this behavior will not apply when a precision is supplied, because I wanted to preserve the following existing and useful behavior of `{:.PREC?}` (which recursively applies `{:.PREC}` to floats in a struct):

```rust
assert_eq!(
    format!("{:.2?}", [100.0, 0.000004]),
    "[100.00, 0.00]",
)
```

I looked around and am not sure where there are any tests that actually use this in the test suite, though?

All things considered, I'm surprised that this change did not seem to break even a single existing test in `x.py test --stage 2`.  (even when I tried a smaller threshold of 1e6)
@Pat-Lafon Pat-Lafon mentioned this pull request Jun 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-debugging Debugging related proposals & ideas A-primitive Primitive types related proposals & ideas Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants