Default floating-point formatting does not produce shortest outputs; mismatch with std::format #3649

Closed
jk-jeon opened this issue Sep 18, 2023 · 13 comments

@jk-jeon
Contributor

jk-jeon commented Sep 18, 2023

As far as I understand, the default formatting option should produce the shortest output, not just in the number of significand digits, but also in the number of actual characters. At least that seems to be how std::format is specified, according to the std::to_chars specifications.

However, it seems currently fmt picks the fixed-point format whenever the exponent is between -4 and 16, regardless of the number of characters it will produce:

const int exp_lower = -4, exp_upper = 16;

Is this an intended divergence? Or maybe I misunderstood how std::format is specified?

For what it's worth, it seems MS STL implementation of std::format does what I described.
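
To make the difference concrete, here is a minimal sketch. It is not {fmt}'s actual implementation: the helper names `fixed_repr`, `sci_repr` and `threshold_picks_fixed` are made up for illustration, and the two renderers only handle values like 1e15 whose shortest digit string is a single digit.

```cpp
#include <cstdio>
#include <string>

// Fixed-point rendering with no fractional digits, e.g. "1000000000000000".
std::string fixed_repr(double v) {
  char buf[64];
  int n = std::snprintf(buf, sizeof buf, "%.0f", v);
  return std::string(buf, static_cast<std::size_t>(n));
}

// Scientific rendering with one significant digit, e.g. "1e+15".
std::string sci_repr(double v) {
  char buf[64];
  int n = std::snprintf(buf, sizeof buf, "%.0e", v);
  return std::string(buf, static_cast<std::size_t>(n));
}

// The threshold rule quoted above: fixed notation is chosen purely from the
// decimal exponent, regardless of how many characters it produces.
bool threshold_picks_fixed(int dec_exp) {
  const int exp_lower = -4, exp_upper = 16;
  return dec_exp >= exp_lower && dec_exp < exp_upper;
}
```

For 1e15 the decimal exponent is 15, so the threshold rule picks the 16-character fixed form, while a rule based on character count would prefer the 5-character scientific form.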

@vitaut
Contributor

vitaut commented Sep 24, 2023

fmt::format is modeled after Python's str.format where shortest refers to the precision, not the full output. std::format diverged a bit because it was specified in terms of to_chars.

@jk-jeon
Contributor Author

jk-jeon commented Sep 26, 2023

I honestly feel like the shortest string is what people may expect, but that's of course just a subjective opinion. If you are going to change the behavior (or accept a PR that does so) in the future, it would be great. If not, please feel free to close this, but I think this difference needs to be documented anyway in places like https://fmt.dev/dev/api.html#compatibility-with-c-20-std-format.

@vitaut
Contributor

vitaut commented Sep 30, 2023

I am open to PRs to address this backed by more analysis of the effects of the change and concrete examples.

@scurest

scurest commented Feb 10, 2024

Note that this also results in the rather surprising (to me) behavior that e.g. 123456792.0f formats as "123456790", with the last digit apparently being wrong. Both strings round-trip to the same float, though, and 123456790 is shorter in the sense of having fewer significant digits.

std::to_chars formats it as 123456792.
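
The round-trip claim can be checked with a small sketch like the one below. The helper names `parse_float` and `significant_digits` are hypothetical, and the digit counter is deliberately naive (plain digit strings only).

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Parse a decimal string to the nearest float.
float parse_float(const std::string& s) {
  return std::strtof(s.c_str(), nullptr);
}

// Count significant digits of a plain digit string by ignoring trailing zeros.
// Illustrative only: assumes no sign, no decimal point and no leading zeros.
std::size_t significant_digits(const std::string& s) {
  std::size_t last = s.find_last_not_of('0');
  return last == std::string::npos ? 0 : last + 1;
}
```

Floats near 1.2e8 are spaced 8 apart, so both "123456790" and "123456792" parse to the float 123456792, while the former needs only 8 significant digits instead of 9.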

@vitaut
Contributor

vitaut commented Feb 10, 2024

This is unrelated and I am surprised that to_chars produces "garbage" digits in this case.

@jessey-git

Why is that "garbage" in this case? That value is perfectly representable as a float. Here's a nicely formatted sweep of some values for example: https://godbolt.org/z/a3Y8r1v6K

Is there a way to control how many digits the output is rounded to in this particular case, without switching to exponential notation, or should this be filed as another issue altogether?

@vitaut
Contributor

vitaut commented Feb 10, 2024

That's the term they used in the Steele and White paper. You can control precision, so there is no issue here.

@jk-jeon
Contributor Author

jk-jeon commented Feb 12, 2024

So this seems to be because std::to_chars is specified in terms of the number of characters, not the number of decimal digits. 123456792 and 123456790 are both of the shortest length, but the former is closer to the true value, so an implementation faithfully following the std spec must print the former.

So... this is interesting... we may need to look at what std::to_chars implementers have done if we ever want this behavior to be implemented in fmt.

EDIT:
Here is the relevant code from microsoft/STL:

https://github.com/microsoft/STL/blob/192a84008a59ac4d2e55681e1ffac73535788674/stl/inc/xcharconv_ryu.h#L1368
https://github.com/microsoft/STL/blob/192a84008a59ac4d2e55681e1ffac73535788674/stl/inc/xcharconv_ryu.h#L1406
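
For reference, the behavior described above can be observed with a short sketch that simply calls the plain `std::to_chars` overload. This is not the microsoft/STL code linked above; the wrapper name `shortest` is made up, and it assumes a standard library that implements the floating-point overloads of `std::to_chars`.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Format a float with the plain std::to_chars overload (no format argument,
// no precision), which is specified to give the shortest round-tripping
// character sequence, preferring the one closest to the true value.
std::string shortest(float value) {
  char buf[64];
  auto res = std::to_chars(buf, buf + sizeof buf, value);
  if (res.ec != std::errc()) return {};  // buffer too small (not expected here)
  return std::string(buf, res.ptr);
}
```

With such an implementation, `shortest(123456792.0f)` yields "123456792" as reported earlier in the thread, because no shorter character sequence round-trips and this 9-character one is exact.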

@vitaut
Contributor

vitaut commented Jan 26, 2025

Did some investigation into what other languages do:

Python:

>>> 1234000000000000.0
1234000000000000.0
>>> 12340000000000000.0
1.234e+16

Java:

System.out.println(1234000.0);
System.out.println(12340000.0);
1234000.0
1.234E7

Rust (using debug since the default format appears to be fixed):

println!("{:?}", 1234000000000000.0);
println!("{:?}", 12340000000000000.0);
1234000000000000.0
1.234e16

JavaScript:

> 123400000000000000000.0.toString()
'123400000000000000000'
> 1234000000000000000000.0.toString()
'1.234e+21'

Swift:

print(1234000000000000.0)
print(12340000000000000.0)
1234000000000000.0
1.234e+16

Python and Rust have the same threshold as {fmt}, which is not surprising since the formatting facilities of both Rust and {fmt} are based on Python's. Swift has the same threshold too.

Java has a threshold of 7: https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#toString-double-.

JavaScript has an even larger threshold of 21 because why not.
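
The upper thresholds from this comparison can be summarized in a small sketch. The function and constant names are hypothetical, and only the upper switch-over points mentioned above are modeled (lower bounds for small values are left out).

```cpp
// Hypothetical helper: true if a value written as d.ddd * 10^dec_exp is at or
// past the point where the language switches to exponent notation.
constexpr bool switches_to_exponent(int dec_exp, int upper_threshold) {
  return dec_exp >= upper_threshold;
}

// Upper thresholds as observed in the examples above.
constexpr int kFmtPythonRustSwift = 16;  // 1.234e15 is fixed, 1.234e16 is not
constexpr int kJava = 7;                 // 1234000.0 is fixed, 1.234E7 is not
constexpr int kJavaScript = 21;          // 1.234e20 is fixed, 1.234e+21 is not
```

For example, `switches_to_exponent(7, kJava)` is true, matching `12340000.0` printing as `1.234E7`, while `switches_to_exponent(20, kJavaScript)` is false.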

@vitaut
Contributor

vitaut commented Jan 26, 2025

Another observation is that having a threshold of 16 is slightly weird for FP representations other than IEEE754 binary64.

@vitaut
Contributor

vitaut commented Jan 26, 2025

The choice of shortest output size doesn't seem to be motivated in P0067, contradicts other choices of representation (redundant exponent digits and sign) and was definitely not intentional in std::format, just an artefact of its dependence on std::to_chars. It is probably too late to change the standard, but we don't need to repeat the same mistake in {fmt}, so I think the resolution here is to use a better threshold for smaller FP representations and document this small divergence (that doesn't affect round trip).
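
One conceivable way to derive a threshold per floating-point type is sketched below. This is an assumption for illustration, not necessarily what {fmt} adopted; the name `hypothetical_exp_upper` is made up.

```cpp
#include <limits>

// Hypothetical rule: tie the upper exponent threshold to the number of
// decimal digits the type can always represent exactly (digits10), plus one.
template <typename T>
constexpr int hypothetical_exp_upper() {
  return std::numeric_limits<T>::digits10 + 1;
}
```

Under this assumed rule, `double` keeps the threshold of 16 discussed in this thread, while `float` would get 7, so a value such as 123456792.0f (decimal exponent 8) would fall on the exponent-notation side.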

@vitaut
Contributor

vitaut commented Jan 26, 2025

Clarified the difference in 373855c and will consider writing a paper to fix this in std::format. Thanks for reporting!

@vitaut closed this as completed on Jan 26, 2025

@jk-jeon
Contributor Author

jk-jeon commented Jan 28, 2025

Nice, it would be nice if this fix also lands in std::format.
