Default floating-point formatting does not produce shortest outputs; mismatch with `std::format` #3649

jk-jeon · 2023-09-18T22:52:21Z

As far as I understand, the default formatting option should produce the shortest output, not just in the number of significand digits, but also in the number of actual characters. At least that seems to be how std::format is specified, according to the std::to_chars specifications.

However, it seems currently fmt picks the fixed-point format whenever the exponent is between -4 and 16, regardless of the number of characters it will produce:

fmt/include/fmt/format.h, line 2644 in 3baaa8d:

const int exp_lower = -4, exp_upper = 16;

Is this an intended divergence? Or maybe I misunderstood how std::format is specified?

For what it's worth, it seems MS STL implementation of std::format does what I described.
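
To make the difference concrete, here is a minimal sketch of the two selection policies under discussion. It models only the choice between fixed and exponential layout, not digit generation. The names `Layouts`, `make_layouts`, `threshold_picks_fixed` and `shortest_picks_fixed` are hypothetical, the bounds are assumed to be inclusive at -4 and exclusive at 16, and the tie-break toward fixed follows the std::to_chars wording.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper type: both layouts of one positive value.
struct Layouts {
  std::string fixed;
  std::string exponential;
};

// digits: shortest round-trip significand digits without leading or
//         trailing zeros, e.g. "1234".
// exp10:  decimal exponent of the first digit (value = d.ddd * 10^exp10).
Layouts make_layouts(const std::string& digits, int exp10) {
  Layouts out;
  const int n = static_cast<int>(digits.size());
  if (exp10 < 0) {
    // 0.000ddd: leading zeros after the decimal point.
    out.fixed = "0." + std::string(static_cast<std::size_t>(-exp10 - 1), '0') + digits;
  } else if (exp10 + 1 >= n) {
    // ddd000: integer, padded with trailing zeros.
    out.fixed = digits + std::string(static_cast<std::size_t>(exp10 + 1 - n), '0');
  } else {
    // dd.ddd: decimal point inside the digit string.
    out.fixed = digits.substr(0, exp10 + 1) + "." + digits.substr(exp10 + 1);
  }
  out.exponential = digits.substr(0, 1);
  if (n > 1) out.exponential += "." + digits.substr(1);
  const int abs_exp = std::abs(exp10);
  out.exponential += (exp10 < 0 ? "e-" : "e+");
  if (abs_exp < 10) out.exponential += '0';  // at least two exponent digits
  out.exponential += std::to_string(abs_exp);
  return out;
}

// Threshold policy: fixed whenever the decimal exponent is in [-4, 16).
// The inclusive/exclusive bounds are an assumption of this sketch.
bool threshold_picks_fixed(int exp10) { return exp10 >= -4 && exp10 < 16; }

// Character-count policy: pick the shorter string, preferring fixed on a tie.
bool shortest_picks_fixed(const Layouts& l) {
  return l.fixed.size() <= l.exponential.size();
}

void check_examples() {
  const Layouts big = make_layouts("1", 15);  // 1e15
  assert(big.fixed == "1000000000000000");
  assert(big.exponential == "1e+15");
  assert(threshold_picks_fixed(15));   // threshold rule: 16 characters
  assert(!shortest_picks_fixed(big));  // character-count rule: 5 characters
}
```

Under these assumptions, 1e15 is one value where the two policies disagree: the threshold rule yields the 16-character fixed form, while the character-count rule yields 1e+15.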

vitaut · 2023-09-24T14:26:14Z

fmt::format is modeled after Python's str.format where shortest refers to the precision, not the full output. std::format diverged a bit because it was specified in terms of to_chars.

jk-jeon · 2023-09-26T17:15:37Z

I honestly feel like the shortest string is what people may expect, but that's of course just a subjective opinion. If you are going to change the behavior (or accept a PR that does so) in the future, it would be great. If not, please feel free to close this, but I think this difference needs to be documented anyway in places like https://fmt.dev/dev/api.html#compatibility-with-c-20-std-format.

vitaut · 2023-09-30T15:00:27Z

I am open to PRs to address this backed by more analysis of the effects of the change and concrete examples.

scurest · 2024-02-10T13:08:47Z

Note that this also results in the rather surprising (to me) behavior that, e.g., 123456792.0f formats as "123456790", with the last digit apparently wrong. Both strings round-trip to the same float, though, and 123456790 is shorter in the sense of having fewer significant digits.

std::to_chars formats it as 123456792.
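
The round-trip claim can be checked without {fmt} or std::to_chars. This is a minimal sketch, assuming float is IEEE 754 binary32 and std::strtof rounds to nearest; `parse_float`, `same_float` and `check_round_trip` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdlib>

// Parse a decimal string to the nearest float.
float parse_float(const char* s) { return std::strtof(s, nullptr); }

// True if both decimal strings denote the same float after rounding.
bool same_float(const char* a, const char* b) {
  return parse_float(a) == parse_float(b);
}

void check_round_trip() {
  // Between 2^26 and 2^27 adjacent binary32 values are 8 apart, so
  // 123456792 (= 8 * 15432099) is exactly representable.
  assert(parse_float("123456792") == 123456792.0f);
  // 123456790 lies between the floats 123456784 and 123456792 and is
  // closer to the latter, so both strings read back as the same value.
  assert(same_float("123456790", "123456792"));
  // 123456800 is the next float up, so it is a different value.
  assert(!same_float("123456800", "123456792"));
}
```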

vitaut · 2024-02-10T14:47:58Z

This is unrelated and I am surprised that to_chars produces "garbage" digits in this case.

jessey-git · 2024-02-10T21:34:09Z

Why is that "garbage" in this case? That value is perfectly representable as a float. Here's a nicely formatted sweep of some values for example: https://godbolt.org/z/a3Y8r1v6K

Is there a way to control how many digits get rounded in this particular case without resorting to exponential notation, or should this be filed as a separate issue altogether?

vitaut · 2024-02-10T22:36:35Z

That's the term they used in the Steele and White paper. You can control precision, so there is no issue here.

jk-jeon · 2024-02-12T07:03:09Z

So this seems to be because std::to_chars is specified in terms of the number of characters, not the number of decimal digits. 123456784 and 123456780 are both of the shortest length, but the former is closer to the true value, so the implementation faithfully following the std spec must print the former.

So... this is interesting... we may need to look at what std::to_chars implementers have done if we ever want this behavior to be implemented in fmt.

EDIT:
Here is the relevant code from microsoft/STL:

https://github.com/microsoft/STL/blob/192a84008a59ac4d2e55681e1ffac73535788674/stl/inc/xcharconv_ryu.h#L1368
https://github.com/microsoft/STL/blob/192a84008a59ac4d2e55681e1ffac73535788674/stl/inc/xcharconv_ryu.h#L1406

vitaut · 2025-01-26T18:16:38Z

I did some investigation into what other languages do:

Python:

>>> 1234000000000000.0
1234000000000000.0
>>> 12340000000000000.0
1.234e+16

Java:

System.out.println(1234000.0);
System.out.println(12340000.0);

1234000.0
1.234E7

Rust (using debug since the default format appears to be fixed):

println!("{:?}", 1234000000000000.0);
println!("{:?}", 12340000000000000.0);

1234000000000000.0
1.234e16

JavaScript:

> 123400000000000000000.0.toString()
'123400000000000000000'
> 1234000000000000000000.0.toString()
'1.234e+21'

Swift:

print(1234000000000000.0)
print(12340000000000000.0)

1234000000000000.0
1.234e+16

Python and Rust have the same threshold as {fmt}, which is not surprising since the formatting facilities of both Rust and {fmt} are based on Python's. Swift has the same threshold too.

Java has a threshold of 7: https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#toString-double-.

JavaScript has an even larger threshold of 21 because why not.

vitaut · 2025-01-26T18:18:03Z

Another observation is that having a threshold of 16 is slightly weird for FP representations other than IEEE 754 binary64.

vitaut · 2025-01-26T19:00:59Z

The choice of shortest output size doesn't seem to be motivated in P0067, contradicts other choices of representation (redundant exponent digits and sign), and was definitely not intentional in std::format, just an artefact of its dependence on std::to_chars. It is probably too late to change the standard, but we don't need to repeat the same mistake in {fmt}, so I think the resolution here is to use a better threshold for smaller FP representations and document this small divergence (which doesn't affect round trip).
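
One plausible way to make the upper threshold depend on the representation is to derive it from the number of decimal digits the type can hold. The sketch below assumes the rule `digits10 + 1`; the function name `exp_upper_for` is hypothetical, and the commits referencing this issue may compute the value differently.

```cpp
#include <limits>

// Hypothetical sketch: the smallest decimal exponent at which exponential
// notation is used, derived from the decimal precision of T. The rule
// digits10 + 1 is an assumption chosen so that double keeps the value 16.
template <typename T>
constexpr int exp_upper_for() {
  return std::numeric_limits<T>::digits10 + 1;
}

// IEEE 754 binary64 has digits10 == 15, binary32 has digits10 == 6.
static_assert(exp_upper_for<double>() == 16, "double keeps the threshold of 16");
static_assert(exp_upper_for<float>() == 7, "float gets a smaller threshold");
```

With this rule double keeps the existing threshold of 16, while float would switch to exponential notation from 1e7 onward.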

vitaut · 2025-01-26T21:51:31Z

Clarified the difference in 373855c and will consider writing a paper to fix this in std::format. Thanks for reporting!

jk-jeon · 2025-01-28T03:57:35Z

Nice, it would be great if this fix also landed in std::format.

vitaut mentioned this issue Sep 30, 2023

non-obvious formatting of floating point numbers #3657

Closed

jk-jeon mentioned this issue Feb 20, 2024

Some more issues about to_chars boostorg/charconv#166

Open

jk-jeon mentioned this issue Jul 6, 2024

Floating-point numbers are uglified, a.k.a. write shortest floating-point representation with round-trip guarantee jbeder/yaml-cpp#1289

Closed

jk-jeon mentioned this issue Jul 20, 2024

JSON compliant to_chars? jk-jeon/dragonbox#65

Closed

vitaut added a commit that referenced this issue Jan 26, 2025

Make exponent threshold depend on representation (#3649)

5fa88c8

vitaut added a commit that referenced this issue Jan 26, 2025

Make exponent threshold depend on representation (#3649)

52eeeb5

vitaut closed this as completed Jan 26, 2025
