Result length is wrong when some arguments use negative char values #3058
Comments
Not sure about MSVC, but the output and width look correct on gcc: https://godbolt.org/z/bGaMEn7xE. Please note that width is an estimated display width, not a count of code units. So if you are using Unicode, 11 or 12 is correct depending on normalization, and 10 is incorrect. 10 is only correct if you are using a legacy encoding.
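To make the code unit versus code point distinction concrete, here is a minimal sketch, not {fmt}'s implementation. The helper names and the sample strings (the word "école" in composed and decomposed form) are assumptions chosen for illustration; the sketch assumes the input is valid UTF-8 and does not estimate display width.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Number of code units (bytes) in the string.
inline std::size_t code_units(std::string_view s) { return s.size(); }

// Number of code points in a string ASSUMED to be valid UTF-8:
// every byte that is not a continuation byte (10xxxxxx) starts a code point.
inline std::size_t code_points(std::string_view s) {
  std::size_t n = 0;
  for (char c : s) {
    if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
  }
  return n;
}

// Hypothetical sample inputs (not taken from the issue).
// Literals are split so that "cole" is not parsed as part of the hex escape.
inline constexpr std::string_view kComposed = "\xC3\xA9" "cole";    // U+00E9 c o l e
inline constexpr std::string_view kDecomposed = "e\xCC\x81" "cole";  // e U+0301 c o l e
```

With these inputs, the composed form has 6 code units and 5 code points, while the decomposed form has 7 code units and 6 code points, so the byte length of the same visible text depends on normalization.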
If you want width in code units you can wrap the string in
Agreed overall, though this is ANSI Western-encoded source code. The accents string effectively contains the 8-bit quantities { -23, 99, 111, 108, 101 }. The very same line of code, in the same source file and with the same compiler, does not exhibit the length issue when using std::format instead of fmt::format.
Tracing the code while running shows that at some point this function is called:
and that code_point_length() is called, taking the first byte value { -23 } for UTF-8.
template // Compute the pointer to the next character early so that the next
Is there a way to break the assumption that string arguments are UTF-8 encoded, other than patching three bazillion string arguments with fmt::bytes()? Locales? Maybe the default locale implies UTF-8?
I fully agree that the issue does not seem to occur with some other compilers on Compiler Explorer, though I couldn't understand why from looking at the v9.0 code (which I'm not sure those other compilers were using).
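The effect described above can be reproduced with a small standalone sketch. This is not the {fmt} 9.0 source; lead_byte_length and naive_count are hypothetical helpers that only show what happens when a counter trusts the lead byte and skips ahead without checking the bytes that follow.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Sequence length implied by a UTF-8 lead byte. Bytes that cannot start a
// sequence (continuation bytes, 0xF8..0xFF) are treated as length 1 here.
inline std::size_t lead_byte_length(char c) {
  unsigned char u = static_cast<unsigned char>(c);  // -23 becomes 0xE9
  if (u < 0x80) return 1;
  if ((u & 0xE0) == 0xC0) return 2;
  if ((u & 0xF0) == 0xE0) return 3;
  if ((u & 0xF8) == 0xF0) return 4;
  return 1;
}

// Naive count: trusts the lead byte and skips ahead WITHOUT verifying that
// the skipped bytes are continuation bytes.
inline std::size_t naive_count(std::string_view s) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < s.size(); i += lead_byte_length(s[i])) ++n;
  return n;
}

// The byte values quoted in the comment above.
inline constexpr char kLegacy[] = {static_cast<char>(-23), 99, 111, 108, 101};
```

The byte -23 is 0xE9, which looks like the start of a three-byte UTF-8 sequence, so the naive loop swallows the next two bytes along with it and reports 3 units for a 5-byte string. If such a count were used as a width, the padding would be too long by two.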
I missed that you are using a legacy encoding in your example, sorry. With the fix to handling invalid UTF-8 (#3056) it should give 10 now because an invalid code unit is counted as 1.
{fmt} is mostly encoding-agnostic. In your example encoding is only used for width estimation. So most call sites shouldn't be affected.
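As a rough illustration of counting an invalid code unit as 1, here is a sketch of a tolerant counter. It is an assumption about the general technique, not the code from #3056; tolerant_count and is_continuation are hypothetical names, and the check is deliberately shallow (it does not reject overlong encodings or surrogates).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

inline bool is_continuation(char c) {
  return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Counts one unit per sequence whose lead byte is followed by the expected
// number of continuation bytes, and one unit per byte that does not fit.
inline std::size_t tolerant_count(std::string_view s) {
  std::size_t n = 0, i = 0;
  while (i < s.size()) {
    unsigned char u = static_cast<unsigned char>(s[i]);
    // Expected sequence length; 0 marks a byte that cannot start a sequence.
    std::size_t len = u < 0x80           ? 1
                      : (u & 0xE0) == 0xC0 ? 2
                      : (u & 0xF0) == 0xE0 ? 3
                      : (u & 0xF8) == 0xF0 ? 4
                                           : 0;
    bool ok = len != 0 && i + len <= s.size();
    for (std::size_t k = 1; ok && k < len; ++k) ok = is_continuation(s[i + k]);
    i += ok ? len : 1;  // an invalid code unit advances by one byte only
    ++n;
  }
  return n;
}
```

With this approach the legacy bytes { -23, 99, 111, 108, 101 } count as 5 units, because 0xE9 is not followed by continuation bytes and is therefore counted on its own, while the same word in valid UTF-8 also counts as 5.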
Using fmt 9.0 with MSVC 17.x (C++20):
With some gcc versions, the wrong cases give 11 instead of 12, but that is wrong too.