Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed UTF8/16 converters to support empty string input #676

Closed
wants to merge 3 commits into from

Conversation

vgalka-sl
Copy link
Contributor

Previously an exception was thrown since Win32 WideCharToMultiByte() API returns error on zero-length input.

Previously an exception was thrown since Win32 WideCharToMultiByte API returns error on zero-length input.
@vitaut
Copy link
Contributor

vitaut commented Mar 12, 2018

UTF16ToUTF8 has been renamed to utf16_to_utf8 in the master to follow standards conventions. Could you update the PR?

@vgalka-sl
Copy link
Contributor Author

Sure, thanks! My apologies, I ported it from v4.1.0 based branch.
Pushed, lets see if all is OK now...

Copy link
Contributor

@vitaut vitaut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few minor comments.

@@ -280,9 +280,16 @@ const uint64_t internal::basic_data<T>::POWERS_OF_10_64[] = {

FMT_FUNC internal::utf8_to_utf16::utf8_to_utf16(string_view s) {
static const char ERROR_MSG[] = "cannot convert string from UTF-8 to UTF-16";
if (s.size() > INT_MAX)
if (s.size() > INT_MAX || s.data() == nullptr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add a nullptr check? Won't MultiByteToWideChar return an error in this case already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but without this check the function will now succeed for invalid input of nullptr and 0 length (because the size check precedes the call to MultiByteToWideChar). Clearly this is not the best behavior. Moreover, it fails some tests (UTF16ToUTF8Error/UTF8ToUTF16Error/UTF16ToUTF8Convert).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is nullptr and 0 invalid? If size is zero than the data can be anything including nullptr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is a conceptual discussion, and probably a matter of definition and taste... Personally, I think that nullptr is a special case reserved for "invalid data" and this fact should not be silently discarded. I don't see significant advantages to the other approach, but if you prefer it, the code can be easily changed by reverting this check.
However, please note that the above mentioned tests assume that (nullptr, 0) is invalid input. So, if we decide that it becomes valid, we need to redefine the tests logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty string view has the null data pointer (https://godbolt.org/g/JtWcst), so I suggest removing the nullptr check here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good point :-)
Thanks!

buffer_.resize(1);
buffer_[0] = 0;
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the Google style guide (2-space indent and open brace on the same line:

if (s_size == 0) { // MultiByteToWideChar does not support zero length, handle separately
  buffer_.resize(1);
  ...

Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Will fix.
Shall I submit a new (v2) pull-request, to reduce the fix commits clutter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they can be squashed together before merging.

@vitaut
Copy link
Contributor

vitaut commented Mar 14, 2018

Merged with minor changes in acb469a, thanks!

@vitaut vitaut closed this Mar 14, 2018
@vitaut
Copy link
Contributor

vitaut commented Aug 17, 2019

@vgalka-sl, could you please review the updated CONTIBUTING document, particularly the part about licensing, and let me know if you agree with it being applied to your contributions to {fmt}? The library is likely to be relicensed (#1073) so I'm collecting approval from all earlier contributors. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants