Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fputws for outputting wide strings #1243

Merged
merged 1 commit into from
Jul 25, 2019

Conversation

jackoalan
Copy link
Contributor

The Windows console host will not correctly present wchar_t character streams unless the file mode is set to _O_WTEXT. These changes modify the various users of fwrite_fully to temporarily set the character mode with the aid of a RAII helper class.

I agree that my contributions are licensed under the {fmt} license, and agree to future changes to the licensing.

@jackoalan jackoalan force-pushed the windows-wtext-mode branch 2 times, most recently from 4c7501c to 46c3c82 Compare July 22, 2019 07:08
@vitaut
Copy link
Contributor

vitaut commented Jul 22, 2019

This should be done once per stream by the application, not per write call in the library.

@vitaut vitaut closed this Jul 22, 2019
@jackoalan
Copy link
Contributor Author

Unfortunately that means I cannot use fmtlib as a drop in replacement for an application containing a mix of printf and wprintf calls.

There is more to Microsoft's wprintf than simply "format text and fwrite it out", there is definitely some sort of _setmode operation taking place as well.

int main(int argc, char** argv) {
  printf("Hello %s!\n", "World");
  wprintf(L"Hello %s!\n", L"World");
  fmt::print("Hello {}!\n", "World");
  fmt::print(L"Hello {}!\n", L"World");
  return 0;
}

Appears as:
Annotation 2019-07-22 171213

Calling _setmode(_fileno(stdout), _O_WTEXT) once in the application is not an option either. This results in a failed assert on the first printf call in wtext mode.

For my stdout prints, I can use a fmt::print wrapper that does the necessary _setmode for wchar_t formatting strings, but I still think you should be aware of this behavior in Microsoft's C Runtime implementation of wprintf.

@jackoalan
Copy link
Contributor Author

Another solution for the correct behavior is using fputws with a null-terminated buffer rather than fwrite.

@Antidote
Copy link

I think the reasoning behind rejecting this PR isn't well thought out, normally implementing code doesn't need to worry about setting the mode of the stream since the API handles it automatically. Since fmtlib uses other means of implementing stream writing it's responsible for properly setting the mode imho.

@vitaut
Copy link
Contributor

vitaut commented Jul 23, 2019

You can easily write your own wrapper around print that does this if you need to switch between modes a lot.

@Antidote
Copy link

I think having it as part of fmtlib is the proper solution.

@vitaut
Copy link
Contributor

vitaut commented Jul 23, 2019

There is more to Microsoft's wprintf than simply "format text and fwrite it out", there is definitely some sort of _setmode operation taking place as well.

Hmm, OK. This is not a great design, but let's do the same for consistency reason.

@vitaut vitaut reopened this Jul 23, 2019
@Antidote
Copy link

Antidote commented Jul 23, 2019

Since when does anything Microsoft do qualify as good design? lol, thanks for reconsidering.

@jackoalan
Copy link
Contributor Author

@vitaut After doing some more experimentation, I've determined that wprintf does not do a _setmode after all. It actually does a WideCharToMultiByte conversion according to the C runtime locale (not the Windows ACP or other "locale" sources). Wide characters that fall outside the range of the target codepage are simply translated as ?.

The following vprint implementation achieves the same output as wprintf and is partially based on the wine implementation:

FMT_FUNC void vprint(std::FILE* f, wstring_view format_str, wformat_args args) {
  wmemory_buffer buffer;
  internal::vformat_to(buffer, format_str, args);

  UINT code_page = ___lc_codepage_func();
  int len_a =
    WideCharToMultiByte(code_page, WC_NO_BEST_FIT_CHARS, buffer.data(),
                        static_cast<int>(buffer.size()), nullptr, 0, nullptr,
                        nullptr);
  if (len_a == 0)
    return;

  memory_buffer mb_buffer;
  mb_buffer.resize(len_a);

  WideCharToMultiByte(code_page, WC_NO_BEST_FIT_CHARS, buffer.data(),
                      static_cast<int>(buffer.size()), mb_buffer.data(),
                      len_a, nullptr, nullptr);

  internal::fwrite_fully(mb_buffer.data(), 1, mb_buffer.size(), f);
}

Aside from Microsoft's dubious handling of character encodings, the ___lc_codepage_func accessor has me concerned because MSDN claims that it's an "internal" function which may potentially break. Despite that, it appears to have been historically available going all the way back to Windows 2000.

With no other changes aside from vprint, the following program

int main(int argc, char** argv) {
  std::setlocale(LC_ALL, "en_US.UTF-8");
  std::printf("Hello %s\n", "");
  std::wprintf(L"Hello %s\n", L"\x30C4");
  fmt::print("Hello {}\n", "");
  fmt::print(L"Hello {}\n", L"\x30C4");
  return 0;
}

appears as (Japanese code page)
Annotation 2019-07-23 175633

It also works in the fancy new Terminal application with the UTF-8 code page
Annotation 2019-07-23 180000

@vitaut
Copy link
Contributor

vitaut commented Jul 24, 2019

After doing some more experimentation, I've determined that wprintf does not do a _setmode after all. It actually does a WideCharToMultiByte conversion according to the C runtime locale (not the Windows ACP or other "locale" sources).

Interesting. Are you suggesting that we should do the same?

@jackoalan
Copy link
Contributor Author

If the goal is functional parity with wprintf (and fwprintf) then yes, this sort of character conversion is necessary on Windows.

On the POSIX side of things, Linux man pages of wprintf and fwide suggest the first I/O write will determine the stream's "orientation" (byte vs. wide) and further writes must remain consistent. No mention of what happens when this consistency is violated. I can experiment on that platform later.

@jackoalan
Copy link
Contributor Author

I just tried it out under glibc 2.29 on Arch Linux.

Violating fwide consistency in printf or wprintf will set errno to EILSEQ and refuse to write any characters of the offending call. Even more interesting is it also performs wide->multi-byte character conversion! Since glibc is open source I can even see how it works.

glibc essentially uses separate vtables for low-level I/O operations. byte and wide oriented streams each have separate tables implemented in fileops.c and wfileops.c respectively. Byte-oriented streams are pretty much direct write-troughs via the _IO_new_file_xsputn. Wide-oriented streams convert using a codecvt implementation internal to glibc

Unfortunately, fwrite absolutely requires byte-oriented streams otherwise it will write zero bytes without even setting errno. The only recourse for this is using fputws. I found some discussion on this matter.

@jackoalan
Copy link
Contributor Author

A POSIX-compliant pair of vprint implementations may look like so

FMT_FUNC void vprint(std::FILE* f, string_view format_str, format_args args) {
  if (std::fwide(f, -1) >= 0) {
    errno = EILSEQ;
    return;
  }
  memory_buffer buffer;
  internal::vformat_to(buffer, format_str,
                       basic_format_args<buffer_context<char>>(args));
  internal::fwrite_fully(buffer.data(), 1, buffer.size(), f);
}

FMT_FUNC void vprint(std::FILE* f, wstring_view format_str, wformat_args args) {
  if (std::fwide(f, 1) <= 0) {
    errno = EILSEQ;
    return;
  }
  wmemory_buffer buffer;
  internal::vformat_to(buffer, format_str, args);
  buffer.push_back(L'\0');
  if (std::fputws(buffer.data(), f) == -1) {
    FMT_THROW(system_error(errno, "cannot write to file"));
  }
}

Those errno sets might be better done with exceptions, but I'll leave that decision with you.

@huangqinjin
Copy link
Contributor

The behavior of wprintf is quite complex on Windows. It is influenced by the file translation mode, the c locale and whether FILE* is a regular file or Console. I suggest fputws or

    _lock_file(f);
    foreach wc in buffer
      _fputwc_nolock(wc, f);
    _unlock_file(f);

See also https://github.com/huangqinjin/wmain#ucrt-and-utf-8 and UCRT source.

@jackoalan
Copy link
Contributor Author

jackoalan commented Jul 24, 2019

I agree with @huangqinjin's assessment. There's a lot of systemic nuance that fmtlib really shouldn't have to take on; even in POSIX-land. My vote is for fputws if appending the null-terminator is an acceptable compromise.

The only discrepancy I've observed between fputws and wprintf on Windows is untranslatable characters come out as ? on wprintf (which indicates use of WideCharToMultiByte). fputws simply stops writing further characters (and returns -1) when this happens. This is a discrepancy I can live with though.

@jackoalan
Copy link
Contributor Author

jackoalan commented Jul 24, 2019

Since Microsoft implemented fwide as an inline passthrough of the second argument, those POSIX-compliant vprint implementations also work correctly on Windows.

EDIT: That if statement gets totally optimized out, so no harm in leaving it in
https://msvc.godbolt.org/z/Mf9j4q

@jackoalan jackoalan changed the title Add Windows _setmode calls to ensure correct character interpretation Use fputws for outputting wide strings Jul 25, 2019
@vitaut
Copy link
Contributor

vitaut commented Jul 25, 2019

OK, let's go with fputws then. @jackoalan, could you submit another PR or update this one?

@vitaut
Copy link
Contributor

vitaut commented Jul 25, 2019

Nevermind, I didn't see that you've already done this.

include/fmt/format-inl.h Outdated Show resolved Hide resolved
include/fmt/format-inl.h Outdated Show resolved Hide resolved
@huangqinjin
Copy link
Contributor

@vitaut @jackoalan , I suggest not to issue fwide. Each stream has an orientation and it's the users' responsibility to decide which orientation should be for the files they opened.

See https://stackoverflow.com/questions/8947949/mixing-cout-and-wcout-in-same-program.

@vitaut
Copy link
Contributor

vitaut commented Jul 25, 2019

I suggest not to issue fwide. Each stream has an orientation and it's the users' responsibility to decide which orientation should be for the files they opened.

I agree. fwide is redundant in the wide case and of little use in the multibyte case because wchar_t is virtually unused outside of Windows, so it just adds an overhead.

@jackoalan
Copy link
Contributor Author

Each stream has an orientation and it's the users' responsibility to decide which orientation should be for the files they opened.

By the time the corresponding overload of vprint is called, the user will have made their choice for that stream, even if it's the first write. I'm using fwide as a way to catch an inevitable exception before it occurs within fprint or fputws. I understand Windows will not encounter such orientation errors, but MS implemented fwide to behave as a no-op when used for pre-emptive error detection like this.

@jackoalan
Copy link
Contributor Author

I'll go ahead and remove the fwide. One way or another the user will see an exception.

Also adds fwide byte/wide orientation checking to verify streams are
able to receive the character type in question.

On Windows, the fwide calls are no-ops that pass through the second
arg and optimize out the if statement entirely.
@vitaut vitaut merged commit 8bd59ec into fmtlib:master Jul 25, 2019
@vitaut
Copy link
Contributor

vitaut commented Jul 25, 2019

Thanks!

@jackoalan
Copy link
Contributor Author

You're welcome ^^

@jackoalan jackoalan deleted the windows-wtext-mode branch July 25, 2019 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants