From discussion with @williamjcm and @sthalik on Gitter. To avoid UTF-16 conversion in every Utility::Path API that deals with the filesystem on Windows (and in other areas, like environment access), we could pass UTF-8 directly into the *A APIs. It's an opt-in feature, and there are three ways to achieve this:
1. Changing the global code page in Windows settings. Requires user interaction, so not a viable option.
2. Linking to UCRT instead of MSVCRT and calling setlocale(LC_ALL, ".utf-8"). Available since Windows 10 SDK 1803.
3. (Likely also linking to UCRT) and adding an activeCodePage entry set to UTF-8 to the app manifest. Requires Windows 10 SDK 1903+.
The second option could be done inside CorradeMain, and the third documented alongside HiDPI support, for example. Then, all Path APIs would check the prerequisites (Windows version, UCRT vs MSVCRT, and whether the codepage is set to UTF-8) and pick the cheaper code path if they're met.
TODOs left:
Figure out how to robustly check that we can use UTF-8. Is the presence of a UTF-8 codepage enough (checked via the locale name returned by setlocale(LC_ALL, nullptr))? Or do we also need to check for Windows version and/or UCRT presence?
Figure out how to check just once and store the result in some global variable instead of doing the check again in every Path API, without running into thread safety, thread-local variables, duplicated globals and other nasty issues in yet another place.
Though some rogue 3rd party code could call setlocale() on its own and break it, so there's probably no way around checking every time :/
The *A APIs still have the MAX_PATH limitation, and it's apparently impossible to work around that:
> In the ANSI version of this function, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend \\?\ to the path. For more information, see Naming Files, Paths, and Namespaces.
Which makes this whole effort rather useless. But maybe there are other ways to circumvent this?
Maybe it could fall back to the *W APIs if the input UTF-8 path is longer than MAX_PATH? That could make it work for 90% of use cases, OTOH it means we have to explicitly test each and every Path API to handle this well. Though since we have to have that fallback for when the locale changes again (as noted above) anyway, it shouldn't mean that much extra code.
Setting the code page to UTF-8 may be considered "not nice" to 3rd party libraries that still rely on *A APIs. Consider if a compile-time opt-out for this feature is enough or if it should be opt-in (for example to be enabled by the users if they know it won't break 3rd party stuff).
Or, possibly, don't set anything but use UTF-8 if the codepage is discovered to be UTF-8? Seems like the least intrusive option, but still without falling back to UTF-16 conversion.
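The check and the fallback discussed in the TODOs could be sketched roughly as below. All names here (localeNameIsUtf8(), currentLocaleIsUtf8(), choosePathApi(), PathApi, MaxPathLength) are hypothetical and not part of Corrade. The sketch assumes that the locale name ends with a ".utf8" or ".UTF-8" codeset when UTF-8 is active, and it conservatively compares the byte length of the UTF-8 path against MAX_PATH; whether either assumption is sufficient on all Windows and CRT versions is exactly the open question above. It also doesn't handle composite locale names where individual categories differ.

```cpp
#include <cctype>
#include <clocale>
#include <cstddef>
#include <string>

// Hypothetical helpers, not part of Corrade.

// True if a locale name such as "en_US.UTF-8" ends with a UTF-8 codeset.
// The comparison is case-insensitive and accepts both "utf8" and "utf-8".
bool localeNameIsUtf8(const std::string& name) {
    const std::size_t dot = name.rfind('.');
    if(dot == std::string::npos) return false;
    std::string codeset;
    for(std::size_t i = dot + 1; i != name.size(); ++i)
        codeset += char(std::tolower(static_cast<unsigned char>(name[i])));
    return codeset == "utf8" || codeset == "utf-8";
}

// Queries the current locale on every call instead of caching the result,
// because other code may call setlocale() at any time
bool currentLocaleIsUtf8() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name && localeNameIsUtf8(name);
}

enum class PathApi { Ansi, Wide };

// Value of MAX_PATH in the Windows SDK, includes the null terminator
constexpr std::size_t MaxPathLength = 260;

// Picks the *A variant only if UTF-8 is active and the path fits into
// MAX_PATH, otherwise falls back to UTF-16 conversion and the *W variant
PathApi choosePathApi(const std::string& utf8Path, bool utf8Active) {
    if(!utf8Active) return PathApi::Wide;
    if(utf8Path.size() >= MaxPathLength) return PathApi::Wide;
    return PathApi::Ansi;
}
```

A Path API would then call choosePathApi(path, currentLocaleIsUtf8()) on entry. Because the locale is queried each time, this corresponds to the least intrusive variant: nothing is set by the library, and UTF-8 is used only if it's discovered to be active.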