Globalization: ToolTask batch file encoding does not respect UTF-8 encoding #5724
Comments
This looks similar to #4870 |
A few extra details: Exec/ToolTask saves the batch file using the OEM encoding because it detects that all characters can be encoded using this encoding, since there is an [...]. The fix may be to have [...]. I also want to point out that this is a world-readiness issue, since the message will be printed correctly or not depending on the OEM encoding (hence the OS language), and it surfaces in the use of [...] |
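The detection behaviour described in the comment above can be sketched roughly as follows. This is an illustrative Python sketch, not MSBuild's code: the function name `choose_batch_encoding` is hypothetical, and the code pages `cp932` (Japanese) and `cp437` (US) are assumed stand-ins for whatever OEM encoding the machine uses.

```python
def choose_batch_encoding(contents: str, oem_encoding: str) -> str:
    """Return oem_encoding if it can represent every character, else "utf-8".

    Hypothetical helper that mimics the "detect" behaviour described in the
    thread. It never looks at the console code page, which is the gap the
    issue is about.
    """
    try:
        contents.encode(oem_encoding)
    except UnicodeEncodeError:
        return "utf-8"
    return oem_encoding


command = "echo 日本語"
# On a Japanese OS (assumed OEM code page 932) every character fits,
# so the OEM encoding is chosen even if the console is set to 65001.
print(choose_batch_encoding(command, "cp932"))
# On a US OS (assumed OEM code page 437) the text does not fit,
# so the fallback is UTF-8.
print(choose_batch_encoding(command, "cp437"))
```

Under these assumptions the same project would produce a differently encoded batch file depending on the OS language, which matches the world-readiness concern raised above.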
We can't use |
Then you can P/Invoke for |
This came very late, but I could break this repro by switching |
@jgoshi checking if you've had a chance to review Forgind's comment |
@marcpopMSFT Do you mean whether I've had a chance to check if this bug is fixed when I change all appearances of StdOutEncoding="UTF-8" to UseUtf8Encoding="ALWAYS" in my projects? I haven't had a chance to try that out yet. I'll try to get access to a Japanese OS machine. @MrTrillian do you see any issues with this approach (if it works)? It would mean we'd need to change the Kitware CMake code to emit UseUtf8Encoding="ALWAYS" instead of what it currently emits (StdOutEncoding="UTF-8"). |
I've just spent an hour trying to page back in the subtleties of this. I think the answer is that in the original bug, we're not using msbuild/src/Utilities/ToolTask.cs Line 1376 in 74cbf25
There might be other reasons why it would be problematic even if we had |
I can make something to add a UseUtf8Encoding property to ToolTask, have it default to detect, and let you overwrite it to always, then use it on the line you linked. Then we can try to test it. If that doesn't work for some reason, I'd say it's time to formally get a permanent exception. I think we've both forgotten about this bug and spent time trying to remember details more than once at this point. How does that sound? |
https://github.com/Forgind/msbuild/tree/UseUtf8Encoding has the change to permit specifying UseUtf8Encoding in a ToolTask. |
If I recall, the problem is this: If |
Could you save it off, call MSBuild, and restore it? I remember trying to do something like that in MSBuild, and it hit a snag because there wasn't anything to attach a code page to, but presumably you could do it? Also, I'm a little surprised, since I thought we spun up a new process for executing the code page work, hence that you changing the code page to 65001 didn't affect whether MSBuild used utf-8 or not. I may be misremembering that part. |
In this scenario, there is no Visual Studio involved to save off and restore the chcp. Anyone can use cmake.exe directly from the command line. New processes don't guarantee isolation from code page changes, it depends on how the new processes are created. By default, they will share the console with their parent, and the console is what holds the code page state. msbuild probably needs child processes to share the console with their parent so they can output to it. |
I wasn't necessarily thinking of saving it in association with anything in particular. If it were C#, I'd just assign a random variable to the code page and call chcp at the end. From the command line, maybe an environment variable? I'm assuming you wouldn't have to do this for multiple different code pages in a single build. |
We can't implement this because anyone (say, my uncle) could call |
Issue Description
ToolTask supports setting the StandardOutputEncoding but does not work properly for UTF-8 encoding.
Steps to Reproduce
BatchEncodingBug.zip
a. chcp 65001
b. MSBuild.exe BatchEncodingBug.vcxproj /t:TestExec
Expected Behavior
The output text should correctly display Japanese characters.
Actual Behavior
Some of the text is garbled.
Analysis
If you look at the provided vcxproj file, we use the Exec task and set the output encoding to UTF-8. Because we ran "chcp 65001" in the repro steps, we should be using UTF-8. However, when msbuild runs the Exec, the batch file it generates is saved in ANSI instead of UTF-8. After running msbuild you'll see "file.cmd", which is a copy of the msbuild-generated batch file, and you can verify that it is in ANSI.
This is coming from this file:
https://github.com/dotnet/msbuild/blob/f2c4bfd563f559daca27ea1cd8ae40db24e2e7cd/src/Utilities/ToolTask.cs
Look for EncodingUtilities.BatchFileEncoding. Ideally msbuild should respect the console code page.
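To illustrate why the mismatch garbles the output, here is a minimal Python sketch. It assumes the batch file is written in code page 932 (the ANSI/OEM code page on a Japanese OS) and is then interpreted as UTF-8 because the console is at 65001. The helper name `read_as_console` is hypothetical, and this is not MSBuild's code.

```python
def read_as_console(text: str, file_encoding: str, console_encoding: str) -> str:
    """Encode text the way the batch file is saved, then decode it the way
    the console code page would interpret those bytes."""
    raw = text.encode(file_encoding)
    # errors="replace" substitutes U+FFFD for invalid byte sequences,
    # which stands in for the garbled characters in the report.
    return raw.decode(console_encoding, errors="replace")


message = "日本語"

# Batch file saved as cp932, console at UTF-8 (chcp 65001): mismatch.
garbled = read_as_console(message, "cp932", "utf-8")
print(garbled == message)

# Batch file saved as UTF-8, console at UTF-8: the text survives.
intact = read_as_console(message, "utf-8", "utf-8")
print(intact == message)
```

The first comparison is false and the second is true, which is consistent with the request that the batch file encoding follow the console code page.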