unicode parts of env are lost when running createProcess #152
Comments
withCEnvironment uses Foreign.C.String.withCString, which has this implementation:
getForeignEncoding is described as "The Unicode encoding of the current locale, but where undecodable bytes are replaced with their closest visual match." So I think that is what is stripping some characters (and perhaps changing others). Using getFileSystemEncoding instead should avoid the problem. I notice that its haddock actually says it's "used to decode and encode command line arguments and environment variables on non-Windows platforms". (emphasis mine) And indeed, System.Posix.Process.executeFile does use it to encode the environment (via System.Posix.Internals.withFilePath). |
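The difference between the two encodings can be probed with a small sketch like the one below. It assumes that withCString amounts to fetching getForeignEncoding and calling GHC.Foreign.withCString, and that withFilePath does the same with getFileSystemEncoding. The helper name encodedBytes is made up for illustration and is not part of process or base. The sketch re-encodes each entry of the current environment both ways and reports the keys whose bytes differ; what it prints depends on the locale and on the environment it is run in.

```haskell
import Control.Monad (forM_, when)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (peekArray0)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, getFileSystemEncoding, getForeignEncoding)
import System.Environment (getEnvironment)

-- Hypothetical helper: marshal a String to a C string with an explicit
-- encoding and return the bytes before the terminating NUL.
encodedBytes :: TextEncoding -> String -> IO [CChar]
encodedBytes enc s = GHC.withCString enc s (peekArray0 0)

main :: IO ()
main = do
  foreignEnc <- getForeignEncoding
  fsEnc <- getFileSystemEncoding
  -- getEnvironment decodes values read from the system, so re-encoding
  -- them with the file system encoding is expected to round-trip.
  environment <- getEnvironment
  forM_ environment $ \(key, value) -> do
    let entry = key ++ "=" ++ value
    viaForeign <- encodedBytes foreignEnc entry
    viaFileSystem <- encodedBytes fsEnc entry
    -- show escapes non-ascii characters, so printing is safe in any locale.
    when (viaForeign /= viaFileSystem) $
      putStrLn ("encodings disagree for " ++ show key)
```

Run with LANG=C and an environment variable holding a non-ascii value, this should list that variable if the foreign encoding is indeed the one altering it.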
My test case above is a little bit off, because the way "¡" is encoded causes withFilePath to throw an exception, "recoverEncode: invalid argument (invalid character)", when LANG=C. (Similarly, this program crashes when LANG=C:) A better, more realistic test case is one where the unicode value is read in from the system in some way, so that ghc's encoding layer is able to encode it in a way that would normally round-trip back out.
This improved test case shows the bug in process, when run in a directory with unicode in its name:
And I've verified that this patch to process makes the test case work the same with LANG=C as it does with LANG set to a unicode locale.
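A round-trip test of the kind described above might look roughly like the following. This is a sketch written for illustration, not the program originally posted in this comment. It assumes the directory and process packages are available, and the variable name FOO, the "child" argument, and the verdict helper are arbitrary choices. The parent passes its current directory to a copy of itself through env, and the child compares what it receives with its own view of the current directory.

```haskell
import System.Directory (getCurrentDirectory)
import System.Environment (getArgs, getExecutablePath, lookupEnv)
import System.Exit (ExitCode (..), exitFailure, exitSuccess)
import System.Process (CreateProcess (env), createProcess, proc, waitForProcess)

-- Hypothetical helper: turn the child's exit code into a message.
verdict :: ExitCode -> String
verdict ExitSuccess = "FOO round-tripped"
verdict (ExitFailure _) = "FOO was altered"

main :: IO ()
main = do
  args <- getArgs
  -- Read from the system, so any non-ascii bytes in the directory name
  -- are decoded in a way that should normally round-trip.
  cwd <- getCurrentDirectory
  case args of
    ["child"] -> do
      -- Child: succeed only if FOO matches the current directory.
      seen <- lookupEnv "FOO"
      if seen == Just cwd then exitSuccess else exitFailure
    _ -> do
      -- Parent: re-run this executable with FOO set to the directory.
      self <- getExecutablePath
      (_, _, _, ph) <-
        createProcess (proc self ["child"]) { env = Just [("FOO", cwd)] }
      code <- waitForProcess ph
      putStrLn (verdict code)
```

Like the program in the original report, this execs itself, so it needs to be compiled rather than run in ghci, and it is only interesting when run from a directory with unicode in its name, once with LANG=C and once with a unicode locale.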
|
I'm very much opposed to the environment-variable-locale-detection logic we use throughout base, so I'm sympathetic to this change. It's a bit of a surprise to me that "file system encoding" applies to environment variables like this, but your round trip test case is pretty convincing. My biggest concern would be regressions, or even changes in behavior. However, I'm having a hard time thinking of a use case that would be broken by this change. Can you think of anything? And just confirming, though it seems pretty clear: there is no Windows impact from this patch, right? I'm overall in favor of this change, with the caveat about changes in behavior above. If we can't come up with any such issues, I'd be happy to receive a PR. Final question: does this apply to command line arguments as well? |
Michael Snoyman wrote:
> My biggest concern would be regressions, or even changes in behavior. However,
> I'm having a hard time thinking of a use case that would be broken by this
> change. Can you think of anything?
No, I can't think of anything.
> And just confirming, though it seems pretty clear: there is no Windows impact
> from this patch, right?
Windows is not impacted; withFilePath is ifdefed for Windows and that
branch is not affected. (Also, System.Process.Posix is not used on
Windows.) AIUI, Windows always uses unicode filenames.
> I'm overall in favor of this change, with the caveat about changes in behavior
> above. If we can't come up with any such issues, I'd be happy to receive a PR.
> Final question: does this apply to command line arguments as well?
Arguments are already packed using withFilePath (so is the command name),
which avoids the issue affecting them.
--
see shy jo
|
OK, then I'm on board with a PR for this if you'd like to submit one. |
In a non-unicode locale, such as LANG=C, running createProcess with an env that contains a unicode character, such as '¡', results in that character being stripped out of the value seen by the child process; only the ascii characters remain.
A program demonstrating the bug is:
This program execs itself, so it needs to be compiled, not run in ghci. On Linux, built with process-1.6.5.0 and ghc-8.6.5, it behaves like this:
Using strace -vf foo shows that the non-ascii characters do not make it to exec:
[pid 4335] execve("/home/joey/foo", ["/home/joey/foo"], ["FOO=foo!"] <unfinished ...>
This was discovered affecting a program that sets GIT_INDEX_FILE to a path when running git; if the path happens to contain unicode, this results in a corrupted path being passed to git.