[DO NOT MERGE] Fix Issue 23179 - Unicode in symbol names in DLLs breaks MSVC linker #14207
Thanks for your pull request and interest in making D better, @rikkimax! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Testing this PR locally
If you don't have a local development environment set up, you can use Digger to test this PR: dub run digger -- build "stable + dmd#14207"
I would re-enable runnable/testmodule.d, but the disabling is going to be reverted anyway, so doing it here would just create issues.
The FreeBSD failure on the auto-tester may have nothing to do with this (std.stdio erroring when closing a file). It won't matter regardless once this is Windows-only (I'll add the code for that later).
Alternatively, instead of hex encoding, Punycode would be a valid transformation. The result would be semi-readable if the identifier contains ASCII, but it is a lot more work to implement than converting to/from hex.
version(Windows) {
    import std.algorithm : canFind;
    static assert(!哪里.mangleof.canFind("哪里"));
Why not test the full mangle instead?
The rest of the mangle is irrelevant to the test, so a change to any unrelated part of it would cause this to fail. This isn't ideal, but neither is the Phobos import.
I forgot to say that this would avoid having to import Phobos.
Needs a spec update and removal of the Phobos import (and a rebase).
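One way the Phobos import could be dropped is a small hand-written substring check that runs at compile time. The sketch below is only an illustration: `contains` is a hypothetical helper, not something from the PR or from druntime, and the final assertion is left as a comment because it can only pass once the mangling change is in place.

```d
// Hypothetical helper (not part of the PR): substring search that works in
// CTFE and needs no Phobos import.
bool contains(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length > haystack.length)
        return false;
    // Try every start position at which the needle could still fit.
    foreach (i; 0 .. haystack.length - needle.length + 1)
    {
        if (haystack[i .. i + needle.length] == needle)
            return true;
    }
    return false;
}

// Sanity checks on plain strings, evaluated at compile time.
static assert(contains("abc哪里def", "哪里"));
static assert(!contains("abcdef", "哪里"));

// With the mangling change applied, the test could then read:
// version (Windows) static assert(!contains(哪里.mangleof, "哪里"));

void main() {}
```

This still checks only for the raw identifier rather than the full mangle, so unrelated mangling changes would not break the test.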
I'm waiting to hear that the general change is good before doing other PRs to update the spec and the demangler. I know it should only be turned on for Windows and extern(D), so I'll add the test change to the next commit along with all that.
I don't see anything wrong with it, but you might want to make a forum post about it and see what people think.
19b059f to 464200c
I think on my end this is good to go, @WalterBright, @ibuclaw, @kinke. Next step is the N.G. post, before I do the demangler and spec update PRs.
I don't think this issue is very important in general, at least with the DMD ModuleInfo exports reverted on Windows. Plus, it works with LLD, so it's not a total blocker. So I don't think some not-fully-analyzed MS linker issue (e.g., maybe it supports wide versions of linker directives?) is enough to pessimize D mangling in general.
FWIW, LDC/LLVM emits this in asm:
So actually, LDC emits an
I did some more digging into why Rust went with Punycode as they did. It turns out it isn't just Windows that doesn't support it. Very interestingly, Unicode has an opinion on what an identifier should be in a programming language (TR31), which is now pretty standardized. Swift is using Punycode too: https://github.com/apple/swift/blob/main/docs/ABI/Mangling.rst#identifiers From what I can see, we are very much going at it alone by supporting Unicode in symbol names. It's a bit of a surprise that we haven't hit this before now. EDIT: I couldn't get a C++ compiler to not emit Unicode, which is interesting.
At least it wasn't just Windows back in 2013. ;) I'd say keeping Unicode as-is in mangled names is nice and preferable for readability, and an obfuscating encoding scheme is only a workaround that we might not really need.
I've updated the bug report with my proposed patch. I'm in favor of waiting for it to appear in the wild before fixing it.
This is my attempt at fixing 23179, as it seems Microsoft does not intend for symbols to contain anything but ANSI.
So this introduces hex encoding for Unicode identifiers.
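To make the idea concrete, here is a minimal sketch of what hex encoding an identifier could look like. The scheme is an assumption made for illustration only (each non-ASCII UTF-8 code unit becomes `_x` plus two hex digits); it is not the encoding this PR implements, and `hexEncodeIdent` is a hypothetical name.

```d
// Hypothetical scheme, NOT the one used by the PR: every non-ASCII UTF-8
// code unit is replaced by "_x" followed by two uppercase hex digits.
string hexEncodeIdent(string ident)
{
    enum hexDigits = "0123456789ABCDEF";
    string result;
    foreach (char c; ident) // iterate over UTF-8 code units, not code points
    {
        if (c < 0x80)
        {
            result ~= c; // ASCII passes through unchanged
        }
        else
        {
            result ~= "_x";
            result ~= hexDigits[c >> 4];
            result ~= hexDigits[c & 0xF];
        }
    }
    return result;
}

// 'é' is U+00E9, which is the two bytes C3 A9 in UTF-8.
static assert(hexEncodeIdent("café") == "caf_xC3_xA9");
static assert(hexEncodeIdent("plain") == "plain");

void main() {}
```

As written, the sketch is not reversible when an identifier already contains `_x`, so a real scheme would need an unambiguous escape and a matching change in the demangler.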