Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Fix Issue 23179 - Unicode in symbol names in DLLs breaks MSVC linker #14207

Closed
wants to merge 1 commit into from

Conversation

rikkimax
Copy link
Contributor

This is my attempt at fixing 23179, as it seems Microsoft does not intend for symbols to contain anything but ANSI.

So this introduces hex encoding for Unicode identifiers.

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @rikkimax! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
23179 blocker Unicode in symbol names in DLLs breaks MSVC linker

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "stable + dmd#14207"

@rikkimax
Copy link
Contributor Author

I would reenable runnable/testmodule.d, but yeah that'll be reverted the disabling so it will just create issues.

@rikkimax
Copy link
Contributor Author

rikkimax commented Jun 12, 2022

The freebsd failure on auto-tester may not be something to do with this (std.stdio erroring with closing file).

It won't matter regardless if this is Windows only (which I'll add the code for later).

@rikkimax
Copy link
Contributor Author

Alternatively instead of hex encoding Punycode would be a valid transformation. It would be semi-understandable if it contains ascii, but that is a lot more work to implement than to/from hex.


version(Windows) {
import std.algorithm : canFind;
static assert(!哪里.mangleof.canFind("哪里"));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not test the full mangle instead ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of the mangle is irrelevant to the test, which would mean that unrelated parts to it if changed will cause this to error. This isn't ideal, but neither is the phobos import.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to say that this would prevent to import phobos.

Copy link
Contributor

@thewilsonator thewilsonator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a spec update and removal of phobos (and a rebase)

@thewilsonator thewilsonator added the Review:Needs Spec PR A PR updating the language specification needs to be submitted to dlang.org label Jun 13, 2022
@rikkimax
Copy link
Contributor Author

Needs a spec update and removal of phobos (and a rebase)

I'm waiting to hear that the general change is good before doing other PR's to update spec and the demangler.

I know it should only be turned on for Windows and extern(D). So I'll add the test change to next commit with all that.

@thewilsonator
Copy link
Contributor

I don't see anything wrong with it, but you might want to make a forum post about it and see what people think.

@rikkimax
Copy link
Contributor Author

I think on my end this is good to go, @WalterBright, @ibuclaw, @kinke.
It would be good to get you three to confirm that it won't lead to surprises.

Next step is N.G. post, before I do the demangler and spec update PR's.

@kinke
Copy link
Contributor

kinke commented Jun 13, 2022

I don't think this issue is very important in general, at least with reverted DMD ModuleInfos exports on Windows. Plus, it works with LLD, so it's not a total blocker. So I don't think some not-fully-analyzed MS linker issue (e.g., maybe it supports wide versions of linker directives?) is enough to pessimize D mangling in general.

GetProcAddress() not supporting a wide string is IMO a bit surprising, but I'd say users currently do and will have to cope with such an OS limitation when linking manually at runtime (works fine with implicit linking via import libs). I haven't heard any complaints in that regard yet, plus with D, there are many ways to work around this.

@kinke
Copy link
Contributor

kinke commented Jun 13, 2022

FWIW, LDC/LLVM emits this in asm:

.section	.drectve,"yn"
.ascii	" /EXPORT:\"_D3run17unicode_06_\345\223\252\351\207\2146\345\223\252\351\207\214FiZi\""
.ascii	" /EXPORT:\"_D3run17unicode_06_\345\223\252\351\207\21412__ModuleInfoZ\",DATA"
.ascii	" /INCLUDE:\"_D3run17unicode_06_\345\223\252\351\207\21411__moduleRefZ\""

So actually, LDC emits an /INCLUDE directive for each non-betterC module on Windows. So using non-ANSI in a module identifier should currently always fail with the MS linker. I don't remember anyone complaining.

@rikkimax
Copy link
Contributor Author

rikkimax commented Jun 13, 2022

I did some more digging into why Rust went with Punycode as they did.

It isn't just Windows that doesn't support it, it turns out.

rust-lang/rust#7539

Very interestingly Unicode has an opinion on what an identifier should be in a programming language TR31, which is now pretty standardized.

Swift is using Punycode too. https://github.com/apple/swift/blob/main/docs/ABI/Mangling.rst#identifiers

From what I can see, we are very much going at it alone by supporting Unicode in symbol names. It's a bit of a surprise that we haven't hit this before now.

EDIT: I couldn't get a c++ compiler to not emit Unicode, which is interesting.

@kinke
Copy link
Contributor

kinke commented Jun 13, 2022

It isn't just Windows that doesn't support it, it turns out.

At least wasn't just Windows back in 2013. ;) - I'd say keeping Unicode as-is in mangled names is nice and preferable for readability, and an obfuscating encoding scheme only a workaround that we might not really need.

@rikkimax
Copy link
Contributor Author

I've updated the bug report with my proposed patch. In favor of waiting for it to appear in the wild before fixing it.

@rikkimax rikkimax closed this Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Review:Needs Spec PR A PR updating the language specification needs to be submitted to dlang.org Severity:Bug Fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants