Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial support for raw-dylib linkage #84171

Merged
merged 1 commit into from
Jun 6, 2021

Conversation

ricobbe
Copy link
Contributor

@ricobbe ricobbe commented Apr 13, 2021

First cut of functionality for issue #58713: add support for #[link(kind = "raw-dylib")] on extern blocks in lib crates compiled to .rlib files. Does not yet support #[link_name] attributes on functions, or the #[link_ordinal] attribute, or #[link(kind = "raw-dylib")] on extern blocks in bin crates; I intend to publish subsequent PRs to fill those gaps. It's also not yet clear whether this works for functions in extern "stdcall" blocks; I also intend to investigate that shortly and make any necessary changes as a follow-on PR.

This implementation calls out to an LLVM function to construct the actual .idata sections as temporary .lib files on disk and then links those into the generated .rlib.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @petrochenkov (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 13, 2021
@rust-log-analyzer

This comment has been minimized.

@petrochenkov
Copy link
Contributor

Previous attempt - #71497 @tinaun @est31.
Nice to see some progress here, I'll review this during the weekend, most likely.

@rust-log-analyzer

This comment has been minimized.

@kennykerr
Copy link
Contributor

I tested a build of this PR with the Windows crate it works wonderfully! 🎉

Here's the test branch.

@rust-log-analyzer

This comment has been minimized.

compiler/rustc_metadata/src/native_libs.rs Outdated Show resolved Hide resolved
compiler/rustc_metadata/src/native_libs.rs Outdated Show resolved Hide resolved
compiler/rustc_metadata/src/native_libs.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_ssa/src/lib.rs Outdated Show resolved Hide resolved
compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/back/archive.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/back/archive.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/back/archive.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/back/archive.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/back/archive.rs Outdated Show resolved Hide resolved
@petrochenkov
Copy link
Contributor

I also have a big question about the implementation strategy.

In this PR the imports are added to .rlib archives as regular object code (from a temporary .lib file).
(As a result the produced .rlib will be a mix of a static library and an import library, which some people find weird.)
They are also written to the rlib's metadata, but it should be unnecessary, the written metadata is never used.
It's not clear why a temporary file is used in this case, linker is not involved here in any way, so we should be able to add object code to the archive directly?

The alternative strategy is to delay generation of the temporary .lib file and its addition to .exe/.dll until the final linking step.
In that case the metadata of all rlibs in the dependency graph will indeed be used to produce the final import table and write it to a .lib file.
A temporary file will also make sense in this case because it will be fed to the linker which will add it to the final .exe/.dll.

Any specific reasons to choose the first strategy and not the second one?
What happens it two rlibs import the same symbol and then linked together into a single exe?

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 18, 2021
@ricobbe
Copy link
Contributor Author

ricobbe commented Apr 20, 2021

I also have a big question about the implementation strategy.

In this PR the imports are added to .rlib archives as regular object code (from a temporary .lib file).
(As a result the produced .rlib will be a mix of a static library and an import library, which some people find weird.)
They are also written to the rlib's metadata, but it should be unnecessary, the written metadata is never used.
It's not clear why a temporary file is used in this case, linker is not involved here in any way, so we should be able to add object code to the archive directly?

The use of a temporary file is primarily a consequence of my choice to use LLVM's function to generate the import library. The relevant function generates the import library and writes it to disk; I haven't yet found a way to perform the first action without the second without modifying LLVM. (An early look at the implementation of writeImportLibrary indicated pretty strongly that the implementation of the two actions are not easily separated.)

The alternative strategy is to delay generation of the temporary .lib file and its addition to .exe/.dll until the final linking step.
In that case the metadata of all rlibs in the dependency graph will indeed be used to produce the final import table and write it to a .lib file.
A temporary file will also make sense in this case because it will be fed to the linker which will add it to the final .exe/.dll.

Any specific reasons to choose the first strategy and not the second one?

Honestly, the possibility never occurred to me; I'm new enough to rustc that metadata wasn't really on my radar screen.

What happens it two rlibs import the same symbol and then linked together into a single exe?

Yes, that is a use case that I've been aware of; I'm going to investigate tomorrow to see what happens there. (I'm also curious about what happens if one does the analogous actions with C static libraries linked into a single exe or dll. The only thing I've heard about that so far is that "nobody does that," so I'm just going to have to try it myself.)

If I understand correctly, we won't be able to avoid the weird-hybrid-of-static-library-and-import-library in all cases, such as when someone compiles an rlib with raw-dylib externs with the --crate-type static-lib switch. I'm still happy to investigate the metadata-based solution, however.

@rust-log-analyzer

This comment has been minimized.

@retep998
Copy link
Member

What happens it two rlibs import the same symbol and then linked together into a single exe?

As described in the RFC, the generated import library should map from the Rust mangled name to a dll + symbol tuple. Thus, if there are two places that both link to the same symbol but from different dlls, they should each correctly resolve to their specific dll as there will be no ambiguity between the mangled names. This is different from the traditional linker model where import libraries map from the unmangled (but still possibly calling convention decorated) name to a dll + symbol tuple, which would be ambiguous and result in linker shenanigans.

@petrochenkov
Copy link
Contributor

petrochenkov commented Apr 20, 2021

@retep998
My question was about this specific implementation (idata bundled into rlibs) rather than about language behavior, from the language point of view it should "just work", I don't disagree with that.

@petrochenkov
Copy link
Contributor

petrochenkov commented Apr 20, 2021

@ricobbe
The current solution with bundling idata to rlibs (*) may potentially be better for performance in some cases.

For example, if we build several dlls and executables using the same rlib with imports, then the idata will be generated only once and then reused.
In my alternative we'd have to do this work again and again for each final binary.
Although I have no idea how costly the idata generation is, maybe it's very cheap then this whole argument is moot.

(*) Assuming it works in general, merging idata from different rlibs work, mixed static+import libraries don't confuse mingw linker, etc.

UPD: We could potentially use the bundle modifier (https://github.com/rust-lang/rfcs/blob/master/text/2951-native-link-modifiers.md#bundle, #83507) to implement both strategies and switch between them if necessary.

@petrochenkov
Copy link
Contributor

Honestly, the possibility never occurred to me; I'm new enough to rustc that metadata wasn't really on my radar screen.

Taking the metadata from all rlibs and using them during the final linking step is how other linking kinds work (like dynamic or static-nobundle).
Native libraries in those cases are actually linked in only when producing the final executable or dynamic library, before that they exist only as metadata records in rlibs.

@ricobbe
Copy link
Contributor Author

ricobbe commented Apr 21, 2021

@petrochenkov

The current solution with bundling idata to rlibs (*) may potentially be better for performance in some cases.

For example, if we build several dlls and executables using the same rlib with imports, then the idata will be generated only once and then reused.
In my alternative we'd have to do this work again and again for each final binary.
Although I have no idea how costly the idata generation is, maybe it's very cheap then this whole argument is moot.

(*) Assuming it works in general, merging idata from different rlibs work, mixed static+import libraries don't confuse mingw linker, etc.

UPD: We could potentially use the bundle modifier (https://github.com/rust-lang/rfcs/blob/master/text/2951-native-link-modifiers.md#bundle, #83507) to implement both strategies and switch between them if necessary.

Assuming the current solution works in at least some cases and doesn't completely break mingw, do you see a resolution to this question as necessary in order to complete this PR, or is this something that we could possibly address as follow-on work? If possible, I'd like to get the current implementation in (subject to caveats above, of course), so we could get some experience with the feature while continuing to discuss this issue. (We've got some folks here who are really ready to start using this functionality, so there are definitely opportunities for feedback.)

It may be worth considering that the mingw platform does seem to be having some difficulties with this solution, although I haven't yet determined if this is due to an incompatibility between this implementation technique and the mingw linker, or if it's just because I haven't yet got my windows-gnu toolchain set up correctly. I intend to continue investigating that question.

@petrochenkov
Copy link
Contributor

@ricobbe

do you see a resolution to this question as necessary in order to complete this PR, or is this something that we could possibly address as follow-on work?

No, not necessary, can be done as a follow up work.

@ricobbe
Copy link
Contributor Author

ricobbe commented May 3, 2021

Status update: the current implementation doesn't work with the windows-gnu toolchain. The test succeeds in creating the .exe file, but it AVs on execution, and the failure seems to occur when it tries to call one of the functions imported from a .dll. I've been working with @mati865 to address this, but it's a slow process (mostly because the time difference makes for high-latency communcation).

The case where multiple .rlibs import functions from the same raw-dylib DLL and are linked into the same final EXE does work as intended with the current implementation on windows-msvc (and, IIRC, the linker does remove any duplicate symbols from the final import library), but I haven't yet been able to test this on windows-gnu.

So my plan at this point is to switch to the implementation strategy suggested above: adding raw-dylib import information to .rlib metadata, and deferring the actual generation of the import library until we're linking the .exe, .dll, static lib, or generally anything that isn't an .rlib. @petrochenkov, you had some concerns about the performance consequences of this decision, because we'd be generating the import libraries potentially many times for a given rlib; do you think I need to investigate that further? I don't have hard performance numbers on that, but all of my experience so far is that generating the import libraries does not take much time.

@mati865
Copy link
Contributor

mati865 commented May 3, 2021

I'm not sure if that switch will actually help. Like I said in the private message LLVM generates MSVC style import libraries which are poorly supported by BFD:

Ah, those are MSVC style import libraries.
We are surely hitting https://sourceware.org/bugzilla/show_bug.cgi?id=25541 and maybe https://sourceware.org/bugzilla/show_bug.cgi?id=25374. Just confirmed it works fine with LLD when targetting windows-gnu.

@petrochenkov
Copy link
Contributor

petrochenkov commented May 4, 2021

I'm not sure if that switch will actually help.

Well, at least it will avoid mixed import-static libraries (*), which is one of the problems.

LLVM generates MSVC style import libraries

Are there multiple styles of import libraries? o_O
What are the differences between them?

Can rustc generate the same style as used in https://github.com/retep998/winapi-rs/tree/0.3/x86_64/lib, because that style is something that is known to work.

(Also, is there a style that works for both msvc and gnu targets?)


(*) Modulo the staticlib case, where there's no other choice, but in that case the mixed library is not an intermediate result like rlib, but a final product, so consuming it is not our problem.

@ChrisDenton
Copy link
Member

Are there multiple styles of import libraries? o_O

According to the spec there are "short" and "long" styles. The short format provides only the information needed for the linker to construct the .idata section. The long format provides actual object files with .idata sections.

A quick look at winapi suggests they create .def files and pass this to a tool that generates the .a libraries. The generated libraries you linked to seem to be using the long format.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 1, 2021
@ricobbe
Copy link
Contributor Author

ricobbe commented Jun 1, 2021

I just pushed a fix for the failure on i686; I figured I'd squash commits again after dealing with any review feedback for the change.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 2, 2021
@mstorsjo
Copy link

mstorsjo commented Jun 4, 2021

LLVM generates MSVC style import libraries

Are there multiple styles of import libraries? o_O
What are the differences between them?

Can rustc generate the same style as used in https://github.com/retep998/winapi-rs/tree/0.3/x86_64/lib, because that style is something that is known to work.

(Also, is there a style that works for both msvc and gnu targets?)

FWIW, I made an attempt at a patch for LLVM to make it possible to produce GNU style long import libraries at some point, but I never finished and upstreamed it as my code left like a big mess, but if necessary I could try to dig it up.

Overall, LLD should work with both formats. Binutils ld.bfd kinda works with the MSVC style but has known issues if trying to link more than one such library. MSVC link.exe kinda works with the GNU style, but there's vague gotchas (e.g. if you use an import library created as the output of -Wl,--out-implib from ld.bfd, link.exe often runs into problems if linking more than one such library, but the ones created by binutild dlltool often works better IIRC).

With the rust generated internal import libraries, would you still end up linking against import libs from the SDK, or would it only use the generated ones? If linking against two different import libraries for the same DLL, you might run into problems - at least for GNU style import libraries. Or maybe it'd work fine and you'd just have two separate DLL import entries for the same DLL name?

Oh, that is probably due to differences in how name mangling works on Windows. Looks like lld-link will try to mangle the name before calling createImportLibrary. Makes sense, as one needs to put the decorated/mangled version of the name in the .def if you were creating it by hand. So we need to appropriately mangle the symbol names on x86.

Edit: Actually, you should just be able to specify the symbol names undecorated in a def file. Unsure how to make LLVM output the equivalent here.

There's indeed a vague difference between GNU tools there; with e.g. binutils dlltool (where you just create an import library from a def file and nothing else, and using the option -k) you'd provide the symbol with the trailing stdcall decoration.

When LLD links a DLL with stdcall symbols, the def file contains the symbol names without trailing decorations. After completing linking and finding all actual symbols, it does some amount of fixup, e.g. if the def file specifies to export the symbol MyFunc but no such symbol exists (or _MyFunc to be more precise), it looks for a corresponding stdcall decorated _MyFunc@<n> - that symbol name then ends up in the import library.

But without actual symbols, there's nowhere to deduce the suffix from, so then only the dlltool style behaviour with the def file containing the decorated form would work.

@bors

This comment has been minimized.

…k(kind = "raw-dylib")].

This does not yet support #[link_name] attributes on functions, the #[link_ordinal]
attribute, #[link(kind = "raw-dylib")] on extern blocks in bin crates, or
stdcall functions on 32-bit x86.
@ricobbe
Copy link
Contributor Author

ricobbe commented Jun 5, 2021

@mstorsjo

With the rust generated internal import libraries, would you still end up linking against import libs from the SDK, or would it only use the generated ones?

That depends on whether you specify kind = "dylib" or kind = "raw-dylib" in the #[link(...)] attribute, I would think. I'm not yet sure how things would work to link against both a rustc-generated import library and a 3rd-party import library.

@ricobbe
Copy link
Contributor Author

ricobbe commented Jun 5, 2021

@rustbot label -S-waiting-on-author +S-waiting-on-review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 5, 2021
@petrochenkov
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Jun 5, 2021

📌 Commit 6aa45b7 has been approved by petrochenkov

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 5, 2021
@bors
Copy link
Contributor

bors commented Jun 6, 2021

⌛ Testing commit 6aa45b7 with merge 9a57617...

@bors
Copy link
Contributor

bors commented Jun 6, 2021

☀️ Test successful - checks-actions
Approved by: petrochenkov
Pushing 9a57617 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 6, 2021
@bors bors merged commit 9a57617 into rust-lang:master Jun 6, 2021
@rustbot rustbot added this to the 1.54.0 milestone Jun 6, 2021
@ricobbe ricobbe deleted the raw-dylib-via-llvm branch June 7, 2021 19:03
@crlf0710 crlf0710 added the F-raw_dylib `#![feature(raw_dylib)]` label Jun 12, 2021
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Jul 7, 2021
…enkov

Partial support for raw-dylib linkage

First cut of functionality for issue rust-lang#58713: add support for `#[link(kind = "raw-dylib")]` on `extern` blocks in lib crates compiled to .rlib files.  Does not yet support `#[link_name]` attributes on functions, or the `#[link_ordinal]` attribute, or `#[link(kind = "raw-dylib")]` on `extern` blocks in bin crates; I intend to publish subsequent PRs to fill those gaps.  It's also not yet clear whether this works for functions in `extern "stdcall"` blocks; I also intend to investigate that shortly and make any necessary changes as a follow-on PR.

This implementation calls out to an LLVM function to construct the actual `.idata` sections as temporary `.lib` files on disk and then links those into the generated .rlib.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-raw_dylib `#![feature(raw_dylib)]` merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.