Using compiler-builtins with FFI, linking with libc? #345

Open
apullin opened this issue Feb 29, 2020 · 10 comments

@apullin

apullin commented Feb 29, 2020

I am having some issues using a rust staticlib that is being called from C via FFI, being built for an embedded target (Cortex M). This issue is sort-of a result of needing this library, not necessarily an error in the code.

To get the linking step to succeed, I have to link the rust staticlib in last. If not, I will get link errors for multiple definitions of memset, where one implementation is providing the unmangled symbol from this compiler-builtins crate.

Linking the rustlib last does resolve that. However, it causes all uses of memset in the C code to be linked to the compiler-builtins implementations.

Beyond that, on a soft-float MCU, even an FP multiply like float a=b*c ends up linking __aeabi_fmul to the Rust implementation!

Example of how I invoke the linker:
arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -specs=nano.specs -specs=nosys.specs -TSTM32L072CZEx_FLASH.ld -lnosys -Wl,-Map=target/double.map,--cref -Wl,--gc-sections target/startup_stm32l072xx.o target/stm32l0xx_it.o target/main.o target/thumbv6m-none-eabi/debug/libdouble_input.a -o target/double

(where the name "double" is because I am using the libdouble_input from FFI examples and adding a few other tests: https://github.com/alexcrichton/rust-ffi-examples/tree/master/c-to-rust )

Is this a known limitation?
Is there a workaround to avoid that level of mixing between the C code and the Rust code, given the overlapping implementations?

As far as I understand the notes in the README, they do not cover this case or the behavior I am trying to get.

I have tried partially linking all the C code first with -lc, and it does pick up the libc implementations of memcpy, float multiply, etc., but then the final linking step fails because there are still two memset objects in the output files.
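One way to see which symbols collide before the link fails is to compare the strong global definitions in each archive. The sketch below is only an illustration: `dup_symbols` is a hypothetical helper name, and it assumes GNU-style `nm -A -g --defined-only` output, where each line starts with `archive:member:` and ends with a type letter and a symbol name.

```shell
# Hypothetical helper: reads `nm -A -g --defined-only` output on stdin and
# prints each symbol that has a strong global definition in more than one
# archive. Weak definitions (type W or V) are skipped, since they do not
# cause "multiple definition" errors.
dup_symbols() {
    awk 'NF >= 3 && $(NF-1) ~ /^[TDBR]$/ {
             split($1, parts, ":")        # parts[1] is the archive name
             key = parts[1] SUBSEP $NF
             if (!(key in seen)) {        # count each (archive, symbol) pair once
                 seen[key] = 1
                 count[$NF]++
             }
         }
         END { for (s in count) if (count[s] > 1) print s }' | LC_ALL=C sort
}

# Possible usage (the libc path is a placeholder, not taken from this thread):
#   arm-none-eabi-nm -A -g --defined-only \
#       target/thumbv6m-none-eabi/debug/libdouble_input.a /path/to/libc.a | dup_symbols
```

Any name this prints is a candidate for the kind of collision described above.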

@alexcrichton
Member

Yeah, unfortunately there are a few things in play here which make this situation a bit unfortunate:

  • We can't pick Rust-specific names for all our intrinsics. The names of intrinsics are hardcoded into LLVM (AFAIK) so we can't have our own separate namespace from the standard system compiler builtins. This means that we inevitably have to deal with the clashing problem at some point.
  • Most system compiler builtins have one function per object file, but the compiler-builtins crate in Rust does not do that. This gives rise to the duplicate symbol error you're likely seeing. The problem here is that your project may require a compiler-builtins Rust object file due to one symbol, and then that object file might also have a symbol like memcpy which was already pulled in from elsewhere (e.g. libc.a). This then clashes and causes the linker to fail.

Unfortunately I don't really know of a great workaround for this. The "best" thing to do would be to somehow excise the compiler-builtins objects out of the Rust object file, and then hope the system compiler-builtins has enough intrinsics for the Rust code you're building. In the limit it doesn't, but for most projects it should suffice.

@apullin
Author

apullin commented Mar 2, 2020

The second point is exactly the case:
compiler-builtins crate provides a memset that is always included in the output staticlib, so it collides with libc memset even when memset is not used in the rust code.

I am not quite sure why __aeabi_fmul does not cause the same collision, but it also ends up getting linked from only one place, which has to be Rust due to the link-Rust-last solution.

Ultimately, we may not want to excise anything: keep the compiler-builtins implementations for the Rust staticlib, and let the C code get its implementations from libc, so there is no subtle FFI going on.

After doing some tinkering, I was able to do:
arm-none-eabi-objcopy -L memset target/thumbv6m-none-eabi/debug/libdouble_input.a
then re-run the linking with -lc before any of the object files, which resulted in picking up the libc memset implementation.
It stands to reason that perhaps this could be done for all the symbols provided by compiler-builtins, so they will not be exported to any C code.
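A rough sketch of extending the `-L memset` step to every C-named symbol that compiler-builtins defines. `builtin_symbols` is a hypothetical helper, and the sketch rests on a few assumptions: GNU-style `nm` archive output (a `member.o:` header line followed by `address type name` lines), member names containing `compiler_builtins`, and Rust-mangled names starting with `_ZN` or `_R`. Those mangled names are left global here, on the assumption that other compiler-builtins objects may still reference them.

```shell
# Hypothetical helper: reads `nm -g --defined-only <archive>` output on stdin
# and prints the unmangled global symbols defined by compiler_builtins members.
builtin_symbols() {
    awk '/:$/ { in_cb = ($0 ~ /compiler_builtins/); next }   # "member.o:" header
         in_cb && NF == 3 && $3 !~ /^(_ZN|_R)/ { print $3 }' | LC_ALL=C sort -u
}

# Possible usage (file names are placeholders):
#   arm-none-eabi-nm -g --defined-only libdouble_input.a | builtin_symbols > builtins.syms
#   arm-none-eabi-objcopy --localize-symbols=builtins.syms libdouble_input.a
```

Once these symbols are local, any reference to them from other objects has to be satisfied by libc or libgcc, so this only works if the C toolchain provides every intrinsic the Rust code needs.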

@alexcrichton
Member

I think that makes sense, yeah. I also don't know what's going on with __aeabi_fmul and whatnot, but -L looks like it's localizing a symbol, which probably means it doesn't participate in linkage at all. That is sort of what you want in this case, because you don't want the Rust version of memcpy/memset/etc. to get linked in.

@archseer

archseer commented Aug 4, 2020

I'm having a similar issue on embedded where we produce a static library then link it into an embedded project (TI RTOS). We ran into issues because TI's compiler would produce unaligned reads that would work with their memcpy, but crash when the implementation they provide got replaced with compiler-builtins memcpy.

Their libc.a implementation already provides everything compiler-builtins does so it would be great if we could opt out of including it in the static library.

I've been working around it by modifying the built archive to remove all of the compiler-builtins object files:

arm-none-eabi-ar t libfoo.a | grep compiler_builtins | xargs -n 15 -I % arm-none-eabi-ar dv libfoo.a %

It's sort of slow to do this on every build though, it takes about 27s to strip all the compiler_builtins*.o files.
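The slowness is plausibly because the command above runs `ar` once per member (as far as I can tell, GNU xargs treats `-n` and `-I` as mutually exclusive, so the `-n 15` batching does not apply) and each run rewrites the archive. A minimal sketch of doing the removal in a single `ar d` call follows. `strip_builtins` is a hypothetical helper name, and reading the tool from an `AR` variable is an assumption made here for illustration.

```shell
# Hypothetical helper: delete every compiler_builtins member from an archive
# with a single `ar d` call. AR defaults to the tool used in the comment above.
strip_builtins() {
    archive=$1
    ar_tool=${AR:-arm-none-eabi-ar}
    # grep exits non-zero when nothing matches; then there is nothing to delete.
    members=$("$ar_tool" t "$archive" | grep compiler_builtins) || return 0
    # Unquoted on purpose: the shell splits the list into one argument per
    # member name (assumes member names contain no whitespace).
    "$ar_tool" d "$archive" $members
}

# Possible usage:
#   strip_builtins libfoo.a
```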

@apullin
Author

apullin commented Oct 22, 2021

Some follow-up on this:

It appears that the cleanest solution for this might be to first do:
cargo install cargo-binutils
so then you can do:
cargo objcopy --release --lib -- --weaken

This appears to make a rust staticlib link properly in C static linking.

It looks like objcopy does not have an option to localize all symbols into the library (maybe there is a compile-time flag for this?), but weakening them in favor of the libc implementation makes sense, and is the behavior I was looking for.

As far as I can tell, this collision is unavoidable - for static linking, even global "hidden" symbols will still collide with libc builtins at link time.
I am a little surprised it has not come up with FFI for x86/amd64 systems, since it is not unheard of for people to target monolithic statically-linked executables even for "host" systems....

One other oddity to tack on here:
It looks like rustc/cargo will emit an archive .a file that includes a member pathname with a slash in it: bin/thumbv7em-none-eabihf.o
GNU binutils ar cannot handle such a pathname, though. It appears to be undefined behavior for the archive format.
LLVM, however, does appear to define this behavior and support a pathname with a slash (and an LLVM tool did build the archive via cargo):
https://sourceware.org/bugzilla/show_bug.cgi?id=28485#:~:text=https%3A//releases.llvm.org/2.8/docs/CommandGuide/html/llvm-ar.html%23%3A~%3Atext%3DThe%2520path%2520name%2520is%2520null%2520terminated%2520and%2520may%2520contain%2520the%2520slash%2520(/)%2520character

@Amanieu
Member

Amanieu commented Oct 23, 2021

The proper way to deal with this is to make sure each function in compiler-builtins is in a separate object file. We already have hacks in place to do this.

This works because linkers will only pull in an object file from an archive if it has a symbol that is needed by the compilation. If memset is already provided by libc then the memset.o from compiler builtins will not be pulled in by the linker.
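To check whether a given archive actually follows that one-function-per-object layout for a particular symbol, one option is to count how many global symbols share a member with it. This is only a sketch: `members_defining` is a hypothetical helper, and it assumes GNU-style `nm -g --defined-only` archive output (a `member.o:` header line followed by `address type name` lines).

```shell
# Hypothetical helper: reads `nm -g --defined-only <archive>` output on stdin.
# For each member that defines the given symbol, prints the member name and
# the total number of global symbols that member defines.
members_defining() {
    awk -v sym="$1" '
        /:$/    { member = substr($0, 1, length($0) - 1); next }  # "member.o:" header
        NF == 3 { total[member]++; if ($3 == sym) has[member] = 1 }
        END     { for (m in has) print m, total[m] }' | LC_ALL=C sort
}

# Possible usage (archive name is a placeholder):
#   arm-none-eabi-nm -g --defined-only libdouble_input.a | members_defining memset
```

A count above 1 means that pulling the member in for any of its other symbols also drags in the symbol being checked, which is the situation that leads to the duplicate definition.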

@samueljero

samueljero commented Nov 22, 2021

I also ran into this issue. Unfortunately, I needed some symbols from compiler_builtins. I'm assuming this is due to Rust using LLVM and that I'm compiling my C code with gcc.

Anyway, my solution was to pull the library .a apart, make all symbols from compiler_builtins weak using objcopy, then put the .a back together.

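# Unpack the archive into a scratch directory (lib.a sits one level up).
mkdir -p tmp/
cd tmp
"$ARCH-ar" x ../lib.a
# Weaken every symbol in the compiler_builtins objects.
for f in compiler_builtins-*; do
    "$ARCH-objcopy" --weaken "$f"
done
# Put the modified objects back into the original archive.
"$ARCH-ar" cr ../lib.a *
cd ..
rm -r tmp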

Would be great to have a better fix.

@apullin
Author

apullin commented Nov 23, 2021

@Amanieu Are your "hacks" in your own C-side build system?

I am encountering this problem when trying to do a pretty straightforward linking of a rustlib .a into a C program, following the path CMake -> external command to run cargo -> target the .a as an external library -> link it into the C executable being built, where gcc is the final linker.

Fortunately, the cargo objcopy solution I eventually found above makes it fairly clean, and can just be added to the rust staticlib build step.

As far as I know (and please correct me if wrong): there just is no solution for a global label collision at link-time.
So weakening or omitting symbols MUST be done.

I can only guess that truly fixing this would require a declaration of "C compiler builtins" that are extern pub fn, and let them remain as unresolved symbols to be filled in by libc at C-linking time?

Could that be accomplished with a feature flag for this crate?
Would/could it require a lang_items addition?
Possibly a separate crate that the user's staticlib project could explicitly depend upon?

@Amanieu
Member

Amanieu commented Nov 26, 2021

See the comment here which explains how we ensure the intrinsics are split into separate object files internally:

// This is the final catch-all rule. At this point we generate an
// intrinsic with a conditional `#[no_mangle]` directive to avoid
// interfering with duplicate symbols and whatnot during testing.
//
// The implementation is placed in a separate module, to take advantage
// of the fact that rustc partitions functions into code generation
// units based on module they are defined in. As a result we will have
// a separate object file for each intrinsic. For further details see
// corresponding PR in rustc https://github.com/rust-lang/rust/pull/70846
//
// After the intrinsic is defined we just continue with the rest of the
// input we were given.

The issue is that the intrinsics! macro isn't used everywhere. In particular, I believe the memory builtins are missing it.

@t-moe

t-moe commented Sep 26, 2023

Any updates on this? I ran into the same problem today.
