Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#[no_mangle] is unsafe in the presence of name collisions #28179

Closed
geofft opened this issue Sep 2, 2015 · 32 comments
Closed

#[no_mangle] is unsafe in the presence of name collisions #28179

geofft opened this issue Sep 2, 2015 · 32 comments
Labels
A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@geofft
Copy link
Contributor

geofft commented Sep 2, 2015

On some platforms (at least GNU/Linux, but I hear Windows and several others too), if you link together two static libraries that both export a symbol of the same name, it's undefined which symbol actually gets linked. In practice on my machine, the first library seems to win. This lets you defeat type/memory safety without the unsafe keyword, by having two crates export a #[no_mangle] pub fn with different signatures but compatible calling conventions:

// one.rs
#![crate_type = "lib"]

#[no_mangle]
pub fn convert(x: &'static i32) -> Result<i32, f32> {
    Ok(*x)
}
// two.rs
#![crate_type = "lib"]

#[no_mangle]
pub fn convert(x: &'static f32) -> Result<i32, f32> {
    Err(*x)
}
// transmute.rs
extern crate one;
extern crate two;             

fn main() {
    static X: f32 = 3.14;
    let y: i32 = two::convert(&X).unwrap();
    println!("{}", y);
}
geofft@titan:/tmp/transmute$ rustc one.rs
geofft@titan:/tmp/transmute$ rustc two.rs
geofft@titan:/tmp/transmute$ rustc transmute.rs -L .
geofft@titan:/tmp/transmute$ ./transmute 
1078523331

Despite the stated call to two::convert, it's actually one::convert that gets called, which interprets the argument as a &'static i32. (It may be clearer to understand with this simpler example, which doesn't break type safety.)

On at least GNU/Linux but not other platforms like Windows or Darwin, dynamically-linked symbols have the same ambiguity.

I don't know what the right response is here. The following options all seem pretty reasonable:

  1. Acknowledge it and ignore it. Maybe document it as a possible source of unsafety, despite not using the unsafe keyword.
  2. Have #[no_mangle] export both a mangled and un-mangled name, and have Rust crates call each other via mangled names only, on the grounds that #[no_mangle] is for external interfaces, not for crates linking to crates. ("External interfaces" includes other Rust code using FFI, but FFI is unsafe.) This is analogous to how extern "C" fns export both a Rust-ABI symbol as well as a C-ABI shim, and a direct, safe Rust call to those function happens through the Rust ABI, not through the C ABI. I'm pretty sure that all production uses of #[no_mangle] are extern "C", anyway (see e.g. Can't define unsafe extern "C" fn #10025).
  3. Deprecate #[no_mangle] on safe functions and data, and introduce a new #[unsafe_no_mangle], so it substring-matches unsafe. (#[no_mangle] on unsafe functions or mutable statics is fine, since you need the unsafe keyword to get at them.)

All of these are, I think, backwards-compatible.

@alexcrichton
Copy link
Member

I think this even leads to a linker error on some platforms (depending on the situation), so in that sense I'd also be fine just having a set of all #[no_mangle] symbols for each crate and if they overlap we generate a hard compile time error. While strictly not backwards compatible I believe it's practically fine and it's basically what most crates what anyway

@cuviper
Copy link
Member

cuviper commented Sep 3, 2015

You can #[no_mangle] override an existing mangled function too:

fn main() {
  println!("ok")
}

#[no_mangle]
#[allow(non_snake_case)]
pub fn _ZN2io5stdio6_print20h94cd0587c9a534faX3gE() {
    unreachable!()
}

This makes the playpen timeout on stable 1.2: http://is.gd/sxdUl3

@geofft
Copy link
Contributor Author

geofft commented Sep 3, 2015

While strictly not backwards compatible I believe it's practically fine

Yeah, I'd argue that having multiply-defined #[no_mangle] symbols is, in essence, undefined behavior (in the sense that nobody defined it), so making it a hard error seems fine.

@nagisa
Copy link
Member

nagisa commented Sep 3, 2015

I doubt there’s anything sensible (other than writing our own linker) that can be done here to truly fix the issue.

I’m not against the idea of making less of these cases possible, though. e.g.

#[no_mangle]
fn main(){}

doesn’t work already. But most of the proposed solutions (exception being keeping more symbols in memory) sound like a non-option to me.

@arielb1
Copy link
Contributor

arielb1 commented Sep 8, 2015

@nagisa

#[no_mangle] will always be always unsafe as you can use it to hijack C library functions, .e.g make malloc do something evil.

@nagisa
Copy link
Member

nagisa commented Sep 8, 2015

@arielb1 i didn’t say anything about safety; only about our capability to do anything about this issue.

Either way, if you have to be wary of code you control unknowingly hijacking other code or doing evil things, something went very wrong where Rust (or any language, really) can’t help.

@arielb1
Copy link
Contributor

arielb1 commented Sep 15, 2015

@nagisa

JS is designed for that use-case. Rust is theoretically supposed to fill that role too, but both rustc and the type-system are way not sufficiently trustworthy for that.

@bstrie bstrie added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-nominated labels Oct 19, 2016
@bstrie
Copy link
Contributor

bstrie commented Oct 19, 2016

I agree that unmangled symbol collisions should be a compilation error. Breakage here is extraordinarily unlikely, and the only code that it could break is code that won't be working as intended anyway, which is exactly the sort of the code that we ought to break.

@nikomatsakis nikomatsakis added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 20, 2016
@nikomatsakis
Copy link
Contributor

Reassigning to lang team. What a pain.

@nikomatsakis
Copy link
Contributor

triage: P-low

Seems like a real problem, if not that concerning. @rust-lang/lang feels that the fix proposed by @alexcrichton in this comment is the correct one. Check at link time for duplicate #[no_mangle] symbols in rust code; don't try to solve similar problems that might occur with C code and other stuff out of our purview.

@rust-highfive rust-highfive added P-low Low priority and removed I-nominated labels Nov 3, 2016
@steveklabnik steveklabnik added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed A-compiler labels Mar 24, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 24, 2017
@bstrie bstrie added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Jan 24, 2018
@vi
Copy link
Contributor

vi commented Jan 27, 2018

Why is calling #[no_mangle] functions safe in the first place?

Also #[no_mangle] feels like a first step towards #[naked], which should also require unsafe to be called.

@geofft
Copy link
Contributor Author

geofft commented Jan 27, 2018

Why is calling #[no_mangle] functions safe in the first place?

They can't do anything that normal functions can't do. And there isn't in theory any way to use them to violate safety (this issue is a bug; it would be totally fine in theory for rustc to fail to compile code with any symbol collisions in any objects, whether they come from Rust or from another language).

#[naked], which should also require unsafe to be called.

As far as I can tell, this is basically because naked functions have no guaranteed ret opcode, so #[naked] fn foo() -> {println!("hi!")} will just let the instruction pointer run past the end of a function, which is rather memory-unsafe. If they did (and if there were a more coherent story about accessing arguments and returning values from pure Rust code), I'm having trouble seeing what makes them unsafe.

That is, the reason #[naked] functions should require unsafe is that the compiler expects a particular thing to be done by the implementation in order for them to be safely callable, and it has no way to check that. (I sort of think that, like unsafe traits, it should be possible for the implementor to say "Trust me on this" and others to be able to call the function from safe code, but maybe this is a discussion for the naked function RFC, not here.) There's nothing special that #[no_mangle] functions need to take care to do; they're just the same as normal functions except for their linkage name.

@vi
Copy link
Contributor

vi commented Jan 27, 2018

They can't do anything that normal functions can't do.

It's not the function itself that is unsafe but the call. The call to something unmangled is FFI-ish, therefore looks (and is) unsafe.

#[naked] functions are unsafe per se, so calls to them should be unsafe not directly, but as a consequence.

@nikomatsakis
Copy link
Contributor

There are some other examples of abusing #[no_mangle] that may be worth mentioning.

For example, this hack overrides main with custom assembly:

#![no_main]

#[link_section=".text"]
#[no_mangle]
pub static main: [u32; 9] = [
    3237986353,
    3355442993,
    120950088,
    822083584,
    252621522,
    1699267333,
    745499756,
    1919899424,
    169960556,
];

Sigh.

@vi
Copy link
Contributor

vi commented Jan 29, 2018

#![forbid(unsafe)] code should be assumed free of such hacks...

@LeSeulArtichaut LeSeulArtichaut removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020
@RalfJung
Copy link
Member

In this comment, @dtolnay brings up the idea of "unsafety at the item level". I feel like that is also relevant for this issue here, which basically concerns an unsafe function-level attribute.

@bstrie
Copy link
Contributor

bstrie commented Feb 3, 2022

#[export_name] has the same theoretical problem as #[no_mangle] here, right? https://doc.rust-lang.org/reference/abi.html#the-export_name-attribute

@tarcieri
Copy link
Contributor

tarcieri commented Feb 3, 2022

@bstrie there's a tracking issue for all attributes with unsafe behaviors here, which also tracks adding lints for them: #82499

@jhpratt
Copy link
Member

jhpratt commented Apr 26, 2023

This issue will be resolved by RFC 3325 if it is accepted and stabilized.

@tmandry
Copy link
Member

tmandry commented Jan 11, 2024

I think this is just a compiler bug. Am I missing something?

On ELF for example, if we linked #[no_mangle] symbols as STB_GLOBAL, we would get a linker error with multiple definitions. I can only infer that we must be linking them as STB_WEAK, but don't see any reason why we should be. That's the binding we would use if we wanted to combine multiple definitions of the same thing, e.g. multiple monomorphizations of a crate's generics.

Speculating a bit more, there's a chance that weak binding was a convenient way to allow multiple codegen units to instantiate the same function, but I think we could prevent that pretty easily for #[no_mangle] and #[export_name] functions.

I would be surprised if the equivalent linker behavior did not exist in the executable formats of every major platform we support.

@ChrisDenton
Copy link
Member

A linker will trawl through static libraries looking for missing symbols. Once it finds a relevant symbol it essentially extracts that object file from the static library then stops looking. Similarly, if a symbol is already resolved it won't go looking in static libraries for others of the same name.

In this scenario we can always use no_mangle to override symbols in static libraries and this behaviour may even be desirable.

@tmandry
Copy link
Member

tmandry commented Jan 11, 2024

I believe most linkers have options that can force them to link an object file. But yes, there are still opportunities for interactions when there's a global symbol namespace. The important thing for Rust is that the behavior is predictable and consistent.

@ChrisDenton
Copy link
Member

Sure but we can't make it a safe attribute? Even if we can guard against specific issues.

@cuviper
Copy link
Member

cuviper commented Jan 11, 2024

Overriding malloc was mentioned earlier: #28179 (comment)

Here's a trivial example of that:

fn main() {
    println!("Hello!");
}

#[no_mangle]
pub fn malloc() -> usize { 1 }

i.e. safe code that crashes with a segfault.

@tmandry
Copy link
Member

tmandry commented Jan 11, 2024

I agree that the attribute still needs to be unsafe because of its potential interactions with native code, the ability to override the same symbol in shared libraries, and other potential weird linking behavior the compiler can't check.

That said, I think we could remove a lot of the fangs with thoughtful consideration of better linking defaults. That would need to come with ways of overriding them via link attributes.

@RalfJung
Copy link
Member

This is fixed by #123757. :)

@tmandry I don't know enough about ELF linkers to understand what could be done here by changing linker flags. Could you open a new issue describing that? I think this one here is already filled with too much discussion going in a different direction.

@geofft
Copy link
Contributor Author

geofft commented Aug 28, 2024

Super exciting to see this issue from 2015 / Rust 1.2 resolved - thanks to everyone who made it happen! I agree that the unsafety check is the best that can be done here.


Regarding the comments about static linking - it appears to me that there's a distinction between linking an object file and linking an archive file containing that exact same object file. The linker complains about collisions in STB_GLOBAL symbols in two .o files but not in two .a files (or .rlib files, which are also ar archives).

$ cat main.c
#include <stdio.h>

extern int collision(void);

int main(void) {
	printf("%d\n", collision());
}
$ cat collide1.c
int collision(void) {
	return 1;
}
$ cat collide2.c
int collision(void) {
	return 2;
}
$ cc -c collide1.c
$ cc -c collide2.c
$ cc -o main main.c collide1.o collide2.o
/usr/bin/ld: collide2.o: in function `collision':
collide2.c:(.text+0x0): multiple definition of `collision'; collide1.o:collide1.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
$ ar rcs libcollide1.a collide1.o
$ ar rcs libcollide2.a collide2.o
$ cc -o main main.c -lcollide1 -lcollide2 -L.
$ ./main
1
$ cc -o main main.c -lcollide2 -lcollide1 -L.
$ ./main
2
$ readelf -s collide1.o | grep collision
     9: 0000000000000000    15 FUNC    GLOBAL DEFAULT    1 collision
$ readelf -s libcollide1.a | grep collision
     9: 0000000000000000    15 FUNC    GLOBAL DEFAULT    1 collision

(GCC 9.4 and ld 2.34 as shipped in Ubuntu 20.04)

But you do get the error if you use --whole-archive:

$ cc -o main main.c -Wl,--whole-archive -lcollide2 -lcollide1 -Wl,--no-whole-archive -L.
/usr/bin/ld: ./libcollide1.a(collide1.o): in function `collision':
collide1.c:(.text+0x0): multiple definition of `collision'; ./libcollide2.a(collide2.o):collide2.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

GNU ld's documentation of that option says that it makes the linker "include every object file in the archive in the link, rather than searching the archive for the required object files." I'm interpreting that to mean that with the two static archives, it decides that it's found collision in libcollide1's collide1.o, and when going through libcollide2, it doesn't need anything from collide2.o and skips it.

This seems like relatively intentional behavior on the part of the linker, and I'm not sure it would work / be a good idea for rustc to use --whole-archive for rlibs, but maybe that's worth exploring. I don't think it will help for collisions with libc and other standard libraries, though, because if you leave off the -Wl,--no-whole-archive flag at the end, you get a whole lot of collisions from libraries implicitly added by the compiler driver.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests