Add LLVM atomic memcpy intrinsics, expose in core/std #58599
Comments
@alexcrichton Do you have thoughts on this? Want to make sure I get the go-ahead before I send somebody on TWiR's CFP down a dead end. |
I suspect there'll be opinions on stabilization, but seems fine to add as unstable at least to me! |
OK cool, thanks! |
I'd like to take a stab at implementing this. |
Awesome, thanks @tmccombs ! This will unblock a lot of important work :) |
@alexcrichton On the subject of exposing in |
Currently it's largely just a "we strongly discourage platform-specific APIs" policy in libcore; there are already platform-specific pieces in the implementation, especially some math-related pieces on MSVC |
I saw there was some discussions around use cases on #59155 so I thought I'd bring up mine (and perhaps you can all cross-check me that this would be the proper solution). I maintain a crate for zeroizing memory (for e.g. wiping cryptographic keys/plaintexts) which presently relies on undefined behavior - namely mixed volatile / non-volatile memory accesses (in conjunction with compiler and hardware fences): https://github.com/iqlusioninc/crates/tree/develop/zeroize I'm presently (ab)using What I'd really like is an atomic |
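For readers unfamiliar with the pattern described in the comment above, here is a minimal sketch of zeroizing a buffer with volatile writes followed by a compiler fence. It is not the zeroize crate's actual code; the function name `zeroize_bytes` is hypothetical, and only `ptr::write_volatile` and `atomic::compiler_fence` are real standard-library APIs.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical helper: overwrite every byte of `buf` with zero using
/// volatile stores, so the compiler may not elide them as dead writes.
pub fn zeroize_bytes(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` comes from an exclusive reference, so it is valid,
        // aligned, and not aliased for the duration of the write.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Keep the compiler from reordering later memory accesses before the
    // wipe. This emits no hardware fence instruction.
    compiler_fence(Ordering::SeqCst);
}
```

A per-byte loop like this is the stable workaround; a single volatile or atomic memset would express the same intent in one call.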
I don't know why you think that mixing volatile and non-volatile accesses is UB. I see no issue with that per se. That said, atomic memset might still be useful. |
It is explicitly documented as UB: https://doc.rust-lang.org/core/ptr/fn.write_volatile.html
Here is a thread on the subject of using volatile writes for zeroing sensitive memory: https://internals.rust-lang.org/t/volatile-and-sensitive-memory/3188 |
That sentence does not say "UB", nor does it say "must not". The section on when this function is UB does not mention mixed accesses. AFAIK, it mostly refers to "normal" usage of volatile, namely interaction with IO registers: you don't want to mix volatile and non-volatile accesses on those. But I do not know of any form of UB there. @rkruppe, do you? EDIT: Opened #60972 |
Note that that thread does not reflect our current understanding of volatile accesses and the issues around that. See rust-lang/unsafe-code-guidelines#33 for the latest discussion. I'd love to amend that thread but it got auto-closed :( |
@RalfJung interesting (and also lengthy)! I'll have to read through it later. Discrepancies between volatile and atomic aside, there's still no stable API for volatile memset either (only the unstable |
remove confusing remarks about mixed volatile and non-volatile accesses These comments were originally added by @ecstatic-morse in rust-lang@911d35f and then later edited by me. The intention, I think, was to make sure people do both their reads and writes with these methods if the affected memory really is used for communication with external devices. However, [people read this as saying that mixed volatile/non-volatile accesses are UB](rust-lang#58599 (comment)), which -- to my knowledge -- they are not. So better remove this. Cc @rkruppe @rust-lang/wg-unsafe-code-guidelines
This is also particularly useful to implement a seqlock. Currently all implementations read unprotected data on the reader side with or without More details can be read at this C++ proposal. |
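To make the seqlock use case concrete, here is a rough single-writer sketch that stays within today's memory model by storing the payload as relaxed atomics instead of reading unprotected data. The type `SeqLock` and its methods are hypothetical names for illustration; the fence placement follows the commonly cited pattern for seqlocks and should be read as a sketch, not a vetted implementation.

```rust
use std::sync::atomic::{fence, AtomicU64, AtomicUsize, Ordering};

/// Hypothetical single-writer seqlock protecting `N` 64-bit words.
pub struct SeqLock<const N: usize> {
    seq: AtomicUsize,      // odd while a write is in progress
    data: [AtomicU64; N],  // payload, accessed element-wise with Relaxed
}

impl<const N: usize> SeqLock<N> {
    pub fn new(init: [u64; N]) -> Self {
        Self { seq: AtomicUsize::new(0), data: init.map(AtomicU64::new) }
    }

    /// Must only be called by one writer at a time (assumption of this sketch).
    pub fn write(&self, value: [u64; N]) {
        let s = self.seq.load(Ordering::Relaxed);
        self.seq.store(s.wrapping_add(1), Ordering::Relaxed); // mark "writing"
        fence(Ordering::Release);
        for (slot, v) in self.data.iter().zip(value) {
            slot.store(v, Ordering::Relaxed);
        }
        self.seq.store(s.wrapping_add(2), Ordering::Release); // publish
    }

    /// Retries until it observes a snapshot not torn by a concurrent write.
    pub fn read(&self) -> [u64; N] {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 & 1 == 1 {
                std::hint::spin_loop();
                continue;
            }
            let mut out = [0u64; N];
            for (o, slot) in out.iter_mut().zip(&self.data) {
                *o = slot.load(Ordering::Relaxed);
            }
            fence(Ordering::Acquire);
            if self.seq.load(Ordering::Relaxed) == s1 {
                return out;
            }
        }
    }
}
```

The two element-wise loops are exactly what an atomic memcpy would replace: the payload could then be an ordinary byte buffer copied in one call.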
It depends on LLVM semantics ensuring that read-write races are not UB, and instead yield
That's interesting, thanks! I am surprised though that they don't even mention the LLVM semantics, which are an alternative approach to solve the issue with SeqLocks. Seems like relevant related work. |
Considering how much focus there is on Wasm, it would be great for LLVM "unordered" loads and stores to have a path to stabilization in order to have stable-Rust facilities for implementing host services for multi-threaded Wasm. (Since Wasm code is untrusted, if Wasm thread A calls into a host service and designates a range of Wasm heap for reading/writing, the Rust code implementing the host service can't be optimized with the assumption of freedom from data races in that memory region, because Wasm thread B could access the region without adhering to the Rust/C11/C++11 rules.) |
Wait a second. Actually, It could be argued that no sane compiler optimization or known hardware behavior could ever turn this UB into something dangerous in practice. But well... UB is UB, and we never know what the future holds. So if you want to strictly avoid UB, the correct thing to do in your case is to tweak the WASM compiler so that all reads and writes to the WASM heap that may be used to interact with the host are atomic. They may be unordered, they may be relaxed, it doesn't really matter from an UB-correctness point of view (though it may matter from a performance point of view). But they must be atomic. |
Without these intrinsics, is there a supported way to read or write shared memory that is being concurrently and arbitrarily mutated by another processor? If the other processor is executing code in a different security boundary, then there may be no way to prevent it from mutating the memory with an arbitrary mix of atomic, non-atomic, and unaligned accesses. But clearly this works at the processor level, because this is the foundation for most OS and virtualization communication models. Right now, for example, Google's crosvm VMM uses an (arbitrary?) mix of ptr::{write_volatile, copy, write_bytes} while accessing memory that is shared between its process and a VM. It seems to me that at least some of this is UB today, and therefore future LLVM optimizations could break this code, but I wouldn't know how to fix this with what Rust offers today (especially while retaining good performance). The requested intrinsics would seem to be what crosvm, and software like it, needs. |
Yes, use atomic accesses. A data race of an atomic and a non-atomic access in one program is UB if at least one of them is a write; but if another separately compiled program races with your atomic accesses, the only side that has to expect problems is that side. In some way, the data race is UB "only on the non-atomic side", which is not a distinction that usually makes any sense because UB is a whole-program property, but it does if one side is encapsulated and compiled separately. Mind you, this is informal. You are outside of the scope of any formal model I am aware of. |
Over in the
Is there advice for programmers of virtual machines for how to use atomics safely with a non-conforming execution on the other processor such that LLVM devs agree that the advice may be safely relied upon? |
The point seems to be
Indeed compilers are allowed to optimize atomic accesses to non-atomic accesses if they can prove that that does not introduce a race or otherwise change behavior. The problem we are having here is that the C/C++ memory model is inherently a "whole-program spec". The question we are asking though only makes any sense when considering "separate compilation". We want some kind of modular notion of correctness when linked with arbitrary code -- and worse, with the additional constraint that we have run-time mechanisms in place that ensure that the arbitrary code does not cause UB. However, I rest my case that I think using atomic instructions will work for this. GCC can only remove synchronization when it knows all accesses done to a location. In case of a VM, I don't know how memory gets allocated and handed to the program inside the VM, but surely it happens in a way that the compiler must consider the pointer "escaped"? Even though the spec does not consider separate compilation, that's what compilers do in practice -- they have to compile a compilation unit in a way that works fine with any conforming other compilation unit. Hence they cannot remove fences from atomic accesses to locations that have "escaped". But I will admit that this is all based on a somewhat idealized understanding of the VM and compilers, the details of which I am not extremely familiar with. I just cannot imagine a way to implement a compiler that supports separate compilation of concurrent code, such that using atomics for this use-case does not have the intended semantics. |
One issue with these intrinsics is that they use LLVM's "unordered" atomic memory ordering, which Rust currently does not have, and I have some reservations to adding it. Are there "relaxed" versions of these intrinsics that we could use instead? |
Not sure. Anecdotally, I benchmarked a loop of relaxed atomic loads (loading from a slice of For my use case in particular, speed is paramount, which is why I'm interested in these intrinsics. That said, I completely understand your concern with introducing "unordered" into our memory model. Perhaps a middle-road path would be to add these as unstable with a big warning in their doc comments saying "we will probably never stabilize these because of memory model issues"? That way we could still use them for the time being while we let the various discussions about the semantics of inter-process shared memory play out, and hopefully switch to the "official" solution once such a solution exists. |
My current theory for why relaxed is so much slower is that LLVM is really bad at optimizing it.
I am not opposed to any unstable experimentation. I am okay with giving people enough rope to hang themselves, at least on nightly. ;) |
@joshlf The cause of this particular problem is known. LLVM's optimizer is currently unable to fuse together neighboring relaxed atomic loads and stores, so a loop of AtomicU8 operations will translate into actual byte operations in hardware assembly, which is stupidly slow. According to @comex, that's an optimizer bug, and the relaxed atomic semantics should actually allow for such code to be optimized into e.g. an efficient |
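For reference, the kind of loop being discussed looks roughly like the sketch below. The function name `load_relaxed_bytes` is hypothetical; the point is only that each iteration is a separate relaxed `AtomicU8` load, which per the comment above LLVM currently does not merge into wider accesses.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical helper: snapshot a region of shared bytes into a private
/// buffer, one relaxed atomic load per byte.
pub fn load_relaxed_bytes(src: &[AtomicU8], dst: &mut [u8]) {
    assert_eq!(src.len(), dst.len(), "source and destination lengths differ");
    for (d, s) in dst.iter_mut().zip(src) {
        // Each load is individually atomic; the copy as a whole is not.
        *d = s.load(Ordering::Relaxed);
    }
}
```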
Note that relaxed vs unordered has no effect on inter-process shared memory (between distrusting parties). General consensus is (to my knowledge) still that this does and will require |
IIUC from the discussion in rust-lang/unsafe-code-guidelines#152, normal volatile may be sufficient if combined with a freeze operation (which doesn't yet exist, of course)? |
That is true if we adapt LLVM's handling of read-write races. So, we would be deviating from C++ concurrency there as well. This is slightly better studied than LLVM's handling of write-write races, but that doesn't say much (as the latter is entirely unstudied to my knowledge). |
Triage: I don't believe there's been any changes here. |
WG21's P1478R6 might be interesting. It proposes to add to C++ an |
Yeah that sounds like a great solution. :) |
Cool. Time to add those intrinsics and add this API as unstable to play around with then. :) Not sure if I'll have time for that soon. So if anyone else has, please do! ^^ |
@Amanieu mentions here (#59155 (comment)):
Does that mean we could provide these |
How do we know that? |
That sounds like something that is fixable in LLVM? (And if these operations are really adopted by C/C++ and used for seqlocks, they will want to do that anyway.) |
@m-ou-se, could you clarify which intrinsics and what API surface you'd like to see added here? I'd like to post this issue on This Week in Rust's CFP section to see if we can find someone to implement it. |
Expose LLVM's "element wise atomic memory intrinsics". In particular:
llvm.memcpy.element.unordered.atomic as atomic_element_unordered_copy_memory_nonoverlapping
llvm.memmove.element.unordered.atomic as atomic_element_unordered_copy_memory
llvm.memset.element.unordered.atomic as atomic_element_unordered_set_memory
Expose these through functions in the std::ptr module. Each function is implemented by the equivalent intrinsic or by a loop of relaxed atomic operations if the intrinsic is not available on the target platform (TODO: Given this platform-specific behavior, can this also be exposed in core::ptr?)
copy_nonoverlapping_atomic_unordered, backed by atomic_element_unordered_copy_memory_nonoverlapping
copy_atomic_unordered, backed by atomic_element_unordered_copy_memory
write_atomic_unordered, backed by atomic_element_unordered_set_memory
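As an illustration of the "loop of relaxed atomic operations" fallback mentioned above, a sketch might look like the following. The function names, the fixed `u32` and `u8` element types, and the signatures are all assumptions made for this example; the issue does not specify the final API, and the real intrinsics would take an element size rather than a concrete type.

```rust
use std::sync::atomic::{AtomicU32, AtomicU8, Ordering};

/// Hypothetical fallback for an element-wise atomic, non-overlapping copy
/// of `count` 32-bit elements.
///
/// # Safety
/// `src` and `dst` must be valid and 4-byte aligned for `count` elements,
/// must not overlap, and any concurrent access to either region must
/// itself be atomic.
pub unsafe fn copy_nonoverlapping_relaxed_u32(src: *const u32, dst: *mut u32, count: usize) {
    for i in 0..count {
        // SAFETY: guaranteed by the caller; AtomicU32 has the same size
        // and bit validity as u32.
        unsafe {
            let s = &*(src.add(i) as *const AtomicU32);
            let d = &*(dst.add(i) as *const AtomicU32);
            d.store(s.load(Ordering::Relaxed), Ordering::Relaxed);
        }
    }
}

/// Hypothetical fallback for an element-wise atomic memset over bytes.
///
/// # Safety
/// `dst` must be valid for `count` bytes, and any concurrent access to the
/// region must itself be atomic.
pub unsafe fn set_relaxed_u8(dst: *mut u8, value: u8, count: usize) {
    for i in 0..count {
        // SAFETY: guaranteed by the caller.
        unsafe {
            let d = &*(dst.add(i) as *const AtomicU8);
            d.store(value, Ordering::Relaxed);
        }
    }
}
```

Note that this fallback uses relaxed ordering, which is at least as strong as LLVM's "unordered", so it would be a conservative substitute on targets without the intrinsics.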
Previously discussed on the internals forum here.
Folks with the authority to approve this: Let me know if this is OK to move forward; I'd like to post it to This Week in Rust's CFP.