Unsoundness if two separate allocations happen to be next to each other #8
Thanks for the report and find! I really didn't think about that. You're right that you can currently achieve your behaviour, which is UB both from the side of implementation details of slice, and from LLVM's pointer aliasing rules. Thus we should mark all of those functions as unsafe. We could provide a safe interface by taking the full original allocation in addition to the two sub-slices. That way we don't even need
fn concat_slice<'a, T>(all: &'a [T], sub_a: &'a [T], sub_b: &'a [T]) -> &'a [T] {
let all_end = unsafe { all.as_ptr().add(all.len()) };
let sub_a_end = unsafe { sub_a.as_ptr().add(sub_a.len()) };
let sub_b_end = unsafe { sub_b.as_ptr().add(sub_b.len()) };
// check that sub-slices are in-bounds in full allocation
assert!(all.as_ptr() <= sub_a.as_ptr() && all.as_ptr() <= sub_b.as_ptr()
&& all_end >= sub_a_end && all_end >= sub_b_end);
// check that sub-slices are next to each other
assert!(sub_a_end == sub_b.as_ptr());
// concatenate
let num_elems = sub_a.len() + sub_b.len();
// index of sub_a's first element within `all`, computed from the byte distance
let start_elem = match mem::size_of::<T>() {
0 => panic!("We can't find out the starting element of sub_a"),
s => (sub_a.as_ptr() as usize - all.as_ptr() as usize) / s,
};
&all[start_elem..][..num_elems]
}
Is it somehow possible to find out where a subslice of a slice of ZSTs points to, i.e. which element of the original slice is the subslice's first element? Is it safe to just ignore the starting element and make the new concatenated slice start from the beginning of the full original slice? |
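A minimal sketch of the "pass the full slice as well" idea, written without any unsafe code. The name `concat_within` and the `Option` return type are assumptions made for illustration, not part of the crate. The result is obtained by re-slicing `all`, so it is derived from the full slice rather than from either sub-slice. For zero-sized types the sketch simply gives up, because all elements share one address and the start index cannot be recovered; that is a choice made here, not an answer to the ZST question.

```rust
use std::mem;

/// Hypothetical helper (not from the crate): rejoin `sub_a` and `sub_b` if both
/// lie inside `all` and `sub_b` starts exactly where `sub_a` ends.
fn concat_within<'a, T>(all: &'a [T], sub_a: &[T], sub_b: &[T]) -> Option<&'a [T]> {
    let size = mem::size_of::<T>();
    if size == 0 {
        // Assumption of this sketch: refuse ZSTs, since addresses carry no index.
        return None;
    }

    let all_range = all.as_ptr_range();
    let a_range = sub_a.as_ptr_range();
    let b_range = sub_b.as_ptr_range();

    // Both sub-slices must lie within the address range of the full slice.
    let in_bounds = all_range.start <= a_range.start
        && a_range.end <= all_range.end
        && all_range.start <= b_range.start
        && b_range.end <= all_range.end;
    // The sub-slices must be directly next to each other, in this order.
    if !in_bounds || a_range.end != b_range.start {
        return None;
    }

    // Convert the byte distance from the start of `all` into an element index.
    let byte_offset = a_range.start as usize - all_range.start as usize;
    if byte_offset % size != 0 {
        return None;
    }
    let start = byte_offset / size;
    let len = sub_a.len() + sub_b.len();

    // Bounds-checked re-slice of `all`; no unsafe code is involved.
    all.get(start..start + len)
}
```

Because only `all` is borrowed for the returned lifetime, the two sub-slices act purely as address hints here.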
All the elements are at the same address anyway, so I think you should be able to get away with just slicing with |
Syntactically, I also think that should be fine. But considering the latest discussion I'm unsure if that's also semantically correct and not actually able to trigger some sort of undefined behaviour in the guarantees inside the compiler. Especially since the behaviour around slices of ZSTs seems highly underdocumented :) |
I had an idea going in a different direction. Keep the current interface but mark it
pub struct SameAllocation<'a, T> {
phantom: PhantomData<&'a [T]>,
base: usize,
bytes: usize,
}
impl<'a, T> SameAllocation<'a, T> {
pub fn new(ptr: &'a [T]) -> Self {
SameAllocation {
phantom: PhantomData,
base: ptr.as_ptr() as usize,
bytes: ptr.len() * mem::size_of::<T>(),
}
}
// Pretend to borrow but return the complete array.
pub fn new_mut(ptr: &'a mut [T]) -> (Self, &'a mut [T]) {
let result = SameAllocation {
phantom: PhantomData,
base: ptr.as_ptr() as usize,
bytes: ptr.len() * mem::size_of::<T>(),
};
(result, ptr)
}
// Possible `unsafe fn new_unchecked` variant omitted
// Join two slices that must be part of the allocation used in the construction.
pub fn join<'b: 'a>(&self, a: &'b mut [T], b: &'b mut [T]) -> Result<&'b mut [T], Error> {
// Assert the beginning is within the allocation. Instead of asserting we could error.
// This guarantees each complete slice is within the same allocation.
assert!(self.base <= a.as_ptr() as usize && a.as_ptr() as usize <= self.base + self.bytes);
assert!(self.base <= b.as_ptr() as usize && b.as_ptr() as usize <= self.base + self.bytes);
unsafe {
// SAFETY: Guaranteed the exact check we need for the current method.
concat_slice_unchecked(a, b)
}
}
} |
Usage of the above proposal:
// usage:
fn main() {
let mut data = [0xa; 16];
let (allocation, data) = SameAllocation::new_mut(&mut data[..]);
let (a, b) = data.split_at_mut(10);
let rejoined = allocation.join(a, b).unwrap();
assert_eq!(rejoined, &[0xa; 16]);
}
play. If this is sound it would give me incredible satisfaction, since it can only be done in Rust due to its reliance on lifetimes; C/C++ would face the same UB problem we currently face but could not resolve it. |
|
Why would I need that? The code never accesses the mutable slice underlying the allocation through any non-mutable reference. I'm not even sure if the type parameter |
Edit: Doesn't work as discussed in #8 (comment)
Here is a super-crazy idea: The main problem here is that the compiler can use the UB to perform optimizations based on usage of the original slices and its knowledge where the resulting slice is coming from. For example in the original code below, the compiler optimizes
let buf1 = [0; 16];
let buf2 = [0; 16];
concat_slice(&buf1, &buf2).unwrap()[20];
However, if we rid the compiler of any information it knew about the input and output slices, we might get away with technically having UB. The compiler can't perform any optimizations on the assumptions we break, because it doesn't know about any of those guarantees anymore. We can do that by simply You can find a godbolt example here. Without any Obviously, this is playing with fire:
lea rax, [rsp + 32]
lea r14, [rsp + 32]
cmp rax, r14
jne .LBB10_1
What do you think? Could this be the very first safe unsound function? ;) |
No, it couldn't |
@oberien see rust-lang/rfcs#2360 for a detailed discussion about |
Implementation of my above idea: https://github.com/HeroicKatora/str-concat/tree/proof-instance |
I've read a bit more into
Any use of a For all of the above reasons, I'll drop the idea of having a safe unsound function. |
@oberien also see https://youtu.be/nXaxk27zwlk?t=2445 for Chandler (of LLVM fame) saying that (the equivalent of)
|
Everything is marked as unsafe as of #7 and we'll add proper reasoning to the readme in #9. I'll make a new release after #9 and yank all previously released versions. If need be, we can try implementations of the approaches suggested in #8 (comment) and #8 (comment). |
One solution for this problem would be to allow this method only for strings, and then to add a null-terminator (or any arbitrary byte) to the end of every string, which is then not included in slices to the string content. |
Even if we were willing to add this to |
Right, so you would have to add end padding of at least one byte to every block of memory that is allocated and then adjust the safety requirements for all from_utf8 methods for strings, all so that strs can be rejoined. FTR I'm not advocating for this but if a pointer crossing allocation boundaries is UB then there is no other way to do this safely. |
I don't see how padding all allocations is possible (what about memory allocated outside Rust? what about Rust code calling into other allocators via FFI? etc.). Even if it was, the overhead would likely be way too big to accept. So I guess that leaves just the status quo of "not safely possible"? |
It would be possible to make every allocation that occurs within the standard library and every allocation that the Rust compiler makes have an extra byte of padding. As for outside allocations, those require unsafe code anyway, so the solution for that would be to simply adjust the safety requirements of methods that convert from bytes to str to include some sentence like "every block of memory passed into here must have at least one extra byte allocated past the end". |
Padding allocations with a spare byte might still make an interesting custom type, outside the standard library. If joining is really necessary for some algorithm or performance reasons then this would be neat to have. I don't see a point in an (essentially) user requirement affecting the standard library and, fundamentally, the memory model underlying it. That sounds horribly contagious given that there are a ton of programs that apparently wouldn't even benefit from it as they work fine without |
If you want a safe generic So a generic |
@oberien If you want to keep this issue open for tracking whether there is a possible safe wrapper (and what difficulties there are) then I think the above discussion is off-topic. The actual issue was fixed with the release of |
This statement makes an assertion that I hadn't heard before and doesn't provide any evidence. Did you mean this in a colloquial sense, that there were many attempts but all failed, or in a true mathematical one? |
There's an LLVM IR operation, "getelementptr inbounds" (GEPI), whose result must stay in bounds of the allocation or point to the byte one past its end (similar to a C array offset/indexing), otherwise it's UB. In exchange, it allows better optimization than the less restricted offset operation. Since indexing in Rust has to be in bounds anyway, Rust uses GEPI for indexing (once you go through all the compiler layers). So, pretty fundamentally, this entire goal of being able to join two slices or strings that happen to be next to each other in memory just won't work. |
It's more than just the "inbounds" operation. Even the non-"inbounds" variant will always stay within the same allocation. Likewise, it is impossible in C by pure pointer arithmetic to switch between allocations. This is also why the
Casting to an integer, doing arithmetic and casting back should work, but unfortunately the details of integer-pointer casts are fuzzy enough and compilers are buggy enough around those details that it's hard to say. |
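To make the two kinds of offset concrete, here is a small sketch using the standard raw-pointer methods `add` (which requires the result to stay in bounds or one past the end of the same allocation) and `wrapping_add` (which may compute any address, but whose result still may not be used to access a different allocation). The helper names `offset_in_bounds` and `offset_wrapping` are hypothetical and exist only for this illustration.

```rust
/// Hypothetical helper: pointer to index `i`, where `i == len` yields the
/// one-past-the-end pointer. Anything further would violate `add`'s contract.
fn offset_in_bounds(bytes: &[u8], i: usize) -> Option<*const u8> {
    if i <= bytes.len() {
        // SAFETY: `i <= bytes.len()`, so the result is in bounds of, or one
        // past the end of, the allocation that `bytes` points into.
        Some(unsafe { bytes.as_ptr().add(i) })
    } else {
        None
    }
}

/// Hypothetical helper: computing the address is always allowed with
/// `wrapping_add`, but the pointer stays tied to the allocation of `bytes`.
/// Dereferencing it outside that allocation is still UB, even if another
/// allocation happens to live at that address.
fn offset_wrapping(bytes: &[u8], i: usize) -> *const u8 {
    bytes.as_ptr().wrapping_add(i)
}
```

Neither helper offers a way to move from one allocation into a neighbouring one, which is the point made in the comments above.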
If I understand correctly, what @maplant is suggesting, is to add allocation delimiters to rust's memory rules, such that we can check those during a safe join. The solution presented is, however, not sufficient, not even for strings, because those can contain zero-bytes. UTF-8 allows strings to contain the ASCII-Null character, which is represented by Instead of changing all of the rust memory rules, as @HeroicKatora suggested, this could make an interesting library type, and is also somewhat related to what the As this issue is fixed with |
My statement implicitly requires a change to the memory model so I'm not sure exactly what you're getting at, apologies if I'm not making my comments as clear as they should be. My point is that it is possible to have this implemented opaquely in Rust internally. It is not, however, possible to have this work with all rust code currently, because now there is a new requirement that all allocated blocks have to have an extra byte. That being said, it would be possible to implement this without breaking any rust code. I am also quite confident that this would never be accepted :-)
If you are asking me whether I have a mathematical proof that it is impossible to ensure that two blocks of memory with the same lifetime are non-contiguous: I do not, off the top of my head. @oberien erp!!! sorry I commented right after you closed the issue. My apologies |
Race conditions :) Feel free to open another issue to further explore and discuss your idea, I'm interested if this might be a possible solution. |
You initially asserted in this comment that changes to the memory model are necessary. I'm asking for something backing up the assertion that in the current memory model a safe |
@maplant That is demonstrably false. The ¹ For example, did you know that |
That is not what I'm suggesting. I'm suggesting changing the way the alloc trait is used, not adding unsafe requirements to alloc. The invariants would not change. The unsafe requirements would be added to other library constructors. In older code this invariant would be broken; however, since "join" would be the only method that uses it and older code by definition cannot use this new method, violating these invariants would never invoke UB in practice. Should that code be changed to use join, it would immediately be broken. You are right that there are a number of issues with this. Again, I'm not advocating for it, it's just a random idea I had. Regardless, this issue is closed and your tone seems more intent on trying to teach me something than trying to have a meaningful discussion, so this will be the last comment I make. If you would like to further this discussion feel free to send me an email. |
First off, kudos for carefully considering each of the stated safety requirements for each unsafe operation used and documenting these so clearly. However, I believe I found a subtle issue that can lead to UB. It relates to a requirement that is unfortunately not explicitly noted (unless I missed something), but follows from how slices and str work. For simplicity I'll demonstrate with u8 slices and the currently-internal concat_slice function, but of course str is just as affected.
To set the stage, note that it is possible for two separate allocations (for example, two Strings, or two u8 arrays on the stack) to be adjacent in memory. When that happens, this crate will happily detect that they're adjacent and concatenate them. I don't think there's any way to avoid that, and while one can't rely on two allocations being adjacent, maybe someone wants to make use of it when the stars align (or they simply made a mistake when passing the pointers).
The problem arises after the concatenated slice is constructed: indexing into it or slicing it (and likely other operations) calculates addresses in ways that assume the pointer doesn't cross allocation boundaries, otherwise it's UB. For example, given bytes: &[u8], the expression bytes[i] boils down to something involving the pointer bytes.as_ptr().add(i). That can cross from the first allocation into the second in the scenario we're considering, but <*const T>::add states among other safety preconditions:
So the following code will have UB in the cases where concat_slice returns Ok(..):
In effect, slices and strs and other safe references can't span allocation boundaries. Unfortunately, since I also don't know a way to ensure that two references come from the same allocation, I believe this makes it impossible to soundly provide the operations this crate exists to provide 😭
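As a small illustration of why the adjacency check alone cannot establish soundness, here is a sketch of the address comparison such a check boils down to. The function name `is_directly_followed_by` is hypothetical and not taken from the crate. The comparison itself is safe, but it only looks at addresses: it returns true for two halves of one slice and would equally return true for two unrelated allocations that happen to be neighbours.

```rust
/// Hypothetical helper: true if `b` starts at the address where `a` ends.
/// This compares addresses only and says nothing about whether both slices
/// belong to the same allocation.
fn is_directly_followed_by<T>(a: &[T], b: &[T]) -> bool {
    // `as_ptr_range().end` is the one-past-the-end pointer of `a`.
    a.as_ptr_range().end == b.as_ptr()
}
```

For example, the two halves returned by `split_at` on a single array pass this check in their original order, while the same halves in swapped order do not.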