Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use struct pointer as pthread_t #13

Closed

Conversation

ian-h-chamberlain
Copy link
Member

rust3ds/ctru-rs#48

This allows us to hold the OS thread ID as well as the ctru Thread, which makes lookups for schedparam work a bit nicer. One downside is that pthread_self leaks the struct when called...

I considered trying to stuff the whole PThread struct into the pthread_t (if it were a c_ulonglong instead of c_ulong), but it requires mem::transmute, and I tried it in Miri and it was flagged as Undefined Behavior (and also didn't seem to work), so ... probably not the best idea.

Open to ideas for avoiding the leak, although arguably leaking a 64-bit struct once in a while is not so bad compared to the zombie processing getting left around by svcGetThreadList.

@Meziu @AzureMarker

This allows us to hold the OS thread ID as well as the ctru `Thread`,
which makes lookups for schedparam work a bit nicer. One downside is
that pthread_self leaks a struct when called...
@AzureMarker
Copy link
Member

I think I have an alternative design for keeping track of these PThread objects which would fix the leak issue...

src/lib.rs Outdated Show resolved Hide resolved
@AzureMarker
Copy link
Member

AzureMarker commented Feb 17, 2022

@ian-h-chamberlain Idea: Do something similar to the thread local keys implementation by having a RwLock<BTreeMap<u32, PThread>, spin::Yield> and set pthread_t to u32. Then store the u32 for that thread in a thread local (to make pthread_self work). The u32 values can come from an AtomicUsize like in the thread keys impl.

Edit: could hold the current thread's key in a Cell

This introduces some synchronization, but as long as we drop the locks at the right times (we can make Pthread Copy and drop right after we read the map) I don't think it will need to wait for locks much at all.

Edit: by the way, we also should think about running the thread local destructors on thread deletion...

@ian-h-chamberlain
Copy link
Member Author

Update: I tried a variant of the transmute for stuffing the whole struct into an int again, and it seems to pass Miri validation now? I don't think I was using Box before for the pointer field, so that might have been the real issue.

This seems kinda gnarly but I think it avoids the leaking issue, and shouldn't require any allocation or whatever. I still haven't been able to find any resources that definitively declare this as undefined behavior, and Miri accepts it, so maybe it's okay?

I am a bit hesitant to use thread locals and stuff like that since it feels like reimplementing what's already there in libctru, so if we can use their impl instead I think that would be preferable.

Re: destructors – yes, that's a good point and I will see if I can do that as part of my next push, seems fairly straightforward, I think.


Side note: from reading comments on rust-lang/rust#29594 and rust-lang/rust#91659 – does #[thread_local] actually do anything on this platform? I didn't have a chance to look at disassembly or anything so I'm wondering if it actually compiles down as expected. If so, we should probably mark the target as #[cfg(target_thread_local)] in rustc.

@Meziu
Copy link
Member

Meziu commented Feb 17, 2022

Side note: from reading comments on rust-lang/rust#29594 and rust-lang/rust#91659 – does #[thread_local] actually do anything on this platform? I didn't have a chance to look at disassembly or anything so I'm wondering if it actually compiles down as expected. If so, we should probably mark the target as #[cfg(target_thread_local)] in rustc.

I am pretty sure it does work (or at least, the thread_local! macro does). We had problems with thread storage in the past, but it should work now.

@ian-h-chamberlain
Copy link
Member Author

I am pretty sure it does work (or at least, the thread_local! macro does). We had problems with thread storage in the past, but it should work now.

My understanding, though limited, is that thread_local! uses emulation on unsupported targets (i.e. calling the pthread_key_* etc. functions implemented in this crate), but uses #[thread_local] in its implementation on supported targets. #[thread_local] seems to lower to an LLVM attribute or something like that on those targets, but I'm not sure how we can tell if that's supported on our target, other than perhaps comparing disassembled code or something like that. Perhaps @AzureMarker knows more from working on the thread local implementation...

@Meziu
Copy link
Member

Meziu commented Feb 17, 2022

I am pretty sure it does work (or at least, the thread_local! macro does). We had problems with thread storage in the past, but it should work now.

My understanding, though limited, is that thread_local! uses emulation on unsupported targets (i.e. calling the pthread_key_* etc. functions implemented in this crate), but uses #[thread_local] in its implementation on supported targets. #[thread_local] seems to lower to an LLVM attribute or something like that on those targets, but I'm not sure how we can tell if that's supported on our target, other than perhaps comparing disassembled code or something like that. Perhaps @AzureMarker knows more from working on the thread local implementation...

There is the has_thread_local option for the target spec, which I also remember using at one point. I don’t know if that changes anything though, or it it is just a target flag for conditional code.

Edit: we are using a raw #[thread_local] call for the implementation of keys itself https://github.com/Meziu/pthread-3ds/blob/90bd4d00e371ea5dbd8487c5a63d3d75d667f796/src/lib.rs#L689
So I guess it works(?)

@AzureMarker
Copy link
Member

AzureMarker commented Feb 17, 2022

Yeah I'm pretty sure that annotation does work. I think the thread local example verifies that.

@AzureMarker
Copy link
Member

AzureMarker commented Feb 17, 2022

If it's acceptable to use a bigger type for pthread_t then that might be fine. Safety wise it's ok as long as we make that static size assertion.

@Meziu
Copy link
Member

Meziu commented Feb 17, 2022

If it's acceptable to use a bigger type for pthread_t then that might be fine. Safety wise it's ok as long as we make that static size assertion.

I honestly would like not changing the pthread_t size. Aren't pointers on 3DS 32bit anyways? It should fit in the ulong type.

@ian-h-chamberlain
Copy link
Member Author

Yes, it would require making pthread_t larger than a pointer... There seems to be some precedent (based on this pthread header, I think) for something like this although most pthread_t definitions I found in libc are simply a pointer type, c_ulong, or similar, not a c_longlong or equivalent (which is what we would need in this case).

As far as I know pthread_t is meant to be treated as an opaque type to any callers, so I think any of these options are probably fine, but if there are serious reservations about increasing the size then I suppose we would need to do something like the thread local keys.

@Meziu
Copy link
Member

Meziu commented Feb 17, 2022

While it is true it is meant as an opaque type, of which one shouldn’t assume the size or value, it also isn’t meant to be a storage. Under normal circumstances it is either an unique index or a pointer. I believe we should try to follow that rule in any case. Especially because we wouldn’t be limited by the amount of additional data we can store. The only real issue are memory leaks.

@ian-h-chamberlain
Copy link
Member Author

Ok, I've come up with something that I think addresses the leak, without adding much complexity or changing the pointer size. This does a little duplication of the PThread struct, but I think it should be cheap enough and does get properly destructed on thread exit, with the exception of the main thread.

Also added some basic code to run thread-local destructors, in a similar manner to THREAD_SELF.

@AzureMarker
Copy link
Member

AzureMarker commented Feb 19, 2022

I took a quick glance but I'll review more soon. Very interesting approach.
(I finally got my big desktop working again, after 2 months. PSU and GPU died over the holidays. Now I have 32 cores to compile with instead of my laptop's 4 😄)

Use a drop impl to make sure we don't accidentally leave any open
handles.
@Meziu
Copy link
Member

Meziu commented Feb 19, 2022

I took a quick glance but I'll review more soon. Very interesting approach. (I finally got my big desktop working again, after 2 months. PSU and GPU died over the holidays. Now I have 32 cores to compile with instead of my laptop's 4 smile)

I have a 5th gen i5 and with integrated graphics. My laptop dies if I compile Rust while having an open VsCode window (though that may be more of a problem with VsCode rather than Rust).

Edit: One day I will upgrade, maybe.

Copy link
Member

@AzureMarker AzureMarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, just needs some tweaking due to the requirements around pthread_t (see comment in pthread_self).

Edit: Actually there's some fundamental issues with this approach. We probably need to use the BTreeMap method.

/// The main thread's thread ID. It is "null" because libctru didn't spawn it.
const MAIN_THREAD_ID: libc::pthread_t = 0;
#[derive(Clone, Debug)]
#[repr(C)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be repr(C) since it's never exposed outside of this implementation (and only used behind a pointer).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so, I think I had this as repr(C) while I was trying out the transmutation from this struct to int types, based on this advice:

Any type you expect to pass through an FFI boundary should have repr(C), as C is the lingua-franca of the programming world.

But as you mention, we are only passing around a pointer so it should be okay to use default repr.

#[repr(C)]
struct PThread {
thread: ctru_sys::Thread,
os_thread: u32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be more clear to name this os_thread_id. Otherwise it could be misunderstood as a handle.

src/lib.rs Outdated Show resolved Hide resolved
// Destruct thread local values where possible. We can't guarantee destructors
// will run for all values, which is similar to other Unix-like platforms:
// https://doc.rust-lang.org/std/thread/struct.LocalKey.html#platform-specific-behavior
for (&key, &value) in &LOCALS {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to move this to a separate PR. There's some more nuance (edge cases) in running the destructors:

If, after all the destructors have been called for all non-NULL values with associated destructors, there are still some non-NULL values with associated destructors, then the process is repeated. If, after at least {PTHREAD_DESTRUCTOR_ITERATIONS} iterations of destructor calls for outstanding non-NULL values, there are still some non-NULL values with associated destructors, implementations may stop calling destructors, or they may continue calling destructors until no non-NULL values with associated destructors exist, even though this might result in an infinite loop.

https://linux.die.net/man/3/pthread_key_create

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Actually, I noticed that the segfault when using #[thread_local] (Meziu/rust-horizon#16) doesn't occur without this change, since the dtors aren't run. So probably worth hashing out separately... I will back it out of this PR.


ctru_sys::threadDetach(thread as ctru_sys::Thread);
pub unsafe extern "C" fn pthread_detach(native: libc::pthread_t) -> libc::c_int {
let pthread = Box::from_raw(native as *mut PThread);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should drop the box at this point because they could use more pthread commands after this (ex. set priority).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, in that case how would the PThread ever get freed? We'd probably need to set up a thread local destructor for detached threads to clean themselves up when finished, which ties in the whole discussion around Meziu/rust-horizon#16

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THREAD_SELF could be used for this, since it should get destroyed at the end of the thread's life.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure it is totally unsafe to use a detached thread because it may free at any time. We could add some error handling for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that's true for any thread... It might not be feasible to store the data in a thread local since the thread could die at any time. I think the btreemap might be the best option then.

Copy link
Member

@AzureMarker AzureMarker Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the thread dies, the thread local state gets freed. So if you read from it (ex. from another thread), you're reading from a dangling pointer. Since a thread could die at any time, there's no safe way to use thread local state as the main data store.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the thread-local is not actually holding the PThread struct, just a pointer to it on the heap. Either a joining thread can free it from heap (possibly well after the thread itself has exited), or a detached thread can free the PThread itself upon exit. I believe this is safe the way it is used in std, but perhaps I can verify by copying the tests from https://github.com/Meziu/rust-horizon/blob/horizon-std/library/std/src/thread/tests.rs into the ctru-rs test runner to verify – does that seem fair?

I have been traveling so haven't had a chance to test / push the changes I have locally, but will try to do so tonight – maybe it will make more sense to just show the code I have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading e.g. https://linux.die.net/man/3/pthread_detach, it seems like it's fair to say that it's invalid to attempt to join a previously detached thread, but I'm not sure what the expected behavior is (unspecified? return an error?).

From this I think it's just EINVAL:

EINVAL
thread is not a joinable thread.
...
Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread".

https://linux.die.net/man/3/pthread_join

Copy link
Member

@AzureMarker AzureMarker Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the thread-local is not actually holding the PThread struct, just a pointer to it on the heap. Either a joining thread can free it from heap (possibly well after the thread itself has exited), or a detached thread can free the PThread itself upon exit.

Since we need to clean up the heap data when the thread dies, any future usage of the pthread_t value would be using a dangling pointer. If you're given a pthread_t, how do you know if the allocation it points to is still around (if the thread has finished)? I don't think there's enough data there to answer the question, but it is still valid to use a pthread_t after the thread has finished. For example, you can join on a finished thread (ex. thread may have already finished by the time the main thread calls join).

In the BTreeMap case, if the entry doesn't exist then we know the thread has either finished or never existed. We can further narrow it down by checking the current "next thread ID" value. If the given thread ID is less than the next thread ID, it was a thread that finished. Otherwise it was an invalid/unknown thread ID.

Copy link
Member

@AzureMarker AzureMarker Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, the libctru implementation isn't unsound even though it uses a similar heap allocation idea. It has a threadFree function which explicitly closes out the thread handle, which pthread doesn't have.

@@ -69,75 +110,87 @@ pub unsafe extern "C" fn pthread_join(
native: libc::pthread_t,
_value: *mut *mut libc::c_void,
) -> libc::c_int {
if native == MAIN_THREAD_ID {
// This is not a valid thread to join on
return libc::EINVAL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should remain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I was going to disagree, but after thinking about it more, I think maybe we should just be checking if *(native as *mut PThread).thread.is_null() rather than this check, although I suppose null ptr to PThread is also invalid, so maybe both checks are good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case where it's an invalid thread ID (not main), we should return ESRCH:

ESRCH
No thread with the ID thread could be found.

https://linux.die.net/man/3/pthread_join

pub unsafe extern "C" fn pthread_detach(thread: libc::pthread_t) -> libc::c_int {
if thread == MAIN_THREAD_ID {
// This is not a valid thread to detach
return libc::EINVAL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should remain.

let pthread = Box::new(PThread { thread, os_thread });
let raw = Box::into_raw(pthread);
THREAD_SELF = Some(raw);
raw as libc::pthread_t
Copy link
Member

@AzureMarker AzureMarker Feb 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at the pthread_self documentation, I think this approach is flawed. We can't return a new allocation here. It has to be the same ID that we returned in pthread_create:

The pthread_self() function returns the ID of the calling thread.
This is the same value that is returned in *thread in the
pthread_create(3) call that created this thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may be able to work around this by setting the value of THREAD_SELF before calling entrypoint in pthread_create. This is how libctru works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that and it would be nicer, but I'm not sure if there's an easy way to set the value of a #[thread_local] for a different thread. Would the value be copied if it was set before actually starting entrypoint?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean to say do this in the new thread before calling entrypoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comments in pthread_detach, I don't think using thread locals as the main storage is going to work.

fn drop(&mut self) {
unsafe {
let res = ctru_sys::svcCloseHandle(self.0);
assert!(ctru_sys::R_SUCCEEDED(res), "{:#X}", res);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unsafe because it would unwind across an FFI boundary. I wouldn't worry about making sure the result is successful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, probably just ignoring the error is easiest here. I think panicking here might normally abort anyway, so just as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have panics set to unwind, so it would trigger that whole mechanism

@@ -216,6 +269,7 @@ pub unsafe extern "C" fn pthread_getprocessorid_np() -> libc::c_int {
/// Internal struct for storing pthread attribute data
/// Must be less than or equal to the size of `libc::pthread_attr_t`. We assert
/// this below via static_assertions.
#[repr(C)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be repr(C)

Co-authored-by: Mark Drobnak <mark.drobnak@gmail.com>
@ian-h-chamberlain
Copy link
Member Author

Ok, based on the discussion and some local testing, I think it's fair to say this approach doesn't quite work in every scenario.

I won't have much time in the next two weeks to try the suggested approach (still traveling) so I am going to close the PR for now. I can revisit when I get back from my trip, or if someone else wants to take a stab at it in the meantime, then feel free. The BTreeMap design seems like it should work similar to how thread-locals do, so that's probably worth pursuing.

@AzureMarker
Copy link
Member

Thanks for working on this and experimenting @ian-h-chamberlain! I'll take a stab at it and see how things go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants