Use struct pointer as pthread_t #13

ian-h-chamberlain · 2022-02-17T02:32:16Z

This allows us to hold the OS thread ID as well as the ctru Thread, which makes lookups for schedparam work a bit nicer. One downside is that pthread_self leaks the struct when called...

I considered trying to stuff the whole PThread struct into the pthread_t (if it were a c_ulonglong instead of c_ulong), but it requires mem::transmute, and I tried it in Miri and it was flagged as Undefined Behavior (and also didn't seem to work), so ... probably not the best idea.

Open to ideas for avoiding the leak, although arguably leaking a 64-bit struct once in a while is not so bad compared to the zombie processing getting left around by svcGetThreadList.

@Meziu @AzureMarker

This allows us to hold the OS thread ID as well as the ctru `Thread`, which makes lookups for schedparam work a bit nicer. One downside is that pthread_self leaks a struct when called...

src/lib.rs

AzureMarker · 2022-02-17T02:43:03Z

I think I have an alternative design for keeping track of these PThread objects which would fix the leak issue...

src/lib.rs

AzureMarker · 2022-02-17T03:17:42Z

@ian-h-chamberlain Idea: Do something similar to the thread local keys implementation by having a RwLock<BTreeMap<u32, PThread>, spin::Yield> and set pthread_t to u32. Then store the u32 for that thread in a thread local (to make pthread_self work). The u32 values can come from an AtomicUsize like in the thread keys impl.

Edit: could hold the current thread's key in a Cell

This introduces some synchronization, but as long as we drop the locks at the right times (we can make Pthread Copy and drop right after we read the map) I don't think it will need to wait for locks much at all.

Edit: by the way, we also should think about running the thread local destructors on thread deletion...

ian-h-chamberlain · 2022-02-17T15:38:27Z

Update: I tried a variant of the transmute for stuffing the whole struct into an int again, and it seems to pass Miri validation now? I don't think I was using Box before for the pointer field, so that might have been the real issue.

This seems kinda gnarly but I think it avoids the leaking issue, and shouldn't require any allocation or whatever. I still haven't been able to find any resources that definitively declare this as undefined behavior, and Miri accepts it, so maybe it's okay?

I am a bit hesitant to use thread locals and stuff like that since it feels like reimplementing what's already there in libctru, so if we can use their impl instead I think that would be preferable.

Re: destructors – yes, that's a good point and I will see if I can do that as part of my next push, seems fairly straightforward, I think.

Side note: from reading comments on rust-lang/rust#29594 and rust-lang/rust#91659 – does #[thread_local] actually do anything on this platform? I didn't have a chance to look at disassembly or anything so I'm wondering if it actually compiles down as expected. If so, we should probably mark the target as #[cfg(target_thread_local)] in rustc.

Meziu · 2022-02-17T16:01:50Z

Side note: from reading comments on rust-lang/rust#29594 and rust-lang/rust#91659 – does #[thread_local] actually do anything on this platform? I didn't have a chance to look at disassembly or anything so I'm wondering if it actually compiles down as expected. If so, we should probably mark the target as #[cfg(target_thread_local)] in rustc.

I am pretty sure it does work (or at least, the thread_local! macro does). We had problems with thread storage in the past, but it should work now.

ian-h-chamberlain · 2022-02-17T16:32:44Z

I am pretty sure it does work (or at least, the thread_local! macro does). We had problems with thread storage in the past, but it should work now.

My understanding, though limited, is that thread_local! uses emulation on unsupported targets (i.e. calling the pthread_key_* etc. functions implemented in this crate), but uses #[thread_local] in its implementation on supported targets. #[thread_local] seems to lower to an LLVM attribute or something like that on those targets, but I'm not sure how we can tell if that's supported on our target, other than perhaps comparing disassembled code or something like that. Perhaps @AzureMarker knows more from working on the thread local implementation...

Meziu · 2022-02-17T17:11:36Z

I am pretty sure it does work (or at least, the thread_local! macro does). We had problems with thread storage in the past, but it should work now.

My understanding, though limited, is that thread_local! uses emulation on unsupported targets (i.e. calling the pthread_key_* etc. functions implemented in this crate), but uses #[thread_local] in its implementation on supported targets. #[thread_local] seems to lower to an LLVM attribute or something like that on those targets, but I'm not sure how we can tell if that's supported on our target, other than perhaps comparing disassembled code or something like that. Perhaps @AzureMarker knows more from working on the thread local implementation...

There is the has_thread_local option for the target spec, which I also remember using at one point. I don’t know if that changes anything though, or it it is just a target flag for conditional code.

Edit: we are using a raw #[thread_local] call for the implementation of keys itself https://github.com/Meziu/pthread-3ds/blob/90bd4d00e371ea5dbd8487c5a63d3d75d667f796/src/lib.rs#L689
So I guess it works(?)

AzureMarker · 2022-02-17T17:27:32Z

Yeah I'm pretty sure that annotation does work. I think the thread local example verifies that.

AzureMarker · 2022-02-17T17:29:00Z

If it's acceptable to use a bigger type for pthread_t then that might be fine. Safety wise it's ok as long as we make that static size assertion.

Meziu · 2022-02-17T19:03:29Z

If it's acceptable to use a bigger type for pthread_t then that might be fine. Safety wise it's ok as long as we make that static size assertion.

I honestly would like not changing the pthread_t size. Aren't pointers on 3DS 32bit anyways? It should fit in the ulong type.

ian-h-chamberlain · 2022-02-17T19:13:52Z

Yes, it would require making pthread_t larger than a pointer... There seems to be some precedent (based on this pthread header, I think) for something like this although most pthread_t definitions I found in libc are simply a pointer type, c_ulong, or similar, not a c_longlong or equivalent (which is what we would need in this case).

As far as I know pthread_t is meant to be treated as an opaque type to any callers, so I think any of these options are probably fine, but if there are serious reservations about increasing the size then I suppose we would need to do something like the thread local keys.

Meziu · 2022-02-17T20:53:28Z

While it is true it is meant as an opaque type, of which one shouldn’t assume the size or value, it also isn’t meant to be a storage. Under normal circumstances it is either an unique index or a pointer. I believe we should try to follow that rule in any case. Especially because we wouldn’t be limited by the amount of additional data we can store. The only real issue are memory leaks.

ian-h-chamberlain · 2022-02-18T04:11:11Z

Ok, I've come up with something that I think addresses the leak, without adding much complexity or changing the pointer size. This does a little duplication of the PThread struct, but I think it should be cheap enough and does get properly destructed on thread exit, with the exception of the main thread.

Also added some basic code to run thread-local destructors, in a similar manner to THREAD_SELF.

AzureMarker · 2022-02-19T06:21:58Z

I took a quick glance but I'll review more soon. Very interesting approach.
(I finally got my big desktop working again, after 2 months. PSU and GPU died over the holidays. Now I have 32 cores to compile with instead of my laptop's 4 😄)

Use a drop impl to make sure we don't accidentally leave any open handles.

Meziu · 2022-02-19T16:50:18Z

I took a quick glance but I'll review more soon. Very interesting approach. (I finally got my big desktop working again, after 2 months. PSU and GPU died over the holidays. Now I have 32 cores to compile with instead of my laptop's 4 smile)

I have a 5th gen i5 and with integrated graphics. My laptop dies if I compile Rust while having an open VsCode window (though that may be more of a problem with VsCode rather than Rust).

Edit: One day I will upgrade, maybe.

AzureMarker

Good idea, just needs some tweaking due to the requirements around pthread_t (see comment in pthread_self).

Edit: Actually there's some fundamental issues with this approach. We probably need to use the BTreeMap method.

AzureMarker · 2022-02-20T00:38:10Z

src/lib.rs

-/// The main thread's thread ID. It is "null" because libctru didn't spawn it.
-const MAIN_THREAD_ID: libc::pthread_t = 0;
+#[derive(Clone, Debug)]
+#[repr(C)]


I don't think this needs to be repr(C) since it's never exposed outside of this implementation (and only used behind a pointer).

I guess so, I think I had this as repr(C) while I was trying out the transmutation from this struct to int types, based on this advice:

Any type you expect to pass through an FFI boundary should have repr(C), as C is the lingua-franca of the programming world.

But as you mention, we are only passing around a pointer so it should be okay to use default repr.

AzureMarker · 2022-02-20T00:39:06Z

src/lib.rs

+#[repr(C)]
+struct PThread {
+    thread: ctru_sys::Thread,
+    os_thread: u32,


nit: it would be more clear to name this os_thread_id. Otherwise it could be misunderstood as a handle.

src/lib.rs

AzureMarker · 2022-02-20T00:47:11Z

src/lib.rs

+            // Destruct thread local values where possible. We can't guarantee destructors
+            // will run for all values, which is similar to other Unix-like platforms:
+            // https://doc.rust-lang.org/std/thread/struct.LocalKey.html#platform-specific-behavior
+            for (&key, &value) in &LOCALS {


I think it would be best to move this to a separate PR. There's some more nuance (edge cases) in running the destructors:

If, after all the destructors have been called for all non-NULL values with associated destructors, there are still some non-NULL values with associated destructors, then the process is repeated. If, after at least {PTHREAD_DESTRUCTOR_ITERATIONS} iterations of destructor calls for outstanding non-NULL values, there are still some non-NULL values with associated destructors, implementations may stop calling destructors, or they may continue calling destructors until no non-NULL values with associated destructors exist, even though this might result in an infinite loop.

https://linux.die.net/man/3/pthread_key_create

Good point. Actually, I noticed that the segfault when using #[thread_local] (Meziu/rust-horizon#16) doesn't occur without this change, since the dtors aren't run. So probably worth hashing out separately... I will back it out of this PR.

AzureMarker · 2022-02-20T00:55:14Z

src/lib.rs

-
-    ctru_sys::threadDetach(thread as ctru_sys::Thread);
+pub unsafe extern "C" fn pthread_detach(native: libc::pthread_t) -> libc::c_int {
+    let pthread = Box::from_raw(native as *mut PThread);


I don't think we should drop the box at this point because they could use more pthread commands after this (ex. set priority).

Hmm, in that case how would the PThread ever get freed? We'd probably need to set up a thread local destructor for detached threads to clean themselves up when finished, which ties in the whole discussion around Meziu/rust-horizon#16

THREAD_SELF could be used for this, since it should get destroyed at the end of the thread's life.

I am pretty sure it is totally unsafe to use a detached thread because it may free at any time. We could add some error handling for it.

Actually that's true for any thread... It might not be feasible to store the data in a thread local since the thread could die at any time. I think the btreemap might be the best option then.

Once the thread dies, the thread local state gets freed. So if you read from it (ex. from another thread), you're reading from a dangling pointer. Since a thread could die at any time, there's no safe way to use thread local state as the main data store.

Right, but the thread-local is not actually holding the PThread struct, just a pointer to it on the heap. Either a joining thread can free it from heap (possibly well after the thread itself has exited), or a detached thread can free the PThread itself upon exit. I believe this is safe the way it is used in std, but perhaps I can verify by copying the tests from https://github.com/Meziu/rust-horizon/blob/horizon-std/library/std/src/thread/tests.rs into the ctru-rs test runner to verify – does that seem fair?

I have been traveling so haven't had a chance to test / push the changes I have locally, but will try to do so tonight – maybe it will make more sense to just show the code I have.

From reading e.g. https://linux.die.net/man/3/pthread_detach, it seems like it's fair to say that it's invalid to attempt to join a previously detached thread, but I'm not sure what the expected behavior is (unspecified? return an error?).

From this I think it's just EINVAL:

EINVAL
thread is not a joinable thread.
...
Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread".

https://linux.die.net/man/3/pthread_join

Right, but the thread-local is not actually holding the PThread struct, just a pointer to it on the heap. Either a joining thread can free it from heap (possibly well after the thread itself has exited), or a detached thread can free the PThread itself upon exit.

Since we need to clean up the heap data when the thread dies, any future usage of the pthread_t value would be using a dangling pointer. If you're given a pthread_t, how do you know if the allocation it points to is still around (if the thread has finished)? I don't think there's enough data there to answer the question, but it is still valid to use a pthread_t after the thread has finished. For example, you can join on a finished thread (ex. thread may have already finished by the time the main thread calls join).

In the BTreeMap case, if the entry doesn't exist then we know the thread has either finished or never existed. We can further narrow it down by checking the current "next thread ID" value. If the given thread ID is less than the next thread ID, it was a thread that finished. Otherwise it was an invalid/unknown thread ID.

By the way, the libctru implementation isn't unsound even though it uses a similar heap allocation idea. It has a threadFree function which explicitly closes out the thread handle, which pthread doesn't have.

AzureMarker · 2022-02-20T01:00:38Z

src/lib.rs

@@ -69,75 +110,87 @@ pub unsafe extern "C" fn pthread_join(
    native: libc::pthread_t,
    _value: *mut *mut libc::c_void,
 ) -> libc::c_int {
-    if native == MAIN_THREAD_ID {
-        // This is not a valid thread to join on
-        return libc::EINVAL;


This check should remain.

Originally I was going to disagree, but after thinking about it more, I think maybe we should just be checking if *(native as *mut PThread).thread.is_null() rather than this check, although I suppose null ptr to PThread is also invalid, so maybe both checks are good.

In the case where it's an invalid thread ID (not main), we should return ESRCH:

ESRCH
No thread with the ID thread could be found.

https://linux.die.net/man/3/pthread_join

AzureMarker · 2022-02-20T01:01:01Z

src/lib.rs

-pub unsafe extern "C" fn pthread_detach(thread: libc::pthread_t) -> libc::c_int {
-    if thread == MAIN_THREAD_ID {
-        // This is not a valid thread to detach
-        return libc::EINVAL;


This check should remain.

AzureMarker · 2022-02-20T01:06:57Z

src/lib.rs

+        let pthread = Box::new(PThread { thread, os_thread });
+        let raw = Box::into_raw(pthread);
+        THREAD_SELF = Some(raw);
+        raw as libc::pthread_t


After looking at the pthread_self documentation, I think this approach is flawed. We can't return a new allocation here. It has to be the same ID that we returned in pthread_create:

The pthread_self() function returns the ID of the calling thread.
This is the same value that is returned in *thread in the
pthread_create(3) call that created this thread.

We may be able to work around this by setting the value of THREAD_SELF before calling entrypoint in pthread_create. This is how libctru works.

I thought about that and it would be nicer, but I'm not sure if there's an easy way to set the value of a #[thread_local] for a different thread. Would the value be copied if it was set before actually starting entrypoint?

Yeah, I mean to say do this in the new thread before calling entrypoint.

See the comments in pthread_detach, I don't think using thread locals as the main storage is going to work.

AzureMarker · 2022-02-20T01:18:29Z

src/lib.rs

+    fn drop(&mut self) {
+        unsafe {
+            let res = ctru_sys::svcCloseHandle(self.0);
+            assert!(ctru_sys::R_SUCCEEDED(res), "{:#X}", res);


This is unsafe because it would unwind across an FFI boundary. I wouldn't worry about making sure the result is successful.

Good point, probably just ignoring the error is easiest here. I think panicking here might normally abort anyway, so just as well.

We have panics set to unwind, so it would trigger that whole mechanism

AzureMarker · 2022-02-20T01:21:13Z

src/lib.rs

@@ -216,6 +269,7 @@ pub unsafe extern "C" fn pthread_getprocessorid_np() -> libc::c_int {
 /// Internal struct for storing pthread attribute data
 /// Must be less than or equal to the size of `libc::pthread_attr_t`. We assert
 /// this below via static_assertions.
+#[repr(C)]


This doesn't need to be repr(C)

Co-authored-by: Mark Drobnak <mark.drobnak@gmail.com>

ian-h-chamberlain · 2022-02-23T05:33:12Z

Ok, based on the discussion and some local testing, I think it's fair to say this approach doesn't quite work in every scenario.

I won't have much time in the next two weeks to try the suggested approach (still traveling) so I am going to close the PR for now. I can revisit when I get back from my trip, or if someone else wants to take a stab at it in the meantime, then feel free. The BTreeMap design seems like it should work similar to how thread-locals do, so that's probably worth pursuing.

AzureMarker · 2022-02-25T04:07:08Z

Thanks for working on this and experimenting @ian-h-chamberlain! I'll take a stab at it and see how things go.

Use struct pointer as pthread_t

9a8ed75

This allows us to hold the OS thread ID as well as the ctru `Thread`, which makes lookups for schedparam work a bit nicer. One downside is that pthread_self leaks a struct when called...

ian-h-chamberlain commented Feb 17, 2022

View reviewed changes

src/lib.rs Show resolved Hide resolved

ian-h-chamberlain commented Feb 17, 2022

View reviewed changes

src/lib.rs Show resolved Hide resolved

AzureMarker reviewed Feb 17, 2022

View reviewed changes

src/lib.rs Outdated Show resolved Hide resolved

Try to destruct PThread struct by using TLS

e602216

Cleanup some comments

f34214b

ian-h-chamberlain mentioned this pull request Feb 18, 2022

Enable #[cfg(target_thread_local)] Meziu/rust-horizon#15

Closed

ian-h-chamberlain marked this pull request as ready for review February 18, 2022 16:25

Close thread handles after done using them

652a303

Use a drop impl to make sure we don't accidentally leave any open handles.

AzureMarker requested changes Feb 20, 2022

View reviewed changes

Update src/lib.rs

07df59f

Co-authored-by: Mark Drobnak <mark.drobnak@gmail.com>

ian-h-chamberlain closed this Feb 23, 2022

This was referenced Mar 1, 2022

Remove reliance on svcGetThreadList and do some refactoring #14

Merged

Enable #[thread_local] on armv6k-nintendo-3ds Meziu/rust-horizon#16

Merged

Use struct pointer as pthread_t #13

Use struct pointer as pthread_t #13

Conversation

ian-h-chamberlain commented Feb 17, 2022

AzureMarker commented Feb 17, 2022

AzureMarker commented Feb 17, 2022 • edited Loading

ian-h-chamberlain commented Feb 17, 2022

Meziu commented Feb 17, 2022

ian-h-chamberlain commented Feb 17, 2022

Meziu commented Feb 17, 2022 • edited Loading

AzureMarker commented Feb 17, 2022 • edited Loading

AzureMarker commented Feb 17, 2022 • edited Loading

Meziu commented Feb 17, 2022

ian-h-chamberlain commented Feb 17, 2022

Meziu commented Feb 17, 2022

ian-h-chamberlain commented Feb 18, 2022

AzureMarker commented Feb 19, 2022 • edited Loading

Meziu commented Feb 19, 2022 • edited Loading

AzureMarker left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AzureMarker Feb 21, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AzureMarker Feb 21, 2022 • edited Loading

Choose a reason for hiding this comment

AzureMarker Feb 21, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AzureMarker Feb 20, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ian-h-chamberlain commented Feb 23, 2022

AzureMarker commented Feb 25, 2022

AzureMarker commented Feb 17, 2022 •

edited

Loading

Meziu commented Feb 17, 2022 •

edited

Loading

AzureMarker commented Feb 17, 2022 •

edited

Loading

AzureMarker commented Feb 17, 2022 •

edited

Loading

AzureMarker commented Feb 19, 2022 •

edited

Loading

Meziu commented Feb 19, 2022 •

edited

Loading

AzureMarker left a comment •

edited

Loading

AzureMarker Feb 21, 2022 •

edited

Loading

AzureMarker Feb 21, 2022 •

edited

Loading

AzureMarker Feb 21, 2022 •

edited

Loading

AzureMarker Feb 20, 2022 •

edited

Loading