Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing future_result nondeterminism violates exclusive_unwrap_conflict's assumptions #4689

Closed
brson opened this issue Jan 30, 2013 · 6 comments
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows A-testsuite Area: The testsuite used to check the correctness of rustc O-linux Operating system: Linux

Comments

@brson
Copy link
Contributor

brson commented Jan 30, 2013

In core, private::tests::exclusive_unwrap_conflict fails maybe a few times per month. Linux x86_64. I have no further information.

@bblum
Copy link
Contributor

bblum commented Feb 9, 2013

alright, i don't suppose you'd happen to have any logs of the times it failed? I haven't been able to reproduce the test failing (i.e., the test function terminating successfully).

Looking over the unwrap implementation I think it's basically impossible for the conflict test to fail with a localized bug, since it's testing two linked-failure tasks who go through code that looks like this:

if compare_and_swap(&mut ptr.unwrapper, 0, serverp) {
    // I get to unwrap the data.
    ...
} else {
    // Multiple unwrappers is illegal.
    die!(...);
}

so I think it is safe to say that exactly one of the tasks will hit the die!.

What's more, the test case itself has the parent thread (the one the test harness will block on) block on its child using a future_result. Either the parent will hit the die above (in which case the test case will surely pass), or the child will hit it, in which case the child's failure should link to the parent through the parent's recv() call (I checked the linked failure code to make sure taskgroup killing happens before AutoNotify messaging). So it might be just as likely the race is in getting failed out of the recv call.

So far I've experimented adding delays into various parts of the unwrap implementation to see which paths this test actually exercises (the state space is not very large, so it is not a Research Problem :P ), but not seen the bug in any yet. Will continue the search another day.

EDIT: reproduction successful. (I'd forgotten that there was a difference in failure semantics between when the tasks are in the main/root taskgroup and when they're running as a #[test]! Argh.) More later.

@bblum
Copy link
Contributor

bblum commented Feb 9, 2013

Would you believe this is the minimized test case for this bug? No unwrap involved at all.

#[test] #[should_fail]
pub fn test() {
    let mut res = None;
    do task::task().future_result(|+r| res = Some(move r)).spawn {
                die!(~"woops");
    }
    unsafe { libc::sleep(1); }
    let res = option::swap_unwrap(&mut res);
    res.recv();
}

Depending on the sleep, the test will either work or not (i.e., the problem is when a failing task sends to its future_result before the parent even starts to receive on it, the linked failure doesn't take effect). I'm not sure if this is an intended nondeterminism of future_result, but I definitely wrote unwrap assuming it wouldn't happen that way.

anyway one possible way to fix this is to change the last line of the test case res.recv() to assert res.recv() == Success (which doesn't weaken the test case). I will meditate on this more to decide if that's the best way.

@bblum
Copy link
Contributor

bblum commented Feb 9, 2013

OK, what's going on here is linked failure doesn't propagate through a recv() in the special case where the pipe has data and the task doesn't need to block for it. To "fix" this nondeterminism at its source would involve putting a must_fail_from_being_killed()check in the old_state == Full case of pipes::try_recv().

As another example, consider this program, whose main task always succeeds. "Fixing" the nondetermism above would make this program sometimes succeed (if the parent recvs before the child sends) and sometimes fail (if the child sends and fails before the parent recvs).

fn main() {
    let (p,c) = pipes::stream();
    do task::spawn |move c| {
        c.send(());
        assert false;
    }
    p.recv();
}

Introducing the extra killed check wouldn't require taking a lock (see #3213). But, I've always punted on the semantics of when a linked failure killing takes effect, and adding this check feels like a patchwork special-case rather than a principled semantics change. @eholk might have input on the matter?

@eholk
Copy link
Contributor

eholk commented Feb 11, 2013

Adding the must_fail_from_being_killed() check seems reasonable to me. In the case of failure, I think it's fine if things are somewhat nondeterministic, so I don't think it's a problem if the program above sometimes failed. What if we added a sleep before p.recv()? It seems like in that case you'd want this to always fail, unless you turned off link failure.

@brson
Copy link
Contributor Author

brson commented Feb 22, 2013

@bblum Thanks for looking into this. I understand how recv misses its opportunity to fail but not why the 'assert' has any effect.

This is similar to a problem I frequently run into where I expect test cases like this to work (by failing):

#[test] #[should_fail]
fn test() {
  spawn(||fail!());
}

@bblum
Copy link
Contributor

bblum commented Feb 22, 2013

Yeah, those are annoying cases. It might be nice to have a spawn interface that, while letting the parent go off and do its own thing, also makes the parent block on the child (on all such children spawned this way) before actually task-exiting.

This could be done with future_result on top of a generalised atexit interface, or it could work by stashing future_results inside the parent's TCB and having the TCB destructor wait on each one in the destructor (which is essentially a hardcoded version of how atexit would also work).

@bblum bblum closed this as completed in 9dae4cf Feb 25, 2013
@brson brson unassigned bblum Jun 16, 2014
calebcartwright pushed a commit to calebcartwright/rust that referenced this issue Jan 24, 2023
* Backport PR rust-lang#4730 that fix issue rust-lang#4689

* Test files for each Verion One and Two

* Simplify per review comment - use defer and matches!

* Changes per reviewer comments for reducing indentations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows A-testsuite Area: The testsuite used to check the correctness of rustc O-linux Operating system: Linux
Projects
None yet
Development

No branches or pull requests

3 participants