Failing future_result nondeterminism violates exclusive_unwrap_conflict's assumptions #4689

brson · 2013-01-30T09:08:11Z

In core, private::tests::exclusive_unwrap_conflict fails maybe a few times per month. Linux x86_64. I have no further information.

The text was updated successfully, but these errors were encountered:

bblum · 2013-02-09T05:56:48Z

alright, i don't suppose you'd happen to have any logs of the times it failed? I haven't been able to reproduce the test failing (i.e., the test function terminating successfully).

Looking over the unwrap implementation I think it's basically impossible for the conflict test to fail with a localized bug, since it's testing two linked-failure tasks who go through code that looks like this:

if compare_and_swap(&mut ptr.unwrapper, 0, serverp) {
    // I get to unwrap the data.
    ...
} else {
    // Multiple unwrappers is illegal.
    die!(...);
}

so I think it is safe to say that exactly one of the tasks will hit the die!.

What's more, the test case itself has the parent thread (the one the test harness will block on) block on its child using a future_result. Either the parent will hit the die above (in which case the test case will surely pass), or the child will hit it, in which case the child's failure should link to the parent through the parent's recv() call (I checked the linked failure code to make sure taskgroup killing happens before AutoNotify messaging). So it might be just as likely the race is in getting failed out of the recv call.

So far I've experimented adding delays into various parts of the unwrap implementation to see which paths this test actually exercises (the state space is not very large, so it is not a Research Problem :P ), but not seen the bug in any yet. Will continue the search another day.

EDIT: reproduction successful. (I'd forgotten that there was a difference in failure semantics between when the tasks are in the main/root taskgroup and when they're running as a #[test]! Argh.) More later.

bblum · 2013-02-09T07:27:19Z

Would you believe this is the minimized test case for this bug? No unwrap involved at all.

#[test] #[should_fail]
pub fn test() {
    let mut res = None;
    do task::task().future_result(|+r| res = Some(move r)).spawn {
                die!(~"woops");
    }
    unsafe { libc::sleep(1); }
    let res = option::swap_unwrap(&mut res);
    res.recv();
}

Depending on the sleep, the test will either work or not (i.e., the problem is when a failing task sends to its future_result before the parent even starts to receive on it, the linked failure doesn't take effect). I'm not sure if this is an intended nondeterminism of future_result, but I definitely wrote unwrap assuming it wouldn't happen that way.

anyway one possible way to fix this is to change the last line of the test case res.recv() to assert res.recv() == Success (which doesn't weaken the test case). I will meditate on this more to decide if that's the best way.

bblum · 2013-02-09T17:12:09Z

OK, what's going on here is linked failure doesn't propagate through a recv() in the special case where the pipe has data and the task doesn't need to block for it. To "fix" this nondeterminism at its source would involve putting a must_fail_from_being_killed()check in the old_state == Full case of pipes::try_recv().

As another example, consider this program, whose main task always succeeds. "Fixing" the nondetermism above would make this program sometimes succeed (if the parent recvs before the child sends) and sometimes fail (if the child sends and fails before the parent recvs).

fn main() {
    let (p,c) = pipes::stream();
    do task::spawn |move c| {
        c.send(());
        assert false;
    }
    p.recv();
}

Introducing the extra killed check wouldn't require taking a lock (see #3213). But, I've always punted on the semantics of when a linked failure killing takes effect, and adding this check feels like a patchwork special-case rather than a principled semantics change. @eholk might have input on the matter?

eholk · 2013-02-11T17:18:53Z

Adding the must_fail_from_being_killed() check seems reasonable to me. In the case of failure, I think it's fine if things are somewhat nondeterministic, so I don't think it's a problem if the program above sometimes failed. What if we added a sleep before p.recv()? It seems like in that case you'd want this to always fail, unless you turned off link failure.

brson · 2013-02-22T21:12:53Z

@bblum Thanks for looking into this. I understand how recv misses its opportunity to fail but not why the 'assert' has any effect.

This is similar to a problem I frequently run into where I expect test cases like this to work (by failing):

#[test] #[should_fail]
fn test() {
  spawn(||fail!());
}

bblum · 2013-02-22T21:30:10Z

Yeah, those are annoying cases. It might be nice to have a spawn interface that, while letting the parent go off and do its own thing, also makes the parent block on the child (on all such children spawned this way) before actually task-exiting.

This could be done with future_result on top of a generalised atexit interface, or it could work by stashing future_results inside the parent's TCB and having the TCB destructor wait on each one in the destructor (which is essentially a hardcoded version of how atexit would also work).

* Backport PR rust-lang#4730 that fix issue rust-lang#4689 * Test files for each Verion One and Two * Simplify per review comment - use defer and matches! * Changes per reviewer comments for reducing indentations

brson mentioned this issue Feb 4, 2013

Remove unwrap_shared_mutable_state #4436

Closed

ghost assigned bblum Feb 4, 2013

brson added a commit to brson/rust that referenced this issue Feb 4, 2013

Ignore test exclusive_unwrap_conflict. rust-lang#4689

8dfab49

bblum closed this as completed in 9dae4cf Feb 25, 2013

brson unassigned bblum Jun 16, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failing future_result nondeterminism violates exclusive_unwrap_conflict's assumptions #4689

Failing future_result nondeterminism violates exclusive_unwrap_conflict's assumptions #4689

brson commented Jan 30, 2013

bblum commented Feb 9, 2013

bblum commented Feb 9, 2013

bblum commented Feb 9, 2013

eholk commented Feb 11, 2013

brson commented Feb 22, 2013

bblum commented Feb 22, 2013

Failing future_result nondeterminism violates exclusive_unwrap_conflict's assumptions #4689

Failing future_result nondeterminism violates exclusive_unwrap_conflict's assumptions #4689

Comments

brson commented Jan 30, 2013

bblum commented Feb 9, 2013

bblum commented Feb 9, 2013

bblum commented Feb 9, 2013

eholk commented Feb 11, 2013

brson commented Feb 22, 2013

bblum commented Feb 22, 2013