Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No CancellationException thrown when join on a crashed Job #1123

Closed
bennyhuo opened this issue Apr 21, 2019 · 5 comments
Closed

No CancellationException thrown when join on a crashed Job #1123

bennyhuo opened this issue Apr 21, 2019 · 5 comments
Assignees
Labels
Milestone

Comments

@bennyhuo
Copy link

bennyhuo commented Apr 21, 2019

The job2 will throw an ArithmaticException thus making the parent job cancelled. Normally we will get a CancellationException when calling the job2.join() but there is otherwise.

suspend fun main() = runBlocking {
    log(1)
    val job = launch(Dispatchers.Unconfined) {
        log(2)
        val job2 = launch(Dispatchers.Default) {
            log(3)
            val x = 1 / 0
            log(4)
        }
        job2.join()
        log(5)
    }
    log(6)
    job.join()
    log(7)
}

val dateFormat = SimpleDateFormat("HH:mm:ss:SSS")

val now = {
    dateFormat.format(Date(System.currentTimeMillis()))
}

fun log(msg: Any?) = println("${now()} [${Thread.currentThread().name}] $msg")

The result may be:

09:35:15:065 [main @coroutine#1] 1
09:35:15:074 [main @coroutine#2] 2
09:35:15:082 [DefaultDispatcher-worker-1 @coroutine#3] 3
09:35:15:104 [main @coroutine#2] 5
Exception in thread "main" java.lang.ArithmeticException: / by zero
	at com.bennyhuo.coroutines.sample2.exceptions.E2Kt$main$2$job$1$job2$1.invokeSuspend(e2.kt:12)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:238)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:742)
09:35:15:105 [main @coroutine#1] 6

When falling into the slow-path, in other words, the joinSuspend path:

public final override suspend fun join() {
    if (!joinInternal()) { // fast-path no wait
        coroutineContext.checkCompletion()
        return // do not suspend
    }
    return joinSuspend() // slow-path wait
}

it is gonna check the job result again here:

public suspend inline fun <T> suspendCancellableCoroutine(
    crossinline block: (CancellableContinuation<T>) -> Unit
): T =
    suspendCoroutineUninterceptedOrReturn { uCont ->
        val cancellable = CancellableContinuationImpl(uCont.intercepted(), resumeMode = MODE_CANCELLABLE)
        // NOTE: Before version 1.1.0 the following invocation was inlined here, so invocation of this
        // method indicates that the code was compiled by kotlinx.coroutines < 1.1.0
        // cancellable.initCancellability()
        block(cancellable)
        cancellable.getResult() // HERE!!!!!
    }

If the job just completed, the result will be returned.

In our case, job2 completed with a Exception, but the state of the job will turn into Unit for a ResumeOnCompletion is installed when calling joinSuspend and the ResumeOnCompletion will be invoked immediately if the result is ready.

private suspend fun joinSuspend() = suspendCancellableCoroutine<Unit> { cont ->
    // We have to invoke join() handler only on cancellation, on completion we will be resumed regularly without handlers
    cont.disposeOnCancellation(invokeOnCompletion(handler = ResumeOnCompletion(this, cont).asHandler))
}
@bennyhuo
Copy link
Author

I think the problem may be the regardless of state when ResumeOnCompletion invoked:

private class ResumeOnCompletion(
    job: Job,
    private val continuation: Continuation<Unit>
) : JobNode<Job>(job)  {
    override fun invoke(cause: Throwable?) = continuation.resume(Unit) // why ignoring the job state?
    override fun toString() = "ResumeOnCompletion[$continuation]"
}

@elizarov
Copy link
Contributor

Good catch. There is indeed a race in the slow path that may result in job.join() returning normally instead of throwing a CancellationException.

@elizarov elizarov added the bug label Apr 21, 2019
@elizarov elizarov self-assigned this Apr 21, 2019
@bennyhuo
Copy link
Author

#1124 I have been trying to fix this by working on the ResumeOnCompletion, and it seems to be working properly.

private class ResumeOnCompletion(
    job: JobSupport,
    private val continuation: Continuation<Unit>
) : JobNode<JobSupport>(job)  {
    override fun invoke(cause: Throwable?) {
        val state = job.state
        check(state !is Incomplete)
        if (state is CompletedExceptionally) {
            val parentJob = continuation.context[Job]
            // Only if the parent job is cancelled.
            if (parentJob != null && !parentJob.isActive) {
                // Resume with the CancellationException as designed.
                continuation.resumeWithExceptionMode(job.getCancellationException(), MODE_ATOMIC_DEFAULT)
                return
            }
        }
        continuation.resume(Unit)
    }
    override fun toString() = "ResumeOnCompletion[$continuation]"
}

elizarov added a commit that referenced this issue Apr 21, 2019
The race happens in the slow-path of 'join' implementation when
parent invokes join on a child coroutines that crashes and cancels
the parent.

Fixes #1123
@elizarov
Copy link
Contributor

Unfortunately, the fix in #1124 is not correct. It throws cancellation exception from the job that is being joined, but, instead, the cancellation exception of the job that is joining should be thrown. See #1127 for a fix.

@bennyhuo
Copy link
Author

Thanks for your reply!

elizarov added a commit that referenced this issue Apr 21, 2019
The race happens in the slow-path of 'join' implementation when
parent invokes join on a child coroutines that crashes and cancels
the parent.

Fixes #1123
elizarov added a commit that referenced this issue Apr 21, 2019
The race happens in the slow-path of 'join' implementation when
parent invokes join on a child coroutines that crashes and cancels
the parent.

Fixes #1123
elizarov added a commit that referenced this issue Apr 21, 2019
The race happens in the slow-path of 'join' implementation when
parent invokes join on a child coroutines that crashes and cancels
the parent.

Fixes #1123
@elizarov elizarov added this to the 1.2.1 milestone Apr 21, 2019
elizarov added a commit that referenced this issue Apr 21, 2019
The race happens in the slow-path of 'join' implementation when
parent invokes join on a child coroutines that crashes and cancels
the parent.

Fixes #1123
elizarov added a commit that referenced this issue Apr 21, 2019
The race happens in the slow-path of 'join' implementation when
parent invokes join on a child coroutines that crashes and cancels
the parent.

Fixes #1123
elizarov added a commit that referenced this issue Apr 24, 2019
The race happens in the slow-path of 'join' implementation when
parent invokes join on a child coroutines that crashes and cancels
the parent.

Fixes #1123
elizarov added a commit that referenced this issue Jan 28, 2020
…CancellableCoroutine

* MODE_ATOMIC_DEFAULT split into MODE_ATOMIC (for dispatch) and MODE_ATOMIC_REUSABLE (for suspendCancellableCoroutineReusable only).
  Dispatch modes are orthogonal to additional REUSE capability now.
* Better documentation for MODE_XXX constants.
* suspendCancellableCoroutineReusable does not have a default mode anymore, so its use is more explicit.
* Completely drop (inline) suspendAtomicCancellableCoroutine. Any kind of legacy code where this call might have been inlined still works because the constant value of MODE_ATOMIC = 0 is retained and carries its legacy meaning (no continuation reuse).
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
elizarov added a commit that referenced this issue Feb 17, 2020
…CancellableCoroutine

* MODE_ATOMIC_DEFAULT split into MODE_ATOMIC (for dispatch) and MODE_ATOMIC_REUSABLE (for suspendCancellableCoroutineReusable only).
  Dispatch modes are orthogonal to additional REUSE capability now.
* Better documentation for MODE_XXX constants.
* suspendCancellableCoroutineReusable does not have a default mode anymore, so its use is more explicit.
* Completely drop (inline) suspendAtomicCancellableCoroutine. Any kind of legacy code where this call might have been inlined still works because the constant value of MODE_ATOMIC = 0 is retained and carries its legacy meaning (no continuation reuse).
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
elizarov added a commit that referenced this issue Apr 8, 2020
…CancellableCoroutine

* MODE_ATOMIC_DEFAULT split into MODE_ATOMIC (for dispatch) and MODE_ATOMIC_REUSABLE (for suspendCancellableCoroutineReusable only).
  Dispatch modes are orthogonal to additional REUSE capability now.
* Better documentation for MODE_XXX constants.
* suspendCancellableCoroutineReusable does not have a default mode anymore, so its use is more explicit.
* Completely drop (inline) suspendAtomicCancellableCoroutine. Any kind of legacy code where this call might have been inlined still works because the constant value of MODE_ATOMIC = 0 is retained and carries its legacy meaning (no continuation reuse).
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
elizarov added a commit that referenced this issue Apr 21, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Apr 22, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Apr 22, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Apr 22, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Jun 16, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Sep 16, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Oct 12, 2020
…1937)

This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic
* Channel onUnderliveredElement is introduced as a replacement.

Fixes #1265
Fixes #1813
Fixes #1915
Fixes #1936

Co-authored-by: Louis CAD <louis.cognault@gmail.com>
Co-authored-by: Vsevolod Tolstopyatov <qwwdfsad@gmail.com>
recheej pushed a commit to recheej/kotlinx.coroutines that referenced this issue Dec 28, 2020
…otlin#1937)

This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for Kotlin#1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic
* Channel onUnderliveredElement is introduced as a replacement.

Fixes Kotlin#1265
Fixes Kotlin#1813
Fixes Kotlin#1915
Fixes Kotlin#1936

Co-authored-by: Louis CAD <louis.cognault@gmail.com>
Co-authored-by: Vsevolod Tolstopyatov <qwwdfsad@gmail.com>
recheej pushed a commit to recheej/kotlinx.coroutines that referenced this issue Dec 28, 2020
…otlin#1937)

This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for Kotlin#1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic
* Channel onUnderliveredElement is introduced as a replacement.

Fixes Kotlin#1265
Fixes Kotlin#1813
Fixes Kotlin#1915
Fixes Kotlin#1936

Co-authored-by: Louis CAD <louis.cognault@gmail.com>
Co-authored-by: Vsevolod Tolstopyatov <qwwdfsad@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants