Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky shutdown #2312

Merged
merged 8 commits into from
Aug 21, 2024
Merged

Fix flaky shutdown #2312

merged 8 commits into from
Aug 21, 2024

Conversation

ntkme
Copy link
Contributor

@ntkme ntkme commented Aug 17, 2024

This PR fixes a few shutdown reliability issues including the following flaky test on GitHub Actions windows runner:

❌ test\embedded\importer_test.dart: emits a protocol error for a response without a corresponding request ID (failed)
  
  Process `dart_sass_embedded` was killed with SIGKILL in a tear-down. Output:
      canonicalizeRequest: {
        id: 0
        importerId: 1
        url: other
        fromImport: true
      }
      
  [e] Host caused params error: Response ID 1 doesn't match any outstanding requests in compilation 4321.
      error: {
        type: PARAMS
        id: 4294967295
        message: Response ID 1 doesn't match any outstanding requests in compilation 4321.
      }
  TimeoutException after 0:00:30.000000: Test timed out after 30 seconds. See https://pub.dev/packages/test#timeouts
  dart:isolate  _RawReceivePort._handleMessage

@ntkme ntkme marked this pull request as draft August 17, 2024 07:41
@ntkme ntkme marked this pull request as ready for review August 17, 2024 07:57
@ntkme ntkme marked this pull request as draft August 17, 2024 16:47
@ntkme ntkme force-pushed the fix-isolate-kill branch 2 times, most recently from 564ea71 to ce11d92 Compare August 17, 2024 17:36
@ntkme ntkme marked this pull request as ready for review August 17, 2024 18:05
@ntkme ntkme force-pushed the fix-isolate-kill branch 12 times, most recently from c90fb13 to 6438623 Compare August 18, 2024 05:51
@ntkme ntkme marked this pull request as draft August 18, 2024 16:58
@ntkme ntkme changed the title Fix flaky isolate shutdown Fix flaky shutdown Aug 18, 2024
@ntkme ntkme marked this pull request as ready for review August 18, 2024 19:52
@ntkme ntkme force-pushed the fix-isolate-kill branch 2 times, most recently from e169238 to aa15c20 Compare August 19, 2024 01:26
@ntkme ntkme changed the title Fix flaky shutdown Reduce flaky shutdown Aug 19, 2024
@ntkme ntkme marked this pull request as draft August 19, 2024 04:31
@ntkme ntkme marked this pull request as ready for review August 19, 2024 04:45
@ntkme ntkme changed the title Reduce flaky shutdown Fix flaky shutdown Aug 19, 2024
lib/src/embedded/compilation_dispatcher.dart Outdated Show resolved Hide resolved
lib/src/embedded/compilation_dispatcher.dart Outdated Show resolved Hide resolved
lib/src/embedded/isolate_dispatcher.dart Outdated Show resolved Hide resolved
lib/src/embedded/isolate_dispatcher.dart Outdated Show resolved Hide resolved
lib/src/embedded/isolate_dispatcher.dart Outdated Show resolved Hide resolved
@@ -105,25 +116,41 @@ class IsolateDispatcher {
} else {
var future = ReusableIsolate.spawn(_isolateMain);
isolate = await future;
isolate.receivePort.listen((message) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to pull this logic out here, rather than keeping it in ReusableIsolate?

Copy link
Contributor Author

@ntkme ntkme Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From previous implementation:

        // Parse out the compilation ID and surface the [ProtocolError] as an
        // error. This allows the [IsolateDispatcher] to notice that an error
        // has occurred and close out the underlying channel.

        // Protocol errors have already been through [_handleError] in the child
        // isolate, so we just send them as-is and close out the underlying
        // channel.

First, the motivation to remove the intermediate StreamController is to eliminate two unnecessary performance overheads:

  1. To eliminate the overhead of deserializing ProtocolError packet, passing it through addError, and re-serializing it back to packet. - We don't actually care about the ProtocolError object, all we care is that the category 2 indicate the packet contains a ProtocolError.
  2. To eliminate the overhead of allocating new StreamController for every single compilation.

In regarding of why the receivePort.listen is moved to IsolateDispatcher, IMO logically how the message is sent across Isolates is a contract between IsolateDispatcher and CompilationDispatcher, ReusableIsolate should purely be a passthrough helper just to help with creating and managing isolate for repeated use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that putting the decoding logic here makes the most sense. But I'm not a big fan of giving the caller responsibility for maintaining a single stream subscription across multiple borrows, and I'm not very worried about the overhead of an extra StreamController. Can we make .borrow() return a subscription that just forwards a view of the Stream<object?> which closes automatically once .return() is called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not precisely a new Stream each time, but I made .borrow take a new onData function each time, which behaves somewhat like the previous implementation but without creating new StreamController.

lib/src/embedded/reusable_isolate.dart Outdated Show resolved Hide resolved
@ntkme ntkme force-pushed the fix-isolate-kill branch 2 times, most recently from 7def05e to d95b74c Compare August 19, 2024 21:38
@ntkme ntkme requested a review from nex3 August 19, 2024 21:59
try {
return _mailbox.take();
} on StateError catch (_) {
// [_mailbox] has been closed, exit the current isolate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could still use some detail on why exiting immediately rather than waiting for futures to wind down is valuable. Think of it from the perspective of someone reading this without any context on the flakiness that used to occur.

@@ -105,25 +116,41 @@ class IsolateDispatcher {
} else {
var future = ReusableIsolate.spawn(_isolateMain);
isolate = await future;
isolate.receivePort.listen((message) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that putting the decoding logic here makes the most sense. But I'm not a big fan of giving the caller responsibility for maintaining a single stream subscription across multiple borrows, and I'm not very worried about the overhead of an extra StreamController. Can we make .borrow() return a subscription that just forwards a view of the Stream<object?> which closes automatically once .return() is called?

@ntkme ntkme force-pushed the fix-isolate-kill branch 2 times, most recently from 950316a to ea383c7 Compare August 20, 2024 01:46
@ntkme ntkme requested a review from nex3 August 20, 2024 02:34
@ntkme ntkme force-pushed the fix-isolate-kill branch 6 times, most recently from 1fc7361 to eb5afb1 Compare August 20, 2024 20:24
@nex3
Copy link
Contributor

nex3 commented Aug 21, 2024

Thanks for working on this and helping reduce flakiness!

@nex3 nex3 merged commit a236460 into sass:main Aug 21, 2024
34 checks passed
@ntkme ntkme deleted the fix-isolate-kill branch August 22, 2024 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants