Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spawning 16 (or more) [Isolate]s with [sleep] freeze the application #51254

Closed
alexmercerind opened this issue Feb 5, 2023 · 14 comments
Closed
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-isolate

Comments

@alexmercerind
Copy link

alexmercerind commented Feb 5, 2023

Hi! I'm in a situation where I'm using Isolates to "poll" for events. But as soon as there are more than 16 isolates in Dart SDK or 8 in Flutter, whole application freezes. Please refer to code below.

Expected Behavior

"Alive!" is printed out to the console every second.

Actual Behavior

No output is ever printed out to the console as if application is frozen.
When using the following code in Dart SDK & Flutter:

  1. Dart SDK: "Alive!" is printed as long as [kIsolateSpawnCount] is less than 16.
  2. Flutter: "Alive!" is printed as long as [kIsolateSpawnCount] is less than 8.

Once kIsolateSpawnCount is set to 15 in Dart SDK or 7 in Flutter, "Alive!" is printed every second as expected.

Dart Example
import 'dart:io';
import 'dart:async';
import 'dart:isolate';

/// Expected Behavior
/// -----------------
/// "Alive!" is printed out to the console every second.
///
/// Actual Behavior
/// ---------------
/// No output is ever printed out to the console as if application is frozen.
/// When using the following code in Dart SDK & Flutter:
/// 1. Dart SDK: "Alive!" is printed as long as [kIsolateSpawnCount] is less than 16.
/// 2. Flutter: "Alive!" is printed as long as [kIsolateSpawnCount] is less than 8.
///
/// Once [kIsolateSpawnCount] is set to 15 in Dart SDK or 7 in Flutter, "Alive!" is printed every second as expected.
///

const kIsolateSpawnCount = 16;

void isolate(message) {
  // [sleep]ing one [Isolate] should not affect rest of the [Isolate]s, main [Isolate] or the event loop.
  // Note: [Future.delayed] works as intended.
  sleep(const Duration(days: 1));
}

Future<void> main() async {
  for (int i = 0; i < kIsolateSpawnCount; i++) {
    await Isolate.spawn(isolate, null);
  }
  Timer.periodic(
    const Duration(seconds: 1),
    (e) {
      print('Alive!');
    },
  );
}
Flutter Example
import 'dart:io';
import 'dart:async';
import 'dart:isolate';

import 'package:flutter/material.dart';

/// Expected Behavior
/// -----------------
/// "Alive!" is printed out to the console every second.
///
/// Actual Behavior
/// ---------------
/// No output is ever printed out to the console as if application is frozen.
/// When using the following code in Dart SDK & Flutter:
/// 1. Dart SDK: "Alive!" is printed as long as [kIsolateSpawnCount] is less than 16.
/// 2. Flutter: "Alive!" is printed as long as [kIsolateSpawnCount] is less than 8.
///
/// Once [kIsolateSpawnCount] is set to 15 in Dart SDK or 7 in Flutter, "Alive!" is printed every second as expected.
///

const kIsolateSpawnCount = 8;

void isolate(message) {
  // [sleep]ing one [Isolate] should not affect rest of the [Isolate]s, main [Isolate] or the event loop.
  // Note: [Future.delayed] works as intended.
  sleep(const Duration(days: 1));
}

void main() {
  runApp(const MyApp());
}

class MyApp extends StatefulWidget {
  const MyApp({Key? key}) : super(key: key);

  @override
  State<MyApp> createState() => _MyAppState();
}

class _MyAppState extends State<MyApp> {
  @override
  void initState() {
    super.initState();
    WidgetsBinding.instance.addPostFrameCallback((_) async {
      for (int i = 0; i < kIsolateSpawnCount; i++) {
        await Isolate.spawn(isolate, null);
      }
      Timer.periodic(
        const Duration(seconds: 1),
        (e) {
          debugPrint('Alive!');
        },
      );
    });
  }

  @override
  Widget build(BuildContext context) {
    return const MaterialApp(
      home: HomeScreen(),
    );
  }
}

class HomeScreen extends StatelessWidget {
  const HomeScreen({Key? key}) : super(key: key);

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: const Text('test'),
      ),
      body: Center(
        child: ElevatedButton(
          onPressed: () {
            Navigator.of(context).push(
              MaterialPageRoute(
                builder: (context) => const HomeScreen(),
              ),
            );
          },
          child: const Text('Go!'),
        ),
      ),
    );
  }
}
dart --version
Dart SDK version: 2.18.6 (stable) (Tue Dec 13 21:15:14 2022 +0000) on "windows_x64"

I assume Flutter spawns it's own isolates. I don't know however, I have limited knowledge of this.


I'm working on package:media_kit & one of the users recently reported that as soon as they create 8 Player instances, everything freezes up. I could reduce down problem to the code I shared above. Still, you can see these lines which run on another Isolate for every instance of Player, where I'm calling mpv_wait_event, passing negative duration means waiting until next event comes up.

Now I'm considering to create a separate shared library which will do the "polling" using C++'s std::thread, but it will be a pain & hacky approach to ship another shared library separately for each platform (while rest of the code is in 100% Dart).
On the other hand, using Isolate already present in Dart SDK was handy & worked correctly.

Any guidance or explanation will be really helpful.
Thanks!

@alexmercerind alexmercerind changed the title [Windows] Spawning 16 (or more) [Isolate]s with [sleep] freeze the application Spawning 16 (or more) [Isolate]s with [sleep] freeze the application Feb 5, 2023
@lrhn lrhn added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-isolate labels Feb 5, 2023
@mraleph
Copy link
Member

mraleph commented Feb 6, 2023

/cc @aam @mkustermann

@mkustermann
Copy link
Member

The issue is well known: We only allow a certain number of threads to be active in an isolate group to avoid issues with the memory subsystem (see #51261 for more context)

There's a workaround to this, which is basically telling the Dart VM that the isolate isn't actively needed, which can currently be done via custom C code:

auto isolate = Dart_CurrentIsolate();
Dart_ExitIsolate();
{
  <... run blocking C Code, e.g. sleep() or mpv_wait_event() ...>
}
Dart_EnterIsolate(isolate);

Filed #51261 for possibly letting the VM do that automatically based on boolean.

@mkustermann
Copy link
Member

@alexmercerind Do you have the option to write custom C code for the few places where you know C code can block for longer time? If so, is your use case standalone dart or flutter (or both)?

(Flutter is hiding all Dart C API symbols - such as Dart_ExitIsolate() - so we may need to add those two functions to our NativeApi.initializeApiDLData & include/dart_api_dl.h)

@alexmercerind
Copy link
Author

alexmercerind commented Feb 6, 2023

Hi @mkustermann!

Thanks a lot for the detailed explanation & information. I'm really glad.

Sure! Flutter recently added a new project template for FFI plugins, where one can build custom C/C++ shared library targets before build. I think I can include these Dart API headers there & export the required C methods for access inside Dart SDK.

Alternatively, I think I can (I believe this will save you people a lot of work as-well since my case seems very specific):

I'm thinking to call mpv_wait_event within C/C++ using std::thread (avoiding to use Isolate) & send the raw address (int) of the mpv_event* using SendPort/ReceivePort mechanism to Dart (where I can interpret it using dart:ffi). This will avoid any block/sleep within Dart. I have used this once before when dealing with #37022.

Infact, libmpv also offers a non-wait API for handling events, but since those event callbacks are invoked from another thread, it caused Dart VM to crash due to #37022,. So, I went with mpv_wait_event + Isolate, but it seems this has it's own limitations too.

Thanks again!

@SynSzakala
Copy link

@mkustermann Is there a possibility to actually use Dart_CurrentIsolate etc. in the native code of an Flutter application? Since, as you mentioned, these functions are not exported it seems that the workaround cannot be applied.

@alexmercerind
Copy link
Author

For now I just created a new FFI plugin package which spawns a std::thread & forwards data to Dart VM through SendPort. Inside original root package, I just load the native library if found otherwise fallback to Isolate based solution.

https://github.com/alexmercerind/media_kit/blob/b5b980e29b2f88f226a58c1c5eda416dcb19b598/media_kit_native_event_loop/src/media_kit_native_event_loop.cc#L43-L46
https://github.com/alexmercerind/media_kit/blob/main/media_kit/lib/src/libmpv/core/initializer.dart

@SynSzakala
Copy link

This is a huge implementation overhead to a seemingly simple feature.

@mkustermann Will you be willing to accept a PR exposing these functions?

@mkustermann
Copy link
Member

@mkustermann Is there a possibility to actually use Dart_CurrentIsolate etc. in the native code of an Flutter application? Since, as you mentioned, these functions are not exported it seems that the workaround cannot be applied.
@mkustermann Will you be willing to accept a PR exposing these functions?

The 3 needed API functions are quite stable and I don't see a big reason not to expose them. Would be happy to accept a PR - though our ffi testing setup has some particularities, so maybe it's easier if we do it: made cl/292722. Will add relevant reviews to see if they have objections.

Will think a bit more about whether we can do it in an automatic way (see #51261) that works nicely.

@mraleph
Copy link
Member

mraleph commented Apr 11, 2023

@SynSzakala We could easily expose these functions, but before that I want to check with you regarding the timelines. If we land this change today it is going to reach master or dev Flutter channel relatively quickly, but it is going to take more than 6 months to reach a stable Flutter release. So we have a choice - to land something quick and dirty which you can only use on a master channel for a while - or spend this time to design a more complete and future-proof solution.

Which Flutter channel do you use?

@SynSzakala
Copy link

@mraleph I'm aware of that, it'll be completely fine for us to migrate to the master channel (or back-port the fix to a fork somehow, if that's possible). It's still less work than adding an additional asynchronous native layer (especially given large API surfaces).
Of course, designing and implementing solution as discussed in #51261 would be very desirable, since that completely removes the need for adding native code. But, from what I understand there's still some discussion on how to do that right.
So, from my perspective it would be reasonable to expose the Dart_*Isolate functions anyway, since the solution discussed in #51261 might not cover every use-case (e.g. some kind of conditional blocking in the native code).

@Piero512
Copy link

Piero512 commented Apr 11, 2023

I agree with @SynSzakala comment, that it is just way easier to write that kind of logic in Dart and TBH I'm into letting more advanced usecases for DL APIs. I currently don't have any issue that this change would resolve, but this change gives FFI users flexibility and it wouldn't already burden people that are currently using stuff like NativePorts.

EDIT: I think something like this dart+cpp code wrapping bonjour bonjour_ffi if the change is applied, could be written using mostly isolates and a small native code library, if the native code can yield the isolate when sync waiting for the FDs/sockets, eliminating the need for me to ship libuv (but then there's the overhead of using too many isolates, but then it would be way less painful to bundle that library into, say, a Flutter FFI plugin) (of course, it could be done entirely in Dart if I could wrap the FDs into RawSockets, but this is issue #46196).

copybara-service bot pushed a commit that referenced this issue Apr 12, 2023
For applications that want to have arbitrary number of isolates call
into native code that may be blocking, we expose the API functions that
allows those native threads to exit an isolate before running
long/blocking code.

Without the ability to exit/re-enter isolate, one may experience
deadlocks as we have a fixed limit on the number of concurrently
executing isolates atm.

In the longer term we may find a way to do this automatically
with low overhead, see [0]. But since those API functions are quite
stable and we already expose e.g. `Dart_{Enter,Exit}Scope`, I don't
see a reason not to expose `Dart_{Enter,Exit}Isolate`.

[0] Issue #51261

Issue #51254

TEST=ffi{,_2}/dl_api_exit_enter_isolate_test

Change-Id: I91c772ca962fddb87919663fea07939a498fa205
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/292722
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
copybara-service bot pushed a commit that referenced this issue Apr 12, 2023
This reverts commit a251281.

Reason for revert: FFI tests fail to link on Windows, fail to load on product-mode Android

Original change's description:
> [vm] Expose Dart_{CurrentIsolate,ExitIsolate,EnterIsolate}
>
> For applications that want to have arbitrary number of isolates call
> into native code that may be blocking, we expose the API functions that
> allows those native threads to exit an isolate before running
> long/blocking code.
>
> Without the ability to exit/re-enter isolate, one may experience
> deadlocks as we have a fixed limit on the number of concurrently
> executing isolates atm.
>
> In the longer term we may find a way to do this automatically
> with low overhead, see [0]. But since those API functions are quite
> stable and we already expose e.g. `Dart_{Enter,Exit}Scope`, I don't
> see a reason not to expose `Dart_{Enter,Exit}Isolate`.
>
> [0] Issue #51261
>
> Issue #51254
>
> TEST=ffi{,_2}/dl_api_exit_enter_isolate_test
>
> Change-Id: I91c772ca962fddb87919663fea07939a498fa205
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/292722
> Commit-Queue: Martin Kustermann <kustermann@google.com>
> Reviewed-by: Daco Harkes <dacoharkes@google.com>
> Reviewed-by: Ryan Macnak <rmacnak@google.com>

Change-Id: I05ad5b9ce24754a68693160e470f8eb71a812c75
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/294860
Auto-Submit: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
copybara-service bot pushed a commit that referenced this issue Apr 13, 2023
For applications that want to have arbitrary number of isolates call
into native code that may be blocking, we expose the API functions that
allows those native threads to exit an isolate before running
long/blocking code.

Without the ability to exit/re-enter isolate, one may experience
deadlocks as we have a fixed limit on the number of concurrently
executing isolates atm.

In the longer term we may find a way to do this automatically
with low overhead, see [0]. But since those API functions are quite
stable and we already expose e.g. `Dart_{Enter,Exit}Scope`, I don't
see a reason not to expose `Dart_{Enter,Exit}Isolate`.

Difference to original CL:

  Do use STL synchronization primitives (as the ones in runtime/bin
  are not always available in shared libraries)


[0] Issue #51261

Issue #51254

TEST=ffi{,_2}/dl_api_exit_enter_isolate_test

Change-Id: Id817e8d4edb3db35f029248d62388cbd0682001d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/294980
Reviewed-by: Daco Harkes <dacoharkes@google.com>
@mkustermann
Copy link
Member

This has landed now on main branch in the Dart SDK. Once it rolls up to flutter engine and flutter framework main branches, it will be available to use.

We'll continue the discussion on how to do this better / automatically in #51261 (to avoid the custom C code that exits + enters isolate).

@azkadev
Copy link

azkadev commented May 12, 2024

I also experienced this, is there anything the same?
has it been fixed?

@mkustermann
Copy link
Member

I also experienced this, is there anything the same?
has it been fixed?

@azkadev Generally if the isolate yield back to the event loop there should be no freezing of the application. This is mainly a problem if one e.g. calls out to C code in many isolates and that C code blocks. For this scenario one can use a new C API to exit the current isolate and re-enter when the blocking C call has finished. See f9eb97f (You can use dart:ffis NativeApi.initializeApiDLData pass it to C code, which can include dart_api_dl.h from the Dart SDK (can make a copy of it) initialize it and then call the 2 new API functions)

copybara-service bot pushed a commit that referenced this issue Dec 5, 2024
This prevents such an isolate from occupying one of the limited number of mutator slots and blocking other isolates in the same group from running.

TEST=ci
Bug: #51254
Bug: #54687
Bug: #57119
Change-Id: Ic04bbaa7f482d533ad0ecf2c6da17ea9f00c264e
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/398927
Reviewed-by: Alexander Aprelev <aam@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-isolate
Projects
None yet
Development

No branches or pull requests

7 participants