StackOverflow when multiple Managed Resources are being cleaned up at the same time #10211
Comments
As mentioned on Discord, I think the problem is that the finalizer of a resource runs Enso code, which polls safepoints. Inside such a safepoint, another finalizer is then scheduled to run. If there are many pending finalizers, each one runs inside another, creating a cascade of nested finalizers that grows the stack until it overflows. Instead, we should ensure that only one finalizer runs at a time. The finalizer code should probably still poll safepoints (for all the other purposes), but while a finalizer is running, no other finalizer should start inside it; it should be enqueued and run once the first finalizer finishes.
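The "enqueue instead of nesting" idea can be illustrated with a minimal, single-threaded sketch. This is not the Enso runtime API: the class `SerializedFinalizers` and its methods are hypothetical names invented for illustration, and a call to `submit` from inside a running finalizer stands in for a safepoint poll that wants to start another finalizer.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: runs finalizers one at a time, never nested. */
final class SerializedFinalizers {
  private final Deque<Runnable> pending = new ArrayDeque<>();
  private boolean running = false; // true while some finalizer is executing
  private int depth = 0;           // current nesting depth of finalizers
  private int maxDepth = 0;        // deepest nesting observed so far

  /** Stand-in for "a safepoint wants to run this finalizer". */
  void submit(Runnable finalizer) {
    pending.add(finalizer);
    if (running) {
      return; // already inside a finalizer: leave it queued, do not nest
    }
    running = true;
    try {
      Runnable next;
      // Drain the queue iteratively, so the stack depth stays constant.
      while ((next = pending.poll()) != null) {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
        try {
          next.run();
        } finally {
          depth--;
        }
      }
    } finally {
      running = false;
    }
  }

  int maxDepth() {
    return maxDepth;
  }
}
```

In this sketch, a finalizer that triggers another finalizer while it runs only appends to the queue, so the observed nesting depth stays at one regardless of how many finalizers are pending. The sketch ignores threading entirely; it only shows the control flow.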
Instead of the log above, I'm also sometimes getting the following error:
Possibly related StackOverflowError in Table_Tests in https://github.com/enso-org/enso/actions/runs/9451385966/job/26032192518?pr=10192#step:7:1712. I could not reproduce that one locally.
The code in question originates from
Here is a patch that fixes the program that demonstrates the inquiry:
diff --git engine/runtime/src/main/java/org/enso/interpreter/runtime/ResourceManager.java engine/runtime/src/main/java/org/enso/interpreter/runtime/ResourceManager.java
index a2304b06d1..24b7f0a4f5 100644
--- engine/runtime/src/main/java/org/enso/interpreter/runtime/ResourceManager.java
+++ engine/runtime/src/main/java/org/enso/interpreter/runtime/ResourceManager.java
@@ -6,20 +6,20 @@ import com.oracle.truffle.api.interop.InteropLibrary;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
+import java.util.ArrayList;
+import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
-import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
-import java.util.concurrent.atomic.AtomicReference;
import org.enso.interpreter.runtime.data.ManagedResource;
/** Allows the context to attach garbage collection hooks on the removal of certain objects. */
-public class ResourceManager {
+public final class ResourceManager {
private final EnsoContext context;
private volatile boolean isClosed = false;
private volatile Thread workerThread;
- private final Runner worker = new Runner();
+ private final ProcessItems worker = new ProcessItems(false, false, false);
private final ReferenceQueue<ManagedResource> referenceQueue = new ReferenceQueue<>();
private final ConcurrentMap<PhantomReference<ManagedResource>, Item> items =
new ConcurrentHashMap<>();
@@ -98,21 +98,8 @@ public class ResourceManager {
// no further attempts are made.
boolean continueFinalizing = it.isFlaggedForFinalization().compareAndSet(true, false);
if (continueFinalizing) {
- var futureToCancel = new AtomicReference<Future<Void>>(null);
- var performFinalizeNow =
- new ThreadLocalAction(false, false, true) {
- @Override
- protected void perform(ThreadLocalAction.Access access) {
- var tmp = futureToCancel.getAndSet(null);
- if (tmp == null) {
- return;
- }
- tmp.cancel(false);
- it.finalizeNow(context);
- items.remove(it);
- }
- };
- futureToCancel.set(context.submitThreadLocal(null, performFinalizeNow));
+ it.finalizeNow(context);
+ items.remove(it);
}
}
}
@@ -174,12 +161,57 @@ public class ResourceManager {
}
/**
- * The worker action for the underlying logic of this module. At least one such thread must be
- * spawned in order for this module to be operational.
+ * Processes {@link Item}s eligible for GC. Plays two roles. First of
+ * all cleans {@link #referenceQueue} in {@link #run()} method running in
+ * its own thread. Then it invokes finalizers in {@link #perform} method
+ * inside of Enso execution context.
*/
- private class Runner implements Runnable {
+ private final class ProcessItems extends ThreadLocalAction implements Runnable {
+ /** @GuardedBy("pendingItems") */
+ private final List<Item> pendingItems = new ArrayList<>();
private volatile boolean killed = false;
+ ProcessItems(boolean hasSideEffects, boolean synchronous, boolean recurring) {
+ super(hasSideEffects, synchronous, recurring);
+ }
+
+ /**
+ * Runs at a safe point in middle of regular Enso program execution.
+ * Gathers all available {@link #pendingItems} and runs their finalizers.
+ * Removes all processed items from {@link #pendingItems}. If there are
+ * any remaining, continues processing them. Otherwise finishes.
+ *
+ * @param access not used for anything
+ */
+ @Override
+ protected void perform(ThreadLocalAction.Access access) {
+ for (;;) {
+ Item[] toProcess;
+ synchronized (pendingItems) {
+ toProcess = pendingItems.toArray(Item[]::new);
+ }
+ try {
+ for (var it : toProcess) {
+ scheduleFinalizationAtSafepoint(it);
+ }
+ } finally {
+ synchronized (pendingItems) {
+ pendingItems.subList(0, toProcess.length).clear();
+ if (pendingItems.isEmpty()) {
+ return;
+ }
+ // continue processing meanwhile added pendingItems
+ }
+ }
+ }
+ }
+
+ /**
+ * Running in its own thread. Waiting for {@link #referenceQueue} to
+ * be populated with GCed items. Scheduling {@link #perform}
+ * action at safe points while passing the {@link Item}s to it via
+ * {@link #pendingItems}.
+ */
@Override
public void run() {
while (true) {
@@ -188,7 +220,12 @@ public class ResourceManager {
if (!killed) {
if (ref instanceof Item it) {
it.isFlaggedForFinalization().set(true);
- scheduleFinalizationAtSafepoint(it);
+ synchronized (pendingItems) {
+ if (pendingItems.isEmpty()) {
+ context.submitThreadLocal(null, this);
+ }
+ pendingItems.add(it);
+ }
}
}
if (killed) {
With this change the above program finishes without errors even if
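The hand-off used by the patch (the worker thread appends to `pendingItems` and schedules the action only when the list was empty, while `perform` repeatedly processes a snapshot) can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the actual `ResourceManager` code: `BatchHandoff`, `scheduleDrain`, and `handler` are hypothetical names, with `scheduleDrain` standing in for `context.submitThreadLocal(null, this)` and `handler` for the per-item finalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of the producer/consumer hand-off in the patch. */
final class BatchHandoff<T> {
  private final List<T> pending = new ArrayList<>(); // guarded by "pending"
  private final Runnable scheduleDrain; // assumed hook that arranges a later drain()
  private final Consumer<T> handler;    // assumed per-item work, e.g. a finalizer

  BatchHandoff(Runnable scheduleDrain, Consumer<T> handler) {
    this.scheduleDrain = scheduleDrain;
    this.handler = handler;
  }

  /** Producer side: schedule a drain only when nothing is pending yet. */
  void add(T item) {
    synchronized (pending) {
      if (pending.isEmpty()) {
        scheduleDrain.run();
      }
      pending.add(item);
    }
  }

  /** Consumer side: process snapshots until no items remain. */
  void drain() {
    while (true) {
      List<T> batch;
      synchronized (pending) {
        batch = new ArrayList<>(pending); // snapshot; items stay listed while processed
      }
      boolean done;
      try {
        for (T item : batch) {
          handler.accept(item); // runs outside the lock
        }
      } finally {
        synchronized (pending) {
          pending.subList(0, batch.size()).clear(); // drop the processed prefix
          done = pending.isEmpty();
        }
      }
      if (done) {
        return;
      }
      // otherwise continue with items added in the meantime
    }
  }
}
```

Keeping processed items in the list until the batch is finished means a concurrent `add` sees a non-empty list and does not schedule a second drain, so at most one drain is outstanding. Unlike the patch, this sketch does not return from inside the `finally` block, so an exception thrown by `handler` propagates to the caller.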
Jaroslav Tulach reports a new STANDUP for yesterday (2024-07-19): Progress: - analyzing and merging #10593 (comment)
Jaroslav Tulach reports a new STANDUP for yesterday (2024-07-22): Progress: - Analyzing JNA usage: #10440 (comment)
Jaroslav Tulach reports a new STANDUP for yesterday (2024-07-23): Progress: - planning & discussions It should be finished by 2024-07-23.
Try running the following script:
With n = 10 it will happily allocate and then clean up resources:
Now, try changing n to 10000 and running it again.
I'm consistently getting a StackOverflow failure: