Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YogaNode Finalization is not thread safe #1271

Closed
JuliaSullivanGoogle opened this issue Apr 28, 2023 · 14 comments
Closed

YogaNode Finalization is not thread safe #1271

JuliaSullivanGoogle opened this issue Apr 28, 2023 · 14 comments

Comments

@JuliaSullivanGoogle
Copy link

Report

Issues and Steps to Reproduce

Create many YogaNodes. Finalize them on separate threads. If two try to free at the same exact time, it will fail because the finalization code is not thread safe.

Expected Behavior

To remove nodes from the vector in a thread safe way to avoid a crash in the JNI.

Actual Behavior

It crashed during finalization.

Link to Code

The code that isn't thread safe in the JNI: https://github.com/facebook/yoga/blob/v1.19.0/yoga/Yoga.cpp#L249
The code that isn't thread safe in the Java library: https://github.com/facebook/yoga/blob/v1.19.0/java/com/facebook/yoga/YogaNodeJNIFinalizer.java#L28

We were thinking this might need to be synchronized per tree, so we'd need to get the root of the tree?

https://github.com/facebook/yoga/blob/v1.19.0/java/com/facebook/yoga/YogaNode.java#L42

@JuliaSullivanGoogle
Copy link
Author

Also, if you try to synchronize it by synchronizing it locally in each finalizer, it will not work. It has to be a global lock around all instances of YogaNodes or else it will still try to access the JNI finalization code at the same time and crash.

@NickGerleman
Copy link
Contributor

That's a nasty issue. From my reading, finalizes runs on a dedicated GC thread, so it seems like any GC of Yoga nodes is unsafe.

Apart from a global lock, I am first wondering that if YGNodeFree didn't mutate the tree, and really just freed the node, if we would be safe from this. This would be breaking for anything relying on YGNodeFree to remove the node from the tree as well (maybe can split into something like YGNodeRemoveAndFree?).

The JNI bindings do already use the node context API to attach some Java specific information. So another option might be to keep information to allow redispatching destruction to the thread which allocated the node.

Need to look closer at all the options here. Ideally we could avoid adding overhead.

Also FYI @jkoritzinsky since this would impact the SafeHandle based finalizers in the C# port.

@JuliaSullivanGoogle
Copy link
Author

Is there anything we can do on our side in the meantime? Do you have an idea of how long it would take to address/fix this and what priority it would be?

@JuliaSullivanGoogle
Copy link
Author

QQ, does a lock on freeing a node here lock on all YogaNodes across all trees or does it just lock the one tree?

@NickGerleman
Copy link
Contributor

NickGerleman commented May 2, 2023

I took a closer look at how memory is managed in the Java bindings, and would like to know more about how you are seeing concurrent modification happening on different threads.

The Java bindings don’t offer a way to explicitly free a node, so the finalizer is always relied upon. A node on the Java side has references to both parent and children Java nodes, so if a Java Node is finalized, it is guaranteed that the native parent and children nodes also belong to dead Java nodes.

Reading through your issue report again, are there multiple threads doing GC at once? So, multiple finalizers running on the same tree? Or are the edits happening some other way?

If we know the entire connected tree is dead, we could probably make freeing an individual node thread safe by removing the steps to detach the node from the tree. I.e. YGNodeFree would just free() without interfering with any other connected nodes, and we know nothing will access the dangling pointers because everything connected is dead and being finalized.

@JuliaSullivanGoogle
Copy link
Author

I can confirm this is definitely multiple threads GCing at once. These are some logs I printed out from System.err. These are as follows:

ThreadId, mNativePointer, Root node's mNativePointer

Starting Deletion: 9 139901460002576 139901460004720
Starting Deletion: 40 139901460001936 139901460004720
Finishing Deletion: 9 0 139901460004720
Finishing Deletion: 40 0 139901460004720

I got the Root node's mNativePointer by doing this:

YogaNodeJNIBase parent = getOwner();
while (parent.getOwner() != null) {
    parent = parent.getOwner();
}

These logs start inside the if statement of the freeNatives function and FinishDeletion is at the end of the if statement. The logs above show that there are two threads (9 and 40) and two different nodes (139901460002576, 139901460001936) are being deleted which are from the same parent (139901460004720)

In these logs we got lucky, but in crashes it fails.

@NickGerleman
Copy link
Contributor

If you are able to locally validate a fix, would you be able to test the below patch?

YOGA_EXPORT void YGNodeFree(const YGNodeRef node) {
-  if (YGNodeRef owner = node->getOwner()) {
-    owner->removeChild(node);
-    node->setOwner(nullptr);
-  }
-
-  const uint32_t childCount = YGNodeGetChildCount(node);
-  for (uint32_t i = 0; i < childCount; i++) {
-    const YGNodeRef child = YGNodeGetChild(node, i);
-    child->setOwner(nullptr);
-  }
-
-  node->clearChildren();
  Event::publish<Event::NodeDeallocation>(node, {node->getConfig()});
  delete node;
}

That change cannot be landed as-is, but if it resolves the issue when using the finalizer there should be a straightforward path to a change which can be merged.

@JuliaSullivanGoogle
Copy link
Author

This seems to work! I ran it a few thousand times and never got the flake to happen.

NickGerleman added a commit to NickGerleman/yoga that referenced this issue May 4, 2023
Summary:
Fixes facebook#1271

Java bindings for Yoga rely solely on garbage collection for memory management. Each Java `YogaNode` has references to its children and parent, along with an `YGNodeRef` which it owns. When the `YogaNode` is garbage collected, a finalizer is run to call `YGNodeFree` and free the underlying native Yoga Node.

This may cause a use-after-free if finalizers are run from multiple threads. This is because `YGNodeFree` does more than just freeing, but instead also interacts with its parent and children nodes to detach itself, and remove any dangling pointers. If multiple threads run finalizers at once, one may traverse and try to mutate a node which another is freeing.

This bit of behavior is counterintuitive from the API name anyway, so this diff makes a breaking change to `YGNodeFree` to instead mostly just be a `free()` (with a tracing step whose publishing should be thread safe). The existing behavior is exposed in an added function `YGNodeDetachAndFree()`.

The majority of existing `YGNodeFree` usages will probably still work under the new behavior, but we don't have that many in fbsource (many instead call `YGNodeFreeRecursive` to free a whole tree at once). So for safety, I replaced existing usages of `YGNodeFree` outside of Yoga with `YGNodeDetachAndFree` to preserve existing behavior.

JavaScript and C# bindings also use `YGNodeFree()`, and both allow explicit freeing. For now I switched both of their behavior to detachment, to avoid exposing the ability to managed languages to corrupt memory.

Differential Revision: D45556206

fbshipit-source-id: 27a7c94ffea566e51b23cf3d7c48b94761624936
NickGerleman added a commit to NickGerleman/litho that referenced this issue May 4, 2023
Summary:
Fixes facebook/yoga#1271

Java bindings for Yoga rely solely on garbage collection for memory management. Each Java `YogaNode` has references to its children and parent, along with an `YGNodeRef` which it owns. When the `YogaNode` is garbage collected, a finalizer is run to call `YGNodeFree` and free the underlying native Yoga Node.

This may cause a use-after-free if finalizers are run from multiple threads. This is because `YGNodeFree` does more than just freeing, but instead also interacts with its parent and children nodes to detach itself, and remove any dangling pointers. If multiple threads run finalizers at once, one may traverse and try to mutate a node which another is freeing.

Because we know the entire connected tree is dead, there is no point to trying to remove dangling pointers, and this bit of behavior is counterintuitive from the API name anyway. This diff makes a breaking change to `YGNodeFree` to instead mostly just be a `free()` (with a tracing step whose publishing should be thread safe). The existing behavior is exposed in an added function `YGNodeDetachAndFree()`.

The majority of existing `YGNodeFree` usages will probably still work under the new behavior, but we don't have that many in fbsource (many instead call `YGNodeFreeRecursive` to free a whole tree at once). So for safety, I replaced existing usages of `YGNodeFree` outside of Yoga with `YGNodeDetachAndFree` to preserve existing behavior.

JavaScript and C# bindings also use `YGNodeFree()`, and both allow explicit freeing. For now I switched both of their behavior to detachment, to avoid exposing the ability to managed languages to corrupt memory.

Differential Revision: D45556206

fbshipit-source-id: 0d1f52a4195c169c5ab43f6f417e22ce1f998d8c
@NickGerleman
Copy link
Contributor

Okay, I think the PR attached above should resolve the issue.

NickGerleman added a commit to NickGerleman/react-native that referenced this issue May 4, 2023
Summary:
Fixes facebook/yoga#1271

Java bindings for Yoga rely solely on garbage collection for memory management. Each Java `YogaNode` has references to its children and parent, along with an `YGNodeRef` which it owns. When the `YogaNode` is garbage collected, a finalizer is run to call `YGNodeFree` and free the underlying native Yoga Node.

This may cause a use-after-free if finalizers are run from multiple threads. This is because `YGNodeFree` does more than just freeing, but instead also interacts with its parent and children nodes to detach itself, and remove any dangling pointers. If multiple threads run finalizers at once, one may traverse and try to mutate a node which another is freeing.

Because we know the entire connected tree is dead, there is no point to trying to remove dangling pointers, and this bit of behavior is counterintuitive from the API name anyway. This diff makes a breaking change to `YGNodeFree` to instead mostly just be a `free()` (with a tracing step whose publishing should be thread safe). The existing behavior is exposed in an added function `YGNodeDetachAndFree()`.

The majority of existing `YGNodeFree` usages will probably still work under the new behavior, but we don't have that many in fbsource (many instead call `YGNodeFreeRecursive` to free a whole tree at once). So for safety, I replaced existing usages of `YGNodeFree` outside of Yoga with `YGNodeDetachAndFree` to preserve existing behavior.

JavaScript and C# bindings also use `YGNodeFree()`, and both allow explicit freeing. For now I switched both of their behavior to detachment, to avoid exposing the ability to managed languages to corrupt memory.

Differential Revision: D45556206

fbshipit-source-id: a60954260157e4de8551d97c71aaadd0f203c2fb
@NickGerleman
Copy link
Contributor

Heads up if you were using the last PR that I made a change after talking to folks, to avoid changing behavior of YGNodeFree for safety, and to scope raw deallocation to just JNI. Should be the same for Java users as tested previously, but without changing any existing C API behavior.

@NickGerleman
Copy link
Contributor

Fixed with 3b088c3

@khandpur
Copy link

khandpur commented May 11, 2023

Hi @NickGerleman, thanks for fixing this issue. If I read the PR correctly, it looks like finalizer calls jni_YGNodeDeallocateJNI which calls YGNodeFree which has the old behavior (modifying the tree and free-ing itself). Did we also want to change jni_YGNodeDeallocateJNI to call into YGNodeDeallocate instead of YGNodeFree?

@NickGerleman
Copy link
Contributor

Ugghh, yes, that was left behind in the switch from the one revision to the other. Let me make a new change to fix this.

@NickGerleman NickGerleman reopened this May 11, 2023
NickGerleman added a commit to NickGerleman/react-native that referenced this issue May 11, 2023
Summary:
Fixes facebook/yoga#1271

Updating this glue was missed in D45556206 when moving from one revision to the other...

Differential Revision: D45780647

fbshipit-source-id: a21ac3227ab79e659d377ba644374c4dfb13d66b
NickGerleman added a commit to NickGerleman/react-native that referenced this issue May 11, 2023
Summary:
Pull Request resolved: facebook#37388

Fixes facebook/yoga#1271

Updating this glue was missed in D45556206 when moving from one revision to the other...

Reviewed By: yungsters

Differential Revision: D45780647

fbshipit-source-id: 60ea526d995d7a8096960a286628587e4c57c9f2
NickGerleman added a commit to NickGerleman/react-native that referenced this issue May 11, 2023
Summary:
Pull Request resolved: facebook#37388

Fixes facebook/yoga#1271

Updating this glue was missed in D45556206 when moving from one revision to the other...

Reviewed By: yungsters

Differential Revision: D45780647

fbshipit-source-id: 81b0159b198ff25a4de31313cb505b9b602308a4
facebook-github-bot pushed a commit to facebook/react-native that referenced this issue May 11, 2023
Summary:
Pull Request resolved: #37388

Fixes facebook/yoga#1271

Updating this glue was missed in D45556206 when moving from one revision to the other...

Reviewed By: yungsters

Differential Revision: D45780647

fbshipit-source-id: 4ca64bc9971d3e4697990e73b618a3dc91df259b
facebook-github-bot pushed a commit to facebook/litho that referenced this issue May 11, 2023
Summary:
X-link: facebook/react-native#37388

Fixes facebook/yoga#1271

Updating this glue was missed in D45556206 when moving from one revision to the other...

Reviewed By: yungsters

Differential Revision: D45780647

fbshipit-source-id: 4ca64bc9971d3e4697990e73b618a3dc91df259b
@khandpur
Copy link

Thank you Nick!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment