[11.x] Handle circular references in model serialization #52461
If a circular relationship is set up between two models using `setRelation()` (or similar methods) then calling `$model->relationsToArray()` will call `toArray()` on each related model, which will in turn call `relationsToArray()`. In an instance where one of the related models is an object that has already had `toArray()` called further up the stack, it will infinitely recurse down and result in a stack overflow. The same issue exists with `getQueueableRelations()`, `push()`, and potentially other methods. This adds tests which will fail if one of the known potentially problematic methods gets into a recursive loop.
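The failure mode can be reproduced outside Eloquent. The sketch below uses a hypothetical `Node` class (not a framework class) whose `toArray()` naively serializes its relations. The depth counter is an addition made only so that the demonstration stops with an exception instead of overflowing the stack.

```php
<?php

// Hypothetical stand-in for a model; not part of Laravel.
final class Node
{
    /** @var array<string, Node> */
    private array $relations = [];

    // Artificial depth counter so the demo aborts instead of crashing PHP.
    private static int $depth = 0;

    public function __construct(public string $name)
    {
    }

    public function setRelation(string $key, Node $related): static
    {
        $this->relations[$key] = $related;

        return $this;
    }

    public function toArray(): array
    {
        self::$depth++;

        try {
            if (self::$depth > 100) {
                throw new RuntimeException('toArray() nested more than 100 levels deep');
            }

            $data = ['name' => $this->name];

            // Naive recursion: every relation serializes its own relations in turn.
            foreach ($this->relations as $key => $related) {
                $data[$key] = $related->toArray();
            }

            return $data;
        } finally {
            self::$depth--;
        }
    }
}

$user = new Node('user');
$post = new Node('post');

// Circular reference: user -> post -> user.
$user->setRelation('post', $post);
$post->setRelation('user', $user);

try {
    $user->toArray();
    $result = 'completed';
} catch (RuntimeException $e) {
    $result = 'aborted: ' . $e->getMessage();
}
```

Without the artificial limit, nothing in `toArray()` would ever stop the descent, because each object keeps handing control back to the other.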
This adds a trait for Eloquent which can be used to prevent recursively serializing circular references.
src/Illuminate/Database/Eloquent/Concerns/PreventsCircularRecursion.php
I like this PR in general! Are there performance / memory implications with these changes? Maybe some benchmarks would be great with the before and after comparison.
That's a very difficult question to answer because the "before" case that this PR is targeting would have resulted in a stack overflow, so the test would be "it works at all" vs "Segmentation Fault". In terms of non-circular references the heaviest memory implication would be temporarily holding on to a second copy of a model's attributes only on the My understanding of The
Yeah, I meant regarding a non-circular
So there would be extra memory usage due to
@samlev really dig the PR, but unfortunately it is a pretty big hit on performance. For example, seed 1,000 users into the database and run:
use App\Models\User;
use Illuminate\Support\Benchmark;

$users = User::all();

Benchmark::dd(fn () => $users->toArray());
@taylorotwell There's definitely some ways to clean it up more. I'm guessing that the speed issue is the additional call to
@samlev we may just have to give the extra complexity a shot - hopefully won't be too bad. I don't think we can stomach a 2x performance hit for all Eloquent serialization.
@taylorotwell this new update should negate the performance penalty for non-circular relations (and should more or less resolve @Jubeki's concerns about increased memory usage). The only other "improvement" that I can think of would be to get extremely meta by forcing
public function toArray()
{
    return $this->withoutRecursion(
        fn () => array_merge($this->toArray(), $this->relationsToArray()),
        fn () => $this->attributesToArray(),
    );
}
which would have the effect of only calling
toArray() {
    withoutRecursion() {
        toArray() {
            withoutRecursion() {
                attributesToArray();
            }
        }
        relationsToArray();
    }
}
The end result, though, is that
@samlev Indeed I don't see the performance regression anymore. Nice! Will do a final review in the morning and try to get this merged. 👍
Good to see this being handled, but isn't using a flag a better approach with regard to performance? Something like this:
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;
use Illuminate\Database\Eloquent\Relations\HasMany;

class Item extends Model
{
    // recursion flag
    protected $visited = false;

    public function item(): BelongsTo
    {
        return $this->belongsTo(Item::class);
    }

    public function items(): HasMany
    {
        return $this->hasMany(Item::class);
    }

    public function toArray(): array
    {
        if ($this->visited) {
            // TODO: improve the fallback case, maybe an array
            // with a single marker value to filter out later?
            return [];
        }

        $this->visited = true;

        try {
            return parent::toArray();
        } finally {
            $this->visited = false;
        }
    }
}
With this test code:
<?php

use App\Models\Item;
use Illuminate\Support\Facades\Artisan;

Artisan::command('circular', function () {
    $parent = Item::query()->create();
    $parent->items()->create();
    $parent->items()->create();
    $parent->items()->create();
    $parent->load(['items']);

    // set parent instance manually to create the circular reference
    $parent->items->each->setRelation('item', $parent);

    $this->info($parent->toJson(\JSON_PRETTY_PRINT));

    // print twice to show the visited property is cleared nicely
    $this->info($parent->toJson(\JSON_PRETTY_PRINT));
});
Yields this output:
$ php artisan circular
{
"updated_at": "2024-08-22T05:28:24.000000Z",
"created_at": "2024-08-22T05:28:24.000000Z",
"id": 1,
"items": [
{
"id": 2,
"item_id": 1,
"created_at": "2024-08-22T05:28:24.000000Z",
"updated_at": "2024-08-22T05:28:24.000000Z",
"item": []
},
{
"id": 3,
"item_id": 1,
"created_at": "2024-08-22T05:28:24.000000Z",
"updated_at": "2024-08-22T05:28:24.000000Z",
"item": []
},
{
"id": 4,
"item_id": 1,
"created_at": "2024-08-22T05:28:24.000000Z",
"updated_at": "2024-08-22T05:28:24.000000Z",
"item": []
}
]
}
{
"updated_at": "2024-08-22T05:28:24.000000Z",
"created_at": "2024-08-22T05:28:24.000000Z",
"id": 1,
"items": [
{
"id": 2,
"item_id": 1,
"created_at": "2024-08-22T05:28:24.000000Z",
"updated_at": "2024-08-22T05:28:24.000000Z",
"item": []
},
{
"id": 3,
"item_id": 1,
"created_at": "2024-08-22T05:28:24.000000Z",
"updated_at": "2024-08-22T05:28:24.000000Z",
"item": []
},
{
"id": 4,
"item_id": 1,
"created_at": "2024-08-22T05:28:24.000000Z",
"updated_at": "2024-08-22T05:28:24.000000Z",
"item": []
}
]
}
Of course, there is this dangling array on the child elements' parent references, but this is just a proof of concept. I thought about hacking with the
EDIT: without the overridden
Yes, adding a marker will remove the dangling ones:
<?php

namespace App;

enum Visited
{
    case VISITED;
}
Changed model to account for the marker:
<?php

namespace App\Models;

use App\Visited;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;
use Illuminate\Database\Eloquent\Relations\HasMany;

class Item extends Model
{
    // recursion flag
    protected $visited = false;

    public function item(): BelongsTo
    {
        return $this->belongsTo(Item::class);
    }

    public function items(): HasMany
    {
        return $this->hasMany(Item::class);
    }

    public function toArray(): array
    {
        if ($this->visited) {
            return [Visited::VISITED];
        }

        $this->visited = true;

        try {
            return parent::toArray();
        } finally {
            $this->visited = false;
        }
    }

    public function relationsToArray(): array
    {
        // filters out visited relations
        return \array_filter(parent::relationsToArray(), fn ($item) => $item !== [Visited::VISITED]);
    }
}
Updated output (with a single
$ php artisan circular
{
"updated_at": "2024-08-22T05:43:34.000000Z",
"created_at": "2024-08-22T05:43:34.000000Z",
"id": 1,
"items": [
{
"id": 2,
"item_id": 1,
"created_at": "2024-08-22T05:43:34.000000Z",
"updated_at": "2024-08-22T05:43:34.000000Z"
},
{
"id": 3,
"item_id": 1,
"created_at": "2024-08-22T05:43:34.000000Z",
"updated_at": "2024-08-22T05:43:34.000000Z"
},
{
"id": 4,
"item_id": 1,
"created_at": "2024-08-22T05:43:35.000000Z",
"updated_at": "2024-08-22T05:43:35.000000Z"
}
]
}
Of course, instead of overriding those methods in userland, we can consider incorporating them upstream.
For ease of reproduction (e.g. copy and paste), here is the migration I used:
<?php

use App\Models\Item;
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('items', function (Blueprint $table) {
            $table->id();
            // self-referencing, nullable parent key (item_id)
            $table->foreignIdFor(Item::class)->nullable()->constrained();
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('items');
    }
};
Reproducible repo: https://github.com/rodrigopedra/circular
P.S.: moved overrides to a trait
@rodrigopedra the minor difference between the two implementations here is that the nested children will still have the relation, they just won't have the relation of the relations. So with your example (timestamps removed for brevity):
{
"id": 1,
"items": [
{
"id": 2,
"item_id": 1
},
{
"id": 3,
"item_id": 1
},
{
"id": 4,
"item_id": 1
}
]
}
The implementation in this PR would still have the "item" on each of the children:
{
"id": 1,
"items": [
{
"id": 2,
"item_id": 1,
"item": {
"id": 1
}
},
{
"id": 3,
"item_id": 1,
"item": {
"id": 1
}
},
{
"id": 4,
"item_id": 1,
"item": {
"id": 1
}
}
]
}
This doesn't seem super important, but it's more "accurate" as to the actual state of each object, and more consistent with the result if you had eager-loaded the relations (e.g.
There's more to it, but we also have the potential for a method that causes recursion to call another method that might cause recursion, and we don't want to prevent that second call from happening but we still want to be sure that everything is cleaned up after any individual call to
Doing it with a static weak map also opens up possibilities with inspecting levels of recursion (e.g.
Ultimately, this PR isn't just about
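For readers unfamiliar with the weak map mentioned above: PHP 8's `WeakMap` keys its entries by object without keeping that object alive, so bookkeeping stored in it disappears when the object does. The sketch below illustrates that property under assumed names (`CallTracker`, `mark()`, `isMarked()`, `trackedObjects()`); it is not the PR's code.

```php
<?php

// Illustrative helper; the class and method names are assumptions.
final class CallTracker
{
    private static ?WeakMap $running = null;

    private static function map(): WeakMap
    {
        return self::$running ??= new WeakMap();
    }

    // Record that $method is currently running on $target.
    public static function mark(object $target, string $method): void
    {
        $map = self::map();
        $methods = isset($map[$target]) ? $map[$target] : [];
        $methods[$method] = true;
        $map[$target] = $methods;
    }

    public static function isMarked(object $target, string $method): bool
    {
        $map = self::map();

        return isset($map[$target]) && isset($map[$target][$method]);
    }

    public static function trackedObjects(): int
    {
        return count(self::map());
    }
}

$model = new stdClass();
CallTracker::mark($model, 'toArray');

$markedWhileAlive = CallTracker::isMarked($model, 'toArray');
$countWhileAlive = CallTracker::trackedObjects();

// Dropping the last reference removes the entry; no manual cleanup is needed.
unset($model);
$countAfterRelease = CallTracker::trackedObjects();
```

One caveat that matters for this topic: objects that are part of a reference cycle are only released once PHP's cycle collector runs, so their entries can linger until then.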
@rodrigopedra yours has an edge case when the relation collection is shared, which the current PR does not have:
$sharedItem = new Item(['name' => 'shared']);

$itemA = new Item(['name' => 'a']);
$itemA->setRelation('shared', $sharedItem);

$itemB = new Item(['name' => 'b']);
$itemB->setRelation('shared', $sharedItem);

$items = Collection::make([$itemA, $itemB]);

$parent = new Item(['name' => 'parent']);
$parent->setRelation('items', $items);

// the shared item points back at the same collection
$sharedItem->setRelation('items', $items);

dd($parent->toArray());
Output (`dd` because you cannot JSON-serialize a non-backed enum and it crashes):
@samlev thanks, I noticed that when looking at the tests. And you are right, that'd be more accurate.
@donnysim while digging into the PR's code, I moved away from using this marker. But, thank you so much for reviewing it, and for the insights.
@samlev I updated my test repo, and also created a fork to use the same test set as yours (Minus the test cases that, I commented out that I don't quite get the assertion being 11.x...rodrigopedra:framework:11.x
I didn't want to push a new PR, as I don't want to overtake this one. I understood from your test cases that my alternative implementation was missing the mixed function recursion. So I incorporated the stack trace you have in yours. Can you review it? I believe it is a simpler approach, and avoids a static global
Nonetheless, thank you very much for this PR, I had to deal with manually unsetting relations on many projects before.
@donnysim, on my POC repository, I added a second command to account for the use case you mentioned. Follow its README and run Thanks again for the insights.
The big problem with the flag is if you call the method multiple times after having changed attributes (or even without) for example, then it will not output the correct array, which in theory the It's a bit of a functional programming vs OOP
@Tofandel as an internal/ Yet, the same can happen with the static cache. One could just assign a new When I saw the tests I moved away to a single flag, due to the mixed recursive method call, as a single flag prevented that use case. (check the But I'd still go for a per-instance stack than a global EDIT: on the first iteration of the flag-based alternative, the sample command calls the recursive method twice, without any issues.
I was talking about your original implementation in comments. Your new implementation is almost the same as the one proposed in this PR and uses a local Map of the callstack as well, which doesn't have this issue. I do prefer when things are not static as well (because static always causes unexpected issues when callbacks are involved, e.g. #51825), so if your solution manages all the test cases of this PR and has no additional performance issue, then it's better.
@Tofandel it has a different approach for not caching the first ever On this PR's implementation, the first ever But nonetheless, apart from the cross recursive calls between different functions, which clearly don't work with the single flag approach, I wonder how keeping a map of call stack hashes is much different than the flag approach, regarding your functional vs OOP comment. I moved away from the single flag implementation to accommodate the cross recursive calls for different functions. Although I think that for the serialization use case it would be totally fine, as none of the currently affected methods reference each other. It is ok if you don't want to stretch this further, and thanks again for reviewing my code =)
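The cross-method case discussed above can be made concrete with a small plain-PHP sketch. `FlagGuarded` and `MethodGuarded` are hypothetical classes, and `summary()` and `details()` stand in for any two guarded methods where one calls the other on the same object.

```php
<?php

// Hypothetical classes for illustration only.
final class FlagGuarded
{
    private bool $visited = false;

    private function guard(callable $callback, mixed $default): mixed
    {
        if ($this->visited) {
            return $default; // every guarded method is blocked while another runs
        }

        $this->visited = true;

        try {
            return $callback();
        } finally {
            $this->visited = false;
        }
    }

    public function summary(): array
    {
        return $this->guard(fn () => ['details' => $this->details()], []);
    }

    public function details(): array
    {
        return $this->guard(fn () => ['id' => 1], []);
    }
}

final class MethodGuarded
{
    /** @var array<string, true> Methods currently running on this instance. */
    private array $running = [];

    private function guard(string $method, callable $callback, mixed $default): mixed
    {
        if (isset($this->running[$method])) {
            return $default; // only re-entry into the same method is blocked
        }

        $this->running[$method] = true;

        try {
            return $callback();
        } finally {
            unset($this->running[$method]);
        }
    }

    public function summary(): array
    {
        return $this->guard(__FUNCTION__, fn () => ['details' => $this->details()], []);
    }

    public function details(): array
    {
        return $this->guard(__FUNCTION__, fn () => ['id' => 1], []);
    }
}

$withFlag = (new FlagGuarded())->summary();    // ['details' => []]
$perMethod = (new MethodGuarded())->summary(); // ['details' => ['id' => 1]]
```

With a single flag, the inner `details()` call is treated as recursion and returns the fallback even though it is a different method. Tracking by method name lets it run, while a genuine re-entry into `summary()` would still be stopped.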
Thanks @samlev - really well done PR.
This is a companion PR to #51582 which resolves the issues raised around serialization of models that have circular references in relations.
TL;DR:
This PR makes some minor changes to how the main recursive operations on Models work to prevent getting into infinite recursion down a stack of circular references, which leads to a stack overflow. It does this by preventing recursive calls to the same method on an object.
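A minimal sketch of that idea in plain PHP, independent of Eloquent. The `GuardsRecursion` trait, this `withoutRecursion()` signature, and the `Record` class are assumptions made for illustration and are not the code merged in this PR; the point is only that a re-entrant call to the same method on the same object returns a fallback value instead of descending again.

```php
<?php

// Illustrative only: the trait, its signature, and Record are assumptions,
// not the code merged into the framework.
trait GuardsRecursion
{
    /** @var array<string, true> Methods currently running on this instance. */
    private array $runningMethods = [];

    protected function withoutRecursion(Closure $callback, mixed $default = null): mixed
    {
        // The guarded method is whichever method called withoutRecursion().
        $method = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 2)[1]['function'];

        if (isset($this->runningMethods[$method])) {
            // Re-entry on the same object: return the fallback instead of recursing.
            return $default instanceof Closure ? $default() : $default;
        }

        $this->runningMethods[$method] = true;

        try {
            return $callback();
        } finally {
            // Clear the marker so later, independent calls run normally.
            unset($this->runningMethods[$method]);
        }
    }
}

final class Record
{
    use GuardsRecursion;

    /** @var array<string, Record|list<Record>> */
    private array $relations = [];

    public function __construct(private array $attributes)
    {
    }

    public function setRelation(string $key, Record|array $related): static
    {
        $this->relations[$key] = $related;

        return $this;
    }

    public function toArray(): array
    {
        return $this->withoutRecursion(
            fn () => $this->attributes + array_map(
                fn (Record|array $related) => is_array($related)
                    ? array_map(fn (Record $item) => $item->toArray(), $related)
                    : $related->toArray(),
                $this->relations,
            ),
            fn () => $this->attributes, // fallback: attributes without relations
        );
    }
}

$user = new Record(['id' => 1]);
$posts = [new Record(['id' => 2, 'user_id' => 1]), new Record(['id' => 3, 'user_id' => 1])];

$user->setRelation('posts', $posts);
foreach ($posts as $post) {
    $post->setRelation('user', $user); // circular: user -> posts -> user
}

$serialized = $user->toArray();
```

Each post still carries a `user` entry, but that entry holds only the user's attributes. Because the marker is removed in `finally`, a later call to `toArray()` recomputes the result rather than returning a stored value.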
What are circular references?
Put simply - circular references occur when two objects hold a reference to each other. In the context of this PR, it's when a model holds a relationship to another model which in turn holds a relationship back to the first instance. For example, if you had a user with multiple posts, and you wanted to give each post a reference back to the original user, you might do something like this:
This would mean that each post has a link back to the original user, not to another copy of the user.
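As a plain-PHP illustration of what "a link back to the original user" means, the sketch below uses hypothetical `User` and `Post` classes (not Eloquent models). Every post holds the same `User` instance, so a change made through one reference is visible through all of them.

```php
<?php

// Hypothetical classes for illustration; these are not Eloquent models.
final class User
{
    /** @var list<Post> */
    public array $posts = [];

    public function __construct(public string $name)
    {
    }
}

final class Post
{
    public ?User $user = null;

    public function __construct(public string $title)
    {
    }
}

$user = new User('Alice');
$user->posts = [new Post('First'), new Post('Second')];

// Point every post back at the user that owns it: user -> post -> user.
foreach ($user->posts as $post) {
    $post->user = $user;
}

// Both posts reference the same object, not separate copies.
$sameInstance = $user->posts[0]->user === $user->posts[1]->user;

// A change made through one reference is visible through the others.
$user->posts[0]->user->name = 'Alicia';
$nameSeenFromSecondPost = $user->posts[1]->user->name;
```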
Why would you want to do this?
There are many reasons, but the most common reason that I encounter is authorization. What if you had a policy similar to this for posts:
Note that the post references its own copy of the user object. When you're looping through posts on an index page you can easily eager-load the user (`Post::with(['user'])->get()`), but if you're pulling the posts for a particular user, you either have to:
(`$user->load(['posts.user']);`) which leaves you with multiple copies of the user that you already had.
Setting the relation is the most memory efficient, database efficient, and has some other benefits like being able to navigate back and forth between the user and posts without generating more queries, e.g.
How does this cause problems?
For the most part, it doesn't. I've used this pattern many times on many projects, and PR #51582 would make it possible to set the inverse relation automatically in a number of situations.
The issue is when it comes to serializing a model that has circular relations. This happens when you call `toJson()` or `toArray()` on a model with circular relations, when you attempt to send that model to a queue, or when you simply call `push()` to save the model and any of its dependent children. Essentially any time where it tries to walk down the stack of relations recursively, you'll end up in this loop:
I don't know how many people have experienced a SegFault in PHP, but it's... not a fun issue to diagnose.
How does this PR fix that issue?
This adds a new internal method to models called `withoutRecursion()` (originally named `once()`). This keeps track of the call stack and will prevent the same method from being called on the same instance of an object more than once within that stack. So instead of that call stack above, you end up with something like this:
Because `toArray()` was called on `$user` at the top level, it doesn't get called again, so it doesn't dig down again, but because each post is a unique object, `toArray()` can get called on each of them.
A Couple Of Questions That You Might Want Answered
Is this backwards compatible?
Yes. This won't break any existing code because any code that would have encountered this problem wouldn't be working.
The only concern is if someone has already defined a `once()` method on a model (in which case I'm happy to change this method name if you'd prefer - I was just keeping consistency with the other existing functionality for the `once()` helper)
Why not use the `once()` helper?
For two major reasons:
It expects a call to complete so that it can store the value before it can prevent subsequent calls to the method. This means that if the method is recursive, it will never complete. The change that I've made here accepts a "default" value which will be stored as the return value before the method is called, meaning that any recursion will return early with the default value instead of continuing to dig itself deeper.
`once()` would keep the result for the rest of the request, even if the model changed
We don't want to prevent `->toArray()` from changing its result if the models or relations change. Each time it gets called we want to actually pull the result at that point in time. We only want the "once" behaviour to persist within the context of that one call stack. This is particularly important for `->push()` - you don't want to store the result of `$model->push()` and never actually call it again.
Has anyone other than you even encountered this?
Yep! @reinink has talked about Optimizing circular relationships in Laravel, @stancl made a package to add `hasManyWithInverse()`, and @stayallive extended that, with a specific note about dealing with recursion.
Even if the `inverse()` pull request never gets merged, I believe that this would still be a valuable fix for models.