Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to ignore reference cycles on serialization #46101

Merged
merged 9 commits into from
Feb 18, 2021

Conversation

jozkee
Copy link
Member

@jozkee jozkee commented Dec 15, 2020

Fixes #40099

This is a draft implementing a new API for ReferenceHandler capable of detect circularity on the input value of JsonSerializer.Serialize. A sample usage scenario would be the following:

class Node
{
    public string Description { get; set; }
    public object Next { get; set; }
}

void Test()
{
    var node = new Node { Description = "Node 1" };
    node.Next = node;
    
    var opts = new JsonSerializerOptions { ReferenceHandler = ReferenceHandler.IgnoreCycle };
    
    string json = JsonSerializer.Serialize(node, opts);
    Console.WriteLine(json); // Prints {"Description":"Node 1","Next":null}
}

Note that reference cycles are broken by null tokens in JSON.

@jozkee jozkee added this to the 6.0.0 milestone Dec 15, 2020
@jozkee jozkee self-assigned this Dec 15, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@eiriktsarpalis
Copy link
Member

I think this has already been discussed in the parent issue, but as a json user, assuming I didn't want to preserve references I would prefer it if the serializer threw an exception rather than silently ignoring data (because there's likely a bug in my code).

@jozkee
Copy link
Member Author

jozkee commented Dec 16, 2020

@eiriktsarpalis this is not changing the default behavior that throws an exception when MaxDepth is reached. You have to opt-in for ignoring reference loops as shown in the description's code snippet.

@steveharter
Copy link
Member

@jozkee have you evaluated what it would take to improve the existing extensibility model to support ignore? That would either be an alternative to this feature or perhaps complementary to handle scenarios with different requirements.

@jozkee
Copy link
Member Author

jozkee commented Jan 13, 2021

@steveharter I was thinking that we could add a ReferenceCycleStrategy flag that indicates the semantics that will be used and add a couple methods to ReferenceResolver for such purpose.

public abstract class ReferenceResolver
{
    protected ReferenceResolver() { }
    public abstract void AddReference(string referenceId, object value);
    public abstract string GetReference(object value, out bool alreadyExists);
    public abstract object ResolveReference(string referenceId);

+   // Pushes the value to the stack of references.
+   // Returns true if the reference already exists; otherwise, false.
+   public virtual bool PushReferenceForCycleDetection(object value);
+
+   // Pops the last reference on the stack of references once serialization of the value is complete.
+   public virtual void PopReferenceForCycleDetection();
}

public abstract class ReferenceHandler
{
    public static ReferenceHandler Preserve { get { throw null; } }
    public static ReferenceHandler IgnoreCycle { get { throw null; } }
    public abstract ReferenceResolver CreateResolver();

+   public virtual JsonReferenceCycleStrategy ReferenceCycleStrategy => JsonReferenceCycleStrategy.None;
}

+public enum JsonReferenceCycleStrategy
+{
+   // Use preserve semantics ($id and $ref) on (de)serialization.
+   None, 
+   // Use ignore semantics (treat the value as null on cycle detection) on serialization. 
+   Ignore 
+}

EDIT: As we discussed offline, this prototype breaks the single responsibility principle in ReferenceResolver.

@jozkee jozkee marked this pull request as ready for review January 21, 2021 11:19
@jozkee
Copy link
Member Author

jozkee commented Jan 21, 2021

-public abstract class ReferenceResolver
+public abstract class ReferenceResolver : ReferenceResolverBase
{
    public abstract void AddReference(string referenceId, object value);
    public abstract string GetReference(object value, out bool alreadyExists);
    public abstract object ResolveReference(string referenceId);
}

+public abstract class ReferenceResolverBase {}

+public abstract class IgnoreReferenceResolver : ReferenceResolverBase 
+{
+   public abstract void PopReferenceForCycleDetection() => throw new InvalidOperationException();
+   public abstract void PushReferenceForCycleDetection(object value) => throw new InvalidOperationException();
+   public abstract bool ContainsReferenceForCycleDetection(object value) => throw new InvalidOperationException();
+}

// Binary breaking change.
-public sealed class ReferenceHandler<T> : ReferenceHandler where T : ReferenceResolver, new()
+public sealed class ReferenceHandler<T> : ReferenceHandler where T : ReferenceResolverBase, new()
{
    
-   public override ReferenceResolver CreateResolver() { throw null; }
+   public override ReferenceResolverBase CreateResolver() { throw null; }
}

public abstract partial class ReferenceHandler
{
    protected ReferenceHandler() { }
    public static ReferenceHandler Preserve { get; }
    public static ReferenceHandler IgnoreCycle { get; }
-   public abstract ReferenceResolver CreateResolver();
+   public abstract ReferenceResolverBase CreateResolver();
}

I was thinking a bit more about how an uber class could help us to keep single responsibility on each resolver and thought that we could probably add ReferenceResolverBase and make ReferenceResolver inherit from it, that way, we can keep it to work as a set of references that uses the Preserve semantics.

ReferenceResolverBase would allow us to add a new resolver (IgnoreReferenceResolver) that can be used as a stack of references and would use the ignore semantics from Newtonsoft.

However the downside of such design is that it would introduce a binary breaking change on users that already implement their own ReferenceHandler.

@steveharter @ahsonkhan @layomia @eiriktsarpalis thoughts?

@jozkee
Copy link
Member Author

jozkee commented Jan 21, 2021

Update: addressed feedback and changed feature behavior to treat reference loops as null values in order to not alter the length of collections.

@jozkee
Copy link
Member Author

jozkee commented Jan 26, 2021

Fixed a perf regression caused by using IsIgnoreCyclesEnabled method in order to constantly check if the feature was on.

internal static bool IsIgnoreCyclesEnabled(JsonSerializerOptions options)
=> !options.ReferenceHandler?.UsePreserveSemantics ?? false;

New internal field ReferenceHandlingStrategy in JsonSerializerOptions is used instead in order to perform the same check. Using a field that refreshes only when ReferenceHandler changes is cheaper than using above method.

ReferenceHandlingStrategy = value?.HandlingStrategy ?? ReferenceHandlingStrategy.None;
}
}
// The cached value used to determine if ReferenceHandler should use Preserve or IgnoreCycle semanitcs or None of them.
internal ReferenceHandlingStrategy ReferenceHandlingStrategy = ReferenceHandlingStrategy.None;

@jozkee
Copy link
Member Author

jozkee commented Jan 26, 2021

BenchmarkDotNet=v0.12.1.1466-nightly, OS=Windows 10.0.18363.1316 (1909/November2019Update/19H2)
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.100
  [Host]     : .NET 5.0.0 (5.0.20.51904), X64 RyuJIT
  Job-ABOOKC : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
  Job-QJROBB : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
Type Method Job Toolchain Mean Error StdDev Median Min Max Ratio MannWhitney(3ms) Gen 0 Gen 1 Gen 2 Allocated
WriteJson<IndexViewModel> SerializeToUtf8Bytes Job-ABOOKC \runtime2\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 17,076.1 ns 172.47 ns 161.33 ns 17,057.3 ns 16,803.3 ns 17,401.4 ns 1.01 Same 2.0292 0.0676 - 13,024 B
WriteJson<IndexViewModel> SerializeToUtf8Bytes Job-QJROBB \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 16,887.4 ns 99.72 ns 93.27 ns 16,875.5 ns 16,746.1 ns 17,076.6 ns 1.00 Base 2.0380 0.0679 - 13,024 B
WriteJson<Int32> SerializeToUtf8Bytes Job-ABOOKC \runtime2\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 122.9 ns 1.55 ns 1.37 ns 123.1 ns 119.2 ns 125.0 ns 1.01 Same 0.0289 - - 184 B
WriteJson<Int32> SerializeToUtf8Bytes Job-QJROBB \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 121.1 ns 1.78 ns 1.48 ns 121.1 ns 117.9 ns 123.5 ns 1.00 Base 0.0289 - - 184 B
WriteJson<LargeStructWithProperties> SerializeToUtf8Bytes Job-ABOOKC \runtime2\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 627.5 ns 4.92 ns 4.60 ns 627.4 ns 621.2 ns 636.3 ns 1.01 Same 0.0578 - - 376 B
WriteJson<LargeStructWithProperties> SerializeToUtf8Bytes Job-QJROBB \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 624.2 ns 4.57 ns 4.28 ns 624.6 ns 617.5 ns 632.7 ns 1.00 Base 0.0581 - - 376 B
WriteJson<Location> SerializeToUtf8Bytes Job-ABOOKC \runtime2\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 761.1 ns 4.59 ns 4.07 ns 760.9 ns 755.2 ns 770.5 ns 0.98 Same 0.0611 - - 384 B
WriteJson<Location> SerializeToUtf8Bytes Job-QJROBB \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 778.0 ns 7.14 ns 6.68 ns 778.5 ns 768.6 ns 792.3 ns 1.00 Base 0.0596 - - 384 B
WriteJson<LoginViewModel> SerializeToUtf8Bytes Job-ABOOKC \runtime2\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 314.8 ns 2.71 ns 2.54 ns 314.4 ns 310.9 ns 320.0 ns 0.98 Same 0.0410 - - 264 B
WriteJson<LoginViewModel> SerializeToUtf8Bytes Job-QJROBB \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 320.0 ns 1.60 ns 1.42 ns 320.2 ns 317.1 ns 323.1 ns 1.00 Base 0.0416 - - 264 B
WriteJson<MyEventsListerViewModel> SerializeToUtf8Bytes Job-ABOOKC \runtime2\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 413,696.7 ns 3,250.95 ns 3,040.94 ns 412,356.4 ns 410,011.2 ns 418,388.0 ns 1.01 Same 29.6053 4.9342 - 191,665 B
WriteJson<MyEventsListerViewModel> SerializeToUtf8Bytes Job-QJROBB \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 408,065.2 ns 3,297.89 ns 3,084.85 ns 408,330.6 ns 403,268.6 ns 412,867.3 ns 1.00 Base 28.8462 4.8077 - 191,705 B
WriteJson<SimpleStructWithProperties> SerializeToUtf8Bytes Job-ABOOKC \runtime2\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 232.2 ns 1.87 ns 1.75 ns 231.9 ns 229.1 ns 235.3 ns 1.01 Same 0.0361 - - 232 B
WriteJson<SimpleStructWithProperties> SerializeToUtf8Bytes Job-QJROBB \runtime\artifacts\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\CoreRun.exe 230.6 ns 2.59 ns 2.42 ns 230.4 ns 227.7 ns 236.1 ns 1.00 Base 0.0366 - - 232 B

new JsonSerializerOptions { ReferenceHandler = ReferenceHandler.IgnoreCycle };

[Fact]
public async Task IgnoreCycles_OnObject()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and the following collection-related types, I notice that the element types are all object. Do we have any with strongly typed class-type, value-type, and nullable-value-type elements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more types that contain a property with object or a concrete type. It should cover boxed classes, non-boxed classes and boxed structs.

I am not testing non-boxed value types since those are ignored/never tracked in the stack of references and Nullable<T> can't be boxed.

Comment on lines 126 to 128
var root = new EmptyClassWithExtensionProperty();
root.MyOverflow.Add("root", root);
await Test_Serialize_And_SerializeAsync(root, @"{""root"":null}", s_optionsIgnoreCycles);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Newtonsoft.Json have the same behavior here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Newtonsoft also detects loops in the elements of the extension data.

Copy link
Contributor

@layomia layomia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jozkee
Copy link
Member Author

jozkee commented Feb 18, 2021

CI issues are unrelated: #39244, #39244.

@jozkee jozkee merged commit fd9886d into dotnet:master Feb 18, 2021
@jozkee jozkee deleted the refhandler_ignore branch February 18, 2021 04:05
@jeffhandley
Copy link
Member

Awesome! 💯

@jozkee, please add a comment to dotnet/core#5889 to capture this new feature in the Preview 2 release notes.

@ericsampson
Copy link

@jozkee thank you very much, I know there was a lot of churn/heat regarding this :)

@MattGal
Copy link
Member

MattGal commented Feb 18, 2021

CI issues are unrelated: #39244, #39244.

@jozkee based off the log you linked, please investigate immediately and back out your changes if appropriate. dotnet.exe exiting with 0xc0000409 (STATUS_STACK_BUFFER_OVERRUN) is not a helix failure, rather a crash with a buffer overrun

@jozkee
Copy link
Member Author

jozkee commented Feb 18, 2021

@MattGal I have sent #48464 which build on top of these changes, will expect to not hit those errors in the new PR, otherwise I will investigate if they are related to this change.

@MattGal
Copy link
Member

MattGal commented Feb 18, 2021

@MattGal I have sent #48464 which build on top of these changes, will expect to not hit those errors in the new PR, otherwise I will investigate if they are related to this change.

On further investigation this really seems like a payload building problem, since it didn't repro on the same combos in master after you merged. Probably can ignore, but please do be cautious about this sort of thing as it wasn't related to the linked issues.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should we add support to ignore cycles on serialization?
10 participants