
Use of ReferenceHandler.Preserve (considerably) increases Memory Pressure #47569

Closed
codingdna2 opened this issue Jan 28, 2021 · 32 comments

@codingdna2

codingdna2 commented Jan 28, 2021

Description

I've enabled the built-in ReferenceHandler in my ASP.NET Core project as follows.

services.AddControllers()
    .AddJsonOptions(options =>
    {
        options.JsonSerializerOptions.IgnoreNullValues = true;
        options.JsonSerializerOptions.Converters.Add(new TimeSpanConverter());
        options.JsonSerializerOptions.ReferenceHandler = ReferenceHandler.Preserve;
    });

I've observed my process, which was previously running with a memory consumption of 110-120 MB, running at over 2 GB (and increasing) a few minutes after I connected my client. My client is a C# client generated with NSwag.

Configuration

  • .NET 5.0
  • Windows 10 v20H2 build 19042.746
  • Architecture x64

Other information

I've had a look at the source code and I see PreserveReferenceHandler is held in a static field (ReferenceHandler.cs) and contains a dictionary from which values are never removed (PreserveReferenceResolver.cs). I can only guess disposal occurs when the whole serializer is disposed, but it's not clear to me when that happens.

Am I using it the wrong way?
Thanks in advance for any help.
Dan

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jan 28, 2021
@ghost

ghost commented Jan 28, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.


@eiriktsarpalis eiriktsarpalis self-assigned this Jan 28, 2021
@eiriktsarpalis
Member

Hi @codingdna2, would you be able to supply a minimal reproduction that demonstrates the issue? If not, have you tried profiling your application?

@layomia
Contributor

layomia commented Jan 28, 2021

cc @jozkee

This reminds me of dotnet/aspnetcore#24230. Are you able to take a memory dump, similar to that issue, and share what you find with us?

How many elements (unique objects with different $ids) are placed in the reference dictionary you mentioned? JsonSerializer, by default, news up an instance of the reference dictionary for every call to the serializer, which is destroyed when (de)serialization is complete. If the problem here is using the preserve-reference feature, this suggests to me that a single call to the serializer is creating a whole lot of objects based on the incoming JSON payload (when deserializing), or based on the size of the instantiated input object graph (when serializing).

Is it possible to check if the same issue happens if Newtonsoft.Json is used instead?

@codingdna2
Author

Hi @eiriktsarpalis, I'll try my best to create a sample repo.

What I did was enable/disable the reference handling and observe the memory increase. My model is quite big and I've tried to use reference handling to decrease the size of my JSON. The client is also quite complex and makes various calls to the APIs in response to SignalR notifications.

I've also tried to profile the application, but I cannot understand exactly what kind of objects get allocated (I probably don't have the experience to carry out such an investigation myself).

@layomia About the object graph: I've checked one of my responses and the biggest $id I see is 3478. But as you said, if the dictionary is destroyed on every call, I wouldn't expect them to pile up.

Give me some time, I'll try to collect some more information.

@layomia
Contributor

layomia commented Jan 28, 2021

Thanks @codingdna2. I'll start to look at this, but yes a sample repro would be really helpful.

@codingdna2
Author

codingdna2 commented Jan 29, 2021

Sorry for the delay; here is the sample repro. It's close enough to my configuration. The model is quite simple and the memory grows slowly (but steadily).

EDIT: I'm not sure this replicates it, or whether it's a memory leak, but enabling/disabling the ReferenceHandler sure makes a big difference. I'm now collecting some statistics.

EDIT2: After 200 minutes, it's now at 540MB. It doesn't feel right but I don't know what to think...
(memory usage screenshots attached)

@layomia, @eiriktsarpalis Let me know if you need anything else.

@codingdna2
Author

Regarding the last question asked by @layomia:

Is it possible to check if the same issue happens if Newtonsoft.Json is used instead?

Yes, it seems to have the same effect.

services.AddControllers().AddNewtonsoftJson(
    options =>
    {
        options.SerializerSettings.PreserveReferencesHandling = Newtonsoft.Json.PreserveReferencesHandling.All;
    });

@codingdna2
Author

codingdna2 commented Jan 29, 2021

I've done an additional test using <ServerGarbageCollection>false</ServerGarbageCollection>, which has finally convinced me there's no leak. Memory is stable at 61 MB. Still, ReferenceHandler has a strong impact on memory, which makes it unusable for me.

EDIT: The (unnecessary) question I'm left with is: why does holding entities in a dictionary for the duration of serialization (apparently) cause the GC to move them to a higher generation?
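For context, that setting lives in the project file; a minimal fragment (server GC is the default for ASP.NET Core apps, so this opts back into workstation GC):

```xml
<PropertyGroup>
  <!-- false = workstation GC; ASP.NET Core defaults to server GC -->
  <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>
```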

@eiriktsarpalis
Member

I would recommend closing the issue for now. If you could provide us with a reproduction (e.g. containing the type of object graphs that illustrate the behaviour you are seeing) we can take a second look.

@jozkee
Member

jozkee commented Jan 29, 2021

Why does holding entities in a dictionary for the duration of serialization (apparently) cause the GC to move them to a higher generation?

@codingdna2 the dictionary holds unique objects found during serialization and helps us determine whether the current object to be serialized was previously serialized; if so, the serializer writes only an object with a pointer property ($ref) instead of writing the whole object a second time.

This is beneficial in a couple of cases:

  1. If your object has a large amount of duplicate references, using ReferenceHandler.Preserve can significantly reduce the size of your resulting JSON.
  2. If your object happens to contain a cycle, the pointer written by the serializer when Preserve is on serves as a terminator to avoid infinite recursion.
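The $id/$ref mechanism described above can be seen with a tiny self-referencing type — a minimal sketch (the Node type here is made up for illustration):

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

class Node
{
    public string Name { get; set; } = "";
    public Node Next { get; set; }
}

class Program
{
    static void Main()
    {
        var a = new Node { Name = "a" };
        a.Next = a; // a cycle that would throw without reference handling

        var options = new JsonSerializerOptions { ReferenceHandler = ReferenceHandler.Preserve };
        Console.WriteLine(JsonSerializer.Serialize(a, options));
        // Prints something like: {"$id":"1","Name":"a","Next":{"$ref":"1"}}
    }
}
```

The first occurrence of the object is written in full with an `$id`; the cycle back to it collapses to a small `$ref` stub instead of recursing.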

@codingdna2
Author

codingdna2 commented Jan 29, 2021

I would recommend closing the issue for now. If you could provide us with a reproduction (e.g. containing the type of object graphs that illustrate the behaviour you are seeing) we can take a second look.

@eiriktsarpalis, it was posted above...

https://github.com/codingdna2/JsonReferenceHandlerIssue

@codingdna2
Author

Why does holding entities in a dictionary for the duration of serialization (apparently) cause the GC to move them to a higher generation?

@codingdna2 the dictionary holds unique objects found during serialization and helps us determine whether the current object to be serialized was previously serialized; if so, the serializer writes only an object with a pointer property ($ref) instead of writing the whole object a second time.

This is beneficial in a couple of cases:

  1. If your object has a large amount of duplicate references, using ReferenceHandler.Preserve can significantly reduce the size of your resulting JSON.

  2. If your object happens to contain a cycle, the pointer written by the serializer when Preserve is on serves as a terminator to avoid infinite recursion.

Thanks. I still don't understand why 3,000 objects get collected almost immediately, while introducing a dictionary holding some IDs fills memory so much. Attached is a repro: if you run it, you'll notice memory growing; if you change the serialization options and disable the reference handler, memory is stable.

@codingdna2 codingdna2 changed the title Usage of JsonSerializerOptions.ReferenceHandler = ReferenceHandler.Preserve possible memory leak Use of ReferenceHandler.Preserve (considerably) increases Memory Pressure Feb 3, 2021
@eiriktsarpalis
Member

Hi @codingdna2, I tried running your code locally and was able to reproduce the increase in memory consumption. However, this only happened when hosting the server under the debugger; running it standalone, in either Debug or Release builds, it never exceeded 300 MB of memory consumption.

I took a snapshot of the debug process and the heap looks rather inconspicuous:
(heap snapshot attached)

Note that memory usage is dominated by Newtonsoft types, which I think were pulled in by the Swagger dependencies. One problem I see here is that this is not a minimal reproduction: the app has many dependencies unrelated to the issue at hand. As soon as I removed NSwag and NLog, memory consumption stabilized below 300 MB in the debugger as well.

This is not to say that there isn't an issue with reference-preservation memory consumption, but I think there are too many factors at play that might obscure the root cause. I would recommend trimming down your app to a minimal reproduction (gradually removing components that are not contributing to the observed issue).

@codingdna2
Author

codingdna2 commented Apr 6, 2021

Hi @eiriktsarpalis, I've followed your suggestion and removed the unnecessary components (the modifications were committed to the same repository).

I made a simple test running the server process compiled in Debug mode, without and with reference preservation, for about 10 minutes. Measuring with Task Manager with the debugger attached, I see 60 MB consumption without and 180 MB with reference preservation, totaling a 215% increase (or a 105% difference). Almost the same results were observed without the debugger attached. Running for 30 minutes gets to about 230 MB (a 280% increase, or a 120% difference).

Of course this refers to the simple model I used to create the repro. Using a more complex model exacerbates the problem (my own project moves from 800 MB to over 2 GB).

@eiriktsarpalis
Member

Made a simple test running the server process compiled in Debug mode without and with reference preservation for about 10 minutes.

What happens if you run the test in Release mode though?

@codingdna2
Author

codingdna2 commented Apr 26, 2021

I see almost the same results. Here's a summary:

| Mode | Reference Preservation | Memory Consumption |
| --- | --- | --- |
| Debug | W/O | 60 MB |
| Debug | W | 180 MB |
| Release | W/O | 66 MB |
| Release | W | 167 MB |

Note: attaching the debugger doesn't seem to affect the results significantly.

@eiriktsarpalis
Member

But does memory consumption increase over time? This is not something I could detect unless running the process from the debugger. I also can't find any evidence of a memory leak happening in System.Text.Json.

I tried tightening the loop of your reproduction by making the following changes: while enabling reference preservation does result in slightly increased memory usage (which is expected), there is no indication of any memory leaks happening.

@codingdna2
Author

codingdna2 commented Apr 27, 2021

Not sure if it'll get to a stable point but after two hours in Release mode with no debugger attached, I have the following:

(memory usage graph attached)

I'll leave it running for the day to see where it goes...

I also can't find any evidence of a memory leak happening in System.Text.Json.

You're probably right about the memory leak. I guess, if there's a problem, it's more a promotion to a higher generation in garbage collection than a real leak.

Previously, I made a test using <ServerGarbageCollection>false</ServerGarbageCollection> (see above) and it was stable. I haven't repeated the test since then, but I suppose it's enough to rule out a leak.

I tried tightening the loop of your reproduction by making the following changes

I see what you did, and I'll run some tests leaving all the ASP.NET infrastructure aside.

while enabling reference preservation does result in slightly increased memory usage

Not sure it fits the definition of slightly.

@eiriktsarpalis
Member

eiriktsarpalis commented Apr 27, 2021

while enabling reference preservation does result in slightly increased memory usage

Not sure it fits the slightly definition.

I was referring to the standalone console app, where memory consumption differs by a couple of megabytes.

@codingdna2
Author

codingdna2 commented May 4, 2021

About the previous test: it stabilizes around 300-400 MB.

I just pushed the same standalone program as you did.

If I run it as-is, I get a stable 26 MB memory usage.
But if I enable server GC in the csproj, it grows very fast (350 MB in 30 seconds and growing).
If I keep server GC and disable reference preservation, it runs steady at 55-60 MB.

NOTE: I moved everything inside the loop so variables go out of scope and can be collected

@eiriktsarpalis eiriktsarpalis added this to the 6.0.0 milestone Jul 21, 2021
@eiriktsarpalis eiriktsarpalis removed needs more info untriaged New issue has not been triaged by the area owner labels Jul 21, 2021
@eiriktsarpalis eiriktsarpalis added the tenet-performance Performance related issue label Jul 21, 2021
@eiriktsarpalis
Member

eiriktsarpalis commented Jul 30, 2021

When profiling your reproducing app I can see clear differences in the allocation profiles for server and workstation GC:

Workstation GC: (profiler screenshot attached)

Server GC: (profiler screenshot attached)

I tried removing System.Text.Json from the reproducing app, simply running the logic of PreserveReferenceResolver in a loop. Sure enough, on my machine memory usage balloons from ~15 MB to ~900 MB when switching to server GC.
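For reference, that experiment can be sketched roughly as follows (an approximation under my assumptions, not the actual repro code; Payload and FillMap are invented names):

```csharp
using System;
using System.Collections.Generic;

class Payload { public byte[] Data = new byte[128]; }

class Program
{
    // Mimics what PreserveReferenceResolver does per serialization: a fresh
    // dictionary keyed by reference identity, filled and never cleared.
    public static Dictionary<object, string> FillMap(int count)
    {
        var map = new Dictionary<object, string>(ReferenceEqualityComparer.Instance);
        for (int j = 0; j < count; j++)
            map[new Payload()] = (j + 1).ToString(); // incrementing "$id"
        return map;
    }

    static void Main()
    {
        for (int i = 0; i < 1_000; i++)
        {
            var map = FillMap(3500);
            // map falls out of scope here; under server GC the larger gen0
            // budget means these short-lived graphs pile up between collections.
            GC.KeepAlive(map);
        }
    }
}
```

`ReferenceEqualityComparer.Instance` (new in .NET 5) makes the dictionary key on object identity, matching the resolver's semantics.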

Tagging @dotnet/gc who might know if this is expected behavior.

@ghost

ghost commented Jul 30, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.


@Maoni0
Member

Maoni0 commented Aug 2, 2021

the best way to investigate a managed-memory issue is to start by collecting the top-level GC events. my guess is you have quite a few heaps in Server GC and/or the survival rate is high, which means the gen0 budget is quite high, which then leads to much higher memory usage before the next GC is triggered. this can all be verified by the top-level GC trace.
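A low-overhead way to collect those top-level GC events is with the dotnet-trace and dotnet-counters global tools (`<pid>` is a placeholder for the target process ID):

```shell
# Collect only GC collection events from a running process (low overhead)
dotnet-trace collect --process-id <pid> --profile gc-collect

# Or watch GC counters (heap size, gen 0/1/2 collection counts) live
dotnet-counters monitor --process-id <pid> System.Runtime
```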

@eiriktsarpalis
Member

my guess is you have quite a few heaps in Server GC and/or the survival rate is high, which means the gen0 budget is quite high, which then leads to much higher memory usage before the next GC is triggered.

This seems consistent with what I'm seeing in the VS profiler. FWIW here are the collected events for the reproducing app:

Workstation GC: (GC event screenshots attached)

Server GC: (GC event screenshots attached)

@Maoni0
Member

Maoni0 commented Aug 3, 2021

yeah, as expected you are doing a lot fewer GCs with Server GC because it allows you to allocate a lot more before triggering a GC. is there any reason why you need to use Server GC if you want a small memory footprint?

@eiriktsarpalis eiriktsarpalis added customer assistance and removed tenet-performance Performance related issue labels Aug 4, 2021
@codingdna2
Author

codingdna2 commented Aug 4, 2021

@Maoni0 This is just a sample repro used to demonstrate an issue that arises in an ASP.NET Core project when enabling a feature of the JSON serializer. ASP.NET Core uses server GC by default, and so does the repro.

@eiriktsarpalis I've run your latest repro and the issue disappears if Clear is called on the dictionary. So it seems server GC has an issue when the dictionary is not cleared before going out of scope, while workstation GC doesn't seem to care. The actual STJ implementation you linked doesn't appear to Clear the dictionary.

Latest repro here.

@eiriktsarpalis
Member

@eiriktsarpalis I've run your latest repro and the issue disappears if "Clear" is called on the dictionary. So it seems Server GC has an issue if the dictionary is not cleared, while Workstation GC does not. The actual STJ implementation you linked doesn't appear to "Clear" the dictionary.

Latest repro here.

STJ does not share state across serializations, so maintaining a global dictionary as your changes suggest is not something we could realistically incorporate into the reference-preservation feature. Also note that your changes do not in fact clear the global dictionary at the end of each iteration: this results in operations becoming idempotent after the first iteration, with no allocations being recorded subsequently.

I'm going to close this issue as by-design. Reference preservation is a memory-intensive feature, and per @Maoni0's feedback, enabling server GC results in GC events being fewer and farther between, which would correlate with increased memory usage.

@codingdna2
Author

codingdna2 commented Aug 4, 2021

@eiriktsarpalis Fine about closing. But to be clear, I haven't suggested any modifications (in fact, the comment about a global ReferenceHandler was inserted by you, not me). I think this is something related to server GC and dictionary usage. Frequency is not the culprit in my opinion, but the clearing of the dictionary before its (garbage) collection could be.

@eiriktsarpalis
Member

I might have misunderstood your comments then. Clearing the dictionary should only be meaningful in this context if the same dictionary instance is being reused across iterations. But I failed to spot any clearing operation in the latest reproduction that you shared.

@codingdna2
Author

codingdna2 commented Aug 4, 2021

Clearing the dictionary should only be meaningful in this context if the same dictionary instance is being reused across iterations.

I expected the same as you, but the repro shows that the memory pressure almost disappears if the dictionary is cleared before going out of scope.

But I failed to spot any clearing operation in the latest reproduction that you shared.

At line 27 there's a handler.Reset() call, which in turn calls _objectToReferenceIdMap.Clear();. Uncommenting handler.Reset() solves the issue (it shouldn't, but it does). This is also a modification you introduced when you removed STJ.
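In other words, the variant under discussion reduces to something like this sketch (Handler, Add, and Count are invented here; Reset() and _objectToReferenceIdMap mirror the names mentioned above):

```csharp
using System.Collections.Generic;

class Handler
{
    // Reference-identity map, as in the repro's resolver.
    private readonly Dictionary<object, string> _objectToReferenceIdMap =
        new(ReferenceEqualityComparer.Instance);

    public int Count => _objectToReferenceIdMap.Count;

    public void Add(object value, string id) => _objectToReferenceIdMap[value] = id;

    // Reset empties the map, severing the entries' references eagerly
    // instead of letting the whole dictionary die with its contents.
    public void Reset() => _objectToReferenceIdMap.Clear();
}
```

Without Reset(), the dictionary and every object it references become garbage together when it falls out of scope; with it, the object references are dropped before the dictionary itself is collected.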

@eiriktsarpalis
Member

I see now. Per my comment, the original repro was testing a global dictionary that was cleared at the end of each loop. However, you are pointing out that simply clearing the (freshly allocated) dictionary at the end of each iteration makes memory consumption decrease significantly.

Running the code in the VS profiler seems to indicate that clearing the dictionary before it falls out of scope results in GC events happening much more frequently, thus contributing to reduced memory consumption in the app. I'm sure @Maoni0 can provide insight as to why this is happening.

@Maoni0
Member

Maoni0 commented Aug 4, 2021

this should help explain what you are seeing.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 3, 2021