Possible finalizer thread stuck on System.RuntimeMethodHandle.Destroy #31782
This fragment does not show the handle of the method being finalized. It is not proof that the finalizer is stuck finalizing the same dynamic method. What is the full stack trace from the finalizer thread? The most likely explanation of the symptoms is that each request produces one or more DynamicMethods and the finalizer thread is not able to clean them up fast enough. Try to find out what is producing these dynamic methods. Dynamic methods are typically meant to be cached, and it is not unusual to have a bug where that is not happening. The problem can be magnified by enabling tracing: tracing makes the cleanup of DynamicMethods more expensive, so it makes it more likely that the finalizer thread won't be able to keep up with high DynamicMethod churn.
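For illustration, a minimal sketch of the caching pattern described above: the delegate compiled from a DynamicMethod is created once and reused, instead of a new DynamicMethod being emitted per request. The cache, names, and toy IL are assumptions for the example, not the application's actual code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection.Emit;

static class AdderCache
{
    // Compiled delegates are cached, so each DynamicMethod (and its finalizable
    // RuntimeMethodHandle) is created once per key instead of once per request.
    private static readonly ConcurrentDictionary<int, Func<int, int>> Adders = new();

    public static Func<int, int> GetAdder(int constant) =>
        Adders.GetOrAdd(constant, c =>
        {
            var dm = new DynamicMethod($"Add_{c}", typeof(int), new[] { typeof(int) });
            var il = dm.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Ldc_I4, c);
            il.Emit(OpCodes.Add);
            il.Emit(OpCodes.Ret);
            return (Func<int, int>)dm.CreateDelegate(typeof(Func<int, int>));
        });
}

class Demo
{
    static void Main()
    {
        // Reuses one compiled delegate per constant; emitting a fresh DynamicMethod
        // on every call would instead keep handing cleanup work to the finalizer thread.
        var addFive = AdderCache.GetAdder(5);
        Console.WriteLine(addFive(37)); // 42
    }
}
```

Emitting a fresh DynamicMethod on every request produces a steady stream of finalizable handle cleanup, which is what can overwhelm the finalizer thread under load.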
Tagging subscribers to this area: @dotnet/gc
I am using dotnet-dump to analyze the memory dump; can that be found out with it?
Note this happens due to a traffic spike that lasts just a few minutes. After that, most servers' memory levels go back to normal when the traffic volume returns to normal; it is just a few of them where memory keeps growing and latency keeps getting worse despite receiving the same traffic as the rest of the servers.
Tracking those references using
We do not have tracing enabled.
dotnet-dump is not able to print native stack traces. The dump has to be opened in a native debugger for that. https://learn.microsoft.com/en-us/dotnet/core/diagnostics/debug-linux-dumps or https://learn.microsoft.com/en-us/troubleshoot/developer/webapps/aspnetcore/practice-troubleshoot-linux/lab-1-2-analyze-core-dumps-lldb-debugger have instructions for how to do that.
The way you are calling into Entity Framework and System.Text.Json may be creating too many dynamic methods, which leads to the problem. For example, are you reusing JsonSerializerOptions instances - https://learn.microsoft.com/dotnet/standard/serialization/system-text-json/configure-options#reuse-jsonserializeroptions-instances ? Ideally, you should not see any dynamic methods created and destroyed per request in steady state.
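A minimal sketch of the reuse pattern the linked guidance describes; the wrapper type and method names are assumptions for the example, not the application's code.

```csharp
using System.Text.Json;

public static class Json
{
    // Created once and reused; JsonSerializerOptions caches serialization metadata,
    // so sharing one instance avoids rebuilding that metadata on every call.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static string Serialize<T>(T value) =>
        JsonSerializer.Serialize(value, Options);

    // Anti-pattern shown for contrast: a fresh options instance per call forces the
    // metadata (and any generated code paths behind it) to be rebuilt each time.
    public static string SerializeSlow<T>(T value) =>
        JsonSerializer.Serialize(value, new JsonSerializerOptions(JsonSerializerDefaults.Web));
}
```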
The monitoring software may be enabling tracing for you. How are you producing the graph with GC utilization that you have shared above? It is most likely done via tracing.
Thank you. I will try getting that information from the dump. Yes, we cache the JsonSerializerOptions. We get that GC metric from EventCounters.
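For reference, a minimal sketch of how the built-in System.Runtime counters (including time-in-gc) can be consumed in-process with an EventListener; the listener itself is illustrative and is not necessarily how the monitoring here is set up.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Listens in-process for the runtime's "System.Runtime" EventCounters and prints
// the "time-in-gc" value each time a counter payload arrives.
sealed class RuntimeCounterListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "System.Runtime")
        {
            // Ask the runtime to publish counter values once per second.
            EnableEvents(source, EventLevel.Informational, EventKeywords.All,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName != "EventCounters" || eventData.Payload is null)
            return;

        foreach (var item in eventData.Payload)
        {
            // Each counter payload is a dictionary with keys such as "Name" and "Mean".
            if (item is IDictionary<string, object> counter &&
                counter.TryGetValue("Name", out var name) &&
                Equals(name, "time-in-gc") &&
                counter.TryGetValue("Mean", out var mean))
            {
                Console.WriteLine($"% time in GC: {mean}");
            }
        }
    }
}
```

The same counters can also be read out of process, for example with dotnet-counters monitor against the target process.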
I have LLDB installed now. I can see the backtrace of the thread in question:
Unfortunately I only have this dump. I will force the situation again in the next few days and take two of them. From the CLR thread stacks, I see 108 threads whose top stack frames are:
When comparing with a healthy application dump, only one thread was using Checking some of those threads' backtraces, I see they are also blocked on
This is the problem. EF Core is generating a dynamic method per request, which is overwhelming the system. It should be fixed in the current EF version by #29815. cc @roji
Thank you @jkotas. For the record, we use Entity Framework 7.0.10 on .NET7. It was also happening with Entity Framework 6.0.6 on .NET7. Our application usually reads a single record from the database using We load related entities both with
Although I can see stack traces using
The application where these stack frames were taken was running for days without problems until, due to a sudden traffic spike that made the CPU hit 85%, some servers, including this one, started to have memory issues and sluggishness.
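For illustration, one general way to cut per-request query compilation in EF Core is a compiled query. This is only a sketch with hypothetical entity and context names (provider setup omitted); it is not the fix discussed in this thread, which landed in EF Core itself.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical model and context, for illustration only.
public class Order
{
    public int Id { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();
}

public static class OrderQueries
{
    // The LINQ query is translated and compiled once; per-request calls reuse the
    // compiled plan instead of going through query compilation every time.
    private static readonly Func<AppDbContext, int, Task<Order?>> GetByIdQuery =
        EF.CompileAsyncQuery((AppDbContext db, int id) =>
            db.Orders.FirstOrDefault(o => o.Id == id));

    public static Task<Order?> GetOrderAsync(AppDbContext db, int id) => GetByIdQuery(db, id);
}
```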
@jkotas thanks for digging into this... We should backport that specific fix to 7.0 and 6.0.
@roji in which version is this fixed?
@vtortola 8.0.0-rc.1 no longer has this problem - can you please give that a try and confirm that it resolves the problem for you? Note that 8.0 (non-preview/rc) will be out in November.
@roji I am sorry I cannot deploy our application as .NET8 in our production system, that is where the problem happens. I am afraid I will need to wait till November. If you backport it to EF7 in .NET7 let me know and I can give it a try. Thanks. |
@vtortola thanks. I'll submit a PR for a patching 6 and 7, keep your eyes on this issue to be updated on progress. |
@jkotas although maybe it is true that Entity Framework is calling too much to |
Yes, I agree that we do not a full explanation of the behavior. |
FYI #31784 has been approved for patching for 6.0 and 7.0.
Any idea in which 7.0 version it will be released?
Looks like it should be in 7.0.12.
Is this still the case? I could only find the 6.0 PR.
@stevendarby It will get merged from 6 into 7.
@ajcvickers @roji I see 7.0.12 is out, can you please confirm this fix is in it? Thanks!
@vtortola All non-security fixes got pulled from 7.0.12, so it will be in 7.0.13 instead.
Alright, we will wait for 7.0.13, many thanks!
@ajcvickers @roji hi again! I see 7.0.13 is out, can you please confirm this fix is in it? Thanks!
I can see the commit 28c0abe in v7.0.13...main 🎊 I will let you know how it goes.
Description
When a single application instance reaches ~1700 RPS and ~85% CPU usage during a short period of time due to a traffic spike, around 25% of our servers experience what seems to be a blockage in the finalizer thread, and the application starts hoarding memory and latency increases. Eventually we have to kill the server when memory is at 10x what it uses normally and latency is not acceptable.
Exploring with dotnet-dump, we see that after the request spike the finalizer queue starts to accumulate objects. Taking multiple dumps shows that the "Ready for finalization" count keeps growing in each heap.
When exploring the threads, there is always a thread 0x001B that is stuck in this frame. If I take a dump 30 minutes later, it still shows this frame with this same data in this thread.
Those servers keep having a higher % of time in GC.
Configuration
Server is an Azure Standard_F16s_v2 (16 cores, 32 GiB RAM).
Docker image is mcr.microsoft.com/dotnet/aspnet:7.0.11.
Regression?
We are not completely sure, but we have the feeling it started happening when we moved to .NET7 from .NET6.
When we were on .NET6 we did not see this kind of situation. After we moved to .NET7 we started seeing some machines using an unexpectedly large amount of memory, starting with a traffic spike.
Data