Threads 99% of the Blocked causing slowdown #101216
Comments
#11056 ?
Tagging subscribers to this area: @tommcdon
@jkotas It seems very related. As seen in the CPU Stacks window, this is what I see.
Thank you all. I see it is assigned and tagged with milestone 9. Will it fix the .NET Framework issue as well? I am assuming the issue is known at this point. Is the only workaround to disable the profiler? If there is any other workaround, or something different I can do in the app that stands out in the trace, kindly let me know. It has been a source of continuous problems for us. Appreciate the help.
Hi @marafiq Apologies for the confusion - the milestone that we added is for our tracking only. It just means that we are tracking the issue. Looking at the CPU time for this trace is unlikely to help figure out the root cause since the application is not CPU bound. Would you be able to do a wall clock analysis as described here: https://learn.microsoft.com/en-us/shows/perfview-tutorial/tutorial-12-wall-clock-time-investigation-basics?
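For reference, a minimal sketch of the command-line form of that collection (assuming a recent PerfView build; `/ThreadTime` adds the context-switch events needed for wall-clock analysis, and the 60-second duration is just an example):

```
PerfView.exe collect /ThreadTime /MaxCollectSec:60 /AcceptEULA
```

The resulting .etl.zip can then be opened in PerfView and examined in the Thread Time Stacks view mentioned later in this thread.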
@davmason thanks. Let me try to do the analysis.

Analysis: The next 3 slowest requests are waiting for query results, thus not a concern for this trace.
I could not find any other reason it was blocked, except that thread 936 recorded a Compression Failure with Reason Code 14 at 84,882.473 ms, which is when this request completed. Thread 936, in the Events view, received the CSwitch.
When I look at the CSwitch to Thread 936 time range, there are many calls to remove the IO, seen for many threads. Including it in case it contains a helpful trace.
When looking from the context of HTTP requests, it seems ETW is not a factor here. Can you help me interpret why a bundle request took 71 - and help explain the role of
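In case it helps to look at the raw context switches programmatically rather than in the Events view, here is a rough sketch using the Microsoft.Diagnostics.Tracing.TraceEvent NuGet package; the file name `trace.etl` and the thread id 936 are placeholders taken from the discussion above, not the exact steps used here:

```csharp
using System;
using Microsoft.Diagnostics.Tracing;
using Microsoft.Diagnostics.Tracing.Parsers.Kernel;

class CSwitchDump
{
    static void Main()
    {
        // Open the ETL file collected by PerfView (placeholder path).
        using (var source = new ETWTraceEventSource(@"trace.etl"))
        {
            // Print every context switch into or out of thread 936.
            source.Kernel.ThreadCSwitch += (CSwitchTraceData data) =>
            {
                if (data.NewThreadID == 936 || data.OldThreadID == 936)
                    Console.WriteLine(
                        $"{data.TimeStampRelativeMSec,12:F3} ms  old={data.OldThreadID}  new={data.NewThreadID}  cpu={data.ProcessorNumber}");
            };
            source.Process(); // replay all events in the file
        }
    }
}
```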
Sorry, I have a few more questions. In multiple threads I see RareDisablePreemptiveGC. Under what conditions does the GC get disabled, and what does the call stack below mean when a RareDisablePreemptiveGC call appears in it? The stack traces below are from a different profile trace taken today while experiencing the slowdown, but from the same application. Partial stack traces of a possible GC disable - a few threads have an ETW thread before it got triggered.
Another partial stack trace under Blocked Time - Callers. This happens when I have JIT time exceeding one second for a method.
If you see a thread waiting in RareDisablePreemptiveGC, it means that there is GC running on some other thread and this thread is waiting for that GC to finish.
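One cheap way to check whether those waits line up with GCs is to periodically log collection counts inside the app and correlate the timestamps with the slow requests. A minimal sketch (the 5-second interval, the console logging, and the class name are placeholder choices):

```csharp
using System;
using System.Threading;

static class GcMonitor
{
    private static Timer _timer;

    // Logs GC collection counts every 5 seconds so full (gen 2) collections
    // can be correlated with the 30-second stalls by timestamp.
    public static void Start()
    {
        _timer = new Timer(_ =>
        {
            Console.WriteLine(
                $"{DateTime.UtcNow:O} gen0={GC.CollectionCount(0)} " +
                $"gen1={GC.CollectionCount(1)} gen2={GC.CollectionCount(2)} " +
                $"heapMB={GC.GetTotalMemory(false) / (1024 * 1024)}");
        }, null, 0, 5000);
    }
}
```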
Thank you @jkotas. One of the GC threads has the below stack trace. Is it blocked? The call stack seems very similar to #67559.
Hi @marafiq! This repo (dotnet/runtime) is used to track issues with .NET 6+, though this issue is specific to .NET Framework. Based on the original description it sounded like there might be a product issue; however, it seems the ETW trace is normal and nothing unusual from a diagnostics perspective stood out. Since this issue is now tracking answering questions on a specific performance analysis, we have decided to keep this issue open as other community members might benefit from the discussion and move it to the Future milestone. Since there are no .NET 6+ product issues listed here, we are not planning on actively working on this issue. If you are interested in paid Microsoft support, please see https://support.microsoft.com/contactus.
@tommcdon thanks. Keeping it open might help - yes, in the original trace ETW is not the problem; rather it was 'piohper' and some native calls, and I am waiting for some clarifications there whenever folks find time. Regarding the 2nd trace from the same app, as per my understanding the GC itself seems to be blocked - so we will want to know who is blocking it. But I am waiting for the answer before making assumptions.
Description
The application experiences slowdowns of up to 30 seconds. Most of the threads are waiting on WaitForSingleObject in the memory dump, and similarly, when a trace is taken, 99% of threads are waiting under BlockedTime.
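For context, a rough sketch of how the thread stacks can be pulled out of such a dump with the Microsoft.Diagnostics.Runtime (ClrMD) NuGet package; the dump file name is a placeholder and the code assumes the ClrMD 2.x API shape, as an illustration rather than the exact steps used here:

```csharp
using System;
using Microsoft.Diagnostics.Runtime;

class DumpThreads
{
    static void Main()
    {
        // Placeholder path to a memory dump taken during a slowdown.
        using (DataTarget target = DataTarget.LoadDump(@"w3wp.dmp"))
        {
            ClrRuntime runtime = target.ClrVersions[0].CreateRuntime();
            foreach (ClrThread thread in runtime.Threads)
            {
                if (!thread.IsAlive)
                    continue;

                Console.WriteLine($"--- OS thread {thread.OSThreadId:x} ---");
                foreach (ClrStackFrame frame in thread.EnumerateStackTrace())
                    Console.WriteLine("    " + frame);
            }
        }
    }
}
```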
Configuration
ASP.NET Framework 4.8 running on Azure App Service Premium V3
The application uses the following:
Azure SignalR SDK version - 1.21.4
SignalR Client SDK version - 2.4.1
Azure Insights Enabled - .NET Basic Level
MVC 5.2.3
WebAPI
Help
I have been chasing this issue for a few weeks now. I am unclear on the root cause; it would be a great help if you could guide me to pin down the root cause. My concern is that I might be chasing the wrong optimization, as I am failing to make clear sense of the blocked time, and I suspect the ETW thread is playing a bigger role in it than the application code.
Analysis
For a similar issue where JIT time was huge, which I thought was the root cause, I did the following analysis:
microsoft/perfview#1997 - with expert help from a PerfView team member, it was concluded that it is probably the Profile RunDown event causing the high JIT time.
Subsequently, we approached the Azure Support team, and they reached a different conclusion: that the GC is somehow blocked for some reason. They were not able to pin down the root cause, as it falls outside of their day-to-day job. I am attaching the final conclusion made here, which seems to indicate that there is some problem with the usage of Dictionary, and most of the thread traces show that it is coming from Routing. Some of the suggestions, such as using workstation GC mode, might reduce some of the slowdown but probably will not solve the root cause.
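For anyone trying the workstation GC suggestion, a tiny sketch of a check that can be added to the app to confirm which GC mode it is actually running under (the class name is just illustrative):

```csharp
using System;
using System.Runtime;

static class GcModeCheck
{
    // Logs whether the process is using server or workstation GC and the
    // current latency mode (Interactive generally indicates concurrent/
    // background GC is enabled).
    public static void Log()
    {
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
    }
}
```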
Things we know:
Issue related to ETW events - Azure/azure-signalr#1837
Exception in finalizer thread - the dump shows that the finalizer thread is not blocked, though we have the exception happening every now and then: Azure/azure-signalr#1928
Latest Slowdown Event
Whenever I take a trace, the traces include ETW trace events writing logs which our application does not explicitly write out. I cannot make sense of it - what exactly does it mean?
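One way to narrow down where those ETW log events come from (for example a profiler or an SDK rather than application code) could be to enumerate the EventSources live in the process; a minimal sketch, with the class name being illustrative:

```csharp
using System;
using System.Diagnostics.Tracing;

static class EtwSourceList
{
    // Lists every EventSource currently registered in the process, which
    // helps attribute ETW events that the application does not write explicitly.
    public static void Dump()
    {
        foreach (EventSource source in EventSource.GetSources())
            Console.WriteLine($"{source.Name}  {source.Guid}");
    }
}
```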
When I look at the Thread Time Stacks window, only one thread is working under CPU_Time - thread 13212. Below is part of the stack trace of the thread.
One thing to note is that there are multiple threads with the same thread id 13212 in the same trace; below is the 2nd thread with the same id doing the rundown.
Other instances of the same thread