A running SubscriberClient causes exceptions after a few seconds #10992
Comments
I'll look at this tomorrow. Looking at the code very briefly, though, it's not clear whether this is actually related to ASP.NET Core at all - do you observe the same behavior if you run the same code in a simple console app, without the ASP.NET Core part? It would be good to reproduce this in as simple a fashion as possible.
Thank you for the quick reply. I just tested it, and it's actually the same problem with a console application. You can use the following code to reproduce it:
using System.Threading.Tasks;
using Google.Apis.Auth.OAuth2;
using Google.Cloud.PubSub.V1;

// Placeholders: fill in your own project ID, credentials JSON and subscription ID.
const string projectId = "<...>";
const string subscriptionId = "<...>";
var credentials = GoogleCredential.FromJson("<...>");

var subscriberClient = new SubscriberClientBuilder
{
    GoogleCredential = credentials,
    SubscriptionName = SubscriptionName.FromProjectSubscription(projectId, subscriptionId)
}.Build();

// Acknowledge every message. The returned task only completes when the subscriber stops,
// so awaiting it keeps the console app running.
await subscriberClient.StartAsync((_, _) => Task.FromResult(SubscriberClient.Reply.Ack));

I'm looking forward to your reply.
Great, thanks - that will certainly make it simpler to repro. Will look into it tomorrow.
I tried downgrading to older versions, and the problem occurs in every version down to and including 3.1.0. I didn't downgrade any further because there seemed to be a change in the way you need to initialize a <...>.
I just downgraded to 3.0.0. There are still exceptions being logged, but far fewer than with 3.1.0 and later.
Okay, it seems the lower number of exceptions is explained by the fact that with version 3.0.0 I couldn't use the <...>. This means that the error is probably related to the <...>. By the way, in the meantime I created a repository with the examples: https://github.com/tnotheis/pubsub-subscriber-exceptions
I've reproduced this now, and the RpcException is:
It looks like it's coming from <...>. The fact that we're seeing that as far back as 3.0 suggests that this is a service issue rather than a library issue, but I'm reluctant to come to any firm conclusions just yet. It's also possible that some RPC failures are expected (<...>). Could you clarify what you meant by this in the initial post:
Do you mean your topic has lots of messages and they're being pulled more slowly over time? Any details that would let me try to reproduce that too would be appreciated.
Could I ask a question about timing? Is this code that you've had running for some time, and the problem only seems to have started recently, or is it fresh new code? I assume that in production you wouldn't see the exceptions (because they're being caught) and you'd just see the performance degradation. If the reason you've been running this in the debugger is because you did spot a performance degradation in a production service, then any information you have about when things changed would be enormously helpful.
Sorry for the late reply. I wanted to reproduce the performance problem, but it only seemed to occur in my actual solution. So I fiddled around with it to see when it happened and when it didn't, until I noticed that it wasn't related to this library at all... A restart of my PC, combined with deleting my database volume and all my containers, then fixed the performance problems. So please ignore that part.
That's really good to know, thanks. (And thanks for being so responsive - your idea of a "late" reply is far more responsive than most folks!)
The code in question was implemented several months ago, but was only used in some small tests until yesterday, when we finally got a Pub/Sub topic. I then ran a larger test suite, which took a few minutes. It was during this test run that I first noticed the logged exceptions, so I can't say how long the problem has existed. Since there is no performance degradation, as I mentioned in my previous comment, the impact on our business is no longer that high. But of course I will still try to provide you with any information necessary to resolve this, because, as you said, the number of exceptions suggests that something is wrong.
That's all good to know, thanks. (It's also good to hear that the impact on the business is no longer high.) I don't think there's anything else to ask you to test at the moment, but thanks so much for the help so far. FYI, I've reproduced this in both .NET 7.0 and .NET Framework 4.8, where the latter uses Grpc.Core rather than Grpc.Net.Client - so it's unlikely to be a gRPC stack issue. I'm talking with folks internally, and will update you when I can.
Hmm... having said I'm able to reproduce it, I'm not sure that's actually correct - I can't reproduce nearly that many exceptions. In order to limit the variation, I've got a new test which creates the client like this:
var client = new SubscriberClientBuilder
{
    // A single underlying client, to match the earlier repro with manually-specified clients.
    ClientCount = 1,
    SubscriptionName = subscriptionName
}.Build();
That should correspond to the code you had with manually-specified underlying clients, where you were reporting ~11 exceptions per minute. I'm only seeing one every ~80-120 seconds (so fewer than one per minute). One difference is that I'm using HEAD code (so that I can easily add more diagnostics) - but we pushed out version 3.7.0 today, so if you can upgrade your test rig to that, it would help minimize differences. Note that I'm only counting <...>. I suspect that the debugger output is adding some confusion here - running in the debugger, I'm seeing 4 lines like this:
... for every exception actually caught. Even that wouldn't fully explain the discrepancy, though - I'd expect you to see 2-4 lines of debugger output per minute, not 11.
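To compare exception counts across machines without relying on the debugger's Output window, one option (not something used in this thread) is to count first-chance exceptions by type with the standard AppDomain.FirstChanceException event, which fires for caught exceptions too. This is a minimal sketch under stated assumptions: the loop that throws InvalidOperationException is a hypothetical stand-in for the running SubscriberClient, and in a real repro you would print the counts periodically rather than once.

```csharp
using System;
using System.Collections.Concurrent;

// Count every first-chance exception by type. The event fires even when the exception
// is later caught, which is roughly what Visual Studio's "Exception thrown" lines report.
var counts = new ConcurrentDictionary<string, int>();
AppDomain.CurrentDomain.FirstChanceException += (sender, e) =>
    counts.AddOrUpdate(e.Exception.GetType().FullName ?? "(unknown)", 1, (_, n) => n + 1);

// Hypothetical stand-in workload: in a real repro, start the SubscriberClient here instead.
for (int i = 0; i < 3; i++)
{
    try
    {
        throw new InvalidOperationException("simulated failure");
    }
    catch (InvalidOperationException)
    {
        // Swallowed, like exceptions a library catches and retries internally.
    }
}

// Print a per-type summary.
foreach (var (type, count) in counts)
{
    Console.WriteLine($"{type}: {count}");
}
```

Counting by type keeps the comparison focused on the exceptions that matter (for example Grpc.Core.RpcException versus TaskCanceledException) instead of on raw debugger line counts.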
If you want to run the exact code I'm running, see #11004, including the comment - basically, if you fetch either that pull request or everything after it's merged, and make the small change indicated in the commit, you can get simpler diagnostics.
First of all: I just installed version 3.7.0. The output is still the same though, unfortunately. Did you keep the program running for more than one minute? I ask because after ~70s there are actually only 4 exceptions logged. But if you run it for ~2.5 minutes, there is a second batch of exceptions logged, which is significantly larger than the first one (27 instead of 4).
That's what I meant by "It starts slowly, but it's getting worse with time." in the issue description. I admit I could have been clearer.
Yes, I was running for about 10 minutes. I haven't seen a batch of exceptions logged like you're describing. I'm running again now, not in VS, but with the code in #11004. There's a possibility that the exception isn't the one I'm currently logging... I'll see if I can find a more "catch-all" place to put that logging.
Hmm. I can't see anywhere nicely centralized where I can add logging right now - I'm sure it's possible, but I'm keen to get another run going. I've added it everywhere I think we might be seeing things. I'm going to leave this running for an hour and then upload the log here, so you can compare it with what you're seeing.
For the sake of completeness, here is the whole log after 2.5 minutes:
I will have a call with a colleague soon, and I'll have him run the project to see whether the problem is my machine.
That's really interesting - the only exceptions I've seen in VS at all are System.Threading.Tasks.TaskCanceledException and Grpc.Core.RpcException. It looks like you're seeing very different behavior. (This could end up being related to a proxy or other network difference.) I'm still running my just-on-a-console test - after that I'll try again in VS and copy my log from that (after leaving it for at least 10 minutes).
Now it's getting weird. My colleague doesn't see any exceptions at all, so we now have three different behaviors. He has .NET SDK 7.0.200 installed; I have 7.0.307. Which version do you have?
I've been running under .NET 6.0 (SDK 6.0.413). If we want to try .NET 7, I've got 7.0.400 installed, but nothing between 7.0.110 and that. I'll try with 7.0.400 to see if that changes things. I've attached my console log below - this is just the additional <...>. I'll run in VS now (still with .NET 6 to start with) and capture ~10 minutes of output from that.
Okay, 11 minutes of Output Window from VS running in .NET 6:
(6 exceptions logged to the actual console.) Will try in .NET 7 now.
Output window log when running with .NET 7.0, which is loading assemblies from Microsoft.NETCore.App\7.0.10, for about 10 minutes:
So again, no big batch of exceptions.
Just to set expectations: I'm on vacation tomorrow, so I'll pick this up again on Friday. It now looks like it's probably environmental, though...
Yeah, that's what I'm thinking. Tomorrow is a day full of meetings, so I won't have much time to look into this either. But I will try to find some time on Friday. Enjoy your vacation, and thank you for the effort.
Any more updates on this? I'd like to close it if possible, as I don't think it has anything to do with the client library specifically. It may well be an aspect of a gRPC implementation, and I'd be interested to know more, but I doubt that we'll want to make any changes in the Pub/Sub library. Let me know your thoughts.
Sorry, I totally forgot to update this issue. I've been very busy the last few days.
Great, thanks! (Hope whatever's causing you to be busy is either welcome, or is resolved soon...)
Thank you for the great support!
Hi folks, please note that I'm facing the same issue: it has been printing warnings for weeks (months?), lots of
and sometimes
In case it helps: I tried limiting ClientCount to 1, but I'm still seeing those logs!
@kamisoft-fr: I would suggest at least trying to disable HTTP/3. See the code here for an example of how to do that. It's slightly fiddly at the moment - we're hoping that the gRPC team can simplify it somewhat. If that doesn't help, please file a new issue with more details, ideally including a short but complete example as a console app, along with environmental details.
Thanks Jon, unfortunately it did change the behavior. As soon as I start the app, the warnings always start to pop up after about a minute.
I created a console app with all my Pub/Sub boilerplate as it is done in my code base; I'm about to push it somewhere so that you can launch it locally :)
@kamisoft-fr: As noted before, please create a new issue rather than commenting further here. (And did you mean "it did not change the behavior" instead of "it did change the behavior"? Something to note in the new issue.)
Yes indeed, I meant "did not". I will create the issue right away!
Here it is, thanks for your time! #12964
Environment details
Steps to reproduce
I have a Visual Studio solution with the minimal setup, which I would like to add to a GitHub repo, but GitHub currently seems to be experiencing problems and I can't create a repository. So here is a zip file: PubSubTestAspNet.zip
<...>
Now wait for about 70 seconds. After that, Visual Studio begins to log exceptions:
It starts slowly, but it's getting worse with time. Also, the performance degrades the longer the solution is running.
Is this a bug or am I using the library the wrong way?
Thanks!