Lots of unexpected logging in Python library #540
@kamalaboulhosn This appears to be happening at a level below Python. Can the customer post some more details about their OS, their Python version, and the output of … ? The output doesn't look like the logging we do in our Pub/Sub client code (at the Python level), and it probably doesn't come from the interpreter itself either, which is written in C. It looks like some C++ code, maybe `grpc`? For a start, I would at least try upgrading Pub/Sub to the latest version (2.9.0) and see if that solves the problem. |
@plamut I will ask about the additional information. I agree that it doesn't look to be something at the Pub/Sub library level. My suspicion is that debug logging is turned on at the gRPC level somehow. |
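For context, gRPC's own logging is normally controlled through environment variables, so "debug logging at the gRPC level" would typically mean something like `GRPC_VERBOSITY=DEBUG` or a `GRPC_TRACE` setting in the environment. Below is a minimal sketch of checking and quieting those variables; note that, as discussed later in this thread, the RAW mutex lines come from Abseil rather than gRPC's own logger, so this may not silence them.

```python
import os

# gRPC reads these variables when it is first imported, so inspect/adjust
# them before `import grpc` (or any library that imports it, e.g. Pub/Sub).
print("GRPC_VERBOSITY:", os.environ.get("GRPC_VERBOSITY"))
print("GRPC_TRACE:", os.environ.get("GRPC_TRACE"))

# Keep gRPC's own logging quiet unless something else already configured it.
os.environ.setdefault("GRPC_VERBOSITY", "ERROR")
os.environ.pop("GRPC_TRACE", None)

import grpc  # noqa: E402  -- imported after the environment is adjusted
```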
@lidizheng A question for you: do these logs resemble anything that gRPC itself might emit? Thanks! |
@plamut gRPC today uses Abseil C++ for its mutex implementation. There is no … |
@lidizheng Thanks. @kamalaboulhosn had a look into Abseil and found the … |
They say they have not adjusted any log levels in their environment. They use these Python libraries:
And these Alpine Linux libraries:
|
Another customer is seeing the same issue. The logs are coming from here. I asked the ABSL team for info on how the raw log verbosity levels are set. abseil/abseil-cpp#1069 |
I pinged derekmauro who commented on abseil/abseil-cpp#1069 via internal chat. Hoping he can help us further with this. |
The gRPC team doesn't think it's their issue, and they can't do much unless we have repro instructions for them. |
Closing, as the issue has not reoccurred. |
It happened to us on the night of 2022-07-11; I had to take down the service at the receiving end, but it took a few days to notice, at roughly $500 per day of Cloud Logging usage.
2022-07-14 12:09:50.317 CEST [mutex.cc : 435] RAW: Lock returning 0x564c983a0378 @
google-api-core==2.3.2 |
I just had the same thing happen over a weekend, which ended up costing us over $400. This service is running as a Flask application in Cloud Run. No adjusted log levels. Flask==2.1.0
It would be nice to know why this occurs, and if there's any way for us to prevent it. Paying several hundred bucks for an error you deem "not reoccurring" feels really unfair.
EDIT: This just started occurring again, after not occurring for several hours. No changes were made to the running container; it just started spamming us with logs out of the blue. We're running two instances of this container on two environments, with the same dependencies, and are only getting the error in one environment. We're on europe-west1.
EDIT 2: For anyone having this issue, using a log exclusion on the log sink seems to have done the trick for us. Docs here: https://cloud.google.com/logging/docs/routing/overview |
I did this: … It could definitely be extended to add some further granularity, since this filter could also exclude potentially wanted logs from other Cloud Run containers, so use with caution! :) |
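The filter referenced above isn't included in the thread. As an illustration only, here is a minimal sketch of creating such a Cloud Logging exclusion with the `google-cloud-logging` client; the project ID, exclusion name, and exact filter string are assumptions based on the log lines quoted in this thread, and (as noted later in the thread) excluded entries may still count against the Logging quota.

```python
from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client
from google.cloud.logging_v2.types import LogExclusion

PROJECT = "my-project"  # hypothetical project ID

client = ConfigServiceV2Client()

# Drop the Abseil RAW mutex lines emitted by Cloud Run revisions.
# The filter string is an assumption; tighten it (for example by service
# name) so it does not swallow logs you actually want to keep.
exclusion = LogExclusion(
    name="exclude-abseil-mutex-raw",
    description="Exclude RAW mutex.cc log spam from Cloud Run",
    filter='resource.type="cloud_run_revision" AND textPayload:"RAW: Lock returning"',
)

client.create_exclusion(parent=f"projects/{PROJECT}", exclusion=exclusion)
```

Alternatively, the routing docs linked above describe managing exclusion filters on the _Default sink directly in the Cloud console.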
I've commented again on: abseil/abseil-cpp#1069 |
The excessive logging happened to us as well, beginning 2022-12-28 without any upgrades or updates to the Cloud Run container/deployment. We are using the following grpc versions in our Python module: … Re-deploying the service did not fix the issue. We are looking forward to a definitive answer as to why this is happening @acocuzzo |
This happened to us at the beginning of the month: > $2000 in Log Volume charges. My container was built on 2022-04-08 and uses Python 3.8.12 with these Google packages:
|
I'm unable to reproduce the issue; if anyone has a consistent repro, it will help me narrow things down for the gRPC folks. Thanks, I appreciate the patience and the information. |
We have just experienced the same problem, caught via a regular billing report after running for three days. I don't know if this was the same for others, but what seems to have happened is that our application code started throwing errors first. A bad config change meant that the GKE pod that subscribes to Pub/Sub started to throw errors repeatedly in the callback function. This caused the pod to fall over and restart, repeatedly (a sketch of this subscriber pattern follows this comment). Each time it would terminate as follows:
Then, after this had happened a few tens of times, the deluge of 'mutex' logs began. What's interesting is that …
Hope that helps to trace this behaviour! |
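To make the reported setup concrete, here is a minimal sketch of the streaming-pull subscriber pattern being described; the subscription path and the message-handling logic are hypothetical, not the reporter's code. The point of interest is that the process blocks on the streaming pull future, and in the reported incident the pod crash-looped while handling messages before the mutex log flood began.

```python
from google.cloud import pubsub_v1

# Hypothetical subscription path -- not taken from the original report.
SUBSCRIPTION = "projects/my-project/subscriptions/my-sub"


def callback(message):
    # Hypothetical application logic. In the report, a bad config change made
    # the handling of every message raise an error at this point.
    print("received:", message.data)
    message.ack()


subscriber = pubsub_v1.SubscriberClient()
future = subscriber.subscribe(SUBSCRIPTION, callback=callback)

with subscriber:
    try:
        # Blocks until the stream shuts down or fails; in the reported setup
        # the pod terminated and restarted repeatedly around this point.
        future.result()
    except Exception:
        future.cancel()   # stop the background consumer threads
        future.result()   # wait for the shutdown to complete
        raise
```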
We see the same behavior when Pods have been running for a certain amount of time in GKE. |
@PaddyAlton Thanks so much for the detailed description. |
We looked into this some more and the problem seems to be coming from using the Abseil library "wrong". |
Thanks for the update @AndreasBergmeier6176 |
This just occurred on my team's Cloud Composer cluster, where a single container in the Cloud Composer GKE cluster (Autopilot enabled) began emitting 32k mutex logs/sec and incurred ~USD 700 in 24h. We caught it after a day and a half. Conditions appeared very similar to what @PaddyAlton describes. It took multiple attempts to kill the pod before the logs finally stopped. It didn't rise to the level of detection by Autopilot. |
We hit a similar issue on our team's Cloud Composer cluster.
We thought of setting up an exclusion filter as suggested here, but it may still consume the logging quota:
https://cloud.google.com/logging/docs/routing/overview#exclusions |
I would like to move this issue to a broader audience so that it can get attention in the right location. |
For us the solution was simply to stop using waitress and Python altogether. |
In an effort to improve visibility of this issue and allow for collaboration across libraries, I've filed a public issue: |
Summarized from a customer issue:
Somehow, something in this Google Pub/Sub service is printing out mutex.cc logs at an enormous scale of 3.5M per 5 minutes. The logs look as follows: …
The information provided about versions: …
The customer suggested "I believe the core of the issue is that pubsub is sending messages without a delay and no exponential backoff, thus causing massive spikes in utilization and an over-use of the logging module."
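Whether or not that theory explains the mutex logging, the usual way to bound how fast the Python subscriber pulls messages is flow control. Below is a minimal sketch with illustrative values and a hypothetical subscription path; it caps client-side concurrency and is not a fix confirmed anywhere in this thread.

```python
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

SUBSCRIPTION = "projects/my-project/subscriptions/my-sub"  # hypothetical path

# Cap how many messages/bytes the streaming-pull client holds outstanding at
# once, which bounds spikes in downstream utilization.
flow_control = pubsub_v1.types.FlowControl(
    max_messages=100,
    max_bytes=10 * 1024 * 1024,
)

subscriber = pubsub_v1.SubscriberClient()
future = subscriber.subscribe(
    SUBSCRIPTION,
    callback=lambda message: message.ack(),
    flow_control=flow_control,
)

with subscriber:
    try:
        future.result(timeout=60)  # run for a minute for demonstration
    except TimeoutError:
        future.cancel()
        future.result()
```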