401 Unauthorized for GCP related sinks after a while #10828
Is the auth tied to the ServiceAccount being used with the Vector pod? What version of Kubernetes are you running?
Yes and I double checked, the owner of the created objects on the GCS bucket is the pod's ServiceAccount.
1.18.20-gke.4501
I'm wondering if it's similar to #8616 (comment); it seems like Kubernetes has added some rotation times to service accounts.
I don't think it's related since, in this case, the GCS and Stackdriver sinks are authenticating using a token grabbed from the GKE Metadata Server (see lines 60 to 86 at commit 3160892).
It's kept up to date by a background task (lines 131 to 150 at commit ea0d002).
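The mechanism the linked code implements can be sketched roughly like this. This is a hedged Python illustration of the general pattern (cache the token, renew it proactively before expiry), not Vector's actual Rust implementation; `fetch` stands in for the HTTP call to the metadata server's token endpoint, and the half-lifetime renewal margin is an assumption for illustration.

```python
import threading
import time

# The GCE/GKE metadata server token endpoint (documented by Google);
# callers must send the "Metadata-Flavor: Google" header.
METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)

class TokenCache:
    """Caches an access token and refreshes it before it expires.

    `fetch` is any callable returning a dict with `access_token` and
    `expires_in` (seconds), mirroring the metadata server's response.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._token = None
        self._deadline = 0.0

    def get(self):
        with self._lock:
            # Refresh proactively, well before the actual expiry.
            if time.monotonic() >= self._deadline:
                resp = self._fetch()
                self._token = resp["access_token"]
                # Renew at half the lifetime to leave a safety margin.
                self._deadline = time.monotonic() + resp["expires_in"] / 2
            return self._token
```

If the refresh ever fails silently (or the refresh task never starts), requests keep using the stale token and the API starts returning 401s, which matches the symptom in this thread.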
FYI: It has now been running in production for 2 days without any 401 error. It makes me think that it might be related to some sporadic and uncaught error with the Metadata Server. Also, prior to the 401 errors I had, there were no signs in the logs that there had been a problem with the token renewal (nothing in the logs that says "Failed to update GCP authentication token.")
Ah, gotcha - wasn't sure what it got tied to for auth. We could look to add more logging into the regen process if it comes up again, otherwise I'm not sure there's a good course of action.
My thoughts too. It has not happened again yet. I'll keep an eye on it. You can close the issue if you want and I'll reopen if it happens again and I manage to catch more data.
I am also seeing this sporadically. Scanning through vector logs, this seems to recover automatically after < 1 minute; however, I just experienced a 15 minute log outage on 1 VM, which again automatically recovered. Running vector 0.19.1 on debian buster.
I am having this issue as well. At first it seemed to happen hourly and I thought it might be related to token expiration, but now sometimes it'll run for several hours.
To note: No k8s involved, and no recovery is experienced. Once it starts, it never clears up without a restart.
I think this should be fixed by #12645, which should be in tonight's nightly build. Could you test and see if this is still happening for you?
Unfortunately it still occurred.
Update:
I think this has been fixed by #12757 which was just merged; at least, I was unable to reproduce the problem after several hours of continuous running. Are you set up to build from sources to test if this fixes the problem for you?
🤞 I've built and have it running. I'll let it run overnight and post any updates. Update: didn't make it past the first hour. :(
@jdoupe which nightly did you run with (what does vector --version report)?
vector 0.22.0 (x86_64-unknown-linux-gnu 50dd7a7 2022-05-19)
I have set up test configs sending to both …
Should I see the renewal debug output with the current nightly? (With just a single -v?)
No, I will be submitting a PR for adding the details, as it would be useful for future diagnosis. Edit: See #12814
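The kind of diagnostic detail discussed here can be sketched as follows: a renewal loop that logs every outcome, so that a failed refresh leaves a trace before 401s start appearing. This is a hypothetical Python illustration, not the contents of the PR; all names and the retry backoff are assumptions.

```python
import logging
import threading  # callers pass a threading.Event as `stop`

log = logging.getLogger("gcp_auth")

def refresh_loop(fetch_token, apply_token, stop):
    """Hypothetical token-regeneration loop with logging on each outcome.

    `fetch_token` returns (token, expires_in_seconds); `apply_token`
    installs the new token; `stop` is a threading.Event ending the loop.
    """
    while not stop.is_set():
        try:
            token, expires_in = fetch_token()
            apply_token(token)
            log.info("Renewed GCP token; expires in %ss", expires_in)
        except Exception:
            # This is the message the reporter grepped for above.
            log.exception("Failed to update GCP authentication token.")
            expires_in = 2.0  # hypothetical retry backoff
        # Sleep until half the lifetime has elapsed, then renew early.
        stop.wait(expires_in / 2)
```

The point of logging both branches is that "no renewal-failure messages in the logs" then actually rules out a failed refresh, rather than merely a refresh that failed silently.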
@bruceg - Thanks much for all your time on this... Running … Things looking good thus far. Just noting that it looks like the …
Great, thanks for confirming. |
Hey @bruceg. I appreciate this issue has been closed for some months now; however, we are seeing this problem when trying to send data to our GCS bucket from Vector. We have Vector running in Kubernetes, and we have found that we can send data to GCS for some time after restarting our pods, but after a while we start getting 401s (assuming because Vector has failed to refresh its token). Any possibility this problem has been reintroduced? The Vector image we are using is timberio/vector:0.24.1-alpine. This is the error we're seeing:
@RobertSLane Do you have healthchecks disabled by chance? The code for these sinks only starts the token refresh after a response from a healthcheck. See #13058 for details.
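The gating described here can be sketched minimally: if the token-refresh task is only spawned from the healthcheck code path, then disabling healthchecks silently disables token renewal as well. All names below are hypothetical illustrations, not Vector's actual internals.

```python
def start_sink(healthcheck_enabled, run_healthcheck, spawn_token_refresh):
    """Illustrates the bug pattern: the refresh task starts only after
    a successful healthcheck, so healthcheck=false means no renewal.
    Returns True if the refresh task was started."""
    if healthcheck_enabled and run_healthcheck():
        spawn_token_refresh()  # refresh task starts here...
        return True
    return False               # ...and never starts on this path
```

With this shape, a sink that disables healthchecks works exactly until the initial token's lifetime (typically about an hour) runs out, matching the "fails after about an hour" reports in this thread.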
i have health checks enabled but i'm seeing this with the GCP Cloud Monitoring sink. it seems to consistently fail with 401 errors after about an hour or so. i've looked through the code and i guess this is the addition which is present in a couple different sinks (but not, afaict, the cloud monitoring one): vector/src/sinks/gcp/pubsub.rs, line 224 at commit ea17c1f
i have a local branch that i'm using with a workaround for #14890, so i'll see if it's not too much to add a similar health check to the cloud monitoring code path as well.
@jkachmar did you resolve this? We're still seeing this behavior with v0.29.1. After running for an hour, we get token expired errors:

2023-04-27T12:03:47.174065Z ERROR sink{component_kind="sink" component_id=gcp component_type=gcp_stackdriver_metrics component_name=gcp}:request{request_id=6983}: vector::sinks::util::sink: Response failed. response=Response { status: 401, version: HTTP/1.1, headers: {"www-authenticate": "Bearer realm=\"https://accounts.google.com/\", error=\"invalid_token\"", "vary": "Origin", "vary": "X-Origin", "vary": "Referer", "content-type": "application/json; charset=UTF-8", "transfer-encoding": "chunked", "date": "Thu, 27 Apr 2023 12:03:47 GMT", "server": "ESF", "cache-control": "private", "x-xss-protection": "0", "x-frame-options": "SAMEORIGIN", "x-content-type-options": "nosniff"}, body: ... }

The response body, decoded:

{
  "error": {
    "code": 401,
    "message": "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.",
    "status": "UNAUTHENTICATED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "ACCESS_TOKEN_EXPIRED",
        "domain": "googleapis.com",
        "metadata": {
          "method": "google.monitoring.v3.MetricService.CreateTimeSeries",
          "service": "monitoring.googleapis.com"
        }
      }
    ]
  }
}
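A common mitigation for the ACCESS_TOKEN_EXPIRED case above is to treat a 401 as a signal to invalidate the cached token and retry once with a fresh one, so a missed proactive renewal self-heals instead of failing forever. The sketch below is a hedged Python illustration of that pattern, not Vector's code; `send` is a hypothetical callable taking a bearer token and returning an HTTP status code.

```python
class SimpleTokenCache:
    """Minimal token cache: fetches lazily, can be invalidated."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._token = None

    def get(self):
        if self._token is None:
            self._token = self._fetch()
        return self._token

    def invalidate(self):
        self._token = None

def request_with_refresh(send, cache):
    """Send a request; on 401 (e.g. ACCESS_TOKEN_EXPIRED), drop the
    cached token and retry once with a freshly fetched one."""
    status = send(cache.get())
    if status == 401:
        cache.invalidate()          # force a refetch on the next get()
        status = send(cache.get())  # single retry with the new token
    return status
```

Retrying only once keeps the failure mode bounded: if the fresh token is also rejected, the 401 propagates and surfaces as a real error rather than a silent retry loop.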
@ansel1 would you mind filing a new bug report and filling out the sections in the template?
After executing properly for a while on a GKE pod, both of my GCP related sinks (Stackdriver and Google Cloud Storage) start giving me 401 Unauthorized errors.
The underlying libraries (GcpAuthConfig / GcpCredentials) don't seem to be handling JWT access token renewal/refetch from the Metadata Server properly (since I'm not using a service account private key).
Vector Version
From the Docker image
timberio/vector:0.18.1-debian
Vector Configuration File
Actual Behavior
After a few hours of execution, the JWT access token expires, and every time Vector tries to flush to its GCP related sinks I get this from stderr: