Ideas for debugging a timeout? #675
Comments
Looks like that's the case, yeah. From https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#limitations:
I'm guessing the gcloud CLI has some retry logic? Maybe the SDK had a recent update that adds this retry logic as well, so we might be able to fix this by using the latest SDK version.
Ah, great find, thanks @autrilla! I'm not sure how sops calls the SDK; I'm using the latest. Feel free to close the issue or repurpose it for the SDK update.
I'm not sure it would actually fix it. Would it be possible for you to build SOPS locally, but with the dependency at https://github.com/mozilla/sops/blob/master/go.mod#L6 bumped to v0.57.0, which appears to be the latest version, and then see if that fixes it? I kind of doubt it, since there's no mention of anything like that in the changelog. IMO this is something the SDK should handle, but if they're unwilling to, we could retry on the SOPS side when we hit that error.
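To make the "retry when we hit that error" idea concrete, here is a rough sketch of the shape such logic could take. It is written in Python purely for illustration and is not SOPS or SDK code: `retry_on_timeout`, its parameters, and the choice of `TimeoutError` as the error to match are all assumptions.

```python
import time


def retry_on_timeout(call, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `call()` until it succeeds, retrying only on TimeoutError.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the last error once attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            # Out of attempts: surface the original error to the caller.
            if attempt == attempts - 1:
                raise
            # Exponential backoff before the next try.
            sleep(base_delay * 2 ** attempt)
```

Any other exception propagates immediately, so only the startup race would be retried and real permission errors would still fail fast.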
Yes, on master or develop?
develop ideally
Unfortunately I get the same result:
If there's a command I can run to check I have the updated SDK version on that pod, let me know. I haven't used Go before, though everything went fairly smoothly. Here's the branch I'm using: develop...max-sixty:update-gcloud-sdk. I agree it seems like a problem gcloud should solve rather than sops. For the moment, I'll add a sleep & cache into my application code.
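The cache half of that workaround might look roughly like the sketch below, so that the startup delay is only paid once per file. This assumes a Python application that shells out to `sops -d`; `run_sops` and `make_cached_decrypt` are hypothetical names, and the cache lives only in process memory.

```python
import subprocess
from functools import lru_cache


def run_sops(path):
    """Run `sops -d <path>` and return the decrypted text."""
    result = subprocess.run(
        ["sops", "-d", path], check=True, capture_output=True, text=True
    )
    return result.stdout


def make_cached_decrypt(decrypt=run_sops):
    """Wrap a decrypt function so each path is decrypted at most once per process.

    lru_cache does not store exceptions, so a failed call is tried again
    the next time the same path is requested.
    """
    return lru_cache(maxsize=None)(decrypt)
```

Usage would be something like `decrypt = make_cached_decrypt()` at startup, then `decrypt(path)` wherever the secrets are needed.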
I have an issue that is probably not the fault of sops, but I can only replicate with sops — and so if anyone has hit something similar, or has thoughts on how I could debug this further, I'd appreciate any insights.
I'm using sops on a GKE cluster set up with Workload Identity. The following command fails:

    kubectl run -it --image gcr.io/[...] \
      --serviceaccount argo-service-account \
      --namespace default \
      --rm \
      -- test-pod bash -c 'sops -d [...]'
...with...
But the following command succeeds:
...so adding a sleep 5 remedies the failure. I've replicated a few times to ensure this is the reason. I can't replicate the failures with any calls to gcloud kms encrypt; I only get this behavior with sops.

Potentially sops is "too fast" and issues a request before GKE has had the chance to set up permissions for the pod?
Besides adding sleep 5 to every instantiation, is there any way to allow for a longer timeout with sops? Or any other ideas for debugging?

Thank you!
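One alternative to a fixed sleep 5 would be to poll until the pod's credentials are actually available and only then run sops. A minimal sketch, assuming the standard GCE/GKE metadata token endpoint is what needs to come up; `metadata_ready`, `wait_until_ready`, and the timeout values are hypothetical and untested against a real cluster.

```python
import time
import urllib.error
import urllib.request

# Assumption: the usual metadata endpoint for the default service account token.
TOKEN_URL = (
    "http://169.254.169.254/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def metadata_ready(url=TOKEN_URL, timeout=1.0):
    """Return True if the metadata server answers the token request."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_until_ready(probe=metadata_ready, max_wait=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` until it returns True or `max_wait` seconds have passed."""
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Compared with a fixed sleep, this returns as soon as the probe succeeds and gives a clear failure signal if it never does.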