saving log artifact to s3 missing retries #9914
Comments
Is it because the underlying code that creates the S3 client doesn't have https://github.com/argoproj/pkg/blob/513d2b4d4df37502f7e05732f381c60ffd075b6f/s3/s3.go#L93 ?
another similar error
Maybe https://github.com/argoproj/argo-workflows/blob/master/workflow/artifacts/s3/errors.go#L11-L21 needs an update, @danajp?
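For readers following along, the kind of check being discussed can be sketched as below. This is a minimal illustration, not the project's code: `codedError`, `transientCodes`, and `isTransient` are hypothetical names, and the listed codes are examples of S3 error codes chosen for illustration rather than the list in `errors.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError is a hypothetical stand-in for an S3 client error that carries an error code.
type codedError struct {
	code string
	msg  string
}

func (e codedError) Error() string { return e.code + ": " + e.msg }

// transientCodes is an illustrative set of codes treated as retryable.
// Assumption: this is NOT the list used by argo-workflows.
var transientCodes = map[string]bool{
	"RequestTimeout": true,
	"SlowDown":       true,
	"InternalError":  true,
}

// isTransient reports whether err, or any error it wraps, is a codedError
// whose code appears in transientCodes.
func isTransient(err error) bool {
	var ce codedError
	if errors.As(err, &ce) {
		return transientCodes[ce.code]
	}
	return false
}

func main() {
	wrapped := fmt.Errorf("failed to save log artifact: %w",
		codedError{code: "SlowDown", msg: "reduce your request rate"})
	fmt.Println(isTransient(wrapped))
	fmt.Println(isTransient(codedError{code: "AccessDenied", msg: "forbidden"}))
}
```

An error whose code is missing from the set is treated as permanent, which is why an unlisted transient failure would skip the retry path entirely.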
This also happened to me but with a slightly different error:
It would be great to have the possibility to retry just the saving of an artifact. As far as I understand, if this is treated as a transient error then the whole Step would be retried, right? I believe there would be value in adding that feature. What's the status here? Is anybody looking into it? I can help and contribute the fix myself.
Your PR will be appreciated, @kromanow94!
@tooptoop4 sure, I should be able to start working on that in about two weeks :).
Signed-off-by: Helge Willum Thingvad <1019305+tachylatus@users.noreply.github.com>
If I'm not mistaken, #12191 doesn't fix this issue because this issue is about retrying just the routine for saving artifacts. The changes in the PR seem to retry the whole step if there was an error when saving the artifact.
@kromanow94 the code in #12191 is used inside an internal back-off loop in the Executor, so it in fact retries only the artifact request and not the whole step.
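To make the distinction concrete, here is a minimal sketch of a back-off loop that retries only a single operation (such as an artifact upload) when the failure is transient. It is an illustration under stated assumptions, not the Executor's implementation: `retryWithBackoff`, `errTransient`, and the parameter values are hypothetical, and only the Go standard library is used.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical sentinel marking retryable failures.
var errTransient = errors.New("transient error")

// retryWithBackoff calls op up to steps times. After each transient failure it
// sleeps for the current delay and then multiplies the delay by factor.
// Non-transient errors are returned immediately without further attempts.
func retryWithBackoff(steps int, initial time.Duration, factor float64, op func() error) error {
	if steps < 1 {
		steps = 1 // always make at least one attempt
	}
	delay := initial
	var err error
	for attempt := 1; attempt <= steps; attempt++ {
		err = op()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errTransient) {
			return err // permanent failure: do not retry
		}
		if attempt < steps {
			time.Sleep(delay)
			delay = time.Duration(float64(delay) * factor)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", steps, err)
}

func main() {
	calls := 0
	// saveArtifact is a hypothetical upload that fails twice before succeeding.
	saveArtifact := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := retryWithBackoff(5, 10*time.Millisecond, 2.0, saveArtifact)
	fmt.Println(err, calls)
}
```

Because the loop wraps only the upload call, the step's main container is not re-run; the step fails only if every attempt is exhausted or a permanent error is returned.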
Pre-requisites
:latest
What happened/what you expected to happen?
Expecting a retry for a transient error.
I guess this is the code: https://github.com/argoproj/argo-workflows/blob/v3.4.2/workflow/artifacts/logging/driver.go#L49
This error only happens around once a month.
Version
v3.4.2
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Could be any workflow, as long as you have set up logs to be archived. I am using AWS IRSA to authenticate against S3.
Logs from the workflow controller
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}
Logs from your workflow's wait container