saving log artifact to s3 missing retries #9914

Closed
2 of 3 tasks
tooptoop4 opened this issue Oct 27, 2022 · 9 comments · Fixed by #12191
Labels
area/archive-logs (Archive Logs feature), area/artifacts (S3/GCP/OSS/Git/HDFS etc), type/feature (Feature request)

Comments

Contributor

tooptoop4 commented Oct 27, 2022

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

I expected saving the log artifact to S3 to be retried on a transient error. Instead, a single failed STS request fails the save:

time="2022-10-27T00:40:12.699Z" level=info msg="Save artifact" artifactName=main-logs duration=4.139891602s error="failed to create new S3 client: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.amazonaws.com/\": Service Unavailable" key=argo_logs/2022/10/27/00/36/aws-redact-workflow-20221027t003621171zsvhv2/aws-redact-workflow-20221027t003621171zsvhv2-in-d-272392656/main.log

I guess this is the relevant code: https://github.com/argoproj/argo-workflows/blob/v3.4.2/workflow/artifacts/logging/driver.go#L49

This error only happens about once a month.

Version

v3.4.2

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

Could be any workflow, as long as you have set up logs to be archived. I am using AWS IRSA to authenticate against S3.

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

time="2022-10-27T00:36:21.428Z" level=info msg="Processing workflow" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.470Z" level=info msg="Updated phase  -> Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.473Z" level=info msg="Retry node aws-redact-workflow-20221027t003621171zsvhv2 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.476Z" level=info msg="Steps node aws-redact-workflow-20221027t003621171zsvhv2-3756069412 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.477Z" level=info msg="StepGroup node aws-redact-workflow-20221027t003621171zsvhv2-3385936590 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.483Z" level=info msg="Retry node aws-redact-workflow-20221027t003621171zsvhv2-3939989835 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.485Z" level=info msg="Pod node aws-redact-workflow-20221027t003621171zsvhv2-4145714118 initialized Pending" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.592Z" level=info msg="Created pod: aws-redact-workflow-20221027t003621171zsvhv2(0)[0].parse-messages-split(0) (aws-redact-workflow-20221027t003621171zsvhv2-parse-messages-split-4145714118)" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.592Z" level=info msg="Workflow step group node aws-redact-workflow-20221027t003621171zsvhv2-3385936590 not yet completed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.592Z" level=info msg="TaskSet Reconciliation" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.592Z" level=info msg=reconcileAgentPod namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:21.667Z" level=info msg="Workflow update successful" namespace=auth phase=Running resourceVersion=118668703 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:31.430Z" level=info msg="Processing workflow" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:31.439Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=0 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:31.439Z" level=info msg="node changed" namespace=auth new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=aws-redact-workflow-20221027t003621171zsvhv2-4145714118 old.message= old.phase=Pending old.progress=0/1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:31.447Z" level=info msg="Workflow step group node aws-redact-workflow-20221027t003621171zsvhv2-3385936590 not yet completed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:31.448Z" level=info msg="TaskSet Reconciliation" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:31.450Z" level=info msg=reconcileAgentPod namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:31.910Z" level=info msg="Workflow update successful" namespace=auth phase=Running resourceVersion=118668836 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:44.642Z" level=info msg="Processing workflow" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:44.649Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=0 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:44.657Z" level=info msg="node changed" namespace=auth new.message= new.phase=Running new.progress=0/1 nodeID=aws-redact-workflow-20221027t003621171zsvhv2-4145714118 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:44.663Z" level=info msg="Workflow step group node aws-redact-workflow-20221027t003621171zsvhv2-3385936590 not yet completed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:44.663Z" level=info msg="TaskSet Reconciliation" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:44.663Z" level=info msg=reconcileAgentPod namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:44.723Z" level=info msg="Workflow update successful" namespace=auth phase=Running resourceVersion=118669021 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.725Z" level=info msg="Processing workflow" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.732Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.732Z" level=info msg="task-result changed" namespace=auth nodeID=aws-redact-workflow-20221027t003621171zsvhv2-4145714118 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.732Z" level=info msg="node changed" namespace=auth new.message= new.phase=Succeeded new.progress=0/1 nodeID=aws-redact-workflow-20221027t003621171zsvhv2-4145714118 old.message= old.phase=Running old.progress=0/1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.739Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3939989835 phase Running -> Succeeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.739Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3939989835 finished: 2022-10-27 00:36:54.739867786 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.740Z" level=info msg="Step group node aws-redact-workflow-20221027t003621171zsvhv2-3385936590 successful" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.740Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3385936590 phase Running -> Succeeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.740Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3385936590 finished: 2022-10-27 00:36:54.740412116 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.740Z" level=info msg="StepGroup node aws-redact-workflow-20221027t003621171zsvhv2-2244911403 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.740Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-3939989835 are [aws-redact-workflow-20221027t003621171zsvhv2-4145714118]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.741Z" level=info msg="Skipping aws-redact-workflow-20221027t003621171zsvhv2(0)[1].s: when ''r' == 'a'' evaluated false" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.741Z" level=info msg="Skipped node aws-redact-workflow-20221027t003621171zsvhv2-427105756 initialized Skipped (message: when ''r' == 'a'' evaluated false)" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.741Z" level=info msg="Step group node aws-redact-workflow-20221027t003621171zsvhv2-2244911403 successful" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.742Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-2244911403 phase Running -> Succeeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.742Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-2244911403 finished: 2022-10-27 00:36:54.742244045 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.742Z" level=info msg="StepGroup node aws-redact-workflow-20221027t003621171zsvhv2-2245455856 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.742Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-427105756 are [aws-redact-workflow-20221027t003621171zsvhv2-427105756]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.743Z" level=info msg="Skipping aws-redact-workflow-20221027t003621171zsvhv2(0)[2].a: when ''r' == 'a'' evaluated false" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.743Z" level=info msg="Skipped node aws-redact-workflow-20221027t003621171zsvhv2-4071976917 initialized Skipped (message: when ''r' == 'a'' evaluated false)" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.743Z" level=info msg="Step group node aws-redact-workflow-20221027t003621171zsvhv2-2245455856 successful" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.743Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-2245455856 phase Running -> Succeeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.743Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-2245455856 finished: 2022-10-27 00:36:54.743683108 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.743Z" level=info msg="StepGroup node aws-redact-workflow-20221027t003621171zsvhv2-3386083685 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.744Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-4071976917 are [aws-redact-workflow-20221027t003621171zsvhv2-4071976917]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.748Z" level=info msg="Retry node aws-redact-workflow-20221027t003621171zsvhv2-4052194153 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.751Z" level=info msg="Pod node aws-redact-workflow-20221027t003621171zsvhv2-272392656 initialized Pending" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.859Z" level=info msg="Created pod: aws-redact-workflow-20221027t003621171zsvhv2(0)[3].ir(0) (aws-redact-workflow-20221027t003621171zsvhv2-ir-272392656)" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.860Z" level=info msg="Workflow step group node aws-redact-workflow-20221027t003621171zsvhv2-3386083685 not yet completed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.861Z" level=info msg="TaskSet Reconciliation" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.862Z" level=info msg=reconcileAgentPod namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:36:54.943Z" level=info msg="Workflow update successful" namespace=auth phase=Running resourceVersion=118669118 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.888Z" level=info msg="Processing workflow" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.901Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.903Z" level=info msg="node unchanged" namespace=auth nodeID=aws-redact-workflow-20221027t003621171zsvhv2-4145714118 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.903Z" level=info msg="node changed" namespace=auth new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=aws-redact-workflow-20221027t003621171zsvhv2-272392656 old.message= old.phase=Pending old.progress=0/1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.906Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-3939989835 are [aws-redact-workflow-20221027t003621171zsvhv2-4145714118]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.907Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-427105756 are [aws-redact-workflow-20221027t003621171zsvhv2-427105756]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.909Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-4071976917 are [aws-redact-workflow-20221027t003621171zsvhv2-4071976917]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.914Z" level=info msg="Workflow step group node aws-redact-workflow-20221027t003621171zsvhv2-3386083685 not yet completed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.915Z" level=info msg="TaskSet Reconciliation" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.916Z" level=info msg=reconcileAgentPod namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:04.972Z" level=info msg="Workflow update successful" namespace=auth phase=Running resourceVersion=118669227 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.172Z" level=info msg="Processing workflow" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.182Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.183Z" level=info msg="node changed" namespace=auth new.message= new.phase=Running new.progress=0/1 nodeID=aws-redact-workflow-20221027t003621171zsvhv2-272392656 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.183Z" level=info msg="node unchanged" namespace=auth nodeID=aws-redact-workflow-20221027t003621171zsvhv2-4145714118 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.186Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-3939989835 are [aws-redact-workflow-20221027t003621171zsvhv2-4145714118]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.186Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-427105756 are [aws-redact-workflow-20221027t003621171zsvhv2-427105756]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.186Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-4071976917 are [aws-redact-workflow-20221027t003621171zsvhv2-4071976917]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.191Z" level=info msg="Workflow step group node aws-redact-workflow-20221027t003621171zsvhv2-3386083685 not yet completed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.191Z" level=info msg="TaskSet Reconciliation" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.191Z" level=info msg=reconcileAgentPod namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:37:18.238Z" level=info msg="Workflow update successful" namespace=auth phase=Running resourceVersion=118669337 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:18.981Z" level=info msg="Processing workflow" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:18.996Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:18.996Z" level=info msg="node unchanged" namespace=auth nodeID=aws-redact-workflow-20221027t003621171zsvhv2-4145714118 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:18.996Z" level=info msg="node changed" namespace=auth new.message="Error (exit code 1): failed to create new S3 client: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.amazonaws.com/\": Service Unavailable" new.phase=Error new.progress=0/1 nodeID=aws-redact-workflow-20221027t003621171zsvhv2-272392656 old.message= old.phase=Running old.progress=0/1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:18.999Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-3939989835 are [aws-redact-workflow-20221027t003621171zsvhv2-4145714118]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.000Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-427105756 are [aws-redact-workflow-20221027t003621171zsvhv2-427105756]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.000Z" level=info msg="SG Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-4071976917 are [aws-redact-workflow-20221027t003621171zsvhv2-4071976917]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.005Z" level=info msg="Max duration limit exceeded. Failing..." namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.005Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-4052194153 phase Running -> Error" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-4052194153 message: Max duration limit exceeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-4052194153 finished: 2022-10-27 00:40:19.006051237 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="Step group node aws-redact-workflow-20221027t003621171zsvhv2-3386083685 deemed failed: child 'aws-redact-workflow-20221027t003621171zsvhv2-4052194153' failed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3386083685 phase Running -> Failed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3386083685 message: child 'aws-redact-workflow-20221027t003621171zsvhv2-4052194153' failed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3386083685 finished: 2022-10-27 00:40:19.006331232 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="step group aws-redact-workflow-20221027t003621171zsvhv2-3386083685 was unsuccessful: child 'aws-redact-workflow-20221027t003621171zsvhv2-4052194153' failed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-4052194153 is [aws-redact-workflow-20221027t003621171zsvhv2-272392656]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-3756069412 is [aws-redact-workflow-20221027t003621171zsvhv2-272392656]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3756069412 phase Running -> Failed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3756069412 message: child 'aws-redact-workflow-20221027t003621171zsvhv2-4052194153' failed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3756069412 finished: 2022-10-27 00:40:19.006755909 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.006Z" level=info msg="Checking daemoned children of aws-redact-workflow-20221027t003621171zsvhv2-3756069412" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.009Z" level=info msg="Max duration limit exceeded. Failing..." namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.010Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2 phase Running -> Failed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.010Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2 message: Max duration limit exceeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.010Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2 finished: 2022-10-27 00:40:19.010203164 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.010Z" level=info msg="TaskSet Reconciliation" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.010Z" level=info msg=reconcileAgentPod namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.010Z" level=info msg="Running OnExit handler: exit-handler" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.012Z" level=info msg="Retry node aws-redact-workflow-20221027t003621171zsvhv2-1369574988 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.013Z" level=info msg="Steps node aws-redact-workflow-20221027t003621171zsvhv2-691715399 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.013Z" level=info msg="StepGroup node aws-redact-workflow-20221027t003621171zsvhv2-2059406947 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.016Z" level=info msg="Retry node aws-redact-workflow-20221027t003621171zsvhv2-3603194516 initialized Running" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.017Z" level=info msg="Pod node aws-redact-workflow-20221027t003621171zsvhv2-1912837311 initialized Pending" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.074Z" level=info msg="Created pod: aws-redact-workflow-20221027t003621171zsvhv2.onExit(0)[0].notifyError(0) (aws-redact-workflow-20221027t003621171zsvhv2-sendmail-1912837311)" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.074Z" level=info msg="Workflow step group node aws-redact-workflow-20221027t003621171zsvhv2-2059406947 not yet completed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:19.136Z" level=info msg="Workflow update successful" namespace=auth phase=Running resourceVersion=118670695 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.074Z" level=info msg="Processing workflow" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.094Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.095Z" level=info msg="node unchanged" namespace=auth nodeID=aws-redact-workflow-20221027t003621171zsvhv2-4145714118 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.095Z" level=info msg="node unchanged" namespace=auth nodeID=aws-redact-workflow-20221027t003621171zsvhv2-272392656 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.096Z" level=info msg="node changed" namespace=auth new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=aws-redact-workflow-20221027t003621171zsvhv2-1912837311 old.message= old.phase=Pending old.progress=0/1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.100Z" level=info msg="TaskSet Reconciliation" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.100Z" level=info msg=reconcileAgentPod namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.100Z" level=info msg="Running OnExit handler: exit-handler" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.110Z" level=info msg="Workflow step group node aws-redact-workflow-20221027t003621171zsvhv2-2059406947 not yet completed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:29.178Z" level=info msg="Workflow update successful" namespace=auth phase=Running resourceVersion=118670801 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.356Z" level=info msg="Processing workflow" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.377Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=2 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.377Z" level=info msg="task-result changed" namespace=auth nodeID=aws-redact-workflow-20221027t003621171zsvhv2-1912837311 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.377Z" level=info msg="node changed" namespace=auth new.message= new.phase=Succeeded new.progress=0/1 nodeID=aws-redact-workflow-20221027t003621171zsvhv2-1912837311 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.378Z" level=info msg="node unchanged" namespace=auth nodeID=aws-redact-workflow-20221027t003621171zsvhv2-4145714118 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.379Z" level=info msg="node unchanged" namespace=auth nodeID=aws-redact-workflow-20221027t003621171zsvhv2-272392656 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.382Z" level=info msg="TaskSet Reconciliation" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.382Z" level=info msg=reconcileAgentPod namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.382Z" level=info msg="Running OnExit handler: exit-handler" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.386Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3603194516 phase Running -> Succeeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.386Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-3603194516 finished: 2022-10-27 00:40:40.386551785 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.386Z" level=info msg="Step group node aws-redact-workflow-20221027t003621171zsvhv2-2059406947 successful" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.386Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-2059406947 phase Running -> Succeeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.386Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-2059406947 finished: 2022-10-27 00:40:40.386618116 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.386Z" level=info msg="Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-3603194516 is [aws-redact-workflow-20221027t003621171zsvhv2-1912837311]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.386Z" level=info msg="Outbound nodes of aws-redact-workflow-20221027t003621171zsvhv2-691715399 is [aws-redact-workflow-20221027t003621171zsvhv2-1912837311]" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.386Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-691715399 phase Running -> Succeeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.387Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-691715399 finished: 2022-10-27 00:40:40.387275468 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.387Z" level=info msg="Checking daemoned children of aws-redact-workflow-20221027t003621171zsvhv2-691715399" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.388Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-1369574988 phase Running -> Succeeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.388Z" level=info msg="node aws-redact-workflow-20221027t003621171zsvhv2-1369574988 finished: 2022-10-27 00:40:40.388975754 +0000 UTC" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.389Z" level=info msg="Updated phase Running -> Failed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.389Z" level=info msg="Updated message  -> Max duration limit exceeded" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.389Z" level=info msg="Marking workflow completed" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.389Z" level=info msg="Marking workflow as pending archiving" namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.389Z" level=info msg="Checking daemoned children of " namespace=auth workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.395Z" level=info msg="cleaning up pod" action=deletePod key=auth/aws-redact-workflow-20221027t003621171zsvhv2-1340600742-agent/deletePod
time="2022-10-27T00:40:40.463Z" level=info msg="Workflow update successful" namespace=auth phase=Failed resourceVersion=118670888 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.516Z" level=info msg="archiving workflow" namespace=auth uid=ebb1184a-4891-4f74-bd5a-50961124aef9 workflow=aws-redact-workflow-20221027t003621171zsvhv2
time="2022-10-27T00:40:40.647Z" level=info msg="Queueing Failed workflow auth/aws-redact-workflow-20221027t003621171zsvhv2 for delete in 4m0s due to TTL"
time="2022-10-27T00:40:45.514Z" level=info msg="cleaning up pod" action=deletePod key=auth/aws-redact-workflow-20221027t003621171zsvhv2-parse-messages-split-4145714118/deletePod
time="2022-10-27T00:40:45.514Z" level=info msg="cleaning up pod" action=deletePod key=auth/aws-redact-workflow-20221027t003621171zsvhv2-ir-272392656/deletePod
time="2022-10-27T00:40:45.514Z" level=info msg="cleaning up pod" action=deletePod key=auth/aws-redact-workflow-20221027t003621171zsvhv2-sendmail-1912837311/deletePod
time="2022-10-27T00:44:41.003Z" level=info msg="Deleting garbage collected workflow 'auth/aws-redact-workflow-20221027t003621171zsvhv2'"
time="2022-10-27T00:44:41.067Z" level=info msg="Successfully deleted 'auth/aws-redact-workflow-20221027t003621171zsvhv2'"

Logs from your workflow's wait container

{"log":"time=\"2022-10-27T00:37:10.975Z\" level=info msg=\"Starting Workflow Executor\" version=v3.4.2\n","stream":"stderr","time":"2022-10-27T00:37:10.996347186Z"}
{"log":"time=\"2022-10-27T00:37:11.002Z\" level=info msg=\"Using executor retry strategy\" Duration=1s Factor=1.6 Jitter=0.5 Steps=5\n","stream":"stderr","time":"2022-10-27T00:37:11.028449349Z"}
{"log":"time=\"2022-10-27T00:37:11.009Z\" level=info msg=\"Starting deadline monitor\"\n","stream":"stderr","time":"2022-10-27T00:37:11.029791108Z"}
{"log":"time=\"2022-10-27T00:40:08.559Z\" level=info msg=\"Main container completed\" error=\"\u003cnil\u003e\"\n","stream":"stderr","time":"2022-10-27T00:40:08.559311871Z"}
{"log":"time=\"2022-10-27T00:40:08.559Z\" level=info msg=\"No Script output reference in workflow. Capturing script output ignored\"\n","stream":"stderr","time":"2022-10-27T00:40:08.559343671Z"}
{"log":"time=\"2022-10-27T00:40:08.559Z\" level=info msg=\"No output parameters\"\n","stream":"stderr","time":"2022-10-27T00:40:08.559379913Z"}
{"log":"time=\"2022-10-27T00:40:08.559Z\" level=info msg=\"No output artifacts\"\n","stream":"stderr","time":"2022-10-27T00:40:08.559387913Z"}
{"log":"time=\"2022-10-27T00:40:08.559Z\" level=info msg=\"S3 Save path: /tmp/argo/outputs/logs/main.log, key: argo_logs/2022/10/27/00/36/aws-redact-workflow-20221027t003621171zsvhv2/aws-redact-workflow-20221027t003621171zsvhv2-in-d-272392656/main.log\"\n","stream":"stderr","time":"2022-10-27T00:40:08.559791791Z"}
{"log":"time=\"2022-10-27T00:40:08.559Z\" level=info msg=\"Creating minio client using AWS SDK credentials\"\n","stream":"stderr","time":"2022-10-27T00:40:08.559806941Z"}
{"log":"time=\"2022-10-27T00:40:12.699Z\" level=warning msg=\"Non-transient error: WebIdentityErr: failed to retrieve credentials\\ncaused by: RequestError: send request failed\\ncaused by: Post \\\"https://sts.amazonaws.com/\\\": Service Unavailable\"\n","stream":"stderr","time":"2022-10-27T00:40:12.699772436Z"}
{"log":"time=\"2022-10-27T00:40:12.699Z\" level=info msg=\"Save artifact\" artifactName=main-logs duration=4.139891602s error=\"failed to create new S3 client: WebIdentityErr: failed to retrieve credentials\\ncaused by: RequestError: send request failed\\ncaused by: Post \\\"https://sts.amazonaws.com/\\\": Service Unavailable\" key=argo_logs/2022/10/27/00/36/aws-redact-workflow-20221027t003621171zsvhv2/aws-redact-workflow-20221027t003621171zsvhv2-in-d-272392656/main.log\n","stream":"stderr","time":"2022-10-27T00:40:12.699809087Z"}
{"log":"time=\"2022-10-27T00:40:12.699Z\" level=error msg=\"executor error: failed to create new S3 client: WebIdentityErr: failed to retrieve credentials\\ncaused by: RequestError: send request failed\\ncaused by: Post \\\"https://sts.amazonaws.com/\\\": Service Unavailable\"\n","stream":"stderr","time":"2022-10-27T00:40:12.699818097Z"}
{"log":"time=\"2022-10-27T00:40:12.699Z\" level=info msg=\"stopping progress monitor (context done)\" error=\"context canceled\"\n","stream":"stderr","time":"2022-10-27T00:40:12.699836267Z"}
{"log":"time=\"2022-10-27T00:40:12.699Z\" level=info msg=\"Deadline monitor stopped\"\n","stream":"stderr","time":"2022-10-27T00:40:12.699842207Z"}
{"log":"time=\"2022-10-27T00:40:12.704Z\" level=info msg=\"Alloc=6653 TotalAlloc=13077 Sys=18386 NumGC=4 Goroutines=4\"\n","stream":"stderr","time":"2022-10-27T00:40:12.704504406Z"}
{"log":"time=\"2022-10-27T00:40:12.704Z\" level=fatal msg=\"failed to create new S3 client: WebIdentityErr: failed to retrieve credentials\\ncaused by: RequestError: send request failed\\ncaused by: Post \\\"https://sts.amazonaws.com/\\\": Service Unavailable\"\n","stream":"stderr","time":"2022-10-27T00:40:12.704951586Z"}

@awwwd
Contributor

awwwd commented Oct 27, 2022

Is it because the underlying code that creates the S3 client doesn't set a MaxRetries config?

https://github.com/argoproj/pkg/blob/513d2b4d4df37502f7e05732f381c60ffd075b6f/s3/s3.go#L93

@sarabala1979 sarabala1979 added type/feature Feature request and removed type/bug labels Oct 31, 2022
@tooptoop4
Contributor Author

tooptoop4 commented Mar 23, 2023

another similar error

Error (exit code 1): failed to create new S3 client: WebIdentityErr: failed to retrieve credentials caused by: : status code: 408, request id:

maybe https://github.com/argoproj/argo-workflows/blob/master/workflow/artifacts/s3/errors.go#L11-L21 needs an update @danajp ?
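A minimal sketch of the kind of update suggested above, assuming a substring-based transient check. The names `looksTransient`, `looksTransientMessage`, and `transientPatterns` are hypothetical and do not reproduce the contents of `errors.go`; the patterns are taken only from the error messages quoted in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// transientPatterns is a hypothetical list, built only from the error
// messages quoted in this issue. It is not the list used by Argo Workflows.
var transientPatterns = []string{
	"Service Unavailable", // STS response from the original report
	"status code: 408",    // request timeout from the follow-up report
}

// looksTransientMessage reports whether msg contains any known pattern.
func looksTransientMessage(msg string) bool {
	for _, p := range transientPatterns {
		if strings.Contains(msg, p) {
			return true
		}
	}
	return false
}

// looksTransient applies the same check to an error value; nil is never transient.
func looksTransient(err error) bool {
	if err == nil {
		return false
	}
	return looksTransientMessage(err.Error())
}

func main() {
	timeout := errors.New("failed to create new S3 client: WebIdentityErr: failed to retrieve credentials caused by: : status code: 408, request id:")
	fmt.Println(looksTransient(timeout))
	fmt.Println(looksTransient(errors.New("access denied")))
}
```

Matching on message text is brittle; it is used here only because the thread quotes messages rather than typed errors.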

@kromanow94

This also happened to me, but with a slightly different error:

Error (exit code 1): failed to put file: Remote backend is unreachable (Put \"https://<my-bucket-name>.s3.dualstack.eu-central-1.amazonaws.com/artifacts/.../main.log\": Connection closed by foreign host https://<my-bucket-name>.s3.dualstack.eu-central-1.amazonaws.com/artifacts/.../main.log. Retry again.)

It would be great to be able to retry just the saving of an artifact. As far as I understand, if this is treated as a transient error then the whole Step would be retried, right? I believe there would be value in adding that feature.

What's the status here? Is anybody looking into it? I can help and contribute the fix myself.

@tooptoop4
Contributor Author

your PR will be appreciated! @kromanow94

@kromanow94

@tooptoop4 sure, I should be able to start working on that in about two weeks :).

@agilgur5 agilgur5 added the area/artifacts S3/GCP/OSS/Git/HDFS etc label Oct 27, 2023
tachylatus added a commit to tachylatus/argo-workflows that referenced this issue Nov 13, 2023
Signed-off-by: Helge Willum Thingvad <1019305+tachylatus@users.noreply.github.com>
tachylatus added a commit to tachylatus/argo-workflows that referenced this issue Nov 13, 2023
Signed-off-by: Helge Willum Thingvad <1019305+tachylatus@users.noreply.github.com>
terrytangyuan pushed a commit that referenced this issue Nov 13, 2023
Signed-off-by: Helge Willum Thingvad <1019305+tachylatus@users.noreply.github.com>
@kromanow94

If I'm not mistaken, #12191 doesn't fix this issue, because this issue is about retrying just the routine for saving artifacts. The changes in the PR seem to retry the whole step if there was an error when saving the artifact.

terrytangyuan pushed a commit that referenced this issue Nov 27, 2023
Signed-off-by: Helge Willum Thingvad <1019305+tachylatus@users.noreply.github.com>
@agilgur5
Member

@kromanow94 the code in #12191 is used inside of an internal back-off loop in the Executor. So that is in fact only retrying the artifact request and not the whole step.
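The back-off loop described above can be sketched roughly as follows. This is an illustration, not the code from #12191: `Backoff`, `retryWithBackoff`, and `failNTimes` are hypothetical names, and the field names are borrowed from the "Using executor retry strategy" line in the wait container log (Duration=1s Factor=1.6 Jitter=0.5 Steps=5).

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// Backoff is a hypothetical type whose fields mirror the values printed
// in the wait container log.
type Backoff struct {
	Duration time.Duration // initial delay between attempts
	Factor   float64       // multiplier applied to the delay after each attempt
	Jitter   float64       // extra random delay, as a fraction of the current delay
	Steps    int           // maximum number of attempts
}

// retryWithBackoff runs op until it succeeds, fails with an error that
// isTransient rejects, or b.Steps attempts have been made. It returns the
// number of attempts and the last error. Only op is re-run, nothing else.
func retryWithBackoff(b Backoff, isTransient func(error) bool, op func() error) (int, error) {
	attempts := 0
	delay := b.Duration
	var err error
	for attempts < b.Steps {
		attempts++
		err = op()
		if err == nil || !isTransient(err) {
			return attempts, err
		}
		if attempts < b.Steps {
			// Wait for the current delay plus random jitter, then grow the delay.
			jitter := time.Duration(rand.Float64() * b.Jitter * float64(delay))
			time.Sleep(delay + jitter)
			delay = time.Duration(float64(delay) * b.Factor)
		}
	}
	return attempts, err
}

// failNTimes returns an operation that fails n times with msg, then succeeds.
// It stands in for an artifact save that hits a temporary outage.
func failNTimes(n int, msg string) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New(msg)
		}
		return nil
	}
}

func main() {
	// Millisecond delays keep the demo quick; the log above shows 1s.
	b := Backoff{Duration: time.Millisecond, Factor: 1.6, Jitter: 0.5, Steps: 5}
	always := func(error) bool { return true } // assumption: treat every error as transient
	attempts, err := retryWithBackoff(b, always, failNTimes(2, "Service Unavailable"))
	fmt.Println(attempts, err)
}
```

Because only `op` is re-run, a transient failure while saving or loading an artifact does not restart the whole step.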

@kromanow94

@agilgur5 ah you're right! I see it's used around Save and Load. Thanks for the explanation :)

@agilgur5 agilgur5 added the area/archive-logs Archive Logs feature label Feb 18, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue Mar 12, 2024
Signed-off-by: Helge Willum Thingvad <1019305+tachylatus@users.noreply.github.com>
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue May 6, 2024
Signed-off-by: Helge Willum Thingvad <1019305+tachylatus@users.noreply.github.com>