Workflow Hangs Indefinitely with 'mutex' and 'withSequence' in 'http' Template #12018

Closed
2 of 3 tasks
nikashamova opened this issue Oct 16, 2023 · 0 comments · Fixed by #12176
Labels
area/looping `withParams`, `withItems`, and `withSequence` area/mutex-semaphore area/templates/http P3 Low priority type/bug

Comments

nikashamova commented Oct 16, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

When a step that uses withSequence invokes an http template guarded by a mutex, the workflow never reaches a completed state and hangs indefinitely.
The mutex was intended to make the http template steps run sequentially rather than in parallel.

Version

v3.4.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: http-wf-test-mutex
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        # withSequence expands this single step into two iterations
        - - name: call-endpoint
            template: call-endpoint
            withSequence:
              count: "2"

    - name: call-endpoint
      # Both iterations compete for the same mutex
      synchronization:
        mutex:
          name: test
      http:
        url: "https://api.github.com/users/hadley/repos"
        successCondition: "response.statusCode == 200"

Logs from the workflow controller

time="2023-10-16T15:07:32.721Z" level=info msg="Processing workflow" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.727Z" level=info msg="Updated phase  -> Running" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.727Z" level=info msg="Steps node http-wf-test-mutex initialized Running" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.727Z" level=info msg="StepGroup node http-wf-test-mutex-2049211380 initialized Running" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.727Z" level=info msg="argo-events/Mutex/test acquired by argo-events/http-wf-test-mutex/http-wf-test-mutex-3714441117. Lock availability: 0/1" mutex=argo-events/Mutex/test
time="2023-10-16T15:07:32.727Z" level=info msg="Node http-wf-test-mutex[0].call-endpoint(0:0) acquired synchronization lock" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.727Z" level=info msg="HTTP node http-wf-test-mutex-3714441117 initialized Pending" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.727Z" level=info msg=" node http-wf-test-mutex-1388791593 initialized Pending" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.727Z" level=info msg="Workflow step group node http-wf-test-mutex-2049211380 not yet completed" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.727Z" level=info msg="TaskSet Reconciliation" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.727Z" level=info msg="Creating TaskSet" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.746Z" level=info msg=reconcileAgentPod namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.844Z" level=info msg="Created Agent pod" namespace=argo-events podName=http-wf-test-mutex-1340600742-agent workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.844Z" level=info msg=updateAgentPodStatus namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:32.844Z" level=info msg=assessAgentPodStatus namespace=argo-events podName=http-wf-test-mutex-1340600742-agent
time="2023-10-16T15:07:32.870Z" level=info msg="Workflow update successful" namespace=argo-events phase=Running resourceVersion=523805700 workflow=http-wf-test-mutex
time="2023-10-16T15:07:42.721Z" level=info msg="Processing workflow" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:42.722Z" level=info msg="Task-result reconciliation" namespace=argo-events numObjs=0 workflow=http-wf-test-mutex
time="2023-10-16T15:07:42.722Z" level=info msg=updateAgentPodStatus namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:42.722Z" level=info msg=assessAgentPodStatus namespace=argo-events podName=http-wf-test-mutex-1340600742-agent
time="2023-10-16T15:07:42.722Z" level=info msg="Node http-wf-test-mutex[0].call-endpoint(0:0) acquired synchronization lock" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:42.722Z" level=info msg="Workflow step group node http-wf-test-mutex-2049211380 not yet completed" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:42.722Z" level=info msg="TaskSet Reconciliation" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:42.722Z" level=info msg=reconcileAgentPod namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:42.746Z" level=info msg="Workflow update successful" namespace=argo-events phase=Running resourceVersion=523805811 workflow=http-wf-test-mutex
time="2023-10-16T15:07:52.747Z" level=info msg="Processing workflow" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:52.748Z" level=info msg="Task-result reconciliation" namespace=argo-events numObjs=0 workflow=http-wf-test-mutex
time="2023-10-16T15:07:52.748Z" level=info msg=updateAgentPodStatus namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:52.748Z" level=info msg=assessAgentPodStatus namespace=argo-events podName=http-wf-test-mutex-1340600742-agent
time="2023-10-16T15:07:52.748Z" level=info msg="Node http-wf-test-mutex[0].call-endpoint(0:0) acquired synchronization lock" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:52.748Z" level=info msg="Workflow step group node http-wf-test-mutex-2049211380 not yet completed" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:52.748Z" level=info msg="TaskSet Reconciliation" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:52.748Z" level=info msg=reconcileAgentPod namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:07:52.848Z" level=info msg="Workflow update successful" namespace=argo-events phase=Running resourceVersion=523805955 workflow=http-wf-test-mutex
time="2023-10-16T15:08:02.953Z" level=info msg="Processing workflow" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:02.957Z" level=info msg="Task-result reconciliation" namespace=argo-events numObjs=0 workflow=http-wf-test-mutex
time="2023-10-16T15:08:02.957Z" level=info msg=updateAgentPodStatus namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:02.957Z" level=info msg=assessAgentPodStatus namespace=argo-events podName=http-wf-test-mutex-1340600742-agent
time="2023-10-16T15:08:02.957Z" level=info msg="Lock has been released by argo-events/http-wf-test-mutex/http-wf-test-mutex-3714441117. Available locks: 1" mutex=argo-events/Mutex/test
time="2023-10-16T15:08:02.957Z" level=info msg="argo-events/Mutex/test acquired by argo-events/http-wf-test-mutex/http-wf-test-mutex-1388791593. Lock availability: 0/1" mutex=argo-events/Mutex/test
time="2023-10-16T15:08:02.957Z" level=info msg="Node http-wf-test-mutex[0].call-endpoint(1:1) acquired synchronization lock" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:02.957Z" level=info msg="Workflow step group node http-wf-test-mutex-2049211380 not yet completed" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:02.957Z" level=info msg="TaskSet Reconciliation" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:02.957Z" level=info msg=reconcileAgentPod namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:03.072Z" level=info msg="Workflow update successful" namespace=argo-events phase=Running resourceVersion=523806058 workflow=http-wf-test-mutex
time="2023-10-16T15:08:03.959Z" level=info msg="Processing workflow" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:03.962Z" level=info msg="Task-result reconciliation" namespace=argo-events numObjs=0 workflow=http-wf-test-mutex
time="2023-10-16T15:08:03.962Z" level=info msg=updateAgentPodStatus namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:03.962Z" level=info msg=assessAgentPodStatus namespace=argo-events podName=http-wf-test-mutex-1340600742-agent
time="2023-10-16T15:08:03.962Z" level=info msg="Node http-wf-test-mutex[0].call-endpoint(1:1) acquired synchronization lock" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:03.962Z" level=info msg="Workflow step group node http-wf-test-mutex-2049211380 not yet completed" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:03.962Z" level=info msg="TaskSet Reconciliation" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:03.962Z" level=info msg=reconcileAgentPod namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:08:04.059Z" level=info msg="Workflow update successful" namespace=argo-events phase=Running resourceVersion=523806065 workflow=http-wf-test-mutex
time="2023-10-16T15:11:21.635Z" level=info msg="Processing workflow" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:11:21.638Z" level=info msg="Task-result reconciliation" namespace=argo-events numObjs=0 workflow=http-wf-test-mutex
time="2023-10-16T15:11:21.638Z" level=info msg=updateAgentPodStatus namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:11:21.638Z" level=info msg=assessAgentPodStatus namespace=argo-events podName=http-wf-test-mutex-1340600742-agent
time="2023-10-16T15:11:21.638Z" level=info msg="Node http-wf-test-mutex[0].call-endpoint(1:1) acquired synchronization lock" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:11:21.638Z" level=info msg="Workflow step group node http-wf-test-mutex-2049211380 not yet completed" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:11:21.638Z" level=info msg="TaskSet Reconciliation" namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:11:21.638Z" level=info msg=reconcileAgentPod namespace=argo-events workflow=http-wf-test-mutex
time="2023-10-16T15:11:21.753Z" level=info msg="Workflow update successful" namespace=argo-events phase=Running resourceVersion=523808412 workflow=http-wf-test-mutex
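
The lock activity in the log above can be traced mechanically. The following is a minimal sketch, not part of the original report: the helper name `unreleased_holders` and the two regular expressions are assumptions derived only from the message wording in this excerpt, so other controller versions may log different text. It lists every node that logged an acquisition of a lock without a matching release.

```python
import re

# Patterns inferred from the controller messages quoted in this issue
# (an assumption; the wording may differ between Argo Workflows versions).
ACQUIRED = re.compile(
    r'msg="(?P<lock>\S+) acquired by (?P<holder>\S+)\. Lock availability'
)
RELEASED = re.compile(
    r'msg="Lock has been released by (?P<holder>\S+)\. Available locks'
)


def unreleased_holders(log_lines):
    """Return {holder: lock} for holders that acquired a lock and never released it."""
    held = {}
    for line in log_lines:
        match = ACQUIRED.search(line)
        if match:
            held[match.group("holder")] = match.group("lock")
            continue
        match = RELEASED.search(line)
        if match:
            # Drop the holder once its release is logged.
            held.pop(match.group("holder"), None)
    return held


# The three mutex-related lines copied from the log above.
SAMPLE = [
    'time="2023-10-16T15:07:32.727Z" level=info msg="argo-events/Mutex/test acquired by argo-events/http-wf-test-mutex/http-wf-test-mutex-3714441117. Lock availability: 0/1" mutex=argo-events/Mutex/test',
    'time="2023-10-16T15:08:02.957Z" level=info msg="Lock has been released by argo-events/http-wf-test-mutex/http-wf-test-mutex-3714441117. Available locks: 1" mutex=argo-events/Mutex/test',
    'time="2023-10-16T15:08:02.957Z" level=info msg="argo-events/Mutex/test acquired by argo-events/http-wf-test-mutex/http-wf-test-mutex-1388791593. Lock availability: 0/1" mutex=argo-events/Mutex/test',
]

if __name__ == "__main__":
    for holder, lock in unreleased_holders(SAMPLE).items():
        print(f"{holder} still holds {lock}")
```

Run against the excerpt, the sketch reports only node http-wf-test-mutex-1388791593, the second iteration, as still holding the mutex. That matches the repeated "acquired synchronization lock" messages for call-endpoint(1:1) with no later release, but it only narrows down where the workflow stalls; it does not by itself identify the root cause.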

Logs from your workflow's wait container

error: container wait is not valid for pod http-wf-test-mutex-1340600742-agent

agilgur5 added area/mutex-semaphore area/templates/http area/looping `withParams`, `withItems`, and `withSequence` P3 Low priority and removed area/templates/http labels Oct 16, 2023
shmruin added a commit to shmruin/argo-workflows that referenced this issue Nov 10, 2023
sarabala1979 pushed a commit that referenced this issue Jan 16, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue Feb 27, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue Feb 28, 2024
…oproj#12018 (argoproj#12176)

Signed-off-by: shmruin <meme_hm@naver.com>
Signed-off-by: Isitha Subasinghe <isubasinghe@student.unimelb.edu.au>
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue May 6, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue May 7, 2024
agilgur5 added this to the v3.4.x patches milestone May 22, 2024