
Multi-step workflow does not terminate (wait container does not exit with Docker executor in v3.0) #6064

Closed
ongchinkai opened this issue Jun 2, 2021 · 22 comments · Fixed by #6083

@ongchinkai

Summary

A multi-step workflow does not terminate or proceed to the next step, even after the step's pod has terminated.

The issue started at https://github.com/hyfen-nl/PIVT/issues/106, where the developer identified this as a potential Argo issue. The logs below are based on the example at https://argoproj.github.io/argo-workflows/examples/#steps.

Diagnostics

What Kubernetes provider are you using? Digital Ocean

What version of Argo Workflows are you running? v3.0.7

Paste a workflow that reproduces the bug, including status:
kubectl get wf -o yaml ${workflow} 

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  creationTimestamp: "2021-06-02T07:19:32Z"
  generateName: steps-
  generation: 3
  labels:
    workflows.argoproj.io/phase: Running
  name: steps-5xt4d
  namespace: default
  resourceVersion: "3619084"
  uid: bc955ea0-7a0c-4a36-8a37-d3fb10e27615
spec:
  arguments: {}
  entrypoint: hello-hello-hello
  templates:
  - inputs: {}
    metadata: {}
    name: hello-hello-hello
    outputs: {}
    steps:
    - - arguments:
          parameters:
          - name: message
            value: hello1
        name: hello1
        template: whalesay
    - - arguments:
          parameters:
          - name: message
            value: hello2a
        name: hello2a
        template: whalesay
      - arguments:
          parameters:
          - name: message
            value: hello2b
        name: hello2b
        template: whalesay
  - container:
      args:
      - '{{inputs.parameters.message}}'
      command:
      - cowsay
      image: docker/whalesay
      name: ""
      resources: {}
    inputs:
      parameters:
      - name: message
    metadata: {}
    name: whalesay
    outputs: {}
status:
  artifactRepositoryRef:
    default: true
  conditions:
  - status: "True"
    type: PodRunning
  finishedAt: null
  nodes:
    steps-5xt4d:
      children:
      - steps-5xt4d-3743377224
      displayName: steps-5xt4d
      finishedAt: null
      id: steps-5xt4d
      name: steps-5xt4d
      phase: Running
      progress: 0/1
      startedAt: "2021-06-02T07:19:32Z"
      templateName: hello-hello-hello
      templateScope: local/steps-5xt4d
      type: Steps
    steps-5xt4d-293443185:
      boundaryID: steps-5xt4d
      displayName: hello1
      finishedAt: null
      hostNodeName: hlf-pool1-8rnem
      id: steps-5xt4d-293443185
      inputs:
        parameters:
        - name: message
          value: hello1
      name: steps-5xt4d[0].hello1
      phase: Running
      progress: 0/1
      startedAt: "2021-06-02T07:19:32Z"
      templateName: whalesay
      templateScope: local/steps-5xt4d
      type: Pod
    steps-5xt4d-3743377224:
      boundaryID: steps-5xt4d
      children:
      - steps-5xt4d-293443185
      displayName: '[0]'
      finishedAt: null
      id: steps-5xt4d-3743377224
      name: steps-5xt4d[0]
      phase: Running
      progress: 0/1
      startedAt: "2021-06-02T07:19:32Z"
      templateScope: local/steps-5xt4d
      type: StepGroup
  phase: Running
  progress: 0/1
  startedAt: "2021-06-02T07:19:32Z"
Paste the logs from the workflow controller:
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

time="2021-06-02T07:19:32.754Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.769Z" level=info msg="Updated phase  -> Running" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.780Z" level=info msg="Steps node steps-5xt4d initialized Running" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.786Z" level=info msg="StepGroup node steps-5xt4d-3743377224 initialized Running" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.790Z" level=info msg="Pod node steps-5xt4d-293443185 initialized Pending" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.808Z" level=info msg="Created pod: steps-5xt4d[0].hello1 (steps-5xt4d-293443185)" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.808Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:32.852Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=3619053 workflow=steps-5xt4d
time="2021-06-02T07:19:42.869Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:42.874Z" level=info msg="Updating node steps-5xt4d-293443185 status Pending -> Running" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:42.882Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:42.930Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=3619084 workflow=steps-5xt4d
time="2021-06-02T07:19:52.913Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:19:52.916Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:39:52.914Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:39:52.915Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:59:52.914Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T07:59:52.915Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:19:52.919Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:19:52.919Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:39:52.919Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:39:52.920Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:59:52.920Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T08:59:52.921Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
time="2021-06-02T09:19:52.920Z" level=info msg="Processing workflow" namespace=default workflow=steps-5xt4d
time="2021-06-02T09:19:52.921Z" level=info msg="Workflow step group node steps-5xt4d-3743377224 not yet completed" namespace=default workflow=steps-5xt4d
Paste the logs from your workflow's wait container:
kubectl logs -c wait -l workflows.argoproj.io/workflow=${workflow}

time="2021-06-02T09:24:46.518Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:47.681Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:48.847Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:50.027Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:51.191Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:52.360Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:53.565Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:54.767Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:55.891Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"
time="2021-06-02T09:24:56.933Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-5xt4d-293443185"

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

alexec commented Jun 2, 2021

I suspect this is a setup issue. Your easiest solution would be to try the pjs executor and see if that works. If that works, it points to a bug rather than a setup issue.

alexec added this to the v3.0 milestone Jun 2, 2021
@ongchinkai

I suspect this is a setup issue. Your easiest solution would be to try the pjs executor and see if that works. If that works, it points to a bug rather than a setup issue.

I checked https://github.com/argoproj/argo-workflows/blob/master/docs/workflow-executors.md but I don't see this pjs executor that you mentioned. Do you mean pns executor, by any chance?

Also, how should I go about configuring this executor? Even the hello-world workflow example above failed to terminate cleanly.

alexec commented Jun 2, 2021

Sorry. Autocorrect. PNS
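
For reference, the executor is normally selected via the containerRuntimeExecutor key in workflow-controller-configmap. A minimal sketch of switching to PNS, assuming the default ConfigMap name and the argo namespace used by the install manifests:

kubectl -n argo patch configmap workflow-controller-configmap \
  --type merge -p '{"data":{"containerRuntimeExecutor":"pns"}}'
# restart the controller so the new setting is picked up
kubectl -n argo rollout restart deployment workflow-controller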

alexec commented Jun 3, 2021

@ongchinkai I've created a dev build for you to test argoproj/argoexec:dev-signal. This includes a potential fix for the Docker executor.

@ongchinkai

Sorry if this is a stupid question, but how do I go about testing this on DigitalOcean? I tried kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo-workflows/dev-signal/manifests/install.yaml but it seems like nothing changed.

alexec commented Jun 3, 2021

Just edit wherever you have configured the executor. You might just be able to search and replace in your manifests.
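
For example, something along these lines should do it (a sketch, assuming the manifests reference argoproj/argoexec:latest, as the default install.yaml does):

curl -sL https://raw.githubusercontent.com/argoproj/argo-workflows/dev-signal/manifests/install.yaml \
  | sed 's|argoproj/argoexec:latest|argoproj/argoexec:dev-signal|g' \
  | kubectl apply -n argo -f -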

@ongchinkai

I downloaded the contents of https://raw.githubusercontent.com/argoproj/argo-workflows/dev-signal/manifests/install.yaml, and changed argoexec:latest to argoexec:dev-signal. Then I ran kubectl apply -n argo -f devsignal.yaml.

This is the describe output for workflow-controller:

[isprintsg@vmmock3 fabric-kube]$ kubectl describe pod -n argo workflow-controller-858c8985dc-7q69j
Name:         workflow-controller-858c8985dc-7q69j
Namespace:    argo
Priority:     0
Node:         hlf-pool1-8rnem/10.104.0.8
Start Time:   Thu, 03 Jun 2021 15:20:11 +0800
Labels:       app=workflow-controller
              pod-template-hash=858c8985dc
Annotations:  <none>
Status:       Running
IP:           10.244.0.142
IPs:
  IP:           10.244.0.142
Controlled By:  ReplicaSet/workflow-controller-858c8985dc
Containers:
  workflow-controller:
    Container ID:  containerd://448fd6de592f4b0bb6ab43e1f6620b807bd7e5f66f6d73a337aa76e9fdc52e5f
    Image:         docker.io/argoproj/workflow-controller:latest
    Image ID:      docker.io/argoproj/workflow-controller@sha256:0459525ffc0354c35d68b6e548dfaa2778ede667459a967df0025fad52e04dca
    Ports:         9090/TCP, 6060/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      workflow-controller
    Args:
      --configmap
      workflow-controller-configmap
      --executor-image
      argoproj/argoexec:dev-signal
    State:          Running
      Started:      Thu, 03 Jun 2021 15:20:13 +0800
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:6060/healthz delay=90s timeout=30s period=60s #success=1 #failure=3
    Environment:
      LEADER_ELECTION_IDENTITY:  workflow-controller-858c8985dc-7q69j (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from argo-token-hflpb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  argo-token-hflpb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  argo-token-hflpb
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  12m   default-scheduler  Successfully assigned argo/workflow-controller-858c8985dc-7q69j to hlf-pool1-8rnem
  Normal  Pulled     12m   kubelet            Container image "docker.io/argoproj/workflow-controller:latest" already present on machine
  Normal  Created    12m   kubelet            Created container workflow-controller
  Normal  Started    12m   kubelet            Started container workflow-controller

Following this, I tried running the hello-world example again, but the workflow still does not terminate even after the pod has completed execution.

[isprintsg@vmmock3 fabric-kube]$ kubectl describe wf steps-g9fj6
Name:         steps-g9fj6
Namespace:    default
Labels:       workflows.argoproj.io/phase=Running
Annotations:  <none>
API Version:  argoproj.io/v1alpha1
Kind:         Workflow
Metadata:
  Creation Timestamp:  2021-06-03T07:25:20Z
  Generate Name:       steps-
  Generation:          4
  Managed Fields:
    API Version:  argoproj.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName:
      f:spec:
        .:
        f:arguments:
        f:entrypoint:
        f:templates:
      f:status:
        .:
        f:finishedAt:
    Manager:      argo
    Operation:    Update
    Time:         2021-06-03T07:25:20Z
    API Version:  argoproj.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:workflows.argoproj.io/phase:
      f:status:
        f:artifactRepositoryRef:
        f:conditions:
        f:nodes:
        f:phase:
        f:progress:
        f:startedAt:
    Manager:         workflow-controller
    Operation:       Update
    Time:            2021-06-03T07:25:30Z
  Resource Version:  3763495
  UID:               873a3b85-8a25-4336-bbf8-75eb620f7f43
Spec:
  Arguments:
  Entrypoint:  hello-hello-hello
  Templates:
    Inputs:
    Metadata:
    Name:  hello-hello-hello
    Outputs:
    Steps:
      [map[arguments:map[parameters:[map[name:message value:hello1]]] name:hello1 template:whalesay]]
      [map[arguments:map[parameters:[map[name:message value:hello2a]]] name:hello2a template:whalesay] map[arguments:map[parameters:[map[name:message value:hello2b]]] name:hello2b template:whalesay]]
    Container:
      Args:
        {{inputs.parameters.message}}
      Command:
        cowsay
      Image:  docker/whalesay
      Name:
      Resources:
    Inputs:
      Parameters:
        Name:  message
    Metadata:
    Name:  whalesay
    Outputs:
Status:
  Artifact Repository Ref:
    Default:  true
  Conditions:
    Status:     True
    Type:       PodRunning
  Finished At:  <nil>
  Nodes:
    steps-g9fj6:
      Children:
        steps-g9fj6-2155100515
      Display Name:    steps-g9fj6
      Finished At:     <nil>
      Id:              steps-g9fj6
      Name:            steps-g9fj6
      Phase:           Running
      Progress:        0/1
      Started At:      2021-06-03T07:25:20Z
      Template Name:   hello-hello-hello
      Template Scope:  local/steps-g9fj6
      Type:            Steps
    steps-g9fj6-2155100515:
      Boundary ID:  steps-g9fj6
      Children:
        steps-g9fj6-3273036340
      Display Name:    [0]
      Finished At:     <nil>
      Id:              steps-g9fj6-2155100515
      Name:            steps-g9fj6[0]
      Phase:           Running
      Progress:        0/1
      Started At:      2021-06-03T07:25:20Z
      Template Scope:  local/steps-g9fj6
      Type:            StepGroup
    steps-g9fj6-3273036340:
      Boundary ID:     steps-g9fj6
      Display Name:    hello1
      Finished At:     <nil>
      Host Node Name:  hlf-pool1-8rnem
      Id:              steps-g9fj6-3273036340
      Inputs:
        Parameters:
          Name:        message
          Value:       hello1
      Name:            steps-g9fj6[0].hello1
      Phase:           Running
      Progress:        0/1
      Started At:      2021-06-03T07:25:20Z
      Template Name:   whalesay
      Template Scope:  local/steps-g9fj6
      Type:            Pod
  Phase:               Running
  Progress:            0/1
  Started At:          2021-06-03T07:25:20Z
Events:
  Type    Reason               Age   From                 Message
  ----    ------               ----  ----                 -------
  Normal  WorkflowRunning      12m   workflow-controller  Workflow Running
  Normal  WorkflowNodeRunning  12m   workflow-controller  Running node steps-g9fj6[0]
  Normal  WorkflowNodeRunning  12m   workflow-controller  Running node steps-g9fj6
  Normal  WorkflowNodeRunning  11m   workflow-controller  Running node steps-g9fj6[0].hello1

pierreyves-lebrun commented Jun 3, 2021

I can confirm this issue occurs with pns too.

Edit: I misread the issue description; what I'm actually experiencing is this issue:
#6052

alexec commented Jun 3, 2021

@ongchinkai can you please attach the wait container logs?

alexec added a commit that referenced this issue Jun 3, 2021
Signed-off-by: Alex Collins <alex_collins@intuit.com>
ongchinkai commented Jun 4, 2021

Installed using https://raw.githubusercontent.com/argoproj/argo-workflows/dev-docker/manifests/install.yaml, except with the executor image changed to argoexec:dev-docker.

Installation output:

[isprintsg@vmmock3 fabric-kube]$ kubectl apply -n argo -f install.yaml
customresourcedefinition.apiextensions.k8s.io/clusterworkflowtemplates.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/cronworkflows.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workfloweventbindings.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflowtemplates.argoproj.io unchanged
serviceaccount/argo unchanged
serviceaccount/argo-server unchanged
role.rbac.authorization.k8s.io/argo-role unchanged
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-admin unchanged
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-edit unchanged
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-view unchanged
clusterrole.rbac.authorization.k8s.io/argo-cluster-role unchanged
clusterrole.rbac.authorization.k8s.io/argo-server-cluster-role unchanged
rolebinding.rbac.authorization.k8s.io/argo-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/argo-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/argo-server-binding unchanged
configmap/workflow-controller-configmap unchanged
service/argo-server unchanged
service/workflow-controller-metrics unchanged
deployment.apps/argo-server unchanged
deployment.apps/workflow-controller configured

describe output for workflow-controller

[isprintsg@vmmock3 fabric-kube]$ kubectl describe pod -n argo workflow-controller-5ccc89879-z97pr
Name:         workflow-controller-5ccc89879-z97pr
Namespace:    argo
Priority:     0
Node:         hlf-pool1-8rnem/10.104.0.8
Start Time:   Fri, 04 Jun 2021 11:57:32 +0800
Labels:       app=workflow-controller
              pod-template-hash=5ccc89879
Annotations:  <none>
Status:       Running
IP:           10.244.0.211
IPs:
  IP:           10.244.0.211
Controlled By:  ReplicaSet/workflow-controller-5ccc89879
Containers:
  workflow-controller:
    Container ID:  containerd://64125cdca9c874c6d1443fef5eb396bb11c3eff23214795934276ec1f08fc9f7
    Image:         docker.io/argoproj/workflow-controller:latest
    Image ID:      docker.io/argoproj/workflow-controller@sha256:0459525ffc0354c35d68b6e548dfaa2778ede667459a967df0025fad52e04dca
    Ports:         9090/TCP, 6060/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      workflow-controller
    Args:
      --configmap
      workflow-controller-configmap
      --executor-image
      argoproj/argoexec:dev-docker
    State:          Running
      Started:      Fri, 04 Jun 2021 11:57:33 +0800
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:6060/healthz delay=90s timeout=30s period=60s #success=1 #failure=3
    Environment:
      LEADER_ELECTION_IDENTITY:  workflow-controller-5ccc89879-z97pr (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from argo-token-hflpb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  argo-token-hflpb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  argo-token-hflpb
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  2m25s  default-scheduler  Successfully assigned argo/workflow-controller-5ccc89879-z97pr to hlf-pool1-8rnem
  Normal  Pulled     2m24s  kubelet            Container image "docker.io/argoproj/workflow-controller:latest" already present on machine
  Normal  Created    2m24s  kubelet            Created container workflow-controller
  Normal  Started    2m24s  kubelet            Started container workflow-controller

@ongchinkai can you please attach the wait container logs?

The wait logs still look the same as in the OP. Here's a TL;DR:

[isprintsg@vmmock3 fabric-kube]$ kubectl logs steps-68dfk-1759197227 -c wait
time="2021-06-04T04:01:59.962Z" level=info msg="Starting Workflow Executor" executorType= version=untagged
time="2021-06-04T04:01:59.968Z" level=info msg="Creating a docker executor"
time="2021-06-04T04:01:59.968Z" level=info msg="Executor initialized" includeScriptOutput=false namespace=default podName=steps-68dfk-1759197227 template="{\"name\":\"whalesay\",\"inputs\":{\"parameters\":[{\"name\":\"message\",\"value\":\"hello1\"}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay\",\"command\":[\"cowsay\"],\"args\":[\"hello1\"],\"resources\":{}}}" version="&Version{Version:untagged,BuildDate:2021-06-03T22:08:32Z,GitCommit:cee0d8a049ed91fc347ae67695ddba9b9473ba14,GitTag:untagged,GitTreeState:clean,GoVersion:go1.15.7,Compiler:gc,Platform:linux/amd64,}"
time="2021-06-04T04:01:59.969Z" level=info msg="Starting annotations monitor"
time="2021-06-04T04:01:59.969Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-68dfk-1759197227"
time="2021-06-04T04:01:59.980Z" level=info msg="Starting deadline monitor"
time="2021-06-04T04:02:01.026Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-68dfk-1759197227"
... lines 9-292 all the same, in one-second intervals ...
time="2021-06-04T04:06:59.252Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-68dfk-1759197227"
time="2021-06-04T04:06:59.969Z" level=info msg="Alloc=7004 TotalAlloc=26567 Sys=72785 NumGC=7 Goroutines=9"
time="2021-06-04T04:07:00.293Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-68dfk-1759197227"
... lines 296-580 all the same, as above ...
time="2021-06-04T04:11:59.573Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-68dfk-1759197227"
time="2021-06-04T04:11:59.969Z" level=info msg="Alloc=6072 TotalAlloc=44167 Sys=72785 NumGC=12 Goroutines=9"
time="2021-06-04T04:12:00.622Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-68dfk-1759197227"
... more of what is above ...

alexec commented Jun 4, 2021

Can you add the workflow pod YAML too?

@ongchinkai

Can you add the workflow pod YAML too?

I hope I didn't misunderstand what you were asking for.

[isprintsg@vmmock3 fabric-kube]$ kubectl get pod -n argo workflow-controller-5ccc89879-z97pr -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2021-06-04T03:57:32Z"
  generateName: workflow-controller-5ccc89879-
  labels:
    app: workflow-controller
    pod-template-hash: 5ccc89879
  name: workflow-controller-5ccc89879-z97pr
  namespace: argo
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: workflow-controller-5ccc89879
    uid: a6d3e6d7-eb90-4643-8997-971172cc8a16
  resourceVersion: "3885638"
  uid: 7c65309c-e300-41e4-b438-390c859dce8a
spec:
  containers:
  - args:
    - --configmap
    - workflow-controller-configmap
    - --executor-image
    - argoproj/argoexec:dev-docker
    command:
    - workflow-controller
    env:
    - name: LEADER_ELECTION_IDENTITY
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    image: docker.io/argoproj/workflow-controller:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 6060
        scheme: HTTP
      initialDelaySeconds: 90
      periodSeconds: 60
      successThreshold: 1
      timeoutSeconds: 30
    name: workflow-controller
    ports:
    - containerPort: 9090
      name: metrics
      protocol: TCP
    - containerPort: 6060
      protocol: TCP
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: argo-token-hflpb
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: hlf-pool1-8rnem
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    runAsNonRoot: true
  serviceAccount: argo
  serviceAccountName: argo
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: argo-token-hflpb
    secret:
      defaultMode: 420
      secretName: argo-token-hflpb
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-06-04T03:57:32Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-06-04T03:57:34Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-06-04T03:57:34Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-06-04T03:57:32Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://64125cdca9c874c6d1443fef5eb396bb11c3eff23214795934276ec1f08fc9f7
    image: docker.io/argoproj/workflow-controller:latest
    imageID: docker.io/argoproj/workflow-controller@sha256:0459525ffc0354c35d68b6e548dfaa2778ede667459a967df0025fad52e04dca
    lastState: {}
    name: workflow-controller
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-06-04T03:57:33Z"
  hostIP: 10.104.0.8
  phase: Running
  podIP: 10.244.0.211
  podIPs:
  - ip: 10.244.0.211
  qosClass: BestEffort
  startTime: "2021-06-04T03:57:32Z"

alexec commented Jun 4, 2021

No, not that pod; the pod that the workflow runs. It'll be labeled with workflows.argoproj.io/workflow.
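
For example, something like this should fetch it (substitute your workflow name for ${workflow}):

kubectl get pod -l workflows.argoproj.io/workflow=${workflow} -o yaml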

@ongchinkai

No. Not that pod, the pod that the workflow runs. It'll be labeled with workflows.argoproj.io/workflow.

Sorry, I think this is what you are asking for.

[isprintsg@vmmock3 fabric-kube]$ kubectl get pod steps-68dfk-1759197227 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    workflows.argoproj.io/node-name: steps-68dfk[0].hello1
    workflows.argoproj.io/template: '{"name":"whalesay","inputs":{"parameters":[{"name":"message","value":"hello1"}]},"outputs":{},"metadata":{},"container":{"name":"","image":"docker/whalesay","command":["cowsay"],"args":["hello1"],"resources":{}}}'
  creationTimestamp: "2021-06-04T04:01:44Z"
  labels:
    workflows.argoproj.io/completed: "false"
    workflows.argoproj.io/workflow: steps-68dfk
  name: steps-68dfk-1759197227
  namespace: default
  ownerReferences:
  - apiVersion: argoproj.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Workflow
    name: steps-68dfk
    uid: de6ded05-ffa5-4bcd-b73c-91a35495fcdd
  resourceVersion: "3886139"
  uid: 247b288d-85b5-4818-a740-0e542c6268e0
spec:
  containers:
  - command:
    - argoexec
    - wait
    - --loglevel
    - info
    env:
    - name: ARGO_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: ARGO_CONTAINER_RUNTIME_EXECUTOR
    - name: GODEBUG
      value: x509ignoreCN=0
    - name: ARGO_CONTAINER_NAME
      value: wait
    - name: ARGO_INCLUDE_SCRIPT_OUTPUT
      value: "false"
    image: argoproj/argoexec:dev-docker
    imagePullPolicy: IfNotPresent
    name: wait
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /argo/podmetadata
      name: podmetadata
    - mountPath: /var/run/docker.sock
      name: docker-sock
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-5wjwq
      readOnly: true
  - args:
    - hello1
    command:
    - cowsay
    env:
    - name: ARGO_CONTAINER_NAME
      value: main
    - name: ARGO_INCLUDE_SCRIPT_OUTPUT
      value: "false"
    image: docker/whalesay
    imagePullPolicy: Always
    name: main
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-5wjwq
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: registry-isprint
  - name: isprint
  nodeName: hlf-pool1-8rnem
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: podmetadata
  - hostPath:
      path: /var/run/docker.sock
      type: Socket
    name: docker-sock
  - name: default-token-5wjwq
    secret:
      defaultMode: 420
      secretName: default-token-5wjwq
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-06-04T04:01:44Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-06-04T04:01:44Z"
    message: 'containers with unready status: [main]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-06-04T04:01:44Z"
    message: 'containers with unready status: [main]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-06-04T04:01:44Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://f069afa0eb172cf05c6fbb5d1a8c550212c40037a4c4a855c1d60e767418dc92
    image: docker.io/docker/whalesay:latest
    imageID: sha256:c717279bbba020bf95ac72cf47b2c8abb3a383ad4b6996c1a7a9f2a7aaa480ad
    lastState: {}
    name: main
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://f069afa0eb172cf05c6fbb5d1a8c550212c40037a4c4a855c1d60e767418dc92
        exitCode: 0
        finishedAt: "2021-06-04T04:02:05Z"
        reason: Completed
        startedAt: "2021-06-04T04:02:05Z"
  - containerID: containerd://ff3050f012150ac8f39009685715c7e626fee82f4134f3279db98162cb73b2f2
    image: docker.io/argoproj/argoexec:dev-docker
    imageID: docker.io/argoproj/argoexec@sha256:0d2540c42cde7805d0a43b3d8d349c52f43cb5349eab7bbfadb0d1a579ed40bf
    lastState: {}
    name: wait
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-06-04T04:01:59Z"
  hostIP: 10.104.0.8
  phase: Running
  podIP: 10.244.0.195
  podIPs:
  - ip: 10.244.0.195
  qosClass: BestEffort
  startTime: "2021-06-04T04:01:44Z"

alexec commented Jun 4, 2021

It looks to me like your main container exited quickly, in <1s?

@ongchinkai

It looks to me that your main container exited quickly, <1s?

Maybe that's because it's simply executing a hello-world statement. I'm using this to test because it's much simpler than the original workflow I was using. In that workflow I was encountering the same issue, where the workflow would stop at a step and not proceed to the next.

alexec self-assigned this Jun 5, 2021
alexec commented Jun 5, 2021

I've just pushed a change to argoproj/argoexec:dev-docker that captures additional diagnostics. Could I please ask you to test it?

@ongchinkai

This is the describe output for workflow-controller:

[isprintsg@vmmock3 fabric-kube]$ kubectl describe pod workflow-controller-5ccc89879-z97pr -n argo
Name:         workflow-controller-5ccc89879-z97pr
Namespace:    argo
Priority:     0
Node:         hlf-pool1-8rnem/10.104.0.8
Start Time:   Fri, 04 Jun 2021 11:57:32 +0800
Labels:       app=workflow-controller
              pod-template-hash=5ccc89879
Annotations:  <none>
Status:       Running
IP:           10.244.0.211
IPs:
  IP:           10.244.0.211
Controlled By:  ReplicaSet/workflow-controller-5ccc89879
Containers:
  workflow-controller:
    Container ID:  containerd://64125cdca9c874c6d1443fef5eb396bb11c3eff23214795934276ec1f08fc9f7
    Image:         docker.io/argoproj/workflow-controller:latest
    Image ID:      docker.io/argoproj/workflow-controller@sha256:0459525ffc0354c35d68b6e548dfaa2778ede667459a967df0025fad52e04dca
    Ports:         9090/TCP, 6060/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      workflow-controller
    Args:
      --configmap
      workflow-controller-configmap
      --executor-image
      argoproj/argoexec:dev-docker
    State:          Running
      Started:      Fri, 04 Jun 2021 11:57:33 +0800
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:6060/healthz delay=90s timeout=30s period=60s #success=1 #failure=3
    Environment:
      LEADER_ELECTION_IDENTITY:  workflow-controller-5ccc89879-z97pr (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from argo-token-hflpb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  argo-token-hflpb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  argo-token-hflpb
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

These are the first few lines for the wait logs:

[isprintsg@vmmock3 fabric-kube]$ kubectl logs steps-twx75-1233006697 -c wait
time="2021-06-06T15:44:26.060Z" level=info msg="Starting Workflow Executor" executorType= version=untagged
time="2021-06-06T15:44:26.074Z" level=info msg="Creating a docker executor"
time="2021-06-06T15:44:26.075Z" level=info msg="Executor initialized" includeScriptOutput=false namespace=default podName=steps-twx75-1233006697 template="{\"name\":\"whalesay\",\"inputs\":{\"parameters\":[{\"name\":\"message\",\"value\":\"hello1\"}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay\",\"command\":[\"cowsay\"],\"args\":[\"hello1\"],\"resources\":{}}}" version="&Version{Version:untagged,BuildDate:2021-06-03T22:08:32Z,GitCommit:cee0d8a049ed91fc347ae67695ddba9b9473ba14,GitTag:untagged,GitTreeState:clean,GoVersion:go1.15.7,Compiler:gc,Platform:linux/amd64,}"
time="2021-06-06T15:44:26.075Z" level=info msg="Starting annotations monitor"
time="2021-06-06T15:44:26.077Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-twx75-1233006697"
time="2021-06-06T15:44:26.091Z" level=info msg="Starting deadline monitor"
time="2021-06-06T15:44:27.144Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-twx75-1233006697"
time="2021-06-06T15:44:28.185Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-twx75-1233006697"
time="2021-06-06T15:44:29.225Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=default --filter=label=io.kubernetes.pod.name=steps-twx75-1233006697"

In case you need the yaml for the pod again...

[isprintsg@vmmock3 fabric-kube]$ kubectl get pod steps-twx75-1233006697 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    workflows.argoproj.io/node-name: steps-twx75[0].hello1
    workflows.argoproj.io/template: '{"name":"whalesay","inputs":{"parameters":[{"name":"message","value":"hello1"}]},"outputs":{},"metadata":{},"container":{"name":"","image":"docker/whalesay","command":["cowsay"],"args":["hello1"],"resources":{}}}'
  creationTimestamp: "2021-06-06T15:44:24Z"
  labels:
    workflows.argoproj.io/completed: "false"
    workflows.argoproj.io/workflow: steps-twx75
  name: steps-twx75-1233006697
  namespace: default
  ownerReferences:
  - apiVersion: argoproj.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Workflow
    name: steps-twx75
    uid: 4de9478e-34f1-4952-bea0-08614217bc29
  resourceVersion: "4240635"
  uid: 4a5969b9-4704-47a0-9c5b-9a73e9b7f18d
spec:
  containers:
  - command:
    - argoexec
    - wait
    - --loglevel
    - info
    env:
    - name: ARGO_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: ARGO_CONTAINER_RUNTIME_EXECUTOR
    - name: GODEBUG
      value: x509ignoreCN=0
    - name: ARGO_CONTAINER_NAME
      value: wait
    - name: ARGO_INCLUDE_SCRIPT_OUTPUT
      value: "false"
    image: argoproj/argoexec:dev-docker
    imagePullPolicy: IfNotPresent
    name: wait
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /argo/podmetadata
      name: podmetadata
    - mountPath: /var/run/docker.sock
      name: docker-sock
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-5wjwq
      readOnly: true
  - args:
    - hello1
    command:
    - cowsay
    env:
    - name: ARGO_CONTAINER_NAME
      value: main
    - name: ARGO_INCLUDE_SCRIPT_OUTPUT
      value: "false"
    image: docker/whalesay
    imagePullPolicy: Always
    name: main
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-5wjwq
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: registry-isprint
  - name: isprint
  nodeName: hlf-pool1-8rnem
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: podmetadata
  - hostPath:
      path: /var/run/docker.sock
      type: Socket
    name: docker-sock
  - name: default-token-5wjwq
    secret:
      defaultMode: 420
      secretName: default-token-5wjwq
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-06-06T15:44:24Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-06-06T15:44:24Z"
    message: 'containers with unready status: [main]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-06-06T15:44:24Z"
    message: 'containers with unready status: [main]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-06-06T15:44:24Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://315bcc8ed8f5e227ffd8cdc00ed2d97cd8216117ccf9dbb143323248fd8020ac
    image: docker.io/docker/whalesay:latest
    imageID: sha256:c717279bbba020bf95ac72cf47b2c8abb3a383ad4b6996c1a7a9f2a7aaa480ad
    lastState: {}
    name: main
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://315bcc8ed8f5e227ffd8cdc00ed2d97cd8216117ccf9dbb143323248fd8020ac
        exitCode: 0
        finishedAt: "2021-06-06T15:44:31Z"
        reason: Completed
        startedAt: "2021-06-06T15:44:31Z"
  - containerID: containerd://58f9fb48b52bb39d8ebf917b8385c3d63d7067bf176f9537c056b973bf8bfbcc
    image: docker.io/argoproj/argoexec:dev-docker
    imageID: docker.io/argoproj/argoexec@sha256:0d2540c42cde7805d0a43b3d8d349c52f43cb5349eab7bbfadb0d1a579ed40bf
    lastState: {}
    name: wait
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-06-06T15:44:25Z"
  hostIP: 10.104.0.8
  phase: Running
  podIP: 10.244.0.247
  podIPs:
  - ip: 10.244.0.247
  qosClass: BestEffort
  startTime: "2021-06-06T15:44:24Z"

Hope this helps!

alexec commented Jun 6, 2021

Thank you. The logs you've attached appear to be from a different version, as they do not contain the additional diagnostic output. You'll need to change your image pull policy.
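
One place this can typically be set is the executor entry in workflow-controller-configmap, which is merged into the generated executor containers. A sketch, assuming the default ConfigMap name (the exact key may differ between versions):

kubectl -n argo patch configmap workflow-controller-configmap \
  --type merge -p '{"data":{"executor":"imagePullPolicy: Always"}}'
kubectl -n argo rollout restart deployment workflow-controller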

@ongchinkai

Not sure if this is what you're referring to, but I notice in the pod YAML I've attached that there's the setting imagePullPolicy: IfNotPresent. However, I'm not sure where I should be changing this.

alexec commented Jun 6, 2021

I've just pushed v0.0.0-dev-docker-0 and you can use that once the images are published (takes about 1h).
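
A sketch for pointing the controller at the new build, assuming the image is published as argoproj/argoexec:v0.0.0-dev-docker-0 and the Deployment args keep the layout shown earlier in this thread (--configmap, workflow-controller-configmap, --executor-image, <image>):

kubectl -n argo patch deployment workflow-controller --type json \
  -p '[{"op":"replace","path":"/spec/template/spec/containers/0/args/3","value":"argoproj/argoexec:v0.0.0-dev-docker-0"}]'

Because this tag is unique, the node should pull it even with imagePullPolicy: IfNotPresent.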

alexec changed the title from "Multi-step workflow does not terminate" to "Multi-step workflow does not terminate (wait container does not exit with Docker executor in v3.0)" Jun 6, 2021
alexec mentioned this issue Jun 10, 2021
sarabala1979 mentioned this issue Jun 10, 2021
alexec added a commit that referenced this issue Jun 10, 2021
Signed-off-by: Alex Collins <alex_collins@intuit.com>
alexec mentioned this issue Jun 21, 2021
@m0nk3y-s3c

I am using version 3.2.6 and I get the same behavior.
I opened a discussion on this issue: #7480
