
v3.4.9: podGC.deleteDelayDuration not working #11412

Closed
2 of 3 tasks
mytracks opened this issue Jul 21, 2023 · 4 comments · Fixed by #11445
Labels

* area/docs: Incorrect, missing, or mistakes in docs
* area/gc: Garbage collection, such as TTLs, retentionPolicy, delays, and more
* solution/outdated: This is not up-to-date with the current version
* type/support: User support issue - likely not a bug
Comments

@mytracks

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

I created a workflow with the property podGC.deleteDelayDuration set to "120s". So my workflow looks like this:

metadata:
  name: omniscient-poochenheimer
  namespace: def
spec:
  entrypoint: argosay
  templates:
    - name: argosay
      container:
        name: main
        image: argoproj/argosay:v2
        command:
          - /argosay
        args:
          - echo
          - Hello
  ttlStrategy:
    secondsAfterCompletion: 300
  podGC:
    strategy: OnPodCompletion
    deleteDelayDuration: 120s

Afterwards I fetched the workflow from the Kubernetes API. However, the value deleteDelayDuration was missing:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  annotations:
    workflows.argoproj.io/pod-name-format: v2
  creationTimestamp: "2023-07-21T12:44:33Z"
  generation: 2
  labels:
    example: "true"
    workflows.argoproj.io/creator: system-serviceaccount-def-argo-server
    workflows.argoproj.io/phase: Running
  name: awesome-whale
  namespace: def
  resourceVersion: "304989"
  uid: eb9415cd-2ce0-426f-9d50-a4c5271a09e7
spec:
  arguments:
    parameters:
    - name: message
      value: hello argo
  entrypoint: argosay
  podGC:
    strategy: OnPodCompletion
  templates:
  - container:
      args:
      - echo
      - '{{inputs.parameters.message}}'
      command:
      - /argosay
      image: argoproj/argosay:v2
      name: main
      resources: {}
    inputs:
      parameters:
      - name: message
        value: '{{workflow.parameters.message}}'
    metadata: {}
    name: argosay
    outputs: {}
  ttlStrategy:
    secondsAfterCompletion: 300
status:
  artifactGCStatus:
    notSpecified: true
  artifactRepositoryRef:
    artifactRepository: {}
    configMap: artifact-repositories
    key: empty
    namespace: def
  finishedAt: null
  nodes:
    awesome-whale:
      displayName: awesome-whale
      finishedAt: null
      id: awesome-whale
      inputs:
        parameters:
        - name: message
          value: hello argo
      name: awesome-whale
      phase: Pending
      progress: 0/1
      startedAt: "2023-07-21T12:44:33Z"
      templateName: argosay
      templateScope: local/awesome-whale
      type: Pod
  phase: Running
  progress: 0/1
  startedAt: "2023-07-21T12:44:33Z"

Besides the fact that the value doesn't show up, it also doesn't have any effect.

Version

v3.4.9

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

* In the UI click "Submit new workflow"
* Use the existing example template and add `deleteDelayDuration: 120s` under `podGC`.
* Create the workflow
* Investigate the workflow resource using the Kubernetes API. It does not contain the `deleteDelayDuration` property.

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

There wasn't any workflow-specific output in the controller log. Just this:

time="2023-07-21T12:43:04Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2023-07-21T12:43:04Z" level=info msg="cron config" cronSyncPeriod=10s
time="2023-07-21T12:43:04Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s
I0721 12:43:04.821336       1 leaderelection.go:248] attempting to acquire leader lease def/workflow-controller...
I0721 12:43:04.858433       1 leaderelection.go:258] successfully acquired lease def/workflow-controller
time="2023-07-21T12:43:05.595Z" level=warning msg="Controller doesn't have RBAC access for ClusterWorkflowTemplates"
W0721 12:43:05.697578       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

The output was empty.
@mytracks
Author

I just realized that the latest version of the container wasn't really the latest. I pulled the newest latest image and the bug is gone, so I will close the issue.

@agilgur5
Contributor

agilgur5 commented Jul 25, 2023

I implemented podGC.deleteDelayDuration very recently in #11325. It hasn't been part of an official release yet; it's slated to be released with 3.5.0 (see also #11381).

I just realized that the latest version of the container wasn't really the latest. So I pulled the newest latest image and the bug is gone.

Yea, latest should have everything on master, including that PR. Since it was recent, it makes sense that any caches would need to be updated.

@agilgur5
Contributor

Added a note in the examples to specify the version requirement: #11445

@agilgur5 agilgur5 added the area/docs Incorrect, missing, or mistakes in docs label Aug 8, 2023
@borkke

borkke commented Nov 13, 2023

@agilgur5 I couldn't find any note on what the maximum deleteDelayDuration is, maybe you have an idea?

I tested it last week with a smaller delay, up to a few minutes, and it worked fine. But then I configured it to be 72h, which is what we want, but the pods were not deleted.

This did not delete the Pods after 3 days

  podGC:
    strategy: OnPodCompletion
    deleteDelayDuration: 72h

@agilgur5 agilgur5 added solution/outdated This is not up-to-date with the current version type/support User support issue - likely not a bug and removed type/bug labels Aug 14, 2024
@agilgur5 agilgur5 changed the title podGC.deleteDelayDuration not working podGC.deleteDelayDuration not working on 3.4.9 Aug 14, 2024
@agilgur5 agilgur5 added this to the v3.5.0 milestone Aug 18, 2024
@agilgur5 agilgur5 changed the title podGC.deleteDelayDuration not working on 3.4.9 v3.4.9: podGC.deleteDelayDuration not working Oct 8, 2024
@agilgur5 agilgur5 added the area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more label Oct 8, 2024