Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(controller): Delay pod GC deletion by a configurable amount (default 5s) #6168

Merged
merged 4 commits into from
Jun 26, 2021

Conversation

stefansedich
Copy link
Contributor

The PR #5033 was meant to fix the issue of delaying pod deletion for some period so things like logging scrapers can do their thing before the pod goes away.

Based on my comments #5033 (comment) this won't actually work as intended and another solution is needed.

My fix in this PR aims to delay the actual deletion by some configurable value and the deletion logic will then either delete immediately or queue the deletion for after this period.

@terrytangyuan tagging you as you did the original PR and tagging @alexec as you were the reviewer, as you might have more insight here. I have written an initial test here but am happy to expand coverage if desired.

One area I am not sure about is what other docs need to be updated for an API change and if that is a manual process or if there is some automation around that, it was not clear to me as I could not find any documentation explaining how to make contributions that add to the spec and this is my first time so just let me know what else I need to do.

Checklist:

Tips:

  • Your PR needs to pass the required checks before it can be approved. If the check is not required (e.g. E2E tests) it does not need to pass
  • Sign-off your commits to pass the DCO check: git commit --signoff.
  • Run make pre-commit -B to fix codegen or lint problems.
  • Say how how you tested your changes. If you changed the UI, attach screenshots.

@@ -862,6 +864,11 @@ func (podGC *PodGC) GetLabelSelector() *metav1.LabelSelector {
return nil
}

// GetDelay gets the podGC delay as a duration.
func (podGC *PodGC) GetDelay() time.Duration {
return time.Duration(podGC.DelaySeconds) * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it handle the default? what will the delay be if the user didn't specify in Workflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DelaySeconds should default to 0 and GetDelay in this case should return 0 seconds, meaning the deletion will be queued right away.

@stefansedich stefansedich deleted the pod-gc-delay branch June 17, 2021 23:43
@stefansedich stefansedich restored the pod-gc-delay branch June 17, 2021 23:44
@stefansedich stefansedich reopened this Jun 17, 2021
@codecov

This comment was marked as resolved.

@@ -840,6 +840,8 @@ type PodGC struct {
Strategy PodGCStrategy `json:"strategy,omitempty" protobuf:"bytes,1,opt,name=strategy,casttype=PodGCStrategy"`
// LabelSelector is the label selector to check if the pods match the labels before being added to the pod GC queue.
LabelSelector *metav1.LabelSelector `json:"labelSelector,omitempty" protobuf:"bytes,2,opt,name=labelSelector"`
// DelaySeconds is how long to delay pod GC for.
DelaySeconds uint32 `json:"delaySeconds,omitempty" protobuf:"bytes,3,opt,name=delaySeconds"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change it to intstr.IntOrString some it can be parameterized by parameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarabala1979 done, hopefully I got that right.

@stefansedich stefansedich force-pushed the pod-gc-delay branch 2 times, most recently from f35e65d to 27d5b14 Compare June 22, 2021 19:29
@alexec
Copy link
Contributor

alexec commented Jun 22, 2021

I spoke with @sarabala1979 today and we think it is likely that few users will be impacted by a short delay on Pod GC, and many who'll benefit.

I suggest we have a single global value this suitable default. No per-workflow configuration.

@stefansedich
Copy link
Contributor Author

I spoke with @sarabala1979 today and we think it is likely that few users will be impacted by a short delay on Pod GC, and many who'll benefit.

I suggest we have a single global value this suitable default. No per-workflow configuration.

Thanks @alexec this makes sense, will make things simpler and only require some more sleeps in the tests, did you talk about a reasonable default?

And by global default value do you mean configurable with config still?

@alexec
Copy link
Contributor

alexec commented Jun 22, 2021

I don't really know what a sensible default would be - maybe 10s? How long were you thinking you'd need? It should be configurable using the workflow-controller-configmap.

@stefansedich
Copy link
Contributor Author

I don't really know what a sensible default would be - maybe 10s? How long were you thinking you'd need? It should be configurable using the workflow-controller-configmap.

In our case I planned to configure a default of 1m or so, 10s sounds like a reasonable default however for most and this does simplify things quite a bit, let me get that done.

// Defaults to 30 seconds.
// PodGCGracePeriodSeconds specifies the duration in seconds before a terminating pod is forcefully killed.
// Value must be non-negative integer. A zero value indicates that the pod will be forcefully terminated immediately.
// Defaults to the Kubernetes default of 30 seconds.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this comment to signify what this option does, as the new option below handles the actual deletion delay.

config/config.go Outdated
@@ -144,6 +149,14 @@ func (c Config) GetResourceRateLimit() ResourceRateLimit {
}
}

func (c Config) GetPodGCDeleteDelay() time.Duration {
if c.PodGCDeleteDelay == nil {
return 30 * time.Second
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulting to 30s, happy to lower but this was somewhat in the middle of the suggested 10s and the 1m we would run in our environment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be lower - 5s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lowered to 5s by default.

@@ -21,7 +21,7 @@ type PodCleanupSuite struct {
fixtures.E2ESuite
}

const enoughTimeForPodCleanup = 3 * time.Second
const enoughTimeForPodCleanup = 33 * time.Second
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Padding out tests as they will need a little more time to "wait" now as there is a default deletion delay.

@stefansedich
Copy link
Contributor Author

@alexec @sarabala1979 have made an initial cut at the changes moving to the config, let me know what you think.

config/config.go Outdated
// PodGCDelaySeconds specifies the duration in seconds before the pods in the GC queue get deleted.
// Value must be non-negative integer. A zero value indicates that the pods will be deleted immediately.
// Defaults to 30 seconds.
PodGCDeleteDelay *metav1.Duration `json:"podGCDeleteDelay,omitempty"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a duration here, I was torn between using seconds like the other option above or moving to the more user-friendly duration so went with the more friendly option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename with type PodGCDeleteDelayDuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok done @sarabala1979

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you'll have to fix the DCO check.

@stefansedich
Copy link
Contributor Author

I believe you'll have to fix the DCO check.

Fixed!

Signed-off-by: Stefan Sedich <stefan.sedich@gmail.com>
@alexec alexec changed the title feat: Add support for deletion delay when using PodGC feat(controller): Add support for deletion delay when using PodGC Jun 25, 2021
@alexec alexec changed the title feat(controller): Add support for deletion delay when using PodGC feat(controller): Delay pod GC deletion by a configurable amount (default 5s) Jun 25, 2021
@alexec alexec enabled auto-merge (squash) June 25, 2021 17:19
@alexec alexec added this to the v3.2 milestone Jun 25, 2021
@alexec
Copy link
Contributor

alexec commented Jun 25, 2021

I've marked this for v3.2, but we may consider back-porting to v3.1.

@alexec alexec merged commit 73a36d8 into argoproj:master Jun 26, 2021
@sarabala1979 sarabala1979 mentioned this pull request Jul 13, 2021
17 tasks
uturunku1 pushed a commit to newrelic-forks/argo-workflows that referenced this pull request Jul 22, 2021
Signed-off-by: Stefan Sedich <stefan.sedich@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>
alexec added a commit that referenced this pull request Jul 26, 2021
* Update events.md (#6119)

Trying to use the argo workflows events and I noticed that some crucial explanations are missing here. I would like to add:
- A simple WorkflowTemplate bound to the WorkflowEventBinding, to show what is triggered by the curl that send the event
- Some infos about the process that bind the event to the workflow template:
   - template creation
   - event binding apply
   - api call to trigger the workflow template creation
Plus: there is a little mistake in the selector:  metadata["x-argo"] instead of metadata["X-Argo-E2E"] I would like to correct it in order to avoid mistakes during the curl.

Hope this is appreciated! ;)

Denis

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add note on the requirements of resource templates. Fixes #5566 (#6125)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6127)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* add troubleshooting notes section for running-locally docs (#6132)

Co-authored-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): Check whether any errors within checkResourceState() are transient. Fixes #6118. (#6134)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* build: Remove PNS_PRIVILEGED=true (#6138)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Document the extraction of data from a k8s resource (#6102)

* Document the extraction of data from a k8s resource

* remove reference to lines of a file that can be outdated

Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>

* Remove yaml snippet and only keep the link to the example

Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* build image output to docker (#6128)

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* chore: Update stress rig and docs. Fixes #6136 (#6141)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* chore: Upgrade Alibaba OSS to use more secure ListObjectsV2() (#6142)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Allow setting workflow input parameters in UI. Fixes #4234 (#5319)

* fix: Allow setting workflow input parameters in UI. Fixes #4234

Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>

* fix: Allow setting workflow input parameters in UI. Fixes #4234

 - Allow workflow input parameters as well as entrypoint parameters.

Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Performance improvement for Sprig. Fixes #6135 (#6140)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* update from v0.19.6 to v0.20.4 and indirect dependencies

Signed-off-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* exec.GetAuthenticator takes two arguments in the k8s-client-go v0.20.4

Signed-off-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* update makefile to use code-generator@v0.20.4

Signed-off-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Fix release-notes.md

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Update Graviti's website link (#6148)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(ui): Fix-up local storage namespaces. Fixes #6109 (#6144)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): Capture emissary main-logs. Fixes #6145 (#6146)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(ui): Fix event-flow scrolling. Fixes #6133 (#6147)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* test: Fix logging test (#6159)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(ui): Add checkbox to check all workflows in list. Fixes #6069 (#6158)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Use 'depends' instead of 'dependencies' in examples (#6166)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(server): Allow redirect_uri to be automatically resolved when using sso (#6167)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Allow retry on transient errors when validating workflow spec. Fixes #6163 (#6178)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): dehydrate workflow before deleting offloaded node status (#6112)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6160)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Remove RBAC for SSO from Roadmap (Already implemented) (#6174)

It looks like RBAC for SSO is already implemented by #4198 so hopefully it can be removed from the roadmap as it is also documented? https://argoproj.github.io/argo-workflows/argo-server-sso/#sso-rbac

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6187)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Fix changelog order for .0 tags (#6188)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Wrong validate order when validate DAG task's argument (#6190)

Signed-off-by: BOOK <book78987book@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix rebase conflict

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* run go mod tidy

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* refactor: Remove the need for pod annotations to be mounted as a volume (#6022)

Signed-off-by: Antony Chazapis <chazapis@ics.forth.gr>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: ContainerSets do not have 'depends' (#6199)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Fix security issues related to file closing and paths (G307 & G304) (#6200)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add links to Python examples to description annotations (#6202)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs(executor): document k8s executor behaviour with program warnings (#6212)

* docs(executor): document k8s executor behaviour with program warnings

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>

* docs(executor): fix typo

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Fix certain sibling tasks not connected to parent (#6193)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(ui): Add copy to clipboard shortcut (#6217)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6220)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add KarrotPay in USERS.md (#6221)

Signed-off-by: Byungjin Park <posquit0.bj@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* run go mod tidy

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add workflow-count-resourcequota.yaml example (#6225)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Reduce argoexec image size (#6197)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(conttroller): Always set finishedAt dote. Fixes #6135 (#6139)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat: Add support for deletion delay when using PodGC (#6168)

Signed-off-by: Stefan Sedich <stefan.sedich@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: update bug report template (#6236)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6242)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): emissary - make argoexec executable from non-root containers. Fixes #6238 (#6247)

Signed-off-by: Yuan Gong <gongyuan94@gmail.com>

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat: Introduce when condition to retryStrategy (#6114)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* ci: Add Go code security scanner via gosec. Fixes #6203 (#6232)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: fix end of files, new lines and remove multiple lines (#6240)

Signed-off-by: NikeNano <niklas.sven.hansson@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: add json destructuring example (#6250)


Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): Tolerate docker re-creating containers. Fixes #6244 (#6252)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): emissary - make /var/run/argo files readable from non-root users. Fixes #6238 (#6304)

Signed-off-by: Yuan Gong <gongyuan94@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs(controller): add missing emissary executor (#6291)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: docs and hacks improvements (#6310)

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(cli): Only list needed fields. Fixes #6000 (#6298)

* fix(cli): Only list needed fields

Signed-off-by: Alex Collins <alex_collins@intuit.com>

* ok

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Fix typo (#6311)

Signed-off-by: Byungjin Park <posquit0.bj@gmail.com>

Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* require sso redirect url to be an argo url (#6211)

Signed-off-by: Brandon Goode <brandon.goode@cox.com>

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: code format (#6269)

- Add yaml rendering
- Add bash rendering

Co-authored-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(controller): Store artifact repository in workflow status. Fixes #6255 (#6299)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: document using ingress with TLS enabled (#6324)

Signed-off-by: valorl <11498571+valorl@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: document how to access hyphenated steps in expression templates (#6318)

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(controller): Differentiate CronWorkflow submission vs invalid spec error metrics (#6309)

* feat(controller): Differentiate CronWorkflow submission vs invalid spec error metrics

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Address feedback

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* chore: deleted wft.yaml

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* ci: only run Snyk once a day on master

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Not updating StoredWorkflowSpec when WFT changed during workflow running (#6342)

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(cli): v3.1 Argo Auth Token (#6344)

* fix(cli): v3.1 Argo Auth Token

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>

* update

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add Alibaba Group to USERS.md (#6353)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(crd): temp fix 34s timeout bug for k8s 1.20+ (#6350)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6348)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: sarabala1979 <sarabala1979@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs(users): Add WooliesX (#6358)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(cli): Overridding name/generateName when creating CronWorkflows if specified (#6308)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(controller): sortDAGTasks supports sort by field Depends (#6307)

Signed-off-by: BOOK <book78987book@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(fields): handle nexted fields when excluding (#6359)

Signed-off-by: AntoineDao <antoinedao1@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(controller): Allow configurable host name label key when retrying different hosts (#6341)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* pull argo-events changes
update versions in go.mod and go.sum

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* run go mod tidy

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): allow workflow.duration to pass validator (#6376)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): fix retry on transient errors when validating workflow spec (#6370)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>

Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: examples/ci.yaml indent (#6328)

Signed-off-by: kungho.back <kungho.back@naverlabs.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* chore: import grafana dashboard (#6365)

Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(gcs): throw argo not found error if key not exist (#6393)

Signed-off-by: AntoineDao <antoinedao1@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* Revert "fix: examples/ci.yaml indent (#6328)"

This reverts commit 3f72fe5.

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Server crash when opening timeline tab for big workflows (#6369)

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>

Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add 4intelligence (#6400)

Signed-off-by: Thiago Gil <t.gil@4intelligence.com.br>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add note on additional required permission for createBucketIfNotPresent for OSS driver (#6378)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): allow initial duration to be 0 instead of current_time-0 (#6389)


Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Mark workflows wait for semaphore as pending. Fixes #6351 (#6356)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Updating upgrading.md. Closes #6314

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* not need to convert to unstructured.unstructured

I was getting this error controller_test.go: pkg/mod/k8s.io/client-go@v0.20.4/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: item[0]: can't assign or convert unstructured.Unstructured into v1alpha1.Workflow

Based on this comment, it seems like the conversion is not needed: kubernetes-sigs/controller-runtime#524 (comment)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* run make pre-commit -B

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix potential file inclusion via variable lint error

there is a risk that an unintended file path will be specified. So uuse filepath.Clean() to clean up possible bad paths

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix format issue

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

Co-authored-by: Denis Bellotti <denis.bellotti.android@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: alexec <alexec@users.noreply.github.com>
Co-authored-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Co-authored-by: Christophe Blin <christophe.blin@free.fr>
Co-authored-by: meijin <859037421@qq.com>
Co-authored-by: kennytrytek <kenneth.g.trytek@gmail.com>
Co-authored-by: Caden <32856921+CadenOf@users.noreply.github.com>
Co-authored-by: Simon Behar <simbeh7@gmail.com>
Co-authored-by: Stefan Sedich <stefan.sedich@gmail.com>
Co-authored-by: Reijer Copier <copierrj@users.noreply.github.com>
Co-authored-by: Brandon High <highb@users.noreply.github.com>
Co-authored-by: BOOK <book78987book@gmail.com>
Co-authored-by: Antony Chazapis <chazapis@ics.forth.gr>
Co-authored-by: Tianchu Zhao <evantczhao@gmail.com>
Co-authored-by: Byungjin Park (Claud) <posquit0.bj@gmail.com>
Co-authored-by: Yuan (Bob) Gong <4957653+Bobgy@users.noreply.github.com>
Co-authored-by: Niklas Hansson <niklas.sven.hansson@gmail.com>
Co-authored-by: Michael Crenshaw <michael@crenshaw.dev>
Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
Co-authored-by: brgoode <86316314+brgoode@users.noreply.github.com>
Co-authored-by: Valér Orlovský <11498571+valorl@users.noreply.github.com>
Co-authored-by: Alex Collins <alex_collins@intuit.com>
Co-authored-by: sarabala1979 <sarabala1979@users.noreply.github.com>
Co-authored-by: Antoine Dao <antoinedao1@gmail.com>
Co-authored-by: KUNG HO BACK <bkh751@gmail.com>
Co-authored-by: Zadkiel <zadkiel.aharonian@gmail.com>
Co-authored-by: Alexander Matyushentsev <Alexander_Matyushentsev@intuit.com>
Co-authored-by: Thiago Bittencourt Gil <79285506+thiago4int@users.noreply.github.com>
@prein
Copy link

prein commented Dec 16, 2021

Is this meant to have no per-workflow configuration or no configuration at all? I'm not sure, reading the conversation above.
What I would find best for my scenarios would be if I could set different delay for failed and for succeeded workflows. I would then have much longer delay for failed ones to allow for manual inspection.

@alexec
Copy link
Contributor

alexec commented Dec 16, 2021

The goal of this PR is to delay all deletion by a minimum amount of time to allow for other services to clean them up. I think you just need normal PodGC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants