fix: Avoid controller crashes when running large number of workflows #9691

Merged: 3 commits merged on Sep 27, 2022


@terrytangyuan (Member) commented Sep 26, 2022

Fixes #8275.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@terrytangyuan changed the title from "fix: Add limit to list calls" to "fix: Avoid controller crashes when running large number of workflows" Sep 27, 2022
@terrytangyuan marked this pull request as ready for review September 27, 2022 19:31
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@terrytangyuan enabled auto-merge (squash) September 27, 2022 21:05
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `INDEX_WORKFLOW_SEMAPHORE_KEYS` | `bool` | `true` | Whether or not to index semaphores. |
| `LEADER_ELECTION_IDENTITY` | `string` | Controller's `metadata.name` | The ID used for workflow controllers to elect a leader. |
| `LEADER_ELECTION_DISABLE` | `bool` | `false` | Whether leader election should be disabled. |
| `LEADER_ELECTION_LEASE_DURATION` | `time.Duration` | `15s` | The duration that non-leader candidates will wait to force acquire leadership. |
| `LEADER_ELECTION_RENEW_DEADLINE` | `time.Duration` | `10s` | The duration that the acting master will retry refreshing leadership before giving up. |
| `LEADER_ELECTION_RETRY_PERIOD` | `time.Duration` | `5s` | The duration that the leader election clients should wait between tries of actions. |
| `LIST_LIMIT` | `int` | `200` | The maximum number of responses to return for a list call on workflows for workflow informer. |
Member:

This needs to be removed, right?

@jessesuen (Member) left a comment:

Conditional approval after removal of LIST_LIMIT from documentation.

@terrytangyuan merged commit e08524d into argoproj:master Sep 27, 2022
```diff
@@ -803,6 +803,7 @@ func (wfc *WorkflowController) tweakListOptions(options *metav1.ListOptions) {
 	labelSelector := labels.NewSelector().
 		Add(util.InstanceIDRequirement(wfc.Config.InstanceID))
 	options.LabelSelector = labelSelector.String()
+	options.Limit = int64(env.LookupEnvIntOr("LIST_LIMIT", 200))
```
Member:

Will this paginate properly, or will it simply cut it off at 200?
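The question turns on what a client does with the `continue` token that the Kubernetes list API returns alongside a `limit`-bounded page. The Go sketch below is a self-contained, hypothetical model, not Argo's or client-go's code: `lookupEnvIntOr` stands in for `env.LookupEnvIntOr` (its handling of unset or invalid values is an assumption), and `fakeServer`, `listOptions`, `listResult`, `listAll`, and `listFirstPage` are illustrative names. It contrasts a client that follows continue tokens, where every item comes back and `LIST_LIMIT` only bounds the size of each page, with one that stops after the first response, where results are cut off at the limit.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupEnvIntOr is a hypothetical stand-in for env.LookupEnvIntOr.
// Assumption: it returns def when the variable is unset or not an integer.
func lookupEnvIntOr(key string, def int) int {
	v, ok := os.LookupEnv(key)
	if !ok {
		return def
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return def
	}
	return n
}

// listOptions mimics the two metav1.ListOptions fields relevant here.
type listOptions struct {
	Limit    int64
	Continue string
}

// listResult mimics one page of a list response plus its continue token.
type listResult struct {
	Items    []string
	Continue string
}

// fakeServer is a hypothetical in-memory API server that returns at most
// Limit items per call and a continue token pointing at the remainder.
type fakeServer struct {
	items []string
}

func newFakeServer(n int) *fakeServer {
	s := &fakeServer{}
	for i := 0; i < n; i++ {
		s.items = append(s.items, fmt.Sprintf("wf-%d", i))
	}
	return s
}

func (s *fakeServer) List(opts listOptions) listResult {
	start := 0
	if opts.Continue != "" {
		// In this toy model the token simply encodes the next offset.
		start, _ = strconv.Atoi(opts.Continue)
	}
	end := len(s.items)
	if opts.Limit > 0 && start+int(opts.Limit) < end {
		end = start + int(opts.Limit)
	}
	res := listResult{Items: s.items[start:end]}
	if end < len(s.items) {
		res.Continue = strconv.Itoa(end)
	}
	return res
}

// listAll follows continue tokens until none is returned, so the limit
// bounds each response but not the total number of items collected.
func listAll(s *fakeServer, limit int64) (items []string, calls int) {
	opts := listOptions{Limit: limit}
	for {
		res := s.List(opts)
		calls++
		items = append(items, res.Items...)
		if res.Continue == "" {
			return items, calls
		}
		opts.Continue = res.Continue
	}
}

// listFirstPage makes one call and ignores the continue token, which is
// the "cut it off at the limit" behavior the question asks about.
func listFirstPage(s *fakeServer, limit int64) []string {
	return s.List(listOptions{Limit: limit}).Items
}

func main() {
	limit := int64(lookupEnvIntOr("LIST_LIMIT", 200))
	s := newFakeServer(450)
	all, calls := listAll(s, limit)
	fmt.Printf("paginated: %d items in %d calls\n", len(all), calls)
	fmt.Printf("single call: %d items\n", len(listFirstPage(s, limit)))
}
```

With 450 items and the default limit of 200, `listAll` makes three calls (200, 200, 50 items) and returns all 450, while `listFirstPage` returns only 200. Which of the two behaviors the workflow informer actually gets depends on client-go's list machinery and on the API server, neither of which this diff changes; the sketch only illustrates the two outcomes the question distinguishes.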

terrytangyuan added a commit that referenced this pull request Sep 28, 2022
@jessesuen deleted the controller-crash branch September 28, 2022 01:16
chenyangxueHDU pushed a commit to chenyangxueHDU/argo that referenced this pull request Sep 29, 2022
…rgoproj#9691)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: yangxue.chen <chenyangxuehdu@126.com>
chenyangxueHDU pushed a commit to chenyangxueHDU/argo that referenced this pull request Sep 29, 2022
…rgoproj#9691)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: yangxue.chen <chenyangxuehdu@126.com>
juchaosong pushed a commit to juchaosong/argo-workflows that referenced this pull request Nov 3, 2022
…rgoproj#9691)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: juchao <juchao@coscene.io>
@agilgur5 added the area/controller (Controller issues, panics) label Sep 26, 2023
Labels: area/controller (Controller issues, panics)
Projects: None yet

Development: successfully merging this pull request may close these issues:
Controller crashes when too many completed workflows

3 participants