Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup time duration options #273

Merged

Conversation

karlkfi
Copy link
Contributor

@karlkfi karlkfi commented Nov 10, 2022

  • Use the -Period suffix consistently. Technically, most of these should be "Delay", because they're a timer between attempts and not the duration of the event itself, but we're already using "period" for most of them. So this at least makes them all consistent without requiring rename of any external facing env vars or command flags.
  • Standardize the many duplicate descriptions for duration options, at each layer of the stack.
  • Add missing constants to replace hard-coded durations.
  • Reafactor existing duration constants to make the names consistent.

@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 10, 2022

So it turns out these test errors are actually a pretty interesting edge case... I'm just not sure why they're triggering only on this PR.

TestPublicHelm has a test where it's patching the helm config of a namespace to change the input values. But since the status doesn't have the input values and the Syncing condition doesn't either, the WaitForRepoSyncs returns immediately (within 20s), instead of waiting for the new rendered config to fully sync.

It looks like this is largely because the "commit" used by the new native helm renderer is just the name:version of the helm chart, and not a unique identifier for all inputs. We might need to hash the input values and include that in the commit string somehow.

It also occurs to me that this would happen for other spec fields as well, like the git dir. If you change the git dir after syncing has succeeded, there's no way to know if the sync status or Syncing condition are reflecting the desired state or not, because the commit hasn't changed.

@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 10, 2022

The TestInlineClusterSelectorFormat failure is similar.

The validation that is failing is a Validate right after a WaitForRepoSyncs following a non-commit change. In this case, it was the cluster name that's been changed, which is stored in the config-management-system/reconciler-manager ConfigMap as the CLUSTER_NAME data field.

Even tho the reconciler-manage Pod is replaced, and the root-reconciler is updated with the new CLUSTER_NAME env var, there's no way to communicate that input data in the RootSync status. So the WaitForRepoSyncs call returns too early.

@karlkfi karlkfi force-pushed the karl-polling-period branch 2 times, most recently from a146d94 to 3de3a28 Compare November 10, 2022 06:59
@karlkfi karlkfi requested a review from sdowell November 10, 2022 07:00
@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 10, 2022

/retest

1 similar comment
@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 10, 2022

/retest

- Use the -Period suffix consistently.
  Technically, most of these should be "Delay", because they're
  a timer between attempts and not the duration of the event itself,
  but we're already using "period" for most of them. So this at least
  makes them all consistent without requiring rename of any external
  facing env vars or command flags.
- Standardize the many duplicate descriptions for duration options,
  at each layer of the stack.
- Add missing constants to replace hard-coded durations.
- Reafactor existing duration constants to make the names consistent.

Change-Id: If77a2ffc845fce3d78d39d5f663eafcb8e4f67dd
@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 10, 2022

Extracted some test flake "fixes" this change uncovered:

Copy link
Contributor

@nan-yu nan-yu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nan-yu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit db1aca0 into GoogleContainerTools:main Nov 10, 2022
@karlkfi karlkfi deleted the karl-polling-period branch November 10, 2022 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants