Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Update Syncing condition to include all errors #260

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

karlkfi
Copy link
Contributor

@karlkfi karlkfi commented Nov 3, 2022

  • Update the Syncing condition to reflect errors from all phases
    with the current commit. This prevents hiding/dropping errors,
    especially non-blocking errors.
  • Add/clarify some comments
  • Set ErrorSummary and ErrorSources to nil when empty (no errors)

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from karlkfi by writing /assign @karlkfi in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karlkfi karlkfi requested review from nan-yu and removed request for tiffanny29631 November 3, 2022 23:24
pkg/applier/kpt_applier.go Outdated Show resolved Hide resolved
pkg/parse/root.go Outdated Show resolved Hide resolved
pkg/parse/run.go Outdated Show resolved Hide resolved
@karlkfi karlkfi force-pushed the karl-kpt-applier branch 2 times, most recently from 9124c33 to f974e29 Compare November 4, 2022 21:02
@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Nov 4, 2022
@karlkfi karlkfi changed the title Clean up Applier Improve applier and status synchronization Nov 4, 2022
@karlkfi karlkfi force-pushed the karl-kpt-applier branch 2 times, most recently from d32e568 to 2baf2c5 Compare November 5, 2022 07:15
@google-oss-prow google-oss-prow bot added size/XXL and removed size/XL labels Nov 5, 2022
@karlkfi karlkfi force-pushed the karl-kpt-applier branch 2 times, most recently from dac0ce9 to 3c50011 Compare November 9, 2022 02:46
@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 9, 2022

/retest

@karlkfi karlkfi force-pushed the karl-kpt-applier branch 2 times, most recently from 64d517f to 52bc32c Compare November 9, 2022 23:07
@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 9, 2022

/retest

@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 10, 2022

Extracted the duration period cleanup: #273

@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 10, 2022

Extracted flakey bug fix: #274

@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 11, 2022

Extracted periodic status updates: #279

@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 11, 2022

Extracted part of the logic to combine sync errors: #281

The logic to preserve the non-applier errors from the updater will be done in a subsequent PR.

@karlkfi karlkfi changed the title Improve applier and status synchronization [WIP] Improve applier and status synchronization Nov 14, 2022
@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 16, 2022

Extracted thread-safety improvements to #301

@karlkfi karlkfi changed the title [WIP] Improve applier and status synchronization [WIP] Update Syncing condition to include all errors Nov 18, 2022
- Update the Syncing condition to reflect errors from all phases
  with the current commit. This prevents hiding/dropping errors,
  especially non-blocking errors.
- Add/clarify some comments
- Set ErrorSummary and ErrorSources to nil when empty (no errors)

Change-Id: Ica2dbae2b196a0fa0e2a4fbdc21d1775a27f4b0d
@karlkfi
Copy link
Contributor Author

karlkfi commented Nov 18, 2022

Extracted the hydration e2e changes: #310

@karlkfi karlkfi changed the title [WIP] Update Syncing condition to include all errors Update Syncing condition to include all errors Nov 18, 2022
pkg/parse/state.go Show resolved Hide resolved
pkg/parse/state.go Show resolved Hide resolved
Copy link
Contributor

@sdowell sdowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like the refactoring. Left a few questions/comments but overall LGTM

@@ -96,11 +89,3 @@ func (o *opts) k8sClient() client.Client {
func (o *opts) discoveryClient() discovery.ServerResourcer {
return o.discoveryInterface
}

func (o *opts) SetReconciling(value bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

@@ -41,7 +41,9 @@ type Resources struct {
objectSet map[core.ID]*unstructured.Unstructured
}

// Update performs an atomic update on the resource declaration set.
// Update performs an atomic update on the resource declaration set, converts
// the objects to Unstructured and validating that not all namespaces are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and validating -> and validates

@@ -429,35 +446,62 @@ func setSyncStatusFields(syncStatus *v1beta1.Status, newStatus syncStatus, denom
}

func setSyncStatusErrors(syncStatus *v1beta1.Status, cse []v1beta1.ConfigSyncError, denominator int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if this was usable for RenderingStatus and SourceStatus to minimize duplication

@@ -142,66 +142,74 @@ func Run(ctx context.Context, p Parser) {
}
}

// run executes the parts of the pipeline managed by the reconciler.
//
// Phases:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this! Would you mind adding a bit more context for each phase?

}
if state.needToSetSourceStatus(newSourceStatus) {
if err := p.setSourceStatus(ctx, newSourceStatus); err != nil {
parseStatus := sourceStatus{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm a bit confused as to what a sourceStatus is supposed to represent. Looks like here it's related to issues with parsing (i.e. marshalling?) but above it's related to issues reading some config files. Is it supposed to represent any error with reading from the filesystem at any stage of the pipeline?

reposync.SetSyncing(&rs, continueSyncing, "Source", "Source", newStatus.commit, errorSource, rs.Status.Source.ErrorSummary, newStatus.lastUpdate)
// Syncing is stopped/prevented if there are any errors from the same commit.
// TODO: Handle non-blocking errors
syncing := (syncErrorCount == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, syncing only depends on the source error. Now, it is changed to include all errors, including the former render or sync errors with the same commit.

It affects how the notification sends out alerts.

The current proposal is to leverage the Syncing condition to determine the sync status and send out alerts.

If the Syncing condition is false, it indicates the current apply loop is done. If there're no errors, then we send out success notification. Otherwise, it is failed.

In this case, if there're errors from the same commit, even if it might be fixed with the new loop, there will be a failure notification.

We can include previous errors with the same commit in the ErrorSummary to surface the errors earlier, but can we keep syncing only depend on the errors from the current phase (source, render, or sync)?

@sdowell FYI

Copy link
Contributor Author

@karlkfi karlkfi Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can include previous errors with the same commit in the ErrorSummary to surface the errors earlier, but can we keep syncing only depend on the errors from the current phase (source, render, or sync)?

It's not super clear what the Syncing condition was/is supposed to represent.

Internally, the phases are effectively Fetch/Read -> Render/Read -> Parse -> Update (Validate/Apply/Watch/Reconcile).

It's confusing to have the status.sync field represent just the errors from the Updater (which wraps the Applier) and then have the Syncing conditions which is set to True by the previous source and rendering phases.

This PR effectively assumes that since the source and rendering phases were setting the Syncing condition to True that the Syncing condition must include the entire pipeline, and not just the Update/Apply phase.

If that is NOT what we want, then we need to document the proposed change and implement it consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's worth noting that the inputs to a sync/apply are not just Git. They also can include Helm values in the spec and other config values, like Git dir, URL, cluster name, etc. So it doesn't make sense to have a unique pipeline execution depend on just a commit. Even just a generation would be technically insufficient, because the previous run may fail, perhaps due to some ephemeral network issue, triggering a retry. Today we don't indicate what generation or attempt/retry the Syncing condition represents, which makes it hard to use for triggering discrete alerts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Syncing condition represents the current/latest fetch -> render -> parse -> update pipeline, while status.source, status.rendering, and status.sync represent the statuses of the last processed commit in each phase.

It is possible that the Syncing condition only includes status of partial pipeline while the loop is still in progress.

For example,

  1. The first pipeline failed at commit A in the apply phase, status.sync will have errors for commit A.
  2. The second pipeline failed to fetch commit A, status.source will have errors for commit A, and status.sync still contains the former apply errors.
  3. The current pipeline succeeded in fetching and rendering commit A, source errors from the second pipeline will get cleared from status.source. Then the parseAndUpdate function will call p.setSourceStatus, which calls the setSourceStatusWithRetries function. The setSourceStatusWithRetries function should set Syncing condition to true because it will continue applying the parsed objects next. However, this PR sets the syncing condition to false because of the sync errors from the first pipeline.

@karlkfi karlkfi changed the title Update Syncing condition to include all errors [WIP] Update Syncing condition to include all errors Aug 30, 2023
@mikebz
Copy link
Contributor

mikebz commented Apr 17, 2024

curious if this is still WIP or if it's even relevant. If it is consider closing the PR and keeping the private branch or converting to draft.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants