[WIP] Update Syncing condition to include all errors #260

karlkfi · 2022-11-03T23:24:01Z

Update the Syncing condition to reflect errors from all phases
with the current commit. This prevents hiding/dropping errors,
especially non-blocking errors.
Add/clarify some comments
Set ErrorSummary and ErrorSources to nil when empty (no errors)

google-oss-prow · 2022-11-03T23:24:08Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from karlkfi by writing /assign @karlkfi in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pkg/applier/kpt_applier.go

pkg/parse/root.go

pkg/parse/run.go

e2e/testcases/multi_sync_test.go

karlkfi · 2022-11-09T19:59:48Z

/retest

karlkfi · 2022-11-09T23:25:53Z

/retest

karlkfi · 2022-11-10T00:22:38Z

Extracted the duration period cleanup: #273

karlkfi · 2022-11-10T00:37:58Z

Extracted flakey bug fix: #274

karlkfi · 2022-11-11T01:06:52Z

Extracted periodic status updates: #279

karlkfi · 2022-11-11T01:16:28Z

Extracted part of the logic to combine sync errors: #281

The logic to preserve the non-applier errors from the updater will be done in a subsequent PR.

karlkfi · 2022-11-16T02:49:07Z

Extracted thread-safety improvements to #301

- Update the Syncing condition to reflect errors from all phases with the current commit. This prevents hiding/dropping errors, especially non-blocking errors. - Add/clarify some comments - Set ErrorSummary and ErrorSources to nil when empty (no errors) Change-Id: Ica2dbae2b196a0fa0e2a4fbdc21d1775a27f4b0d

karlkfi · 2022-11-18T04:40:30Z

Extracted the hydration e2e changes: #310

pkg/parse/state.go

sdowell

Really like the refactoring. Left a few questions/comments but overall LGTM

sdowell · 2022-11-05T01:37:16Z

pkg/parse/opts.go

@@ -96,11 +89,3 @@ func (o *opts) k8sClient() client.Client {
 func (o *opts) discoveryClient() discovery.ServerResourcer {
 	return o.discoveryInterface
 }
-
-func (o *opts) SetReconciling(value bool) {


sdowell · 2022-11-18T18:32:14Z

pkg/declared/resources.go

@@ -41,7 +41,9 @@ type Resources struct {
 	objectSet map[core.ID]*unstructured.Unstructured
 }

-// Update performs an atomic update on the resource declaration set.
+// Update performs an atomic update on the resource declaration set, converts
+// the objects to Unstructured and validating that not all namespaces are


and validating -> and validates

sdowell · 2022-11-18T18:42:43Z

pkg/parse/root.go

@@ -429,35 +446,62 @@ func setSyncStatusFields(syncStatus *v1beta1.Status, newStatus syncStatus, denom
 }

 func setSyncStatusErrors(syncStatus *v1beta1.Status, cse []v1beta1.ConfigSyncError, denominator int) {


Would be nice if this was usable for RenderingStatus and SourceStatus to minimize duplication

sdowell · 2022-11-18T18:51:39Z

pkg/parse/run.go

@@ -142,66 +142,74 @@ func Run(ctx context.Context, p Parser) {
 	}
 }

+// run executes the parts of the pipeline managed by the reconciler.
+//
+// Phases:


Love this! Would you mind adding a bit more context for each phase?

sdowell · 2022-11-18T19:08:46Z

pkg/parse/run.go

-	}
-	if state.needToSetSourceStatus(newSourceStatus) {
-		if err := p.setSourceStatus(ctx, newSourceStatus); err != nil {
+	parseStatus := sourceStatus{


I think I'm a bit confused as to what a sourceStatus is supposed to represent. Looks like here it's related to issues with parsing (i.e. marshalling?) but above it's related to issues reading some config files. Is it supposed to represent any error with reading from the filesystem at any stage of the pipeline?

nan-yu · 2022-11-22T18:28:44Z

pkg/parse/namespace.go

-	reposync.SetSyncing(&rs, continueSyncing, "Source", "Source", newStatus.commit, errorSource, rs.Status.Source.ErrorSummary, newStatus.lastUpdate)
+	// Syncing is stopped/prevented if there are any errors from the same commit.
+	// TODO: Handle non-blocking errors
+	syncing := (syncErrorCount == 0)


Previously, syncing only depends on the source error. Now, it is changed to include all errors, including the former render or sync errors with the same commit.

It affects how the notification sends out alerts.

The current proposal is to leverage the Syncing condition to determine the sync status and send out alerts.

If the Syncing condition is false, it indicates the current apply loop is done. If there're no errors, then we send out success notification. Otherwise, it is failed.

In this case, if there're errors from the same commit, even if it might be fixed with the new loop, there will be a failure notification.

We can include previous errors with the same commit in the ErrorSummary to surface the errors earlier, but can we keep syncing only depend on the errors from the current phase (source, render, or sync)?

@sdowell FYI

We can include previous errors with the same commit in the ErrorSummary to surface the errors earlier, but can we keep syncing only depend on the errors from the current phase (source, render, or sync)?

It's not super clear what the Syncing condition was/is supposed to represent.

Internally, the phases are effectively Fetch/Read -> Render/Read -> Parse -> Update (Validate/Apply/Watch/Reconcile).

It's confusing to have the status.sync field represent just the errors from the Updater (which wraps the Applier) and then have the Syncing conditions which is set to True by the previous source and rendering phases.

This PR effectively assumes that since the source and rendering phases were setting the Syncing condition to True that the Syncing condition must include the entire pipeline, and not just the Update/Apply phase.

If that is NOT what we want, then we need to document the proposed change and implement it consistency.

Also, it's worth noting that the inputs to a sync/apply are not just Git. They also can include Helm values in the spec and other config values, like Git dir, URL, cluster name, etc. So it doesn't make sense to have a unique pipeline execution depend on just a commit. Even just a generation would be technically insufficient, because the previous run may fail, perhaps due to some ephemeral network issue, triggering a retry. Today we don't indicate what generation or attempt/retry the Syncing condition represents, which makes it hard to use for triggering discrete alerts.

The Syncing condition represents the current/latest fetch -> render -> parse -> update pipeline, while status.source, status.rendering, and status.sync represent the statuses of the last processed commit in each phase.

It is possible that the Syncing condition only includes status of partial pipeline while the loop is still in progress.

For example,

The first pipeline failed at commit A in the apply phase, status.sync will have errors for commit A.

The second pipeline failed to fetch commit A, status.source will have errors for commit A, and status.sync still contains the former apply errors.

The current pipeline succeeded in fetching and rendering commit A, source errors from the second pipeline will get cleared from status.source. Then the parseAndUpdate function will call p.setSourceStatus, which calls the setSourceStatusWithRetries function. The setSourceStatusWithRetries function should set Syncing condition to true because it will continue applying the parsed objects next. However, this PR sets the syncing condition to false because of the sync errors from the first pipeline.

mikebz · 2024-04-17T20:04:25Z

curious if this is still WIP or if it's even relevant. If it is consider closing the PR and keeping the private branch or converting to draft.

google-oss-prow bot requested review from sdowell and tiffanny29631 November 3, 2022 23:24

karlkfi requested review from nan-yu and removed request for tiffanny29631 November 3, 2022 23:24

google-oss-prow bot added the size/L label Nov 3, 2022

karlkfi commented Nov 3, 2022

View reviewed changes

pkg/applier/kpt_applier.go Outdated Show resolved Hide resolved

karlkfi commented Nov 3, 2022

View reviewed changes

pkg/parse/root.go Outdated Show resolved Hide resolved

karlkfi commented Nov 3, 2022

View reviewed changes

pkg/parse/run.go Outdated Show resolved Hide resolved

karlkfi force-pushed the karl-kpt-applier branch 2 times, most recently from 9124c33 to f974e29 Compare November 4, 2022 21:02

google-oss-prow bot added size/XL and removed size/L labels Nov 4, 2022

karlkfi force-pushed the karl-kpt-applier branch from f974e29 to b689d33 Compare November 4, 2022 21:10

karlkfi requested review from haiyanmeng and janetkuo November 4, 2022 21:11

karlkfi changed the title ~~Clean up Applier~~ Improve applier and status synchronization Nov 4, 2022

karlkfi force-pushed the karl-kpt-applier branch 2 times, most recently from d32e568 to 2baf2c5 Compare November 5, 2022 07:15

google-oss-prow bot added size/XXL and removed size/XL labels Nov 5, 2022

karlkfi commented Nov 5, 2022

View reviewed changes

e2e/testcases/multi_sync_test.go Outdated Show resolved Hide resolved

karlkfi force-pushed the karl-kpt-applier branch 2 times, most recently from dac0ce9 to 3c50011 Compare November 9, 2022 02:46

karlkfi force-pushed the karl-kpt-applier branch 2 times, most recently from 64d517f to 52bc32c Compare November 9, 2022 23:07

karlkfi mentioned this pull request Nov 10, 2022

Disable Syncing condition validation on error #275

Merged

karlkfi force-pushed the karl-kpt-applier branch from 52bc32c to 705661c Compare November 10, 2022 22:40

karlkfi changed the title ~~Improve applier and status synchronization~~ [WIP] Improve applier and status synchronization Nov 14, 2022

google-oss-prow bot added the do-not-merge/work-in-progress label Nov 14, 2022

karlkfi force-pushed the karl-kpt-applier branch from 705661c to e851d04 Compare November 18, 2022 02:05

karlkfi changed the title ~~[WIP] Improve applier and status synchronization~~ [WIP] Update Syncing condition to include all errors Nov 18, 2022

karlkfi force-pushed the karl-kpt-applier branch from e851d04 to e7a4194 Compare November 18, 2022 04:35

google-oss-prow bot added size/XL and removed size/XXL labels Nov 18, 2022

karlkfi changed the title ~~[WIP] Update Syncing condition to include all errors~~ Update Syncing condition to include all errors Nov 18, 2022

google-oss-prow bot removed the do-not-merge/work-in-progress label Nov 18, 2022

haiyanmeng reviewed Nov 18, 2022

View reviewed changes

pkg/parse/state.go Show resolved Hide resolved

haiyanmeng reviewed Nov 18, 2022

View reviewed changes

pkg/parse/state.go Show resolved Hide resolved

pkg/parse/state.go Show resolved Hide resolved

karlkfi mentioned this pull request Nov 18, 2022

[WIP] Improve rendering status e2e validation #310

Open

sdowell reviewed Nov 18, 2022

View reviewed changes

nan-yu reviewed Nov 22, 2022

View reviewed changes

karlkfi changed the title ~~Update Syncing condition to include all errors~~ [WIP] Update Syncing condition to include all errors Aug 30, 2023

google-oss-prow bot added the do-not-merge/work-in-progress label Aug 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Update Syncing condition to include all errors #260

[WIP] Update Syncing condition to include all errors #260

karlkfi commented Nov 3, 2022 •

edited

Loading

google-oss-prow bot commented Nov 3, 2022

karlkfi commented Nov 9, 2022

karlkfi commented Nov 9, 2022

karlkfi commented Nov 10, 2022

karlkfi commented Nov 10, 2022

karlkfi commented Nov 11, 2022

karlkfi commented Nov 11, 2022

karlkfi commented Nov 16, 2022

karlkfi commented Nov 18, 2022

sdowell left a comment

sdowell Nov 5, 2022

sdowell Nov 18, 2022

sdowell Nov 18, 2022

sdowell Nov 18, 2022

sdowell Nov 18, 2022

nan-yu Nov 22, 2022

karlkfi Nov 29, 2022 •

edited

Loading

karlkfi Nov 29, 2022

nan-yu Dec 6, 2022

mikebz commented Apr 17, 2024

		@@ -429,35 +446,62 @@ func setSyncStatusFields(syncStatus *v1beta1.Status, newStatus syncStatus, denom
		}

		func setSyncStatusErrors(syncStatus *v1beta1.Status, cse []v1beta1.ConfigSyncError, denominator int) {

[WIP] Update Syncing condition to include all errors #260

Are you sure you want to change the base?

[WIP] Update Syncing condition to include all errors #260

Conversation

karlkfi commented Nov 3, 2022 • edited Loading

google-oss-prow bot commented Nov 3, 2022

karlkfi commented Nov 9, 2022

karlkfi commented Nov 9, 2022

karlkfi commented Nov 10, 2022

karlkfi commented Nov 10, 2022

karlkfi commented Nov 11, 2022

karlkfi commented Nov 11, 2022

karlkfi commented Nov 16, 2022

karlkfi commented Nov 18, 2022

sdowell left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

karlkfi Nov 29, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikebz commented Apr 17, 2024

karlkfi commented Nov 3, 2022 •

edited

Loading

karlkfi Nov 29, 2022 •

edited

Loading