Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Patch() when binding-status controller update workload's status #4094

Merged

Conversation

zach593
Copy link
Contributor

@zach593 zach593 commented Sep 30, 2023

Change-Id: Ia0b5fde0b9d69f73aa6c2e3135e2590a6a503075

What type of PR is this?

/kind cleanup

What this PR does / why we need it:
Discribed in #4093

Which issue(s) this PR fixes:
Fixes #4093

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-controller-manager: use Patch() instead of Update() when updating workload status because error rate is too high.`

@karmada-bot karmada-bot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Sep 30, 2023
@karmada-bot karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 30, 2023
@codecov-commenter
Copy link

codecov-commenter commented Sep 30, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (557348f) 53.48% compared to head (d423206) 52.86%.
Report is 63 commits behind head on master.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4094      +/-   ##
==========================================
- Coverage   53.48%   52.86%   -0.62%     
==========================================
  Files         234      239       +5     
  Lines       23304    23592     +288     
==========================================
+ Hits        12465    12473       +8     
- Misses      10157    10442     +285     
+ Partials      682      677       -5     
Flag Coverage Δ
unittests 52.86% <86.20%> (-0.62%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
pkg/util/helper/patch.go 85.71% <100.00%> (+21.00%) ⬆️
pkg/controllers/status/common.go 0.00% <0.00%> (ø)

... and 16 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zach593
Copy link
Contributor Author

zach593 commented Oct 9, 2023

/retest

@karmada-bot
Copy link
Collaborator

@zach593: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RainbowMango
Copy link
Member

Hi @zach593 Karmada hasn't enable /retest now, and it is tracked by #3421

The failing CI is due to a CI issue and it was resolved by #4107 yesterday. You might need to rebase this PR and push again.

@zach593
Copy link
Contributor Author

zach593 commented Oct 9, 2023

Cool, thanks for following this.

@zach593 zach593 force-pushed the fix-binding-status-error-rate branch 3 times, most recently from a338ebc to f4828cb Compare October 12, 2023 07:40
@RainbowMango
Copy link
Member

cc @chaunceyjiang @XiShanYongYe-Chang Can you help with this?

@XiShanYongYe-Chang
Copy link
Member

ok
/assign

Comment on lines 84 to 86
if newStatus != nil {
j, err := json.Marshal(newStatus)
if err != nil {
klog.Errorf("Failed to marshal status for resource(%s/%s/%s, Error: %v", gvr, resource.GetNamespace(), resource.GetName(), err)
return err
}
statusPatch = []byte(fmt.Sprintf(`[{"op": "replace", "path": "/status", "value": %s}]`, j))
} else {
statusPatch = []byte(`[{"op": "remove", "path": "/status"}]`)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to consider using helper.GenMergePatch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XiShanYongYe-Chang The reason I used JSONPatchType instead of MergePatchType is that I am worried that some fields may not be deleted when patch status, because we do not understand and cannot modify the definition of resource templates and the logic of related controllers. We even provide a webhook so that users can customize the status.

If you think jsonpatch is acceptable, I can move this part of the construction logic to helper/patch.go (I didn't notice this file existed before); or if you think merge is still a better choice, Then I will update this part.

Obviously I should reply here...😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) haha

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done some research and conducted some tests, and it seems that MergePatch can completely replace JSONPatch. Therefore, I choose to use MergePatch here. We can monitor if anyone reports any issues in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor Author

@zach593 zach593 Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reminder from my teammate (@lxtywypc) made me rethink this. What JSONPatch does is completely replace the status field, but MergePatch is content-based. This PR is intended to resolve update conflicts, which means we often keep outdated resource templates. If there are some field changes between this copy and the copy in etcd (which has not been "watched" yet), then patch will set the wrong status into etcd.

So I guess we have to move to JSONPatch... I apologize for my capriciousness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind. Full discussion and testing gave us a much better result 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. kindly ping @XiShanYongYe-Chang to review again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your explanation, I probably understand what you mean, can you help write an example to describe it, so that it is easier to follow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple example could be like this: playground.
If everything works on time but the informer holding an outdated resource template (currentA), using mergePatch may create an inappropriate mergePatchBytes for actual template object (currentB).

@zach593 zach593 force-pushed the fix-binding-status-error-rate branch 3 times, most recently from 14d6534 to 5aac08a Compare October 20, 2023 02:22
@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 20, 2023
Change-Id: I531e40fa99e657af3c060e59c13c47a835356274
Signed-off-by: zach593 <zach_li@outlook.com>
@zach593 zach593 force-pushed the fix-binding-status-error-rate branch from 5aac08a to 8520227 Compare October 20, 2023 02:28
Copy link
Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zach593.
I'm sorry for replying late~

pkg/controllers/status/common.go Show resolved Hide resolved
klog.V(3).Infof("Ignore update resource(%s/%s/%s) status as up to date.", gvr, resource.GetNamespace(), resource.GetName())
return nil
}

if _, err = dynamicClient.Resource(gvr).Namespace(resource.GetNamespace()).UpdateStatus(context.TODO(), newObj, metav1.UpdateOptions{}); err != nil {
klog.Errorf("Failed to update resource(%s/%s/%s), Error: %v", gvr, resource.GetNamespace(), resource.GetName(), err)
patchBytes, err := helper.GenReplaceFieldJSONPatch("/status", oldStatus, newStatus)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a bit of a duplicate of the question above, for resources without status, can the program execute up to here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will return []byte(`[]`), nil, []byte(`[]`) representing an empty jsonpatch, which will be safely passed to apiserver when patching, the objects we patch will always no change. You can test it with kubectl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw that GenMergePatch() returns nil if patchBytes == "{}", so I followed this, returned nil when no patching is required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question, can we use retry.RetryOnConflict() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could reduce the reconciliation error rate, but can't reduce the HTTP PUT error rate. I feel that it is difficult to convince users that a controller’s HTTP update error rate for a certain type of resource consistently above 30% is within the normal range...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how to prevent empty patches from being sent to apiserver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a DeepEqual() judge in line 77

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, thanks for reminding me. can you make the patch more precise?

status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-12-05T09:25:15Z"
    lastUpdateTime: "2023-12-05T09:25:15Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2023-11-21T11:43:12Z"
    lastUpdateTime: "2023-12-05T09:25:15Z"
    message: ReplicaSet "demo-57755f7c74" has successfully
      progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 6
  readyReplicas: 0
  replicas: 1
  updatedReplicas: 1

if only the field readyReplicas changes from 0 to 1, we'd better use such a patch rather than the entire status:

[{
"op":"replace",
"path":"/status/readyReplicas",
"value":1
}]

Copy link
Contributor Author

@zach593 zach593 Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe implementing such functionality can indeed make the final patch bytes appear very concise in certain situations (as you illustrated). However, it may introduce other issues: if there's a need to update multiple fields, should a separate patch operation be written for each field? If it's necessary to consolidate certain patch operations into one, what is the appropriate timing for the consolidation?

Implementing it this way seems like using JSONPatch to emulate MergePatch. If the goal is simplicity in patch operations, it might prove to be more trouble than it's worth in many scenarios, and the implementation is likely to be less elegant.

If the concern is that other controllers might modify the status, as mentioned in #4093 (comment), I believe this possibility is low. In the current implementation, .status is returned by the interpreterWebhook, and users have the opportunity to introduce other mechanisms for custom handling.

pkg/util/helper/patch_test.go Outdated Show resolved Hide resolved
pkg/util/helper/patch.go Show resolved Hide resolved
@zach593 zach593 force-pushed the fix-binding-status-error-rate branch 2 times, most recently from a92e5fc to e79f44c Compare November 5, 2023 09:52
Copy link
Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot~
Can we then use this logic for ResourceBinding and Work status updates?

/lgtm
/cc @RainbowMango

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 6, 2023
Change-Id: I68a9b2d293c35e10f287fdc4a625ded3da4848ba
Signed-off-by: zach593 <zach_li@outlook.com>
@zach593 zach593 force-pushed the fix-binding-status-error-rate branch from e79f44c to d423206 Compare November 6, 2023 05:41
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 6, 2023
@zach593
Copy link
Contributor Author

zach593 commented Nov 6, 2023

Updated some UT test name.. @XiShanYongYe-Chang

While I feel that a controller that takes ownership of an object should not cause conflict errors when updating an object, and there seems to be no obvious problem with using Patch() to replace Update(), extending the use of Patch() may still require more discussion.

@XiShanYongYe-Chang
Copy link
Member

I got it, thanks @zach593
/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 6, 2023
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign

@XiShanYongYe-Chang
Copy link
Member

Kindly ping @RainbowMango

@RainbowMango
Copy link
Member

Invite controller and util owers to help with it:
cc @chaunceyjiang @whitewindmills

@chaunceyjiang
Copy link
Member

/assign

@chaunceyjiang
Copy link
Member

Nice!!!

/lgtm

@RainbowMango
Copy link
Member

Hi @whitewindmills Do you have any comments?

@whitewindmills
Copy link
Member

/lgtm

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 9, 2023
@karmada-bot karmada-bot merged commit 33fdf52 into karmada-io:master Dec 9, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Binding-status controller update workload's status error rate too high
7 participants