[WIP] - Smart bisect failed change batch #1641
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: OmerKahani. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Hi, so I started working on it, but I have a problem: TXT records should be in the same batch, and batches are grouped using the DNS name. The problem is that TXT records can have a prefix and suffix, so their name is different from the A/CNAME record. So far I came up with two solutions:
I also think this is an issue today. Both solutions are big and will make the PR even bigger 😞, so I will be happy to hear your input before solving it.
Tested on master: maybe it would be better to fix this bug before merging this PR?
```go
	time.Sleep(p.batchChangeInterval)
}
for i, b := range batchCs {
	if p.ChangeZone(ctx, b, z, zoneName) {
```
Let's take the opportunity to invert the return value here (or return an error). It's unexpected that `p.ChangeZone() => true` means that an error happened.
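A hedged sketch of that suggestion. `changeZone` below is a hypothetical stand-in for `p.ChangeZone`, not the PR's actual code; it only illustrates the error-return idiom the reviewer is asking for:

```go
package main

import (
	"errors"
	"fmt"
)

// changeZone is a hypothetical stand-in for p.ChangeZone. Returning an
// error (nil on success) is less surprising than returning true to
// signal that an error happened.
func changeZone(batch []string) error {
	if len(batch) == 0 {
		return errors.New("empty batch")
	}
	// ... apply the batch against the DNS zone here ...
	return nil
}

func main() {
	for _, batch := range [][]string{{"a.example.org", "txt-a.example.org"}, {}} {
		if err := changeZone(batch); err != nil {
			fmt.Println("apply failed:", err)
			continue
		}
		fmt.Println("applied", len(batch), "records")
	}
}
```

The caller then reads naturally as `if err := changeZone(...); err != nil { ... }` instead of `if p.ChangeZone(...) { /* error?! */ }`.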
```go
size := len(names) / 2
```
We might want to ensure the minimum size is 2, because the main record and the ownership TXT record should be applied in the same transaction so that either both are applied or neither is.
For safety we should also ensure that `len(names)` is even, I guess. Otherwise this logic might become incorrect.
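A minimal sketch combining both suggestions above. The helper name `splitBatch` is hypothetical; the idea is to refuse to bisect below two names and to keep the cut on an even boundary so a record and its ownership TXT record always land in the same half:

```go
package main

import "fmt"

// splitBatch halves a list of record names for bisection. It refuses to
// split a batch of 2 or fewer (the main record and its ownership TXT
// record must go in one transaction) or a batch with an odd count, and
// it keeps the cut point even so record/TXT pairs are never separated.
func splitBatch(names []string) (left, right []string, ok bool) {
	if len(names) <= 2 || len(names)%2 != 0 {
		return names, nil, false // too small or odd: don't bisect
	}
	size := len(names) / 2
	if size%2 != 0 {
		size++ // move the cut onto a pair boundary
	}
	return names[:size], names[size:], true
}

func main() {
	left, right, ok := splitBatch([]string{
		"x.example.org", "txt-x.example.org",
		"y.example.org", "txt-y.example.org",
	})
	fmt.Println(len(left), len(right), ok)
}
```

This assumes the names are ordered so each record is adjacent to its TXT companion, which the batching code would need to guarantee.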
```go
	time.Sleep(p.batchChangeInterval)
}
for i, b := range batchCs {
	if p.ChangeZone(ctx, b, z, zoneName) {
```
We could introduce a feature flag here that can be used to toggle between the old approach and the better bisect handling. I think it's fairly easy to do and it would greatly help with getting this merged.
There's an alternative approach worth trying that solves the TXT record problem, enables this feature for all providers, and might be even easier to do. You could implement this as part of the planning phase here. The Plan holds the current and desired records and calculates the raw diff of creates, updates and deletes needed to turn the current records into the desired ones. After the diff is calculated, the changes are post-processed by different policies. We use that, for instance, to strip out all deletions when running in upsert-only mode. You could very easily write and test a "DoHalf" policy that strips out half the endpoints from all creates, updates and deletes. Then you could use that policy to reduce the change set in case ApplyChanges fails:

```go
for {
	err = c.Registry.ApplyChanges(ctx, plan.Changes)
	if err == nil {
		break // needs some better logic to avoid infinite loops.
	}
	plan.Changes = DoHalf{}.Apply(plan.Changes) // strips out some of the desired changes, e.g. half of everything.
}
```

Note that this happens before any of the TXT record stuff and works for all providers. It should even be fine to continue once one batch has been processed successfully. The reconciling nature of ExternalDNS will pick up the remaining records in the next loop. That might avoid the complicated logic of keeping track of the partitions, or recursion. However, with so many providers down the processing line, I feel there are some that won't play nice with this DoHalf policy. This is just an idea of how it could also be solved. Feel free to continue with the current approach as well.
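A hedged sketch of what such a `DoHalf` policy could look like. The `Changes` and `Endpoint` types below are simplified stand-ins for ExternalDNS's `plan.Changes` and `endpoint.Endpoint`; this illustrates the idea, not the project's actual API:

```go
package main

import "fmt"

// Endpoint is a simplified stand-in for endpoint.Endpoint.
type Endpoint struct{ DNSName string }

// Changes is a simplified stand-in for plan.Changes.
type Changes struct {
	Create, UpdateOld, UpdateNew, Delete []*Endpoint
}

// DoHalf is a hypothetical post-processing policy that keeps only the
// first half of every slice of changes, shrinking the change set each
// time ApplyChanges fails.
type DoHalf struct{}

// halve keeps the first ceil(n/2) endpoints so a single remaining
// endpoint is never dropped entirely.
func halve(eps []*Endpoint) []*Endpoint {
	return eps[:(len(eps)+1)/2]
}

// Apply trims each change list. UpdateOld and UpdateNew are halved the
// same way, so matching old/new pairs stay aligned.
func (DoHalf) Apply(c *Changes) *Changes {
	return &Changes{
		Create:    halve(c.Create),
		UpdateOld: halve(c.UpdateOld),
		UpdateNew: halve(c.UpdateNew),
		Delete:    halve(c.Delete),
	}
}

func main() {
	c := &Changes{Create: []*Endpoint{{"a"}, {"b"}, {"c"}, {"d"}}}
	c = DoHalf{}.Apply(c)
	fmt.Println(len(c.Create)) // 2
}
```

As the comment in the loop above notes, a real implementation would also need a termination condition, e.g. giving up once the change set can no longer shrink.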
@OmerKahani The bug you see with one batch containing only the TXT records: Is that only when you use …
Do you think we can push this kind of change? It feels that changing this code will result in a much bigger change (not in code, but in concept). It is a better approach, I just want to make sure I will be able to merge the code :)
/kind bug
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@OmerKahani: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
fixes #1517
fixes #731