Route53: retry single changes in a batch if the batch fails #1209
/assign @njuettner
Example log:
next iteration:
provider/aws.go
Outdated
    failedUpdate = true

    if len(b) > 1 {
        log.Error("Trying to submit changes one-by-one instead")
looks more like a debug log to me, or delete it
Hm, are you sure about this? If a transaction fails, in my opinion this should be logged at a more visible level than debug, because it needs manual intervention to fix.
provider/aws.go
Outdated
    }
    if _, err := p.client.ChangeResourceRecordSets(params); err != nil {
        failedUpdate = true
        log.Error("Failed submitting change, it will be retried in a separate change batch in the next iteration")
looks more like a debug log to me, or delete it
provider/aws.go
Outdated
            p.failedChangesQueue[z] = append(p.failedChangesQueue[z], groupedChanges...)
        }
    } else {
        log.Info("Change successful")
looks more like a debug log to me, or delete it
@devkid Thanks for the PR, only small comments about the logs from my side.
@devkid Nice idea to group the smaller changes into transactions with the same hostname. Could you double-check if that also works with the In order to allow TXT records for CNAMEs, which cannot co-exist with the same hostname, we map between them using a user-provided prefix on the TXT record (See here). In that configuration the CNAME and its corresponding TXT don't have the same hostname but rather differ by the fixed prefix. I think the transaction grouping needs to take this into account in order to have the corresponding records in the same transaction.
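To illustrate the concern above, here is a minimal sketch of a grouping key that strips a configured TXT prefix so that a CNAME and its ownership TXT record land in the same group. The `change` type, `groupKey`, `groupByOwner`, and the `prefix-` value are all hypothetical names made up for this example; none of them are taken from the PR or from ExternalDNS.

```go
package main

import (
	"fmt"
	"strings"
)

// change is a hypothetical, simplified stand-in for a single record change.
type change struct {
	Name string
	Type string
}

// groupKey returns the name a change should be grouped under.
// Assumption: a TXT record whose name starts with txtPrefix belongs to the
// record with the same name minus that prefix.
func groupKey(c change, txtPrefix string) string {
	if c.Type == "TXT" && txtPrefix != "" && strings.HasPrefix(c.Name, txtPrefix) {
		return strings.TrimPrefix(c.Name, txtPrefix)
	}
	return c.Name
}

// groupByOwner collects changes that must be submitted in one transaction.
func groupByOwner(changes []change, txtPrefix string) map[string][]change {
	groups := make(map[string][]change)
	for _, c := range changes {
		k := groupKey(c, txtPrefix)
		groups[k] = append(groups[k], c)
	}
	return groups
}

func main() {
	changes := []change{
		{Name: "app.example.org", Type: "CNAME"},
		{Name: "prefix-app.example.org", Type: "TXT"},
		{Name: "db.example.org", Type: "A"},
	}
	// Map iteration order is not deterministic, so the groups may print in any order.
	for name, group := range groupByOwner(changes, "prefix-") {
		fmt.Println(name, len(group))
	}
}
```

Grouping by plain hostname would put `app.example.org` and `prefix-app.example.org` into two separate transactions, which is exactly the inconsistency the comment warns about.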
@linki Good catch. Would it make sense to attach some kind of meta information to the TXT records (like: "I belong to this record")? Or is there even some information already?
@linki The existing What would you suggest going forward? Shall we merge this pull request first and tackle the "group batches so that TXT records are changed in the same transaction" afterwards, or should I come up with a solution in this pull request? For a solution I thought about two options:
With any of these options we could improve the grouping in the
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
@devkid do you mind rebasing, please?
bd0a4ab to ec4233b
ec4233b to 09e8994
@szuecs done.
/assign @njuettner
@alfredkrohmer would you mind addressing @linki's comments? We would love to get this finally merged as it would address some longstanding issues with the AWS provider.
If a single change fails during the retry, it will be added to a queue. In the next iteration, changes from this queue will be submitted after all other changes. When submitting single changes, they are always submitted as batches of changes with the same DNS name and ownership relation to avoid inconsistency between the record created and the TXT records.
a295d1e to 7dd84a5
Rebased and split into two commits for better reviewability. Only open discussion is around
@Raffo @szuecs @njuettner LGTM @alfredkrohmer Thanks a lot for persistently working on this! 🙏 It will solve a long-standing issue with ExternalDNS on AWS.
/lgtm
@Raffo Would you mind cutting a new release with this included?
@linki I did a release last weekend. I'm totally up for doing a release, but maybe let's aim for the beginning of February so that we have a few more changes. Wdyt?
That's OK. The sooner the better for me. I tested this fix through our pipeline but I would like to run the official version going forward: zalando-incubator/kubernetes-on-aws#5604
@linki sounds good. I'll try to get some more fixes in and then get a release cooking at the beginning of February.
If a single change fails during the retry, it will be added to a queue.
In the next iteration, changes from this queue will be submitted after
all other changes.
When submitting single changes, they are always submitted as batches of
changes with the same DNS name to avoid inconsistency between the record
created and the TXT records.
This closes #421.