-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-15605: Update bufsize to 1232 bytes #370
Conversation
@gcs278: This pull request references Jira Issue OCPBUGS-15605, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest-required |
Verified using pre-image verification. melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get configmaps/dns-default -o yaml | grep -i bufsize |
@gcs278: This pull request references Jira Issue OCPBUGS-15605, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign |
@gcs278: This pull request references Jira Issue OCPBUGS-15605, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@gcs278: This pull request references Jira Issue OCPBUGS-15605, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/approve We'll also run some payload tests before merging. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Miciah The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
As endorsed at [DNS Flag Day](http://www.dnsflagday.net/2020/), the DNS Community recommends a bufsize setting of 1232 as a safe default that supports larger payloads, while generally avoiding IP fragmentation on most networks. This is particularly relevant for payloads like those generated by DNSSEC, which tend to be larger. Previously, CoreDNS always used the EDNS0 extension, which enables UDP-based DNS queries to exceed 512 bytes, when CoreDNS forwarded DNS queries to an upstream name server, and so OpenShift specified a bufsize setting of 512 to maintain compatibility with applications and name servers that did not support the EDNS0 extension. Currently, CoreDNS no longer uses EDNS0 when forwarding requests if the original client request did not use EDNS0, and so the reasoning for using a bufsize setting of 512 no longer applies. By increasing the bufsize setting to the recommended value of 1232 bytes, we can enhance DNS performance by decreasing the probability of DNS truncations. This PR updates the bufsize setting for CoreDNS from 512 to 1232 bytes.
/payload 4.14 nightly blocking |
@gcs278: trigger 8 job(s) of type blocking for the nightly release of OCP 4.14
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a1b9a5e0-3c51-11ee-9da9-eff9ee711bbc-0 |
@gcs278: This pull request references Jira Issue OCPBUGS-15605, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lgtm |
|
@gcs278: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
I've reviewed the payload results: https://pr-payload-tests.ci.openshift.org/runs/ci/a1b9a5e0-3c51-11ee-9da9-eff9ee711bbc-0. 4 passes and 4 failures, but nothing stands out as a DNS regression yet. The 4 failures:
The other 4 payload jobs succeed. It's probably worth kicking off these two jobs for major unrelated failures: But, generally speaking, I don't see any reason to be concerned yet. |
@gcs278: trigger 2 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/77e68c70-3c8c-11ee-8d44-3ef99b9b171f-0 |
2nd try of payload testing:
No indication that payload testing is failing due to this DNS update. I feel we are ready to merge. |
@gcs278: Jira Issue OCPBUGS-15605: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-15605 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
As endorsed at DNS Flag Day, the DNS Community recommends a bufsize setting of 1232 as a safe default that supports larger payloads, while generally avoiding IP fragmentation on most networks. This is particularly relevant for payloads like those generated by DNSSEC, which tend to be larger.
Previously, CoreDNS always used the EDNS0 extension, which enables UDP-based DNS queries to exceed 512 bytes, when CoreDNS forwarded DNS queries to an upstream name server, and so OpenShift specified a bufsize setting of 512 to maintain compatibility with applications and name servers that did not support the EDNS0 extension.
For clients and name servers that do support EDNS0, a bufsize setting of 512 can result in more DNS truncation and unnecessary TCP retransmissions, resulting in worse DNS performance for most users. This is due to the fact that if a response is larger than the bufsize setting, it gets truncated, prompting clients to initiate a TCP retry. In this situation, two DNS requests are made for a single DNS answer, leading to higher bandwidth usage and longer response times.
Currently, CoreDNS no longer uses EDNS0 when forwarding requests if the original client request did not use EDNS0 (ref: coredns/coredns@a5b9749), and so the reasoning for using a bufsize setting of 512 no longer applies. By increasing the bufsize setting to the recommended value of 1232 bytes, we can enhance DNS performance by decreasing the probability of DNS truncations.
Using a larger bufsize setting of 1232 bytes also would potentially help alleviate bugs like https://issues.redhat.com/browse/OCPBUGS-6829 in which a non-compliant upstream DNS is not respecting a bufsize of 512 bytes and sending larger-than-512-bytes responses. A bufsize setting of 1232 bytes doesn't fix the root cause of this issue; rather, it decreases the likelihood of its occurrence by increasing the acceptable size range for UDP responses.
Note that clients that don’t support EDNS0 or TCP, such as applications built using older versions of Alpine Linux, are still subject to the aforementioned truncation issue. To avoid these issues, ensure that your application is built using a DNS resolver library that supports EDNS0 or TCP-based DNS queries.
Brief history of OpenShift's Bufsize changes:
Slack Thread