Ubuntu 18.04 unable to resolve cognitiveservices DNS names #798
Comments
- Workaround issue resolving cognitiveservices names - actions/runner-images#798
Also, this does not repro on a new Azure Ubuntu 18.04 VM. It only repros on a DevOps Ubuntu 18.04 Hosted Agent. |
@mikeharder Hello, thank you for the details and the investigation. I was able to reproduce the issue on an Ubuntu 18.04 agent, and it seems something is configured incorrectly here, since file directly shows us that
Locally, we still use the 127.0.0.53 address, which is recorded in the file /etc/resolv.conf. It seems you are right: the local /etc/resolv.conf file needs to be linked with |
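To check which resolver a machine is configured to use, the nameserver entries in /etc/resolv.conf can be inspected. The following is a minimal sketch rather than anything taken from the thread: the function names and the sample file contents are hypothetical, and 192.0.2.53 is a documentation-range address standing in for an upstream DNS server.

```python
STUB_ADDRESS = "127.0.0.53"  # local stub resolver address mentioned in the thread


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not stripped or stripped.startswith(("#", ";")):
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_stub_resolver(resolv_conf_text):
    """True if any configured nameserver is the local stub address."""
    return STUB_ADDRESS in nameservers(resolv_conf_text)


# Hypothetical sample contents, for illustration only.
SAMPLE_STUB = """\
# managed by the local resolver
nameserver 127.0.0.53
options edns0
"""

SAMPLE_UPSTREAM = """\
nameserver 192.0.2.53
search example.internal
"""

print(uses_stub_resolver(SAMPLE_STUB))      # stub resolver in use
print(uses_stub_resolver(SAMPLE_UPSTREAM))  # stub resolver bypassed
```

On a real machine the same check could be run against the contents of /etc/resolv.conf, read with open("/etc/resolv.conf").read().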
@mikeharder I have created a Pull Request with the suggested workaround. As soon as all verification processes are complete, the workaround will be applied on Ubuntu 18 agents. Until then, your workaround is the best option here. |
@mikeharder We have added a fix for the issue to the image, and it will be rolled out next week. |
The PR breaks the agent build for self-hosted agent pools:
|
@nerijusk |
@Darleev: Sounds good. I don't think my workaround is the correct long-term solution, it's just sufficient to unblock our builds. The local (stub) DNS server should be able to resolve all DNS names. As I mentioned earlier, this doesn't repro on an Azure Ubuntu 18 VM, so you might want to start by figuring out which additional component on a DevOps Hosted Ubuntu 18 VM is causing this behavior difference. Maybe Docker? |
Last time I tested it I am pretty sure it did not repro on an Azure Ubuntu 18 VM. However, I just created a new Azure Ubuntu 18 VM and now it does repro until I use the same workaround. |
However, it does not repro on a Hyper-V VM created from |
@nerijusk, could you please append and validate the script with small changes?
|
@mikeharder @nerijusk We have applied a new fix according to the comment above, and it will be rolled out next week. |
Are we pulling in kernel upgrades with each new image? Do we suspect a DNS bug that was recently introduced into systemd? If so, we should probably file an issue upstream. |
@chkimes: It might be this bug, but I am not an expert in this area so I am not certain: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1822416
While it does not repro on a new VM created from the ISO, it does repro on a new Azure Ubuntu 18.04 image, so the issue does appear to be upstream from DevOps. One thing I know recently changed in the Azure Ubuntu 18.04 image is the default culture was changed from |
It's possible that they're related, but the behaviors appear to be different from the bug description. I'm seeing the DNS resolver return SERVFAIL while the linked ticket describes NXDOMAIN responses. I took a packet capture and, interestingly, I see the systemd resolver making an external DNS query and the response making it back to systemd, however it appears to completely ignore the response and re-issue queries 1 and 2 seconds later (likely due to a configured timeout).
|
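The distinction between SERVFAIL and NXDOMAIN discussed above comes from the RCODE field, the low four bits of the flags word in the 12-byte DNS message header (RFC 1035). The sketch below shows how that field can be read from raw response bytes such as those in a packet capture. The helper names are hypothetical, and the sample headers are synthetic rather than taken from the capture in this thread.

```python
import struct

# Standard RCODE values from RFC 1035.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def response_code(message):
    """Return the RCODE name from the header of a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS message header is 12 bytes long")
    _query_id, flags = struct.unpack("!HH", message[:4])
    rcode = flags & 0x000F  # RCODE is the low 4 bits of the flags word
    return RCODE_NAMES.get(rcode, "RCODE%d" % rcode)


def make_response_header(rcode, query_id=0x1234):
    """Build a synthetic response header (QR=1, RD=1, RA=1) for illustration."""
    flags = 0x8180 | (rcode & 0x000F)
    # Counts: one question, no answer, authority, or additional records.
    return struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)


print(response_code(make_response_header(2)))  # SERVFAIL
print(response_code(make_response_header(3)))  # NXDOMAIN
```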
@mikeharder The fix has been applied to the current images, and the initial DNS issue should no longer reproduce. Could you please verify? |
I think at least reporting the bug to systemd is the responsible thing to do here. It was clearly regressed in a recent release, so something broke and we shouldn't have to work around it. |
@Darleev @chkimes This workaround breaks things for some users, so we have to roll back the changes |
@chkimes @mikeharder That looks like a systemd-resolve bug that cannot be fixed on our side because of the possible unpredictable impact on other customers (example). As a workaround, I suggest using the approach described in the initial message. |
@Darleev, @chkimes: Last time I tested, I could repro this on a new Azure Ubuntu 18 VM, but not on a new Hyper-V VM created from the latest Ubuntu Server ISO. And I believe both VMs were using the same version of So I am not sure if this issue is in the base Ubuntu Server image, or specific to the Azure Ubuntu image. Do you know how to report issues against the Azure Ubuntu images? |
@mikeharder Could you please provide the output of these commands:
from the Hyper-V VM? It would help us understand the difference. |
@mikeharder, |
@Darleev: The output of the latter two commands appears to be identical on both an Azure Ubuntu 18 VM and a Hyper-V Ubuntu 18 VM (created from the Ubuntu Server ISO).
The first command appears to be identical in the "Global" section, with slight differences in the "Link" sections:
|
Just for reference #191441649 |
After much digging, I believe this is relevant: systemd/systemd#10672 I see that with EDNS extensions, we are specifying a maximum 512-byte response size. When Azure responds, the response does not include the final A record. If the EDNS extension is removed, Azure then responds with the final A record. I find this strange since neither response goes over 512 bytes. I'm still following up with Azure here, but perhaps we may be able to work around it by disabling the EDNS extension or attempting to raise the max response size. |
Successful query:
Unsuccessful query:
Notable difference:
And notable differences in the responses:
Interestingly, systemd-resolved is setting the maximum payload size to 512 regardless of whether EDNS0 is configured and regardless of what is sent to it for the payload size. I'm reasonably sure that the way to fix this is to increase the payload size that systemd-resolved is using but I can't find any details about how to do that in the docs. This explains why bypassing the local resolver was effective as a workaround. |
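For readers unfamiliar with the EDNS payload size mentioned above: in EDNS0 (RFC 6891), a client advertises the largest UDP response it will accept in the CLASS field of an OPT pseudo-record appended to the query. The sketch below builds an A-record query with and without that record so the difference is visible in the bytes. It is an illustration only and does not claim to reproduce how systemd-resolved constructs its packets. The function names and the host name example.cognitiveservices.azure.com are hypothetical.

```python
import struct


def encode_name(name):
    """Encode a host name as DNS labels (length-prefixed, zero-terminated)."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(name, query_id=0x1234, edns_payload_size=None):
    """Build a raw A-record query, optionally with an EDNS0 OPT record."""
    additional_count = 0 if edns_payload_size is None else 1
    # Flags 0x0100: standard query with recursion desired.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, additional_count)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    if edns_payload_size is None:
        return header + question
    # OPT pseudo-record: root name, TYPE=41, CLASS=advertised UDP payload
    # size, TTL=extended RCODE and flags (all zero here), RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_payload_size, 0, 0)
    return header + question + opt


HOST = "example.cognitiveservices.azure.com"  # hypothetical name
plain = build_query(HOST)
with_edns = build_query(HOST, edns_payload_size=512)

# The OPT record adds 11 bytes; bytes [-8:-6] carry the payload size.
print(len(with_edns) - len(plain))
print(struct.unpack("!H", with_edns[-8:-6])[0])
```

Passing a larger value for edns_payload_size changes only those two bytes of the query, which is the field the comment above suggests raising.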
Hello @mikeharder, In case of any questions or issues, feel free to contact us. |
Suspected root cause: Azure/WALinuxAgent#1673 |
Describe the bug
Ubuntu 18.04 is unable to resolve *.cognitiveservices.azure.com DNS names by default. As a workaround, we are bypassing the local (stub) DNS server using the following command:
This may be related to https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1822416 and/or #397.
Area for Triage:
Servers
Question, Bug, or Feature?:
Bug
Virtual environments affected
Expected behavior
Ubuntu 18.04 should be able to resolve *.cognitiveservices.azure.com DNS names by default.
Actual behavior
If we try to resolve a *.cognitiveservices.azure.com DNS name, it fails with SERVFAIL:
https://dev.azure.com/mharder/public/_build/results?buildId=634&view=logs&j=3dc411e8-b5bf-57f2-a8a7-b25d565c86b1&t=f636eda2-37c8-5cad-c3dc-807f9e9ed0bb&l=59
However, if we bypass the local (stub) DNS server using the following command:
Then the DNS name can be resolved successfully:
https://dev.azure.com/mharder/public/_build/results?buildId=634&view=logs&j=3dc411e8-b5bf-57f2-a8a7-b25d565c86b1&t=aef3c3f0-973b-547b-f96d-ab903995d1d8&l=59
This doesn't repro on Ubuntu 16.04:
https://dev.azure.com/mharder/public/_build/results?buildId=634&view=logs&j=88c4e28e-b89e-5514-cbb8-a3c153cbe716&t=a85e08da-6704-59a1-1759-4e62f4964eb3&l=59
Pipeline Sources: https://github.com/mikeharder/AzurePipelineTests/blob/f920bf50f72fe45c1d653a3bbaac9dcaf3df7682/azure-pipelines.yml