No name resolution on new Ubuntu 18 vm #1151

Closed
dakale opened this issue May 7, 2018 · 15 comments

Comments

@dakale

dakale commented May 7, 2018

Apologies if this is not the correct place for this issue.

My team is building templates with Packer and deploying VMs with az group deployment create ..., based loosely on these docs: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/build-image-with-packer

I noticed that after the provisioning done by Packer, waagent deletes resolv.conf. This is fine, and works as expected when deploying an Ubuntu 16 VM.

However, as we attempt to move to Ubuntu 18, /etc/resolv.conf is missing once the VM is created, meaning it is not recreated when the VM is created from the disk image. nslookup microsoft.com times out, and no names are resolved when using, e.g., curl. systemd-resolve, however, does work.

Is this something waagent is responsible for?

@boumenot
Member

boumenot commented May 8, 2018

I think this is a good place to start.

I would expect the DHCP client to create /etc/resolv.conf, but that may have changed with 18.04. The fact that the agent deleting it causes everything to break seems odd, but I don't know. I will need to investigate and try to reproduce.

@dakale
Author

dakale commented May 8, 2018

As an update I found out by reading this unrelated bug that:

cloud-init has never updated resolv.conf directly.
/etc/resolv.conf in 16.04 is managed via resolvconf through /etc/network/interfaces.
/etc/resolv.conf in 18.04 is managed via systemd-resolve (netplan -> systemd-networkd -> systemd-resolve).
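
One way to see which of these mechanisms owns the file on a given VM is to look at where /etc/resolv.conf points. The following is a minimal sketch, not part of waagent or cloud-init. The function name describe_resolv_conf and the labels it returns are made up for illustration, and the target paths are simply the ones mentioned in this thread.

```python
import os

# Symlink targets mentioned in this thread, mapped to the component
# that is assumed to manage them (labels are illustrative only).
KNOWN_TARGETS = {
    "/run/resolvconf/resolv.conf": "resolvconf (Ubuntu 16.04 style)",
    "/run/systemd/resolve/stub-resolv.conf": "systemd-resolve stub (Ubuntu 18.04 style)",
    "/run/systemd/resolve/resolv.conf": "systemd-resolve upstream servers",
}


def describe_resolv_conf(path="/etc/resolv.conf"):
    """Return a short description of what the given resolv.conf is."""
    # lexists() also reports dangling symlinks, unlike exists().
    if not os.path.lexists(path):
        return "missing"
    if not os.path.islink(path):
        return "regular file (not a symlink)"
    target = os.path.realpath(path)
    return KNOWN_TARGETS.get(target, "symlink to " + target)
```

Calling describe_resolv_conf() with no arguments inspects the real /etc/resolv.conf. On the broken 18.04 VMs described here it would be expected to report "missing".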

So, based on systemd-resolve, I can just symlink /run/systemd/resolve/resolv.conf into /etc/. For now, this appears to work. However, I'm not sure if this should be done during the initial provisioning (via Packer), because I'll be using the template to create VMs in various different networks.

Again, not sure if this is related to waagent, but could this be something waagent takes care of when creating an 18.x vm?

@boumenot
Member

boumenot commented May 8, 2018

The agent has two pieces: provisioning and extension handling. The provisioning piece is handled by cloud-init on Ubuntu, and will eventually be used by CentOS, RHEL, and others. What you are describing is the provisioning piece, so I do not think the agent is the right piece to handle this.

/cc @paulmey - can you shed any light on this discussion?

@dakale
Author

dakale commented May 8, 2018

Okay, thanks for the info. Is it correct to assume there is some built-in cloud-init configuration in the Azure Ubuntu SKUs? Based on that, I can see how this is likely not an agent issue, so feel free to close.

In the meantime, I'll try providing a custom cloud-init configuration to link resolv.conf, and try to find out where to follow up regarding cloud-init on Ubuntu 18.

@dakale
Author

dakale commented May 8, 2018

Additionally, I have found out that my Ubuntu 18 VM "works" if I remove the Packer step that runs waagent to deprovision. By "works" I mean /etc/resolv.conf is not deleted (obviously) and it remains a symlink to /run/systemd/resolve/stub-resolv.conf.

Maybe it could be an option to deprovision but omit the step that removes resolv.conf? Admittedly, I do not fully understand why removing resolv.conf is needed, or whether it is still needed in Ubuntu 18, where this file is managed differently.

@paulmey
Member

paulmey commented May 8, 2018

TL;DR: this is basically #855. waagent should not be deleting files that it didn't create. resolv.conf belongs to a package in the network stack, and if waagent messes with it, then it is basically implying a contract that doesn't exist.

In 18.04, DHCP, DNS (and almost all other networking services) switched to using systemd-networkd and related systemd services.

@dakale: waagent now has code to do a 'light provisioning' if it detects that it has woken up as a new instance. This might cause trouble when using extensions, but if you're only using Packer (which doesn't use extensions), it is probably a reasonable workaround until waagent stops messing with other packages' files.

@hglkrijger
Member

@paulmey I agree, we should not be touching this file. I merged a fix to master.

@dakale the referenced change in #1164 should address your issue, if you are installing from source. If you have any problems, please reopen this issue.

@carlosporter

carlosporter commented Aug 4, 2018

To whom it may concern,

The issue described happens when an Ubuntu 18.04 VM is deployed from an image deprovisioned with waagent.

When a new server is deployed, the /etc/resolv.conf symlink to /run/systemd/resolve/stub-resolv.conf is not restored, and as a consequence the server cannot find a DNS server to send queries to.

Restoring the link manually does fix the problem after the customized virtual machine has been deployed. However, to prevent further deployments from having the same problem, it is necessary to deprovision and then restore the symlink as the root user on the original virtual machine (you need root privileges to restore the symlink, and sudo won't work after "waagent -deprovision+user").

So you should start the creation of a new Ubuntu 18.04 server image with these commands:

sudo -i

waagent -deprovision+user

cd /etc; ln -s ../run/systemd/resolve/stub-resolv.conf resolv.conf
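
The last command above can also be written as a small script that is safe to run more than once. This is only a sketch under the assumptions in this comment: it uses the same relative target, the helper name restore_resolv_symlink is hypothetical, and it would need to run as root when pointed at the real /etc.

```python
import os


def restore_resolv_symlink(etc_dir="/etc",
                           target="../run/systemd/resolve/stub-resolv.conf"):
    """Recreate etc_dir/resolv.conf as a relative symlink if it is missing.

    Returns True if a link was created, False if something already exists.
    """
    link_path = os.path.join(etc_dir, "resolv.conf")
    # lexists() is True even for a dangling symlink, so an existing
    # file or link is never overwritten.
    if os.path.lexists(link_path):
        return False
    os.symlink(target, link_path)
    return True
```

Unlike a bare ln -s, the function does nothing when resolv.conf is already present, so it can be left in a provisioning script without failing on images that were never deprovisioned.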

@gustavomcarmo

The issue persists in Ubuntu 18.04. @carlosporter's solution works fine. I've just submitted feedback on https://docs.microsoft.com/en-us/azure/virtual-machines/linux/build-image-with-packer

@Rangi-th

Rangi-th commented Oct 5, 2018

The fix mentioned in #1164 is not found where expected in WA Linux Agent version 2.2.31, the version that comes with the Ubuntu 18.04 image. Maybe Canonical has not pushed the fix into the WA Linux Agent in their repositories?

The file /usr/lib/python3/dist-packages/azurelinuxagent/pa/deprovision/ubuntu.py has only the content below:

import os
import azurelinuxagent.common.utils.fileutil as fileutil
from azurelinuxagent.pa.deprovision.default import DeprovisionHandler, \
    DeprovisionAction


class UbuntuDeprovisionHandler(DeprovisionHandler):
    def __init__(self):
        super(UbuntuDeprovisionHandler, self).__init__()

    def del_resolv(self, warnings, actions):
        if os.path.realpath(
                '/etc/resolv.conf') != '/run/resolvconf/resolv.conf':
            warnings.append("WARNING! /etc/resolv.conf will be deleted.")
            files_to_del = ["/etc/resolv.conf"]
            actions.append(DeprovisionAction(fileutil.rm_files, files_to_del))
        else:
            warnings.append("WARNING! /etc/resolvconf/resolv.conf.d/tail "
                            "and /etc/resolvconf/resolv.conf.d/original will "
                            "be deleted.")
            files_to_del = ["/etc/resolvconf/resolv.conf.d/tail",
                            "/etc/resolvconf/resolv.conf.d/original"]
            actions.append(DeprovisionAction(fileutil.rm_files, files_to_del))
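
The handler above deletes /etc/resolv.conf whenever it does not resolve to /run/resolvconf/resolv.conf, which on 18.04 includes the symlink to /run/systemd/resolve/stub-resolv.conf. As a rough illustration only, the decision could leave files under /run/systemd/resolve alone. This is not the change merged in #1164, and files_to_delete is a hypothetical standalone helper, not part of azurelinuxagent.

```python
import os

RESOLVCONF_TARGET = "/run/resolvconf/resolv.conf"
SYSTEMD_RESOLVE_DIR = "/run/systemd/resolve"


def files_to_delete(resolved_path):
    """Pick deprovisioning targets from the real path of /etc/resolv.conf.

    Hypothetical helper: mirrors the two branches of del_resolv above and
    adds one branch that leaves a systemd-managed symlink untouched.
    """
    if resolved_path == RESOLVCONF_TARGET:
        # resolvconf-managed (16.04 style): same files the original removes.
        return ["/etc/resolvconf/resolv.conf.d/tail",
                "/etc/resolvconf/resolv.conf.d/original"]
    if os.path.dirname(resolved_path) == SYSTEMD_RESOLVE_DIR:
        # Points into /run/systemd/resolve (18.04 style): delete nothing.
        return []
    # Anything else: the original behavior of removing /etc/resolv.conf.
    return ["/etc/resolv.conf"]
```

The caller would pass os.path.realpath('/etc/resolv.conf'), as the original method does, and hand the returned list to whatever performs the deletion.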

@AgSync-Aaron

I have been beating my head against the wall with this issue for the past several days. I can confirm that waagent deletes the resolv.conf file and I get no DNS name resolution on Ubuntu 18.04. Please re-open this issue.

@carlosporter

@AgSync-Aaron what version of the agent are you running on the server you used to create the image?

@AgSync-Aaron

$ waagent --version
WALinuxAgent-2.2.20 running on ubuntu 18.04
Python: 3.6.6
Goal state agent: 2.2.34

👆 That is the version that's in the default Ubuntu 18.04 image when creating a VM through portal.azure.com
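
The output above reports two different versions: the installed package and the goal state agent. For anyone collecting this from several images, a small parser along these lines could pull both out. It is a sketch that assumes the output format shown in this comment. The function names are made up, and which of the two versions actually performs the deprovisioning is not established in this thread.

```python
import re


def parse_waagent_version(output):
    """Extract the installed and goal state agent versions from the text."""
    installed = re.search(r"WALinuxAgent-(\d+(?:\.\d+)*)", output)
    goal = re.search(r"Goal state agent:\s*(\d+(?:\.\d+)*)", output)
    return {
        "installed": installed.group(1) if installed else None,
        "goal_state": goal.group(1) if goal else None,
    }


def version_tuple(version):
    """Turn '2.2.20' into (2, 2, 20) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Sample text copied from the comment above.
sample = """WALinuxAgent-2.2.20 running on ubuntu 18.04
Python: 3.6.6
Goal state agent: 2.2.34"""

versions = parse_waagent_version(sample)
print(versions)
```

Comparing version_tuple(versions["installed"]) with version_tuple(versions["goal_state"]) avoids the pitfalls of comparing version strings lexically.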

@AgSync-Aaron

Okay don't bother re-opening this issue, I see it's being addressed here.
Thanks.

@ChadNedzlek

It seems like this fix is explicitly only for 18.04. 18.10 is broken the same way.
