From 4a4e662a21d1b5c20a4672f47e924f40a336cd7f Mon Sep 17 00:00:00 2001 From: Stefan Junker Date: Thu, 20 Sep 2018 22:45:52 +0200 Subject: [PATCH 1/5] docs/libvirt-howto: structure and misc improvement --- docs/dev/libvirt-howto.md | 49 +++++++++++++++++++++------------------ 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/docs/dev/libvirt-howto.md b/docs/dev/libvirt-howto.md index bcdb6e712af..a8d23ae468a 100644 --- a/docs/dev/libvirt-howto.md +++ b/docs/dev/libvirt-howto.md @@ -1,33 +1,31 @@ -# Libvirt howto +# Libvirt HOWTO Tectonic has limited support for installing a Libvirt cluster. This is useful especially for operator development. -## HOW TO: -### 1. One-time setup +## 1. One-time setup It's expected that you will create and destroy clusters often in the course of development. These steps only need to be run once (or once per RHCOS update). -#### 1.1 Pick a name and ip range +### 1.1 Pick a name and ip range In this example, we'll set the baseDomain to `tt.testing`, the name to `test1` and the ipRange to `192.168.124.0/24` -#### 1.2 Clone the repo +### 1.2 Clone the repo ```sh git clone https://github.com/openshift/installer.git cd installer ``` -#### 1.3 (Optional) Download and prepare the operating system image +### 1.3 (Optional) Download and prepare the operating system image *By default, the installer will download the latest RHCOS image every time it is invoked. This may be problematic for users who create a large number of clusters or who have limited network bandwidth. 
The installer allows a local image to be used instead.* Download the latest RHCOS image (you will need access to the Red Hat internal build systems): ```sh -wget http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz -gunzip rhcos-qemu.qcow2.gz +curl http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz | gunzip > rhcos-qemu.qcow2 ``` -#### 1.4 Get a pull secret +### 1.4 Get a pull secret Go to https://account.coreos.com/ and obtain a Tectonic *pull secret*. #### 1.5 Make sure you have permisions for `qemu:///system` @@ -42,7 +40,7 @@ polkit.addRule(function(action, subject) { EOF ``` -#### 1.6 Configure libvirt to accept TCP connections +### 1.6 Configure libvirt to accept TCP connections The Kubernetes [cluster-api](https://github.com/kubernetes-sigs/cluster-api) components drive deployment of worker machines. The libvirt cluster-api @@ -50,6 +48,8 @@ provider will run inside the local cluster, and will need to connect back to the libvirt instance on the host machine to deploy workers. In order for this to work, you'll need to enable TCP connections for libvirt. + +#### Configure libvirtd.conf To do this, first modify your `/etc/libvirt/libvirtd.conf` and set the following: ``` @@ -61,6 +61,7 @@ tcp_port = "16509" Note that authentication is not currently supported, but should be soon. +#### Configure the service runner to pass `--listen` to libvirtd In addition to the config, you'll have to pass an additional command-line argument to libvirtd. On Fedora, modify `/etc/sysconfig/libvirtd` and set: @@ -76,6 +77,7 @@ libvirtd_opts="--listen" Next, restart libvirt: `systemctl restart libvirtd` +#### Firewall Finally, if you have a firewall, you may have to allow connections from the IP range used by your cluster nodes. 
If you're using the default subnet of `192.168.124.0/24`, something along these lines should work: @@ -114,7 +116,7 @@ NOTE: When the firewall rules are no longer needed, `sudo firewalld-cmd --reload will remove the changes made as they were not permanently added. For persistence, include the `--permanent` to the commands that add-source and add-port. -#### 1.7 Prepare the configuration file +### 1.7 Prepare the installer configuration file 1. `cp examples/libvirt.yaml ./` 2. Edit the configuration file: 1. Set an email and password in the `admin` section @@ -125,37 +127,40 @@ include the `--permanent` to the commands that add-source and add-port. 6. Set the `pullSecret` to your JSON pull secret. 7. (Optional) Change the `image` to the file URL of the operating system image you downloaded (e.g. `file:///home/user/Downloads/rhcos.qcow`). This will allow the installer to re-use that image instead of having to download it every time. -#### 1.8 Set up NetworkManager DNS overlay +### 1.8 Set up NetworkManager DNS overlay This step is optional, but useful for being able to resolve cluster-internal hostnames from your host. 1. Edit `/etc/NetworkManager/NetworkManager.conf` and set `dns=dnsmasq` in section `[main]` -2. Tell dnsmasq to use your cluster. The syntax is `server=//`. For this example: -```sh -echo server=/tt.testing/192.168.124.1 | sudo tee /etc/NetworkManager/dnsmasq.d/tectonic.conf -``` +2. Tell dnsmasq to use your cluster. The syntax is `server=//`. + + For this example: + + ```sh + echo server=/tt.testing/192.168.124.1 | sudo tee /etc/NetworkManager/dnsmasq.d/tectonic.conf + ``` 3. `systemctl restart NetworkManager` -#### 1.9 Install the terraform provider +### 1.9 Install the terraform provider 1. Make sure you have the `virsh` binary installed: `sudo dnf install libvirt-client libvirt-devel` 2. 
Install the libvirt terraform provider: ```sh GOBIN=~/.terraform.d/plugins go get -u github.com/dmacvicar/terraform-provider-libvirt ``` -#### 1.10 Cache terrafrom plugins (optional, but makes subsequent runs a bit faster) +### 1.10 Cache terraform plugins (optional, but makes subsequent runs a bit faster) ```sh cat <<EOF > $HOME/.terraformrc plugin_cache_dir = "$HOME/.terraform.d/plugin-cache" EOF ``` -### 2. Build the installer +## 2. Build the installer Following the instructions in the root README: ```sh bazel build tarball ``` -### 3. Create a cluster +## 3. Create a cluster ```sh tar -zxf bazel-bin/tectonic-dev.tar.gz alias tectonic="${PWD}/tectonic-dev/installer/tectonic" @@ -186,10 +191,10 @@ With the cluster removed, you no longer need to allow libvirt nodes to reach you sudo firewall-cmd --reload ``` -# Exploring your cluster +## 4. Exploring your cluster Some things you can do: -## Watch the bootstrap process +### Watch the bootstrap process The bootstrap node, e.g. test1-bootstrap.tt.testing, runs the tectonic bootstrap process. You can watch it: ```sh From ffd5eeeee6e7df2b972e4fbf87dacf375c4d49ff Mon Sep 17 00:00:00 2001 From: Stefan Junker Date: Thu, 20 Sep 2018 22:45:00 +0200 Subject: [PATCH 2/5] docs/libvirt-howto: add faq & troubleshooting --- docs/dev/libvirt-howto.md | 68 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 63 insertions(+), 5 deletions(-) diff --git a/docs/dev/libvirt-howto.md b/docs/dev/libvirt-howto.md index a8d23ae468a..ee2553c558d 100644 --- a/docs/dev/libvirt-howto.md +++ b/docs/dev/libvirt-howto.md @@ -203,7 +203,7 @@ sudo journalctl -f -u bootkube -u tectonic ``` You'll have to wait for etcd to reach quorum before this makes any progress. -## Inspect the cluster with kubectl +### Inspect the cluster with kubectl You'll need a kubectl binary on your path.
```sh export KUBECONFIG="${PWD}/${CLUSTER_NAME}/generated/auth/kubeconfig" @@ -219,11 +219,69 @@ master0# export KUBECONFIG=/var/opt/tectonic/auth/kubeconfig master0# kubectl get -n tectonic-system pods ``` -## Connect to the cluster console +### Connect to the cluster console This will take ~30 minutes to be available. Simply go to `https://${CLUSTER_NAME}-api.${BASE_DOMAIN}:6443/console/` (e.g. `test1.tt.testing`) and log in using the credentials above. -# Libvirt vs. AWS -1. There isn't a load balancer. This means: + +## FAQ + +### Libvirt vs. AWS +1. There isn't a load balancer on libvirt. This means: 1. We need to manually remap ports that the loadbalancer would - 2. Only the first server (e.g. master) is actually used. If you want to reach another, you have to manually update the domain name. + +## Troubleshooting +If following the above steps hasn't quite worked, please review this section for well known issues. + +### SELinux might prevent access to image files +Configuring the storage pool to store images in a path incompatible with the SELinux policies (e.g. your home directory) might lead to the following errors: + +``` +Error: Error applying plan: + +1 error(s) occurred: + +* libvirt_domain.etcd: 1 error(s) occurred: + +* libvirt_domain.etcd: Error creating libvirt domain: virError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2018-07-30T22:52:54.865806Z qemu-kvm: -fw_cfg name=opt/com.coreos/config,file=/home/user/VirtualMachines/etcd.ign: can't load /home/user/VirtualMachines/etcd.ign') +``` + +[As described here][libvirt_selinux_issues], you can work around this by disabling SELinux or by storing the images in a location known to work, e.g. the default pool.
+ +### Random domain creation errors due to libvirt race condition +Depending on your libvirt version, you might encounter [a race condition][bugzilla_libvirt_race] leading to an error similar to: + +``` +* libvirt_domain.master.0: Error creating libvirt domain: virError(Code=43, Domain=19, Message='Network not found: no network with matching name 'tectonic'') +``` +This is also being [tracked in terraform-provider-libvirt][tfprovider_libvirt_race], but it is likely not fixable on the client side, which is why you should upgrade libvirt to >=4.5 or to a patched version, depending on your environment. + +### macOS support currently broken +* Support for libvirt on macOS [is currently broken and being worked on][brokenmacosissue201]. + +### Error with firewall initialization on Arch Linux +If you're on Arch Linux and get an error similar to + +``` +libvirt: “Failed to initialize a valid firewall backend” +``` + +or + +``` +error: Failed to start network default +error: internal error: Failed to initialize a valid firewall backend +``` + +please check out [this thread on Super User][arch_firewall_superuser]. + +### GitHub Issue Tracker +You might find other reports of your problem in the [Issues tab for this repository][issues_libvirt]; if so, please add any additional information you have there. +If your issue is not reported, please do.
+ +[arch_firewall_superuser]: https://superuser.com/questions/1063240/libvirt-failed-to-initialize-a-valid-firewall-backend +[brokenmacosissue201]: https://github.com/openshift/installer/issues/201 +[bugzilla_libvirt_race]: https://bugzilla.redhat.com/show_bug.cgi?id=1576464 +[issues_libvirt]: https://github.com/openshift/installer/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+libvirt +[libvirt_selinux_issues]: https://github.com/dmacvicar/terraform-provider-libvirt/issues/142#issuecomment-409040151 +[tfprovider_libvirt_race]: https://github.com/dmacvicar/terraform-provider-libvirt/issues/402#issuecomment-419500064 \ No newline at end of file From 98d1a1de08fc1af489fcad8a2d391fe18344a1b4 Mon Sep 17 00:00:00 2001 From: Stefan Junker Date: Mon, 24 Sep 2018 20:35:32 +0200 Subject: [PATCH 3/5] docs/dev/libvirt: mention error for wrong URI format --- docs/dev/libvirt-howto.md | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/docs/dev/libvirt-howto.md b/docs/dev/libvirt-howto.md index ee2553c558d..8da8aa0d9b0 100644 --- a/docs/dev/libvirt-howto.md +++ b/docs/dev/libvirt-howto.md @@ -227,12 +227,26 @@ This will take ~30 minutes to be available. Simply go to `https://${CLUSTER_NAME ## FAQ ### Libvirt vs. AWS -1. There isn't a load balancer on libvirt. This means: - 1. We need to manually remap ports that the loadbalancer would +1. There isn't a load balancer on libvirt. ## Troubleshooting If following the above steps hasn't quite worked, please review this section for well known issues. 
+### Install throws an `Unable to resolve address 'localhost'` error + +If you're seeing an error similar to + +``` +Error: Error refreshing state: 1 error(s) occurred: + +* provider.libvirt: virError(Code=38, Domain=7, Message='Unable to resolve address 'localhost' service '-1': Servname not supported for ai_socktype') + + +FATA[0019] failed to run Terraform: exit status 1 +``` + +it is likely that your install configuration contains three slashes after the protocol (i.e. `qemu+tcp:///...`), when it should only be two. + ### SELinux might prevent access to image files Configuring the storage pool to store images in a path incompatible with the SELinux policies (e.g. your home directory) might lead to the following errors: @@ -284,4 +298,4 @@ If your issue is not reported, please do. [bugzilla_libvirt_race]: https://bugzilla.redhat.com/show_bug.cgi?id=1576464 [issues_libvirt]: https://github.com/openshift/installer/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+libvirt [libvirt_selinux_issues]: https://github.com/dmacvicar/terraform-provider-libvirt/issues/142#issuecomment-409040151 -[tfprovider_libvirt_race]: https://github.com/dmacvicar/terraform-provider-libvirt/issues/402#issuecomment-419500064 \ No newline at end of file +[tfprovider_libvirt_race]: https://github.com/dmacvicar/terraform-provider-libvirt/issues/402#issuecomment-419500064 From 75508242801d4185ba4fe6b36326dcd1d069dc41 Mon Sep 17 00:00:00 2001 From: Stefan Junker Date: Mon, 24 Sep 2018 20:35:32 +0200 Subject: [PATCH 4/5] docs/dev/libvirt: mention error for missing file:/// in image --- docs/dev/libvirt-howto.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/dev/libvirt-howto.md b/docs/dev/libvirt-howto.md index 8da8aa0d9b0..c9415126f98 100644 --- a/docs/dev/libvirt-howto.md +++ b/docs/dev/libvirt-howto.md @@ -247,6 +247,16 @@ FATA[0019] failed to run Terraform: exit status 1 it is likely that your install configuration contains three slashes after the protocol (i.e.
`qemu+tcp:///...`), when it should only be two. +### Init throws an `unsupported protocol scheme` error +If you're seeing an error similar to + +``` +$ tectonic init --config ~/tectonic.libvirt.yaml +FATA[0000] Get : unsupported protocol scheme "" +``` + +then you're probably missing the `file:///` in the value for `image:` in the install configuration. + ### SELinux might prevent access to image files Configuring the storage pool to store images in a path incompatible with the SELinux policies (e.g. your home directory) might lead to the following errors: From dfa04e7ed7f2a09d3bbc2ef1d79394a1a11fcfdf Mon Sep 17 00:00:00 2001 From: Stefan Junker Date: Mon, 24 Sep 2018 20:35:32 +0200 Subject: [PATCH 5/5] docs/dev/libvirt: make firewall instructions consistent Since these instructions use 192.168.122.1 for the libvirt URI, which is apparently the address the clusterapi-controller uses to talk to libvirt, the firewall rule has to match; otherwise errors like this one show up in the logs: ``` E0924 21:26:08.925983 1 controller.go:115] Error checking existance of machine instance for machine object worker-fdtdg; Failed to build libvirt client: virError(Code=38, Domain=7, Message='unable to connect to server at '192.168.122.1:16509': Connection timed out') ``` --- docs/dev/libvirt-howto.md | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/docs/dev/libvirt-howto.md b/docs/dev/libvirt-howto.md index c9415126f98..18eb83e7016 100644 --- a/docs/dev/libvirt-howto.md +++ b/docs/dev/libvirt-howto.md @@ -28,7 +28,7 @@ curl http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest ### 1.4 Get a pull secret Go to https://account.coreos.com/ and obtain a Tectonic *pull secret*. -#### 1.5 Make sure you have permisions for `qemu:///system` +### 1.5 Make sure you have permissions for `qemu:///system` You may want to grant yourself permissions to use libvirt as a non-root user.
You could allow all users in the wheel group by doing the following: ```sh cat <<EOF >> /etc/polkit-1/rules.d/80-libvirt.rules @@ -78,15 +78,19 @@ libvirtd_opts="--listen" Next, restart libvirt: `systemctl restart libvirtd` #### Firewall -Finally, if you have a firewall, you may have to allow connections from the IP -range used by your cluster nodes. If you're using the default subnet of -`192.168.124.0/24`, something along these lines should work: +Finally, if you have a firewall, you may have to allow connections to the +libvirt daemon from the IP range used by your cluster nodes. + +#### Manual management +The following example rule works for the suggested cluster ipRange of `192.168.124.0/24` and a libvirt *default* subnet of `192.168.122.0/24`, which might be different in your configuration: ``` -iptables -I INPUT -p tcp -s 192.168.124.0/24 -d 192.168.124.1 --dport 16509 \ +iptables -I INPUT -p tcp -s 192.168.124.0/24 -d 192.168.122.1 --dport 16509 \ -j ACCEPT -m comment --comment "Allow insecure libvirt clients" ``` +#### Firewalld + If using `firewalld`, simply obtain the name of the existing active zone which can be used to integrate the appropriate source and ports to allow connections from the IP range used by your cluster nodes. An example is shown below. @@ -125,7 +129,10 @@ include the `--permanent` to the commands that add-source and add-port. 4. Set the `name` (e.g. test1) 5. Look at the `podCIDR` and `serviceCIDR` fields in the `networking` section. Make sure they don't conflict with anything important. 6. Set the `pullSecret` to your JSON pull secret. - 7. (Optional) Change the `image` to the file URL of the operating system image you downloaded (e.g. `file:///home/user/Downloads/rhcos.qcow`). This will allow the installer to re-use that image instead of having to download it every time. + 7. Ensure the `libvirt.uri` IP address matches your virbr0 interface IP address, which belongs to the libvirt *default* network.
If you're uncertain about the libvirt *default* subnet, you should be able to see its address using the command `ip -4 a show dev virbr0` or by inspecting `virsh --connect qemu:///system net-dumpxml default`. + 8. Ensure the `libvirt.network.ipRange` does not overlap the subnet of your virbr0 IP address. + 9. (Optional) Change the `image` to the file URL of the operating system image you downloaded (e.g. `file:///home/user/Downloads/rhcos.qcow`). This will allow the installer to re-use that image instead of having to download it every time. ### 1.8 Set up NetworkManager DNS overlay This step is optional, but useful for being able to resolve cluster-internal hostnames from your host.
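Items 7 and 8 of the configuration checklist, together with the two-slash URI rule from the troubleshooting section, can be sanity-checked before running the installer. The following is a minimal Python sketch and is not part of the installer. It assumes you copy the `libvirt.uri` and `libvirt.network.ipRange` values from your install configuration by hand, along with the virbr0 address and prefix as printed by `ip -4 a show dev virbr0`. The function name `check_libvirt_settings` is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def check_libvirt_settings(uri, ip_range, virbr0_cidr):
    """Return a list of problems with the libvirt settings (empty if none found).

    Hypothetical helper; the arguments are copied by hand:
    uri         -- libvirt.uri, e.g. "qemu+tcp://192.168.122.1/system"
    ip_range    -- libvirt.network.ipRange, e.g. "192.168.124.0/24"
    virbr0_cidr -- virbr0 address with prefix, e.g. "192.168.122.1/24"
    """
    problems = []

    # A remote URI needs exactly two slashes after the scheme;
    # with three, the host part of the URI is empty.
    scheme, _, rest = uri.partition(":")
    if rest.startswith("///"):
        problems.append(f"uri has three slashes after '{scheme}:'; use two")

    bridge = ipaddress.ip_interface(virbr0_cidr)

    # Item 7: the URI host should be virbr0's address on the default network.
    host = urlsplit(uri).hostname
    if host != str(bridge.ip):
        problems.append(f"uri host {host!r} does not match virbr0 address {bridge.ip}")

    # Item 8: the cluster ipRange must not overlap the default network's subnet.
    cluster_net = ipaddress.ip_network(ip_range)
    if cluster_net.overlaps(bridge.network):
        problems.append(f"ipRange {cluster_net} overlaps virbr0 network {bridge.network}")

    return problems


if __name__ == "__main__":
    # Values used in this guide: cluster ipRange 192.168.124.0/24 and a
    # libvirt default network of 192.168.122.0/24.
    print(check_libvirt_settings("qemu+tcp://192.168.122.1/system",
                                 "192.168.124.0/24", "192.168.122.1/24"))
```

With the values used throughout this guide, the script prints an empty list. A URI such as `qemu+tcp:///system` is reported twice: once for the extra slash and once because its empty host cannot match the virbr0 address.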