docs/libvirt-howto: add faq & troubleshooting #297

Merged
merged 5 commits into from
Sep 26, 2018

@steveej (Contributor) commented Sep 20, 2018

This also improves the general structure a bit to decrease the likelihood of confusion.

Potentially conflicts with doc changes in #296.

Fixes #311.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 20, 2018
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 20, 2018
@wking (Member) left a comment

Hooray, docs :). A few nits inline.

@@ -1,33 +1,31 @@
# Libvirt howto
# Libvirt HOWTO

Tectonic has limited support for installing a Libvirt cluster. This is useful especially
Member

Are we ok with dropping "limited" now?

Contributor Author

We could say that we support libvirt on Linux.


*By default, the installer will download the latest RHCOS image every time it is invoked. This may be problematic for users who create a large number of clusters or who have limited network bandwidth. The installer allows a local image to be used instead.*

Download the latest RHCOS image (you will need access to the Red Hat internal build systems):

```sh
wget http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
gunzip rhcos-qemu.qcow2.gz
curl http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz | gunzip > rhcos-qemu.qcow2.gz
```
Member

You want to drop the .gz suffix from the final filename.

You also want a single space before the pipe (you currently have two).

Contributor Author

thanks ;-)


For this example:

```sh echo server=/tt.testing/192.168.124.1 | sudo tee /etc/NetworkManager/dnsmasq.d/tectonic.conf ```
Member

This doesn't render correctly (you can see the sh). What you want is a traditional fenced block with four-space indents:

  • Tell dnsmasq...

    For this example:

    echo server=/tt.testing/192.168.124.1 | sudo tee /etc/NetworkManager/dnsmasq.d/tectonic.conf

Contributor Author

Ah, I misread that command to invoke sh, thanks!

1. We need to manually remap ports that the loadbalancer would
2. Only the first server (e.g. master) is actually used. If you want to reach another, you have to manually update the domain name.

## Troubleshooting
If following the above steps hasn't quite worked please review this section for well known issues.
Member

"... hasn't quite worked, please..." (with a comma).


### Github Issue Tracker
You might find other reports of your problem in the [Issues tab for this repository][issues_libvirt] where we ask you to provide any additional information.
If you're issue is not reported, please do.
Member

"you're" -> "your"


[bugzilla_libvirt_race]: https://bugzilla.redhat.com/show_bug.cgi?id=1576464
[tfprovider_libvirt_race]:
https://github.com/dmacvicar/terraform-provider-libvirt/issues/402#issuecomment-419500064
Member

nit: I'd rather have the URL on the same line as the reference, but GitHub seems to render this correctly, so I'm ok with you deciding to wrap it like this if you feel strongly ;).

[libvirt_selinux_issues]: https://github.com/dmacvicar/terraform-provider-libvirt/issues/142#issuecomment-409040151
[brokenmacosissue201]: https://github.com/openshift/installer/issues/201
[arch_firewall_superuser]:https://superuser.com/questions/1063240/libvirt-failed-to-initialize-a-valid-firewall-backend

Member

Why the blank line?

nit: I like collating these entries by anchor (e.g. highlight them all and use sort-lines in Emacs ;), because then I don't have to think about where to insert new references.

Contributor Author

very helpful, thanks

@crawford (Contributor)

Why are e2e tests running on this PR? It's entirely documentation.

@steveej (Contributor Author) commented Sep 21, 2018

> Why are e2e tests running on this PR? It's entirely documentation.

I had the same thought when I created this PR. @sallyom can you answer that?

@abhinavdahiya (Contributor) commented Sep 21, 2018

@crawford @steveej, they don't seem to be running:

- ci/prow/e2e-aws: Skipped
- ci/prow/e2e-aws-smoke: Skipped


*By default, the installer will download the latest RHCOS image every time it is invoked. This may be problematic for users who create a large number of clusters or who have limited network bandwidth. The installer allows a local image to be used instead.*

Download the latest RHCOS image (you will need access to the Red Hat internal build systems):

```sh
wget http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
gunzip rhcos-qemu.qcow2.gz
curl http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz | gunzip > rhcos-qemu.qcow2
```
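A minimal sketch of the corrected download step, assuming bash and an image URL that ends in `.gz`: the local filename is the URL's basename with `.gz` stripped, which is the fix requested in review. `image_filename` and `fetch_rhcos_image` are hypothetical helper names, and the URL is only reachable with access to the Red Hat internal build systems.

```sh
#!/usr/bin/env bash
# Hypothetical helper: local filename = URL basename minus the ".gz" suffix,
# e.g. ".../rhcos-qemu.qcow2.gz" -> "rhcos-qemu.qcow2".
image_filename() {
  local url="$1"
  local name="${url##*/}"   # strip everything up to the last "/"
  printf '%s\n' "${name%.gz}"
}

# Hypothetical helper: stream and decompress in one pass, so no intermediate
# .gz file is written. pipefail (set in a subshell so it does not leak) makes
# a failed download return non-zero instead of reporting gunzip's status.
fetch_rhcos_image() {
  local url="$1" out
  out="$(image_filename "$url")"
  (
    set -o pipefail
    curl -fsSL "$url" | gunzip > "$out"
  )
}

# Usage (requires access to the internal build systems):
#   fetch_rhcos_image http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
```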
Contributor

There was a step in @sjenning's first screencast showing how to set `enforcing=0` in the qcow image with `virt-edit`; if that's still necessary, we should add it here.

Contributor Author

I was about to test it again with that image, but the URL doesn't work anymore. Has it changed?

Member

Works for me. Are you still on the VPN?

Contributor Author

I don't remember being on the VPN for getting that image. Is there a public mirror?

Member

> Is there a public mirror?

No, although I don't know what the reasoning is.

Contributor

> There was a step in @sjenning's first screencast showing how to set `enforcing=0` in the qcow image with `virt-edit`; if that's still necessary, we should add it here.

It is not necessary anymore.


The Kubernetes [cluster-api](https://github.com/kubernetes-sigs/cluster-api)
components drive deployment of worker machines. The libvirt cluster-api
provider will run inside the local cluster, and will need to connect back to
the libvirt instance on the host machine to deploy workers.

In order for this to work, you'll need to enable TCP connections for libvirt.
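One way to enable those TCP connections, sketched for a disposable development host: append listener settings to `libvirtd.conf`. `listen_tls`, `listen_tcp`, `auth_tcp`, and `tcp_port` are standard libvirtd.conf keys, and 16509 is libvirt's default TCP port (the same port that appears in the controller log quoted in this PR); `enable_libvirt_tcp` is a hypothetical helper name. `auth_tcp = "none"` disables authentication, so only do this on a network you trust.

```sh
#!/usr/bin/env bash
# Hypothetical helper: append unauthenticated-TCP listener settings to a
# libvirtd.conf-style file (defaults to the usual system path).
# ASSUMPTION: development-only host; anyone who can reach the port can
# manage the VMs once auth_tcp is "none".
enable_libvirt_tcp() {
  local conf="${1:-/etc/libvirt/libvirtd.conf}"
  cat >> "$conf" <<'EOF'
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"
tcp_port = "16509"
EOF
}
```

On the real host you would run it as root and then restart libvirtd. Depending on the libvirt version and distribution, the daemon may also need to be started with `--listen` (or, on socket-activated setups, have `libvirtd-tcp.socket` enabled) before it actually opens the port.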

##### Configure libvirtd.conf
Member

Did you mean for this to be at level 4 instead of 5 here?

```
-j ACCEPT -m comment --comment "Allow insecure libvirt clients"
```

If you're uncertain about the libvirt *default* subnet, you should be able to see its address using the command `ip -4 a show dev virbr0`.
Member

Another command worth noting is `virsh --connect qemu:///system net-dumpxml default`, which shows how the default network has been configured.
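If you want the bridge address in a script rather than reading it off by eye, a small filter over the one-line (`-o`) output of `ip` is enough. This is a sketch: `bridge_ipv4_cidr` is a hypothetical name, and it assumes the default network's bridge is `virbr0`, as in the `ip` command quoted in the diff.

```sh
#!/usr/bin/env bash
# Hypothetical filter: read `ip -4 -o addr show ...` output on stdin and
# print the address/prefix that follows each "inet" keyword.
bridge_ipv4_cidr() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "inet") print $(i + 1) }'
}

# Usage on a host where the default libvirt network uses virbr0:
#   ip -4 -o addr show dev virbr0 | bridge_ipv4_cidr
```

For the `default` network, the result should agree with the `<ip address=...>` element in the `net-dumpxml` output.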

@@ -125,37 +133,40 @@ include the `--permanent` to the commands that add-source and add-port.
6. Set the `pullSecret` to your JSON pull secret.
7. (Optional) Change the `image` to the file URL of the operating system image you downloaded (e.g. `file:///home/user/Downloads/rhcos.qcow`). This will allow the installer to re-use that image instead of having to download it every time.
Contributor

put my content from https://github.com/openshift/installer/pull/323/files in here please. it's all still entirely relevant.

Contributor Author

Done, thanks!

@bparees (Contributor) commented Sep 25, 2018

can you add something about tearing down the cluster?

@@ -123,39 +129,45 @@ include the `--permanent` to the commands that add-source and add-port.
4. Set the `name` (e.g. test1)
5. Look at the `podCIDR` and `serviceCIDR` fields in the `networking` section. Make sure they don't conflict with anything important.
6. Set the `pullSecret` to your JSON pull secret.
Contributor

needs doc on where to get this file. Also @derekwaynecarr showed me it's easier to point to a path than to try to inline the file:
pullSecretPath: "/home/bparees/git/gocode/src/github.com/openshift/installer/files/config.json"

Contributor

(The reason I say we need docs on where to get this file is that I was told not to get it from coreos.com, and I ran into problems trying to do so.)

Member

> pullSecretPath: "/home/bparees/git/gocode/src/github.com/openshift/installer/files/config.json"

This is deprecated. Once we get the new installer and #320, you'll be able to use `OPENSHIFT_INSTALL_PULL_SECRET="$(cat path/to/your/secret)"`.
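Loading the secret from a file, with a guard against a missing or empty path, could look like this. It assumes #320 has landed and that `OPENSHIFT_INSTALL_PULL_SECRET` takes the secret's raw JSON contents, as described in that comment; `load_pull_secret` is a hypothetical helper, and the path in the usage line is only a placeholder.

```sh
#!/usr/bin/env bash
# Hypothetical helper: read a pull secret file into
# OPENSHIFT_INSTALL_PULL_SECRET. ASSUMPTION: the variable takes the
# secret's raw JSON contents once #320 lands.
load_pull_secret() {
  local path="$1"
  # -s: the file exists and is non-empty
  if [ ! -s "$path" ]; then
    echo "error: pull secret file '$path' is missing or empty" >&2
    return 1
  fi
  OPENSHIFT_INSTALL_PULL_SECRET="$(cat "$path")"
  export OPENSHIFT_INSTALL_PULL_SECRET
}

# Usage (placeholder path; point it at wherever you saved your secret):
#   load_pull_secret ~/pull-secret.json
```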

@bparees (Contributor) commented Sep 25, 2018

> can you add something about tearing down the cluster?

nm, I see it's there (`tectonic destroy`).

@steveej (Contributor Author) commented Sep 26, 2018

I'm not sure what we're waiting for here. Did I leave any comments unaddressed?

@ashcrow (Member) commented Sep 26, 2018

@crawford do the updates look good?

```
FATA[0019] failed to run Terraform: exit status 1
```

it is likely that your install configuration contains three slashes after the protocol (i.e. `qemu+tcp:///...), when it should only be two.
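A quick way to catch that mistake before running the installer is a pattern check on the URI. This is a hypothetical helper (`check_libvirt_tcp_uri`), not part of the installer; it assumes the TCP transport and a URI of the form `qemu+tcp://HOST/system`, with `192.168.122.1` as the host in this PR's setup.

```sh
#!/usr/bin/env bash
# Hypothetical check: "qemu+tcp:///..." (three slashes) leaves the host part
# empty; "qemu+tcp://HOST/..." (two slashes) names the host explicitly.
check_libvirt_tcp_uri() {
  local uri="$1"
  case "$uri" in
    qemu+tcp:///*)
      echo "error: '$uri' has no host; use qemu+tcp://HOST/system" >&2
      return 1 ;;
    qemu+tcp://?*)
      return 0 ;;
    *)
      echo "error: '$uri' is not a qemu+tcp:// URI" >&2
      return 1 ;;
  esac
}

# Usage:
#   check_libvirt_tcp_uri qemu+tcp://192.168.122.1/system
```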
Member

Nit, missing a trailing backtick here.

@wking wking dismissed crawford’s stale review September 26, 2018 21:12

All of his requested changes have been addressed.

Since we're instructing users to use 192.168.122.1 for the libvirt URI, which
is apparently what the clusterapi-controller uses to talk to
libvirt, the firewall rule has to match; otherwise, it looks like this in the
logs:

```
E0924 21:26:08.925983       1 controller.go:115] Error checking
existance of machine instance for machine object worker-fdtdg; Failed to
build libvirt client: virError(Code=38, Domain=7, Message='unable to
connect to server at '192.168.122.1:16509': Connection timed out')
```
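To confirm that the firewall actually lets the cluster reach libvirtd, you can test the TCP handshake to the address and port from that log. This sketch assumes bash (for its `/dev/tcp` redirection) and coreutils `timeout`; `tcp_port_open` is a hypothetical helper. Run it from a machine on the libvirt network, such as one of the cluster's VMs, since that is where the controller's connection originates; a check from the host itself may take a different path through the firewall.

```sh
#!/usr/bin/env bash
# Hypothetical helper: succeed if HOST:PORT completes a TCP handshake within
# SECS seconds (default 3). bash opens /dev/tcp/HOST/PORT as a socket;
# `timeout` bounds the wait, which matters when a firewall silently drops
# packets ("Connection timed out") instead of rejecting them.
tcp_port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example, using the address and port from the log:
#   tcp_port_open 192.168.122.1 16509 && echo reachable || echo "blocked or not listening"
```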
@wking (Member) commented Sep 26, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Sep 26, 2018
@steveej (Contributor Author) commented Sep 26, 2018

/lgtm

You're gonna have to go again, fixed another typo

@wking (Member) commented Sep 26, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 26, 2018
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: steveeJ, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 22f8f5f into openshift:master Sep 26, 2018
@steveej steveej deleted the docs-libvirt-troubleshooting branch September 26, 2018 22:17
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.