machines/libvirt/worker.machineset.yaml: Drop /var/lib/libvirt/images #70

wking · 2018-09-23T07:03:51Z

The actuator looks up the baseVolumeID in the configured pool. That lookup works with both the volume name and full volume path:

$ virsh -c qemu:///system vol-info --vol coreos_base --pool default
Name:           coreos_base
Type:           file
Capacity:       16.00 GiB
Allocation:     1.55 GiB

$ virsh -c qemu:///system vol-info --vol /home/trking/VirtualMachines/coreos_base --pool default
Name:           coreos_base
Type:           file
Capacity:       16.00 GiB
Allocation:     1.55 GiB

But it fails if you use the wrong full path:

$ virsh -c qemu:///system vol-info --vol /var/lib/libvirt/images/coreos_base --pool default
error: failed to get vol '/var/lib/libvirt/images/coreos_base'
error: Storage volume not found: no storage vol with matching path '/var/lib/libvirt/images/coreos_base'
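
The three lookups above can be modeled with a toy sketch. This is not how libvirt or the actuator implements the lookup; `find_volume` and the dict-based pool are hypothetical stand-ins that only mirror the behavior the virsh output shows: a key matches by volume name or by the volume's actual path, and nothing else.

```python
from typing import Dict, Optional


def find_volume(pool: Dict[str, str], key: str) -> Optional[str]:
    """Return the name of the volume matching key, or None.

    pool maps volume name -> full path (a toy stand-in for a libvirt pool).
    key may be either a volume name or a full volume path.
    """
    if key in pool:
        return key
    for name, path in pool.items():
        if path == key:
            return name
    return None


# Hypothetical pool contents, taken from the transcript above.
pool = {"coreos_base": "/home/trking/VirtualMachines/coreos_base"}

print(find_volume(pool, "coreos_base"))
print(find_volume(pool, "/home/trking/VirtualMachines/coreos_base"))
print(find_volume(pool, "/var/lib/libvirt/images/coreos_base"))
```

Only the bare name is independent of where the pool lives, which is why a hard-coded path breaks on hosts whose pool is elsewhere.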

My default pool happens to be in my home directory:

$ virsh -c qemu:///system pool-dumpxml default
<pool type='dir'>
  <name>default</name>
  <uuid>c20a2154-aa60-44cf-bf37-cd8b7818a4e4</uuid>
  <capacity unit='bytes'>105554829312</capacity>
  <allocation unit='bytes'>44134699008</allocation>
  <available unit='bytes'>61420130304</available>
  <source>
  </source>
  <target>
    <path>/home/trking/VirtualMachines</path>
    <permissions>
      <mode>0777</mode>
      <owner>114032</owner>
      <group>114032</group>
      <label>system_u:object_r:virt_image_t:s0</label>
    </permissions>
  </target>
</pool>
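
To check where a pool actually lives without reading the XML by eye, the target path can be pulled out of the `pool-dumpxml` output. A minimal sketch, assuming the XML has the `<target><path>` layout shown above; `pool_target_path` is a hypothetical helper, and the XML below is a trimmed copy of the dump:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the pool-dumpxml output above.
POOL_XML = """<pool type='dir'>
  <name>default</name>
  <target>
    <path>/home/trking/VirtualMachines</path>
  </target>
</pool>"""


def pool_target_path(pool_xml: str) -> str:
    """Return the directory backing a 'dir' pool, read from <target><path>."""
    root = ET.fromstring(pool_xml)
    path = root.findtext("target/path")
    if path is None:
        raise ValueError("pool XML has no <target><path> element")
    return path


print(pool_target_path(POOL_XML))
```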

This commit allows configurations like mine by dropping our opinions about the default pool location and just using the volume names:

$ virsh -c qemu:///system vol-list --pool default
 Name                 Path
------------------------------------------------------------------------------
 bootstrap            /home/trking/VirtualMachines/bootstrap
 bootstrap.ign        /home/trking/VirtualMachines/bootstrap.ign
 coreos_base          /home/trking/VirtualMachines/coreos_base
 master-0.ign         /home/trking/VirtualMachines/master-0.ign
 master0              /home/trking/VirtualMachines/master0
 worker.ign           /home/trking/VirtualMachines/worker.ign
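
The path-to-name change amounts to keeping only the last path component. A small sketch of that reduction; `volume_name` is a hypothetical helper for illustration, not code from this PR, which edits the template values directly:

```python
import posixpath


def volume_name(volume_id: str) -> str:
    """Reduce a full volume path to its bare name.

    A value that is already a bare name passes through unchanged.
    """
    return posixpath.basename(volume_id)


print(volume_name("/var/lib/libvirt/images/coreos_base"))
print(volume_name("/home/trking/VirtualMachines/coreos_base"))
print(volume_name("coreos_base"))
```

All three calls yield the same name, so the result no longer depends on the pool's directory.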

Longer-term, it would be nice to pull both the pool and volume names from information pushed by the installer. But I'm punting on that for this commit.

Reported by @mrogers950.

openshift-ci-robot · 2018-09-23T07:04:01Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: vikaschoudhary16

If they are not already assigned, you can assign the PR to them by writing /assign @vikaschoudhary16 in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The commit message carries the same text as the description above, plus one extra paragraph, a Reported-by trailer, and two references:

We've been using the full-path approach since the templates landed in 2522d0f (add libvirt support, 2018-08-30, openshift#35), but there's no discussion there about why the path approach was chosen instead of the name approach I'm switching to here.

Reported-by: Matt Rogers <mrogers@redhat.com>

[1]: https://github.com/openshift/cluster-api-provider-libvirt/blob/2e5a516afc704c6c94d7b7cde74e78c43bbfeaa5/cloud/libvirt/actuators/machine/utils/volume.go#L174 (the actuator's baseVolumeID lookup in the configured pool)
[2]: https://github.com/openshift/installer/blob/dc4764dc603cea5da0e54f575b7ae1a2c26d3102/pkg/types/machinepools.go#L53-L58 (the information pushed by the installer)

enxebre · 2018-09-24T10:37:03Z

@wking thanks a lot! LGTM. @vikaschoudhary16 / @bison, can you please have a look and merge at your convenience?

mrogers950 · 2018-09-24T13:58:00Z

@wking thanks for fixing this!

wking · 2018-09-24T22:41:19Z

Thoughts on how I'd test this? I built an image from this PR using the Dockerfile in the repo root, and pushed it to quay.io/wking/machine-api-operator:pr-70. Then I patched my installer to use that image:

$ git diff
diff --git a/modules/bootkube/resources/manifests/machine-api-operator.yaml b/modules/bootkube/resources/manifests/machine-api-operator.yaml
index 125b870..5c72b5a 100644
--- a/modules/bootkube/resources/manifests/machine-api-operator.yaml
+++ b/modules/bootkube/resources/manifests/machine-api-operator.yaml
@@ -19,7 +19,7 @@ spec:
     spec:
       containers:
       - name: machine-api-operator
-        image: quay.io/coreos/machine-api-operator:b6a04c2
+        image: quay.io/wking/machine-api-operator:pr-70
         command:
         - "/machine-api-operator"
         resources:
diff --git a/pkg/asset/manifests/content/bootkube/machine-api-operator.go b/pkg/asset/manifests/content/bootkube/machine-api-operator.go
index 48e4765..fd78535 100644
--- a/pkg/asset/manifests/content/bootkube/machine-api-operator.go
+++ b/pkg/asset/manifests/content/bootkube/machine-api-operator.go
@@ -24,7 +24,7 @@ spec:
     spec:
       containers:
       - name: machine-api-operator
-        image: quay.io/coreos/machine-api-operator:b6a04c2
+        image: quay.io/wking/machine-api-operator:pr-70
         command:
         - "/machine-api-operator"
         resources:

But when I try to launch a cluster, the machine-api-operator pod dies complaining about the invocation:

[core@trking-6d200-master-0 ~]$ sudo crictl logs 42396a6f9bd9b
Run Cluster API Controller

Usage:
  machine-api-operator [command]

Available Commands:
  help        Help about any command
  start       Starts Machine API Operator
  version     Print the version number of Machine API Operator

Flags:
      --alsologtostderr                  log to standard error as well as files
      --config string                    path to the mao config (default "/etc/mao-config/config")
  -h, --help                             help for machine-api-operator
      --log_backtrace_at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
      --log_dir string                   If non-empty, write log files in this directory
      --logtostderr                      log to standard error instead of files
      --stderrthreshold severity         logs at or above this threshold go to stderr (default 2)
  -v, --v Level                          log level for V logs
      --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging

Use "machine-api-operator [command] --help" for more information about a command.

Is there a different Dockerfile I should be using? Do I need to update the invocation to be more than /machine-api-operator? Something else?

wking · 2018-09-24T22:42:48Z

Looks like I need to add start (and possibly other things) to catch up with #67.

wking · 2018-09-24T23:02:58Z

Ok, I'm farther along with:

diff --git a/modules/bootkube/resources/manifests/machine-api-operator.yaml b/modules/bootkube/resources/manifests/machine-api-operator.yaml
index 125b870..a11046e 100644
--- a/modules/bootkube/resources/manifests/machine-api-operator.yaml
+++ b/modules/bootkube/resources/manifests/machine-api-operator.yaml
@@ -19,9 +19,11 @@ spec:
     spec:
       containers:
       - name: machine-api-operator
-        image: quay.io/coreos/machine-api-operator:b6a04c2
+        image: quay.io/wking/machine-api-operator:pr-70
         command:
         - "/machine-api-operator"
+        args:
+        - "start"
         resources:
           limits:
             cpu: 20m
@@ -51,4 +53,3 @@ spec:
           items:
           - key: mao-config
             path: config
-
diff --git a/pkg/asset/manifests/content/bootkube/machine-api-operator.go b/pkg/asset/manifests/content/bootkube/machine-api-operator.go
index 48e4765..e4e4156 100644
--- a/pkg/asset/manifests/content/bootkube/machine-api-operator.go
+++ b/pkg/asset/manifests/content/bootkube/machine-api-operator.go
@@ -24,9 +24,11 @@ spec:
     spec:
       containers:
       - name: machine-api-operator
-        image: quay.io/coreos/machine-api-operator:b6a04c2
+        image: quay.io/wking/machine-api-operator:pr-70
         command:
         - "/machine-api-operator"
+        args:
+        - "start"
         resources:
           limits:
             cpu: 20m
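
The manifest change in the diff above boils down to adding a `start` argument to the operator container. A sketch of the same patch on a container spec held as a plain dict; `with_start_arg` is a hypothetical helper, and the dict only mirrors the fields visible in the diff:

```python
import copy


def with_start_arg(container: dict) -> dict:
    """Return a copy of the container spec whose args include "start"."""
    patched = copy.deepcopy(container)
    args = patched.setdefault("args", [])
    if "start" not in args:
        # The subcommand goes first, ahead of any flags.
        args.insert(0, "start")
    return patched


container = {
    "name": "machine-api-operator",
    "image": "quay.io/wking/machine-api-operator:pr-70",
    "command": ["/machine-api-operator"],
}

print(with_start_arg(container)["args"])
```

Applying the helper twice leaves a single `start` entry, and the input dict is not modified.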

Now I'm hitting:

[core@trking-359a0-master-0 ~]$ sudo crictl logs --tail 2 8a523d18e0014
E0924 23:01:50.276908       1 leaderelection.go:228] error initially creating leader election record: namespaces "openshift-machine-api-operator" not found
E0924 23:02:25.979902       1 leaderelection.go:228] error initially creating leader election record: namespaces "openshift-machine-api-operator" not found

wking · 2018-09-24T23:16:16Z

E0924 23:01:50.276908       1 leaderelection.go:228] error initially creating leader election record: namespaces "openshift-machine-api-operator" not found

Looks like #68 and 205721c (#67) changed the namespace from the previous tectonic-system.

enxebre · 2018-09-25T06:57:49Z

Hey @wking, thanks for looking into it; your steps are right. We just rewrote the implementation logic to drop appVersion and x-operator. Everything the installer now needs to satisfy is listed at https://github.com/openshift/machine-api-operator#manual-deployment-for-kubernetes-cluster, so the openshift-machine-api-operator namespace would need to be precreated as well. See also https://github.com/openshift/machine-api-operator/tree/master/tests/e2e/manifests
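
For reference, precreating a namespace only takes a minimal Namespace object. The sketch below builds one as JSON, which `kubectl apply -f` accepts alongside YAML. How the installer should actually ship this manifest is not settled in this thread, and `namespace_manifest` is a hypothetical helper:

```python
import json


def namespace_manifest(name: str) -> dict:
    """Build a minimal Kubernetes Namespace object."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name},
    }


# Namespace name taken from the leader-election error above.
print(json.dumps(namespace_manifest("openshift-machine-api-operator"), indent=2))
```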

wking · 2018-10-02T05:00:15Z

It sounds like we're comfortable with the changes I'm proposing here, and the only remaining issue is getting the installer bumped to adjust to other changes that have landed since the last time the installer bumped its pinned operator version. Is there anything blocking this PR, or anything I can do to help get it landed?

enxebre · 2018-10-08T10:02:17Z

Hey @wking, sorry for the delay. For this to work we'll also need to update how we create the volume for each domain:
https://github.com/openshift/cluster-api-provider-libvirt/blob/2e5a516afc704c6c94d7b7cde74e78c43bbfeaa5/cloud/libvirt/actuators/machine/utils/domain.go#L23
https://github.com/openshift/cluster-api-provider-libvirt/blob/2e5a516afc704c6c94d7b7cde74e78c43bbfeaa5/cloud/libvirt/actuators/machine/utils/volume.go#L222
Tracked here: https://jira.coreos.com/browse/CLOUD-217
There is some WIP on adding libvirt testing and on refactoring so that we can make changes reliably, which might slow this PR down. I'll be more than happy to get this in after openshift/cluster-api-provider-libvirt#38 lands.

paulfantom · 2018-10-12T14:09:54Z

/retest

openshift-ci-robot · 2018-10-12T14:19:09Z

@wking: The following test failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/prow/e2e-aws	`ea4a450`	link	`/test e2e-aws`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

enxebre · 2018-11-26T08:14:45Z

Closing this as it's only relevant to https://github.com/openshift/cluster-api-provider-libvirt repo now

openshift-ci-robot requested review from bison and enxebre September 23, 2018 07:03

openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 23, 2018

wking mentioned this pull request Sep 23, 2018

libvirt: can't create workers with non-default storage path openshift/installer#308

Closed

wking force-pushed the drop-var-lib-libvirt-images branch from 1934183 to ea4a450 Compare September 23, 2018 07:19

enxebre mentioned this pull request Oct 8, 2018

drop volumes full paths openshift/cluster-api-provider-libvirt#45

Closed

wking mentioned this pull request Oct 16, 2018

pkg/asset: Add asset for Worker machinesets openshift/installer#468

Merged

openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2018

enxebre closed this Nov 26, 2018

ingvagabund pushed a commit to ingvagabund/machine-api-operator that referenced this pull request Jul 11, 2019

Merge pull request openshift#70 from frobware/add-dbg-support

2e21063

Makefile: add Go debug support for local binaries
