Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VSHA-536 Support virtio #56

Merged
merged 1 commit into from
Feb 14, 2023
Merged

VSHA-536 Support virtio #56

merged 1 commit into from
Feb 14, 2023

Conversation

rustydb
Copy link
Contributor

@rustydb rustydb commented Feb 8, 2023

Summary and Scope

  • Fixes: VSHA-536

Issue Type

  • Bugfix Pull Request
  • RFE Pull Request

Change drive inclusion behavior from keying off of the TRANSPORT value of lsblk to the SUBSYSTEM. The TRANSPORT value makes it easy to see which bus is being utilized by a disk, however this value is NULL (empty) for virtio devices.

In order to include virtio devices, such as devices used by Google Cloud, we can key off of the SUBSYSTEM value instead. The SUBSYSTEM value provides similar information, allowing us to continue choosing sas, sata, nvme, and scsi while excluding usb devices.

Some refactoring is included in this change to fix variables that were (in some cases) undefined.

Fixes metal_resolve_disk, which was returning disks even if they failed the conditional check.

Add metal-log.sh This hook will copy any and all current and future log files made by metal in /tmp, presuming any new logs start with ^metal and end with .log. Currently this produces a just a handful of logs:

ncn-s001:~ # ll /var/log/metal/bootstrap/
total 16
-rw-r--r-- 1 root root   39 Feb 14 04:09 metal_md_exit.log
-rw-r--r-- 1 root root 9797 Feb 14 04:09 pave.log

Prerequisites

  • I have included documentation in my PR (or it is not required)
  • I tested this on internal system (if yes, please include results or a description of the test)
  • I tested this on a vshasta system (if yes, please include results or a description of the test)

Testing for VSHA-536:

  • Verified metal and vshasta k8s-masters, k8s-workers, and storage-ceph nodes successfully deployed with all expected, respective partitions
  • Verified metal and vshasta k8s-masters, k8s-workers, and storage-ceph nodes successfully redeployed with all expected, respective partitions (rebooting with metal.no-wipe=0 after already deploying yielded a fresh rebuild)
  • Verified metal and vshasta k8s-masters, k8s-workers, and storage-ceph nodes successfully produced crash dumps (kdump)
  • Verified metal k8s-masters, k8s-workers, and storage-ceph nodes successfully disk booted
  • Verified metal and vshasta k8s-masters, k8s-workers, and storage-ceph nodes successfully disk-rebooted and mounted all expected, respective partitions
  • Verified metal and vshasta k8s-masters, k8s-workers, and storage-ceph nodes successfully net-rebooted with metal.no-wipe=1 and mounted all expected, respective partitions
  • Verified vshasta PXE disks were preserved during the auto-wipe
  • Verified all existing hardware types plus Google's standard disk produced the desired partitions (scsi : sata, sas, and virtio; and nvme)
  • Verified the pre-install-toolkit successfully crash dumped (kdump) on USB (skipping remoteISO)

I used redbull and a VshastaV2 machine to test my changes and verify parity between metal and vshasta.

Idempotency

Redeploying and rebooting (net and disk) both produced the same behavior.

Risks and Mitigations

This updates the disk selection logic, while leaving the disk formatting/partitioning logic alone. Existing partition expectations are unchanged.

@rustydb rustydb requested a review from erl-hpe February 8, 2023 03:52
@rustydb rustydb force-pushed the VSHA-536 branch 15 times, most recently from 9c70346 to 1a9a038 Compare February 13, 2023 04:22
@rustydb rustydb force-pushed the VSHA-536 branch 6 times, most recently from 40a71a0 to 604175b Compare February 13, 2023 20:12
@rustydb
Copy link
Contributor Author

rustydb commented Feb 14, 2023

This has one issue I want to fix before merging, this prints on disk boots (when there are no logs to copy):

[    9.895545] dracut-pre-pivot[2195]: cp: cannot stat '/var/log/metal/*': No such file or directory

Change drive inclusion behavior from keying off of the `TRANSPORT` value of `lsblk` to the `SUBSYSTEM`. The `TRANSPORT` value
makes it easy to see which bus is being utilized by a disk, however this value is NULL (empty) for `virtio` devices.

In order to include `virtio` devices, such as devices used by Google Cloud, we can key off of the `SUBSYSTEM` value instead. The `SUBSYSTEM` value
provides similar information, allowing us to continue choosing `sas`, `sata`, `nvme`, and `scsi` while excluding `usb` devices.

Some refactoring is included in this change to fix variables that were (in some cases) undefined.

Fixes `metal_resolve_disk`, which was returning disks even if they failed the conditional check.

Add `metal-log.sh` This hook will copy any and all current and future log files made by
metal in `/tmp`, presuming any new logs start with `^metal` and end with
`.log`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants