kola/switch-kernel: add a test for switching between rt and default kernel #1218

Merged
merged 1 commit into from
Mar 9, 2020

Conversation

zonggen
Member

@zonggen zonggen commented Mar 5, 2020

Adds a sub-command that uploads the RT Kernel RPMs to a machine and switches between kernels using rpm-ostree. The logic used to switch between Default Kernel and RT Kernel is defined in https://github.com/openshift/machine-config-operator/blob/f363c7be6d2d506d900e196fa2e2d05ca08b93b6/pkg/daemon/update.go#L651.

Closes: https://issues.redhat.com/browse/GRPA-1439
Signed-off-by: Allen Bai <abai@redhat.com>

@zonggen
Member Author

zonggen commented Mar 6, 2020

cc @arithx
Could you take a second look to see it's reasonable?

EDIT: *if it's reasonable

Member

@cgwalters cgwalters left a comment

Overall seems sane! That said, I am hoping to push #1215 over the line soon into supporting exactly this type of thing. Basically it needs:

  • Support for injecting a directory of files too

And I didn't yet test handling reboots, but the idea is to do so.


err := dropRpmFilesAll(m, rtKernelRpmDir)
if err != nil {
return fmt.Errorf("Failed dropping Kernel RT RPM files: %v", err)
Member

In new code let's use errors.Wrapf().

@zonggen zonggen changed the title [WIP] kola/switch-kernel: add a test for switching between rt and default kernel kola/switch-kernel: add a test for switching between rt and default kernel Mar 6, 2020
@zonggen
Member Author

zonggen commented Mar 6, 2020

sample run & output
[coreos-assembler]$ /coreos-assembler/mantle/bin/kola switch-kernel --kernel-rt ./kernel-rt -b rhcos --ignition-version v2 --qemu-image rhcos-44.81.202003021844-0-qemu.x86_64.qcow2 
Dropping RT Kernel RPMs...
Switching from Default to RT Kernel...
+ FROM_KERNEL=default
+ TO_KERNEL=rt-kernel
+ DEFAULT_KERNEL_PKG='kernel kernel-core kernel-modules kernel-modules-extra'
+ RT_KERNEL_PKG='kernel-rt-core kernel-rt-modules kernel-rt-modules-extra'
+ [[ default == \d\e\f\a\u\l\t ]]
+ [[ rt-kernel == \r\t\-\k\e\r\n\e\l ]]
+ RT_KERNEL_REPO=/var/home/core/kernel-rt-rpms/
++ ls /var/home/core/kernel-rt-rpms/
+ [[ -z kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm ]]
+ ARGS='override remove kernel kernel-core kernel-modules kernel-modules-extra'
++ ls /var/home/core/kernel-rt-rpms/
+ for RPM in $(ls ${RT_KERNEL_REPO})
+ ARGS+=' --install /var/home/core/kernel-rt-rpms//kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm'
+ for RPM in $(ls ${RT_KERNEL_REPO})
+ ARGS+=' --install /var/home/core/kernel-rt-rpms//kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm'
+ for RPM in $(ls ${RT_KERNEL_REPO})
+ ARGS+=' --install /var/home/core/kernel-rt-rpms//kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm'
+ rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install /var/home/core/kernel-rt-rpms//kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install /var/home/core/kernel-rt-rpms//kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install /var/home/core/kernel-rt-rpms//kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
Checking out tree a35c3a6...done
Enabled rpm-md repositories: rhel8-baseos rhel8-appstream rhel8-rt
Updating metadata for 'rhel8-baseos'...done
rpm-md repo 'rhel8-baseos'; generated: 2020-02-27T15:31:54Z
Updating metadata for 'rhel8-appstream'...done
rpm-md repo 'rhel8-appstream'; generated: 2020-03-04T17:20:00Z
Updating metadata for 'rhel8-rt'...done
rpm-md repo 'rhel8-rt'; generated: 2020-02-25T05:36:45Z
Importing rpm-md...done
Resolving dependencies...done
Applying 4 overrides and 3 overlays
Processing packages...done
Running pre scripts...done
Running post scripts...done
Running posttrans scripts...done
Writing rpmdb...done
Generating initramfs...done
Writing OSTree commit...done
Staging deployment...done
Removed:
  kernel-4.18.0-147.5.1.el8_1.x86_64
  kernel-core-4.18.0-147.5.1.el8_1.x86_64
  kernel-modules-4.18.0-147.5.1.el8_1.x86_64
  kernel-modules-extra-4.18.0-147.5.1.el8_1.x86_64
Added:
  kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64
  kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64
  kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64
Run "systemctl reboot" to start a reboot
Rebooting machine...
Checking kernel type...
Switched to RT Kernel successfully!
Switching from RT to Default Kernel...
+ FROM_KERNEL=rt-kernel
+ TO_KERNEL=default
+ DEFAULT_KERNEL_PKG='kernel kernel-core kernel-modules kernel-modules-extra'
+ RT_KERNEL_PKG='kernel-rt-core kernel-rt-modules kernel-rt-modules-extra'
+ [[ rt-kernel == \d\e\f\a\u\l\t ]]
+ [[ rt-kernel == \r\t\-\k\e\r\n\e\l ]]
+ [[ default == \d\e\f\a\u\l\t ]]
+ ARGS='override reset kernel kernel-core kernel-modules kernel-modules-extra'
+ for PKG in $RT_KERNEL_PKG
+ ARGS+=' --uninstall kernel-rt-core'
+ for PKG in $RT_KERNEL_PKG
+ ARGS+=' --uninstall kernel-rt-modules'
+ for PKG in $RT_KERNEL_PKG
+ ARGS+=' --uninstall kernel-rt-modules-extra'
+ rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra
Staging deployment...done
Removed:
  kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64
  kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64
  kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64
Added:
  kernel-4.18.0-147.5.1.el8_1.x86_64
  kernel-core-4.18.0-147.5.1.el8_1.x86_64
  kernel-modules-4.18.0-147.5.1.el8_1.x86_64
  kernel-modules-extra-4.18.0-147.5.1.el8_1.x86_64
Run "systemctl reboot" to start a reboot
Rebooting machine...
Checking kernel type...
Switched back to Default Kernel successfully!
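
For readers following the trace, the argument construction it shows can be sketched in Go. This is a reconstruction from the shell output above, not the script or the Go code from this PR; `switchArgs` and its parameter names are hypothetical, and only the package names and rpm-ostree flags are taken from the trace.

```go
package main

import (
	"fmt"
	"strings"
)

// Package sets copied from the trace above.
var (
	defaultKernelPkgs = []string{"kernel", "kernel-core", "kernel-modules", "kernel-modules-extra"}
	rtKernelPkgs      = []string{"kernel-rt-core", "kernel-rt-modules", "kernel-rt-modules-extra"}
)

// switchArgs is a hypothetical helper that rebuilds the rpm-ostree argument
// list seen in the trace. rtRPMs holds paths to the uploaded kernel-rt RPMs.
func switchArgs(from, to string, rtRPMs []string) ([]string, error) {
	switch {
	case from == "default" && to == "rt-kernel":
		if len(rtRPMs) == 0 {
			return nil, fmt.Errorf("no kernel-rt RPMs supplied")
		}
		// Remove the base kernel packages and layer the local RT RPMs.
		args := append([]string{"override", "remove"}, defaultKernelPkgs...)
		for _, rpm := range rtRPMs {
			args = append(args, "--install", rpm)
		}
		return args, nil
	case from == "rt-kernel" && to == "default":
		// Restore the base kernel packages and drop the layered RT ones.
		args := append([]string{"override", "reset"}, defaultKernelPkgs...)
		for _, pkg := range rtKernelPkgs {
			args = append(args, "--uninstall", pkg)
		}
		return args, nil
	}
	return nil, fmt.Errorf("unsupported switch from %q to %q", from, to)
}

func main() {
	args, err := switchArgs("rt-kernel", "default", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rpm-ostree " + strings.Join(args, " "))
}
```

Returning the arguments as a slice rather than one concatenated string avoids the word-splitting concerns that come with building `ARGS` in shell.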

}
}`, base64.StdEncoding.EncodeToString([]byte(switchKernelScript))))

flight, err := kola.NewFlight("qemu-unpriv")
Member

There's nothing intrinsically specific to qemu about this, right? I think you can use kola.NewFlight(kolaPlatform) and drop the kolaPlatform check above.

Member Author

Ah, yes that makes sense.

@arithx
Contributor

arithx commented Mar 7, 2020

Seems reasonable to me; I agree with @cgwalters' comments but after those are fixed should be fine.

@zonggen
Member Author

zonggen commented Mar 9, 2020

Switched to errors.Wrapf and de-hardcoded qemu-unpriv.

Rebased and squashed into one commit.


flight, err := kola.NewFlight(kolaPlatform)
if err != nil {
return errors.Wrapf(err, "Flight failed: %v", err)
Member

The nice thing about errors.Wrapf is you don't need to do the : %v", err); it does that automatically. And it also preserves the underlying error so it's not turned into a string.

Member

(Search for other places in the code that use it)
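
A small illustration of the point about wrapping, written against the standard library only so it runs without extra dependencies. The `wrapf` helper below is a stand-in that imitates how `Wrapf` from github.com/pkg/errors prefixes the message and keeps the cause; the real function also records a stack trace, which this sketch does not. `errNoFlight` and the message text are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoFlight is a hypothetical sentinel standing in for a platform failure.
var errNoFlight = errors.New("platform unavailable")

// wrapf imitates Wrapf from github.com/pkg/errors: it returns nil for a nil
// error, otherwise it prefixes the message and keeps the cause reachable.
func wrapf(err error, format string, args ...interface{}) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("%s: %w", fmt.Sprintf(format, args...), err)
}

func main() {
	// No ": %v", err needed: the cause is appended automatically.
	wrapped := wrapf(errNoFlight, "creating flight on %s", "qemu-unpriv")
	fmt.Println(wrapped)
	fmt.Println(errors.Is(wrapped, errNoFlight)) // true: the cause is preserved

	// Formatting with %v flattens the cause into a plain string.
	flattened := fmt.Errorf("creating flight: %v", errNoFlight)
	fmt.Println(errors.Is(flattened, errNoFlight)) // false
}
```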

@zonggen
Member Author

zonggen commented Mar 9, 2020

Fixed : %v", err in errors.Wrapf and updated the error messages to be consistent with other error messages.

fmt.Println("Checking kernel type...")
cmd = "uname -v | grep -q 'PREEMPT RT'"
_, _, err = m.SSH(cmd)
if err == nil {
Member

This should be err != nil right?

Hmm...I think this test is passing because you're missing a _ in PREEMPT_RT so both tests are inverted?

Member Author

Double checked inside a rhcos machine..

[core@ibm-p8-kvm-03-guest-02 ~]$ uname -v 
#1 SMP PREEMPT RT Tue Jan 14 16:03:46 UTC 2020

Member

Ah OK. But then I think we should invert these checks, because if err == nil is a HUGE trap in Go, it just looks like a typo.

And specifically here the errors.Wrapf() is wrong for this one because err is nil, so it will return nil, and the rest of the code will think no error occurred.

Member Author

@zonggen zonggen Mar 9, 2020

The make check run by openshift-ci-robot didn't run switchkernel.go at all; it only ran kola --help as a sanity check.

Yes, it should be if err != nil, sorry, I should've mentioned that earlier. I also confirmed on RHCOS that the original test did not produce the expected result (because errors.Wrapf() was hiding the nil error).
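
The trap discussed in this thread can be reproduced in isolation. The sketch below is not the code from switchkernel.go: `wrapf` imitates only the nil handling and message prefixing of `Wrapf` from github.com/pkg/errors, and `checkInverted`, `checkFixed`, and `errGrep` are hypothetical names standing in for the kernel check and the error returned by the SSH call.

```go
package main

import (
	"errors"
	"fmt"
)

// errGrep is a hypothetical error, as if grep had exited non-zero over SSH.
var errGrep = errors.New("exit status 1")

// wrapf imitates Wrapf from github.com/pkg/errors: wrapping nil yields nil.
func wrapf(err error, format string, args ...interface{}) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("%s: %w", fmt.Sprintf(format, args...), err)
}

// checkInverted has the reviewed shape: it tries to report a failure in the
// branch where err is nil, so the wrap returns nil and nothing is reported.
func checkInverted(sshErr error) error {
	if sshErr == nil {
		return wrapf(sshErr, "kernel check failed")
	}
	return nil
}

// checkFixed reports the failure in the branch where an error exists.
func checkFixed(sshErr error) error {
	if sshErr != nil {
		return wrapf(sshErr, "kernel check failed")
	}
	return nil
}

func main() {
	fmt.Println(checkInverted(nil), checkInverted(errGrep)) // <nil> <nil>
	fmt.Println(checkFixed(nil), checkFixed(errGrep))
}
```

With the inverted condition the function returns nil on every path, which is why the test could not fail regardless of which kernel was running.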


// check if the kernel has switched back to default kernel
fmt.Println("Checking kernel type...")
cmd = "! $(uname -v | grep -q 'PREEMPT RT')"
Member

I think it's better to use grep -v than ! grep - it more clearly expresses intent (and ensures that you still do get an error if e.g. an input file doesn't exist or grep got OOM killed or whatever).

And see above too about the _.

Member Author

Updated with grep -v and tested on a RHCOS machine, worked without issue
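
The difference between the two forms comes down to how grep's exit status is interpreted: POSIX grep exits 0 when a line is selected, 1 when none is, and greater than 1 on error. The sketch below models that with two hypothetical helpers. It assumes the caller can see the numeric exit status, which is an assumption for illustration and not how the SSH call in this PR reports failures.

```go
package main

import "fmt"

// defaultKernel interprets the exit status of
//
//	uname -v | grep -v -q 'PREEMPT RT'
//
// and keeps genuine grep failures visible as errors.
func defaultKernel(exitCode int) (bool, error) {
	switch exitCode {
	case 0:
		return true, nil // a line without "PREEMPT RT" was selected
	case 1:
		return false, nil // no line selected: all lines matched, or no input
	default:
		return false, fmt.Errorf("grep failed with exit status %d", exitCode)
	}
}

// negatedGrep models `! grep -q 'PREEMPT RT'`: every non-zero status,
// including a real grep failure, is reported as "default kernel".
func negatedGrep(exitCode int) bool {
	return exitCode != 0
}

func main() {
	for _, code := range []int{0, 1, 2} {
		ok, err := defaultKernel(code)
		fmt.Println(code, ok, err, negatedGrep(code))
	}
}
```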

cmd := "sudo " + homeDir + "/switch-kernel.sh rt-kernel default"
stdout, stderr, err := m.SSH(cmd)
if err != nil {
return errors.Wrapf(err, "failed to run %", cmd)
Member Author

cmd/kola/switchkernel.go:206:10: Wrapf format % is missing verb at end of string

Contributor

You're missing the s of the %s in the format string.

Member Author

Yep I wanted to make a note I missed that
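
For context on what the vet warning guards against, here is a minimal sketch of how the fmt package treats a format string that ends in a bare `%`. The `render` helper and the command string are hypothetical; the format is passed in as a parameter only so that the malformed string reaches fmt at run time.

```go
package main

import "fmt"

// render formats a message at run time. Taking the format as a parameter
// lets the malformed string below reach fmt instead of being caught earlier.
func render(format, cmd string) string {
	return fmt.Sprintf(format, cmd)
}

func main() {
	cmd := "sudo ./switch-kernel.sh rt-kernel default" // hypothetical command

	// Correct: %s is replaced by the command.
	fmt.Println(render("failed to run %s", cmd))

	// Missing verb: fmt does not fail, it embeds error markers such as
	// %!(NOVERB) in the output, so the message is garbled rather than lost.
	fmt.Println(render("failed to run %", cmd))
}
```

Because fmt never returns an error for a bad format, static checks are the practical way to catch this kind of typo before it ends up in a log.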

Supports --rpm-dir option for uploading the rt kernel rpms into the
cluster machine and adds a test that switches to target kernel with rpm-ostree, using logic
defined in https://github.com/openshift/machine-config-operator/blob/f363c7be6d2d506d900e196fa2e2d05ca08b93b6/pkg/daemon/update.go#L651.

Closes: https://issues.redhat.com/browse/GRPA-1439
Signed-off-by: Allen Bai <abai@redhat.com>

kola: add a switch-kernel sub-command

Instead of adding a direct test under kola/tests,
adds a sub-command that uploads the specified
kernel rt rpms, switches to the RT Kernel, and switches
back with rpm-ostree.

Closes: https://issues.redhat.com/browse/GRPA-1439
Signed-off-by: Allen Bai <abai@redhat.com>

kola/switchkernel: use wrapf and drop hardcoded platform id

Drops hardcoded id "qemu-unpriv" and sets to global kolaPlatform.
Also switches to "errors.Wrapf" for better error context and stack
trace info.

Signed-off-by: Allen Bai <abai@redhat.com>

kola/switchkernel: remove use of '%v, err'

Signed-off-by: Allen Bai <abai@redhat.com>
@zonggen
Member Author

zonggen commented Mar 9, 2020

Updated and tested locally

Output
[coreos-assembler]$ /coreos-assembler/mantle/bin/kola switch-kernel -b rhcos --ignition-version v2 --qemu-image rhcos-44.81.202003021844-0-qemu.x86_64.qcow2 --kernel-rt kernel-/
Dropping RT Kernel RPMs...
Switching from Default to RT Kernel...
+ FROM_KERNEL=default
+ TO_KERNEL=rt-kernel
+ DEFAULT_KERNEL_PKG='kernel kernel-core kernel-modules kernel-modules-extra'
+ RT_KERNEL_PKG='kernel-rt-core kernel-rt-modules kernel-rt-modules-extra'
+ [[ default == \d\e\f\a\u\l\t ]]
+ [[ rt-kernel == \r\t\-\k\e\r\n\e\l ]]
+ RT_KERNEL_REPO=/var/home/core/kernel-rt-rpms/
++ ls /var/home/core/kernel-rt-rpms/
+ [[ -z kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm ]]
+ ARGS='override remove kernel kernel-core kernel-modules kernel-modules-extra'
++ ls /var/home/core/kernel-rt-rpms/
+ for RPM in $(ls ${RT_KERNEL_REPO})
+ ARGS+=' --install /var/home/core/kernel-rt-rpms//kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm'
+ for RPM in $(ls ${RT_KERNEL_REPO})
+ ARGS+=' --install /var/home/core/kernel-rt-rpms//kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm'
+ for RPM in $(ls ${RT_KERNEL_REPO})
+ ARGS+=' --install /var/home/core/kernel-rt-rpms//kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm'
+ rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install /var/home/core/kernel-rt-rpms//kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.m
Checking out tree a35c3a6...done
Enabled rpm-md repositories: rhel8-baseos rhel8-appstream rhel8-rt
Updating metadata for 'rhel8-baseos'...done
rpm-md repo 'rhel8-baseos'; generated: 2020-02-27T15:31:54Z
Updating metadata for 'rhel8-appstream'...done
rpm-md repo 'rhel8-appstream'; generated: 2020-03-04T17:20:00Z
Updating metadata for 'rhel8-rt'...done
rpm-md repo 'rhel8-rt'; generated: 2020-02-25T05:36:45Z
Importing rpm-md...done
Resolving dependencies...done
Applying 4 overrides and 3 overlays
Processing packages...done
Running pre scripts...done
Running post scripts...done
Running posttrans scripts...done
Writing rpmdb...done
Generating initramfs...done
Writing OSTree commit...done
Staging deployment...done
Removed:
  kernel-4.18.0-147.5.1.el8_1.x86_64
  kernel-core-4.18.0-147.5.1.el8_1.x86_64
  kernel-modules-4.18.0-147.5.1.el8_1.x86_64
  kernel-modules-extra-4.18.0-147.5.1.el8_1.x86_64
Added:
  kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64
  kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64
  kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64
Run "systemctl reboot" to start a reboot
Rebooting machine...
Checking kernel type...
Switched to RT Kernel successfully!
Switching from RT to Default Kernel...
+ FROM_KERNEL=rt-kernel
+ TO_KERNEL=default
+ DEFAULT_KERNEL_PKG='kernel kernel-core kernel-modules kernel-modules-extra'
+ RT_KERNEL_PKG='kernel-rt-core kernel-rt-modules kernel-rt-modules-extra'
+ [[ rt-kernel == \d\e\f\a\u\l\t ]]
+ [[ rt-kernel == \r\t\-\k\e\r\n\e\l ]]
+ [[ default == \d\e\f\a\u\l\t ]]
+ ARGS='override reset kernel kernel-core kernel-modules kernel-modules-extra'
+ for PKG in $RT_KERNEL_PKG
+ ARGS+=' --uninstall kernel-rt-core'
+ for PKG in $RT_KERNEL_PKG
+ ARGS+=' --uninstall kernel-rt-modules'
+ for PKG in $RT_KERNEL_PKG
+ ARGS+=' --uninstall kernel-rt-modules-extra'
+ rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra
Staging deployment...done
Removed:
  kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64
  kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64
  kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64
Added:
  kernel-4.18.0-147.5.1.el8_1.x86_64
  kernel-core-4.18.0-147.5.1.el8_1.x86_64
  kernel-modules-4.18.0-147.5.1.el8_1.x86_64
  kernel-modules-extra-4.18.0-147.5.1.el8_1.x86_64
Run "systemctl reboot" to start a reboot
Rebooting machine...
Checking kernel type...
Switched back to Default Kernel successfully!

@cgwalters
Member

/lgtm

@zonggen
Member Author

zonggen commented Mar 9, 2020

Thank you @cgwalters, merging

@zonggen zonggen merged commit 28b6aaa into master Mar 9, 2020
@zonggen zonggen deleted the test-switch-kernel branch March 9, 2020 19:29
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, zonggen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cgwalters
Member

Thank you @cgwalters, merging

Currently this repo uses Prow, so the bot will merge.
