kola/switch-kernel: rpm-ostree fails to switch from Default to RT Kernel #1245

Closed
zonggen opened this issue Mar 13, 2020 · 9 comments
Labels
bug (Something isn't working), jira (for syncing to jira)

@zonggen
Member

zonggen commented Mar 13, 2020

Bug Report

Environment

What operating system is being used to run coreos-assembler?

Fedora 30

What operating system is being assembled?

RHCOS

Is coreos-assembler running in Podman or Docker?

Podman

If Podman, is coreos-assembler running privileged or unprivileged?

Privileged

Expected Behavior

rpm-ostree successfully switches the kernel from the default kernel to the RT kernel with the command:
rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm

Actual Behavior

+ rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install ./kernel-rt/kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install ./kernel-rt/kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm --install ./kernel-rt/kernel-rt-modules-m
Checking out tree ffd5b3c... done
Enabled rpm-md repositories: rhel8-baseos rhel8-appstream rhel8-rt
rpm-md repo 'rhel8-baseos' (cached); generated: 2020-02-27T15:31:54Z
rpm-md repo 'rhel8-appstream' (cached); generated: 2020-03-13T13:31:29Z
rpm-md repo 'rhel8-rt' (cached); generated: 2020-02-25T05:36:45Z
Importing rpm-md... done
Resolving dependencies... done
Applying 4 overrides and 4 overlays
Processing packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
error: Multiple subdirectories found in: usr/lib/modules

Reproduction Steps

  1. cosa kola switch-kernel -b rhcos --ignition-version v2 --kernel-rt ./kernel-rt
  2. ...

Other Information

I investigated a bit and found https://bugzilla.redhat.com/show_bug.cgi?id=1767215, which seems related. I tried manually running the above rpm-ostree command inside RHCOS and saw the same behavior. The error message originates from https://github.com/coreos/rpm-ostree/blob/2ee48c51fede72f1f0394c070c0f35946f3e1839/src/libpriv/rpmostree-kernel.c#L141, which only triggers when the directory /usr/lib/modules contains more than one subdirectory. But again,

[core@master-2 ~]$ ll /usr/lib/modules
total 4
drwxr-xr-x. 7 root root 4096 Jan  1  1970 4.18.0-147.el8.x86_64

This error did not occur when #1218 was merged. Am I missing anything?
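
To make the condition described above concrete, here is a minimal sketch that counts the immediate subdirectories of a modules directory. The helper names `count_module_dirs` and `check_module_dirs` are hypothetical and are not part of rpm-ostree. It is also an assumption here that the real check runs against the tree being assembled rather than the booted deployment, which could explain why the booted system shows only one directory.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not rpm-ostree code.

# Print the number of immediate subdirectories of a modules directory.
# Defaults to /usr/lib/modules when no argument is given.
count_module_dirs() {
    dir="${1:-/usr/lib/modules}"
    # -mindepth/-maxdepth restrict the search to direct children.
    find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Report whether more than one kernel directory is present.
# Returns non-zero in the "multiple subdirectories" case.
check_module_dirs() {
    n="$(count_module_dirs "$1")"
    if [ "$n" -gt 1 ]; then
        echo "multiple ($n)"
        return 1
    fi
    echo "ok ($n)"
}
```

Running `check_module_dirs` against a checked-out rootfs (for example `check_module_dirs /path/to/rootfs/usr/lib/modules`, where the path is a placeholder) would show whether both the default and RT kernel directories ended up in the same tree.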

@miabbott
Member

@jlebon @cgwalters This looks like an rpm-ostree issue at the core. The only thing that jumped out in a search over there was coreos/rpm-ostree#1933.

@jlebon
Member

jlebon commented Mar 16, 2020

Yup, agreed this is likely an rpm-ostree problem. Will look into this.

jlebon added the bug label Mar 16, 2020
jlebon self-assigned this Mar 16, 2020
@jlebon
Member

jlebon commented Mar 16, 2020

Hmm, actually I can't reproduce this locally on a fresh RHCOS build, either by running rpm-ostree override remove directly or via cosa kola switch-kernel.

What RHCOS are you testing this on?

@zonggen
Member Author

zonggen commented Mar 16, 2020

I did fresh builds on two different machines and ran cosa kola switch-kernel inside the cosa container.
I will try again tomorrow morning to see if it works.

@zonggen
Member Author

zonggen commented Mar 17, 2020

So I've updated src/config and the error message went away.
Though the rpm-ostree commands now run without issue, cosa kola switch-kernel will sometimes fail at the second stage (switching from RT back to Default) with the error message:

Error: failed switch kernel test: failed switching from RT to Default Kernel: failed to run uname -v | grep -qv 'PREEMPT RT': Process exited with status 1

The same failure is observed in the Jenkins pipeline (https://jenkins-rhcos-art.cloud.privileged.psi.redhat.com/job/rhcos-art-rhcos-4.5/76/console).
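
For context on the failing check: `uname -v | grep -qv 'PREEMPT RT'` exits with status 1 when the single line printed by `uname -v` still contains `PREEMPT RT`, which suggests the machine was still running the RT kernel after the switch back. The sketch below expresses the same test as a function that takes a version string, so it can be tried without rebooting. `kernel_flavor` is a hypothetical helper and not part of kola, and matching `PREEMPT_RT` (the spelling used by newer kernels) is an addition that goes beyond what the kola check does.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not kola code.

# Classify a kernel version string as "rt" or "default".
# With no argument, the running kernel's `uname -v` output is used.
kernel_flavor() {
    version="${1:-$(uname -v)}"
    case "$version" in
        # 'PREEMPT RT' is the string from the issue; PREEMPT_RT is an
        # assumed extra spelling for newer kernels.
        *"PREEMPT RT"*|*PREEMPT_RT*) echo rt ;;
        *) echo default ;;
    esac
}
```

After the second-stage reboot, something like `[ "$(kernel_flavor)" = default ] || echo "still on RT: $(uname -v)" >&2` would print the offending version string rather than only an exit status.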

Since the related error is now gone, should we close this issue?

jlebon removed their assignment Mar 17, 2020
@jlebon
Member

jlebon commented Mar 17, 2020

Hmm yeah that's a different issue. No issues reusing this ticket if you'd prefer. Maybe try to run the same commands manually yourself until you hit the error? The kola SSH wrappers might be swallowing stderr.

c4rt0 added the jira label Oct 23, 2023
@jlebon
Member

jlebon commented Apr 29, 2024

This is a pretty old issue. Two things:

  1. We should delete kola switch-kernel and make this a regular kola test instead (ideally external).
  2. Another way to switch kernels now is via layering, though that test is currently also broken: [rhel-9.4 variant] ext.config.rpm-ostree.replace-rt-kernel fails openshift/os#1383. Ideally, we need to fix that too since it's only going to be more relevant going forward. That said, it still makes sense to test the client-side rpm-ostree override replace flow since that's still what the MCO does today.

The challenge with (1) is that this requires some support on the kola side because we need access to the kernel-rt RPMs. Those RPMs are now shipped as part of the extensions container. We could have a kola test tag like extensions-container which will tell kola to copy the extensions container into the VM. One tricky bit there is that the extensions container is generated later in the pipeline, so it won't be available on the initial kola run we do. We'd have to add it near the kola testiso run instead, which happens after all artifacts are generated.

@jlebon
Member

jlebon commented May 9, 2024

> Another way to switch kernels now is via layering, though that test is currently also broken: openshift/os#1383. Ideally, we need to fix that too since it's only going to be more relevant going forward. That said, it still makes sense to test the client-side rpm-ostree override replace flow since that's still what the MCO does today.

Sorry, this is incorrect. openshift/os#1383 doesn't use the layering flow, but also does it client-side.

The layering test lives in FCOS: https://github.com/coreos/fedora-coreos-config/blob/832c42ba3f406f88647621300aeecde30e9d14ef/tests/kola/rpm-ostree/kernel-replace. So then ideally, we generalize that test so it can work on both FCOS and SCOS/RHCOS.
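
As a rough illustration of what generalizing the test could involve, a shared test would need to tell the variants apart before choosing which kernel packages to use. The sketch below reads an os-release style file and prints a short variant name. `os_flavor` is a hypothetical helper, and the `ID`/`VARIANT_ID` values it matches are assumptions that should be verified against real FCOS, RHCOS, and SCOS images.

```shell
#!/bin/sh
# Sketch only: hypothetical helper; the ID values are assumptions.

# Print fcos, rhcos, scos, or unknown for an os-release style file.
# Defaults to /etc/os-release when no argument is given.
os_flavor() {
    file="${1:-/etc/os-release}"
    (
        # Run in a subshell so the sourced variables do not leak out.
        ID="" VARIANT_ID=""
        . "$file"
        case "$ID:$VARIANT_ID" in
            fedora:coreos) echo fcos ;;
            rhcos:*) echo rhcos ;;
            scos:*) echo scos ;;
            *) echo unknown ;;
        esac
    )
}
```

A generalized test could then branch on `os_flavor` to select the kernel packages to install, while keeping the rest of the flow shared.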

c4rt0 added a commit to c4rt0/coreos-assembler that referenced this issue Jun 26, 2024
c4rt0 added a commit to c4rt0/fedora-coreos-config that referenced this issue Jul 2, 2024
c4rt0 added a commit to c4rt0/fedora-coreos-config that referenced this issue Jul 5, 2024
c4rt0 added a commit to c4rt0/fedora-coreos-config that referenced this issue Jul 8, 2024
c4rt0 added a commit to c4rt0/fedora-coreos-config that referenced this issue Jul 9, 2024
c4rt0 added a commit to c4rt0/fedora-coreos-config that referenced this issue Jul 10, 2024
c4rt0 added a commit to c4rt0/fedora-coreos-config that referenced this issue Jul 15, 2024
In one of the old [issues](coreos/coreos-assembler#1245 (comment)), kola switch-kernel tests were failing. In the discussion, @jlebon [suggested](coreos/coreos-assembler#1245 (comment)) removing the named test and making it external so that it can be used by FCOS as well as SCOS/RHCOS. Additionally, the kernel version for FCOS was bumped in this PR to the latest stable version, 6.9.8-200.fc40.
c4rt0 added a commit to c4rt0/fedora-coreos-config that referenced this issue Jul 15, 2024
In one of the older issues, the kola switch-kernel test was failing. In the discussion, @jlebon [suggested](coreos/coreos-assembler#1245 (comment)) removing the named test and making it external so that it can be used by FCOS as well as SCOS/RHCOS. Additionally, the kernel version for FCOS was bumped in this PR to the latest stable version, 6.9.8-200.fc40.

See: coreos/coreos-assembler#1245
c4rt0 added a commit to c4rt0/fedora-coreos-config that referenced this issue Jul 15, 2024
In one of the older issues, the kola switch-kernel test was failing. In the discussion, @jlebon suggested removing the named test and making it external so that it can be used by FCOS as well as SCOS/RHCOS.
Additionally, the kernel version for FCOS was updated in this PR to the latest stable version `6.9.8-200.fc40`.

See: coreos/coreos-assembler#1245
@jlebon
Member

jlebon commented Jul 15, 2024

Let's close this one. The command was removed in #3825 in favour of external tests.

Relatedly, @c4rt0 is working on generalizing the existing layering test that we have in f-c-c: coreos/fedora-coreos-config#3048

jlebon closed this as completed Jul 15, 2024
c4rt0 added a commit to c4rt0/fedora-coreos-config that referenced this issue Jul 16, 2024
In one of the older issues, the kola switch-kernel test was failing. In the discussion, @jlebon suggested removing the named test and making it external so that it can be used by FCOS as well as SCOS/RHCOS.
Additionally, the kernel version for FCOS was updated in this PR to the latest stable version `6.9.8-200.fc40`.

See: coreos/coreos-assembler#1245
c4rt0 added a commit to coreos/fedora-coreos-config that referenced this issue Jul 16, 2024
In one of the older issues, the kola switch-kernel test was failing. In the discussion, @jlebon suggested removing the named test and making it external so that it can be used by FCOS as well as SCOS/RHCOS.
Additionally, the kernel version for FCOS was updated in this PR to the latest stable version `6.9.8-200.fc40`.

See: coreos/coreos-assembler#1245