OCPBUGS-8113: daemon: Make switchKernel less stateful #3580

cgwalters · 2023-03-03T16:57:24Z

daemon: Clean up switchKernel a bit

De-duplicate calls to canonicalizeKernelType to make the
logic easier to read. Also add a few comments.

vendor: Bump coreos/rpm-ostree-client-go

In prep for usage in MCD.

daemon: Make switchKernel less stateful

This is prep for fixing RHEL9 upgrades while maintaining kernel-rt.

Previously the switchKernel logic tried to carefully handle
all 4 cases (default -> default, default -> rt, rt -> default, rt -> rt).

But, the last one (rt -> rt) was not quite right because
the previous rpm-ostree rebase command already preserved the previous
kernel. In fact it was pretty expensive to do things this way
because we'd e.g. regenerate the initramfs twice.

To say this another way: when doing a RHEL9 update, it's actually
the first rpm-ostree rebase command which fails before we
even get to switchKernel.

And the reason is due to the introduction of a new -core subpackage;
xref https://issues.redhat.com/browse/OCPBUGS-8113

So here's the new logic to handle this:

- Before we do the rebase operation to the new OS, we detect
  any previous overrides of any packages starting with kernel-rt
  and we remove them. Notably this avoids hardcoding any specific
  kernel subpackages; we just remove everything starting with
  kernel-rt, which should be more robust to subpackage changes
  in the future.
- Consequently the rebase operation will start out by deploying the
  stock image, i.e. with the throughput kernel (though note we are
  carefully preserving other local overrides).
- The switchKernel function no longer needs to take the previous
  machineconfig state into account (except for logging).
  Instead, we just detect if the target is RT, and if so we
  apply the latest packages.

This significantly simplifies the logic in switchKernel, and will
help fix RHEL9 upgrades.
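
The new logic described above can be illustrated with a minimal sketch. It is not the code from this PR: `splitKernelRTOverrides` and `needsRealtimeKernel` are hypothetical helpers, the package names in `main` are made up for illustration, and the `"realtime"` string is an assumed value for the kernel type. The only thing taken from the description is the rule itself: anything starting with `kernel-rt` is removed before the rebase, other local overrides are preserved, and afterwards only the target kernel type decides whether RT packages are applied.

```go
package main

import (
	"fmt"
	"strings"
)

// kernelRTPrefix is the prefix the PR description uses to recognise
// realtime kernel packages, without listing specific subpackages.
const kernelRTPrefix = "kernel-rt"

// splitKernelRTOverrides is a hypothetical helper (not the MCO's real API).
// It partitions package names into those starting with "kernel-rt"
// (to remove before the rebase) and everything else (to preserve).
func splitKernelRTOverrides(pkgs []string) (remove, keep []string) {
	for _, pkg := range pkgs {
		if strings.HasPrefix(pkg, kernelRTPrefix) {
			remove = append(remove, pkg)
		} else {
			keep = append(keep, pkg)
		}
	}
	return remove, keep
}

// needsRealtimeKernel is a hypothetical stand-in for the simplified
// decision: only the target kernel type matters, not the previous one.
// The "realtime" value is an assumption for this sketch.
func needsRealtimeKernel(targetKernelType string) bool {
	return targetKernelType == "realtime"
}

func main() {
	// Illustrative package names only.
	pkgs := []string{"kernel-rt-core", "kernel-rt-modules", "usbguard", "kernel-rt-kvm"}
	remove, keep := splitKernelRTOverrides(pkgs)
	fmt.Println("remove before rebase:", remove)
	fmt.Println("preserve:", keep)
	fmt.Println("apply RT packages after rebase:", needsRealtimeKernel("realtime"))
}
```

Matching on the prefix rather than a fixed list is what lets a new subpackage such as a `-core` split be handled without a code change.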

cgwalters · 2023-03-03T18:26:04Z

Now that we've branched, we can benefit from the fact that we can land PRs like this in master with much lower risk/impact. OK, openshift/release#36937 just landed, so let's give that a go:

/test e2e-gcp-ovn-rt-upgrade

openshift-ci · 2023-03-03T18:26:17Z

@cgwalters: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test 4.12-upgrade-from-stable-4.11-images
/test cluster-bootimages
/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test e2e-gcp-op
/test images
/test okd-scos-images
/test unit
/test verify

The following commands are available to trigger optional jobs:

/test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade
/test bootstrap-unit
/test e2e-alibabacloud-ovn
/test e2e-aws-disruptive
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-fips-op
/test e2e-aws-ovn-workers-rhel8
/test e2e-aws-proxy
/test e2e-aws-serial
/test e2e-aws-single-node
/test e2e-aws-upgrade-single-node
/test e2e-aws-workers-rhel8
/test e2e-azure
/test e2e-azure-ovn-upgrade
/test e2e-azure-upgrade
/test e2e-gcp-op-single-node
/test e2e-gcp-rt
/test e2e-gcp-rt-op
/test e2e-gcp-single-node
/test e2e-gcp-upgrade
/test e2e-hypershift
/test e2e-metal-assisted
/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-openstack
/test e2e-openstack-externallb-techpreview
/test e2e-openstack-parallel
/test e2e-ovirt
/test e2e-ovirt-upgrade
/test e2e-ovn-step-registry
/test e2e-vsphere
/test e2e-vsphere-upgrade
/test e2e-vsphere-upi
/test okd-e2e-aws
/test okd-e2e-gcp-op
/test okd-e2e-upgrade
/test okd-e2e-vsphere
/test okd-images
/test okd-scos-e2e-aws-ovn
/test okd-scos-e2e-gcp-op
/test okd-scos-e2e-gcp-ovn-upgrade
/test okd-scos-e2e-vsphere

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-machine-config-operator-master-e2e-alibabacloud-ovn
pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
pull-ci-openshift-machine-config-operator-master-e2e-hypershift
pull-ci-openshift-machine-config-operator-master-images
pull-ci-openshift-machine-config-operator-master-okd-images
pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-gcp-ovn-upgrade
pull-ci-openshift-machine-config-operator-master-okd-scos-images
pull-ci-openshift-machine-config-operator-master-unit
pull-ci-openshift-machine-config-operator-master-verify

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

cgwalters · 2023-03-03T18:29:30Z

/test e2e-gcp-ovn-rt-upgrade

cheesesashimi · 2023-03-03T19:02:08Z

pkg/daemon/update.go


-	if canonicalizeKernelType(oldConfig.Spec.KernelType) == ctrlcommon.KernelTypeRealtime && canonicalizeKernelType(newConfig.Spec.KernelType) == ctrlcommon.KernelTypeDefault {
+	switchingToThroughput := oldKtype == ctrlcommon.KernelTypeRealtime && newKtype == ctrlcommon.KernelTypeDefault


question: Wouldn't it be clearer to use switchingToDefault instead of switchingToThroughput?

I'm a bit confused on where throughput comes from.

I'm a bit confused on where throughput comes from.

Yeah sorry that's just me having scars from years of people saying e.g. "normal RHEL" with the implication that e.g. RHEL CoreOS is not-normal. (Or people say "normal Fedora" etc. versus "Silverblue"). Or less pejoratively they say "default RHEL"...which isn't bad but is also not super descriptive because, hey maybe one day what the default is changes 😉

From the kernel side you could certainly say kernel is the default. But it is really about latency (kernel-rt) versus throughput (kernel). And I personally find this is a better description.

(Also IMO the "realtime" kernel is a bit of a misnomer because it's really soft real-time which is actually quite different from hard real time, so I personally think calling it "latency optimized" is better)

And to expand on this, personally I'd rename kernel -> kernel-throughput-optimized and kernel-rt -> kernel-latency-optimized, I'd also rename "rhel coreos" => "rhel (image mode)" and most crucially "rhel" => "rhel (package mode)" - but only where it matters; otherwise they're both just RHEL. Just like how kernel-throughput-optimized and kernel-latency-optimized are both really just Linux (aka kernel) in different modes.

Or to say it another way, both are normal. We don't strictly think of one as "default" even. They both have qualifiers, but only where it matters.

And to expand even more, of course today we say "OpenShift" and "Hypershift" - with the implication that the latter is the different/not-normal case. I've even seen people refer to current OpenShift as "normal" OpenShift! But in fact "hypershift" is (and should be!) well on its way to becoming the default, so it's also like OpenShift => OpenShift (standalone) and Hypershift => OpenShift (hosted control plane) etc.

But all this aside, actually I was not consistent in trying to use "throughput" instead of "default" and the top patch stops using "throughput" anyways 😄

Thanks for the very detailed clarification! Naming things is hard and even when we think it's easy, overloaded names, changing contexts, history, etc. all make things even more complicated. I'm OK with the name throughput now.
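
For readers following the diff in this thread, here is a small sketch of the de-duplication pattern it shows: each kernel type is canonicalized once into a local variable, and the comparison then reads from those variables. This is an illustration under assumptions, not the code from `pkg/daemon/update.go`. The constant values and the body of `canonicalizeKernelType` are simplified stand-ins, and `switchingToThroughput` is written as a function here purely so the sketch can run on its own.

```go
package main

import "fmt"

// Assumed string values for the two kernel types; the real constants
// live in ctrlcommon and may differ.
const (
	kernelTypeDefault  = "default"
	kernelTypeRealtime = "realtime"
)

// canonicalizeKernelType is a simplified stand-in for the function named
// in the diff: anything not explicitly realtime is treated as default.
func canonicalizeKernelType(kernelType string) string {
	if kernelType == kernelTypeRealtime {
		return kernelTypeRealtime
	}
	return kernelTypeDefault
}

// switchingToThroughput canonicalizes each side exactly once and then
// compares the results, mirroring the oldKtype/newKtype names in the diff.
func switchingToThroughput(oldRaw, newRaw string) bool {
	oldKtype := canonicalizeKernelType(oldRaw)
	newKtype := canonicalizeKernelType(newRaw)
	return oldKtype == kernelTypeRealtime && newKtype == kernelTypeDefault
}

func main() {
	fmt.Println(switchingToThroughput("realtime", ""))         // rt -> unset kernel type
	fmt.Println(switchingToThroughput("realtime", "realtime")) // rt -> rt
	fmt.Println(switchingToThroughput("", "realtime"))         // unset -> rt
}
```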

pkg/daemon/update.go

cheesesashimi · 2023-03-03T19:11:23Z

Overall this seems reasonable. I just have two minor concerns that might help clarify things that I've put inline. The first is where the word "throughput" came from. And the second, is what looks like an unfinished comment.

The only other (non-blocking) thought is my surprise with how many dependencies were bumped solely from bumping coreos/rpm-ostree-client-go.

cgwalters · 2023-03-03T19:36:14Z

The only other (non-blocking) thought is my surprise with how many dependencies were bumped solely from bumping coreos/rpm-ostree-client-go.

Yeah, I think a lot of that is transitive deps from containers/image. But also, because we don't update vendored deps here regularly at all, every time we do there's usually a large set.

cgwalters · 2023-03-03T19:38:36Z

/test e2e-gcp-ovn-rt-upgrade
/test e2e-gcp-op

cheesesashimi · 2023-03-03T19:54:57Z

Overall this looks fine. I'll approve once the test suites pass, solely because of the large number of dependency changes.

pkg/daemon/update.go

cgwalters · 2023-03-03T22:25:36Z

Hmm, the e2e-gcp-ovn-rt-upgrade job failed...but not for a reason I was expecting. First, one thing I notice in this job is that, confusingly, the -upgrade jobs actually just perform a "synthetic" upgrade from current CI without the PR to the code with the PR. Consequently, we're not actually doing an OS update in this job, which means we're not actually running this modified code.

cgwalters · 2023-03-03T22:40:48Z

Actually...I am confused by that failure since it seems to say that the machine-config operator was failing, but AFAICS it isn't? Though there is a spam of warnings in the operator logs.

cgwalters · 2023-03-03T22:41:25Z

/test ?

openshift-ci · 2023-03-03T22:41:31Z

@cgwalters: The following commands are available to trigger required jobs:

/test 4.12-upgrade-from-stable-4.11-images
/test cluster-bootimages
/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test e2e-gcp-op
/test images
/test okd-scos-images
/test unit
/test verify

The following commands are available to trigger optional jobs:

/test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade
/test bootstrap-unit
/test e2e-alibabacloud-ovn
/test e2e-aws-disruptive
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-fips-op
/test e2e-aws-ovn-workers-rhel8
/test e2e-aws-proxy
/test e2e-aws-serial
/test e2e-aws-single-node
/test e2e-aws-upgrade-single-node
/test e2e-aws-workers-rhel8
/test e2e-azure
/test e2e-azure-ovn-upgrade
/test e2e-azure-upgrade
/test e2e-gcp-op-single-node
/test e2e-gcp-ovn-rt-upgrade
/test e2e-gcp-rt
/test e2e-gcp-rt-op
/test e2e-gcp-single-node
/test e2e-gcp-upgrade
/test e2e-hypershift
/test e2e-metal-assisted
/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-openstack
/test e2e-openstack-externallb-techpreview
/test e2e-openstack-parallel
/test e2e-ovirt
/test e2e-ovirt-upgrade
/test e2e-ovn-step-registry
/test e2e-vsphere
/test e2e-vsphere-upgrade
/test e2e-vsphere-upi
/test okd-e2e-aws
/test okd-e2e-gcp-op
/test okd-e2e-upgrade
/test okd-e2e-vsphere
/test okd-images
/test okd-scos-e2e-aws-ovn
/test okd-scos-e2e-gcp-op
/test okd-scos-e2e-gcp-ovn-upgrade
/test okd-scos-e2e-vsphere

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-machine-config-operator-master-e2e-alibabacloud-ovn
pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
pull-ci-openshift-machine-config-operator-master-e2e-gcp-ovn-rt-upgrade
pull-ci-openshift-machine-config-operator-master-e2e-hypershift
pull-ci-openshift-machine-config-operator-master-images
pull-ci-openshift-machine-config-operator-master-okd-images
pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-gcp-ovn-upgrade
pull-ci-openshift-machine-config-operator-master-okd-scos-images
pull-ci-openshift-machine-config-operator-master-unit
pull-ci-openshift-machine-config-operator-master-verify

cgwalters · 2023-03-03T22:56:04Z

/payload-job periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-rt-upgrade

xref #3485 (comment)

openshift-ci · 2023-03-03T22:56:07Z

@cgwalters: trigger 1 job(s) for the /payload-(job|aggregate) command

periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/93cd3720-ba16-11ed-9286-0a1a70c20a75-0

cgwalters · 2023-03-04T13:30:40Z

Man, I was so confused why the code wasn't working and yeah...I modified the legacy dead-code OS update path 😢 😢 Going to do a separate PR to excise that from existence 🪓 entirely. (Edit: done in #3583)

cgwalters · 2023-03-04T13:38:02Z

/payload-job periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-rt-upgrade

openshift-ci · 2023-03-04T13:38:05Z

@cgwalters: trigger 1 job(s) for the /payload-(job|aggregate) command

periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c96e59b0-ba91-11ed-83e3-f9126a759ef7-0

cgwalters · 2023-03-04T17:30:08Z

🎉 Got a green payload run on that previous commit. I had to push a fixup to handle the case of going rt -> throughput without an OS update. I think this only happens in the MCO's CI runs; switching rt -> throughput is an unusual thing to do in production.

So let's do one more payload run with tip
/payload-job periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-rt-upgrade
and if both that and e2e-gcp-op are good, I think let's merge this.

openshift-ci-robot · 2023-03-07T15:44:03Z

@cgwalters: This pull request references Jira Issue OCPBUGS-8113, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.14.0) matches configured target version for branch (4.14.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rioliu-rh

sinnykumari · 2023-03-07T16:03:32Z

pkg/daemon/update.go

 		return nil
 	}

+	// TODO: Drop this code and use https://github.com/coreos/rpm-ostree/issues/2542 instead
 	defaultKernel := []string{"kernel", "kernel-core", "kernel-modules", "kernel-modules-extra"}


I forgot to ask this: we are not yet adding kernel-modules-core to the defaultKernel package list. Fixing OCPBUGS-8113 will still need that, correct?

Yes; but we can't land this in 4.14/master (still using rhel8.6) if we specify that package. The change to do so for rhel9 is part of that PR, see 4e9fca2

But at a technical level I think we can say that this is still "the" fix for OCPBUGS-8113 since it has 98% of the required code changes?

Thanks, this will help QE when they perform testing.

cc @rioliu-rh @sergiordlr
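
To make the point in this thread concrete, here is a hedged sketch of how the package list could differ between RHEL 8.6 and RHEL 9. The base list is copied from the diff context above. Everything else is an assumption for illustration: `defaultKernelPackages` and its `isRHEL9` parameter are hypothetical, and this thread does not show how the change referenced as 4e9fca2 actually selects the list.

```go
package main

import "fmt"

// defaultKernelPackages is a hypothetical helper. The base list matches the
// defaultKernel slice in the diff; adding kernel-modules-core only for
// RHEL 9 reflects the discussion in this thread, not the real change.
func defaultKernelPackages(isRHEL9 bool) []string {
	pkgs := []string{"kernel", "kernel-core", "kernel-modules", "kernel-modules-extra"}
	if isRHEL9 {
		// Per the thread, this package cannot be specified while
		// master is still using rhel8.6.
		pkgs = append(pkgs, "kernel-modules-core")
	}
	return pkgs
}

func main() {
	fmt.Println(defaultKernelPackages(false))
	fmt.Println(defaultKernelPackages(true))
}
```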

sinnykumari · 2023-03-07T16:29:08Z

/lgtm
/test e2e-gcp-op

Putting hold for qe approval under pre-merge testing
/hold

cgwalters · 2023-03-07T18:14:54Z

Out of curiosity what would QE be testing that isn't covered by the payload test run and e2e-gcp-op?

cgwalters · 2023-03-08T13:52:22Z

Rebased 🏄 since another PR bumped the Go deps in the meantime

sdodson · 2023-03-08T14:19:08Z

/hold cancel
This has been tested in CI and needs to be backported to release-4.13 with some urgency. There will be a final QE round of testing once all of the pieces have landed which I believe when paired with CI testing is sufficient.

sdodson · 2023-03-08T14:19:39Z

/lgtm
Code has just been rebased, no material changes since last lgtm.

openshift-ci · 2023-03-08T14:21:00Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, sdodson, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [cgwalters,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sdodson · 2023-03-08T14:45:25Z

/cherry-pick release-4.13

openshift-ci-robot · 2023-03-08T14:45:28Z

@cgwalters: Jira Issue OCPBUGS-8113: All pull requests linked via external trackers have merged:

openshift/machine-config-operator#3580

Jira Issue OCPBUGS-8113 has been moved to the MODIFIED state.

openshift-cherrypick-robot · 2023-03-08T14:46:18Z

@sdodson: #3580 failed to apply on top of branch "release-4.13":

Applying: daemon: Clean up `switchKernel` a bit
Applying: vendor: Bump coreos/rpm-ostree-client-go
.git/rebase-apply/patch:15768: trailing whitespace.
 
.git/rebase-apply/patch:15845: trailing whitespace.
 
.git/rebase-apply/patch:15872: trailing whitespace.
    
.git/rebase-apply/patch:15896: trailing whitespace.
    
.git/rebase-apply/patch:15971: trailing whitespace.
                quotedString.WriteString(fmt.Sprintf("\\u%04x", c))         
error: patch failed: vendor/github.com/klauspost/compress/README.md:9
error: vendor/github.com/klauspost/compress/README.md: patch does not apply
error: Did you hand edit your patch?
It does not apply to blobs recorded in its index.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Using index info to reconstruct a base tree...
M	go.mod
M	go.sum
M	vendor/google.golang.org/grpc/balancer/balancer.go
A	vendor/google.golang.org/grpc/balancer/conn_state_evaluator.go
M	vendor/google.golang.org/grpc/clientconn.go
M	vendor/google.golang.org/grpc/dialoptions.go
M	vendor/google.golang.org/grpc/internal/envconfig/xds.go
M	vendor/google.golang.org/grpc/internal/grpcutil/method.go
M	vendor/google.golang.org/grpc/internal/transport/http2_client.go
M	vendor/google.golang.org/grpc/internal/transport/http2_server.go
M	vendor/google.golang.org/grpc/internal/transport/http_util.go
M	vendor/google.golang.org/grpc/server.go
M	vendor/google.golang.org/grpc/stream.go
M	vendor/google.golang.org/grpc/version.go
M	vendor/google.golang.org/grpc/vet.sh
M	vendor/modules.txt
Patch failed at 0002 vendor: Bump coreos/rpm-ostree-client-go
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

cgwalters · 2023-03-08T14:51:35Z

Set up a manual cherry pick in #3595

sinnykumari · 2023-03-09T10:33:54Z

Out of curiosity what would QE be testing that isn't covered by the payload test run and e2e-gcp-op?

I understand your point. As per agreement with our QE team, we have been following a pre-merge testing process since OCP 4.13 to keep things stable for sprintly releases; it covers stories with the qe_required label and all OCPBUGS-related PRs. QE are free to decide when to test, but we manually need a qe_approved ack (hopefully someday prow will have an automated workflow for this, like we have for backport bugs).
And of course, these can be overridden by staff engineers with follow-up risks 😆

openshift-ci bot requested review from cheesesashimi and sinnykumari March 3, 2023 16:58

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 3, 2023

cgwalters mentioned this pull request Mar 3, 2023

Switch to rhel-coreos-9 #3485

Closed

cheesesashimi reviewed Mar 3, 2023

cgwalters force-pushed the kernel-updates-refactor branch from 6121b6a to 8954ae6 Compare March 3, 2023 19:47

cgwalters mentioned this pull request Mar 3, 2023

daemon: Clean up switchKernel a bit && vendor: Bump coreos/rpm-ostree-client-go #3579

Closed

cgwalters commented Mar 3, 2023

cgwalters force-pushed the kernel-updates-refactor branch from 9ab3588 to 6ac06ee Compare March 3, 2023 22:55

cgwalters force-pushed the kernel-updates-refactor branch 2 times, most recently from a902c97 to 6d929c3 Compare March 4, 2023 13:29

cgwalters force-pushed the kernel-updates-refactor branch from 6d929c3 to 42ebe46 Compare March 4, 2023 16:16

sinnykumari reviewed Mar 7, 2023

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 7, 2023

openshift-ci bot assigned sinnykumari Mar 7, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 7, 2023

cgwalters added 3 commits March 8, 2023 08:36

daemon: Clean up switchKernel a bit

b75c7af

De-duplicate calls to `canonicalizeKernelType` to make the logic easier to read. Also add a few comments.

vendor: Bump coreos/rpm-ostree-client-go

cae67a6

In prep for usage in MCD.

cgwalters force-pushed the kernel-updates-refactor branch from 234c2cb to 8ac5bee Compare March 8, 2023 13:48

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 8, 2023

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 8, 2023

openshift-ci bot assigned sdodson Mar 8, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 8, 2023

sdodson merged commit 4fb7117 into openshift:master Mar 8, 2023

cgwalters mentioned this pull request Mar 8, 2023

OCPBUGS-8703: Backport switchkernel 4.13 #3595

Merged

cgwalters mentioned this pull request Mar 8, 2023

New Package Request: passt coreos/fedora-coreos-tracker#1436

Closed

cgwalters mentioned this pull request Mar 9, 2023

OCPBUGS-9685: daemon: Always remove pending deployment before we do updates #3599

Merged

sinnykumari mentioned this pull request Mar 9, 2023

OCPBUGS-8113: daemon: Only switchkernel if we are doing an OS update or kernel change #3600

Merged

cgwalters mentioned this pull request Sep 12, 2023

Boot performance vs dnf image ostreedev/ostree#3041

Open

