
OCPBUGS-8447: MCO-496: Support ignition versions 3.3 + 3.4 but keep version 3.2 as default #3576

Merged: 4 commits merged into openshift:master from jkyros:ignition-34-minimal on May 18, 2023

Conversation

@jkyros (Contributor) commented Feb 28, 2023

This allows the MCO to receive and serve ignition versions 3.3 and 3.4, but DOES NOT change the default from 3.2, and as discussed does not include MCO feature implementations for the kernel args feature or the "offline tang" feature (lest we introduce more risk).

The MCO has traditionally tried to support an ignition version at least one release before it's actually used so we have the ability to roll back an upgrade, etc. This PR is in service of that goal so we can use 3.4 in the next version; next version, we will come back and do the "rest of the work" to move the default version to 3.4.

- What I did

  • Added ignition 3.3 version support (but not support for the kargs feature yet)
  • Added ignition 3.4 version support (but not support for the offline tang feature yet)
  • Updated unit tests to account for the new versions
  • Bumped ignition deps for ignition 3.4

This means that:

  • Ignition versions > 3.2 will be "down-translated" to 3.2
  • Ignition versions < 3.2 will still be "up-translated" to 3.2
  • If something asks the MCS for a 3.3 or 3.4 version, it will "up-translate" the stored 3.2 config to that version to serve it (a minimal sketch of this chain follows below)
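
A minimal sketch of the parse-and-down-translate chain, assuming ignition's ParseCompatibleVersion and the ign-converter translate packages vendored by this PR (the function name and error handling are illustrative, not the exact MCO code):

import (
	"fmt"

	"github.com/coreos/ign-converter/translate/v33tov32"
	"github.com/coreos/ign-converter/translate/v34tov33"
	v32types "github.com/coreos/ignition/v2/config/v3_2/types"
	ign3_4 "github.com/coreos/ignition/v2/config/v3_4"
)

// downTranslateTo32 parses any config <= spec 3.4 up to 3.4, then walks it
// back down to the internal default of 3.2 (3.4 -> 3.3 -> 3.2).
func downTranslateTo32(rawIgn []byte) (v32types.Config, error) {
	cfg34, rpt, err := ign3_4.ParseCompatibleVersion(rawIgn)
	if err != nil || rpt.IsFatal() {
		return v32types.Config{}, fmt.Errorf("parsing Ignition config failed: %v (report: %v)", err, rpt)
	}
	// Fails if the config uses fields that only exist in 3.4 (e.g. offline Tang).
	cfg33, err := v34tov33.Translate(cfg34)
	if err != nil {
		return v32types.Config{}, err
	}
	// Fails if the config uses fields that only exist in 3.3+ (e.g. kernelArguments).
	return v33tov32.Translate(cfg33)
}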

- How to verify it
Feed the MCO something in the ignition 3.3/3.4 format:
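
For example (an illustrative MachineConfig, not taken from the PR), a 3.4.0 config that only writes a file, so it contains nothing that 3.2 can't express:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-test-34
spec:
  config:
    ignition:
      version: 3.4.0
    storage:
      files:
      - path: /etc/test-34
        mode: 420
        contents:
          source: data:,hello%20from%203.4

The pool should update normally, and the new rendered config should still report spec version 3.2.0.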

Feed the MCO something in the ignition 3.3/3.4 format that contains fields not supported in 3.2 (kargs, offline tang):
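
For example (again illustrative), the same MachineConfig shape but using the 3.3+/3.4 kernelArguments field; a complete offline-Tang example appears in the review discussion below:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-kargs-34
spec:
  config:
    ignition:
      version: 3.4.0
    kernelArguments:
      shouldExist:
      - enforcing=0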

Request a 3.3/3.4 version ignition from the MCS:
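
For example, using the same Accept header QE used during verification below (the cluster URL is a placeholder):

curl -k -H "Accept: application/vnd.coreos.ignition+json; version=3.4.0" https://<api-int-endpoint>:22623/config/worker | jq .ignition.version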

- Description for the changelog
Allow MCO to serve and receive ignition spec versions 3.3 and 3.4

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 28, 2023
@openshift-ci-robot (Contributor) commented Feb 28, 2023

@jkyros: This pull request references MCO-496 which is a valid jira issue.

In response to this:

[ This is currently using my fork of ign-convert, I am not proposing it merge that way, I am just testing ]

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 28, 2023
@openshift-ci bot (Contributor) commented Feb 28, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2023
@jkyros (Contributor Author) commented Feb 28, 2023

/test e2e-gcp-op

1 similar comment
@jkyros (Contributor Author) commented Feb 28, 2023

/test e2e-gcp-op

@jkyros (Contributor Author) commented Feb 28, 2023

/test e2e-aws-ovn-upgrade

@jkyros jkyros force-pushed the ignition-34-minimal branch 2 times, most recently from 9994593 to ce4cda5 Compare February 28, 2023 22:56

@openshift-ci-robot (Contributor) commented Feb 28, 2023

@jkyros: This pull request references MCO-496 which is a valid jira issue.

@jkyros (Contributor Author) commented Feb 28, 2023

/test all

@jkyros jkyros changed the title [WIP]MCO-496: Support ignition versions 3.3 + 3.4 but keep version 3.2 as default MCO-496: Support ignition versions 3.3 + 3.4 but keep version 3.2 as default Feb 28, 2023
@jkyros jkyros force-pushed the ignition-34-minimal branch 2 times, most recently from 0563818 to 206c01e Compare March 1, 2023 00:57
@jkyros jkyros marked this pull request as ready for review March 1, 2023 01:12
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 1, 2023
// can get down to 3.2
ignCfgV3_3, errV3_3 := v34tov33.Translate(ignCfgV3)
if errV3_3 != nil {
// This case should only be hit if fields that only exist in v3_4 are being used which are not
Member

question: Would an example of those fields be the tang / kernel args fields?

If so, I don't think that would produce an error if the MCO can't support it; only if the Ignition payload cannot be translated from one Ignition version to another. We may need a check that those unsupported fields aren't being used so we can explicitly show an error vs. silently failing. If those already exist, then please disregard my question and concern 😄.

Contributor Author

Yep, correct. The "tang/kernel args fields" are exactly what I'm referring to.

The checks you're describing do exist but not in the MCO -- they're in ign-converter, e.g.:

https://github.com/coreos/ign-converter/blob/897284baf1179e76ac6ef14e8ca8542334d58c55/translate/v34tov33/v34tov33.go#L125

and they're applied as part of the "down-translation" process, so I'm "freeloading" on that so we don't have to duplicate that bit of code into the MCO.
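
The shape of that guard, paraphrased from memory rather than quoted from the ign-converter source, is roughly:

// Inside v34tov33.Translate: reject 3.4-only fields up front so the
// remainder of the config can be down-translated to spec 3.3.
for _, l := range cfg.Storage.Luks {
	for _, t := range l.Clevis.Tang {
		if t.Advertisement != nil {
			return types.Config{}, errors.New("invalid input config: tang offline provisioning is not supported in spec v3.3")
		}
	}
}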

And so yes, if you feed us an ignition containing offline tang fields:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-cant-convert-34
spec:
  config:
    ignition:
      version: 3.4.0
    storage:
      luks:
      - name: luks-tang
        device: "/dev/sdb"
        clevis:
          tang:
          - url: https://tang.example.com
            thumbprint: REPLACE-THIS-WITH-YOUR-TANG-THUMBPRINT
            advertisement: '{"payload": "...", "protected": "...", "signature": "..."}'

Much like any other render/conversion error, your pool will degrade:

NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-a79af855fcc5ad725c369aa16a2d6f32   True      False      False      3              3                   3                     0                      46m
worker   rendered-worker-8664798e8b1c6588299c00a261886237   True      False      True       3              3                   3                     0                      46m
    Last Transition Time:  2023-03-01T20:36:33Z
    Message:               Failed to render configuration for pool worker: translating Ignition config failed with error Invalid input config: tang offline provisioning is not supported in spec v3.3
    Reason:                
    Status:                True
    Type:                  RenderDegraded

And the render controller will whine about it:

I0301 20:36:38.767708       1 render_controller.go:377] Error syncing machineconfigpool worker: translating Ignition config failed with error Invalid input config: tang offline provisioning is not supported in spec v3.3

For the record I'd absolutely love to have an admission controller to parse and reject it up front. Maybe someday 😄

Contributor Author

If we bump our default from 3.2 to 3.4 without the MCO implementing the features described in 3.3 and 3.4, then yes, we should do exactly what you describe.

@jkyros (Contributor Author) commented Mar 1, 2023

Revendored now that coreos/ign-converter#44 merged

@bgilbert (Contributor) commented Mar 1, 2023

The MCO has traditionally tried to support an ignition version at least one release before it's actually used so we have the ability to roll back an upgrade, etc.

For the record, this is actually a new practice, driven by the requirement for clean downgrade tests.

Added ignition 3.3 version support (but not support for the kargs feature yet)

The general design here is that 4.13 will support any configs declaring themselves as 3.4.0 that are also translatable to 3.2.0, right? And then in 4.14, we can start enabling 3.4.0 features, which is fine because we only need to support downgrades to 4.13 when no 4.14-specific features are configured.

That seems okay in general, but if MCO 4.14 starts translating MachineConfig kernelArguments to a rendered config that uses Ignition kernelArguments, then MCO 4.14 will convert a MachineConfig without 4.14-specific features into a rendered config that 4.13 won't accept.

Added ignition 3.4 version support (but not support for the offline tang feature yet)

Offline Tang shouldn't need explicit MCO support AFAIK; the field just needs to be passed through to Ignition.

However, I noticed that reconcilable() doesn't compare Storage.Luks in the old and new configs, so any MachineConfig changes that affect LUKS will be silently allowed and ignored for existing nodes. This isn't a new problem, but might be worth fixing while here?

@yuqi-zhang (Contributor) left a comment

I think this makes sense, thanks for working on this! Some questions below

// ConvertRawExtIgnitionToV3_4 ensures that the Ignition config in
// the RawExtension is spec v3.4, or translates to it.
func ConvertRawExtIgnitionToV3_4(inRawExtIgn *runtime.RawExtension) (runtime.RawExtension, error) {
// TODO(jkyros): since 3.4 is "ahead" of our current default 3.2, we're going "up" not down, which is
Contributor

Nit: I think this comment was for the V3_3 conversion ("Once we bump to 3.3 this will be briefly obsolete" should say 3.4, and the rest will be future proofing)

Contributor Author

You are correct sir, good catch 😄

ignCfgV3_2, rptV3_2, errV3_2 := ign3.Parse(rawIgn)
if errV3_2 == nil && !rptV3_2.IsFatal() {
// ParseCompatibleVersion will parse any config <= N to version N
ignCfgV3, rptV3, errV3 := ign3_4.ParseCompatibleVersion(rawIgn)
Contributor

Oh nice! We no longer have to chain ourselves 👍

}
serveConf = &converted34

} else if reqConfigVer.Equal(*semver.New("3.3.0")) {
Contributor

I think RHCOS has been on 3.3 for a while, and technically we've been feeding it 3.2 up to now. I don't think this would change any actual functionality though, right?

Contributor Author

I didn't know that and wasn't sure if things cared if they got a different version than they asked for.

It shouldn't change any actual functionality, but we can totally keep feeding them 3.2 if you're worried.

That probably turns into "If they ask for a 3.x version > our default, give them our default".

Contributor

It should be fine to send a newer config version. Ignition uptranslates older configs internally anyway.

@yuqi-zhang (Contributor)

However, I noticed that reconcilable() doesn't compare Storage.Luks in the old and new configs, so any MachineConfig changes that affect LUKS will be silently allowed and ignored for existing nodes. This isn't a new problem, but might be worth fixing while here?

I believe this is https://bugzilla.redhat.com/show_bug.cgi?id=1964701 + https://issues.redhat.com/browse/RFE-1894 ?

@bgilbert (Contributor) commented Mar 1, 2023

I believe this is https://bugzilla.redhat.com/show_bug.cgi?id=1964701 + https://issues.redhat.com/browse/RFE-1894 ?

IMO those issues are a bit different, in that they cover one case where the MCO might want to learn to reconcile Luks section changes. For now, though, any and all changes to the Luks section are being silently ignored for existing nodes, rather than degrading the node as happens for other irreconcilable config changes.

@jkyros (Contributor Author) commented Mar 1, 2023

IMO those issues are a bit different, in that they cover one case where the MCO might want to learn to reconcile Luks section changes. For now, though, any and all changes to the Luks section are being silently ignored for existing nodes, rather than degrading the node as happens for other irreconcilable config changes.

We were kind of "too chicken" to fix it by just degrading because the folks in those issues Jerry linked were fixing it by updating it in MachineConfig (for any new nodes that come online), and then retrofitting their old ones via SSH or whatever. If we degrade, they'd have to retrofit out-of-band forever, whereas if we let them change it they can say "from this point on, all the new nodes will be right".

@yuqi-zhang (Contributor)

Right. I was more so pointing to the fact that some users may be depending on one behaviour already. If we want to switch to "degrade" or "MCO manages it" instead of "the silent treatment", I think we would need to consider the ramifications of switching. I would tentatively be OK with us degrading, if only to warn the user that changes may be coming in future versions.

@jkyros (Contributor Author) commented Apr 26, 2023

/test e2e-gcp-op

@sergiordlr

Verified using IPI on GCP with proxy.

Steps that we followed in order to verify the PR (similar to those in #3576 (comment)):

  • Create a MC using ignition version 3.4 and "shouldNotExist" attribute

The behavior was the expected one. The result was the same that we got in the first verification.

  • Create a MC using ignition version 3.4 with kernel arguments

The behavior was the expected one. The result was the same that we got in the first verification.

  • Check that there is no problem when we ask MCS for different ignition versions:

The behavior was the expected one. The result was the same that we got in the first verification.

Note that the kernel arguments are not present in the 3.4.0/3.3.0 versions of the ignition configs.

# curl -H "Accept: application/vnd.coreos.ignition+json; version=3.4.0" -k https://api.sregidor-ver3.qe.gcp.devcluster.openshift.com:22623/config/worker | jq  | grep enforce

  • Create a MC using unsupported luks.tang configuration:

The MCP was not degraded. Per this earlier comment, this issue is expected: #3576 (comment)

We can think about creating a different jira issue to track this problem.

  • Create a MC using kernel arguments and version 3.2.0.

The behavior was the expected one. The result was the same that we got in the first verification.

  • Create a MC using both ways to declare kernel arguments

The behavior was the expected one. The result was the same that we got in the first verification.

  • Create a random test file in /etc using MCs with ignition versions 2.2.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0

All files were created properly. The error in ignition version 2.2.0 does not happen anymore.

  • Create a random unit using MCs with ignition versions 2.2.0 and 3.4.0
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: mc-unit-22
spec:
  config:
    ignition:
      version: 2.2.0
    systemd:
      units:
      - name: my22unit.service
        contents: |
          [Unit]
          Description=A simple oneshot service
          
          [Service]
          Type=oneshot
          ExecStart=/bin/bash -c "echo Hello world"
          RemainAfterExit=yes

          [Install]
          WantedBy=multi-user.target

Both the 2.2.0 and 3.4.0 versions behaved as expected. New rendered 3.2.0 machineconfigs were created and the units worked as expected:

sh-5.1# systemctl status my22unit.service
○ my22unit.service - A simple oneshot service

sh-5.1# systemctl status my34unit.service
○ my34unit.service - A simple oneshot service

  • All Critical/Medium/Low priority automated e2e test cases were executed using an IPI GCP cluster with proxy

No issues found

  • All High priority automated e2e test cases were executed using an IPI on GCP cluster without proxy

One issue found. We have a test (TC OCP-42390) to make sure that when the ignition version is empty, the MCP is degraded showing this message:

parsing Ignition config failed: unknown version. Supported spec versions: 2.2, 3.0, 3.1, 3.2

Instead of this message, we now get this other message:

Failed to render configuration for pool worker: parsing Ignition config spec v3 failed with error: invalid config version (couldn't parse)

We can modify the expected test message, but we would like to make sure that the message that we use is the one that we actually want to use.

I think that the TC's expected message is more descriptive and helpful than the one we are reporting now.

Apart from that minor issue, no failure was found and everything is working as expected. Once we decide if we want to modify this error message or keep it, we will add the qe-approved label.

Thank you very much for your great work!!

jkyros added 4 commits May 1, 2023 16:19
This adds MCS api support so the MCS can serve ignition 3.3 if
requested, but the MCO will still default to ignition 3.2 internally.

In order to keep 3.2 as the default/internal version while supporting
the newer versions, this parses incoming configs up to 3.3 and then
uses the ignition converter to "down-translate" them back to version
3.2.
This adds MCS api support so the MCS can serve ignition 3.4 if
requested, but the MCO will still default to ignition 3.2 internally.

In order to keep 3.2 as the default/internal version while supporting
the newer versions, this parses incoming configs up to 3.4 and then
uses the ignition converter to "down-translate" them back to version
3.2 (via 3.3, so 3.4->3.3->3.2).

This does not include MCO implementations of the new features added by
the new ignition spec, just support for the spec itself.
A future MCO is going to understand and implement ignition
kargs, but this one doesn't, and we need to allow for that

This:
- adds the ignition kargs ShouldExist into MachineConfig kargs if
  ignition kargs are present
- scrapes the ignition kargs out of our 3.3+ ignition configs during
  down-translation so we can still down-translate
- degrades if ShouldNotExist is present in ignition kargs, since we
  don't have a way to handle it in this MCO version

This code will all come out when the MCO actually implements ignition
kargs; it's just a "bridge" between versions so we can downgrade without
breaking.
GOPROXY=direct go get github.com/coreos/ign-converter/translate/v33tov32
GOPROXY=direct go get github.com/coreos/ign-converter/translate/v34tov33
go get github.com/coreos/ignition/v2/config/v3_4
make go-deps
@jkyros (Contributor Author) commented May 1, 2023

Rebased and added a more helpful error message for "Invalid Version".

@sergiordlr I kind of "split the difference" on the error message. The new ignition parsing introduced in this PR differentiates between ErrInvalidVersion ("I can't parse this into a version number that makes sense") and ErrUnknownVersion ("I can parse this into a version number, but it's not a version number I understand"). I wanted to keep that differentiation, but I also wanted to be helpful so the user knows what to put in there.

The new error message for the ErrInvalidVersion case is:

parsing Ignition config failed: invalid version. Supported spec versions: 2.2, 3.0, 3.1, 3.2, 3.3, 3.4

(I just handled that error type explicitly as a separate case, and changed unknown to invalid, so we could tell by the logs which case we hit)
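
A minimal sketch of that explicit handling (assuming ignition's shared errors package; the function name and wiring are illustrative, not the exact MCO code):

import (
	"errors"
	"fmt"

	ignerrors "github.com/coreos/ignition/v2/config/shared/errors"
)

// classifyParseError keeps ignition's two version-error cases distinct
// while still telling the user which spec versions are accepted.
func classifyParseError(err error) error {
	const supported = "2.2, 3.0, 3.1, 3.2, 3.3, 3.4"
	switch {
	case errors.Is(err, ignerrors.ErrInvalidVersion):
		return fmt.Errorf("parsing Ignition config failed: invalid version. Supported spec versions: %s", supported)
	case errors.Is(err, ignerrors.ErrUnknownVersion):
		return fmt.Errorf("parsing Ignition config failed: unknown version. Supported spec versions: %s", supported)
	}
	return err
}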

@sergiordlr commented May 3, 2023

The error reported now is similar to the one reported before this PR.

May  3 10:09:59.999: INFO: condition info of mcp/worker-RenderDegraded : '{"lastTransitionTime":"2023-05-03T10:08:58Z","message":"Failed to render configuration for pool worker: parsing Ignition config failed: invalid version. Supported spec versions: 2.2, 3.0, 3.1, 3.2, 3.3, 3.4","reason":"","status":"True","type":"RenderDegraded"}'

We add the qe-approved label. @jkyros Thank you very much!!

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label May 3, 2023
@jkyros (Contributor Author) commented May 5, 2023

/test e2e-gcp-op

1 similar comment
@jkyros (Contributor Author) commented May 8, 2023

/test e2e-gcp-op

@sinnykumari (Contributor) left a comment

Overall LGTM, nice work John!
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 10, 2023
@openshift-ci bot (Contributor) commented May 10, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkyros, sinnykumari, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jkyros,sinnykumari,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jkyros (Contributor Author) commented May 16, 2023

Thanks for looking at it Sinny, and since we have QE approval, it looks like we're good to go
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 16, 2023
@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 49f32d4 and 2 for PR HEAD c251e55 in total

@jkyros (Contributor Author) commented May 17, 2023

What a weird e2e-aws-ovn failure. That's been green the whole time. I'm struggling to correlate anything we've done here to service recreation failing.

{"ProwJobName":"pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn","ProwJobRunID":1658595958570618880,"Release":"Presubmits","CompareRelease":"4.14","Tests":[{"Name":"[sig-network] Services should work after the service has been recreated [Suite:openshift/conformance/parallel] [Suite:k8s]","Risk":{"Level":{"Name":"Medium","Level":50},"Reasons":["This test has passed 97.59% of 83 runs on release 4.14 [amd64 aws ha ovn] in the last week."]},"OpenBugs":[]}],"OverallRisk":{"Level":{"Name":"Medium","Level":50},"Reasons":["Maximum failed test risk: Medium"]},"OpenBugs":[]}

/test e2e-aws-ovn

@openshift-ci bot (Contributor) commented May 17, 2023

@jkyros: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-hypershift | c251e55 | link | false | /test e2e-hypershift
ci/prow/okd-scos-e2e-gcp-ovn-upgrade | c251e55 | link | false | /test okd-scos-e2e-gcp-ovn-upgrade


@jkyros (Contributor Author) commented May 18, 2023

Looks like we hit a grace period on that one. Feels like global issues. We'll try one more time.

{"component":"entrypoint","file":"k8s.io/test-infra/prow/entrypoint/run.go:254","func":"k8s.io/test-infra/prow/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 10m0s grace period","severity":"error","time":"2023-05-17T17:06:08Z"} {"component":"entrypoint","error":"os: process already finished","file":"k8s.io/test-infra/prow/entrypoint/run.go:256","func":"k8s.io/test-infra/prow/entrypoint.gracefullyTerminate","level":"error","msg":"Could not kill process after grace period","severity":"error","time":"2023-05-17T17:06:08Z"} {"component":"entrypoint","error":"process aborted","file":"k8s.io/test-infra/prow/entrypoint/run.go:79","func":"k8s.io/test-infra/prow/entrypoint.Options.Run","level":"error","msg":"Error executing test process","severity":"error","time":"2023-05-17T17:06:08Z"} error: failed to execute wrapped command: exit status 130

/test e2e-aws-ovn

@openshift-merge-robot openshift-merge-robot merged commit 655ba33 into openshift:master May 18, 2023
@openshift-ci-robot (Contributor)

@jkyros: Jira Issue OCPBUGS-8447: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-8447 has been moved to the MODIFIED state.


@jkyros (Contributor Author) commented May 23, 2023

I figured this was going to have to be a manual backport, but the manual backport seems to think it's clean so we'll try the cherry-pick bot
/cherry-pick release-4.13

@openshift-cherrypick-robot

@jkyros: new pull request created: #3714


travier added a commit to travier/butane that referenced this pull request Sep 21, 2023
Stabilise openshift 4.14.0 spec on fcos 1.5 and ignition 3.4 stable
specs. The MCO is now capable of understanding Ignition 3.4.0 specs and
uses that by default.

See: openshift/machine-config-operator#3576
See: openshift/machine-config-operator#3814
@travier mentioned this pull request Sep 21, 2023