Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add RHCOS oscontainer into payload, render to 00-$role-osimageurl MC #273

Closed
wants to merge 2 commits into from

Conversation

cgwalters
Copy link
Member

@cgwalters cgwalters commented Jan 8, 2019

PR Status:


This closes the gap in getting the RHCOS oscontainer into the update
payload, and having the MCO then render that down into the MachineConfigs.

Introduce a new ConfigMap machine-config-osimageurl that points to
the oscontainer and is applied by the CVO. Then, the controller
renders that into machineconfigs/00-$role-osimageurl which then finally
goes into the rendered config, and should be applied by the daemon.

Closes: #183

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 8, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 8, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2019
@cgwalters
Copy link
Member Author

cgwalters commented Jan 8, 2019

The high level idea with this so far (still writing this code) is that we have a ConfigMap that points to the oscontainer and is applied by the CVO. Then, the controller renders that into machineconfigs/00-$role-osimageurl which should make it into the final config.

This supercedes #258 and also takes a small bit of #228 in that we're taking the first non-empty osImageURL.

@cgwalters
Copy link
Member Author

A very important thing here is that we need to make an immediate choice - do we pin the oscontainer in this repo and then update it via PRs? (Would be very noisy and painful, but give us CI gating) Or in the short term do we have it float :latest or so?

Or, (this would be my preference probably) matching discussion in openshift/installer#987 do we have the release controller "gather" it?

@cgwalters
Copy link
Member Author

(Hm, since AIUI the release controller reacts to ImageStream changes, if we fixed the pipeline to push into the main namespace it should then trigger release payloads?)

@smarterclayton
Copy link
Contributor

I would recommend pushing to the image stream because it is exactly what ART will do, so we can then get the RHCoS image integrated into OCP exactly like origin more quickly. You can always control how often you push.

@smarterclayton
Copy link
Contributor

Instead of pinning or pushing, you can also just manually tag a new one in, or we can set up a periodic job that grabs the latest maipo, tests it, and then promotes it if the test passes (I could set that up in < 1hr)

@cgwalters
Copy link
Member Author

or we can set up a periodic job that grabs the latest maipo, tests it, and then promotes it if the test passes (I could set that up in < 1hr)

This one sounds good! I'd be interested in seeing the code and understanding how that works...there's a lot of magic in the ci-operator/release stuff that I haven't yet fully grasped. The "test it" part is blocked a bit on this PR landing though right?

@cgwalters
Copy link
Member Author

And it turns out we aren't pushing a :latest today; we could easily but it's probably better to get the "rhcos oscontainer promotion" going.

Although...maybe we really do need to separate out "add rhcos to payload" from "apply updates"?

Here's a possibility, we could add a config option or so to ignore osImageURL changes and flip that on by default in this PR. That way landing this wouldn't impact development immediately.

@smarterclayton
Copy link
Contributor

yes - once you have an image, with a payload, it's really easy to script overlaying that test. I can go over what would be necessary there. That would block promoting a new image, but we would still need a job for PRs that overrides the image appropriately (which is not difficult)

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 9, 2019
@cgwalters
Copy link
Member Author

OK this is updated 🆕 - we now have a CONFIG_IGNORE_OSIMAGEURL and use it by default. This will allow us to land this PR to act as a mechanism and then in both local development and CI we can override the env setting to test things out.

cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Jan 9, 2019
Today each MC will contain both an Ignition fragment and an
`osImageURL`.  Define "merging" as using the first
non-empty `osImageURL` so we don't have to be very picky about
ordering.

This is a smaller version of
openshift#228

Prep for: openshift#273
@cgwalters
Copy link
Member Author

Hm:

error: unable to create a release: operator "machine-config-operator" failed to map images: image file "/tmp/release-image-0.0.1-2019-01-09-161203356873649/machine-config-operator/image-references" referenced image "machine-os-content" that is not part of the input images

Hmm but it worked in #258

@cgwalters
Copy link
Member Author

/test images

@cgwalters
Copy link
Member Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 9, 2019
@cgwalters
Copy link
Member Author

/retest

@cgwalters
Copy link
Member Author

This PR is held - it's probably OK to just leave it as it won't be merging yet. We have more issues to work through.

@cgwalters cgwalters changed the title Add RHCOS oscontainer into payload, render to 00-$role-osimageurl MC WIP: Add RHCOS oscontainer into payload, render to 00-$role-osimageurl MC Jan 16, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 16, 2019
@kikisdeliveryservice
Copy link
Contributor

Success! Retested using a diff os container and it worked (using image files up to commit 7999a25 )

oc logs -p:
http://pastebin.test.redhat.com/697033

oc logs -f:
http://pastebin.test.redhat.com/697034

@cgwalters
Copy link
Member Author

OK so #301 is definitely happening in some of the CI runs from this PR - but is it actually a new issue? As far as I understand it...e2e-aws could pass just fine if a node went degraded (I think). But let's add a test for that to e2e-aws-op.

The other thing (in the new PR status at the top) is...actually I think I am wrong, there's no "initial master" versus "secondary masters", the installer creates them all at once. Which means they should all consistently do the early pivot.

So I think the design here is going to work.

@kikisdeliveryservice
Copy link
Contributor

@cgwalters IME with AWS clusters (which is what I'm running everything on), while I see the error syncing messages, my nodes do not degrade bc of them. Is this perhaps a libvirt issue?

@cgwalters
Copy link
Member Author

cgwalters commented Jan 16, 2019

while I see the error syncing messages, my nodes do not degrade bc of them.

Yeah; those error messages aren't relevant to the real problem in #301 - I edited the initial message to remove them. The actual problem in #301 is that if a MC is garbage collected while a node is booting targeting it, the MCD will fail to load it on start, and the node will go degraded. And that's what I saw in at least one CI run.

I think we have this in CI but we're not noticing it.
If it's happening we need to fix it.

Ref: openshift#301
@cgwalters
Copy link
Member Author

cgwalters commented Jan 17, 2019

OK I rebased 🏄‍♂️ this, squashed the two commits into one for clarity, did s/Restart=always/Restart=on-failure/, and added a test case (rolled in #319 to avoid conflicts and also since I want that test to run too on this PR).

This is still though waiting for openshift/pivot#25 and the corresponding MCD work, but we can at least run what we have through e2e here.

@cgwalters
Copy link
Member Author

cgwalters commented Jan 17, 2019

--- FAIL: TestNoDegraded (0.10s)
	sanity_test.go:79: 3 degraded nodes found

Well, that's a kind of progress! New test successfully failing. But the cluster got GC'd before I could really debug it...

@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented Jan 17, 2019

Why do i see changes to pkg/server? I'm asking because I see no docs. ;)

Secondly I would live to avoid any more diff that is served by the server from the generated machineconfig contents.

@cgwalters
Copy link
Member Author

cgwalters commented Jan 17, 2019

Why do i see changes to pkg/server? I'm asking because I see no docs. ;)

It's to handle the difference between the "bootimage" (e.g. AMI) versus the CVO-driven payload.

#273 (comment)

Will push this in a bit:

* *Early pivot* - `/etc/rhcos-initial-pivot-target`

    While the Ignition content configures the node, it may actually be booted into an older OS image than is specified by the release payload managed by the [Cluster Version Operator](https://github.com/openshift/cluster-version-operator/).  The MCS writes out the `osImageURL` to this file, and the system will (if necessary) performs an "early pivot" before the node has actually joined the cluster.  This then ensures that when the MachineConfigDaemon starts, it will validate the `currentConfig` (including files written by Ignition and the `osImageURL`).

Secondly I would live to avoid any more diff that is served by the server from the generated machineconfig contents.

How would we pass the target data other than writing a file in Ignition?

@cgwalters
Copy link
Member Author

cgwalters commented Jan 17, 2019

It's to handle the difference between the "bootimage" (e.g. AMI) versus the CVO-driven payload.

And to elaborate, this is now required because of #245 - previously the MCD would have just said "oh it's a config difference" and updated, but the problem with that approach was we could get into reboot loops and other issues as described in that PR's commit message.

(And an advantage of this is that we reboot before node joins the cluster, doesn't make sense to land cluster workloads and then immediately evict them and reboot)

@abhinavdahiya
Copy link
Contributor

Why do i see changes to pkg/server? I'm asking because I see no docs. ;)

It's to handle the difference between the "bootimage" (e.g. AMI) versus the CVO-driven payload.

#273 (comment)

Will push this in a bit:

* *Early pivot* - `/etc/rhcos-initial-pivot-target`

    While the Ignition content configures the node, it may actually be booted into an older OS image than is specified by the release payload managed by the [Cluster Version Operator](https://github.com/openshift/cluster-version-operator/).  The MCS writes out the `osImageURL` to this file, and the system will (if necessary) performs an "early pivot" before the node has actually joined the cluster.  This then ensures that when the MachineConfigDaemon starts, it will validate the `currentConfig` (including files written by Ignition and the `osImageURL`).

Secondly I would live to avoid any more diff that is served by the server from the generated machineconfig contents.

How would we pass the target data other than writing a file in Ignition?

Why is getting served as addition by server not through the machineconfig?

@cgwalters
Copy link
Member Author

Why is getting served as addition by server not through the machineconfig?

You're arguing for having the MCD handle it? Then we'd need to special case the "initial pivot" there - maybe a good way to do that would be to have "bootstrap node annotations exist" as a flag specifying that. That has some advantages; it's more consistent. But one downside as I noted is the node comes online and then reboots right away.

Does anyone else have a strong opinion on this?

@abhinavdahiya
Copy link
Contributor

Why is getting served as addition by server not through the machineconfig?

You're arguing for having the MCD handle it? Then we'd need to special case the "initial pivot" there - maybe a good way to do that would be to have "bootstrap node annotations exist" as a flag specifying that. That has some advantages; it's more consistent. But one downside as I noted is the node comes online and then reboots right away.

Does anyone else have a strong opinion on this?

I think i'm confused, any doc detailing the exact flow that this PR is moving towards will help me judge the approach or even provide feedback (if that matters).. currently i am not flowing whats happening and why :(

Have the MCC take `osImageURL` as provided by the cluster update/release payload
and generate a `00-{master,worker}-osimageurl` MC from it, which ensures
the MCD will update the node to it.

However, we need special handling for the *initial* case where we boot
into a target config, but we may be using an old OS image.

Change the MCC to write the target osImageURL from the MC it uses for
bootstrapping to `/etc/rhcos-initial-pivot-target`.  This will then be
handled by the `rhcos-initial-pivot.service` systemd unit.

Closes: openshift#183
@cgwalters
Copy link
Member Author

No worries; we've gone through a lot of architecture changes here, and there are now a whole lot of comments on this PR that Github is hiding.

Did you see #273 (comment) ?

To rephrase a bit, if we landed the bit of this PR to add osImageURL to the rendered MC without the MCS changes, then if there was any difference between the AMI and the update payload's oscontainer, the MCD would go degraded - because it validates its expected state on boot.

And see #273 (comment) again for the semantics of "validate".

Options:

  • "early pivot" as is implemented by this PR right now; MCS provides target osImageURL so we can pivot before any cluster workloads land (including the MCD).
  • Change the MCD to special case this by noticing it's the "firstboot case" as defined by "picked up node annotations from /etc"

@ashcrow
Copy link
Member

ashcrow commented Jan 18, 2019

/test e2e-aws-op

@openshift-ci-robot
Copy link
Contributor

@cgwalters: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-op 01c309d link /test e2e-aws-op

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ashcrow
Copy link
Member

ashcrow commented Jan 18, 2019

level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"

@ashcrow
Copy link
Member

ashcrow commented Jan 18, 2019

@abhinavdahiya @cgwalters it may be worth chatting about this in a higher bandwidth system

@openshift-ci-robot
Copy link
Contributor

@cgwalters: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 29, 2019
@cgwalters
Copy link
Member Author

This is obsoleted by #363

@cgwalters cgwalters closed this Feb 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Design for osImageURL updates - integration with CVO/release payload
7 participants