change bootstrap to pivot #2542

Open
cgwalters opened this issue Oct 21, 2019 · 21 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

@cgwalters
Member

cgwalters commented Oct 21, 2019

See this thread, specifically this comment.

TL;DR: today the installer launches a bootimage, which is usually the pinned RHCOS version (for IPI installs) but can be different; see this issue, which is about making it easier for people to find the correct bootimage.

See also #2532

Filing this issue to track changing the installer to do a pivot on the bootstrap node.

I think the architecture would look like a new bootstrap-pivot.service between release-image.service and bootkube.service - we'd run the MCD on the bootstrap host, telling it to pivot to the target machine-os-content.

In fact...if we did this, we could streamline down the bootimages - e.g. no reason to ship kubelet/cri-o in the bootimages. (Though, we'd need new terminology like "bootRHCOS" to distinguish from "normal" RHCOS in machine-os-content or so?) Anyways, not required for this change.
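A minimal sketch of how the proposed ordering could be expressed as a systemd unit. It assumes the unit name `bootstrap-pivot.service` from the comment above and a hypothetical wrapper script `/usr/local/bin/bootstrap-pivot.sh` that would invoke the MCD; neither the script nor this unit exists in the installer. The helper also checks the resulting start order with Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical unit name taken from the proposal; not shipped by the installer.
PIVOT_UNIT = "bootstrap-pivot.service"


def render_pivot_unit(exec_start: str = "/usr/local/bin/bootstrap-pivot.sh") -> str:
    """Render a unit ordered after release-image.service and before bootkube.service.

    exec_start is a hypothetical wrapper that would run the MCD pivot; its
    path and contents are illustrative assumptions only.
    """
    lines = [
        "[Unit]",
        "Description=Pivot the bootstrap host to the release payload's machine-os-content",
        "Requires=release-image.service",
        "After=release-image.service",
        "Before=bootkube.service",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={exec_start}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a valid start order; deps maps each unit to the units it must follow."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(render_pivot_unit())
    order = start_order({
        PIVOT_UNIT: {"release-image.service"},
        "bootkube.service": {PIVOT_UNIT},
    })
    print(" -> ".join(order))
```

Because a pivot implies a reboot of the bootstrap host (a cost raised later in the thread), a real unit would also need to skip itself once the host is already running the target content; that guard is deliberately left out of this sketch.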

@wking
Member

wking commented Oct 21, 2019

Though, we'd need new terminology like "bootRHCOS"...

Red Hat Pivot OS? All this OS does is pivot to an OSTree pulled from a container image. It is not certified for any other purpose.

@vrutkovs
Member

if we did this, we could streamline down the bootimages

That would be very helpful for OKD-on-FCOS; however, it would require a few changes to the RHCOS oscontainer build process.

@vrutkovs vrutkovs added this to In Progress in OKD4 Oct 23, 2019
@vrutkovs vrutkovs moved this from In Progress to To Do in OKD4 Oct 23, 2019
@LorbusChris
Member

@vrutkovs @cgwalters this is implemented by Vadim's commit here, right? c7da474

Should we try to get this into master soon? (i.e. before tackling spec3 for OCP)

vrutkovs added a commit to vrutkovs/installer that referenced this issue Oct 24, 2019
Currently every machine instance we launch uses the same "bootimage",
including the bootstrap host. However, every host except bootstrap (i.e.
control plane and workers) replaces its OS content with the
machine-os-content from the release payload before joining the cluster.

For more information, see: 
https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md

This changes the bootstrap host to do the same, which will help avoid 
issues from "bootimage drift".

Closes: openshift#2542
@abhinavdahiya
Contributor

Pulling the machine-os-content on the bootstrap host and pivoting adds an 800MB download to the critical part of bootstrapping, and also causes a reboot of the bootstrap host.

The bootstrap host doesn't need as close a tie to the OpenShift binaries as a control-plane host does: we only use kubelet to run static pods and podman to run some pods.

So I don't understand why a pivot is required on the bootstrap host.

@vrutkovs
Member

vrutkovs commented Nov 6, 2019

Without a pivot, the bootstrap node would use kubelet/crio/machine-config-daemon from the original AMI. That means fixes to these components would not be applied during the bootstrap phase, which might be critical for some deployments.

@wking
Member

wking commented Nov 6, 2019

bootstrap node would use kubelet/crio/machine-config-daemon...

Is there an MCD baked into RHCOS? I'd be surprised if we ran one on the bootstrap machine. We certainly extract machine-config components from the target release image and run them, but the only bootimage exposure in that is podman/kubelet/crio (and I have no cost/benefit opinion of pivoting for those ;).

@cgwalters
Member Author

So I don't understand why a pivot is required on the bootstrap host.

First, this helps OKD, which will use FCOS; FCOS won't include a kubelet by default (at least, not right now).

Second, it does help avoid "bootimage drift" issues with the installer, as also noted in the initial comment.

Further, the initial comment links to openshift/enhancements#78 (comment), so perhaps, to disintermediate, we can summon @smarterclayton.

Is there an MCD baked into RHCOS?

Yes.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2020
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 21, 2020
@cgwalters
Member Author

/remove-lifecycle stale

@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Contributor

@openshift-bot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@LorbusChris
Member

/reopen

@openshift-ci-robot
Contributor

@LorbusChris: Reopened this issue.

@LorbusChris
Member

/remove-lifecycle rotten

@openshift-ci-robot openshift-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 21, 2020
@ashcrow
Member

ashcrow commented May 13, 2020

I'd love to see this happen. It would get us closer to boot images being "basic boot and pivot to expected content" in both the installer (bootstrap node) and the hosts themselves (which is the case today).

@cgwalters
Member Author

With some nontrivial but also not extremely difficult work, we could change the OS update stack to support an "update and restart all of userspace, but not the kernel" semantic which would shave some time off this.

@cgwalters
Member Author

Pulling the machine-os-content on the bootstrap host and pivoting adds an 800MB download to the critical part of bootstrapping

One more thing: it shouldn't be that hard to teach the bootstrap host to serve the images it pulled to the control plane. If we did that, it would reduce the 3 separate pulls of machine-os-content (m-o-c) from the upstream registry to one (and similarly for other images).
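A back-of-the-envelope sketch of the upstream registry traffic in each scenario discussed here. It assumes the roughly 800MB machine-os-content size quoted in this thread and three control-plane nodes; the function names are hypothetical, and the model only counts pulls, not latency or the bootstrap reboot.

```python
# Illustrative assumptions: ~800 MB per machine-os-content pull (the figure
# quoted in this thread) and 3 control-plane nodes.
MOC_SIZE_MB = 800
CONTROL_PLANE_NODES = 3


def upstream_pulls(control_plane_nodes: int, bootstrap_pivots: bool,
                   bootstrap_serves: bool) -> int:
    """Count machine-os-content pulls that hit the upstream registry.

    bootstrap_serves models the hypothetical idea of the bootstrap host
    re-serving the image it already pulled for its own pivot.
    """
    if bootstrap_serves:
        if not bootstrap_pivots:
            raise ValueError("serving assumes the bootstrap host pulled the image itself")
        # Only the bootstrap host talks to upstream; control-plane nodes fetch from it.
        return 1
    # Each control-plane node pulls independently, plus the bootstrap host if it pivots.
    return control_plane_nodes + (1 if bootstrap_pivots else 0)


def upstream_mb(pulls: int, size_mb: int = MOC_SIZE_MB) -> int:
    """Approximate upstream traffic in MB for a number of full pulls."""
    return pulls * size_mb


if __name__ == "__main__":
    scenarios = {
        "today (no bootstrap pivot)": (False, False),
        "bootstrap pivots": (True, False),
        "bootstrap pivots and serves": (True, True),
    }
    for label, (pivots, serves) in scenarios.items():
        n = upstream_pulls(CONTROL_PLANE_NODES, pivots, serves)
        print(f"{label}: {n} pull(s), ~{upstream_mb(n)} MB from upstream")
```

Under these assumptions, pivoting alone raises upstream traffic from about 2400MB to 3200MB, while pivoting plus serving brings it down to about 800MB; the extra download on the bootstrap critical path remains either way.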

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 30, 2020
@LorbusChris
Member

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2020
@travier
Member

travier commented Nov 6, 2020

/lifecycle frozen

@openshift-ci-robot openshift-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Nov 6, 2020
Projects
OKD4: To Do
9 participants