
move rhel7 worker node updates entirely to openshift-ansible #1592

Open
cgwalters opened this issue Mar 26, 2020 · 13 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@cgwalters
Member

The role of, and split between, openshift-ansible and the MCO is really confusing today (we need a FAQ for this).

Anyways here's an idea: what if we stopped the MCD running on !CoreOS systems, and instead openshift-ansible always needed to be run to apply changes from MachineConfig. It'd be openshift-ansible that'd somehow trigger drain/reboot etc.

The downside of this is e.g. if a user wants to change the kubelet config, they oc edit but then need to bounce out of Kube down to Ansible+ssh.

But the massive benefit is that it's super clear how things work - the MCO can optimize for CoreOS, openshift-ansible optimizes fully for traditional.

And with PRs like #1586 for example - we can rely on /usr/etc for original files, and do things like make better use of OSTree.

@jlebon
Member

jlebon commented Mar 26, 2020

A random idea I discussed with @runcom, which is less radical but more cumbersome: detect the case where the cluster has only RHCOS nodes and activate an "OSTree-optimized" mode where we can use things like /usr/etc. But yeah, it's not fun to maintain multiple paths that do the same thing.
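A minimal sketch of that detection, assuming we inspect each node's reported OS image (as in the Node status osImage field); the function name and sample strings are illustrative, not actual MCO code:

```python
# Hypothetical sketch: decide whether every node in the cluster is RHCOS,
# so an "OSTree-optimized" mode (e.g. relying on /usr/etc) could be enabled.
# The os_images values mimic Node .status.nodeInfo.osImage strings.

def all_nodes_are_rhcos(os_images):
    """True only if the cluster is non-empty and every node reports RHCOS."""
    return bool(os_images) and all("CoreOS" in image for image in os_images)

mixed = [
    "Red Hat Enterprise Linux CoreOS 45.81.202004070157-0",
    "Red Hat Enterprise Linux Server 7.8 (Maipo)",
]
pure = ["Red Hat Enterprise Linux CoreOS 45.81.202004070157-0"]

print(all_nodes_are_rhcos(mixed))  # False: keep the generic code path
print(all_nodes_are_rhcos(pure))   # True: OSTree-optimized mode is safe
```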

@sdodson
Member

sdodson commented Mar 27, 2020

Anyways here's an idea: what if we stopped the MCD running on !CoreOS systems, and instead openshift-ansible always needed to be run to apply changes from MachineConfig. It'd be openshift-ansible that'd somehow trigger drain/reboot etc.

Do you specifically mean running MCD as a daemon or would even the once-from mode be eliminated outside of CoreOS?

/cc @crawford

@cgwalters
Member Author

cgwalters commented Mar 27, 2020

Do you specifically mean running MCD as a daemon or would even the once-from mode be eliminated outside of CoreOS?

Just the MCD as daemon.

Well, that only kind of helps. To really make this work we may need to do something more like have code that translates the MCs into Ansible or so, and then that code lives in openshift-ansible.

So it would probably really be both, but we could also move e.g. the MCD once-from code into a Go library or binary that openshift-ansible would use.

@cgwalters
Member Author

cgwalters commented Mar 27, 2020

One thing I could imagine is that openshift-ansible owns a "translator" from MachineConfig into Ansible... e.g. if kernelArguments are set in an MC, that could become a playbook that calls out to grubby.
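A rough sketch of what that translator might do for the kernelArguments case, turning an MC into the grubby command line an Ansible task could run on a RHEL 7 host; the field path follows the MachineConfig spec, but the function itself is hypothetical, not openshift-ansible code:

```python
import shlex

# Hypothetical translator sketch: MachineConfig spec.kernelArguments ->
# a grubby invocation suitable for an Ansible command/shell task.

def machineconfig_to_grubby(machine_config):
    """Build a grubby command line from a MachineConfig's kernelArguments."""
    kargs = machine_config.get("spec", {}).get("kernelArguments", [])
    if not kargs:
        return None  # nothing to translate for this MC
    joined = " ".join(kargs)
    return "grubby --update-kernel=ALL --args=%s" % shlex.quote(joined)

mc = {"spec": {"kernelArguments": ["nosmt", "audit=1"]}}
print(machineconfig_to_grubby(mc))
# grubby --update-kernel=ALL --args='nosmt audit=1'
```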

But OTOH, maybe the core problem for openshift-ansible to solve is updating host config files that relate to OpenShift (kubelet configs, etc.) and leave basically everything else to external tooling.

@cgwalters
Member Author

To elaborate on this: today openshift-ansible applies MachineConfig updates, and in doing so we'd be making openshift-ansible work the same way the MCO does: not distinguishing between "configuration change" and "upgrade". Not making that distinction is the correct thing to do, because it ensures there's only one code path that needs to work.

@cgwalters
Member Author

One other thing about this is that if we stop running the MCD on rhel7, then we can move the MCD container image to RHEL8, which would make it significantly saner to also extract the binary to the host and run it there.

@michaelgugino
Contributor

The suggestions here seem to be going in the complete opposite direction. We should be making efforts to manage more within the cluster, not pushing more into openshift-ansible.

@cgwalters
Member Author

The suggestions here seem to be going in the complete opposite direction. We should be making efforts to manage more within the cluster, not pushing more into openshift-ansible.

But people using BYO RHEL7 are presumably using it precisely because they have nontrivial investment in existing external management systems, non-containerized system software agents etc. That just directly conflicts with moving things into the cluster and containerizing - the more you do that the more the question becomes "why aren't you using RHCOS"?

@michaelgugino
Contributor

But people using BYO RHEL7 are presumably using it precisely because

I don't agree with your conclusions. I think the primary reason to use RHEL7 is hardware/software/compliance that is only certified for, or only works on, RHEL7. That's why we only support RHEL7 workers: to enable workloads that wouldn't otherwise be possible.

IMO, there's no reason we can't support RHEL with MCO/MCD the same way we support RHCOS.

@cgwalters cgwalters changed the title move config updates entirely to openshift-ansible move rhel7 worker node updates entirely to openshift-ansible Sep 25, 2020
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 24, 2020
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 24, 2021
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Contributor

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kikisdeliveryservice kikisdeliveryservice added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jul 27, 2021
Development

No branches or pull requests

7 participants