cloud agents #12

Closed
cgwalters opened this issue Jul 12, 2018 · 9 comments

Comments

@cgwalters
Member

cgwalters commented Jul 12, 2018

Today, Container Linux uses an OEM partition for various cloud agents (e.g. GCE). For Fedora Atomic, we never created such a thing and mostly limped along with the (very limited) support that cloud-init has for different sites. The only exception here is that for RHEL Atomic Host we did make a VMware agent container.

The architecture for Fedora CoreOS calls for us to stay close to CL here (Ignition + coreos-metadata), but that doesn't answer the larger cloud agent problem.

A known major issue with the CL approach is that there is no update mechanism for the OEM partition.

We have a few options, and we can consider different strategies per cloud.

  • Layering it on as a package just for that cloud
  • Layering but not updating it (i.e. we don't engage the rpm-md machinery)
  • separate ostree streams per cloud
  • Rkt/atomic system containers style
  • Statically linked binary in /opt

@bgilbert
Contributor

The CL model is to ship a different install image per platform, with common root and /usr partitions but a platform-specific OEM partition containing the agent (if any). Then, for updates, we ship a /usr partition and kernel. That means we can have a single update payload per CPU architecture per release, but conversely we can't update the OEM partition. That causes practical problems (sometimes we have to fix OEM bugs by putting ad-hoc code in coreos-postinst to modify the OEM partition) as well as problems of principle (we can't actually update everything we ship). So I'd say universal automatic updating is a necessity for FCOS, but it'd be good to avoid having separate update streams for each platform.
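
To make the trade-off concrete, here is a rough count of update payloads per release under the two models described above. The architecture and platform lists are illustrative assumptions, not the actual CL support matrix, and the function names are hypothetical.

```python
from itertools import product

# Illustrative lists only; not the real CL or FCOS support matrix.
ARCHES = ["x86_64", "aarch64"]
PLATFORMS = ["aws", "gce", "azure", "vmware", "metal"]


def payloads_shared_usr(arches):
    """CL-style model: one /usr + kernel payload per CPU architecture."""
    return [(arch,) for arch in arches]


def payloads_per_platform(arches, platforms):
    """Per-platform streams: one payload per (architecture, platform) pair."""
    return list(product(arches, platforms))


print(len(payloads_shared_usr(ARCHES)))               # 2
print(len(payloads_per_platform(ARCHES, PLATFORMS)))  # 10
```

With the shared model the payload count stays flat as platforms are added, which is the property worth keeping; the cost in CL was that the platform-specific part fell outside the update path entirely.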

One other thing we've found: cloud agents are often not very necessary and are sometimes not great code. In many cases they bundle lots of additional functionality which ranges from potentially useful to irrelevant to actively harmful -- such as the ability to manage OS functionality that doesn't even exist on CL, or the ability for the platform to run arbitrary code on the machine. We can't eliminate agents entirely, since some platforms require their agent to report a successful boot before they'll allow the user to interact with the machine. In many cases, though, we should be able to implement a minimal cross-platform agent ourselves, e.g. as part of coreos-metadata.

In early internal discussions, there was a rough consensus around the following:

Short term: Do not build any functionality equivalent to the OEM partition, do not try to ship substantively different images for different platforms, and do not try to install agents via layers or containers. Ship any and all agents as part of the OS, launch them conditionally depending on the current platform, and live with the extra storage overhead.

Long term: Replace the platform agents, where possible, with our own minimal implementations. Ship those as part of the OS.

Thoughts?
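
As a minimal sketch of the "ship everything, launch conditionally" idea: the platform could be read from a kernel argument and mapped to the agents that should start. Everything named here is an assumption for illustration: the agent unit names are hypothetical, and the kernel argument key is a parameter (CL uses `coreos.oem.id`; the default below is only a placeholder for whatever FCOS settles on).

```python
import shlex

# Hypothetical mapping from platform ID to agent units baked into the OS.
AGENTS_BY_PLATFORM = {
    "gce": ["example-gce-agent.service"],
    "vmware": ["example-vmware-agent.service"],
}


def platform_from_cmdline(cmdline, key="ignition.platform.id"):
    """Return VALUE for the first `key=VALUE` kernel argument, or None."""
    for arg in shlex.split(cmdline):
        name, sep, value = arg.partition("=")
        if sep and name == key:
            return value
    return None


def agents_to_start(cmdline, key="ignition.platform.id"):
    """Return the agent units to launch for the platform named in cmdline."""
    platform = platform_from_cmdline(cmdline, key=key)
    # Unknown or missing platforms start no agents at all.
    return AGENTS_BY_PLATFORM.get(platform, [])


print(agents_to_start("BOOT_IMAGE=/vmlinuz ro ignition.platform.id=gce"))
# ['example-gce-agent.service']
```

On a real system the input would come from `/proc/cmdline`. The same gating could probably be expressed declaratively instead, for example with systemd's `ConditionKernelCommandLine=` on each agent unit, so that no dispatcher code is needed.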

@cgwalters
Member Author

Ah right. I'd forgotten about that discussion. Yes, baking them all in and doing conditional launching is also a pretty simple way to do things.

@ajeddeloh
Contributor

I am strongly in favor of avoiding shipping agents whenever possible. My dream is that instead of trying to expose all the little odds and ends of clouds (e.g. oslogin on gce) we try to make the experience as similar across clouds as possible. Running FCOS on gce should be the same as on aws and bare metal. Not only is this easier to manage from a development point of view, it makes FCOS more consistent across clouds. You have "the FCOS way" of adding users, not "The FCOS way, or the gce way, or the aws way, etc".

@cgwalters
Member Author

> You have "the FCOS way" of adding users, not "The FCOS way, or the gce way, or the aws way, etc".

The clouds are going to dislike us for that, but I think I agree. In the end, for the clouds, having "nicer/integrated" ways for users to manage guest OSes is sort of a previous battleground anyway; now it's all about services.

@ajeddeloh
Contributor

Yeah, I agree they won't like that, but I think that's a battle worth fighting with the clouds. We can be explicit that "If you need special cloud bits, FCOS is not for you".

@eparis

eparis commented Jul 12, 2018

Or even better, FCOS is for you. Ship your agent as a container and it will work!

@dustymabe
Member

> Short term: Do not build any functionality equivalent to the OEM partition, do not try to ship substantively different images for different platforms, and do not try to install agents via layers or containers. Ship any and all agents as part of the OS, launch them conditionally depending on the current platform, and live with the extra storage overhead.
>
> Long term: Replace the platform agents, where possible, with our own minimal implementations. Ship those as part of the OS.

👍 👍

@dustymabe
Member

considering this to be decided then. will close

@dustymabe
Member

FYI this made it into the design doc in #40


5 participants