
Add first draft proposal for apply #3037

Merged
merged 7 commits into from
Feb 4, 2021
Conversation

Contributor

@michaelbeaumont michaelbeaumont commented Jan 6, 2021

Description

See #2774
Rendered

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the userdocs directory)
  • Manually tested
  • Added labels for change area (e.g. area/nodegroup), target version (e.g. version/0.12.0) and kind (e.g. kind/improvement)
  • Make sure the title of the PR is a good description that can go into the release notes

Comment on lines 17 to 18
- e.g. if I `apply` with a list of 1 nodegroups, are any other existing
nodegroups deleted?
Contributor Author

@michaelbeaumont michaelbeaumont Jan 6, 2021


yes 😃

Ideally, by the end, the config in general should be smarter, i.e. a user should be able to change any piece of information and see the change happen

Originally posted by @Callisto13

Contributor Author


I think so too, as I wrote in the next section, but it's not totally clear whether that's the right "default" (or whether it should be the only option). For example, the Google Terraform provider supports IAM with 3 different levels of authoritativeness.

Comment on lines 19 to 20
- do we also delete non-eksctl managed/created resources? i.e. does `eksctl`
own the cluster by default? (potential for configurability here)
Contributor Author

@michaelbeaumont michaelbeaumont Jan 6, 2021


I don't think deleting would go well for us, but we will have to find a new method of "adoption".

There would have to be a way to get an existing cluster's information and render it into an eksctl-compatible config, which can then be controlled via eksctl from then on.

It does get tricky if:

  • Create non-eksctl cluster
  • Get config, start controlling by eksctl
  • Manually make "some change"
  • What does eksctl do when it sees the new config? By then probably the same thing as it would if the cluster had been "owned" from the start.

Originally posted by @Callisto13
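For illustration, the "adoption" flow described above might emit a generated config resembling eksctl's existing ClusterConfig schema. This is only a sketch; the cluster name, region, and nodegroup values below are made up, and no such "get config" command exists yet:

```yaml
# Hypothetical output of a future config-generation command for an
# adopted cluster; all field values here are illustrative only.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: adopted-cluster
  region: eu-west-1
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
```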

Contributor Author

@michaelbeaumont michaelbeaumont Jan 6, 2021


+1 to being able to generate configs for unowned resources. I mentioned generating configs for imperatively created resources here. Expanding it to also cover unowned resources is a great idea.

Originally posted by @aclevername

Contributor Author


I have a vague memory of there being a risk here... didn't someone say in a call once that it is difficult (maybe impossible) to get a full cluster configuration?

Originally posted by @Callisto13

Comment on lines 64 to 70
we discover an existing nodegroup `ng-1` with the `eksctl.io/owned: true` tag (for example),
do we delete it?

The proposed answer is yes, it should be deleted. With `apply` we assume that all
previous changes initiated by `eksctl` were initiated through the same config
file/reconciliation process and that this nodegroup must have been
deleted from this file.
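The deletion rule quoted above could be sketched roughly as follows. This is an illustration only, not eksctl's actual code; the type and function names are hypothetical:

```go
// Sketch of the proposed `apply` semantics: an owned nodegroup that is
// missing from the desired config gets deleted; unowned ones are left alone.
package main

import "fmt"

// Nodegroup is a minimal stand-in for a discovered EKS nodegroup.
type Nodegroup struct {
	Name  string
	Owned bool // e.g. derived from an `eksctl.io/owned: true` tag
}

// toDelete returns the owned nodegroups that no longer appear in the
// desired config.
func toDelete(desired map[string]bool, discovered []Nodegroup) []string {
	var out []string
	for _, ng := range discovered {
		if ng.Owned && !desired[ng.Name] {
			out = append(out, ng.Name)
		}
	}
	return out
}

func main() {
	desired := map[string]bool{"ng-2": true}
	discovered := []Nodegroup{
		{Name: "ng-1", Owned: true},  // owned, missing from config: delete
		{Name: "ng-2", Owned: true},  // owned, still desired: keep
		{Name: "ng-3", Owned: false}, // not eksctl-owned: keep
	}
	fmt.Println(toDelete(desired, discovered)) // [ng-1]
}
```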
Contributor Author


I think that while the imperative commands are so easy and feature-rich (you have flags for doing a lot of what you can do in the config file), this is probably going to lead to a lot of users accidentally deleting things because they created them imperatively.

I think we can help reduce this by actively driving users towards using apply and away from the imperative commands by:

  • Reducing the functionality of imperative commands. It reminds me of how kubectl's imperative commands are great for beginners and getting introduced to kubectl, but the moment you need to do anything more complex you are forced into a config file, which is great
  • Being able to generate a config file for existing imperatively created resources. This might not be feasible, but for getting folks on board with apply on existing clusters, having an `eksctl get config` or something to that effect would be great

I know you called out in the intro Existing behavior and commands are unaffected, but I couldn't help myself 😁

Originally posted by @aclevername

Contributor Author

@michaelbeaumont michaelbeaumont Jan 6, 2021


This could be helped by not supporting imperative commands for a new config. For example, once you specify nodegroups in v1alpha6 or whatever, you can no longer run `create nodegroup`. Or if we have a new kind for reconciliation.

I don't think removing functionality would go over well with anyone

Contributor


This could be helped by not supporting imperative commands for a new config

+1 this

Contributor


This could be helped by not supporting imperative commands for a new config

sounds good 💯

I don't think removing functionality would go over well with anyone

I'm definitely biased here, but using apply and a config file is, to me, without a doubt the best way to use eksctl and something that we as a project should push users towards (the latter might be a controversial opinion). "Pushing" users towards that feels like something worth considering. A greater adoption of apply would also most likely help get users into a space where GitOps is more applicable, and perhaps open a pathway to them using more of WW's tools. Reducing the imperative commands' functionality is definitely a big thing and not something we should do lightly, but I don't think it should be dismissed.

Looking at it from an alternative point of view: if we view v1.0 as a fresh start for eksctl, what would you want it to look like? I'm massively in favour of how kubectl handles this. It has bare-bones imperative commands for getting beginners active quickly, and quickly pushes you towards config files for serious usage.

The above perhaps belongs in a different discussion, but I wanted to make sure my point made sense 😄

Contributor

@Callisto13 Callisto13 Jan 6, 2021


Completely agree. v1 lets us do away with all imperative commands in a socially acceptable way.

Contributor Author


I don't think getting rid of imperative commands has much to do with this proposal/`eksctl apply`.

@aclevername
Contributor

Once we have settled on this proposal, do we want to open a v1 proposal to encapsulate this and other (mainly breaking) changes we might want to make?

@Callisto13
Contributor

Can we add a note to explicitly call out that you should be able to define something in one part of the config and use it in another (e.g. define an IAM role name and use it elsewhere)? The goal is that you can define a full cluster and everything will be created in the correct order, so this behaviour is implied, but it's probably good to call it out.

If we decide, _for example_, to add support for reconciling tags, we might
decide it fits better as a top level key. We would then, with `v1alpha6`, remove
`tags` under `metadata` and make it a top level field. It should be possible to
automatically rewrite the previous version into the current version.
Contributor


Would the new eksctl apply work with clusters that were created with version v1alpha5?

Contributor Author


Yes, definitely!
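For illustration, the automatic rewrite described in the quoted proposal text might look like this. The v1alpha6 layout is hypothetical (the proposal itself frames it as "for example"), and the cluster values are made up:

```yaml
# Current (v1alpha5): tags live under metadata.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: eu-west-1
  tags:
    team: platform
---
# Hypothetical v1alpha6 after automatic rewriting: tags become top-level.
apiVersion: eksctl.io/v1alpha6
kind: ClusterConfig
metadata:
  name: my-cluster
  region: eu-west-1
tags:
  team: platform
```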

@michaelbeaumont michaelbeaumont added this to the v1.0 milestone Jan 20, 2021
Comment on lines +69 to +70
6. As we expand `apply` support, we reevaluate and update the config structure as
necessary in a new API version.
Contributor


I think it would be good to spec out precisely what an entire cluster config looks like before we start. I feel one of the reasons the current config is incohesive is because it was patched together issue-by-issue

Contributor Author


Before we start implementing, perhaps, but IMO that's really the next step from this proposal. The main reason the config is incohesive right now is simply that we can't/don't make breaking changes. Perhaps we could sketch out a tentative structure for the entire thing in terms of "let's think about which resources we're managing here" so we can think through cases for apply, but I don't think we should wait for the entire config to be supported before releasing some version of apply, and we shouldn't pretend we'll get it right the first time.

Contributor


we shouldn't pretend we'll get it right the first time.

👍, totally. I was thinking more along the lines of "based on the resources we know we're going to manage"

Comment on lines +123 to +124
Storing some kind of state/config "somewhere" is another option and has been discussed in
https://github.com/weaveworks/eksctl/issues/642
Contributor

@Callisto13 Callisto13 Jan 20, 2021


I like the idea (from that linked thread) of representing the full cluster in a ConfigMap in the cluster itself (with a view to use a CRD in the future) and reconciling config changes against that diff. Would be good if we could get more information around the difficulties found in the original exploration and whether this needs to be a "thing" in and of itself:

I'm removing this from the milestone, as MVP is not as simple as I hoped for. We should also create a design document for this actually.

(have asked ilya about it in that thread)
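For illustration, the ConfigMap idea from the linked thread could look something like this. This is a sketch only; the ConfigMap name, namespace, and key are hypothetical, and the embedded cluster values are made up:

```yaml
# Hypothetical: last-applied cluster config stored in the cluster itself,
# against which `apply` would diff the incoming config file.
apiVersion: v1
kind: ConfigMap
metadata:
  name: eksctl-cluster-state   # hypothetical name
  namespace: kube-system
data:
  cluster.yaml: |
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: my-cluster
      region: eu-west-1
```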

Comment on lines +163 to +164
- **"Parse, don't validate"**: the code that uses the config should be insulated from how we
serialize/deserialize the config.
Contributor


❤️ ❤️ ❤️

@Callisto13
Contributor

Could we have a table of contents at the top please, @michaelbeaumont?

Is it possible to also include a goals/non-goals bit at the top, so it is super clear what the scope of this proposal and work is?

2. Take full ownership of `eksctl`-managed resources (i.e. delete missing resources by default)
3. Options for non-`eksctl`-managed resources as well as resources where
ownership is unknowable are covered below
4. Changing immutable fields through recreation is the default
Contributor

@aclevername aclevername Jan 27, 2021


I think this line adds a large amount of extra complexity. An example would be disabling OIDC, which isn't currently possible (AFAIK): do we also delete all of the IAMServiceAccounts, as they wouldn't have been created in the first place if OIDC was false?

A more explosive example: if I specified a lower version of Kubernetes in my config, would we delete the existing cluster and create a new cluster at the lower version? The same goes for nodegroups etc.

Contributor Author


You can disable OIDC (just not through eksctl), but yes, all dependent resources need to be altered/recreated. What else can we do? The user would otherwise have to fall back to imperative commands.

Contributor


We could have some immutable fields that we don't support changing through recreation. I don't think we should do it lightly, but some fields seem impractical to support.

Contributor Author

@michaelbeaumont michaelbeaumont Jan 27, 2021


I think in that case the user will have to delete and recreate the resources themselves, and I think the power of apply is that eksctl recognizes and handles that case. See the example of nodegroups, a big motivator for apply and a duplicate issue. I don't think the user should "ever" have to fall back to imperative commands, and I don't see any other recourse if they can't replace a resource. What should the user do if they can't recreate a resource because of a field?

Contributor

@aclevername aclevername Jan 27, 2021


I think the nodegroups example makes complete sense to support; I'm not disagreeing, and very few fields are worth restricting.

To be more specific about what's worth restricting: I think config field changes that would invoke the deletion and recreation of the cluster itself are good examples. They feel very explosive and might not be what the user expects. They might not know that downgrading the Kubernetes version isn't possible on an existing cluster, for example, or other fields like modifying the subnets a cluster uses (AFAIK you can't change them). We do have `apply --plan` to display that to them before the destruction occurs, but is implementing these (IMO complex) codepaths worth it when we could just restrict these immutable fields? I might be wrong, but it feels like a discussion worth having.

Edit: Out of curiosity, how do things like CAPA handle downgrading the Kubernetes version (or other immutable fields)?

Contributor Author


Yeah I was definitely assuming we have apply --plan, for sure. I don't think the codepath needs to be complex, I think we need only know, for each resource, on which other resources it depends.
With CAPA you can't make such changes, but it's running as a controller and there's no great way to handle the "approve" flow with custom resources, at the moment.
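The dependency bookkeeping suggested here ("for each resource, on which other resources it depends") could be sketched roughly as follows. Purely illustrative Go, not eksctl's actual code; all resource names and function names are hypothetical:

```go
// Sketch: given a map from each resource to the resources that depend on
// it, collect everything that must be recreated when one resource is replaced.
package main

import "fmt"

// recreationSet walks the dependency graph outward from a changed resource
// and returns the full set of resources that must also be recreated.
func recreationSet(changed string, dependents map[string][]string) []string {
	seen := map[string]bool{changed: true}
	order := []string{changed}
	for i := 0; i < len(order); i++ {
		for _, d := range dependents[order[i]] {
			if !seen[d] {
				seen[d] = true
				order = append(order, d)
			}
		}
	}
	return order
}

func main() {
	// Example: disabling OIDC forces recreation of everything depending on it.
	deps := map[string][]string{
		"oidc":    {"iamserviceaccount/aws-node"},
		"cluster": {"oidc", "nodegroup/ng-1"},
	}
	fmt.Println(recreationSet("oidc", deps)) // [oidc iamserviceaccount/aws-node]
}
```

With such a set in hand, `apply --plan` could list every resource that a change would replace before anything destructive happens.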

Contributor Author


How much we should consider the case of "eksctl as a controller" here, I'm definitely not sure about; IMO it should be open to discussion!

@aclevername
Contributor

What are the requirements for merging? I think this is a good baseline proposal; undoubtedly changes will be made over time, but it feels like a good start to me.

@michaelbeaumont michaelbeaumont added the skip-release-notes Causes PR not to show in release notes label Jan 28, 2021
@Callisto13
Contributor

What are the requirements for merging? I think this is a good baseline proposal; undoubtedly changes will be made over time, but it feels like a good start to me.

Yeah, it does not need to be perfect now; we just need a baseline. Then we can open separate PRs to argue over and embellish the finer points.

Contributor

@Callisto13 Callisto13 left a comment


I am happy with this as a solid place to build on, nice work 🎉.
Could you add a "Status: WIP" (or whatever) at the top to be clear on where we are at with it?
The next stage, I guess, would be to create individual tickets to address key points, questions, unknowns, and extra sections. Wdyt?

Contributor

@aclevername aclevername left a comment


LGTM, +1 setting status as WIP

Base automatically changed from master to main February 2, 2021 15:06
Labels
skip-release-notes Causes PR not to show in release notes
5 participants