Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: remove all references to linking #48

Merged
merged 6 commits into from
Dec 22, 2023
Merged

Conversation

rakechill
Copy link
Contributor

@rakechill rakechill commented Nov 28, 2023

Description
With the API update from machine -> nodeclaim, the karpenter provider removed the linking controller completely.

How was this change tested?

  • gc tests still behaving correctly
  • e2e suites (running in the PR)
  • scale up/scale down works correctly

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note


@rakechill rakechill changed the title [WIP] chore: Remove all references to linking [WIP] chore: remove all references to linking Nov 28, 2023
@rakechill rakechill changed the title [WIP] chore: remove all references to linking chore: remove all references to linking Nov 28, 2023
@rakechill rakechill marked this pull request as ready for review November 28, 2023 19:31
Copy link
Contributor Author

@rakechill rakechill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

@Bryce-Soghigian
Copy link
Collaborator

https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/nodeclaim/lifecycle/launch.go#L59 looks like we dont support launchign nodeclaims with linking anymore in karpenter core so I am all for this change.

Do you know what version of karpneter core it was removed? Was it v0.33 or v0.32?

@rakechill
Copy link
Contributor Author

https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/nodeclaim/lifecycle/launch.go#L59 looks like we dont support launchign nodeclaims with linking anymore in karpenter core so I am all for this change.

Do you know what version of karpenter core it was removed? Was it v0.33 or v0.32?

@Bryce-Soghigian I'm looping back to this now + rebased from main. I feel fairly confident we don't need it. However, I still can't find the exact reason why the karpenter aws provider decided to remove this. They claimed it had to do w/ removing machineutils in karpenter-core, but I'm not entirely clear why that would have made linking unnecessary.

The two relevant PRs seem to be:

And they were both in v0.33.0

Copy link
Contributor Author

@rakechill rakechill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

@Bryce-Soghigian
Copy link
Collaborator

https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/nodeclaim/lifecycle/launch.go#L59 looks like we dont support launchign nodeclaims with linking anymore in karpenter core so I am all for this change.
Do you know what version of karpenter core it was removed? Was it v0.33 or v0.32?

@Bryce-Soghigian I'm looping back to this now + rebased from main. I feel fairly confident we don't need it. However, I still can't find the exact reason why the karpenter aws provider decided to remove this. They claimed it had to do w/ removing machineutils in karpenter-core, but I'm not entirely clear why that would have made linking unnecessary.

The two relevant PRs seem to be:

And they were both in v0.33.0

@jackfrancis can you provide any insight here before we merge this?

@tallaxes
Copy link
Collaborator

tallaxes commented Dec 22, 2023

The way I see it, linking is/was about Karpenter adopting instances that it did not provision, potentially useful for migrating existing clusters to Karpenter (or maybe incompatible Karpenter versions). Removing this means either that these scenarios are considered less important, or they (possibly in some cases) did not actually work well, or the implementation itself was lacking (e.g. the separation between cloud provider and core responsibilities, including annotations being used, was not clean enough) - and could be revisited in the future. (Asked for confirmation / further insight on karpenter-dev)

In any case, lets remove it and revisit later, if needed, together with required core changes.

@tallaxes tallaxes merged commit e228c3a into main Dec 22, 2023
15 checks passed
@tallaxes tallaxes deleted the rakechill/remove-link branch December 22, 2023 20:26
@jonathan-innis
Copy link

jonathan-innis commented Dec 22, 2023

the karpenter provider removed the linking controller completely

We dropped this from our controllers since this logic was never intended to be added in as a mechanism for Karpenter to re-own instances that it didn't launch. This mechanism was put into place when Machines were introduced in v1alpha5 (now NodeClaims in v1beta1) and was intended to take nodes that were launched and owned by Karpenter, hydrate a Machine resource on the cluster for that Node, mark that that Machine shouldn't be launched but should be discovered through the linked annotation and then resolve all of its data.

We didn't want the overhead of managing the complexity of this flow anymore (since there's some gating and coordination that has to happen to ensure that Machines aren't launched and you want to make sure that you don't mess with certain things on the node since they've already existed for a bit). It also makes reasoning about ordering more complex.

Dropping the linked behavior means that we can now say that NodeClaims are created before Nodes always and that a NodeClaim will always exist on the cluster for a Karpenter-launched node.

@jonathan-innis
Copy link

I think if we want to talk about Karpenter re-owning existing nodes, we can start a design discussion about this or open an issue on kubernetes-sigs/karpenter to talk about the ramifications of doing this, but the stance up to this point has been to just ask users to deprovision old instances and let Karpenter launch new ones, rather than having to reason about the edge behavior that might happen if we tag and re-own an instance that was launched by a different system.

@rakechill rakechill mentioned this pull request Dec 22, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants