Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importing API types pulls in lots of dependencies #9011

Open
22 of 25 tasks
JoelSpeed opened this issue Jul 18, 2023 · 46 comments
Open
22 of 25 tasks

Importing API types pulls in lots of dependencies #9011

JoelSpeed opened this issue Jul 18, 2023 · 46 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@JoelSpeed
Copy link
Contributor

JoelSpeed commented Jul 18, 2023

What steps did you take and what happened?

Create a simple program, as below and run go mod tidy:

package main

import (
	"fmt"

	capiv1beta1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

func main() {
	myCluster := &capiv1beta1.Cluster{}

	fmt.Printf("My Cluster: %+v\n", myCluster)
}

This results in a go.mod with 56 dependencies. Included in this list is controller-runtime which, is also imported by a lot of consumers of the API types. controller-runtime has evolved a lot over time and, because it is being imported by the APIs currently, means that any project consuming the API types must also import the same controller-runtime as the rest of CAPI.

There are use cases where, building a consumer, I do not need to use any of the code (defaulting, validation etc) from CAPI and as such, do not need such a restriction on my controller-runtime version.

If we modify the way the SchemeBuilder works and move the webhooks to a separate package, then we can reduce the required imports for the API package down to just 31 dependencies, and, these dependencies do not include controller-runtime or other fast moving/changing packages. There are still obviously a good number of deps in there from K8s paths, but these are mostly API machinery related and have pretty stable APIs.

I think if we can separate the webhooks out in this way (I notice Cluster and ClusterClass already are?) then I think this would make consuming these API types in other projects a lot easier.

Note how k8s.io/api does not include much code with the API types for this reason as well.

The alternative solution for consumers is to copy/paste the types out to their own repos which is less ideal.

What did you expect to happen?

API types should not be tied to many dependencies, especially not controller-runtime.

Cluster API version

main branch

Kubernetes version

N/A

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

TODO

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 18, 2023
@vincepri
Copy link
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 18, 2023
@vincepri
Copy link
Member

This all sounds good, a few additional comments would be:

  • We should have a clear way on how to structure webhooks in our API packages across Cluster API and its ecosystem of providers
  • Separate the SchemeBuilder changes from the webhook ones
  • Open an issue in Kubebuilder to potentially change the default way new projects or packages are generated

Thoughts?
cc @CecileRobertMichon @fabriziopandini @sbueringer

@JoelSpeed
Copy link
Contributor Author

We should have a clear way on how to structure webhooks in our API packages across Cluster API and its ecosystem of providers

Agreed, I don't have context on why Cluster and ClusterClass are separate, but, their pattern works well. Do you think we need to have versioned defaulting/validation? If so, does it make sense to have the webhooks in versioned packages as in my POC branch?

If they don't need to be versioned, having them centrally as for Cluster and ClusterClass would work.

Separate the SchemeBuilder changes from the webhook ones

Ack yes, I can make a PR for this separately so it's considered as a different part of the same effort


I think it would be good to have a consistent approach across CAPI and the related repos, if there is a pattern we can come up with that we can document then I would like to also pursue updating the provider APIs to mitigate this issue there as well

@sbueringer
Copy link
Member

Agreed, I don't have context on why Cluster and ClusterClass are separate, but, their pattern works well. Do you think we need to have versioned defaulting/validation? If so, does it make sense to have the webhooks in versioned packages as in my POC branch?

The reason why Cluster and ClusterClass are separate is the following:

  • we couldn't just implement the Validate / Default methods on the types directly because we needed a client in those webhooks
  • So we already had to use CustomDefaulter / CustomValidator
  • I didn't want to add even more dependecies to our API package so I suggested to move them to a separate package

I agree that we can just follow the same pattern for all webhooks.

If they don't need to be versioned, having them centrally as for Cluster and ClusterClass would work.

The validate / default funcs / webhooks don't have to be versioned. Whenever we introduced a new apiVersion we moved the methods from the previous apiVersion to the new one :).

So overall sounds good to me. I'm absolutely in favor!

@CecileRobertMichon
Copy link
Contributor

+1

Agree anything we can do to limit importing controller-runtime only where necessary is an improvement

@enxebre
Copy link
Member

enxebre commented Jul 19, 2023

+1, thanks for creating this Joel. I find having the ability to vendor the API cleanly critical for consumers and to ease adoption of the project.
I think reducing the gomod hard dep on webhooks and solve it generically makes total sense and is a step in the right direction.
For ref old discussion on this topic.

@sbueringer
Copy link
Member

sbueringer commented Jul 19, 2023

We discussed this issue at the backlog grooming on Oct 29th 2021, the consensus was that for now we're not able to fully decouple our API types from the controller runtime dependencies due to conversion webhooks which are strictly tied to our types.
#3465 (comment)

Just to double-check. I think I might have brought this up at the time. If I now take a look at our current code this doesn't seem to be an issue as as we only depend on "sigs.k8s.io/controller-runtime/pkg/conversion" (for conversion.Hub) in old apiVersions. So it won't be a problem for cleaning up dependencies of the latest apiVersion (which I assume is our main goal).

@fabriziopandini
Copy link
Member

+1 from my side, thanks for reviving this discussion @JoelSpeed!

@JoelSpeed
Copy link
Contributor Author

I did a bit of work on the code for this today and came across a small issue.

Currently the package internal/test/builder imports the defaulting functions from clusterv1, if we move these to webhooks, it creates an import cycle because the tests are importing the internal/test/builder to run tests on the webhooks.

I can see 3 possible paths forward:

  • Keep the Default() functions inside the clusterv1 package and require that they do not use any import from controller-runtime (not ideal, don't know that this is even possible)
  • Remove the defaulting bit from the builder (it's for MHC and MD) and handle the defaulting in the tests where those are needed (there are 9 instances for MHC, 1 for MD)
  • Try to move the webhook tests to a separate go package, webhooks_test, which may solve the issue as well (loses access to private functions)

Any preferred method from those or alternative suggestions from anyone?

@killianmuldoon
Copy link
Contributor

Remove the defaulting bit from the builder (it's for MHC and MD) and handle the defaulting in the tests where those are needed (there are 9 instances for MHC, 1 for MD)

From a quick look it should be relatively easy to move the defaulting to the tests themselves rather than the builders. There's only a few instances of this usage. @chrischdi - I think you added this when moving these tests to envtest, is there something I'm not thinking of that could stand in the way?

@JoelSpeed
Copy link
Contributor Author

Remove the defaulting bit from the builder (it's for MHC and MD) and handle the defaulting in the tests where those are needed (there are 9 instances for MHC, 1 for MD)

Going to play with this idea then, will report back if I hit more issues. Expect some PRs over the coming days

@sbueringer
Copy link
Member

sbueringer commented Jul 19, 2023

If I got it right we already had and solved this issue before. We have a separate test package here for those kind of tests: https://github.com/kubernetes-sigs/cluster-api/tree/3e16abc3df71314bbaee770846beba45c0eb170d/internal/webhooks/test

EDIT: My bad. That's a separate issue, but maybe you'll hit that one as well :). Agree with Killian for your current problem
EDIT2: Okay not I'm entirely sure, maybe the solution works the same way

@chrischdi
Copy link
Member

Remove the defaulting bit from the builder (it's for MHC and MD) and handle the defaulting in the tests where those are needed (there are 9 instances for MHC, 1 for MD)

From a quick look it should be relatively easy to move the defaulting to the tests themselves rather than the builders. There's only a few instances of this usage. @chrischdi - I think you added this when moving these tests to envtest, is there something I'm not thinking of that could stand in the way?

Nope, looks good. At the time of implementing it was just the most straightforward one. Should be totally fine to default at the test and remove that from the builder 👍

@sbueringer because we just chatted about the depguard linter: maybe that one would be useful if the config is flexible enough, to prevent regressions if this issue got solved 🙂

@JoelSpeed
Copy link
Contributor Author

I've just posted two PRs to show the progress I've made so far on the main Cluster v1beta1 APIs, the first of which for scheme builder should be ready for review, the second I think is ready for review apart from I know I have to sort the MachineDeployment defaulting in tests, otherwise I think that should be fairly good to go

@JoelSpeed
Copy link
Contributor Author

Both PRs are now ready for review and, combined, clear the deps for the main APIs (Machine, MS, MD, Cluster etc). Will start to look at the exp APIs next

@JoelSpeed
Copy link
Contributor Author

Have added a TODO list with my initial thoughts on scope here, any additional points we need for this issue?

@sbueringer
Copy link
Member

Seems ~ good to me. Does your list cover all those APIs?

stable
./api
./controlplane/kubeadm/api
./bootstrap/kubeadm/api

exp
./exp/api
./exp/addons/api
./exp/ipam/api
./exp/runtime/api

clusterctl
./cmd/clusterctl/api (only one usage of scheme.Builder)

test
./test/infrastructure/docker/api
./test/infrastructure/docker/exp/api
./test/infrastructure/inmemory/api

Totally fine to focus on stable + exp + maybe clusterctl (because it's a quickwin) and then create a follow-up issue for test (I would mostly migrate test as well for consistency across the entire code base)

P.S. I'll try to find some time to review your PRs soon. Pretty low bandwidth at the moment

@JoelSpeed
Copy link
Contributor Author

I've updated the list to be very explicit, will keep track of the PRs in the list as we go along as well

@sbueringer
Copy link
Member

sbueringer commented Aug 3, 2023

@killianmuldoon Do you have some time to replicate this issue for CAPV and work on it going forward? (implementation is not urgent, won't make it / doesn't have to make it into CAPV v1.8)

@killianmuldoon
Copy link
Contributor

@killianmuldoon Do you have some time to replicate this issue for CAPV and work on it going forward? (implementation is not urgent, won't make it / doesn't have to make it into CAPV v1.8)

By that you mean removing the controller-runtime package from the CAPV API packages?

@sbueringer
Copy link
Member

By that you mean removing the controller-runtime package from the CAPV API packages?

Basically implementing the same in CAPV that Joel is implementing in core CAPI. So yes, if I got your question correctly :)
(although what Joel is implementing only removes the dependency from the v1beta1 packages, but that's enough)

@killianmuldoon
Copy link
Contributor

Sure thing - I'll create an issue and assign it to myself.

@sbueringer
Copy link
Member

Sounds good. Sorry didn't see it

@Ankitasw
Copy link
Member

Will pick .exp/addons/api next

@sbueringer
Copy link
Member

Let's do a final round at the end to add import restrictions everywhere. I forgot to ask for them at least a few times and we don't have them in the packages we added earlier. Should be good enough to just add them once we moved everything.

@Ankitasw
Copy link
Member

Will pick ./test/infrastructure/docker/api now

@Ankitasw
Copy link
Member

Also picking ./test/infrastructure/docker/exp/api and ./test/infrastructure/inmemory/api

@sbueringer
Copy link
Member

Thank you very much!

@sbueringer
Copy link
Member

Thx everyone so far. As far as I can tell we're now down to Ankita's enforce import restriction PR (#9461) and docs

@sbueringer
Copy link
Member

Nice work everyone!!

As far as I can tell we should be done - apart from documentation.

@JoelSpeed Would be good if you can maybe do a final verification if the transitive dependencies look good from your side (in case you didn't do it already)

@JoelSpeed
Copy link
Contributor Author

@JoelSpeed Would be good if you can maybe do a final verification if the transitive dependencies look good from your side (in case you didn't do it already)

Based on JoelSpeed/capi-api-import-test@8800fd3 we are doing pretty well across the board, however, the Addons API, which I added separately, JoelSpeed/capi-api-import-test@8c6aa3d, adds a bunch of deps including controller runtime back in.

I see the import restrictions are in place for the addons, I guess it doesn't work transitively?

@JoelSpeed
Copy link
Contributor Author

Seems we are importing sigs.k8s.io/cluster-api/util here which is bringing in the additional imports, even though the code it uses within that module doesn't itself use CR

@killianmuldoon
Copy link
Contributor

Yeah - looks like we need a refactor of some of that code - I'll have a quick go and see if it's straightforward.

@fabriziopandini
Copy link
Member

Yeah - looks like we need a refactor of some of that code - I'll have a quick go and see if it's straightforward.

If it is only for one function, let's consider duplicating it

@sbueringer
Copy link
Member

Added a new sub task:

[ ] Add verification to avoid regressions (e.g. https://github.com/kubernetes-sigs/cluster-api/pull/9482#issuecomment-1730060650)

@fabriziopandini
Copy link
Member

@sbueringer @enxebre should we close this issue? the development work is completed and we have our code as a documentation/example for how to do the same in providers (if they want to)

@sbueringer
Copy link
Member

I think we have to implement " Add verification to avoid regressions" otherwise we have 0 protection against transitive dependencies sneaking in. Which can happen very easily.

One idea how this can be done was suggested here: #9482 (comment) (same PR shows how easy it is to get transitive dependencies)

@fabriziopandini
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 12, 2024
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests