Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAS: cloudprovider-specific nodegroupset #6907

Merged

Conversation

jackfrancis
Copy link
Contributor

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This PR moves the cloudprovider-specific implementations of nodegroupset into cloudprovider-managed code areas so that they can be more easily maintained without involving core cluster-autoscaler maintainers.

This should not have any impact on compilation outcomes, the idea is to simply re-organize code.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 7, 2024
@k8s-ci-robot k8s-ci-robot added area/provider/aws Issues or PRs related to aws provider area/provider/azure Issues or PRs related to azure provider area/provider/gce size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 7, 2024
@jackfrancis
Copy link
Contributor Author

/assign @x13n

This follows along from #6634

Not sure why the verify test is failing, UT are passing when I run make test-unit.

@jackfrancis
Copy link
Contributor Author

/retest

2 similar comments
@jackfrancis
Copy link
Contributor Author

/retest

@jackfrancis
Copy link
Contributor Author

/retest

@nojnhuh
Copy link
Contributor

nojnhuh commented Jul 15, 2024

/test pull-cluster-autoscaler-e2e-azure

@jackfrancis jackfrancis force-pushed the nodegroupset-cloudprovider branch 2 times, most recently from 5f9cf96 to 172dcc5 Compare July 15, 2024 22:45
@jackfrancis
Copy link
Contributor Author

@x13n @gjtempleton @MaciekPytel this one is ready for a final review

Copy link
Member

@x13n x13n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for taking so long to review. Looks good, just one minor comment. Feel free to cancel the hold if you disagree.

/lgtm
/approve
/hold

schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// CreateAwsNodeInfoComparator returns a comparator that checks if two nodes should be considered
// part of the same NodeGroupSet. This is true if they match usual conditions checked by IsCloudProviderNodeInfoSimilar,
// even if they have different AWS-specific labels.
func CreateAwsNodeInfoComparator(extraIgnoredLabels []string, ratioOpts config.NodeGroupDifferenceRatios) NodeInfoComparator {
func CreateAwsNodeInfoComparator(extraIgnoredLabels []string, ratioOpts config.NodeGroupDifferenceRatios) nodegroupset.NodeInfoComparator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Since aws is already the package name, maybe rename to just CreateNodeInfoComparator to avoid redundancy? Same for azure & gce.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. A better way would be to define an interface once, but the value of this cleanup work (IMO) doesn't warrant that type of surgery.

Done!

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 6, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 6, 2024
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2024
@jackfrancis
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 6, 2024
@x13n
Copy link
Member

x13n commented Sep 6, 2024

Thanks!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2024
@k8s-ci-robot k8s-ci-robot merged commit 352176c into kubernetes:master Sep 6, 2024
7 of 8 checks passed
@MaciekPytel
Copy link
Contributor

This PR seems to have broken the ability to build CA with just one provider. Specifically:

  • Let's be honest - the bar for adding CA provider is not very high. Provider owners have full autonomy to develop their code.
  • A malicious actor hiding something nefarious in an init module somewhere deep in an obscure cloudprovider is a risk.

For this reason we've introduced https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/builder as the only import path for cloudproviders and have each provider gated on build tags. This allows using build tags to build CA with just the provider you want (and presumably trust) to mitigate the risk above.

This PR I think breaks this mechanism by introducing a new import path for individual cloudprovider implementations. I think the right solution would be a refactor that allows cloudproviders to register their processors via cloudprovider/builder mechanism.
In the meantime I think we should rollback this PR. WDYT @jackfrancis @x13n

note: it is not lost on me that we take a different security stance regarding cloudprovider code and provider-specific processors, but practically speaking the possibility to hide something nasty is vastly different between the two - not least because of set of approvers needed to add a processor today.

@jackfrancis jackfrancis deleted the nodegroupset-cloudprovider branch October 4, 2024 16:20
@jackfrancis
Copy link
Contributor Author

@MaciekPytel I agree 100%, TIL about provider-specific build flows

I'll revert this now.

Do we want to add additional test coverage that enumerates through the set of provider build tags so that we catch this kind of thing in CI next time?

@x13n
Copy link
Member

x13n commented Oct 7, 2024

Yeah, good point. Revert + follow up refactor of cloudprovider/builder to allow compilation for a specific cloud provider makes sense to me.

@x13n
Copy link
Member

x13n commented Oct 7, 2024

And re: CI - that is also reasonable. If we say we support building CA with specific build tags, we should test whether these builds actually work.

@MaciekPytel
Copy link
Contributor

Big +1 to all the comments above.

@jackfrancis
Copy link
Contributor Author

@MaciekPytel @x13n see #7391 and kubernetes/test-infra#33656

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider area/provider/azure Issues or PRs related to azure provider area/provider/gce cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants