Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Azure VMs Pool] Support mixed agentpool types in Azure Cache #6689

Conversation

wenxuan0923
Copy link
Contributor

@wenxuan0923 wenxuan0923 commented Apr 4, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Use Azure cache to track VMs pools. As of today, CAS can only be used for single type of agentpool - either vmss or standard. We need to support vms pool and mixed agentpool types (vmss + vms). During the cache generation, we will fetch all resources regardless of the vmType, and also track the vms pools using a set (implemented with golang map) so we can distinguish between vmss vm instances and vms instances.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Right now the file azure_vms_pool.go is just a place holder without real implementation.

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 4, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @wenxuan0923. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 4, 2024
@k8s-ci-robot k8s-ci-robot requested review from nilo19 and x13n April 4, 2024 03:45
@k8s-ci-robot k8s-ci-robot added the area/provider/azure Issues or PRs related to azure provider label Apr 4, 2024
Comment on lines +88 to +93
// GetOptions returns NodeGroupAutoscalingOptions that should be used for this particular
// NodeGroup. Returning a nil will result in using default options.
func (agentPool *VMsPool) GetOptions(defaults config.NodeGroupAutoscalingOptions) (*config.NodeGroupAutoscalingOptions, error) {
// TODO(wenxuan): Implement this method
return nil, cloudprovider.ErrNotImplemented
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@wenxuan0923 wenxuan0923 Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'm aware. This whole file is now mostly place holders without any real implementation. I will add the required methods in the following PRs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern. Thank you.

@@ -137,7 +137,24 @@ func (m *AzureManager) fetchExplicitNodeGroups(specs []string) error {
return nil
}

// this pattern is for vms pool only "<minSize>:<maxSize>:<name>:vms"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the name of the nodepool?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see from reading tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this spec change so we don't expose AKS internal terms to the public

@@ -32,6 +34,36 @@ type NodeGroupSpec struct {
MaxSize int `json:"maxSize"`
// Specifies whether this node group can scale to zero nodes.
SupportScaleToZero bool
// specifies the agentpool type, used for vms pool only
Copy link
Member

@Bryce-Soghigian Bryce-Soghigian Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agentpool and vm pool is an AKS specific concept. config/dynamic falls outside of our cloudprovider, so when adding changes we should consider how the behavior will affect other cloud providers.

I wonder if we can call it PoolType instead since it will the the homogenous type of the pool, and that seems more cloud neutral vs vmtype which means something to aks but not anyone else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that makes sense! okay I will use a different approach them.

: change to use azure cache to track vms pools

@wenxuan0923 wenxuan0923 changed the title [Azure VMs Pool] Adding new spec pattern for vms pool [WIP][Azure VMs Pool] Adding new spec pattern for vms pool Apr 4, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 4, 2024
@wenxuan0923 wenxuan0923 changed the title [WIP][Azure VMs Pool] Adding new spec pattern for vms pool [WIP][Azure VMs Pool] Support mixed agentpool types when generating Azure cache Apr 4, 2024
@wenxuan0923 wenxuan0923 changed the title [WIP][Azure VMs Pool] Support mixed agentpool types when generating Azure cache [WIP][Azure VMs Pool] Support mixed agentpool types Apr 4, 2024
@wenxuan0923 wenxuan0923 changed the title [WIP][Azure VMs Pool] Support mixed agentpool types [WIP][Azure VMs Pool] Support mixed agentpool types in Azure Cache Apr 5, 2024
@wenxuan0923 wenxuan0923 changed the title [WIP][Azure VMs Pool] Support mixed agentpool types in Azure Cache [Azure VMs Pool] Support mixed agentpool types in Azure Cache Apr 5, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 5, 2024
vmPoolName := tags[agentpoolNameTag]
// fall back to legacy tag name if not found
if vmPoolName == nil {
vmPoolName = tags["poolName"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add this as a const as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -51,6 +51,7 @@ type azureCache struct {
// Cache content.
resourceGroup string
vmType string
vmsPoolMap map[string]struct{} // track the nodepools that're vms pool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a more defined type as opposed to just using struct? VMpools? I observe that we have defined this inside azure_vm_pool.go

Copy link
Contributor Author

@wenxuan0923 wenxuan0923 Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This map[string]struct{} is just being used as set here, since golang doesn't have set data structure. It is a common pattern as you can see many of such usage in rp code base.
The purpose is only to be able to find out if an agentpool is vms pool, given its name, so we don't need to use any specific data type here.

We are setting value like this:

vmsPoolMap[to.String(vmPoolName)] = struct{}{}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Bryce, the Set in apimachinery is also a map, so I don't think it's necessary:

image

Copy link
Member

@Bryce-Soghigian Bryce-Soghigian Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All go implementations of set will use map[T]EmptyStruct for zero bytes.

https://pkg.go.dev/k8s.io/apimachinery/pkg/util/sets#Set is the location of the set package though that works with all types. Set in pkg/fields isn't one i have heard of before.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename the field to vmsPoolSet to make it much more clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

Comment on lines +88 to +93
// GetOptions returns NodeGroupAutoscalingOptions that should be used for this particular
// NodeGroup. Returning a nil will result in using default options.
func (agentPool *VMsPool) GetOptions(defaults config.NodeGroupAutoscalingOptions) (*config.NodeGroupAutoscalingOptions, error) {
// TODO(wenxuan): Implement this method
return nil, cloudprovider.ErrNotImplemented
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern. Thank you.

// number is different from the number of nodes registered in Kubernetes.
func (agentPool *VMsPool) TargetSize() (int, error) {
// TODO(wenxuan): Implement this method
return -1, cloudprovider.ErrNotImplemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming we'll be implementing all of these in follow-up PRs?

Copy link
Contributor Author

@wenxuan0923 wenxuan0923 Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's right, to implement all the methods we will need the agentpool client PR to be merged first.

I'm trying to keep the PR small and self-contained, so it's easier to review. There will be more PRs coming next.

@gandhipr
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 10, 2024
@wenxuan0923
Copy link
Contributor Author

/retest

@k8s-ci-robot
Copy link
Contributor

@wenxuan0923: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Bryce-Soghigian
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 10, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tallaxes, wenxuan0923

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 12, 2024
@k8s-ci-robot k8s-ci-robot merged commit a7b4386 into kubernetes:cluster-autoscaler-release-1.29 Apr 12, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants