Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Azure VMs pool] Introducing agentpool client #6685

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
131 changes: 131 additions & 0 deletions cluster-autoscaler/cloudprovider/azure/azure_client.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,16 @@ import (
"os"
"time"

_ "go.uber.org/mock/mockgen/model" // for go:generate

azextensions "github.com/Azure/azure-sdk-for-go-extensions/pkg/middleware"
"github.com/Azure/azure-sdk-for-go/sdk/azcore"
"github.com/Azure/azure-sdk-for-go/sdk/azcore/arm/policy"
"github.com/Azure/azure-sdk-for-go/sdk/azcore/cloud"
azurecore_policy "github.com/Azure/azure-sdk-for-go/sdk/azcore/policy"
"github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime"
"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v4"
"github.com/Azure/azure-sdk-for-go/services/compute/mgmt/2019-07-01/compute"
"github.com/Azure/azure-sdk-for-go/services/resources/mgmt/2017-05-10/resources"
"github.com/Azure/azure-sdk-for-go/services/storage/mgmt/2021-02-01/storage"
Expand Down Expand Up @@ -138,6 +148,118 @@ func (az *azDeploymentsClient) Delete(ctx context.Context, resourceGroupName, de
return future.Response(), err
}

//go:generate sh -c "mockgen k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure AgentPoolsClient >./agentpool_client.go"

// AgentPoolsClient interface defines the methods needed for scaling vms pool.
// it is implemented by track2 sdk armcontainerservice.AgentPoolsClient
type AgentPoolsClient interface {
Comment on lines +153 to +155
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an implicit implementation choice here of implementing a new client here, rather than using those in cloud-provider-azure (and maybe extending them; looks like there is an MC client, but not AP one), which are used throughout the rest of CAS Azure provider. What are the tradeoffs, and which ones push us in this direction?

Get(ctx context.Context,
resourceGroupName, resourceName, agentPoolName string,
options *armcontainerservice.AgentPoolsClientGetOptions) (
armcontainerservice.AgentPoolsClientGetResponse, error)
BeginCreateOrUpdate(
ctx context.Context,
resourceGroupName, resourceName, agentPoolName string,
parameters armcontainerservice.AgentPool,
options *armcontainerservice.AgentPoolsClientBeginCreateOrUpdateOptions) (
*runtime.Poller[armcontainerservice.AgentPoolsClientCreateOrUpdateResponse], error)
BeginDeleteMachines(
ctx context.Context,
resourceGroupName, resourceName, agentPoolName string,
machines armcontainerservice.AgentPoolDeleteMachinesParameter,
options *armcontainerservice.AgentPoolsClientBeginDeleteMachinesOptions) (
*runtime.Poller[armcontainerservice.AgentPoolsClientDeleteMachinesResponse], error)
}

Comment on lines +153 to +173
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing against this interface is something I would like to start and think about and is important for our current work to improve the testing scenarios for cluster autoscaler.

For karpenter, we use env test to simulate teh kubernetes environment, and we have fakes for all of the azure apis. https://github.com/Azure/karpenter-provider-azure/tree/main/pkg/fake

I want us to start thinking about the best way to fake the Agentpools client. It doesn't have to be solved in this pr, but its good to start thinking about

func getAgentpoolClientCredentials(cfg *Config) (azcore.TokenCredential, error) {
var cred azcore.TokenCredential
var err error
if cfg.AuthMethod == authMethodCLI {
cred, err = azidentity.NewAzureCLICredential(&azidentity.AzureCLICredentialOptions{
TenantID: cfg.TenantID})
if err != nil {
klog.Errorf("NewAzureCLICredential failed: %v", err)
return nil, err
}
} else if cfg.AuthMethod == "" || cfg.AuthMethod == authMethodPrincipal {
cred, err = azidentity.NewClientSecretCredential(cfg.TenantID, cfg.AADClientID, cfg.AADClientSecret, nil)
if err != nil {
klog.Errorf("NewClientSecretCredential failed: %v", err)
return nil, err
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no msi support here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could get this PR in first and add MSI support in a following PR

return nil, fmt.Errorf("unsupported authorization method: %s", cfg.AuthMethod)
}
return cred, nil
}

Comment on lines +174 to +195
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of code seems to be copy pasted around quite a bit. I wonder if we can start to share some of it via the azure-sdk-for-go-extensions repo

Karpenter has a very similar implementation of the same thing essentially we want to authenticate for SP, MSI and CLI cred we all share this code.
https://github.com/Azure/karpenter-provider-azure/blob/main/pkg/auth/cred.go#L28

func getAgentpoolClientRetryOptions(cfg *Config) azurecore_policy.RetryOptions {
if cfg.AuthMethod == authMethodCLI {
return azurecore_policy.RetryOptions{
MaxRetries: -1, // no retry when using CLI auth for UT
}
}
return azextensions.DefaultRetryOpts()
}

func newAgentpoolClient(cfg *Config) (AgentPoolsClient, error) {
retryOptions := getAgentpoolClientRetryOptions(cfg)

if cfg.ARMBaseURLForAPClient != "" {
klog.V(10).Infof("Using ARMBaseURLForAPClient to create agent pool client")
return newAgentpoolClientWithConfig(cfg.SubscriptionID, nil, cfg.ARMBaseURLForAPClient, "UNKNOWN", retryOptions)
}

return newAgentpoolClientWithPublicEndpoint(cfg, retryOptions)
}

func newAgentpoolClientWithConfig(subscriptionID string, cred azcore.TokenCredential,
cloudCfgEndpoint, cloudCfgAudience string, retryOptions azurecore_policy.RetryOptions) (AgentPoolsClient, error) {
agentPoolsClient, err := armcontainerservice.NewAgentPoolsClient(subscriptionID, cred,
&policy.ClientOptions{
ClientOptions: azurecore_policy.ClientOptions{
Cloud: cloud.Configuration{
Services: map[cloud.ServiceName]cloud.ServiceConfiguration{
cloud.ResourceManager: {
Endpoint: cloudCfgEndpoint,
Audience: cloudCfgAudience,
},
},
},
Telemetry: azextensions.DefaultTelemetryOpts(getUserAgentExtension()),
Transport: azextensions.DefaultHTTPClient(),
Retry: retryOptions,
},
})

if err != nil {
return nil, fmt.Errorf("failed to init cluster agent pools client: %w", err)
}

klog.V(10).Infof("Successfully created agent pool client with ARMBaseURLForAPClient")
return agentPoolsClient, nil
}

func newAgentpoolClientWithPublicEndpoint(cfg *Config, retryOptions azurecore_policy.RetryOptions) (AgentPoolsClient, error) {
cred, err := getAgentpoolClientCredentials(cfg)
if err != nil {
klog.Errorf("failed to get agent pool client credentials: %v", err)
return nil, err
}

// default to public cloud
env := azure.PublicCloud
if cfg.Cloud != "" {
env, err = azure.EnvironmentFromName(cfg.Cloud)
if err != nil {
klog.Errorf("failed to get environment from name %s: with error: %v", cfg.Cloud, err)
return nil, err
}
}

return newAgentpoolClientWithConfig(cfg.SubscriptionID, cred, env.ResourceManagerEndpoint, env.TokenAudience, retryOptions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful!

}

type azAccountsClient struct {
client storage.AccountsClient
}
Expand All @@ -151,6 +273,7 @@ type azClient struct {
disksClient diskclient.Interface
storageAccountsClient storageaccountclient.Interface
skuClient compute.ResourceSkusClient
agentPoolClient AgentPoolsClient
}

// newServicePrincipalTokenFromCredentials creates a new ServicePrincipalToken using values of the
Expand Down Expand Up @@ -278,6 +401,13 @@ func newAzClient(cfg *Config, env *azure.Environment) (*azClient, error) {
skuClient.Authorizer = azClientConfig.Authorizer
klog.V(5).Infof("Created sku client with authorizer: %v", skuClient)

agentPoolClient, err := newAgentpoolClient(cfg)
if err != nil {
// we don't want to fail the whole process so we don't break any existing functionality
// since this may not be fatal - it is only used by vms pool which is still under development.
klog.Warningf("newAgentpoolClient failed with error: %s", err)
}

return &azClient{
disksClient: disksClient,
interfacesClient: interfacesClient,
Expand All @@ -287,5 +417,6 @@ func newAzClient(cfg *Config, env *azure.Environment) (*azClient, error) {
virtualMachinesClient: virtualMachinesClient,
storageAccountsClient: storageAccountsClient,
skuClient: skuClient,
agentPoolClient: agentPoolClient,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on L408, it is possible that the agentPoolClient is nil. Be sure to check it when refer it in later code development.

}, nil
}
20 changes: 18 additions & 2 deletions cluster-autoscaler/cloudprovider/azure/azure_config.go
Original file line number Diff line number Diff line change
Expand Up @@ -87,8 +87,16 @@ type Config struct {
Location string `json:"location" yaml:"location"`
TenantID string `json:"tenantId" yaml:"tenantId"`
SubscriptionID string `json:"subscriptionId" yaml:"subscriptionId"`
ResourceGroup string `json:"resourceGroup" yaml:"resourceGroup"`
VMType string `json:"vmType" yaml:"vmType"`
ClusterName string `json:"clusterName" yaml:"clusterName"`
// ResourceGroup is the MC_ resource group where the nodes are located.
ResourceGroup string `json:"resourceGroup" yaml:"resourceGroup"`
// ClusterResourceGroup is the resource group where the cluster is located.
ClusterResourceGroup string `json:"clusterResourceGroup" yaml:"clusterResourceGroup"`
VMType string `json:"vmType" yaml:"vmType"`

// ARMBaseURLForAPClient is the URL to use for operations for the VMs pool.
// It can override the default public ARM endpoint for VMs pool scale operations.
ARMBaseURLForAPClient string `json:"armBaseURLForAPClient" yaml:"armBaseURLForAPClient"`

// AuthMethod determines how to authorize requests for the Azure
// cloud. Valid options are "principal" (= the traditional
Expand Down Expand Up @@ -294,6 +302,12 @@ func BuildAzureConfig(configReader io.Reader) (*Config, error) {
}
}
}

// always read the following from environment variables since azure.json doesn't have these fields
cfg.ClusterName = os.Getenv("CLUSTER_NAME")
cfg.ClusterResourceGroup = os.Getenv("ARM_CLUSTER_RESOURCE_GROUP")
cfg.ARMBaseURLForAPClient = os.Getenv("ARM_BASE_URL_FOR_AP_CLIENT")

cfg.TrimSpace()

if cloudProviderRateLimit := os.Getenv("CLOUD_PROVIDER_RATE_LIMIT"); cloudProviderRateLimit != "" {
Expand Down Expand Up @@ -460,7 +474,9 @@ func (cfg *Config) TrimSpace() {
cfg.Location = strings.TrimSpace(cfg.Location)
cfg.TenantID = strings.TrimSpace(cfg.TenantID)
cfg.SubscriptionID = strings.TrimSpace(cfg.SubscriptionID)
cfg.ClusterName = strings.TrimSpace(cfg.ClusterName)
cfg.ResourceGroup = strings.TrimSpace(cfg.ResourceGroup)
cfg.ClusterResourceGroup = strings.TrimSpace(cfg.ClusterResourceGroup)
cfg.VMType = strings.TrimSpace(cfg.VMType)
cfg.AADClientID = strings.TrimSpace(cfg.AADClientID)
cfg.AADClientSecret = strings.TrimSpace(cfg.AADClientSecret)
Expand Down
6 changes: 6 additions & 0 deletions cluster-autoscaler/cloudprovider/azure/azure_manager_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,9 @@ func TestCreateAzureManagerWithNilConfig(t *testing.T) {
TenantID: "tenantId",
SubscriptionID: "subscriptionId",
ResourceGroup: "resourceGroup",
ClusterName: "mycluster",
ClusterResourceGroup: "myrg",
ARMBaseURLForAPClient: "nodeprovisioner-svc.nodeprovisioner.svc.cluster.local",
VMType: "vmss",
AADClientID: "aadClientId",
AADClientSecret: "aadClientSecret",
Expand Down Expand Up @@ -449,6 +452,9 @@ func TestCreateAzureManagerWithNilConfig(t *testing.T) {
t.Setenv("BACKOFF_DURATION", "1")
t.Setenv("BACKOFF_JITTER", "1")
t.Setenv("CLOUD_PROVIDER_RATE_LIMIT", "true")
t.Setenv("CLUSTER_NAME", "mycluster")
t.Setenv("ARM_CLUSTER_RESOURCE_GROUP", "myrg")
t.Setenv("ARM_BASE_URL_FOR_AP_CLIENT", "nodeprovisioner-svc.nodeprovisioner.svc.cluster.local")

t.Run("environment variables correctly set", func(t *testing.T) {
manager, err := createAzureManagerInternal(nil, cloudprovider.NodeGroupDiscoveryOptions{}, mockAzClient)
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
/*
Copyright 2020 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package azure

import (
context "context"
reflect "reflect"

runtime "github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime"
armcontainerservice "github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v4"
gomock "go.uber.org/mock/gomock"
)

// MockAgentPoolsClient is a mock of AgentPoolsClient interface.
type MockAgentPoolsClient struct {
ctrl *gomock.Controller
recorder *MockAgentPoolsClientMockRecorder
}

// MockAgentPoolsClientMockRecorder is the mock recorder for MockAgentPoolsClient.
type MockAgentPoolsClientMockRecorder struct {
mock *MockAgentPoolsClient
}

// NewMockAgentPoolsClient creates a new mock instance.
func NewMockAgentPoolsClient(ctrl *gomock.Controller) *MockAgentPoolsClient {
mock := &MockAgentPoolsClient{ctrl: ctrl}
mock.recorder = &MockAgentPoolsClientMockRecorder{mock}
return mock
}

// EXPECT returns an object that allows the caller to indicate expected use.
func (m *MockAgentPoolsClient) EXPECT() *MockAgentPoolsClientMockRecorder {
return m.recorder
}

// BeginCreateOrUpdate mocks base method.
func (m *MockAgentPoolsClient) BeginCreateOrUpdate(arg0 context.Context, arg1, arg2, arg3 string, arg4 armcontainerservice.AgentPool, arg5 *armcontainerservice.AgentPoolsClientBeginCreateOrUpdateOptions) (*runtime.Poller[armcontainerservice.AgentPoolsClientCreateOrUpdateResponse], error) {
m.ctrl.T.Helper()
ret := m.ctrl.Call(m, "BeginCreateOrUpdate", arg0, arg1, arg2, arg3, arg4, arg5)
ret0, _ := ret[0].(*runtime.Poller[armcontainerservice.AgentPoolsClientCreateOrUpdateResponse])
ret1, _ := ret[1].(error)
return ret0, ret1
}

// BeginCreateOrUpdate indicates an expected call of BeginCreateOrUpdate.
func (mr *MockAgentPoolsClientMockRecorder) BeginCreateOrUpdate(arg0, arg1, arg2, arg3, arg4, arg5 any) *gomock.Call {
mr.mock.ctrl.T.Helper()
return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "BeginCreateOrUpdate", reflect.TypeOf((*MockAgentPoolsClient)(nil).BeginCreateOrUpdate), arg0, arg1, arg2, arg3, arg4, arg5)
}

// BeginDeleteMachines mocks base method.
func (m *MockAgentPoolsClient) BeginDeleteMachines(arg0 context.Context, arg1, arg2, arg3 string, arg4 armcontainerservice.AgentPoolDeleteMachinesParameter, arg5 *armcontainerservice.AgentPoolsClientBeginDeleteMachinesOptions) (*runtime.Poller[armcontainerservice.AgentPoolsClientDeleteMachinesResponse], error) {
m.ctrl.T.Helper()
ret := m.ctrl.Call(m, "BeginDeleteMachines", arg0, arg1, arg2, arg3, arg4, arg5)
ret0, _ := ret[0].(*runtime.Poller[armcontainerservice.AgentPoolsClientDeleteMachinesResponse])
ret1, _ := ret[1].(error)
return ret0, ret1
}

// BeginDeleteMachines indicates an expected call of BeginDeleteMachines.
func (mr *MockAgentPoolsClientMockRecorder) BeginDeleteMachines(arg0, arg1, arg2, arg3, arg4, arg5 any) *gomock.Call {
mr.mock.ctrl.T.Helper()
return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "BeginDeleteMachines", reflect.TypeOf((*MockAgentPoolsClient)(nil).BeginDeleteMachines), arg0, arg1, arg2, arg3, arg4, arg5)
}

// Get mocks base method.
func (m *MockAgentPoolsClient) Get(arg0 context.Context, arg1, arg2, arg3 string, arg4 *armcontainerservice.AgentPoolsClientGetOptions) (armcontainerservice.AgentPoolsClientGetResponse, error) {
m.ctrl.T.Helper()
ret := m.ctrl.Call(m, "Get", arg0, arg1, arg2, arg3, arg4)
ret0, _ := ret[0].(armcontainerservice.AgentPoolsClientGetResponse)
ret1, _ := ret[1].(error)
return ret0, ret1
}

// Get indicates an expected call of Get.
func (mr *MockAgentPoolsClientMockRecorder) Get(arg0, arg1, arg2, arg3, arg4 any) *gomock.Call {
mr.mock.ctrl.T.Helper()
return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Get", reflect.TypeOf((*MockAgentPoolsClient)(nil).Get), arg0, arg1, arg2, arg3, arg4)
}
Loading
Loading