-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use getvmss api for spot instances in azure #6470
use getvmss api for spot instances in azure #6470
Conversation
e6c51a3
to
6ca161e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments. Please also update the PR name and description, and add a short summary of user-facing change
if getVmssSizeRefreshPeriod := os.Getenv("AZURE_GET_VMSS_SIZE_REFRESH_PERIOD"); getVmssSizeRefreshPeriod != "" { | ||
cfg.GetVmssSizeRefreshPeriod, err = strconv.Atoi(getVmssSizeRefreshPeriod) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to parse AZURE_GET_VMSS_SIZE_REFRESH_PERIOD %q: %v", getVmssSizeRefreshPeriod, err) | ||
} | ||
} else { | ||
cfg.GetVmssSizeRefreshPeriod = VmssSizeRefreshPeriodDefault | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic uses either environment variable or default, never the value from cloud config file. If we don't support setting this from cloud config file - need to remove from docs. If we need to support it - should update the logic. (It looks like the same applies to the EnableDynamicInstanceList above?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Looks like we're doing the same for others as well like - vmssVmsCacheTTL, vmssVmsCacheTTL, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, those lack the "else" part, will preserve config file setting in the absence of env override
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the cloud config file settings will already have been primed into the cfg
object? If so this should work
...
} else if cfg.GetVmssSizeRefreshPeriod != 0 {
cfg.GetVmssSizeRefreshPeriod = VmssSizeRefreshPeriodDefault
}
Is that what you're thinking @tallaxes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, line 127 will result in it being taken from cloud config file already.
GetVmssSizeRefreshPeriod int `json:"getVmssSizeRefreshPeriod" yaml:"getVmssSizeRefreshPeriod"`
Line 155-162 should take care of the parsing:
if configReader != nil {
body, err := ioutil.ReadAll(configReader)
if err != nil {
return nil, fmt.Errorf("failed to read config: %v", err)
}
err = json.Unmarshal(body, cfg)
if err != nil {
return nil, fmt.Errorf("failed to unmarshal config body: %v", err)
}
}
Then, on line 163 and below, the environment variables will be used only when cloud config file does not exist.
else {
cfg.Cloud = os.Getenv("ARM_CLOUD")
...
}
The problem I am seeing is when cloud config file exists, but does not contain getVmssSizeRefreshPeriod
, and we set AZURE_GET_VMSS_SIZE_REFRESH_PERIOD
, then this AZURE_GET_VMSS_SIZE_REFRESH_PERIOD
will never be used and this configuration will goes to default.
My proposal: consider each variable individually rather than entire file at once when determining whether to use an environment variable. But not in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To your second point:
My proposal: consider each variable individually rather than entire file at once when determining whether to use an environment variable. But not in this PR.
Do you mean:
- read from config file fully and populate
cfg
- then for each variable, check to see if it is already set in
cfg
- if it's not, check to see if the corresponding env var is set + set in
cfg
if it is - finally, if no
cfg
or env var is set, set it to default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the assumption here is that people may set values with a mix of cloud config file and env vars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not true, right?
The logic is:
if env var is set AND not empty string, then convert string to int and set cfg field
else: use default
It is true (right?, you may help me check again). All these environment variable parsing, from line 163 to 314, is encased in a giant else
statement. It will reach this part only if configReader == nil
, which, if we track it down, indicates that the user never provided the path to the config file.
if configReader != nil {
body, err := ioutil.ReadAll(configReader)
if err != nil {
return nil, fmt.Errorf("failed to read config: %v", err)
}
err = json.Unmarshal(body, cfg)
if err != nil {
return nil, fmt.Errorf("failed to unmarshal config body: %v", err)
}
}
Then, on line 163 and below, the environment variables will be used only when cloud config file does not exist.
else {
cfg.Cloud = os.Getenv("ARM_CLOUD")
...
// What we are looking at
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean:
- read from config file fully and populate
cfg
- then for each variable, check to see if it is already set in
cfg
- if it's not, check to see if the corresponding env var is set + set in
cfg
if it is- finally, if no
cfg
or env var is set, set it to default
That's one way to go with it---with config file > env > default.
Another way is env > config file > default.
I actually prefer the latter more, given env being more delicate(?) than the config file, as well as having a use case I see in AKS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I'm going to leave the logic as is.
However, want to explicitly call out that we are not handling a key case for most of these variables:
- If config is not nil, but doesn't set a field
- the env var for that field will never be picked up
- the default for that field (if present) will never be picked up
@comtalyst is aiming to address this scenario in his PR to defork cloud-provider-azure
It's a rather large PR so we may determine that this logic is better to be merged separately. Either way, I think it's fine to keep this field using the same structure as the others and address this in a follow-up PR.
/retest |
12fb2e9
to
eb99a37
Compare
Assigning since looks like this is being reviewed by you already. /assign @tallaxes @rakechill |
@x13n: GitHub didn't allow me to assign the following users: rakechill. Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test pull-cluster-autoscaler-e2e-azure |
/retest |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@x13n can you assign this to me? |
Looks like I can, done! |
@@ -128,6 +131,9 @@ type Config struct { | |||
// Jitter in seconds subtracted from the VMSS cache TTL before the first refresh | |||
VmssVmsCacheJitter int `json:"vmssVmsCacheJitter" yaml:"vmssVmsCacheJitter"` | |||
|
|||
// GetVmssSizeRefreshPeriod (seconds) defines how frequently to call GET VMSS API to fetch VMSS info per nodegroup instance | |||
GetVmssSizeRefreshPeriod int `json:"getVmssSizeRefreshPeriod" yaml:"getVmssSizeRefreshPeriod"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this qualifies as a nit, but maybe rename all of these properties that start w/ "get" so that they don't confuse readers that they are getter funcs. I think "vmssGet...|VmssGet..." instead would work without regressing the semantics for human readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not quite applicable for this case. GetVMSS
is the actual function we're calling. switching the order around almost makes it sound (to me) like we're calling a function that gets the size rather than the VMSS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: comtalyst, gandhipr, jackfrancis, rakechill The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -128,6 +131,9 @@ type Config struct { | |||
// Jitter in seconds subtracted from the VMSS cache TTL before the first refresh | |||
VmssVmsCacheJitter int `json:"vmssVmsCacheJitter" yaml:"vmssVmsCacheJitter"` | |||
|
|||
// GetVmssSizeRefreshPeriod (seconds) defines how frequently to call GET VMSS API to fetch VMSS info per nodegroup instance | |||
GetVmssSizeRefreshPeriod int `json:"getVmssSizeRefreshPeriod" yaml:"getVmssSizeRefreshPeriod"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I missed this, but GetVmssSizeRefreshPeriod
being an int
introduces an inconsistency with the fork, where it is time.Duration
. This time it is not too bad, but let us all be careful next time. I could see cases like this getting worse if there are more dependencies.
/cherry-pick cluster-autoscaler-release-1.27 |
@rakechill: only kubernetes org members may request cherry picks. If you are already part of the org, make sure to change your membership to public. Otherwise you can still do the cherry-pick manually. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/cherry-pick cluster-autoscaler-release-1.27 |
@willie-yao: #6470 failed to apply on top of branch "cluster-autoscaler-release-1.27":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds support for using getVMSS api for instances in order to collect most recent count in the spot nodepool
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: