[wip] Alibaba recommitted #5291
[APPROVALNOTIFIER] This PR is NOT APPROVED.
os.Setenv(envAccessKeyID, accessKeyID)
os.Setenv(envAccessKeySecret, accessKeySecret)
We should not be setting environment variables. The API should be accessed programmatically.
OK, I'll delete it
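As a rough illustration of the suggestion above, here is a minimal sketch of passing the access key pair to the client constructor instead of writing it to the process environment. The `Credentials` type, the `Client` fields, and `NewClientWithCredentials` are hypothetical names used only for this example; they are not taken from the PR or from the Alibaba Cloud SDK.

```go
package main

import "errors"

// Credentials is a hypothetical container for the access key pair.
type Credentials struct {
	AccessKeyID     string
	AccessKeySecret string
}

// Client is a hypothetical stand-in for the installer's Alibaba Cloud client.
type Client struct {
	region string
	creds  Credentials
}

// NewClientWithCredentials builds a client from explicit credentials, so
// nothing has to be exported through os.Setenv.
func NewClientWithCredentials(region string, creds Credentials) (*Client, error) {
	if creds.AccessKeyID == "" || creds.AccessKeySecret == "" {
		return nil, errors.New("access key ID and secret must both be set")
	}
	return &Client{region: region, creds: creds}, nil
}
```

With this shape the credentials stay scoped to the client value, and other code in the same process cannot pick them up from the environment by accident.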
// Before deploying the cluster, the user must manually create a resource group.
// The parameter ResourceGroupID is required.
ResourceGroupID string `json:"resourceGroupID"`
Why do users have to create the resource group rather than the installer?
There is a 7-day buffer period after an Alibaba Cloud resource group is deleted, during which a resource group with the same name cannot be created. Therefore, in the initial design, users need to create the resource group manually. If this change is not required now, we plan to support creating new resource groups in a later version.
// GenerateIgnitionShim is used to generate an ignition file that contains a user ca bundle
// in its Security section.
func GenerateIgnitionShim(bootstrapConfigURL string, userCA string) ([]byte, error) {
this moves the AWS ignition shim into a reusable function
Yes, I separated this function from AWS. Is that ok?
@bd233 yes this is good. Sorry for the confusing comment. I meant that as a note to myself.
// does not need to be user-supplied (e.g. because it can be retrieved
// from external APIs).
type Metadata struct {
client *Client
This stored *Client is not being utilized; instead, NewClient is called throughout the code.
Although it may be good to use the stored client rather than recreating the client multiple times.
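One way to reuse the stored client could look like the sketch below: an accessor on `Metadata` that creates the client on first use and returns the cached value afterwards. The `Region` field, the `NewClient(region string)` signature, and the `newClientCalls` counter are assumptions made for illustration; the actual constructor in the PR may take different arguments.

```go
package main

import "sync"

// Client is a hypothetical stand-in for the installer's Alibaba Cloud client.
type Client struct{ region string }

// newClientCalls counts constructions so the caching is observable (illustration only).
var newClientCalls int

// NewClient has an assumed signature; the real constructor may differ.
func NewClient(region string) (*Client, error) {
	newClientCalls++
	return &Client{region: region}, nil
}

// Metadata caches a single client for the lifetime of the asset.
type Metadata struct {
	Region string

	mutex  sync.Mutex
	client *Client
}

// Client returns the cached client, creating it on first use.
func (m *Metadata) Client() (*Client, error) {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	if m.client == nil {
		c, err := NewClient(m.Region)
		if err != nil {
			return nil, err
		}
		m.client = c
	}
	return m.client, nil
}
```

Callers would then ask `Metadata` for the client instead of calling `NewClient` directly, so every validation step shares one client.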
return allErrs
}

func validateMachinePool(client *Client, ic *types.InstallConfig, fldPath *field.Path, pool *alibabacloudtypes.MachinePool, replicas *int64) field.ErrorList {
It looks like none of this code is tested.
UserDataSecret: &corev1.LocalObjectReference{Name: userDataSecret},
CredentialsSecret: &corev1.LocalObjectReference{Name: "alibabacloud-credentials"},
How does this work with manual credential mode?
Having creds in kube-system is useful not only for the CCO. Even in manual mode the creds could be used for other purposes. We should, however, document that the creds are needed and what permissions the user must have.
No. The credentials for the machine-api-operator are used to create the machine. The credentials in the machine spec are used by the kubelet, the kube-controller-manager, or the equivalent out-of-tree providers that run on the machine.
I see. But should this be handled by the cloud controller manager operator's credential request then?
3cdd09a to 67775fb
@bd233 @staebler @kwoodson I have just completed an initial review of this PR, which reorganizes #5018. The purpose is to figure out whether we can merge this PR or whether further substantial changes are needed. I have taken the liberty of making small changes myself (reorganizing imports, fixing grammar, removing unnecessary code).
Here are the main outstanding issues I see after my review:
- There is code to create credentials for the CCO despite this running in manual mode.
- There is a ValidateForProvisioning check that I think should be moved to the earlier validation stages.
- A client is stored in Metadata but it is not used (a new client is created each time).
- Authentication credentials for the client are being set programmatically through environment variables. I don't know if this is necessarily a problem, and it is platform specific, but it is unusual.
- The machineset code is not tested.
I would also like to point out two items which I do not think are necessarily problematic but are worth drawing attention to:
- I noticed that a user is required to create a resource group (see the related comment on #5291). This seems unnecessary to me, but it has been considered already, so I am fine with it, at least for the time being.
- This is not a problem, but I wanted to draw attention to the fact that AWS code for ignition shim is extracted and made reusable for Alibaba. This is good because we will need this for ASH.
I did a cursory review of Terraform and destroy but I did not do an in-depth review.
So I would like to discuss what the best path forward would be. I am not sure whether it is worth holding the PR for these items or whether we should merge and fix them in follow-up PRs. If we are going to hold, I would like to come up with a plan for how to integrate the changes.
67775fb to 74e7d94
Adds the Alibaba platform and validation to types package. Also adds supporting files for explain.
Adds preliminary assets for the Alibaba platform: cluster, install config, machines, manifests, quota, rhcos.
Adds Terraform plugin, tfvars and stages for Alibaba.
Adds Terraform configurations for the Alibaba platform.
Adds destroy code for the Alibaba platform.
This commit was produced by running , , and all modules verified. Signed-off-by: sunhui <wb-sh373163@alibaba-inc.com>
74e7d94 to e1d3c17
Yes, I think I should remove this code.
Thank you very much for your work. Based on the changes in this PR, I created a new branch and fixed the above problems one by one. If this is the path you expect, should I use this branch to create a new PR? If there is anything else I need to do, please let me know.
@patrickdillon: PR needs rebase.
Sure, a new PR would be fine.
@bd233 @dongchen126
These are generated during the Here are the errors that I see when running the cluster:
I am able to resolve these for the worker nodes by adding a few fields:
Since the installation occurs before these variables are set, I'm not sure how to resolve these until after the cluster installation has started. I believe this can be done, but I wanted to report it as an extra step that is required before installation can complete successfully. If we need to merge this PR and fix this afterwards, that should be okay, as the Alibaba team can then reproduce it. I wanted to bring this up so that we can begin to think about how to populate these fields during the installation.
I don't think we'll be able to use a VPC ID in the machinesets. As you point out, the actual VPC ID is not known until after the terraform runs. Other platforms use a well-known VPC name instead.
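To make the well-known name idea concrete, here is a minimal sketch in which both Terraform and the machineset generator would derive the VPC name from the cluster's infrastructure ID, so the machineset never needs the VPC ID. The `<infraID>-vpc` pattern, the `Tag` type, and the `Name` tag key are assumptions for illustration only; they are not confirmed by this PR.

```go
package main

import "fmt"

// vpcName returns a well-known VPC name for a cluster.
// The "<infraID>-vpc" pattern is an assumption for illustration.
func vpcName(infraID string) string {
	return fmt.Sprintf("%s-vpc", infraID)
}

// Tag is a hypothetical name/value filter that a machine controller could
// use to look the VPC up at runtime instead of relying on a VPC ID.
type Tag struct {
	Key   string
	Value string
}

// vpcFilter builds the lookup filter that would be written into the machineset.
func vpcFilter(infraID string) Tag {
	return Tag{Key: "Name", Value: vpcName(infraID)}
}
```

Because the name is a pure function of the infrastructure ID, it is known when the machineset manifests are generated, before Terraform has created anything.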
// TODO: more appropriate to use asynchronous. It is advisable to optimise in the future
for _, execute := range deletedFuncs {
err = o.executeDeleteFunction(execute.executeFunc, execute.resourceName)
The destroyer cannot wait for a given delete function to complete successfully before moving on to the next delete functions. Instead of waiting indefinitely on one delete function, the destroyer should instead loop through each delete function, making one attempt at each delete function during each iteration of the loop.
for _, arn := range tagResources {
notDeletedResources = append(notDeletedResources, arn.ResourceARN)
}
return errors.New(fmt.Sprintf("There are undeleted cloud resources %q", notDeletedResources))
The destroyer must not stop when there are resources that have not been deleted. The destroyer must keep trying to delete the resources until the user stops the destroyer.
@patrickdillon @staebler @kwoodson
@patrickdillon: The following tests failed, say
/close in favor of #5333
/close
@patrickdillon: Closed this PR.
Until now, the Alibaba PR #5018 has been divided among dozens of commits; it was recently squashed into two large commits, one for all code and configuration and the other for all vendoring.
This PR takes the most recent state of that PR, with the two commits e7297fa443e64e842c7e7fa3166bd7f380ab4339 and 8962496f84393e5c6668330d5a054c622a599977, and attempts to reorganize them in a logical manner for easier review. This PR simply organizes the commits around the code structure of the Installer. There are separate commits for:
I propose that @bd233 and his team take the commits from this PR and either update #5018 with the new organization or open a new PR to replace #5018. Again this PR simply reorganizes the current state of #5018 with the goal of making it easier to review.
Moving forward, changes to the PR would either be rebased into the appropriate commit or added using fixup commits. Let's come to an agreement here before proceeding.
@staebler and @kwoodson thoughts on this plan?