Add e2e test for kubelet instance config #3126

HirazawaUi · 2024-11-18T14:25:25Z

Add e2e test for kubelet instance config.

related: kubernetes/test-infra#33807

--- edit neolit123
xref

Add kubelet instance configuration to configure CRI socket for each node #3042

HirazawaUi · 2024-11-18T14:38:50Z

/retest

HirazawaUi · 2024-11-19T13:08:56Z

/cc @neolit123

neolit123

i think we should do the following (when the FG is enabled):

1. kubeadm init with "latest"
2. kubeadm join with "latest"
3. check that the instance config is on the nodes (it should be there)
4. delete the instance config from the nodes
5. kubeadm upgrade "latest" (it would be latest->latest technically)
6. check if the instance config is on the nodes (it should be there)

kinder/ci/tools/update-workflows/templates/testinfra/kubeadm-kinder-instance-config.yaml

kinder/ci/tools/update-workflows/templates/workflows/instance-config-tasks.yaml

HirazawaUi · 2024-11-28T13:18:50Z

> i think we should do the following (when the FG is enabled):
>
> 1. kubeadm init with "latest"
> 2. kubeadm join with "latest"
> 3. check that the instance config is on the nodes (it should be there)
> 4. delete the instance config from the nodes
> 5. kubeadm upgrade "latest" (it would be latest->latest technically)
> 6. check if the instance config is on the nodes (it should be there)

Yes, this is more comprehensive. I'll make the revisions according to it later.

neolit123

just one comment. i think the test is fine, otherwise.

kinder/ci/tools/update-workflows/templates/workflows/instance-config-tasks.yaml

neolit123

/lgtm
/approve
thanks

HirazawaUi · 2024-11-29T15:35:35Z

/hold

I just realized that we currently have no way to test this PR against the new tests added in kubernetes/test-infra#33807. I am conducting local tests, and once the testing is complete, the PR can be merged.

HirazawaUi · 2024-11-30T12:36:08Z

/hold cancel

HirazawaUi · 2024-11-30T12:48:10Z

kinder/ci/workflows/instance-config-tasks.yaml

+    - |
+      set -x
+
+      for node in worker-{1..2}; do


We only need to delete the instance kubelet configuration for worker nodes. For control-plane nodes, they must obtain the criSocket from either the annotation or the local instance kubelet configuration.

This behavior is expected: for control-plane nodes, the instance kubelet configuration file is generated only when kubeadm init is executed with the NodeLocalCRISocket feature gate enabled, and in that case the kubeadm.alpha.kubernetes.io/cri-socket annotation will not be added to the node object.

In this case, deleting the instance kubelet configuration file will result in an error state.

ref: https://github.com/kubernetes/kubernetes/blob/810e9e212ec5372d16b655f57b9231d8654a2179/cmd/kubeadm/app/cmd/upgrade/node.go#L193-L196

neolit123 · 2024-12-02

that doesn't seem correct. the behavior should be the same for both workers and CP. if the FG is enabled the instance config should be written on all nodes, and in this step we want to remove it so that upgrade regenerates it if the FG is enabled. note that workers fetch the ClusterConfiguration FGs from the kubeadm-config CM.

why is there a difference here?

HirazawaUi · 2024-12-02

Yes, both control plane nodes and worker nodes will write to the kubelet instance configuration. However, during the execution of kubeadm upgrade node, the control plane nodes will retrieve the CRI information either from the Node object or the kubelet instance configuration.

If the feature gate was enabled when the cluster was initialized using kubeadm init, the Node objects will not have the kubeadm.alpha.kubernetes.io/cri-socket annotation. If the kubelet instance configuration is deleted, an error will occur during kubeadm upgrade because the CRI information cannot be retrieved.
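The failure mode described above can be sketched in shell. This is an illustration only, not kubeadm's implementation: `resolve_cri_socket` is a hypothetical helper, the lookup order (annotation first, then instance config) is an assumption, and so is reading the socket from a `containerRuntimeEndpoint` line in the instance config.

```shell
# Hypothetical helper: print the CRI socket for a node, or fail if it
# cannot be determined.
#   $1 = value of the cri-socket annotation (empty if the annotation is absent)
#   $2 = path to the node's instance config file
resolve_cri_socket() {
  local annotation_value="$1"
  local instance_config="$2"
  local endpoint=""

  # Assumed order: prefer the annotation when it is present.
  if [ -n "$annotation_value" ]; then
    printf '%s\n' "$annotation_value"
    return 0
  fi

  # Otherwise read an unquoted "containerRuntimeEndpoint: <socket>" line.
  if [ -f "$instance_config" ]; then
    endpoint=$(sed -n 's/^containerRuntimeEndpoint:[[:space:]]*//p' "$instance_config" | head -n 1)
  fi
  if [ -n "$endpoint" ]; then
    printf '%s\n' "$endpoint"
    return 0
  fi

  # Neither source is available: this is the error state discussed above.
  echo "cannot determine CRI socket: no annotation and no instance config" >&2
  return 1
}
```

With an empty annotation and a deleted instance config the helper returns non-zero, which mirrors why deleting the file on a node without the annotation leaves the upgrade with nothing to read.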

neolit123 · 2024-12-02

hm, ok, i understand the situation.

i think we need two upgrade scenarios (tasks) in the workflow:

1. upgrading from FG enabled -> FG enabled
   - delete the annotation / make sure it's not there
   - instance-config is present
   - call upgrade
   - make sure instance-config is present
2. upgrading from FG disabled -> FG enabled
   - delete instance config
   - apply the node annotations with kubectl
   - call upgrade
   - make sure instance-config is present

some other notes:

- we should ideally also test that the kubeadm-flags.env file no longer has the container runtime flag (that's still missing in k/k)
- the story should be the same for workers and control-plane nodes in the e2e
- the tasks in the workflow should have unique names

HirazawaUi · 2024-12-04

> 3. upgrading from FG disabled -> FG enabled
>
> - delete instance config
> - apply the node annotations with kubectl
> - call upgrade
> - make sure instance-config is present

It seems there is an issue. The instance config did not exist before the upgrade.

The correct process should be:

1. create the cluster with the feature gate disabled.
2. verify that the annotation exists and the instance config does not exist.
3. call the upgrade with the feature gate enabled.
4. verify that the annotation does not exist and the instance config is present.

neolit123 · 2024-12-04

we need to create the cluster with FG enabled to verify instance config will be created.
note both 1 and 2 are steps in the same kinder workflow file.

then my steps in 2 make sure that if the FG is enabled (in ClusterConfiguration) but the instance config is missing, kubeadm would use the annotation or the flag values on disk to migrate the user to the FG and instance config. is that not a supported case in the code in k/k?

kubeadm feature gates can also be enabled before upgrade, by the user manually editing the kubeadm-config CM, and then "kubeadm upgrade" commands can execute differently based on the FG values.
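As a rough illustration of the point above, the check below looks for a gate under `featureGates:` in a ClusterConfiguration YAML file. It is a sketch under assumptions: `feature_gate_enabled` is a hypothetical helper, it expects `featureGates` as a top-level block-style map with one `name: true|false` entry per line, and it reads a local file rather than talking to a cluster.

```shell
# Hypothetical helper: succeed when the named gate is set to true in the
# featureGates block of a ClusterConfiguration YAML file.
#   $1 = path to a file holding the ClusterConfiguration YAML
#   $2 = feature gate name, e.g. NodeLocalCRISocket
feature_gate_enabled() {
  local config_file="$1"
  local gate="$2"

  # Print only the indented lines that follow a top-level "featureGates:" key,
  # stopping at the next top-level key, then look for "<gate>: true".
  awk '
    /^featureGates:/ { in_block = 1; next }
    in_block && /^[^ \t]/ { in_block = 0 }
    in_block { print }
  ' "$config_file" | grep -Eq "^[[:space:]]+${gate}:[[:space:]]*true[[:space:]]*$"
}
```

On a live cluster the input could come from `kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterConfiguration}'`, saved to a file first.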

HirazawaUi · 2024-12-04

> we need to create the cluster with FG enabled to verify instance config will be created.
> note both 1 and 2 are steps in the same kinder workflow file.

Ah, sorry, I initially thought it needed two files, like this:

    tasks:
    - import: instance-config-enabled-upgrade-to-enabled-tasks.yaml
    - import: instance-config-disabled-upgrade-to-enabled-tasks.yaml

I've got what you mean. Thank you for your patient explanation.

pacoxu · 2024-12-05T03:36:27Z

BTW, @neolit123 is it necessary for us to add some CIs like https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-alpha-features to enable all alpha or beta features during the upgrade and init? This is out of the scope of this PR.

pacoxu · 2024-12-05T03:56:34Z

kinder/ci/workflows/instance-config-tasks.yaml

+  - --name={{ .vars.clusterName }}
+  - --loglevel=debug
+  - --kubeadm-verbosity={{ .vars.kubeadmVerbosity }}
+  - --kubeadm-feature-gate="NodeLocalCRISocket=true"


only init with the FG.

we can enable the FG before join and upgrade.

neolit123 · 2024-12-05

kubeadm/kinder/pkg/cluster/manager/actions/actions.go, lines 40 to 53 in 33b7812:

    "kubeadm-config": func(c *status.Cluster, flags *RunOptions) error {
        // Nb. this action is invoked automatically at kubeadm init/join time, but it is possible
        // to invoke it separately as well
        return KubeadmConfig(c, flags.kubeadmConfigVersion, flags.copyCertsMode, flags.discoveryMode, flags.featureGate, flags.encryptionAlgorithm, flags.upgradeVersion, c.K8sNodes().EligibleForActions()...)
    },
    "kubeadm-init": func(c *status.Cluster, flags *RunOptions) error {
        return KubeadmInit(c, flags.usePhases, flags.copyCertsMode, flags.kubeadmConfigVersion, flags.patchesDir, flags.ignorePreflightErrors, flags.featureGate, flags.encryptionAlgorithm, flags.wait, flags.vLevel)
    },
    "kubeadm-join": func(c *status.Cluster, flags *RunOptions) error {
        return KubeadmJoin(c, flags.usePhases, flags.copyCertsMode, flags.discoveryMode, flags.kubeadmConfigVersion, flags.patchesDir, flags.ignorePreflightErrors, flags.wait, flags.vLevel)
    },
    "kubeadm-upgrade": func(c *status.Cluster, flags *RunOptions) error {
        return KubeadmUpgrade(c, flags.upgradeVersion, flags.patchesDir, flags.wait, flags.vLevel)
    },

only calling the FG on init is fine, because it will be stored in the CM.
this plan will work #3126 (comment)

neolit123 · 2024-12-05T12:18:25Z

> BTW, @neolit123 is it necessary for us to add some CIs like https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-alpha-features to enable all alpha or beta features during the upgrade and init? This is out of the scope of this PR.

i personally think that our current setup is better which is - one dedicated e2e for each FG.
if we have FGs that interact with each other we might want to allow kinder to enable >1 but that's more of a corner case.

kinder/ci/tools/update-workflows/templates/workflows/instance-config-tasks.yaml

neolit123 · 2024-12-06T15:07:33Z

kinder/ci/tools/update-workflows/templates/workflows/instance-config-tasks.yaml

+
+      exit 0
+  timeout: 5m
+- name: disable-features-gate


we shouldn't disable the FG.

if the FG is enabled manually by the user before upgrade, the upgrade should transform the cluster by:

- removing the annotation from each node object (maybe we can do this for 1.33)
- writing the instance-config.yaml on each node

HirazawaUi · 2024-12-06

> we shouldn't disable the FG.

I think whether the feature gate was enabled before the upgrade is not important to us.

For the new functionality we introduced, we won’t check whether the feature gate was enabled previously; we will only verify its annotation and instance-config.yaml.

Here, we simulate a cluster that has never used the NodeLocalCRISocket feature gate before. It has the kubeadm.alpha.kubernetes.io/cri-socket annotation and no instance-config.yaml. During the upgrade, we enable the feature gate and observe whether the subsequent behavior meets our expectations.

Or have I misunderstood your meaning?

neolit123 · 2024-12-06

here is the test that we need, explained again. some of these might need k/k code changes.

1. keep the FG enabled in the cluster
2. make sure the annotation is present
3. make sure that the instance config is not present
4. run upgrade
5. make sure that instance config was generated
6. make sure that the --container-runtime-endpoint flag is removed from the kubeadm-flags.env (needs k/k change)
7. make sure that the annotation was removed (needs k/k change)

this covers the scenario where users enable the FG before upgrading a cluster that did not have the FG enabled.
on upgrade kubeadm does behave differently if the FG is enabled.
https://github.com/kubernetes/kubernetes/pull/128031/files#diff-7bb5f1d4cb628a783d6b247da73a9ef0b93f793b62ed12478eb74d356f8809bfR128

HirazawaUi · 2024-12-07

> here is the test that we need, explained again. some of these might need k/k code changes.
>
> 1. keep the FG enabled in the cluster
> 2. make sure the annotation is present
> 3. make sure that the instance config is not present
> 4. run upgrade
> 5. make sure that instance config was generated
> 6. make sure that the --container-runtime-endpoint flag is removed from the kubeadm-flags.env (needs k/k change)
> 7. make sure that the annotation was removed (needs k/k change)

I now understand the core of our disagreement. I should not disable the NodeLocalCRISocket feature gate in kubeadm-config before the kubeadm upgrade and then attempt to enable it during kubeadm upgrade using the --feature-gates parameter.

This is because kubeadm upgrade does not accept the --feature-gates parameter. For it, the only way to determine whether a feature gate is enabled is through the feature-gates defined in kubeadm-config.

I think the current test tasks already cover the relevant cases:

- task-add-cri-socket-annotation: Ensure the annotation is present.
- task-remove-config: Ensure the instance config is not present.
- task-upgrade-again: Run the upgrade process.
- task-check-config-after-second-upgrade: Ensure the instance config was generated.

But there is another issue to address. If we disable the NodeLocalCRISocket feature gate in kubeadm-config, then the task-check-config-after-second-upgrade should fail. This is because the --kubeadm-feature-gates="NodeLocalCRISocket=true" parameter added in task-upgrade-again is ineffective. I will investigate tomorrow why this is happening.

HirazawaUi · 2024-12-08

> But there is another issue to address. If we disable the NodeLocalCRISocket feature gate in kubeadm-config, then the task-check-config-after-second-upgrade should fail. This is because the --kubeadm-feature-gates="NodeLocalCRISocket=true" parameter added in task-upgrade-again is ineffective. I will investigate tomorrow why this is happening.

The reason is that the scripts in check-config-after-second-upgrade and check-config always ended with exit code 0: I did not capture the return result of the commands executed inside the container, and each script ended with an unconditional exit 0.

I have removed the disable-features-gate task and modified all the scripts to capture the return values from the container. Please review it again to see if it now meets our expectations.
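The exit-code problem described above can be illustrated with a small sketch. The names are hypothetical: `run_in_node` stands in for whatever command the workflow uses to execute something inside a node container (here it simply runs the command locally), and the two check functions are not the actual task scripts.

```shell
# Hypothetical stand-in for "execute a command inside a node container".
#   $1 = node name (unused in this local sketch), remaining args = command
run_in_node() {
  shift
  "$@"
}

# Buggy shape: the result of the in-container check is ignored, so the
# task reports success even when the file is missing.
check_always_passes() {
  run_in_node "$1" test -f "$2"
  return 0
}

# Fixed shape: capture the exit status of the in-container command and
# propagate it to the caller.
check_propagates() {
  local rc=0
  run_in_node "$1" test -f "$2" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "instance config missing on $1" >&2
  fi
  return "$rc"
}
```

A task built on the second shape fails as soon as any node lacks the file, which is what lets a misconfigured feature gate surface in CI.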

kinder/ci/tools/update-workflows/templates/workflows/instance-config-tasks.yaml

neolit123

SGTM, minus you don't need to pass the FG on anything but init.

also we need to add the two missing things after we update k/k:

- on upgrade, remove --container-runtime-endpoint from kubeadm-flags.env; in this e2e, check that it's not there anymore
- after upgrade, if the FG is enabled, make sure that the annotation is also removed from all nodes
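The first of these follow-up checks could look roughly like the sketch below. `flags_env_has_no_cri_endpoint` is a hypothetical helper name, and the function only inspects a file path passed to it; how the e2e would reach the file on each node is left out.

```shell
# Hypothetical helper: succeed only when the given kubeadm-flags.env file
# exists and does not mention --container-runtime-endpoint.
#   $1 = path to a kubeadm-flags.env file
flags_env_has_no_cri_endpoint() {
  local flags_file="$1"

  # A missing file is treated as a failure so the check cannot pass vacuously.
  if [ ! -f "$flags_file" ]; then
    echo "flags file not found: $flags_file" >&2
    return 2
  fi

  # -e keeps grep from parsing the pattern as an option.
  if grep -q -e '--container-runtime-endpoint' "$flags_file"; then
    echo "unexpected --container-runtime-endpoint in $flags_file" >&2
    return 1
  fi
  return 0
}
```

On a node the file is normally /var/lib/kubelet/kubeadm-flags.env, so a task could run the same grep inside each node container and propagate its exit status.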

HirazawaUi · 2024-12-15T14:33:54Z

@neolit123 Do you think we can merge it now, or should we first fix the missing things in k/k?

pacoxu · 2024-12-16T07:17:37Z

/lgtm

k8s-ci-robot · 2024-12-16T07:17:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HirazawaUi, neolit123, pacoxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [neolit123,pacoxu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

neolit123 · 2024-12-16T12:39:22Z

> @neolit123 Do you think we can merge it now, or should we first fix the missing things in k/k?

the ci is green. please find time to fix the two remaining issues in k/k. we should try to get them in 1.33 and update the e2e.

HirazawaUi · 2024-12-16T14:06:01Z

> > @neolit123 Do you think we can merge it now, or should we first fix the missing things in k/k?
>
> the ci is green. please find time to fix the two remaining issues in k/k. we should try to get them in 1.33 and update the e2e.

I checked the test logs, and everything matches expectations.

Many thanks to @neolit123 for the help! I will address the remaining issues right away.

k8s-ci-robot added the cncf-cla: yes and area/kinder labels Nov 18, 2024

k8s-ci-robot requested review from carlory and SataQiu November 18, 2024 14:25

k8s-ci-robot added the size/L label Nov 18, 2024

HirazawaUi mentioned this pull request Nov 18, 2024

kubeadm: add e2e test for kubelet instance config kubernetes/test-infra#33807

Merged

HirazawaUi force-pushed the add-instance-config-tests branch from 0825734 to 2c7be5a November 18, 2024 14:31

HirazawaUi force-pushed the add-instance-config-tests branch 3 times, most recently from 42dd92f to 7d1b378 November 19, 2024 01:12

k8s-ci-robot requested a review from neolit123 November 19, 2024 13:08

neolit123 reviewed Nov 26, 2024

HirazawaUi force-pushed the add-instance-config-tests branch 2 times, most recently from b390952 to 8dbb097 November 28, 2024 13:52

k8s-ci-robot added the size/XL label and removed the size/L label Nov 28, 2024

HirazawaUi force-pushed the add-instance-config-tests branch from 8dbb097 to 00aebcc November 28, 2024 13:57

neolit123 reviewed Nov 28, 2024

HirazawaUi force-pushed the add-instance-config-tests branch from 00aebcc to c47bde5 November 29, 2024 15:13

neolit123 reviewed Nov 29, 2024

k8s-ci-robot assigned neolit123 Nov 29, 2024

k8s-ci-robot added the lgtm and approved labels Nov 29, 2024

k8s-ci-robot added the do-not-merge/hold label Nov 29, 2024

HirazawaUi force-pushed the add-instance-config-tests branch from c47bde5 to 36b599d November 30, 2024 12:25

k8s-ci-robot removed the lgtm label Nov 30, 2024

k8s-ci-robot removed the do-not-merge/hold label Nov 30, 2024

HirazawaUi commented Nov 30, 2024

pacoxu reviewed Dec 5, 2024

HirazawaUi force-pushed the add-instance-config-tests branch from 36b599d to a021c22 December 6, 2024 01:17

neolit123 reviewed Dec 6, 2024

neolit123 mentioned this pull request Dec 6, 2024

Add kubelet instance configuration to configure CRI socket for each node #3042

Open

HirazawaUi force-pushed the add-instance-config-tests branch 3 times, most recently from a50135b to a32e57c December 8, 2024 07:17

neolit123 reviewed Dec 9, 2024

neolit123 reviewed Dec 9, 2024

neolit123 reviewed Dec 9, 2024

neolit123 reviewed Dec 9, 2024

HirazawaUi force-pushed the add-instance-config-tests branch from a32e57c to 65235e1 December 9, 2024 13:52

add e2e test for kubelet instance config

9c4740c

HirazawaUi force-pushed the add-instance-config-tests branch from 65235e1 to 9c4740c December 9, 2024 13:56

k8s-ci-robot assigned pacoxu Dec 16, 2024

k8s-ci-robot added the lgtm label Dec 16, 2024

k8s-ci-robot merged commit f56222e into kubernetes:main Dec 16, 2024
4 checks passed

HirazawaUi deleted the add-instance-config-tests branch December 29, 2024 12:45
