kubelet does not start on Windows when using the Prepare-Node.ps1 script #5857

Closed
antoninbas opened this issue Jan 9, 2024 · 0 comments · Fixed by #5858
Labels:
area/OS/windows: Issues or PRs related to the Windows operating system.
kind/bug: Categorizes issue or PR as related to a bug.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.


@antoninbas
Contributor

Describe the bug
When using our Prepare-Node.ps1 to install and configure K8s, the generated c:\k\StartKubelet.ps1 script, which is the command executed by the kubelet nssm service, is incorrect. As a result, kubelet cannot run.

To Reproduce
Follow our Windows installation instructions.

Expected
kubelet runs successfully.
restart-service kubelet does not return an error.

Actual behavior
The kubelet service immediately goes into the paused state, and the K8s Node is not functional.

restart-service kubelet fails:

PS C:\Users\Administrator> restart-service kubelet
restart-service : Failed to start service 'kubelet (kubelet)'.
At line:1 char:1
+ restart-service kubelet
+ ~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OpenError: (System.ServiceProcess.ServiceController:ServiceController) [Restart-Service], ServiceCommandException
    + FullyQualifiedErrorId : StartServiceFailed,Microsoft.PowerShell.Commands.RestartServiceCommand

Running c:\k\StartKubelet.ps1 manually produces the following error:

E0109 20:25:42.017607    2872 bootstrap.go:241] unable to read existing bootstrap client config from /etc/kubernetes/kubelet.conf: invalid configuration: [unable to read client-cert C:\etc\kubernetes\var\lib\kubelet\pki\kubelet-client-current.pem for default-auth due to open C:\etc\kubernetes\var\lib\kubelet\pki\kubelet-client-current.pem: The system cannot find the path specified., unable to read client-key C:\etc\kubernetes\var\lib\kubelet\pki\kubelet-client-current.pem for default-auth due to open C:\etc\kubernetes\var\lib\kubelet\pki\kubelet-client-current.pem: The system cannot find the path specified.]

kubelet is looking for its client certificate in the wrong place (note the unexpected C:\etc\kubernetes prefix in the paths above). This is a known kubelet issue on Windows (see kubernetes/kubernetes#77710), which can be worked around by passing an explicit --cert-dir to kubelet. See https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/hostprocess/PrepareNode.ps1#L73.

Our own version of the script (Prepare-Node.ps1) is also supposed to pass --cert-dir, but since #5071 it does not do so correctly:

PS C:\Users\Administrator> cat c:\k\StartKubelet.ps1
$FileContent = Get-Content -Path "/var/lib/kubelet/kubeadm-flags.env"
$global:KubeletArgs = $FileContent.Trim("KUBELET_KUBEADM_ARGS=`"")

$global:KubeletArgs += "--cert-dir=$env:SYSTEMDRIVE\var\lib\kubelet\pki --config=/var/lib/kubelet/config.yaml --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --hostname-override=$(hostname) --pod-infra-container-image=`"mcr.microsoft.com/oss/kubernetes/pause:1.4.1`" --enable-debugging-handlers --cgroups-per-qos=false --enforce-node-allocatable=`"`" --resolv-conf=`"`" --node-ip=$env:NODE_IP"
$cmd = "C:\k\kubelet.exe $global:KubeletArgs"
Invoke-Expression $cmd
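To see why a missing space matters, it helps to look at what the first two lines of this script produce. In .NET, calling Trim with a string argument trims any of that string's characters from both ends (it is a character-set trim, not a prefix/suffix strip), so the closing quote is removed and the remaining value ends with the last kubeadm-provided flag, with no trailing space. The Python sketch below mimics this with str.strip(chars), which has the same character-set semantics. The kubeadm-flags.env content used here is a hypothetical example, not taken from the affected node.

```python
# Python approximation of the first two lines of the generated StartKubelet.ps1.
# ENV_LINE is a HYPOTHETICAL kubeadm-flags.env line; real contents vary by node.
ENV_LINE = (
    'KUBELET_KUBEADM_ARGS="'
    "--container-runtime-endpoint=npipe:////./pipe/containerd-containerd "
    '--pod-infra-container-image=registry.k8s.io/pause:3.9"'
)

# Same characters as the PowerShell Trim("KUBELET_KUBEADM_ARGS=`"") call.
# Like .NET String.Trim(char[]), str.strip treats this as a SET of characters
# and removes any of them from both ends of the string.
TRIM_CHARS = 'KUBELET_KUBEADM_ARGS="'


def trim_kubeadm_args(line: str) -> str:
    """Return the kubelet args with the variable name and quotes trimmed off."""
    return line.strip(TRIM_CHARS)


args = trim_kubeadm_args(ENV_LINE)
print(args)
# The trimmed value ends with the last flag's value, not with a space.
print(args.endswith(" "))  # -> False
```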

The generated script is incorrect. Prepare-Node.ps1 should be fixed as follows:

diff --git a/hack/windows/Prepare-Node.ps1 b/hack/windows/Prepare-Node.ps1
index 0c71fce7..3b983e30 100644
--- a/hack/windows/Prepare-Node.ps1
+++ b/hack/windows/Prepare-Node.ps1
@@ -123,7 +123,7 @@ if ($InstallKubeProxy) {
     $StartKubeletFileContent += [Environment]::NewLine + '& C:\k\Prepare-ServiceInterface.ps1 -InterfaceAlias "HNS Internal NIC"' + [Environment]::NewLine
 }

-$StartKubeletFileContent += [Environment]::NewLine + '$global:KubeletArgs += "--cert-dir=$env:SYSTEMDRIVE\var\lib\kubelet\pki --config=/var/lib/kubelet/config.yaml --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --hostname-override=$(hostname) --pod-infra-container-image=`"mcr.microsoft.com/oss/kubernetes/pause:1.4.1`" --enable-debugging-handlers --cgroups-per-qos=false --enforce-node-allocatable=`"`" --resolv-conf=`"`" --node-ip=$env:NODE_IP"'
+$StartKubeletFileContent += [Environment]::NewLine + '$global:KubeletArgs += " --cert-dir=$env:SYSTEMDRIVE\var\lib\kubelet\pki --config=/var/lib/kubelet/config.yaml --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --hostname-override=$(hostname) --pod-infra-container-image=`"mcr.microsoft.com/oss/kubernetes/pause:1.4.1`" --enable-debugging-handlers --cgroups-per-qos=false --enforce-node-allocatable=`"`" --resolv-conf=`"`" --node-ip=$env:NODE_IP"'

 $targetVersion = [version]"1.28.0"

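Notice the missing whitespace before --cert-dir. Because the value read from kubeadm-flags.env (after trimming) does not end with a space, --cert-dir is appended directly to the last kubeadm-provided argument instead of being passed as a separate flag, so kubelet never receives it.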
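The effect of the missing space can be sketched in Python. The snippet below appends the start of the extra arguments to a trimmed kubeadm args value, with and without a leading space, and then splits the result on whitespace. Whitespace splitting is only a rough stand-in for how Invoke-Expression tokenizes the command, and both the kubeadm args value and the use of C: for $env:SYSTEMDRIVE are assumptions for illustration.

```python
# Rough Python model of how StartKubelet.ps1 builds the kubelet command line.
# KUBEADM_ARGS is a HYPOTHETICAL trimmed kubeadm-flags.env value; C: stands in
# for $env:SYSTEMDRIVE. Whitespace splitting only approximates Invoke-Expression.
KUBEADM_ARGS = (
    "--container-runtime-endpoint=npipe:////./pipe/containerd-containerd "
    "--pod-infra-container-image=registry.k8s.io/pause:3.9"
)
EXTRA_ARGS = r"--cert-dir=C:\var\lib\kubelet\pki --config=/var/lib/kubelet/config.yaml"


def build_tokens(kubeadm_args: str, extra_args: str, separator: str) -> list[str]:
    """Append extra_args to kubeadm_args (as `+=` does) and split into arguments."""
    return (kubeadm_args + separator + extra_args).split()


def has_cert_dir(tokens: list[str]) -> bool:
    """True if --cert-dir survives as a standalone argument."""
    return any(token.startswith("--cert-dir=") for token in tokens)


broken = build_tokens(KUBEADM_ARGS, EXTRA_ARGS, "")  # before the fix
fixed = build_tokens(KUBEADM_ARGS, EXTRA_ARGS, " ")  # after the fix

# Without the space, --cert-dir is glued onto the previous flag's value.
print(broken[1])
print(has_cert_dir(broken), has_cert_dir(fixed))  # -> False True
```

One possible way to avoid this class of bug is to collect the flags in an array and join them once with a single space (for example `$argList -join ' '` in PowerShell, where `$argList` is a hypothetical array of flags), so correctness does not depend on each string literal carrying its own leading space.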

Versions:
Antrea v1.13.2 (and earlier 1.13 patch versions), v1.14.1 (and earlier 1.14 patch versions).

@antoninbas added the area/OS/windows, kind/bug, and priority/important-soon labels Jan 9, 2024
@antoninbas antoninbas added this to the Antrea v1.15 release milestone Jan 9, 2024
@antoninbas antoninbas self-assigned this Jan 9, 2024
antoninbas added a commit to antoninbas/antrea that referenced this issue Jan 9, 2024
There was no space before `--cert-dir`. As a result, the command-line
arg had no effect and kubelet could not start.

Fixes antrea-io#5857

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
antoninbas added a commit that referenced this issue Jan 10, 2024
There was no space before `--cert-dir`. As a result, the command-line
arg had no effect and kubelet could not start.

Fixes #5857

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
antoninbas added a commit to antoninbas/antrea that referenced this issue Jan 10, 2024
antoninbas added a commit to antoninbas/antrea that referenced this issue Jan 10, 2024
antoninbas added a commit that referenced this issue Jan 11, 2024
antoninbas added a commit that referenced this issue Jan 11, 2024