
Guide for adding windows node: RBAC config not found #261

Closed
twity1337 opened this issue Dec 18, 2022 · 20 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


twity1337 commented Dec 18, 2022

Describe the bug
In the guide for adding Windows nodes (link), in the section "Getting started: Adding a Windows Node to Your Cluster", there is a dead link to the RBAC file.
https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/kubeadm/flannel/kube-flannel-rbac.yml -> Results in 404

Therefore, step 5 does not succeed and flannel (in the pod "kube-flannel-ds-windows-...") fails with the following error:

Starting flannel
I1219 00:36:55.650017    6060 alivpc_windows.go:22] AliVpc is not supported on this platform
I1219 00:36:55.651310    6060 awsvpc_windows.go:22] AWS VPC is not supported on this platform
I1219 00:36:55.651310    6060 gce_windows.go:45] GCE is not supported on this platform
I1219 00:36:55.652126    6060 ipip_windows.go:23] ipip is not supported on this platform
I1219 00:36:55.652126    6060 ipsec_windows.go:20] ipsec is not supported on this platform
I1219 00:36:55.652126    6060 tencentvpc_windows.go:22] TencentVpc is not supported on this platform
I1219 00:36:55.652126    6060 udp_windows.go:22] udp is not supported on this platform
I1219 00:36:55.653902    6060 main.go:456] Searching for interface using 172.18.80.101
I1219 00:36:55.660936    6060 main.go:533] Using interface with name Ethernet and address 172.18.80.101
I1219 00:36:55.660936    6060 main.go:550] Defaulting external address to interface address (172.18.80.101)
E1219 00:36:55.737636    6060 main.go:251] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-windows-amd64-qbw26': pods "kube-flannel-ds-windows-amd64-qbw26" is forbidden: User "system:serviceaccount:kube-system:flannel" cannot get resource "pods" in API group "" in the namespace "kube-system"
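
(The "forbidden" error above means the flannel service account is missing its RBAC rules. A minimal check for this, not part of the original report, using the service account name taken from the error message:)

# Ask the API server whether the flannel service account may read pods:
kubectl auth can-i get pods \
  --as=system:serviceaccount:kube-system:flannel \
  -n kube-system
# Prints "no" while the RBAC manifest is missing, and "yes" once it has been applied.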

If somebody could point me to a complete, working guide for setting up flannel on Windows, I would highly appreciate it.

To Reproduce
Steps to reproduce the behavior:

  • Follow the steps in the guide for adding Windows nodes.

Expected behavior
A clear and concise description of what you expected to happen.

Kubernetes (please complete the following information):

  • Windows Server version: 2019 Version 1809
  • Kubernetes Version: 1.25.3
  • CNI: 0.2.0
@twity1337 changed the title from "Guid for adding windows node: RBAC config not found" to "Guide for adding windows node: RBAC config not found" on Dec 18, 2022

Mik4sa commented Dec 19, 2022

@twity1337 (Author) commented:

Thank you very much.
Furthermore I think the line

curl -L https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/kube-proxy.yml | sed 's/KUBE_PROXY_VERSION/v1.25.3/g' | kubectl apply -f -

should read (with KUBE_PROXY_VERSION replaced by VERSION in the sed pattern)

curl -L https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/kube-proxy.yml | sed 's/VERSION/v1.25.3/g' | kubectl apply -f -

Right?
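
(A quick way to confirm which placeholder the downloaded manifest actually uses before substituting; this is a sketch, not part of the original comment:)

# Download the manifest once and look for the literal placeholder token:
curl -sL https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/kube-proxy.yml \
  | grep -n 'VERSION'
# Whichever token shows up (VERSION or KUBE_PROXY_VERSION) is what the sed pattern has to match.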


Mik4sa commented Dec 19, 2022

I just used the yaml files from master and not from the latest release

@twity1337 (Author) commented:

What have you done to make it work in the end?
I followed the guide, but in the end kube-proxy fails either with the message hcs::CreateComputeSystem kube-proxy: The directory name is invalid. (on Windows Server 2019) or with error creating endpoint hcnCreateEndpoint failed in Win32: IP address is either invalid or not part of any configured subnet(s). (on Windows Server 2022).

It does not matter what kind of CNI config for NAT I apply at C:\etc\cni\net.d\0-containerd-nat.json. (I am assuming I have to create it myself on the node, since it wasn't deployed automatically.)
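
(For reference, a minimal sketch of what such a nat CNI config typically looks like, based on the example in containerd's Windows setup documentation; the interface name "Ethernet" and the 172.19.0.0/16 subnet and gateway are placeholder assumptions, not values from this thread:)

{
    "cniVersion": "0.2.0",
    "name": "nat",
    "type": "nat",
    "master": "Ethernet",
    "ipam": {
        "subnet": "172.19.0.0/16",
        "routes": [ { "GW": "172.19.0.1" } ]
    },
    "capabilities": { "portMappings": true, "dns": true }
}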


Mik4sa commented Dec 19, 2022

As far as I remember, that was everything. So, apart from the two things mentioned above, I followed the guide without any extras.
Since I'm using k8s 1.26 I also had to take PR #259 into account.

Also, I didn't change any CNI config; I stayed with the defaults.


twity1337 commented Dec 19, 2022

Okay, thanks then. Actually, I'm facing a lot of issues setting up a Kubernetes cluster under Windows with containerd as the runtime.
Maybe the fact that I run the nodes in Hyper-V VMs is causing some trouble with the container runtime.

@fabi200123 (Contributor) commented:

@twity1337 I have updated the guide in this repo and tested it for both flannel and Calico. It works fine for me now on v1.25.3. Let me know if you still have any problems with this. Also, the scripts are going to be updated soon in #262


Mik4sa commented Dec 21, 2022

@twity1337 I noticed that I still had some problems running some pods on that node. So I created my own fork and tried to fix all of those things. Maybe you want to have a look, and maybe this helps you solve your problems (though I never had the errors you currently seem to have).
I updated the guide, check it out: https://github.com/Mik4sa/sig-windows-tools/blob/flannel-hostprocess/guides/guide-for-adding-windows-node.md

Note: I had to create my own images for flannel-hostprocess and kube-proxy, which are referenced in the updated .yaml files as well. Keep that in mind before executing them.

@twity1337 (Author) commented:

@Mik4sa Thanks for sharing that link; unfortunately, I don't have access to it.

How do you both run the worker node? I'm running it on Hyper-V, so I'm wondering if there are any known issues with running the Windows worker node on Hyper-V.
So my (development) cluster used to look like this:

  • Controlplane - Debian Bullseye, Hyper-V VM
  • Worker Node - Windows Server 2019, Hyper-V VM
    both running on a Windows Server 2019 machine as the Hyper-V host, where I don't have full admin rights.

I'm currently trying to set up everything on bare metal, while facing some other issues with my local setup. However, I'll keep you updated.


Mik4sa commented Jan 5, 2023

That's because all my required changes were merged and I deleted my fork. You should now be able to follow the master branch as it is right now.

I have one control plane with Ubuntu 22.04 and a worker node with Windows Server 2022. Both are real machines, no VMs.

Note: About one year ago I tested this on Hyper-V on my Windows 10 (or 11?) machine. Back then I used Kubernetes 1.23.x and Docker (with non-process images). That worked so far


twity1337 commented Jan 9, 2023

Thanks for your detailed answer, @Mik4sa.
I want to avoid using Docker, because it is already deprecated.

However, after setting everything up on my private physical machines with an evaluation release of Windows Server 2019, I was able to get it halfway running.
At least the pod gets created now. However, since the step of copying the C:\run\flannel\subnet.env file to the worker node is only mentioned in an "Info-Note" box, I overlooked it before joining the node. After running "kubeadm reset", deleting the node object on the control plane, deleting the created files on the C: drive of the worker node, copying the subnet file, and rejoining the node with "kubeadm join", I still get an error:

root@controlplane:/# kubectl logs -n kube-system kube-proxy-windows-c5hrw 
Write files so the kubeconfig points to correct locations


    Directory: C:\var\lib


Mode                LastWriteTime         Length Name                                                                  
----                -------------         ------ ----                                                                  
d-----       09.01.2023     03:44                kube-proxy                                                            
Finding sourcevip
Cannot index into a null array.
At C:\C\9a841a2e8684bdbdc81630803f1b4e51dc2b9bb025b039df935b228c232e5888\kube-proxy\start.ps1:19 char:9
+         $subnet = $hnsNetwork.Subnets[0].AddressPrefix
+         ~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : NullArray

As you can see, the HNS network doesn't seem to have a subnet configured, so the access in the PowerShell script fails. What am I missing?
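
(One way to inspect what HNS actually created on the worker node; a sketch that assumes the hns.psm1 helper module from the microsoft/SDN repository is available on the node, which is not mentioned in the original comment:)

# On the Windows worker node, in an elevated PowerShell session:
Import-Module .\hns.psm1                 # helper module from the microsoft/SDN repo (assumption)
Get-HnsNetwork | Select-Object Name, Type, Subnets
# A working flannel setup should show a network with at least one subnet;
# an empty Subnets list matches the NullArray error in the log above.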

Let me investigate a little more whether Hyper-V is really the cause of those error messages that I got in the first place. After isolating the error I might open an issue, probably in the containerd repo.


Mik4sa commented Jan 9, 2023

What's the content of the two sourcevip files on your worker node?

Note: When I was experimenting with resetting and rejoining my worker node, I had to carefully revert everything that was done by the Install and Prepare scripts. Otherwise I later got errors in various situations, such as complaints that things already exist.


twity1337 commented Jan 9, 2023

What's the content of the two sourcevip files on your worker node?

What files are you talking about? The directory "C:\sourcevip" on my worker node is empty.


Mik4sa commented Jan 9, 2023

Then this is your problem. There should be two files, sourceVip.json and sourceVipRequest.json. You might want to check why these two are missing.

Edit: Oh, I'm sorry. These files get created after you resolve your current problem.


Mik4sa commented Jan 11, 2023

Interestingly, I now got the same error you described at the beginning of this issue. It started after we rebooted the control plane (Linux) and the worker node (Windows). I did it simultaneously, so I can't say which one, if not both, was the cause. I'm going to have a look at it tomorrow or the day after. Maybe I'll find something that helps you as well.

@twity1337 (Author) commented:

Okay, so I managed to make it work now (on bare metal, Windows Server 2022). For some reason, the RBAC file was not applied, and therefore the HNS network and the pod itself were not created by the "kube-flannel-ds-windows" pod.
Btw: Manually copying the subnet.env was not required in my case, since it was created automatically. (Different from what the guide proposes.)
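
(For context, C:\run\flannel\subnet.env is a small env-style file written by flannel. A sketch of its typical shape, with placeholder values chosen to roughly match the 10.244.x.x pod addresses shown further down; the exact values on a given node will differ:)

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.6.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true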


However, pod networking still doesn't seem to be fully functional:
The guide for scheduling Windows containers in Kubernetes (from the Kubernetes website) lists several verification steps after deploying a basic webserver application.

It seems my Windows pods don't have outbound connectivity and are only reachable from the control plane node.

$ kubectl get service
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP        2d20h
win-webserver   NodePort    10.96.173.206   <none>        80:30040/TCP   32m

$ kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE    IP            NODE                            NOMINATED NODE   READINESS GATES
win-webserver-585f6c9dc6-5f4xn   1/1     Running   0          40m    10.244.6.2    win-server2022           <none>           <none>
win-webserver-585f6c9dc6-bmfls   1/1     Running   0          40m    10.244.6.3    win-server2022           <none>           <none>

$ kubectl get nodes -o wide
NAME                            STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                    KERNEL-VERSION      CONTAINER-RUNTIME
kube-controlplane               Ready    control-plane   2d21h   v1.26.0   192.168.0.39   <none>        Ubuntu 22.04.1 LTS                          5.15.0-56-generic   containerd://1.6.14
win-server2022                  Ready    <none>          73m     v1.26.0   192.168.0.20   <none>        Windows Server 2022 Datacenter Evaluation   10.0.20348.1487     containerd://1.6.8

According to the guide:

  • Node-to-pod:
    • curl 10.244.6.2 -> fail
  • Pod-to-Pod:
    • kubectl exec pods/win-webserver-585f6c9dc6-5f4xn -- curl 10.244.6.3 -> success
  • Service-to-pod:
    • curl 10.96.173.206 -> fail (is Cluster-IP = Virtual-Service IP?)
    • kubectl exec pods/win-webserver-585f6c9dc6-5f4xn -- curl 10.96.173.206 -> fail
  • Service discovery:
    • kubectl exec pods/win-webserver-585f6c9dc6-5f4xn -- nslookup kubernetes.default -> fail
    • kubectl exec pods/win-webserver-585f6c9dc6-5f4xn -- nslookup win-webserver.default -> fail
  • Inbound connectivity:
    • curl 192.168.0.20:30040 -> used to work, but after rejoining the cluster it doesn't anymore
  • Outbound connectivity:
    • kubectl exec pods/win-webserver-585f6c9dc6-5f4xn -- curl 142.251.36.238 -> success
    • kubectl exec pods/win-webserver-585f6c9dc6-5f4xn -- curl www.google.com -> fail

So I think something might be wrong with DNS, which is strange, because flannel should take care of this.
Did you face this issue anywhere already? Also, what am I missing about the service-to-pod connection?
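
(A few checks for the DNS suspicion above; a sketch, not part of the original comment, reusing the pod name from the listing above:)

# Is CoreDNS running, and what is the cluster DNS service IP?
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl -n kube-system get svc kube-dns
# Compare that IP with the DNS server the Windows pod actually received:
kubectl exec pods/win-webserver-585f6c9dc6-5f4xn -- ipconfig /all
# If the addresses differ, or the DNS service IP is unreachable from the pod, the Windows
# kube-proxy (which programs the service VIPs) is a more likely culprit than flannel itself.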

@k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 11, 2023
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 11, 2023
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 10, 2023
@k8s-ci-robot (Contributor) commented:

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
