<Arktos Mizar Integration> Enable adding worker nodes into RP clusters in a scale-out 1 X 1 environment #1230

Conversation


@q131172019 q131172019 commented Nov 23, 2021

What type of PR is this?
/kind documentation
/kind feature

What this PR does / why we need it:
As part of the Arktos Mizar Integration project, the Mizar team requested that the Arktos team provide a scale-out environment in which worker nodes can be added into the RP cluster.

This PR enables adding worker nodes into the RP cluster in a scale-out 1 TP X 1 RP environment. It has been tested in the following two scale-out 1 TP X 1 RP (1 master node + 1 worker node) environments, which is a prerequisite for the scale-out 2 TPs X 2 RPs environment. All nodes are AWS EC2 t2.2xlarge instances running Ubuntu 18.04.

TP1: 172.31.3.192
RP1: 172.31.5.191
Worker node-1: 172.31.4.110
Worker node-2: 172.31.29.26

TP2: 172.31.5.56
RP1: 172.31.13.237
Worker node: 172.31.2.149

The script ./hack/arktos-worker-up.sh, run on the worker node, starts kubelet and kube-proxy and installs flannel in process mode on the worker node so that the node can successfully join the RP cluster.
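For reviewers who want the join flow at a glance, here is a minimal sketch of what starting the worker components in process mode can look like. This is not the contents of ./hack/arktos-worker-up.sh; the argument handling, file locations (e.g. /tmp/arktos), and flags shown are illustrative assumptions only.

     #!/usr/bin/env bash
     # Illustrative sketch only -- not the actual ./hack/arktos-worker-up.sh.
     # Assumes the RP master IP is passed as the first argument and that kubeconfigs
     # and binaries have already been staged on the worker node (e.g. under /tmp/arktos).
     set -euo pipefail
     RP_MASTER_IP="$1"

     # Run flanneld in process mode against the RP cluster's etcd so the worker leases a pod subnet.
     sudo ./flanneld --etcd-endpoints="http://${RP_MASTER_IP}:2379" \
          --iface="$(hostname -I | awk '{print $1}')" > /tmp/flanneld.log 2>&1 &

     # Run kubelet and kube-proxy against the RP master's API server.
     sudo ./kubelet --kubeconfig=/tmp/arktos/kubelet.kubeconfig > /tmp/kubelet.log 2>&1 &
     sudo ./kube-proxy --kubeconfig=/tmp/arktos/kube-proxy.kubeconfig > /tmp/kube-proxy.log 2>&1 &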

The code in this PR also works in a scale-up environment (master node + n worker nodes) and has been tested in the multi-worker scale-up environment below.

Master node : 172.31.22.85
Worker node-1: 172.31.29.128
Worker node-2: 172.31.24.185
Worker node-3: 172.21.5.205

Which issue(s) this PR fixes:
N/A

Special notes for your reviewer:
In the scale-out 1 TP X 1 RP cluster, when the 2nd worker node attempts to join the RP cluster, you will currently see the following errors in its flannel log /tmp/flanneld.log:

E1203 06:00:52.242285 1298 route_network.go:115] Error adding route to 10.244.0.0/24 via 172.31.5.191 dev index 2: network is unreachable
I1203 06:00:52.242309 1298 route_network.go:86] Subnet added: 10.244.1.0/24 via 172.31.4.110
E1203 06:00:52.242497 1298 route_network.go:115] Error adding route to 10.244.1.0/24 via 172.31.4.110 dev index 2: network is unreachable

In the scale-up cluster, when the 3rd worker node attempts to join the cluster, you will currently see the following errors in its flannel log /tmp/flanneld.log:

E1203 04:15:33.103839 18746 route_network.go:115] Error adding route to 10.244.0.0/24 via 172.31.22.85 dev index 2: network is unreachable
I1203 04:15:33.103858 18746 route_network.go:86] Subnet added: 10.244.1.0/24 via 172.31.29.128
E1203 04:15:33.103956 18746 route_network.go:115] Error adding route to 10.244.1.0/24 via 172.31.29.128 dev index 2: network is unreachable
I1203 04:15:33.103975 18746 route_network.go:86] Subnet added: 10.244.2.0/24 via 172.31.24.185
E1203 04:15:33.104110 18746 route_network.go:115] Error adding route to 10.244.2.0/24 via 172.31.24.185 dev index 2: network is unreachable

Possible network limits of the AWS EC2 instance type t2.2xlarge need further investigation.
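For anyone digging into the "network is unreachable" errors above, a small triage sketch is below. It is a generic flannel/route check, not a procedure from this PR; the interface name eth0 and the reading that the failing route add requires an on-link next hop are illustrative assumptions.

     # Subnet leased to this node by flannel (standard file written by flanneld).
     cat /run/flannel/subnet.env

     # Interfaces and existing routes; "network is unreachable" typically means the kernel
     # cannot reach the next-hop gateway (e.g. 172.31.5.191) as on-link via the named device.
     ip -o link show
     ip route show

     # Reproduce the failing route add by hand to see the kernel error directly (illustrative).
     sudo ip route add 10.244.0.0/24 via 172.31.5.191 dev eth0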

Does this PR introduce a user-facing change?:
YES.

======== Scale out environment ===================
0.  Follow the procedure at https://github.com/q131172019/arktos/blob/CarlXie_singleNodeArktosCluster/docs/setup-guide/setup-dev-env.md to set up the development environment on each node.

1.  Follow the procedure in [set up scale-out 1 X 1 environment](https://github.com/CentaurusInfra/arktos/blob/master/docs/setup-guide/scale-out-local-dev-setup.md) and run the following scripts to automatically start TP1 and RP1:
1.1) On TP1: 
        ./hack/scale-out-1x1-rp1-multi-nodes/scale-out-TP1-node.sh <RP1_IP>
1.2) On RP1: 
        ./hack/scale-out-1x1-rp1-multi-nodes/scale-out-RP1-node.sh <TP1_IP>

2. On each worker node that will join the RP1 cluster, run the following script to join automatically:
     ./hack/scale-out-1x1-rp1-multi-nodes/scale-out-RP1-worker-node-join.sh <RP1_IP>

3.  On the RP1 node, check the node status, and check the flannel log on each node of the RP1 cluster:
     ./cluster/kubectl.sh get nodes
      cat /tmp/flanneld.log

4. Test whether the nginx application can be deployed successfully (a cross-node connectivity sketch follows after step 5):
     ./cluster/kubectl.sh run nginx --image=nginx --replicas=10
     ./cluster/kubectl.sh get pod -n default -o wide
     ./cluster/kubectl.sh delete deployment/nginx

5. Follow the steps at https://github.com/CentaurusInfra/arktos/issues/1143 to perform end-to-end verification of a service in the scale-out cluster.
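Referenced from step 4: a hedged sketch of checking cross-node pod connectivity once the nginx pods are running. The run=nginx label is what older versions of kubectl run apply to the deployment's pods, and the pod indices are placeholders; adjust if your pods carry different labels.

     # Confirm the pods are spread across the RP1 master and worker nodes.
     ./cluster/kubectl.sh get pod -n default -l run=nginx -o wide

     # Curl one pod from another; pick two pods scheduled on different nodes.
     POD1=$(./cluster/kubectl.sh get pod -n default -l run=nginx -o jsonpath='{.items[0].metadata.name}')
     POD2_IP=$(./cluster/kubectl.sh get pod -n default -l run=nginx -o jsonpath='{.items[1].status.podIP}')
     ./cluster/kubectl.sh exec -ti "$POD1" -- curl -s "http://${POD2_IP}"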

======== Scale up environment ===================
0.  Follow the procedure at https://github.com/q131172019/arktos/blob/CarlXie_singleNodeArktosCluster/docs/setup-guide/setup-dev-env.md to set up the development environment on each node.

1.  On the master node, run the following script to start a single-node scale-up cluster:
     ./hack/scale-up-multi-nodes/scale-up-master-node.sh

2.  On each worker node, run the following script to join the scale-up cluster:
     ./hack/scale-up-multi-nodes/scale-up-worker-node-join.sh <MASTER_NODE_IP>

3.  On the master node, check the node status, and check the flannel log /tmp/flanneld.log on each node:
     ./cluster/kubectl.sh get nodes
      cat /tmp/flanneld.log

4. Test whether the nginx application can be deployed successfully (a pod-IP helper sketch follows after step 5):
     ./cluster/kubectl.sh run nginx --image=nginx --replicas=10
     ./cluster/kubectl.sh get pod -n default -o wide
     ./cluster/kubectl.sh exec -ti <1st pod> -- curl <IP of another nginx pod>
     ./cluster/kubectl.sh exec -ti <2nd pod> -- curl <IP of another nginx pod>
     ./cluster/kubectl.sh delete deployment/nginx

5. Follow the steps at https://github.com/CentaurusInfra/arktos/issues/1142 to perform end-to-end verification of a service in the scale-up cluster.
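Referenced from step 4: a small helper sketch for finding the pod names and IPs used in the curl commands, so they do not have to be copied by hand. The run=nginx label and the custom-columns fields are assumptions based on older kubectl run behavior.

     # One view of nginx pod name, IP, and node for picking the curl source and target.
     ./cluster/kubectl.sh get pod -n default -l run=nginx \
          -o custom-columns=NAME:.metadata.name,IP:.status.podIP,NODE:.spec.nodeName

     FIRST_POD=$(./cluster/kubectl.sh get pod -n default -l run=nginx -o jsonpath='{.items[0].metadata.name}')
     SECOND_IP=$(./cluster/kubectl.sh get pod -n default -l run=nginx -o jsonpath='{.items[1].status.podIP}')
     ./cluster/kubectl.sh exec -ti "$FIRST_POD" -- curl -s "http://${SECOND_IP}"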

@centaurus-cloud-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign dingyin
You can assign the PR to them by writing /assign @dingyin in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@q131172019 q131172019 closed this Nov 23, 2021
@q131172019 q131172019 reopened this Nov 23, 2021

Sindica commented Nov 23, 2021

Questions about the setup instructions:

  1. The setup instructions should be a .md file checked in with this commit so we can find them easily later.
  2. Step 2: "On worker node to join into RP cluster, copy the following files from RP node to the directory /tmp/arktos". Why /tmp/arktos? Can we use the same config folder for master and worker?
  3. Step 3: "Clean up the directories - /opt/cni/bin and /etc/cni/net.d". Can this be incorporated into arktos-worker-up.sh?
  4. Step 5: too many manual steps. Can this be done automatically with a single flag?


q131172019 commented Nov 23, 2021

Questions about the setup instructions:

The setup instructions should be a .md file checked in with this commit so we can find them easily later.
Yes, I agree to write the .md file.

Step 2: "On worker node to join into RP cluster, copy the following files from RP node to the directory /tmp/arktos". Why >>/tmp/arktos? Can we use same config folder for master and worker?

We can use same config folder for master and worker if it does not create confusion between master and worker.

Step 3: "Clean up the directories - /opt/cni/bin and /etc/cni/net.d" Can this be incorporated into arktos-worker-up.sh?
Yes.

Step 5: too many manual steps. Can this be done automatically with a single flag?
Yes.

@q131172019 q131172019 closed this Nov 23, 2021
@q131172019 q131172019 reopened this Nov 23, 2021
@q131172019 q131172019 closed this Dec 3, 2021
@q131172019 q131172019 reopened this Dec 3, 2021
@q131172019

This PR can be closed because it has been replaced by the larger PR 1382.

@q131172019 q131172019 closed this Feb 25, 2022