# Computing 130 2022 Tracks
- VM support
- Mizar integration
- Daemonset, system pod support
- Cluster initialization tool
- Scalability
- Adopt service support automation into Kubemark
- PVC support - PR 977
- Management plane resource optimization - thoughts
  - 1TP/1RP 25K cluster: p99 ~5s with system throughput 100/25
  - 2TP/2RP 50K cluster: p99 7.32s with system throughput 200/50
  - 5TP/5RP 50K cluster: p99 3.7s with system throughput 200/50
## VM support
- Arktos API extension to support OpenStack-like REST APIs to CREATE/GET/LIST/OPERATE on VMs - Yunwen
  - Design doc (PR 1226) - done

| Time | Description |
| --- | --- |
| 12/15/2021 | Support single VM creation/list/get/delete, with default network, image and flavor (PR 1242) |
| 1/6/2022 | Support VM actions; regression in VM action operations in Arktos (Issue 1251) |
| 1/4/2022 | Add static image registry and flavor route, and config map for basic images and basic OpenStack flavors (PR 1249, PR 1267) |
| 1/21/2022 | Support creating VMs in batch with deployment/replicaset; batch creation with replicaset (PR 1284) - see the sketch below |
| 1/24/2022 | Buffer |
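To make the 1/21 batch-creation milestone concrete, here is a minimal sketch of creating VMs in batch through a ReplicaSet. The `virtualMachine` pod-spec section and its field names are assumptions about the Arktos VM workload schema, not confirmed API; the authoritative shapes are in the design doc (PR 1226).

```sh
# Minimal sketch, assuming Arktos accepts a pod template whose spec carries a
# virtualMachine section instead of containers. Field names (image,
# keyPairName, resources) are assumptions; see the design doc (PR 1226).
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: vm-batch
spec:
  replicas: 3                  # batch size: three identical VMs
  selector:
    matchLabels:
      app: vm-batch
  template:
    metadata:
      labels:
        app: vm-batch
    spec:
      virtualMachine:          # assumed Arktos VM workload section
        name: testvm
        image: cirros-0.5.1-x86_64-disk.img
        keyPairName: default
        resources:
          requests:
            cpu: 1
            memory: 512Mi
EOF
```

Scaling the ReplicaSet up or down would then add or delete VMs; note the open question in the priority list below about a batch-deletion option that does not re-create a VM after it is deleted.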
Priority of tasks for the next 1-2 weeks:

- Bar for merging to master: essential UT cases (PR 1331).
- Arktos-up.sh uses Mizar for the VM network. Hopefully this is just a verification effort after the Mizar+Arktos integration, with minor modification of the VM request structure. Pending fix and verification of Mizar issue # and Arktos issue #1324.
- Get a test VM for other teams to try out, or instructions on how to set the env up and run the test scripts. Done with Ubuntu 18.04. Pending fix of issue #1324 for Ubuntu 20.04.
- Design doc update: TODOs to make it an official Arktos API extension supporting OpenStack Nova-like API calls (PR 1341; call shape sketched below). Punt.
- Batch deletion option to not re-create a VM after it is deleted: likely punt to post-130.
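For reference, the Nova-like call shape being targeted looks roughly like the sketch below. The payload follows OpenStack Nova's compute API; the route and `$APISERVER` are hypothetical, since the actual Arktos routes are defined in the design doc and PR 1341.

```sh
# Hypothetical route: OpenStack Nova's POST /v2.1/servers shape, served by an
# Arktos apiserver extension. $APISERVER and the route are assumptions.
curl -s -X POST "http://$APISERVER/v2.1/servers" \
  -H 'Content-Type: application/json' \
  -d '{
        "server": {
          "name": "test-vm",
          "imageRef": "cirros-0.5.1",
          "flavorRef": "m1.tiny"
        }
      }'
```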
### Kubemark (perf/scale) test automation with VM types

- Test infra changes
- Baseline of perf data with VM types
### Arktos VM runtime design optimization (design and POC)

- Runtime to eliminate the PodSandbox for VM types
- podConverter to directly convert a VM to the libvirt XML VM format (see the sketch below)
- I/O optimization for VM type
- Network optimization for VM type
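As a rough illustration of what "directly convert a VM to the libvirt XML VM format" means, a podConverter would emit a libvirt domain definition like the one below. This is a minimal hand-written sketch, not the converter's actual output; names and paths are illustrative.

```sh
# A minimal libvirt domain a podConverter might emit for a 1-vCPU/512Mi VM pod.
# Values are illustrative; the real mapping is part of the design/POC work.
cat <<'EOF' > /tmp/testvm.xml
<domain type='kvm'>
  <name>testvm</name>                <!-- from the pod name -->
  <memory unit='MiB'>512</memory>    <!-- from resources.requests.memory -->
  <vcpu>1</vcpu>                     <!-- from resources.requests.cpu -->
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/arktos/images/cirros-0.5.1-x86_64-disk.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
EOF
virsh define /tmp/testvm.xml && virsh start testvm
```

Skipping the PodSandbox and defining the domain directly from the pod spec is what removes the extra container-runtime hop for VM types.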
## Cluster initialization tool
- Kube-up support for scale-out (PR 1227)
- Kube-up admin cluster supports Ubuntu 20.04 and scale-out
- Replace Docker with containerd in kube-up - Sonya
- Daemonset support in kube-up scale-out - Hongwei
- Node-ready investigation (PR 1292) - Hongwei
- Create default network object for the system tenant upon cluster start-up - Carl (Issue 1279, PR 1310)
- Start the Arktos network controller - Sonya (PR 1298)
- 500-node perf test with scale-up & scale-out (2x2) - 1/12, 1/27 working
- Confirm kube-up scale-up with host network for management-plane pods, flannel for customer pods, Ubuntu 20.04 - on hold
- Kube-up optimization
## Mizar integration - Hongwei, Carl, Ying H.

- Mizar test/release process update request - Vinay (Issue 595)

### Mizar integration with Arktos scale up - Hong, Vinay, YingH.
- Case 1: VPC0 creation - POC ready 11/12. Mizar code has an issue after the gRPC code merge; PR 1114 is not stable (1 of 4 runs works). Hong Chang's change works 11/11. Stabilize Arktos integration with Mizar (PR 1223) 11/12.
- Case 1.5: Create a second VPC in the system tenant default namespace and attach a pod to the new VPC - the pod was assigned an IP from VPC0 (Issue 567). Closed, as this case works following the new Mizar manual. The previous configuration via the Arktos network object stopped working; Mizar plans to decouple from Arktos and has no plan to maintain it. A pod created in the system tenant can be attached to a secondary VPC successfully.
- Case 2: Create a VPC for a new tenant (Issue 568). A pod in a non-system tenant cannot be attached to a VPC created in the system tenant: the tenant pod had the correct annotation with the new VPC, but got an IP in the default VPC. Mizar currently uses an annotation for K8s and a label for Arktos; Mizar will change to use the annotation only. A sketch of the annotation follows this case list.
- Case 3: Multi-tenant network isolation - verify - YingH
- Case 4: Add a minion; validate that a pod created in the system tenant with VPC0 can communicate with a pod created on the master.
  - Not working with a kubeadm-set-up cluster (Issue 562). Started the master with Mizar enabled, started the worker with Mizar enabled; the node is ready (via 8080) and netpods are running. Pods deployed on the master node can ping each other on the same node but cannot ping pods on the worker; pods deployed on the worker node cannot ping others. Hong fixed it 12/2.
  - Vinay said setting up the cluster, joining the worker, and then starting Mizar should work; will check the new steps in Arktos. If it works, 562 won't be a blocking issue. The 11/30 update on issue 562 tells us these steps should not work either. A pod created on the worker cannot ping others - Hong.
- Case 5: Deploy multiple pods via a deployment. Each worker can only start 1-3 pods; the rest are pending a network interface (Issue 578).
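As referenced in Case 2, here is a minimal sketch of attaching a pod to a non-default VPC via an annotation, which is the model Mizar is converging on. The annotation key and VPC name are assumptions, not confirmed API; check the current Mizar manual for the exact key.

```sh
# Minimal sketch, assuming Mizar selects the VPC from a pod annotation.
# The key "mizar.com/vpc" and the VPC name are assumptions.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: vpc-test-pod
  annotations:
    mizar.com/vpc: vpc-tenant-a   # assumed annotation key
spec:
  containers:
  - name: netpod
    image: busybox
    command: ["sleep", "3600"]
EOF
kubectl get pod vpc-test-pod -o wide   # verify the IP falls in the new VPC's CIDR
```

The Case 2 failure mode is exactly this check: the annotation is present, but the assigned IP still lands in the default VPC.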
### Mizar integration with Arktos scale out - YingH, Carl
- 1x1 with a worker integrated with Mizar on the dev env: the Mizar node controller cannot get the new worker node (PR 1240)
- 2x2 with Mizar on the dev env
- 2x1 with Mizar w/o minion - Hong, YingH
  - Arktos bug: incorrect env variable setup for multiple TPs (Issue 1253, PR 1257)
  - Arktos bug: service support with multiple TPs did not set FIRST_SERVICE_CLUSTER_IP for the local dev env (PR 1259)
  - Mizar: a pod deployed from the 2nd TP cannot ping others (PR 1263)
  - Trying to start multiple pods; they fail to start when there is more than one pod - Hong
- 2x1 with Mizar with minion: support multiple TPs in the worker start script (PR 1265)
- 2x2 with Mizar w/o minion: trying to start 2 netpods on TP1, only one started; the 2nd failed to start with a network issue (same as 2x1 w/o worker)
- 2x2 with Mizar with minion
- Dev env scale-out support for multiple minions - Carl
  - Arktos-up add minion with flannel in process mode
  - Arktos scale-out 1x1 add minions (PR 1230) - on hold (Carl)
- KCM Mizar controller changes for multiple RPs - YingH.
### Tenant VPC support in Arktos-Mizar integration - Carl, YingH

- Test creating a pod with a VPC without an Arktos network object - done
- Multi-tenant support with Mizar-Arktos integration - design - Hongwei, YingH (PR 1231)
- Arktos network controller cannot get events (PR 1237)
- Automatically create a VPC and subnet for a new tenant - Carl (PR 1281)
- Automatically assign pod IPs from the default tenant VPC - YingH, Carl
- Integration test in the local scale-up env, with workers - YingH
  - Mizar does not release unused IPs and stops after 32 IP addresses have been used across the entire system - Hong
  - Tenant coredns-default pod stuck in ContainerCreating because the VPC annotation is missing (Issue 1293, PR 1295) - YingH
  - Pod created with a VPC annotation but force-changed to the default VPC (Issue 1285, PR 1296) - YingH
  - When a pod is scheduled to a worker, it cannot be started correctly - missing proxy on the worker - YingH, pending PR 1304
  - Create a VPC/subnet manually; deleting the subnet/VPC gets stuck - P2 (Issue 1287) - see the sketch below
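For the manual VPC/subnet step above (and the stuck deletion in Issue 1287), the flow is roughly the sketch below. The `mizar.com/v1` group/version and the spec fields are assumptions about the Mizar CRDs; verify against the CRDs actually installed in the cluster.

```sh
# Minimal sketch, assuming Mizar exposes Vpc/Subnet CRDs under mizar.com/v1.
# All field names and values here are assumptions, not confirmed schema.
cat <<'EOF' | kubectl apply -f -
apiVersion: mizar.com/v1
kind: Vpc
metadata:
  name: vpc-test
spec:
  ip: "20.0.0.0"
  prefix: "16"
  dividers: 1
---
apiVersion: mizar.com/v1
kind: Subnet
metadata:
  name: subnet-test
spec:
  ip: "20.0.1.0"
  prefix: "24"
  vpc: vpc-test
  bouncers: 1
EOF
# Issue 1287: deletion can hang; the cleanup path is what gets stuck.
kubectl delete subnet subnet-test
kubectl delete vpc vpc-test
```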
- Integration test in the local scale-out env, with workers - YingH
  - 2TP1RP with 1 worker, naturally deploy the Mizar operator - Hongwei
  - System CoreDNS and kube-dns are crashing and restarting in scale-up and scale-out envs (Issue 1309) - YingH
- Integration test in the local scale-up env, with workers - YingH
  - Automatically assign service IPs from the default tenant VPC - Ying, Phu
- Scalability design for the Mizar controller - Hongwei, Carl, YingH, Phu, Vinay
  - The Mizar network controller needs the following information: tenant, pod, service, endpoint, node
  - The data volume is huge; check whether it can be minimized by getting partial objects - Phu (see the sketch below)
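One concrete way to realize "getting partial objects" is the apiserver's metadata-only content negotiation: a client can request a `PartialObjectMetadataList` and avoid shipping full pod/endpoint specs to the Mizar controller. The sketch below assumes an in-cluster client using the standard service-account token path; whether metadata alone is enough for the controller is the open question being checked.

```sh
# Ask the apiserver for a metadata-only pod list via content negotiation.
# This is standard Kubernetes API behavior (PartialObjectMetadataList);
# the token and endpoint paths assume an in-cluster client.
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sk -H "Authorization: Bearer $TOKEN" \
  -H 'Accept: application/json;as=PartialObjectMetadataList;v=v1;g=meta.k8s.io' \
  https://kubernetes.default.svc/api/v1/pods | head -c 1000
```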
- OS upgrade from Ubuntu 16.04 to 18.04/20.04 - Carl, Sonya
  - Mizar does not support 16.04; 16.04 LTS reached end of life 4/2021
  - Arktos for dev needs some changes to support 20.04 LTS
  - Kube-up currently depends on 16.04 and needs significant work - upgrade to 20.04?
- Kubemark (perf/scale) test automation with Mizar
## Daemonset, system pod support

- System pod support in scale-out - YingH
  - Kube-proxy (PR 1250)
- Daemonset support in scale-out - Hongwei
  - Manually set up a daemonset in scale-out: 1x1 - Hongwei (Issue 1222)
  - Add RP clients to the daemonset controller - done (PR 1224)
  - Review the previous design, possibly combined with system pod support
  - Kubelet changes (PR 1244)
  - Admission plugin - restrict to the system tenant only (PR 1252)
## Scalability
- 60/75K - Sonya
  - 75K - 3TPx3RP, QPS 100/25
  - 60K - 3TPx3RP, QPS 80/20?
  - Premium network in GCE
- Scale-up single cluster size 10/15K - Sonya - done
- Updated perf test tool running with K8s 1.18 - Sonya/Ying H.
  - K8s 1.18 code: https://github.com/Sindica/arktos/commits/release-1.18.5
  - Perf test code for 1.18: https://github.com/Sindica/perf-tests/tree/test-1.18.5-with-perftest-tools-lw-reduce
  - Before the change the audit log showed 997 clusterloader pod-list calls; after the change, 8 (500-node density test, 10/12), measured with:

    ```sh
    zgrep "\"verb\":\"list\"" kube-apiserver-audit.log* | grep pods | grep "\"userAgent\":\"clusterloader" | wc -l
    ```
- Watcher reduce impact experiment - Yunwen - on hold
- API server throttling mechanism - Yunwen/Ying H. - on hold
## Adopt service support automation into Kubemark - on hold
## PV/PVC support - Ying H.

- CSINodeInfo feature is disabled in the scheduler (Issue 1099)
- Underlying storage picking/support in GCP for release 1.0
## TODO

- Arktos network controller - integration with Mizar or KCM, or adding Prometheus monitoring
  - https://github.com/CentaurusInfra/arktos/issues/1192
- Scheduler resource calculation correctness - post 1.0
## Target release 1.0

Chinese New Year is Tuesday, February 1st. Considering the time difference and the weekend, the targeted latest release date will be Friday, January 28th, 2022.