
Computing 130 2022 Tracks

Yunwen Bai edited this page Feb 7, 2022 · 133 revisions

Goals

Top priority

  1. VM support
  2. Mizar integration
  3. Daemonset, system pod support
  4. Cluster initialization tool

Secondary priority and backlog

  1. Scalability
  2. Adopt service support automation into Kubemark
  3. PVC support - PR 977
  4. Management plane resource optimization - thoughts

Current status

Release 0.9

  1. 1TP/1RP 25K cluster p99 ~ 5s with system throughput 100/25
  2. 2TP/2RP 50K cluster p99 7.32s with system throughput 200/50
  3. 5TP/5RP 50K cluster p99 3.7s with system throughput 200/50

Current Work in Progress (2/1)

  1. VM support

    • Arktos API extension to support OpenStack-like REST APIs to CREATE/GET/LIST/OPERATE on VMs - Yunwen
    Time | Description
    ---- | -----------
    12/15/2021 | Support single VM creation/list/get/delete, with default network, image and flavor - PR 1242
    1/6/2022 | Support VM actions; regression in VM action operations in Arktos - Issue 1251
    1/4/2022 | Add static image registry and flavor route, and config map for basic images and basic OpenStack flavors - PR 1249, PR 1267
    1/21/2022 | Support creating VMs in batch with deployment/replicaset; batch creation with replicaset - PR 1284
    1/24/2022 | Buffer
    • Priority of tasks for the next 1-2 weeks.

      • Bar for merging to master: essential UT cases. PR 1331
      • Arktos-up.sh uses Mizar for the VM network. Hopefully this is just a verification effort after the Mizar+Arktos integration, with minor modification of the VM request structure. Pending fix and verification of Mizar issue # and Arktos issue #1324
      • Get a test VM for the other team to try out, or instructions on how to set up the env and run the test scripts. Done with Ubuntu 18.04. Pending fix of issue #1324 for Ubuntu 20.04
      • Design doc update -- TODOs to make it an official Arktos API extension to support OpenStack Nova-like API calls. PR 1341
      • Punt: batch deletion option to avoid re-creating VMs after deletion -- likely deferred to post-130
    • Kubemark (perf/scale) test automation with VM types

      • test infra changes
      • baseline of perf data with VM types
    • Arktos VM runtime design optimization (design and POC)

      • runtime to eliminate the PodSandbox for VM types
      • podConvertor to directly convert a VM to libvirt XML VM format
      • I/O optimization for VM type
      • network optimization for VM type
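
A rough sketch of the batch VM creation path (per the 1/21 entry above, creating VMs in batch via replicaset). The field names under `virtualMachine` are illustrative assumptions, not the exact Arktos schema -- see PR 1284 and the design doc for the real API:

```yaml
# Hypothetical sketch - field names are NOT the exact Arktos VM API schema.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-vms
spec:
  replicas: 3
  selector:
    matchLabels: {app: test-vm}
  template:
    metadata:
      labels: {app: test-vm}
    spec:
      virtualMachine:           # Arktos VM workload type (illustrative field names)
        name: test-vm
        image: ubuntu-xenial    # assumed to resolve via the static image registry route
        flavor: m1.tiny         # assumed to be one of the basic OpenStack flavors in the config map
```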
  2. Cluster initialization tool

    • Kube up support scale out - (PR 1227)
    • Kube up admin cluster support ubuntu 20.04 and scale out
      • Replace docker with containerd in kube up - Sonya
      • Daemonset support in kube up scale out - Hongwei
      • Node ready investigation PR 1292 Hongwei
      • Create default network object for system tenant upon cluster start up - Carl Issue 1279 PR 1310
      • Start Arktos network controller - Sonya PR 1298
      • 500 node perf test with scale up & scale out (2x2) - 1/12, 1/27 working
      • Confirm kube up scale up with host network for management plane pods, flannel for customer pods, Ubuntu 20.04 - On hold
    • Kube up optimization
  3. Mizar integration - Hongwei, Carl, Ying H.

    • Mizar test/release process update request - Vinay Issue 595
    • Mizar integration with Arktos scale up - Hong, Vinay, YingH.
      • Case 1: VPC0 creation - POC ready 11/12, Mizar code has an issue after the gRPC code merge
        • PR 1114 not stable (1/4 works) - Hong Chang's change works 11/11
        • Stabilize Arktos integration with Mizar - (PR 1223) 11/12
      • Case 1.5: Create a second VPC in the system tenant default namespace and attach a pod to the new VPC - assigned an IP from vpc0 - (Issue 567) - Closed, as this case works following the new Mizar manual
        • Previous configuration via the Arktos network object stopped working; Mizar plans to decouple from Arktos and has no plan to maintain it
        • A pod created in the system tenant can be attached to a secondary VPC successfully
      • Case 2: Create VPC for new tenant Issue 568
        • A pod in a non-system tenant cannot be attached to a VPC created in the system tenant
        • The tenant pod was correctly annotated with the new VPC, but got an IP in the default VPC
          • Mizar currently uses Annotations for K8s and Labels for Arktos. Mizar will make changes to use Annotations only.
      • Case 3: Multi-tenant network isolation - verify - YingH
      • Case 4: Add minion, validate pod created in system tenant with VPC0 can communicate with pod created in master
        • Not working with kubeadm set up cluster (Issue 562)
          • Started master with Mizar enabled, started worker with Mizar enabled, node is ready (via 8080), netpods are running. Pods deployed on the master node can ping each other on the same node but cannot ping pods on the worker. Pods deployed on the worker node cannot ping others. - Hong fixed it 12/2
        • Vinay said setting up the cluster, joining the worker, then starting Mizar should work; will check the new steps in Arktos. If it works, 562 won't be a blocking issue - the 11/30 update on issue 562 tells us these steps do not work either.
        • Pod created in worker cannot ping others - Hong
      • Case 5: deploy multiple pods via deployment
        • Each worker can only start 1-3 pods. The rest are pending on network interfaces - Issue 578
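
For Case 2 above, the failure mode (pod annotated with the new VPC but receiving an IP from the default VPC) can be reproduced with a pod of roughly this shape. The annotation keys are assumptions and may differ from what Mizar actually reads, especially since Mizar currently uses Annotations for K8s and Labels for Arktos:

```yaml
# Illustrative only - the mizar.com/* annotation keys are assumed, not verified.
apiVersion: v1
kind: Pod
metadata:
  name: vpc-test-pod
  annotations:
    mizar.com/vpc: vpc-1      # request attachment to a non-default VPC (assumed key)
    mizar.com/subnet: net-1   # assumed key
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
```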
    • Mizar integration with Arktos scale out - YingH, Carl
      • 1x1 with worker integrated with Mizar on dev env
        • Mizar node controller cannot get new worker node - PR 1240
      • 2x2 with Mizar on dev env
        • 2x1 with Mizar w/o minion - Hong, YingH
          • Arktos bug: incorrect env variable setup for multiple TPs - Issue 1253, PR 1257
          • Arktos bug: service support with multiple TPs did not set FIRST_SERVICE_CLUSTER_IP for the local dev env - PR 1259
          • Mizar: a pod deployed from the 2nd TP cannot ping others - PR 1263
          • Trying to start multiple pods; they fail to start when there is more than one pod - Hong
        • 2x1 with Mizar with minion
          • Support multiple TPs in worker start script PR 1265
        • 2x2 with Mizar w/o minion
          • Trying to start 2 netpods on tp1: only one started; the 2nd failed to start with a network issue (same as 2x1 w/o worker)
        • 2x2 with Mizar with minion
      • Dev env scale out support multiple minions - Carl
        • Arktos up add minion with flannel in process mode
        • Arktos scale out 1x1 add minions PR 1230 - on hold (Carl)
      • KCM Mizar controller changes for multiple RP - YingH.
    • Tenant VPC support in Arktos-Mizar integration - Carl, YingH
      • Test create pod with VPC without arktos network object - Done
      • Multi-tenant support with Mizar-Arktos integration - Design - Hongwei, YingH PR 1231
      • Arktos network controller cannot get event PR 1237
      • Automatically create VPC and subnet for new tenant - Carl PR 1281
      • Automatically assign pod ip from default tenant vpc - YingH, Carl
        • Integration Test in local scale up env, with workers - YingH
          • Mizar does not release unused IPs and stops after 32 IP addresses are used across the entire system - Hong
          • Tenant coredns-default pod stuck in ContainerCreating because the VPC annotation is missing - Issue 1293, PR 1295 - YingH
          • Pod created with a VPC annotation but forcibly changed to the default VPC - Issue 1285, PR 1296 - YingH
          • When a pod is scheduled to a worker, it cannot be started correctly - missing proxy on the worker - Ying H, pending PR 1304
          • Create a VPC/subnet manually; deleting the subnet/VPC gets stuck - P2, Issue 1287
        • Integration Test in local scale out env, with workers - YingH
          • 2TP1RP with 1 worker, naturally deploy the Mizar operator - Hongwei
          • System coredns and kube-dns are crashing and restarting in both scale up and scale out envs - Issue 1309 - YingH
      • Automatically assign service ip from default tenant vpc - Ying, Phu
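
A sketch of the manual VPC/subnet creation referenced above (the P2 delete-stuck case, Issue 1287). The field names follow common Mizar CRD conventions but are assumptions and should be checked against the CRDs installed in the cluster:

```yaml
# Sketch only - verify apiVersion and spec fields against the installed Mizar CRDs.
apiVersion: mizar.com/v1
kind: Vpc
metadata:
  name: vpc-1
spec:
  ip: "10.10.0.0"
  prefix: "16"
  dividers: 1        # number of divider droplets (assumed field)
---
apiVersion: mizar.com/v1
kind: Subnet
metadata:
  name: net-1
spec:
  ip: "10.10.0.0"
  prefix: "24"
  bouncers: 1        # number of bouncer droplets (assumed field)
  vpc: vpc-1         # parent VPC reference (assumed field)
```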
    • Scalability design for Mizar controller - Hongwei, Carl, YingH, Phu, Vinay
      • The Mizar network controller needs the following information: tenant, pod, service, endpoint, node
      • Data volume is huge, check whether it can be minimized by getting partial objects - Phu
    • OS upgrade from Ubuntu 16.04 to 18.04/20.04 - Carl, Sonya
      • Mizar does not support 16.04; 16.04 LTS reached end of life 4/2021
      • Arktos for dev needs some changes to support 20.04 LTS
      • Kube up currently depends on 16.04. Needs significant work - upgrade to 20.04?
    • Kubemark (perf/scale) test automation with Mizar
  4. System pod support in scale out - YingH.

  5. Daemonset support in scale out - Hongwei

    • Manually set up daemonset in scale out: 1x1 - Hongwei (Issue 1222)
      • Add RP clients to daemonset controller - done PR 1224
    • Review previous design; possibly combine with system pod support
    • Kubelet changes PR 1244
    • Admin plugin - restrict to system tenant only PR 1252
  6. Scalability

  7. Adopt service support automation into Kubemark - on hold

Backlog

  1. PV/PVC support - Ying H.
    • CSINodeInfo feature is disabled in scheduler (Issue 1099)
    • Underlying storage picking/support in GCP for release 1.0
  2. TODO

Note

  1. Target release 1.0
  2. Chinese New Year is Tuesday February 1st. Considering time difference and weekend, the targeted latest release date will be Friday, January 28th, 2022.