wg-k8s-infra: canary prowjobs for sig-scalability #22430
@@ -0,0 +1,366 @@
periodics:
- cron: '1 12 * * *' # Run daily at 4:01 PST (12:01 UTC)
  name: ci-kubernetes-e2e-gce-scale-correctness-canary
  cluster: k8s-infra-prow-build
  labels:
    preset-service-account: "true"
    preset-k8s-ssh: "true"
    preset-e2e-scalability-common: "true"
    preset-e2e-scalability-periodics: "true"
    preset-e2e-scalability-periodics-master: "true"
  decorate: true
  decoration_config:
    timeout: 270m
  annotations:
    testgrid-dashboards: wg-k8s-infra-canaries
    testgrid-tab-name: gce-master-scale-correctness-canary
    description: "Uses kubetest to run correctness tests against a 5000-node cluster created with cluster/kube-up.sh"
  spec:
    containers:
    - image: gcr.io/k8s-testimages/kubekins-e2e:v20210601-ea6aa4e-master
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      args:
      - --cluster=gce-scale-cluster
      - --env=CONCURRENT_SERVICE_SYNCS=5
      - --env=HEAPSTER_MACHINE_TYPE=e2-standard-32
      - --extract=ci/latest-fast
      - --extract-ci-bucket=k8s-release-dev
      # Overrides CONTROLLER_MANAGER_TEST_ARGS from preset-e2e-scalability-periodics.
      - --env=CONTROLLER_MANAGER_TEST_ARGS=--profiling --kube-api-qps=100 --kube-api-burst=100 --endpointslice-updates-batch-period=500ms --endpoint-updates-batch-period=500ms
      - --gcp-master-image=gci
      - --gcp-node-image=gci
      - --gcp-node-size=e2-small
      - --gcp-nodes=5000
      - --gcp-project=k8s-infra-e2e-scale-5k-project
      - --gcp-ssh-proxy-instance-name=gce-scale-cluster-master
      - --gcp-zone=us-east1-b
      - --ginkgo-parallel=40
      - --provider=gce
      - --test_args=--ginkgo.skip=\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]|\[DisabledForLargeClusters\] --minStartupPods=8 --node-schedulable-timeout=90m
      - --timeout=240m
      - --use-logexporter
      - --logexporter-gcs-path=gs://k8s-infra-scalability-tests-logs/$(JOB_NAME)/$(BUILD_ID)
      resources:
        requests:
          cpu: 6
          memory: "48Gi"
        limits:
          cpu: 6
          memory: "48Gi"
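The `--test_args` value in the job above passes a skip pattern to ginkgo; the square brackets are backslash-escaped because ginkgo treats the value as a regular expression. A small sketch of how that pattern filters test names (the sample names below are hypothetical, chosen only to illustrate the match):

```python
import re

# Skip pattern from --test_args above; tests whose names match are excluded.
skip = r"\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]|\[DisabledForLargeClusters\]"

# Hypothetical test names, for illustration only.
tests = [
    "[sig-network] Services should serve endpoints [Conformance]",
    "[sig-apps] DaemonSet should rollback [Serial]",
    "[sig-autoscaling] [Feature:ClusterSizeAutoscaling] scales up",
]
kept = [t for t in tests if not re.search(skip, t)]
print(kept)  # only the first name survives the filter
```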
- cron: '1 17 * * *' # Run daily at 9:01 PST (17:01 UTC)
  name: ci-kubernetes-e2e-gce-scale-performance-canary
  tags:
  - "perfDashPrefix: gce-5000Nodes"
  - "perfDashBuildsCount: 270"
  - "perfDashJobType: performance"
  cluster: k8s-infra-prow-build
  labels:
    preset-service-account: "true"
    preset-k8s-ssh: "true"
    preset-e2e-scalability-common: "true"
    preset-e2e-scalability-periodics: "true"
    preset-e2e-scalability-periodics-master: "true"
  decorate: true
  decoration_config:
    timeout: 450m
  extra_refs:
  - org: kubernetes
    repo: kubernetes
    base_ref: master
    path_alias: k8s.io/kubernetes
  - org: kubernetes
    repo: perf-tests
    base_ref: master
    path_alias: k8s.io/perf-tests
  annotations:
    testgrid-dashboards: wg-k8s-infra-canaries
    testgrid-tab-name: gce-master-scale-performance-canary
  spec:
    containers:
    - image: gcr.io/k8s-testimages/kubekins-e2e:v20210601-ea6aa4e-master
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      args:
      - --cluster=gce-scale-cluster
      - --env=HEAPSTER_MACHINE_TYPE=e2-standard-32
      # TODO(mborsz): Adjust or remove this change once we understand coredns
      # memory usage regression.
      - --env=KUBE_DNS_MEMORY_LIMIT=300Mi
      - --extract=ci/latest-fast
      - --extract-ci-bucket=k8s-release-dev
      - --gcp-nodes=5000
Reviewer comment: Same. This should use the same project as correctness. What's the node type used here?

Reply: Took a look at a successful job of …
      - --gcp-project=k8s-infra-e2e-scale-5k-project
      - --gcp-zone=us-east1-b
      - --provider=gce
      - --metadata-sources=cl2-metadata.json
      - --env=CL2_LOAD_TEST_THROUGHPUT=50
      - --env=CL2_DELETE_TEST_THROUGHPUT=30
      - --env=CL2_ENABLE_HUGE_SERVICES=true
      # Overrides CONTROLLER_MANAGER_TEST_ARGS from preset-e2e-scalability-periodics.
      - --env=CONTROLLER_MANAGER_TEST_ARGS=--profiling --kube-api-qps=100 --kube-api-burst=100 --endpointslice-updates-batch-period=500ms --endpoint-updates-batch-period=500ms
      - --env=CL2_ENABLE_API_AVAILABILITY_MEASUREMENT=true
      - --env=CL2_API_AVAILABILITY_PERCENTAGE_THRESHOLD=99.5
      - --test=false
      - --test-cmd=$GOPATH/src/k8s.io/perf-tests/run-e2e.sh
      - --test-cmd-args=cluster-loader2
      - --test-cmd-args=--experimental-gcp-snapshot-prometheus-disk=true
      - --test-cmd-args=--experimental-prometheus-disk-snapshot-name=$(JOB_NAME)-$(BUILD_ID)
      - --test-cmd-args=--nodes=5000
      - --test-cmd-args=--prometheus-scrape-node-exporter
      - --test-cmd-args=--provider=gce
      - --test-cmd-args=--report-dir=$(ARTIFACTS)
      - --test-cmd-args=--testconfig=testing/load/config.yaml
      - --test-cmd-args=--testconfig=testing/access-tokens/config.yaml
      - --test-cmd-args=--testoverrides=./testing/experiments/enable_restart_count_check.yaml
      - --test-cmd-args=--testoverrides=./testing/experiments/ignore_known_gce_container_restarts.yaml
      - --test-cmd-args=--testoverrides=./testing/overrides/5000_nodes.yaml
      - --test-cmd-name=ClusterLoaderV2
      - --timeout=420m
      - --use-logexporter
      - --logexporter-gcs-path=gs://k8s-infra-scalability-tests-logs/$(JOB_NAME)/$(BUILD_ID)
      resources:
        requests:
          cpu: 6
          memory: "16Gi"
        limits:
          cpu: 6
          memory: "16Gi"
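kubetest invokes the wrapper named by `--test-cmd` and appends each repeated `--test-cmd-args` value as a separate argument. A rough reconstruction of the resulting ClusterLoaderV2 invocation (a simplified sketch, not kubetest's actual code; the `$GOPATH` and `$(ARTIFACTS)` references are left unexpanded):

```python
# Simplified reconstruction of the command kubetest runs for this job;
# environment references are left as-is rather than expanded.
test_cmd = "$GOPATH/src/k8s.io/perf-tests/run-e2e.sh"
test_cmd_args = [
    "cluster-loader2",
    "--nodes=5000",
    "--provider=gce",
    "--report-dir=$(ARTIFACTS)",
    "--testconfig=testing/load/config.yaml",
]
command = [test_cmd] + test_cmd_args
print(" ".join(command))
```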
- name: ci-kubernetes-kubemark-gce-scale-canary
  tags:
  - "perfDashPrefix: kubemark-5000Nodes"
  - "perfDashJobType: performance"
  # Run twice a day (at 00:01 and 16:01 UTC) on the odd days of each month. The
  # job is expected to take ~12-14h, hence the 16 hour gap.
  cron: '1 0,16 1-31/2 * *'
  labels:
    preset-service-account: "true"
    preset-k8s-ssh: "true"
    preset-dind-enabled: "true"
    preset-e2e-kubemark-common: "true"
    preset-e2e-kubemark-gce-scale: "true"
    preset-e2e-scalability-periodics: "true"
    preset-e2e-scalability-periodics-master: "true"
  decorate: true
  decoration_config:
    timeout: 1100m
  extra_refs:
  - org: kubernetes
    repo: kubernetes
    base_ref: master
    path_alias: k8s.io/kubernetes
  - org: kubernetes
    repo: perf-tests
    base_ref: master
    path_alias: k8s.io/perf-tests
  annotations:
    testgrid-dashboards: wg-k8s-infra-canaries
    testgrid-tab-name: kubemark-5000-canary
    testgrid-num-columns-recent: '3'
  spec:
    containers:
    - image: gcr.io/k8s-testimages/kubekins-e2e:v20210601-ea6aa4e-master
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      args:
      - --cluster=kubemark-5000
      - --extract=ci/latest
      - --gcp-node-image=gci
      - --gcp-node-size=e2-standard-8
      - --gcp-nodes=84
Reviewer comment: Honestly not sure if this will need a quota increase or not.

Reply: No need to increase the quota for this job. The smallest quota in us-east1 is 1250 (for all the projects with type …
      - --gcp-project=k8s-infra-e2e-scale-5k-project
      - --gcp-zone=us-east1-b
      - --kubemark
      - --kubemark-nodes=5000
      - --provider=gce
      - --metadata-sources=cl2-metadata.json
      - --test=false
      - --test_args=--ginkgo.focus=xxxx
      - --test-cmd=$GOPATH/src/k8s.io/perf-tests/run-e2e.sh
      - --test-cmd-args=cluster-loader2
      - --test-cmd-args=--experimental-gcp-snapshot-prometheus-disk=true
      - --test-cmd-args=--experimental-prometheus-disk-snapshot-name=$(JOB_NAME)-$(BUILD_ID)
      - --test-cmd-args=--nodes=5000
      - --test-cmd-args=--provider=kubemark
      - --env=CL2_ENABLE_PVS=false # TODO(https://github.com/kubernetes/perf-tests/issues/803): Fix me
      - --test-cmd-args=--report-dir=$(ARTIFACTS)
      - --test-cmd-args=--testconfig=testing/load/config.yaml
      - --test-cmd-args=--testconfig=testing/access-tokens/config.yaml
      - --test-cmd-args=--testoverrides=./testing/experiments/enable_restart_count_check.yaml
      - --test-cmd-args=--testoverrides=./testing/experiments/ignore_known_kubemark_container_restarts.yaml
      - --test-cmd-args=--testoverrides=./testing/overrides/kubemark_5000_nodes.yaml
      - --test-cmd-name=ClusterLoaderV2
      - --timeout=1080m
      - --use-logexporter
      - --logexporter-gcs-path=gs://k8s-infra-scalability-tests-logs/$(JOB_NAME)/$(BUILD_ID)
      # docker-in-docker needs privileged mode
      securityContext:
        privileged: true
      resources:
        requests:
          cpu: 6
          memory: "16Gi"
        limits:
          cpu: 6
          memory: "16Gi"
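The `1-31/2` day-of-month field in the cron line above selects the odd days of the month (1, 3, 5, …, 31), as the comment claims. A minimal matcher for just that field shape, written to check the claim (a sketch, not a full cron parser):

```python
# Minimal matcher for a cron day-of-month field like "1-31/2" (an
# illustrative sketch, not a complete cron implementation).
def matches_dom(field: str, day: int) -> bool:
    rng, _, step = field.partition("/")
    lo, _, hi = rng.partition("-")
    step_n = int(step) if step else 1
    return int(lo) <= day <= int(hi or lo) and (day - int(lo)) % step_n == 0

odd_days = [d for d in range(1, 32) if matches_dom("1-31/2", d)]
print(odd_days[:5])  # [1, 3, 5, 7, 9]
```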
- name: ci-kubernetes-kubemark-gce-scale-scheduler-canary
  tags:
  - "perfDashPrefix: kubemark-5000Nodes-scheduler"
  - "perfDashJobType: performance"
  # Run at 10:01 UTC on the even days of each month. There will be ample time
  # between the kubemark-5000Nodes job (expected to start at 16:01 UTC the
  # previous day and finish in around ~12-14 hours) and this job. This job is
  # expected to take ~6-8 hours, which should allow it to finish well before
  # the next kubemark-5000Nodes job (at 00:01 UTC).
  cron: '1 10 2-31/2 * *'
  labels:
    preset-service-account: "true"
    preset-k8s-ssh: "true"
    preset-dind-enabled: "true"
    preset-e2e-kubemark-common: "true"
    preset-e2e-kubemark-gce-scale: "true"
    preset-e2e-scalability-periodics: "true"
    preset-e2e-scalability-periodics-master: "true"
  decorate: true
  decoration_config:
    timeout: 1100m
  extra_refs:
  - org: kubernetes
    repo: kubernetes
    base_ref: master
    path_alias: k8s.io/kubernetes
  - org: kubernetes
    repo: perf-tests
    base_ref: master
    path_alias: k8s.io/perf-tests
  annotations:
    testgrid-dashboards: wg-k8s-infra-canaries
    testgrid-tab-name: kubemark-5000-scheduler-canaries
    testgrid-num-columns-recent: '3'
  spec:
    containers:
    - image: gcr.io/k8s-testimages/kubekins-e2e:v20210601-ea6aa4e-master
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      args:
      - --cluster=kubemark-5000
      - --extract=ci/latest
      - --gcp-node-image=gci
      - --gcp-node-size=e2-standard-8
      - --gcp-nodes=84
      - --gcp-project=k8s-infra-e2e-scale-5k-project
      - --gcp-zone=us-east1-b
      - --kubemark
      - --kubemark-nodes=5000
      - --provider=gce
      - --metadata-sources=cl2-metadata.json
      - --test=false
      - --test_args=--ginkgo.focus=xxxx
      - --test-cmd=$GOPATH/src/k8s.io/perf-tests/run-e2e.sh
      - --test-cmd-args=cluster-loader2
      - --test-cmd-args=--experimental-gcp-snapshot-prometheus-disk=true
      - --test-cmd-args=--experimental-prometheus-disk-snapshot-name=$(JOB_NAME)-$(BUILD_ID)
      - --test-cmd-args=--nodes=5000
      - --test-cmd-args=--provider=kubemark
      - --env=CL2_ENABLE_PVS=false # TODO(https://github.com/kubernetes/perf-tests/issues/803): Fix me
      - --test-cmd-args=--report-dir=$(ARTIFACTS)
      - --test-cmd-args=--testsuite=testing/density/scheduler-suite.yaml
      - --test-cmd-args=--testoverrides=./testing/experiments/enable_restart_count_check.yaml
      - --test-cmd-args=--testoverrides=./testing/experiments/ignore_known_kubemark_container_restarts.yaml
      - --test-cmd-args=--testoverrides=./testing/overrides/kubemark_5000_nodes.yaml
      - --test-cmd-name=ClusterLoaderV2
      - --timeout=1080m
      - --use-logexporter
      - --logexporter-gcs-path=gs://k8s-infra-scalability-tests-logs/$(JOB_NAME)/$(BUILD_ID)
      # docker-in-docker needs privileged mode
      securityContext:
        privileged: true
      resources:
        requests:
          cpu: 6
          memory: "16Gi"
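The scheduling comment in the job above reasons about the gaps between the two kubemark jobs. Under the stated duration estimates (treating ~14h and ~8h as upper bounds, with illustrative calendar dates), the run windows are strictly ordered and never overlap:

```python
from datetime import datetime, timedelta

# Illustrative dates; durations are the upper-bound estimates from the comments.
kubemark_start = datetime(2021, 6, 1, 16, 1)          # odd day, 16:01 UTC
kubemark_end = kubemark_start + timedelta(hours=14)   # ends ~06:01 on June 2
scheduler_start = datetime(2021, 6, 2, 10, 1)         # even day, 10:01 UTC
scheduler_end = scheduler_start + timedelta(hours=8)  # ends ~18:01 on June 2
next_kubemark = datetime(2021, 6, 3, 0, 1)            # next odd day, 00:01 UTC

# Strict ordering means the jobs never run concurrently under these estimates.
assert kubemark_end < scheduler_start < scheduler_end < next_kubemark
```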
- interval: 4h
  name: ci-golang-tip-k8s-1-18-canary
  cluster: k8s-infra-prow-build
  tags:
  - "perfDashPrefix: golang-tip-k8s-1-18"
  - "perfDashBuildsCount: 240"
  - "perfDashJobType: performance"
  labels:
    preset-service-account: "true"
    preset-k8s-ssh: "true"
    preset-dind-enabled: "true"
    preset-e2e-kubemark-common: "true"
    preset-e2e-kubemark-gce-scale: "true"
    preset-e2e-scalability-periodics: "true"
    preset-e2e-scalability-periodics-master: "true"
  decorate: true
  decoration_config:
    timeout: 210m
  extra_refs:
  - org: kubernetes
    repo: perf-tests
    base_ref: master
    base_sha: 39a6c09ddca620a430d38e5de1400844ea954c2f # head of perf-tests' master as of 2020-11-06
    path_alias: k8s.io/perf-tests
  annotations:
    testgrid-dashboards: wg-k8s-infra-canaries
    testgrid-tab-name: ci-golang-tip-k8s-1-18-canary
  spec:
    containers:
    - image: gcr.io/k8s-testimages/kubekins-e2e:v20210601-ea6aa4e-master
      command:
      - runner.sh
      - /workspace/scenarios/kubernetes_e2e.py
      args:
      - --cluster=gce-golang
      - --env=CL2_ENABLE_PVS=false
      - --env=CL2_LOAD_TEST_THROUGHPUT=50
      - --env=KUBEMARK_CONTROLLER_MANAGER_TEST_ARGS=--profiling --kube-api-qps=200 --kube-api-burst=200
      - --env=KUBEMARK_SCHEDULER_TEST_ARGS=--profiling --kube-api-qps=200 --kube-api-burst=200
      - --extract=gs://k8s-scale-golang-build/ci/latest-1.18.txt
      - --gcp-node-size=e2-standard-8
      - --gcp-nodes=50
      - --gcp-project=k8s-infra-e2e-scale-5k-project
      - --gcp-zone=us-east1-b
      - --provider=gce
      - --kubemark
      - --kubemark-nodes=2500
      - --test=false
      - --test-cmd=$GOPATH/src/k8s.io/perf-tests/golang/run-e2e.sh
      - --test-cmd-args=cluster-loader2
      - --test-cmd-args=--experimental-gcp-snapshot-prometheus-disk=true
      - --test-cmd-args=--experimental-prometheus-disk-snapshot-name=$(JOB_NAME)-$(BUILD_ID)
      - --test-cmd-args=--nodes=2500
      - --test-cmd-args=--provider=kubemark
      - --test-cmd-args=--report-dir=$(ARTIFACTS)
      - --test-cmd-args=--testconfig=testing/load/config.yaml
      - --test-cmd-args=--testoverrides=./testing/experiments/enable_restart_count_check.yaml
      - --test-cmd-args=--testoverrides=./testing/experiments/ignore_known_kubemark_container_restarts.yaml
      - --test-cmd-args=--testoverrides=./testing/overrides/5000_nodes.yaml
      - --test-cmd-args=--testoverrides=./testing/load/golang/custom_api_call_thresholds.yaml
      - --test-cmd-name=ClusterLoaderV2
      - --timeout=180m
      - --use-logexporter
      - --logexporter-gcs-path=gs://k8s-infra-scalability-tests-logs/$(JOB_NAME)/$(BUILD_ID)
      env:
      - name: CL2_ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS
        value: "true"
      # docker-in-docker needs privileged mode
      securityContext:
        privileged: true
      resources:
        requests:
          cpu: 6
          memory: "16Gi"
        limits:
          cpu: 6
          memory: "16Gi"
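Most of these jobs set identical resource requests and limits, which in Kubernetes places the test pod in the Guaranteed QoS class, making it last in line for eviction under node pressure. A simplified single-container illustration of that rule (not the kubelet's actual implementation):

```python
# Resources as declared for this job; requests == limits for cpu and memory.
resources = {
    "requests": {"cpu": "6", "memory": "16Gi"},
    "limits": {"cpu": "6", "memory": "16Gi"},
}

def qos_class(res: dict) -> str:
    # Simplified single-container rule; the real kubelet logic also handles
    # per-resource partial settings across multiple containers.
    if res.get("requests") and res.get("requests") == res.get("limits"):
        return "Guaranteed"
    return "Burstable" if res.get("requests") or res.get("limits") else "BestEffort"

print(qos_class(resources))  # Guaranteed
```

Note that the kubemark scheduler job above declares only requests, which would leave its pod Burstable rather than Guaranteed.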
Reviewer comment: We'll need a quota request for this. I'd rather avoid giving all scalability projects this kind of quota; we should pin to a specific project for this.

Reply: @spiffxp I'll make the quota requests in k8s-infra-e2e-scale-5k-project. See kubernetes/k8s.io#2225.