Extend e2e queue timings
This PR reworks the e2e timeouts to give a build more time to wait for
its turn to run e2es, while tightening the e2e deadline slightly:

* Tighten the per-e2e-configuration timeout to 1.5h. e2es are coming
in close to an hour in some cases but now that we're not running
consul, we don't need it as high as 2h. I don't think it's worth
tightening this all the way to an hour, though it would probably work.
  * Also drops the queueTtl for the CI sub-builds from 6h to 2h; these
  should not be queued for long since we serialize e2es now.

* Extends the e2e-wait-to-become-leader timeout to 3h. In higher
traffic times, we're hitting this limit often now, which only results
in a vicious cycle of retrying PRs. Instead wait longer to become
leader.

* Bumps the global timeout to 5h after aggregating:
  3h (e2e-wait-to-become-leader) + 1.5h (e2e timeout) + 0.5h (everything else)

* Remove vestiges of consul - it's no longer running anywhere.
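
The timeout budget described in the bullets above can be checked with a few lines of shell arithmetic. This is only a sketch: the variable names are made up for illustration, and the values are the ones quoted in this commit, converted to seconds.

```shell
# Per-step budgets in seconds, taken from the values in this commit.
# Variable names are hypothetical; they do not appear in the repository.
leader_wait_s=10800      # 3h, e2e-wait-to-become-leader
e2e_timeout_s=5400       # 1.5h, per-e2e-configuration timeout
everything_else_s=1800   # 0.5h, allowance for all other steps
global_timeout_s=18000   # 5h, top-level build timeout

budget_s=$((leader_wait_s + e2e_timeout_s + everything_else_s))

if [ "$budget_s" -le "$global_timeout_s" ]; then
  echo "ok: steps need ${budget_s}s, global timeout is ${global_timeout_s}s"
else
  echo "global timeout too small: steps need ${budget_s}s" >&2
fi
```

With these numbers the sum equals the global timeout exactly, so any future increase to one of the three parts needs a matching bump to the global value.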
zmerlynn committed Mar 31, 2023
1 parent d80264d commit 81c99ad
Showing 3 changed files with 6 additions and 23 deletions.
21 changes: 2 additions & 19 deletions build/e2e-image/entrypoint.sh
@@ -33,22 +33,5 @@ fi
gcloud container clusters get-credentials $TEST_CLUSTER_NAME \
--zone=${TEST_CLUSTER_LOCATION} --project=agones-images

-# TODO: Here we're using the presence of consul to dictate whether we use consul
-# port forwarding or whether we rely on Cloud Build serialization from #2932. This
-# allows us to quickly recover (by reinstalling consul) if something breaks.
-# After a few more days of PRs, it should be safe to remove this.
-if kubectl get statefulset/consul-consul-server -oname >& /dev/null
-then
-echo "Using legacy consul locking"
-kubectl port-forward statefulset/consul-consul-server 8500:8500 &
-echo "Waiting consul port-forward to launch on 8500..."
-timeout 60 bash -c 'until printf "" 2>>/dev/null >>/dev/tcp/$0/$1; do sleep 1; done' 127.0.0.1 8500
-echo "consul port-forward launched. Starting e2e tests..."
-echo "consul lock -child-exit-code=true -timeout 90m -verbose LockE2E '/root/e2e.sh "$FEATURES" "$CLOUD_PRODUCT" "$REGISTRY"'"
-consul lock -child-exit-code=true -timeout 90m -verbose LockE2E '/root/e2e.sh "'$FEATURES'" "'$CLOUD_PRODUCT'" "'$REGISTRY'"'
-killall -q kubectl || true
-echo "successfully killed kubectl proxy"
-else
-echo /root/e2e.sh "${FEATURES}" "${CLOUD_PRODUCT}" "${REGISTRY}"
-/root/e2e.sh "${FEATURES}" "${CLOUD_PRODUCT}" "${REGISTRY}"
-fi
+echo /root/e2e.sh "${FEATURES}" "${CLOUD_PRODUCT}" "${REGISTRY}"
+/root/e2e.sh "${FEATURES}" "${CLOUD_PRODUCT}" "${REGISTRY}"
4 changes: 2 additions & 2 deletions ci/e2e-test-cloudbuild.yaml
@@ -60,6 +60,6 @@ steps:
- e2e-feature-gates

tags: ['e2e-test']
-timeout: 7200s # 2h
-queueTtl: 21600s # 6h
+timeout: 5400s # 1.5h
+queueTtl: 7200s # 2h // only one set of e2es should be running at once

4 changes: 2 additions & 2 deletions cloudbuild.yaml
@@ -271,7 +271,7 @@ steps:
sleep 60
done
-timeout: 5400s # 90m - leave an hour for e2es to run on top of the global timeout of 2.5h
+timeout: 10800s # 3h - if you change this, change the global timeout as well
env:
- 'CLOUDSDK_CORE_PROJECT=$PROJECT_ID'
- 'BUILD_ID=$BUILD_ID'
@@ -414,7 +414,7 @@ substitutions:
_RUST_SDK_BUILD_CACHE_KEY: rust-sdk-build
_REGISTRY: us-docker.pkg.dev/${PROJECT_ID}/ci
tags: ['ci']
-timeout: 9000s # 2.5h - if you change this, change e2e-wait-to-become-leader as well
+timeout: 18000s # 5h: 3h (e2e-wait-to-become-leader) + 1.5h (e2e timeout) + 0.5h (everything else)
queueTtl: 259200s # 72h
images:
- '${_REGISTRY}/agones-controller'
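
For readers unfamiliar with the wait-to-become-leader pattern, the following is a minimal sketch of a poll-until-success loop with a deadline. It is not the loop from cloudbuild.yaml (only its `sleep 60` and `done` lines are visible in this diff), and in this commit the deadline appears to be enforced by the Cloud Build step `timeout` rather than by the script. The function name `wait_until` and the `is_leader` command in the usage note are hypothetical.

```shell
# wait_until DEADLINE_S INTERVAL_S CMD [ARGS...]
# Re-runs CMD every INTERVAL_S seconds until it succeeds.
# Returns 0 once CMD succeeds, or 1 if DEADLINE_S seconds pass first.
# The body runs in a subshell so its variables do not leak to the caller.
wait_until() (
  deadline_s=$1
  interval_s=$2
  shift 2
  start=$(date +%s)
  until "$@"; do
    now=$(date +%s)
    if [ $((now - start)) -ge "$deadline_s" ]; then
      exit 1   # deadline reached; exits only the function's subshell
    fi
    sleep "$interval_s"
  done
)
```

Assuming a hypothetical `is_leader` command that exits 0 once the build may run e2es, the 3h wait with a 60s poll would read `wait_until 10800 60 is_leader`.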