update upstream k8s CI image store reference #1512

Merged

Conversation

jackfrancis
Contributor

@jackfrancis jackfrancis commented Jul 12, 2021

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

This PR updates all capz references to upstream Kubernetes CI images to the new URI, so that we can restore test signal for upstream Kubernetes commits (pre-release commits, not officially released bits).

kubernetes/k8s.io#2318

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1510

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. area/provider/azure Issues or PRs related to azure provider labels Jul 12, 2021
@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 12, 2021
@jackfrancis
Contributor Author

cc @CecileRobertMichon

@CecileRobertMichon
Contributor

/test ls

@k8s-ci-robot
Contributor

@CecileRobertMichon: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test pull-cluster-api-provider-azure-test
  • /test pull-cluster-api-provider-azure-build
  • /test pull-cluster-api-provider-azure-e2e
  • /test pull-cluster-api-provider-azure-e2e-full
  • /test pull-cluster-api-provider-azure-e2e-windows
  • /test pull-cluster-api-provider-azure-e2e-exp
  • /test pull-cluster-api-provider-azure-capi-e2e
  • /test pull-cluster-api-provider-azure-verify
  • /test pull-cluster-api-provider-azure-conformance-v1alpha4
  • /test pull-cluster-api-provider-azure-upstream-v1alpha4-windows
  • /test pull-cluster-api-provider-azure-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-windows-upstream-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-apidiff
  • /test pull-cluster-api-provider-azure-coverage

Use /test all to run the following jobs:

  • pull-cluster-api-provider-azure-test
  • pull-cluster-api-provider-azure-build
  • pull-cluster-api-provider-azure-e2e
  • pull-cluster-api-provider-azure-e2e-windows
  • pull-cluster-api-provider-azure-verify
  • pull-cluster-api-provider-azure-apidiff
  • pull-cluster-api-provider-azure-coverage

In response to this:

/test ls

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CecileRobertMichon
Contributor

/test pull-cluster-api-provider-azure-conformance-with-ci-artifacts

@@ -150,7 +150,7 @@ spec:
$${GSUTIL} cp "$$CI_URL/$$CI_CONTAINER.$$CONTAINER_EXT" "$$CI_DIR/$$CI_CONTAINER.$$CONTAINER_EXT"
$${SUDO} ctr -n k8s.io images import "$$CI_DIR/$$CI_CONTAINER.$$CONTAINER_EXT" || echo "* ignoring expected 'ctr images import' result"
$${SUDO} ctr -n k8s.io images tag k8s.gcr.io/$$CI_CONTAINER-amd64:"$${CI_VERSION//+/_}" k8s.gcr.io/$$CI_CONTAINER:"$${CI_VERSION//+/_}"
Member

@sbueringer sbueringer Jul 12, 2021

As far as I can see, you can also drop those lines since k8s.gcr.io is used. I'm not 100% sure, though; it may be worth trying to get rid of those references. (The only code occurrences of the old or new CI repo I could find were the conformance images in the e2e test framework in the cluster-api repo. Apart from that, we only seem to use it here.)

Member

@sbueringer sbueringer Jul 12, 2021

Okay found it: kubernetes/kubernetes@8ba5a06

So it looks like kubeadm >= 1.20 (?) uses the new repo.

If we want this to work with kubeadm <= 1.19 from CI, we would have to tag both? (Not sure if we want that, though.)

Member

It was cherry-picked pretty far back, so the newest version that still has the old CI repo is v1.17.*. I think that's not relevant, as we're (probably) not testing 1.17 CI versions anywhere anymore.
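
For reference, a minimal sketch of the tag mapping discussed in this thread. Image tags cannot contain `+`, so the `+` in a CI version is replaced with `_`, and the resulting tag can be applied under both the old and the new CI registry. The helper names `ci_image_tag` and `ci_image_refs` are hypothetical, and the sketch uses plain `$` rather than the `$$` escaping that appears in the template.

```shell
#!/bin/sh
# Hypothetical helper: convert a CI version into a valid image tag.
# '+' is not a legal character in an image tag, so it becomes '_'.
ci_image_tag() {
  printf '%s\n' "$1" | tr '+' '_'
}

# Hypothetical helper: print the image reference for one component
# under the old and the new CI registry named in this PR.
ci_image_refs() {
  container="$1"
  tag="$(ci_image_tag "$2")"
  for registry in gcr.io/kubernetes-ci-images gcr.io/k8s-staging-ci-images; do
    printf '%s/%s:%s\n' "$registry" "$container" "$tag"
  done
}

ci_image_refs kube-controller-manager "v1.22.0-beta.1.140+db183316273852"
```

Tagging under both registries would only matter if a kubeadm binary with the old registry hard-coded is still in use.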

@k8s-ci-robot
Contributor

@jackfrancis: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
pull-cluster-api-provider-azure-conformance-with-ci-artifacts 8bed5d9 link /test pull-cluster-api-provider-azure-conformance-with-ci-artifacts

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@CecileRobertMichon
Contributor

@jackfrancis the test failed with

[  101.712056] cloud-init[1927]: [2021-07-12 17:57:41] 	[ERROR ImagePull]: failed to pull image gcr.io/kubernetes-ci-images/kube-controller-manager:v1.22.0-beta.1.140_db183316273852: output: time="2021-07-12T17:57:37Z" level=fatal msg="pulling image failed: rpc error: code = NotFound desc = failed to pull and unpack image \"gcr.io/kubernetes-ci-images/kube-controller-manager:v1.22.0-beta.1.140_db183316273852\": failed to resolve reference \"gcr.io/kubernetes-ci-images/kube-controller-manager:v1.22.0-beta.1.140_db183316273852\": gcr.io/kubernetes-ci-images/kube-controller-manager:v1.22.0-beta.1.140_db183316273852: not found"

Is it possible you missed a reference?

@sbueringer
Member

sbueringer commented Jul 12, 2021

@jackfrancis the test failed with

[  101.712056] cloud-init[1927]: [2021-07-12 17:57:41] 	[ERROR ImagePull]: failed to pull image gcr.io/kubernetes-ci-images/kube-controller-manager:v1.22.0-beta.1.140_db183316273852: output: time="2021-07-12T17:57:37Z" level=fatal msg="pulling image failed: rpc error: code = NotFound desc = failed to pull and unpack image \"gcr.io/kubernetes-ci-images/kube-controller-manager:v1.22.0-beta.1.140_db183316273852\": failed to resolve reference \"gcr.io/kubernetes-ci-images/kube-controller-manager:v1.22.0-beta.1.140_db183316273852\": gcr.io/kubernetes-ci-images/kube-controller-manager:v1.22.0-beta.1.140_db183316273852: not found"

Is it possible you missed a reference?

I think it fails when it tries to download kubectl and then continues with kubeadm (1.18)

[2021-07-12 17:57:26] *************************************************
[2021-07-12 17:57:26] * testing CI version v1.22.0-beta.1.140+db183316273852
[2021-07-12 17:57:28] CommandException: One or more URLs matched no objects.
[2021-07-12 17:57:28] * downloading binary: gs://kubernetes-release-dev/ci/v1.22.0-beta.1.140+db183316273852/bin/linux/amd64/kubectl
[2021-07-12 17:57:29] CommandException: No URLs matched: gs://kubernetes-release-dev/ci/v1.22.0-beta.1.140+db183316273852/bin/linux/amd64/kubectl
[2021-07-12 17:57:32] W0712 17:57:32.064063    2689 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[2021-07-12 17:57:32] [init] Using Kubernetes version: v1.22.0-beta.1.140+db183316273852
[2021-07-12 17:57:32] [preflight] Running pre-flight checks
[2021-07-12 17:57:32] 	[WARNING KubernetesVersion]: Kubernetes version is greater than kubeadm version. Please consider to upgrade kubeadm. Kubernetes version: 1.22.0-beta.1.140+db183316273852. Kubeadm version: 1.18.x
[2021-07-12 17:57:33] [preflight] Pulling images required for setting up a Kubernetes cluster
[2021-07-12 17:57:33] [preflight] This might take a minute or two, depending on the speed of your internet connection
[2021-07-12 17:57:33] [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/1512/pull-cluster-api-provider-azure-conformance-with-ci-artifacts/1414642669447024640/artifacts/clusters/capz-conf-w4b3gr/machines/capz-conf-w4b3gr-control-plane-bzptl/cloud-init-output.log

@jackfrancis
Contributor Author

@CecileRobertMichon this might be a required change as well:

kubernetes/test-infra#22860

cc @chewong

@sbueringer
Member

@jackfrancis I think in the last test run the problem was only that kubeadm 1.18 was used which has afaik the old CI image repo hard-coded (kubernetes/kubernetes@8ba5a06).

Not sure why the kubectl download failed though.

@CecileRobertMichon
Contributor

the problem was only that kubeadm 1.18 was used

not sure why that would be the case, looking

@CecileRobertMichon
Contributor

[2021-07-12 17:57:28] CommandException: One or more URLs matched no objects.
[2021-07-12 17:57:28] * downloading binary: gs://kubernetes-release-dev/ci/v1.22.0-beta.1.140+db183316273852/bin/linux/amd64/kubectl
[2021-07-12 17:57:29] CommandException: No URLs matched: gs://kubernetes-release-dev/ci/v1.22.0-beta.1.140+db183316273852/bin/linux/amd64/kubectl

This is the issue. The preKubeadmCommand script didn't complete because of this, and the node then proceeded to run kubeadm join without having downloaded a newer kubeadm.
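
One way to make this kind of failure explicit is to check each download and stop with a clear message before anything else runs. The sketch below is only an illustration under stated assumptions: `download_ci_binary` is a hypothetical helper, `GSUTIL` mirrors the variable used in the template (the `gsutil` default is assumed), and plain `$` is used instead of the template's `$$`.

```shell
#!/bin/sh
# GSUTIL mirrors the variable used in the template; the default is an assumption.
GSUTIL="${GSUTIL:-gsutil}"

# Hypothetical helper: copy one binary and return non-zero if the copy
# fails or if the destination file is missing or empty afterwards.
download_ci_binary() {
  src="$1"
  dest="$2"
  if ! ${GSUTIL} cp "$src" "$dest"; then
    echo "* failed to download $src" >&2
    return 1
  fi
  if [ ! -s "$dest" ]; then
    echo "* $dest is empty or missing after download" >&2
    return 1
  fi
}

# Hypothetical usage inside the script:
#   download_ci_binary "$CI_URL/kubectl" "$CI_DIR/kubectl" || exit 1
```

This does not by itself stop a later step from running kubeadm, but it puts an unambiguous error next to the failing URL in cloud-init-output.log.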

@sbueringer
Member

the problem was only that kubeadm 1.18 was used

not sure why that would be the case, looking

I think the problem is that the kubectl download fails because it tries to use Kubernetes "v1.22.0-beta.1.140+db183316273852". Which doesn't seem to exist: https://console.cloud.google.com/storage/browser/kubernetes-release-dev/ci;tab=objects?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22v1.22.0-beta.1.1_5C_22_22%257D%255D%22))&project=kubernetes-release-dev&prefix=v1.22.0-beta.1.1&forceOnObjectsSortingFiltering=false

Not sure where it gets the version from. The latest files point to 117 (e.g. link)

@@ -338,7 +338,7 @@ spec:
$${GSUTIL} cp "$$CI_URL/$$CI_CONTAINER.$$CONTAINER_EXT" "$$CI_DIR/$$CI_CONTAINER.$$CONTAINER_EXT"
$${SUDO} ctr -n k8s.io images import "$$CI_DIR/$$CI_CONTAINER.$$CONTAINER_EXT" || echo "* ignoring expected 'ctr images import' result"
$${SUDO} ctr -n k8s.io images tag k8s.gcr.io/$$CI_CONTAINER-amd64:"$${CI_VERSION//+/_}" k8s.gcr.io/$$CI_CONTAINER:"$${CI_VERSION//+/_}"
$${SUDO} ctr -n k8s.io images tag k8s.gcr.io/$$CI_CONTAINER-amd64:"$${CI_VERSION//+/_}" gcr.io/kubernetes-ci-images/$$CI_CONTAINER:"$${CI_VERSION//+/_}"
$${SUDO} ctr -n k8s.io images tag k8s.gcr.io/$$CI_CONTAINER-amd64:"$${CI_VERSION//+/_}" gcr.io/k8s-staging-ci-images/$$CI_CONTAINER:"$${CI_VERSION//+/_}"
Contributor

we also need to change line 324 https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1512/files#diff-f898c4eb527239bc68896f9fa132833ff4ec72509aea67f17a072940acdf2550R324

CI_URL="gs://kubernetes-release-dev/ci-periodic/$$CI_VERSION/bin/linux/amd64"

to

CI_URL="gs://k8s-release-dev/ci-periodic/$$CI_VERSION/bin/linux/amd64"
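
Since the two bucket names differ by only a few characters, it may help to build the URL from a single bucket variable so a rename touches one place. A minimal sketch follows. `ci_bin_url` and `CI_BUCKET` are hypothetical names, and plain `$` is used instead of the template's `$$`.

```shell
#!/bin/sh
# CI_BUCKET is a hypothetical variable; the default is the new bucket name.
CI_BUCKET="${CI_BUCKET:-k8s-release-dev}"

# Hypothetical helper: print the linux/amd64 binary URL for a CI version.
# $1 is the path prefix (for example "ci" or "ci-periodic"), $2 the version.
ci_bin_url() {
  printf 'gs://%s/%s/%s/bin/linux/amd64\n' "$CI_BUCKET" "$1" "$2"
}

ci_bin_url ci-periodic "v1.22.0-beta.1.140+db183316273852"
```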

@sbueringer
Member

sbueringer commented Jul 12, 2021

@CecileRobertMichon Looks like https://storage.googleapis.com/k8s-release-dev/ci/latest.txt is pointing to a non-existent release :/

func resolveCIVersion(label string) (string, error) {
	if ciVersion, ok := os.LookupEnv("CI_VERSION"); ok {
		return ciVersion, nil
	}
	if strings.HasPrefix(label, "latest") {
		if kubernetesVersion, err := latestCIVersion(label); err == nil {
			return kubernetesVersion, nil
		}
	}
	// default to https://dl.k8s.io/ci/latest.txt if the label can't be resolved
	return kubernetesversions.LatestCIRelease()
}

// latestCIVersion returns the latest CI version of a given label in the form of latest-1.xx.
func latestCIVersion(label string) (string, error) {
	ciVersionURL := fmt.Sprintf("https://dl.k8s.io/ci/%s.txt", label)
	resp, err := http.Get(ciVersionURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

https://storage.googleapis.com/k8s-release-dev/ci/latest-1.22.txt as well (not sure which is used)
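
A rough shell equivalent of the lookup order in the Go snippet, in case it helps to reproduce the resolution by hand: an explicit `CI_VERSION` wins, otherwise the marker file for the label is read. This is a sketch, not the code the tests run. `resolve_ci_version` and the `FETCH` variable are hypothetical, and the sketch reads the marker from the storage.googleapis.com URL mentioned in this comment rather than dl.k8s.io.

```shell
#!/bin/sh
# FETCH is an assumption: any command that prints the body of a URL and
# exits non-zero on HTTP errors. curl with -f behaves that way.
FETCH="${FETCH:-curl -fsSL}"

# Hypothetical helper: print CI_VERSION if it is set; otherwise print the
# contents of the marker file for a label such as "latest" or "latest-1.22".
resolve_ci_version() {
  label="${1:-latest}"
  if [ -n "${CI_VERSION:-}" ]; then
    printf '%s\n' "$CI_VERSION"
    return 0
  fi
  version="$(${FETCH} "https://storage.googleapis.com/k8s-release-dev/ci/${label}.txt")" || return 1
  # Strip whitespace and newlines; a version string contains none.
  version="$(printf '%s' "$version" | tr -d '[:space:]')"
  [ -n "$version" ] || return 1
  printf '%s\n' "$version"
}

# Hypothetical usage:
#   CI_VERSION="$(resolve_ci_version latest-1.22)" || exit 1
```

Note that a marker file can name a version whose artifacts are missing, so a successful lookup does not guarantee that the binaries can be downloaded.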

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 12, 2021
@jackfrancis
Contributor Author

/test pull-cluster-api-provider-azure-conformance-with-ci-artifacts

@sbueringer
Member

@CecileRobertMichon In case it's helpful: in the cluster-api repo we're using other "latest" files to resolve the version: https://github.com/kubernetes-sigs/cluster-api/blob/master/scripts/ci-e2e-lib.sh#L89-L93

@sbueringer
Member

Ah got it. k8s-release-dev instead of kubernetes-release-dev as you wrote above.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 12, 2021
@CecileRobertMichon
Contributor

Conformance is running, the cluster built successfully. Let's merge this and observe CI signal in testgrid.

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 12, 2021
@sbueringer
Member

/lgtm

@CecileRobertMichon
Contributor

@jackfrancis once this merges, could you please also open a cherry-pick PR to release-0.4 to fix https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-azure#capz-periodic-conformance-v1alpha3-k8s-main-release-0-4 ?

@spiffxp
Member

spiffxp commented Jul 12, 2021

ref: kubernetes/k8s.io#2318

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

update gcr.io/kubernetes-ci-images references to gcr.io/k8s-staging-ci-images
5 participants