Update e2e tests to use k8s v1.22.1 #1588

Merged
k8s-ci-robot merged 2 commits into kubernetes-sigs:main from the k8s-1220 branch on Aug 24, 2021

Contributor

@nader-ziada nader-ziada commented Aug 6, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:

  • update e2e tests to use k8s v1.22.1

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Update e2e tests to use k8s v1.22.1

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/provider/azure Issues or PRs related to azure provider labels Aug 6, 2021
@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 6, 2021
@nader-ziada
Contributor Author

kept WINDOWS_KUBERNETES_VERSION for now until we validate kubernetes-sigs/cloud-provider-azure#706 is fixed

@CecileRobertMichon
Contributor

it looks like Windows deployments are failing consistently in CI

/cc @jsturtevant

@jsturtevant
Contributor

> it looks like Windows deployments are failing consistently in CI

Looks like the pod isn't starting up correctly. Kubelet reports running but logs for kubelet are missing. I can take a look tomorrow but will likely need to spin it up locally to diagnose further.

@jsturtevant
Contributor

The failure message was in the log files (https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/1588/pull-cluster-api-provider-azure-e2e-windows/1423694792830750720/artifacts/clusters/capz-e2e-jnjsf2-win-ha/kube-system/kube-flannel-ds-amd64-626xq/kube-flannel.log).

Basically flannel was in a crash loop backoff.

I0806 17:47:45.335070       1 main.go:518] Determining IP address of default interface
I0806 17:47:45.335705       1 main.go:531] Using interface with name eth0 and address 10.0.0.6
I0806 17:47:45.335725       1 main.go:548] Defaulting external address to interface address (10.0.0.6)
W0806 17:47:45.335745       1 client_config.go:517] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
E0806 17:47:45.357087       1 main.go:243] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-amd64-626xq': pods "kube-flannel-ds-amd64-626xq" is forbidden: User "system:serviceaccount:kube-system:flannel" cannot get resource "pods" in API group "" in the namespace "kube-system"

In 1.22, a lot of the APIs got upgraded. I found this and fixed it as part of #1388 with commit dcc9efa.

Should I open a separate PR, or do you want to include it in this one?
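
For context, one plausible reading of the error above (an assumption, since the contents of commit dcc9efa are not shown here): Kubernetes 1.22 stopped serving the rbac.authorization.k8s.io/v1beta1 API, so a flannel manifest that still declares its ClusterRole and ClusterRoleBinding under v1beta1 would fail to create them, leaving the flannel service account without permission to get pods. A minimal sketch of the RBAC objects under the GA API version, modeled on the upstream kube-flannel manifest rather than on this repository's template, might look like:

```yaml
# Sketch only: rules follow the upstream kube-flannel manifest, not
# necessarily the template used by this repository's e2e tests.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1   # v1beta1 is no longer served in 1.22
metadata:
  name: flannel
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get"]          # the permission the log above reports as missing
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/status"]
    verbs: ["patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
  - kind: ServiceAccount
    name: flannel           # matches system:serviceaccount:kube-system:flannel
    namespace: kube-system
```

With objects like these applied, `kubectl auth can-i get pods --as=system:serviceaccount:kube-system:flannel -n kube-system` should answer yes.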

@nader-ziada
Contributor Author

/test pull-cluster-api-provider-azure-e2e

1 similar comment
@nader-ziada
Contributor Author

/test pull-cluster-api-provider-azure-e2e

@CecileRobertMichon
Contributor

/retest
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 12, 2021
@nader-ziada
Contributor Author

Investigating the failure: it seems the private cluster bastion host gets stuck or takes a long time, and the test times out.

@nader-ziada
Contributor Author

/test pull-cluster-api-provider-azure-e2e

3 similar comments
@nader-ziada
Contributor Author

/test pull-cluster-api-provider-azure-e2e

@nader-ziada
Contributor Author

/test pull-cluster-api-provider-azure-e2e

@nader-ziada
Contributor Author

/test pull-cluster-api-provider-azure-e2e

@nader-ziada
Contributor Author

I'll try the e2e again after the Calico PR merged

/test pull-cluster-api-provider-azure-e2e

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2021
@CecileRobertMichon
Contributor

@nader-ziada any chance we need newer CAPI changes to allow k8s 1.22 management clusters? We could try to update to CAPI v0.4.1 in case it helps

@nader-ziada
Contributor Author

> @nader-ziada any chance we need newer CAPI changes to allow k8s 1.22 management clusters? We could try to update to CAPI v0.4.1 in case it helps

It's one specific test that keeps failing, but I can't repro it locally.

@CecileRobertMichon
Contributor

@nader-ziada could you try 1.21.4 to see if the same issue repros? 1.21.4 also has the fix for kubernetes-sigs/cloud-provider-azure#706

@nader-ziada nader-ziada force-pushed the k8s-1220 branch 2 times, most recently from 19ac7ee to ecf3eed Compare August 23, 2021 15:30
@nader-ziada
Contributor Author

> @nader-ziada could you try 1.21.4 to see if the same issue repros? 1.21.4 also has the fix for kubernetes-sigs/cloud-provider-azure#706

Did that, and rebased to pick up the update to CAPI v0.4.1 (which finally merged).

@nader-ziada
Contributor Author

/test pull-cluster-api-provider-azure-e2e-full

@nader-ziada
Contributor Author

@CecileRobertMichon all the tests passed in the last run except for the GPU one, which I don't think is because of this change, but I will try again now

/test pull-cluster-api-provider-azure-e2e-full

@CecileRobertMichon
Contributor

/test pull-cluster-api-provider-azure-capi-e2e

@@ -96,22 +96,22 @@ providers:
targetName: "cluster-template-custom-vnet.yaml"

variables:
-  KUBERNETES_VERSION: "${KUBERNETES_VERSION:-v1.21.2}"
+  KUBERNETES_VERSION: "${KUBERNETES_VERSION:-v1.22.0}"
Contributor

1.22.1 is out now, should we rev all the 1.22.0 versions to 1.22.1?

@nader-ziada
Contributor Author

updated to v1.22.1

@CecileRobertMichon
Contributor

@nader-ziada can you please update the PR title, release note, and commit message to reflect 1.22.1?

let's get this merged asap, a lot of other PRs are getting hit by LB flakes which will hopefully be fixed by this PR

@nader-ziada nader-ziada changed the title Update e2e tests to use k8s v1.22.0 Update e2e tests to use k8s v1.22.1 Aug 23, 2021
@nader-ziada
Contributor Author

> @nader-ziada can you please update the PR title, release note, and commit message to reflect 1.22.1?
>
> let's get this merged asap, a lot of other PRs are getting hit by LB flakes which will hopefully be fixed by this PR

done with the updates for title, release notes, and commit message

@CecileRobertMichon
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

# using a different version for windows because of an issue on azure cloud provider
# that only affects windows and external load balancer
# https://github.com/kubernetes-sigs/cloud-provider-azure/issues/706
-  WINDOWS_KUBERNETES_VERSION: "${WINDOWS_KUBERNETES_VERSION:-v1.19.11}"
+  WINDOWS_KUBERNETES_VERSION: "${WINDOWS_KUBERNETES_VERSION:-v1.22.1}"
Contributor

we can remove this now, let's follow up on that after this merges though

Contributor Author

I'll do a followup PR after this is done

@k8s-ci-robot
Contributor

k8s-ci-robot commented Aug 24, 2021

@nader-ziada: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
pull-cluster-api-provider-azure-e2e-full | ecf3eed | link | /test pull-cluster-api-provider-azure-e2e-full


@CecileRobertMichon
Contributor

VMSS flake :sigh: which is NOT an ELB flake, so that's good

/retest

@k8s-ci-robot k8s-ci-robot merged commit e9a2f2a into kubernetes-sigs:main Aug 24, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

4 participants