e2e: reject unknown providers #73402

Merged
merged 3 commits into from
Feb 4, 2019

Conversation

pohly
Contributor

@pohly pohly commented Jan 28, 2019

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

This finishes the work started for 1.13: instead of merely warning
about an unknown value given to --provider, the test/e2e/e2e.test
binary will now print an error and refuse to run.

That was the intended behavior all along; we just couldn't do it earlier because it broke some users of the binary. At least kubeadm testing is now fixed (kubernetes/test-infra#10913).

Which issue(s) this PR fixes:

Fixes #70200

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

e2e.test now rejects unknown --provider values instead of merely warning about them. An empty provider name is not accepted anymore and was replaced by "skeleton" (= a provider with no special behavior).

/sig testing
/cc @neolit123 @timothysc

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jan 28, 2019
@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/testing Categorizes an issue or PR as relevant to SIG Testing. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 28, 2019
Member

@neolit123 neolit123 left a comment

thanks @pohly

/area test
/priority important-longterm
/lgtm

@kubernetes/sig-testing-pr-reviews
sent a ping in the #sig-testing chat as well.

@k8s-ci-robot k8s-ci-robot added area/test priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 28, 2019
@neolit123
Member

pull-kubernetes-node-e2e this one is failing on all PRs ATM.

@BenTheElder
Member

Needs gofmting for verify.

@pohly pohly force-pushed the e2e-vendor-parameter branch from 9d3ed49 to b4384e9 Compare January 28, 2019 15:35
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 28, 2019
@pohly
Contributor Author

pohly commented Jan 28, 2019

/test ci-kubernetes-e2e-kubeadm-gce-1-13

Let's see whether the kubeadm job still works. I hope this runs it...

@neolit123
Member

/test ci-kubernetes-e2e-kubeadm-gce-1-13
Let's see whether the kubeadm job still works. I hope this runs it...

that job moved to skeleton
one that i forgot to modify is:
https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/sig-testing/bazel-build-test.yaml#L63

sending a PR in a bit.

@neolit123
Member

@pohly
Contributor Author

pohly commented Jan 28, 2019

/test pull-kubernetes-node-e2e

@neolit123
Member

pull-kubernetes-node-e2e

this one might fail for a while due to a GCE zone outage.

@@ -264,7 +266,7 @@ func RegisterClusterFlags() {
flag.StringVar(&TestContext.KubeVolumeDir, "volume-dir", "/var/lib/kubelet", "Path to the directory containing the kubelet volumes.")
flag.StringVar(&TestContext.CertDir, "cert-dir", "", "Path to the directory containing the certs. Default is empty, which doesn't use certs.")
flag.StringVar(&TestContext.RepoRoot, "repo-root", "../../", "Root directory of kubernetes repository, for finding test files.")
-flag.StringVar(&TestContext.Provider, "provider", "", "The name of the Kubernetes provider (gce, gke, local, etc.)")
+flag.StringVar(&TestContext.Provider, "provider", "", "The name of the Kubernetes provider (gce, gke, local, skeleton, etc.)")
Member

Why not default to skeleton here?

Contributor Author

Because "" traditionally had the meaning "fallback to 'local' with a specific warning". Selecting "skeleton" would change that.

I don't have any strong opinion either way, but as I don't know the background behind that behavior I don't want to be the one to change it.

Member

@neolit123 neolit123 Jan 28, 2019

so both local and skeleton are null providers here:
https://github.com/kubernetes/kubernetes/blob/b4384e9f828b97c380b93483bea279d2a71d1d00/test/e2e/framework/provider.go#L60-L66

if the previous fallback from "" as provider was local then today setting to skeleton as default should not have a regressive effect (famous last words).

local and skeleton in test-infra/kubetest on the other hand are different.

Contributor Author

The question is whether anyone cares about having this log message when --provider is not set:
https://github.com/kubernetes/kubernetes/blob/b4384e9f828b97c380b93483bea279d2a71d1d00/test/e2e/framework/provider.go#L69

Member

If we hard fail we should have a sane default imo.

Member

@neolit123 neolit123 Jan 29, 2019

i'm convinced that log-dump.sh is evil at this point.
needs :atom: + 💣

Member

@pohly so "" already does not fall back to "local" if i'm not wrong, as per:
https://github.com/kubernetes/kubernetes/blob/b4384e9f828b97c380b93483bea279d2a71d1d00/test/e2e/framework/provider.go#L68-L71

it effectively falls back to an unnamed null-provider in 1.13 and master, so from what i'm seeing we already modified the old behavior.

@timothysc

If we hard fail we should have a sane default imo.

from my understanding we are not failing but instead we are only showing a warning if --provider is not set (equals "").

Contributor Author

@neolit123 right, it currently basically falls back to "skeleton plus warning". Forget what I said about falling back to "local". I think that the current behavior is consistent with <1.13 but haven't checked.

I'm in favor of removing this odd log message and just making skeleton the explicit default. Any objections?

Member

+1

Contributor Author

Done. See latest commit in this PR.

Failf("Failed to setup fallback skeleton provider config: %v", err)
if os.IsNotExist(errors.Cause(err)) {
// Provide a more helpful error message when the provider is unknown.
var providers []string
Member

Please add a //TODO: when providers are extracted this code should be removed.

Contributor Author

Okay.

Contributor Author

I've added a TODO and link to #70194

I've kept the wording a bit more open-ended because I can imagine scenarios where kubernetes/kubernetes itself has no provider-specific code, but still supports building custom testsuites where such code is included. That would allow different cloud providers to share common tests in a neutral location when those tests depend on cloud-provider specific code. Whether that is something that Kubernetes as an organization wants to support I don't know.

} else {
klog.Errorf("Failed to setup provider config for %q: %v", TestContext.Provider, err)
}
os.Exit(1)
Member

I'm not really a fan, but if your long term goal is to extract provider code then I'll be ok with this if there are tracking/umbrella issues that outline a path forward.

Contributor Author

I personally don't have plans in that direction, but it is getting tracked here: #70194
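
For readers following the review thread above, here is a minimal sketch of the error path under discussion: distinguish "unknown provider" from other setup failures and list the known names before exiting. It is not the framework's code. The names `knownProviders`, `setupProvider`, and `describeSetupError` are hypothetical, and the sketch swaps in standard-library error wrapping (`%w` with `errors.Is`) for the `errors.Cause` call seen in the diff fragment. The message wording mirrors the fragment and the CI log quoted later in this conversation.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sort"
	"strings"
)

// providerFactory is a hypothetical constructor type; the real framework's
// factory signature may differ.
type providerFactory func() (string, error)

// knownProviders is a hypothetical registry with two no-op entries, standing
// in for the "local" and "skeleton" null providers mentioned in the thread.
var knownProviders = map[string]providerFactory{
	"local":    func() (string, error) { return "local", nil },
	"skeleton": func() (string, error) { return "skeleton", nil },
}

// setupProvider wraps os.ErrNotExist when the name is not registered, so the
// caller can tell an unknown provider apart from any other setup failure.
func setupProvider(name string) (string, error) {
	factory, ok := knownProviders[name]
	if !ok {
		return "", fmt.Errorf("provider %q: %w", name, os.ErrNotExist)
	}
	return factory()
}

// describeSetupError builds the message to log before the binary gives up.
func describeSetupError(name string, err error) string {
	if errors.Is(err, os.ErrNotExist) {
		// Provide a more helpful message when the provider is unknown.
		names := make([]string, 0, len(knownProviders))
		for n := range knownProviders {
			names = append(names, n)
		}
		sort.Strings(names) // map iteration order is random, so sort for stable output
		return fmt.Sprintf("Unknown provider %q. The following providers are known: %s",
			name, strings.Join(names, " "))
	}
	return fmt.Sprintf("Failed to setup provider config for %q: %v", name, err)
}

func main() {
	if _, err := setupProvider("bogus"); err != nil {
		// A strict binary would log this and then call os.Exit(1).
		fmt.Println(describeSetupError("bogus", err))
	}
}
```

Wrapping a sentinel error keeps the "is this name registered?" decision inside the registry while leaving the wording of the message to the caller.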

This finishes the work started for 1.13: instead of merely warning
about an unknown value given to --provider, the test/e2e/e2e.test
binary will now print an error and refuse to run.

Fixes: kubernetes#70200
@pohly pohly force-pushed the e2e-vendor-parameter branch from b4384e9 to f3d79e1 Compare January 28, 2019 18:57
The empty string was the default and then triggered a special
warning. There's no good reason for that behavior, so now the special
handling for "unset provider" is gone and "skeleton" is the non-empty
default for the value.
@pohly
Contributor Author

pohly commented Jan 29, 2019

/retest

@neolit123
Member

/lgtm
thanks for the update @pohly

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 29, 2019
@BenTheElder
Member

/retest

@pohly
Contributor Author

pohly commented Jan 30, 2019

/retest

@pohly
Contributor Author

pohly commented Jan 31, 2019

/test pull-kubernetes-node-e2e

@neolit123
Member

/skip

@neolit123
Member

@pohly
it seems the node-e2e test is failing with:

I0131 07:24:17.735] [8] E0131 07:24:13.285727    1314 test_context.go:420] Unknown provider "". The following providers are known: local skeleton
I0131 07:24:17.735] 
I0131 07:24:17.735] Ginkgo ran 1 suite in 3.021294514s
I0131 07:24:17.735] Test Suite Failed

hm, i thought "" should be defaulting to skeleton at this point.

@pohly
Contributor Author

pohly commented Jan 31, 2019

@neolit123 thanks for investigating the error, I wasn't sure whether it was just flakiness or caused by the PR.

The --provider parameter now defaults to skeleton when not set. But if it is passed, a valid provider must be specified, so --provider= is currently not okay.

Should --provider= be allowed? Do we then want to proceed with an empty TestContext.Provider string or silently continue as if --provider=skeleton had been used?
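
The distinction described here follows from how Go's standard `flag` package treats defaults: the default value applies only when the flag is absent, while `--provider=` explicitly sets the empty string. A small sketch, assuming a throwaway `FlagSet` rather than the framework's real flag registration (`parseProvider` and the flag set name are made up for illustration):

```go
package main

import (
	"flag"
	"fmt"
)

// parseProvider parses args with a "provider" flag whose default is
// "skeleton". It returns the resulting value and whether the flag was
// actually present on the command line.
func parseProvider(args []string) (string, bool, error) {
	fs := flag.NewFlagSet("e2e-sketch", flag.ContinueOnError)
	provider := fs.String("provider", "skeleton", "The name of the Kubernetes provider")
	if err := fs.Parse(args); err != nil {
		return "", false, err
	}
	explicit := false
	// Visit only walks flags that were set, unlike VisitAll.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "provider" {
			explicit = true
		}
	})
	return *provider, explicit, nil
}

func main() {
	for _, args := range [][]string{{}, {"--provider="}, {"--provider=gce"}} {
		value, explicit, err := parseProvider(args)
		fmt.Printf("args=%q value=%q explicit=%v err=%v\n", args, value, explicit, err)
	}
}
```

With no arguments the value is `skeleton`; with `--provider=` it is the empty string, which a strict lookup then rejects unless the empty name is handled separately.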

@neolit123
Member

neolit123 commented Jan 31, 2019

Should --provider= be allowed? Do we then want to proceed with an empty TestContext.Provider string or silently continue as if --provider=skeleton had been used?

my initial reaction is that we should print a message that we are defaulting "" -> skeleton for both these cases:
A) --provider=
B) missing --provider flag

Not accepting --provider= (i.e. setting an empty provider name) broke
some test jobs. As suggested in
kubernetes#73402 (comment),
now --provider= and not passing --provider at all both trigger a
message and then continue as if --provider=skeleton had been used.
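
One way to picture the behavior this commit message describes is a normalization step that runs before the provider lookup. The sketch below is an illustration only: `effectiveProvider` and `defaultProvider` are hypothetical names, and the message text is invented rather than taken from the actual commit.

```go
package main

import (
	"fmt"
	"os"
)

// defaultProvider is the name used when none was given (per the commit message).
const defaultProvider = "skeleton"

// effectiveProvider maps an empty provider name to the default and reports
// whether it did so, so the caller can print a message. An unset flag and an
// explicit --provider= both arrive here as the empty string.
func effectiveProvider(requested string) (string, bool) {
	if requested == "" {
		return defaultProvider, true
	}
	return requested, false
}

func main() {
	for _, requested := range []string{"", "gce"} {
		name, defaulted := effectiveProvider(requested)
		if defaulted {
			// Hypothetical wording; the real message may differ.
			fmt.Fprintf(os.Stderr, "no provider specified, continuing as if --provider=%s had been used\n", name)
		}
		fmt.Println("provider:", name)
	}
}
```

Treating both cases the same way means callers never need to know whether the flag was present, only what name to look up.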
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2019
@pohly
Contributor Author

pohly commented Jan 31, 2019

Pushed another commit with that change.

@neolit123
Member

thank you.
if CI passes and unless others have objections to the above, we can squash the commits and merge the PR.

@k8s-ci-robot
Contributor

k8s-ci-robot commented Jan 31, 2019

@pohly: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kops-aws dde3445 link /test pull-kubernetes-e2e-kops-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@pohly
Contributor Author

pohly commented Jan 31, 2019

/test pull-kubernetes-verify

@neolit123
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2019
@pohly
Contributor Author

pohly commented Jan 31, 2019

@neolit123 do you still want me to squash? I think it is not necessary, each commit makes sense by itself, and I would prefer to merge it like it is now (given that we have clean test results, an lgtm, etc.).

@neolit123
Member

I think it is not necessary, each commit makes sense by itself, and I would prefer to merge it like it is now

seems fine to me!
would leave the call on that to @timothysc

@pohly
Contributor Author

pohly commented Jan 31, 2019

@timothysc please have another look and if you agree then approve the PR.

Member

@timothysc timothysc left a comment

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2019
@k8s-ci-robot k8s-ci-robot merged commit 732cb10 into kubernetes:master Feb 4, 2019
@neolit123
Member

neolit123 commented Feb 5, 2019

looks like this PR still managed to break our signal even if it wasn't supposed to:
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-all#kubeadm-gce-master

will investigate why.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/testing Categorizes an issue or PR as relevant to SIG Testing.
Development

Successfully merging this pull request may close these issues.

e2e: strict --provider checking
5 participants