This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Various fixes for running e2e tests #67

Merged
merged 18 commits into from
Nov 1, 2017

Conversation

wallrj
Member

@wallrj wallrj commented Oct 30, 2017

See #45 for discussion

NONE

@jetstack-bot
Collaborator

@wallrj: Adding do-not-merge/release-note-label-needed because the release note process has not been followed.

One of the following labels is required "release-note", "release-note-action-required", or "release-note-none".
Please see: https://github.com/kubernetes/community/blob/master/contributors/devel/pull-requests.md#write-release-notes-if-needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wallrj
Member Author

wallrj commented Oct 30, 2017

In #45 (comment) @munnerz wrote:

Can we switch this back? These test two different things - the api may be available whilst the controller is in a crash loop.

I don't mind putting it back, but my thinking is that we shouldn't be poking around at the state of the controller in e2e tests, unless the tests fail.
If the Navigator API is ready, we should be able to create a Navigator database and then expect the database to start up within some time.

And in any case, the controller is running in the tests right now, but failing because RBAC policy prevents its leader election routine from running.

ERROR: logging before flag.Parse: I1030 17:40:31.035544       1 round_trippers.go:417] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: navigator-controller/v1.8.2 (linux/amd64) kubernetes/$Format/leader-election" -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im5hdi1lMmUtbmF2aWdhdG9yLWNvbnRyb2xsZXItdG9rZW4tenRsNDUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibmF2LWUyZS1uYXZpZ2F0b3ItY29udHJvbGxlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImI5MzM3ZmI3LWJkOTYtMTFlNy04ZWQ4LTUyNTQwMDg2NzBlYyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpkZWZhdWx0Om5hdi1lMmUtbmF2aWdhdG9yLWNvbnRyb2xsZXIifQ.rq5OBsceqNzikBLbjIViy-yk1A22nn-dfRuRF_MWZiFjbRwBmMe4ZACk2O06mSPb-GaDvHS6ryeAaEwXNQZB_cyIKKgxPUabcRprgTh0-Ghl6K2w4d77s2gdERb-yBgRjffaa1QGAj_n8M0MbAGOVfHPvs4x8M83QnjIDfwmkIDw0u_-GboOWS1qKyb42sU3tFu7ByoMPqvlV7VX5gXmdJWcSKBoyv7GkoNFP_1bNp_2NieCC_XgmcmGAHOMxUgawFD4idtrMr3I3ReCIoC_p_mKJyqMaQKyNbuINRQ72lqgna3ZgCg7Nlo2h8eqoFvxOvKFWnsPA5HzAeRE7jxJRA" https://10.0.0.1:443/api/v1/namespaces/kube-system/endpoints/navigator-controller
ERROR: logging before flag.Parse: I1030 17:40:31.158865       1 round_trippers.go:436] GET https://10.0.0.1:443/api/v1/namespaces/kube-system/endpoints/navigator-controller 403 Forbidden in 123 milliseconds
ERROR: logging before flag.Parse: I1030 17:40:31.158889       1 round_trippers.go:442] Response Headers:
ERROR: logging before flag.Parse: I1030 17:40:31.158894       1 round_trippers.go:445]     Date: Mon, 30 Oct 2017 17:40:31 GMT
ERROR: logging before flag.Parse: I1030 17:40:31.158898       1 round_trippers.go:445]     Content-Type: text/plain
ERROR: logging before flag.Parse: I1030 17:40:31.158901       1 round_trippers.go:445]     X-Content-Type-Options: nosniff
ERROR: logging before flag.Parse: I1030 17:40:31.158905       1 round_trippers.go:445]     Content-Length: 118
ERROR: logging before flag.Parse: I1030 17:40:31.158928       1 request.go:836] Response Body: User "system:serviceaccount:default:nav-e2e-navigator-controller" cannot get endpoints in the namespace "kube-system".
ERROR: logging before flag.Parse: E1030 17:40:31.158961       1 leaderelection.go:224] error retrieving resource lock kube-system/navigator-controller: User "system:serviceaccount:default:nav-e2e-navigator-controller" cannot get endpoints in the namespace "kube-system". (get endpoints navigator-controller)
ERROR: logging before flag.Parse: I1030 17:40:31.158970       1 leaderelection.go:180] failed to acquire lease kube-system/navigator-controller

We should fix that.
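For illustration, the kind of RBAC grant that would unblock leader election might look like the following (a hypothetical sketch only, not the actual fix; the service account and lock names are taken from the log above, and the controller may additionally need `create` on endpoints to make the lock object initially):

```yaml
# Hypothetical sketch, not the project's actual manifest: grant the
# controller's service account access to the endpoints object it uses
# as a leader-election lock in kube-system.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: navigator-leader-election
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  resourceNames: ["navigator-controller"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: navigator-leader-election
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: navigator-leader-election
subjects:
- kind: ServiceAccount
  name: nav-e2e-navigator-controller
  namespace: default
```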

@munnerz
Contributor

munnerz commented Oct 31, 2017

@wallrj agreed that it'd be nice to do a full test like that, but right now we don't have the testing infra to stand up full DB clusters due to resource limits on Travis. I'm working to fix this via our new test-infra, but for the time being, it's important we know whether the controller will work with the default configuration, so I'd like to keep it in.

Yep, we should fix that (#68), and we should also update the tests so they fail as a result of leader election failing (perhaps through a health check). But this second part is lower priority, since failing leader election is unlikely to be the cause of test failures (once we fix #68).

.travis.yml Outdated
@@ -18,9 +18,6 @@ jobs:
# Create a cluster. We do this as root as we are using the 'docker' driver.
# We enable RBAC on the cluster too, to test the RBAC in Navigators chart
- sudo -E CHANGE_MINIKUBE_NONE_USER=true minikube start --vm-driver=none --kubernetes-version="$KUBERNETES_VERSION" --extra-config=apiserver.Authorization.Mode=RBAC
- while true; do if kubectl get nodes; then break; fi; echo "Waiting 5s for kubernetes to be ready..."; sleep 5; done
# Fix problems with kube-dns + rbac
- kubectl create serviceaccount --namespace kube-system kube-dns
Member Author

@wallrj wallrj Oct 31, 2017


I removed these steps because they are already performed in the prepare-e2e.sh script.

retry TIMEOUT=600 kubectl get nodes

echo "Waiting for tiller to be ready..."
retry TIMEOUT=60 helm version
Member Author

And these steps are also performed in prepare-e2e.sh, so I removed them.
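For reference, a minimal sketch of what a `retry` helper like the one called above might look like (an assumption for illustration; the project's actual implementation in its e2e scripts may differ):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a retry helper (not the project's actual
# implementation). Accepts an optional leading TIMEOUT=<seconds> argument,
# matching the call style `retry TIMEOUT=600 kubectl get nodes`, and
# re-runs the command until it succeeds or the timeout elapses.
retry() {
    local timeout=60
    case "$1" in
        TIMEOUT=*) timeout="${1#TIMEOUT=}"; shift ;;
    esac
    local start
    start="$(date +%s)"
    until "$@"; do
        if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep 5
    done
}

retry TIMEOUT=10 echo "kubernetes is ready"
```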

kubectl logs -c controller -l app=navigator,component=controller
exit 1
FAILURE_COUNT=$(($FAILURE_COUNT+1))
echo "TEST FAILURE: $1"
}
Member Author

This is so that all the tests are performed, rather than failing on the first test.
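The accumulate-failures pattern can be sketched as follows (a simplified sketch with hypothetical checks, not the actual e2e script):

```shell
#!/usr/bin/env bash
# Simplified sketch of the accumulate-failures pattern (not the actual
# e2e script). Each failed check increments a counter instead of exiting
# immediately, so every check gets a chance to run.
FAILURE_COUNT=0

fail_test() {
    FAILURE_COUNT=$((FAILURE_COUNT + 1))
    echo "TEST FAILURE: $1" >&2
}

# Hypothetical checks: all three run even though two of them fail.
true  || fail_test "first check failed"
false || fail_test "second check failed"
false || fail_test "third check failed"

echo "Total failures: ${FAILURE_COUNT}"
# The real script would exit non-zero here if FAILURE_COUNT > 0,
# so CI still marks the run as failed.
```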

--namespace "${USER_NAMESPACE}" \
service es-demo; then
fail_test "Navigator controller failed to create elasticsearchcluster service"
fi
Member Author

Added a check that a service is created, which is a clear indication that the navigator controller is reacting to the new ElasticsearchCluster resource.

CERT_DIR="$CONFIG_DIR/certs"
mkdir -p $CERT_DIR
TEST_DIR="$CONFIG_DIR/tmp"
mkdir -p $TEST_DIR
Member Author

None of this stuff is needed in this script.

kind: ServiceAccount
metadata:
name: kube-dns
namespace: kube-system
Member Author

Added this here rather than creating the service account in the travis.yml file.

Contributor

My reasoning behind the separation is that we are not opinionated in our e2e test scripts about using minikube. It'd be ideal if the e2e tests could be run against any valid, functioning Kubernetes cluster (this is more relevant when it comes to us running our tests on GCE/GKE).

Happy to accept this for now, but I think there'll be more refactoring of this stuff soon 😄

Member Author

Ah, got it. Well thanks for merging this anyway.

@wallrj
Member Author

wallrj commented Oct 31, 2017

@munnerz OK. I've added back the check for apiserver and controller replicas, and added another check for `kubectl get esc`, which was also failing when I ran the tests locally.

I know you're planning to create e2e tests with Ginkgo / prow / gce, but I implore you to merge this change so that I can continue to perform some quick local e2e tests on my laptop.

@wallrj
Member Author

wallrj commented Nov 1, 2017

The e2e test failed because of a failed `kubectl get esc`, which I suspect was caused by kube-dns failing:

+kubectl get pods --all-namespaces
NAMESPACE     NAME                                                                 READY     STATUS             RESTARTS   AGE
default       nav-e2e-navigator-apiserver-4058778657-jdmx9                         2/2       Running            0          1m
default       nav-e2e-navigator-controller-2991077948-7l4s5                        1/1       Running            0          1m
kube-system   kube-addon-manager-travis-job-345a1919-cc3d-44d6-8177-e50007723a0a   1/1       Running            0          8m
kube-system   kube-dns-1326421443-q1p8c                                            2/3       CrashLoopBackOff   7          8m
kube-system   kubernetes-dashboard-qb2lw                                           0/1       CrashLoopBackOff   6          8m
kube-system   tiller-deploy-3341511835-k8zxt                                       1/1       Running            0          1m

/retest

@munnerz
Contributor

munnerz commented Nov 1, 2017

@wallrj /retest does not work for Travis jobs :(

I've hit restart. Are you able to restart jobs on travis?

@wallrj
Member Author

wallrj commented Nov 1, 2017

Ah yes, I see it here: https://travis-ci.org/jetstack/navigator/pull_requests

Thanks @munnerz

@munnerz
Contributor

munnerz commented Nov 1, 2017

/lgtm
/approve

@jetstack-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: munnerz

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@jetstack-bot
Collaborator

Automatic merge from submit-queue.

@jetstack-bot jetstack-bot merged commit 3c8a77e into jetstack:master Nov 1, 2017
@wallrj wallrj deleted the 27-e2e-fixes branch November 1, 2017 15:31