Readiness probe failed: HTTP probe failed with statuscode: 502 #603

bramvdklinkenberg opened this issue Jun 1, 2018 · 20 comments


I am trying to deploy the Zalenium Helm chart in my newly deployed AKS Kubernetes (1.9.6) cluster in Azure, but I can't get it to work. The pod gives the log below:

[bram@xforce zalenium]$ kubectl logs -f zalenium-zalenium-hub-6bbd86ff78-m25t2
Kubernetes service account found.
Copying files for Dashboard...
cp: cannot create regular file '/home/seluser/videos/index.html': Permission denied
cp: cannot create directory '/home/seluser/videos/css': Permission denied
cp: cannot create directory '/home/seluser/videos/js': Permission denied
Starting Nginx reverse proxy...
Starting Selenium Hub...
..........08:49:14.052 [main] INFO o.o.grid.selenium.GridLauncherV3 - Selenium build info: version: '3.12.0', revision: 'unknown'
08:49:14.120 [main] INFO o.o.grid.selenium.GridLauncherV3 - Launching Selenium Grid hub on port 4445
...08:49:15.125 [main] INFO d.z.e.z.c.k.KubernetesContainerClient - Initialising Kubernetes support
..08:49:15.650 [main] WARN d.z.e.z.c.k.KubernetesContainerClient - Error initialising Kubernetes support.
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [zalenium-zalenium-hub-6bbd86ff78-m25t2] in namespace: [default] failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(
    at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.<init>(
    at de.zalando.ep.zalenium.container.ContainerFactory.createKubernetesContainerClient(
    at de.zalando.ep.zalenium.container.ContainerFactory.getContainerClient(
    at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.<clinit>(
    at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(
    at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
    at java.lang.reflect.Constructor.newInstance(
    at java.lang.Class.newInstance(
    at org.openqa.grid.web.Hub.<init>(
    at org.openqa.grid.selenium.GridLauncherV3$2.launch(
    at org.openqa.grid.selenium.GridLauncherV3.launch(
    at org.openqa.grid.selenium.GridLauncherV3.main(
Caused by: Hostname kubernetes.default.svc not verified: certificate: sha256/OyzkRILuc6LAX4YnMAIGrRKLmVnDgLRvCasxGXDhSoc= DN: CN=client, O=system:masters subjectAltNames: []
    at okhttp3.internal.connection.RealConnection.connectTls(
    at okhttp3.internal.connection.RealConnection.establishProtocol(
    at okhttp3.internal.connection.RealConnection.connect(
    at okhttp3.internal.connection.StreamAllocation.findConnection(
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(
    at okhttp3.internal.connection.StreamAllocation.newStream(
    at okhttp3.internal.connection.ConnectInterceptor.intercept(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at okhttp3.internal.cache.CacheInterceptor.intercept(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at okhttp3.internal.http.BridgeInterceptor.intercept(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at okhttp3.internal.http.RealInterceptorChain.proceed(
    at okhttp3.RealCall.getResponseWithInterceptorChain(
    at okhttp3.RealCall.execute(
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(
    ... 16 common frames omitted
08:49:15.651 [main] INFO d.z.e.z.c.k.KubernetesContainerClient - About to clean up any left over selenium pods created by Zalenium
Usage: <main class> [options] Options: --debug, -debug <Boolean> : enables LogLevel.FINE.
Specify multiple on the command line: -withoutServlet -withoutServlet
org.openqa.grid.common.exception.GridConfigurationException: Error creating class with de.zalando.ep.zalenium.registry.ZaleniumRegistry : null
    at org.openqa.grid.web.Hub.<init>(
    at org.openqa.grid.selenium.GridLauncherV3$2.launch(
    at org.openqa.grid.selenium.GridLauncherV3.launch(
    at org.openqa.grid.selenium.GridLauncherV3.main(
Caused by: java.lang.ExceptionInInitializerError
    at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(
    at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
    at java.lang.reflect.Constructor.newInstance(
    at java.lang.Class.newInstance(
    at org.openqa.grid.web.Hub.<init>(
    ... 3 more
Caused by: java.lang.NullPointerException
    at java.util.TreeMap.putAll(
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.withLabels(
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.withLabels(
    at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.deleteSeleniumPods(
    at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.initialiseContainerEnvironment(
    at de.zalando.ep.zalenium.container.ContainerFactory.createKubernetesContainerClient(
    at de.zalando.ep.zalenium.container.ContainerFactory.getContainerClient(
    at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.<clinit>(
    ... 11 more
...........................................................................................................................................................................................GridLauncher failed to start after 1 minute, failing...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   182  100   182    0     0  36103      0 --:--:-- --:--:-- --:--:-- 45500

A describe pod gives:
Warning Unhealthy 4m (x12 over 6m) kubelet, aks-agentpool-93668098-0 Readiness probe failed: HTTP probe failed with statuscode: 502

Zalenium Image Version(s):

If using Kubernetes, specify your environment, and if relevant your manifests:
I use the templates as is from

Expected Behavior -

The zalenium pods to run

Actual Behavior - See above

bramvdklinkenberg commented Jun 1, 2018

I guess it has something to do with RBAC because of this part:
"Error initialising Kubernetes support. io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [zalenium-zalenium-hub-6bbd86ff78-m25t2] in namespace: [default] failed."

I created a clusterrole and clusterrolebinding for the service account zalenium-zalenium that is automatically created by the helm chart.

kubectl create clusterrole zalenium --verb=get,list,watch,update,delete,create,patch --resource=pods,deployments,secrets

kubectl create clusterrolebinding zalenium --clusterrole=zalenium --serviceaccount=default:zalenium-zalenium

pearj commented Jun 2, 2018

This problem:
cp: cannot create regular file '/home/seluser/videos/index.html': Permission denied
cp: cannot create directory '/home/seluser/videos/css': Permission denied
cp: cannot create directory '/home/seluser/videos/js': Permission denied
is because you need to mount a volume at /home/seluser/videos.

Regarding the role and rolebinding, take a look at:

I think whoever contributed the helm chart wasn't using a cluster that had RBAC enabled.

bramvdklinkenberg commented Jun 2, 2018

@pearj I deployed it on an AKS cluster which has RBAC disabled (for now).
Even with the clusterrole and clusterrolebinding given in the plumbing YAML I still get the same error.

Instead of the Helm deployment I also tried the separate YAML files. I created a clusterrole and clusterrolebinding, but I get the same error.

Locally with minikube I get it to work when I bind the zalenium-zalenium service account to the cluster-admin clusterrole with a clusterrolebinding.

I deployed the application the exact same way on an ACS cluster (Azure) and it works.

diemol commented Jun 5, 2018

Do you know what the difference is between ACS and AKS?

Besides that, I am not sure how to help, and you might actually be the first one trying to deploy Zalenium on AKS :) So I hope you get some success there and perhaps you can help us to improve the docs!

The differences shouldn't be much. AKS is a Kubernetes PaaS solution on Azure and ACS is also a Kubernetes service, but more IaaS. Neither has RBAC enabled. The only difference k8s-wise is that ACS is running 1.7.7 and AKS is running 1.9.6.
I'm going to test whether Zalenium works in AKS with version 1.7.7.

It works with AKS and k8s version 1.7.7... but it also works with minikube and k8s version 1.10.0...
I'm a bit lost at the moment as to why it doesn't work with AKS and k8s v1.9.6 and higher.
Going to dive into it.

pearj commented Jun 6, 2018

Maybe you’re not allowed to create cluster role bindings in AKS? Only role bindings?
Zalenium doesn’t specifically need a cluster role binding. You could grant the admin role for the namespace to the service account.
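A minimal sketch of that namespace-scoped alternative, assuming the service account is zalenium-zalenium in the default namespace (as in the logs above) and that the built-in admin ClusterRole is acceptable. The helper name and the binding name zalenium-admin are made up for illustration:

```shell
# Bind the built-in "admin" ClusterRole to a service account through a
# RoleBinding, so the permissions apply to a single namespace only.
# Hypothetical helper; the binding name "zalenium-admin" is an assumption.
grant_namespace_admin() {
  namespace="$1"        # e.g. default
  service_account="$2"  # e.g. zalenium-zalenium
  kubectl create rolebinding zalenium-admin \
    --clusterrole=admin \
    --serviceaccount="${namespace}:${service_account}" \
    --namespace="${namespace}"
}
```

It would be called as `grant_namespace_admin default zalenium-zalenium`. Note that `--serviceaccount` expects the `namespace:name` form.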

I can create clusterrolebindings, but that shouldn't be the issue since RBAC is not enabled on AKS (or ACS).
I can just do a helm install of the chart without having to do anything with clusterrolebindings.
It only fails on AKS with k8s v1.9.6.

pearj commented Jun 6, 2018

@bramvdklinkenberg Looks like your error is:
Caused by: Hostname kubernetes.default.svc not verified: certificate: sha256/OyzkRILuc6LAX4YnMAIGrRKLmVnDgLRvCasxGXDhSoc= DN: CN=client, O=system:masters subjectAltNames: []
at okhttp3.internal.connection.RealConnection.connectTls(
Which is kind of weird, because the Kubernetes client can normally find the k8s CA certificate automatically, unless v1.9.6 puts the CA cert in a different location?
Regardless, if you know where it is ending up on disk you can specify some environment variables that override the default kubernetes-client behaviour, see:
It is probably the KUBERNETES_CERTS_CA_FILE environment variable you're after.
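As a rough sketch, the variable could be set on the running deployment with `kubectl set env`. The helper name and the deployment name in the comment are assumptions. The default path is the standard in-pod location of the service account CA certificate, and this only helps if the CA file location really is the problem:

```shell
# Point the kubernetes-client at an explicit CA bundle by setting
# KUBERNETES_CERTS_CA_FILE on a deployment's containers.
# Hypothetical helper; the default path is the usual service account CA path.
set_ca_file() {
  deployment="$1"  # e.g. zalenium-zalenium-hub (assumed deployment name)
  ca_file="${2:-/var/run/secrets/kubernetes.io/serviceaccount/ca.crt}"
  kubectl set env "deployment/${deployment}" \
    "KUBERNETES_CERTS_CA_FILE=${ca_file}"
}
```

For example, `set_ca_file zalenium-zalenium-hub` uses the default path, and a second argument overrides it.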

pearj commented Jun 6, 2018

It actually looks like the certificate that the Kubernetes API server is giving you is wrong. Maybe file a bug with Microsoft?

The subject alt name is:, instead of kubernetes.default.svc

Maybe it's worth setting KUBERNETES_MASTER=, as kubernetes.default.svc appears to have a broken certificate
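One way to check which subject alternative names the API server certificate actually carries is to fetch it with openssl from inside a pod. This is only a sketch: the helper name is made up, and it assumes openssl and grep are available in the container you run it from:

```shell
# Print the subject alternative names of the certificate a TLS server presents.
# Hypothetical helper; prints nothing if the certificate has no SAN extension.
show_cert_sans() {
  host="$1"         # e.g. kubernetes.default.svc
  port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "${host}" </dev/null 2>/dev/null \
    | openssl x509 -noout -text \
    | grep -A1 'Subject Alternative Name'
}
```

Running `show_cert_sans kubernetes.default.svc` and comparing the listed names with the hostname the client connects to should show whether they match.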

pearj commented Jun 6, 2018

Here's the issue: Azure/AKS#399

@pearj thanks! I will also create a support request in the Azure portal and refer to the github issue(s).

pearj commented Jun 6, 2018

Looks like the next release of kubernetes-client contains a patch that will use the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables instead of defaulting to kubernetes.default.svc.

However, in the meantime I'm pretty sure if you set KUBERNETES_MASTER= as an environment variable on the zalenium container, that will fix your problem too.
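A quick way to try this without editing the chart might look like the sketch below. The value to use is not shown in the comment above, so it is left as a required argument here. The helper name and the deployment name in the comment are assumptions:

```shell
# Set KUBERNETES_MASTER on a deployment so the kubernetes-client talks to the
# given API server URL instead of defaulting to kubernetes.default.svc.
# Hypothetical helper; the URL must be supplied by the caller.
set_kubernetes_master() {
  deployment="$1"  # e.g. zalenium-zalenium-hub (assumed deployment name)
  master_url="$2"  # API server URL for your cluster
  if [ -z "$master_url" ]; then
    echo "usage: set_kubernetes_master <deployment> <master-url>" >&2
    return 1
  fi
  kubectl set env "deployment/${deployment}" "KUBERNETES_MASTER=${master_url}"
}
```

A change made with `kubectl set env` is likely to be overwritten by the next Helm upgrade, so for a lasting fix the variable belongs in the chart itself.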

I added the KUBERNETES_MASTER env to the chart and redeployed the chart. It works now!


diemol commented Jun 11, 2018

That's great @bramvdklinkenberg!

Thanks @pearj for all the troubleshooting :)

Closing this issue.

diemol closed this as completed Jun 11, 2018

Latest comment in issue Azure AKS 399:
"We identified the bug. This impacts AKS clusters with newer infrastructure feature. We will update here once the rollout is completed"

Hi there - Apologies for hijacking this thread, but I heard you have been working on getting Zalenium working with Kubernetes on Azure...

We have a Selenium grid working on Kubernetes, but wanted to get Zalenium working - are you able to share how this should work? (in light of the bug mentioned above).

This is the sequence of commands we currently use to bring up Kubernetes, having already created the resource group via the interface:

az aks get-credentials --resource-group XXX --name XXXX
kubectl run XXX --image selenium/hub:3.11.0 --port 4444
kubectl expose deployment XXXX --type=LoadBalancer --name=selenium-hub
kubectl get service selenium-hub --watch
kubectl run selenium-node-chrome --image selenium/node-chrome:3.11.0 --env="HUB_PORT_4444_TCP_ADDR=selenium-hub" --env="HUB_PORT_4444_TCP_PORT=4444"
kubectl scale deployment selenium-node-chrome --replicas=XX

az aks browse --resource-group XXX --name XXXX

Obviously the documentation for Zalenium gives docker commands to use with minikube, to work locally, so we are unsure how to get them to work on Azure/cloud with Kubernetes.

Any help or suggestions would be valued.



bramvdklinkenberg commented Jun 25, 2018

@WFTesterMikeB, the issue I had is solved. That was an AKS/Kubernetes issue.
I deployed it using helm.

With the --set flag or a values.yaml file you can set specific configuration for your Zalenium deployment.
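As a rough illustration of the --set route: the release name, chart path and value key below are all placeholders, since the chart's real key names are not given in this thread (check its values.yaml). The sketch uses Helm 3 argument order; Helm 2 took the release name via `--name` instead:

```shell
# Install a chart with one value overridden on the command line.
# Hypothetical helper; every argument is a placeholder to fill in.
install_with_override() {
  release="$1"    # e.g. zalenium
  chart="$2"      # e.g. ./zalenium (local chart directory)
  key_value="$3"  # e.g. some.key=value, taken from the chart's values.yaml
  helm install "$release" "$chart" --set "$key_value"
}
```

The same override can live in a values file passed with `-f` instead of `--set`, which is easier to keep under version control.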

