Autoscaling selenium grid on kubernetes #1714
Conversation
@diemol @jamesmortensen As discussed earlier, I have raised two separate PRs for autoscaling and video recording.
Autoscale selenium browser nodes running in Kubernetes based on the requests pending in the session queue using KEDA. Toggle autoscaling on/off using the 'autoscalingEnabled' option in the Helm chart.
Force-pushed from 7e4f215 to 96eca2e
Hi @prashanth-volvocars thank you for splitting them up. We'll take a look as soon as we can.
Thanks for this PR, @prashanth-volvocars!
I left some comments.
charts/selenium-grid/README.md
Outdated
@@ -29,6 +29,11 @@ helm install selenium-grid docker-selenium/selenium-grid --version <version>
helm install selenium-grid --set ingress.hostname=selenium-grid.k8s.local docker-selenium/chart/selenium-grid/.
```

## Enable Selenium Grid Autoscaling
Selenium Grid has the ability to autoscale browser nodes up/down based on the requests pending in session queue. You can enable it setting 'autoscalingEnabled' to `true`. You need to install KEDA by following the [instructions](https://keda.sh/docs/2.8/deploy/#helm) in order for autoscaling to work.
Suggested change:
```suggestion
Selenium Grid has the ability to autoscale browser nodes up/down based on the pending requests in the
session queue.
You can enable it by setting 'autoscalingEnabled' to `true`. You need to install KEDA by following the
[instructions](https://keda.sh/docs/2.8/deploy/#helm) in order for autoscaling to work.
```
## Enable Selenium Grid Autoscaling
Selenium Grid has the ability to autoscale browser nodes up/down based on the requests pending in session queue. You can enable it setting 'autoscalingEnabled' to `true`. You need to install KEDA by following the [instructions](https://keda.sh/docs/2.8/deploy/#helm) in order for autoscaling to work.

The hpa.url value is configured to work for grid installed in `default` namespace. If you are installing the grid in some other namespace make sure to update the value of hpa.url accordingly.
Suggested change:
```suggestion
The `hpa.url` value is configured to work for Grid when installed in the `default` namespace. If you are installing
the Grid in some other namespace make sure to update the value of `hpa.url` accordingly.
```
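To make the README flow concrete, here is a sketch of the commands involved. These require a running cluster; the KEDA commands follow the linked KEDA docs, and the chart/repo names are the ones used elsewhere in this README:

```shell
# Install KEDA first (commands per https://keda.sh/docs/2.8/deploy/#helm)
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace

# Then install the Grid with autoscaling enabled
helm install selenium-grid docker-selenium/selenium-grid \
  --set autoscalingEnabled=true
```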
charts/selenium-grid/values.yaml
Outdated
@@ -363,7 +364,7 @@ chromeNode:
   # Custom annotations for service
   annotations: {}
   # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
- dshmVolumeSizeLimit: 1Gi
+ dshmVolumeSizeLimit: 2Gi
Why? This would affect people who are already using the chart, their clusters have been provisioned with the existing values.
We had some issues while running it along with video recording, which vanished as soon as we increased the limit. I think it shouldn't be part of this PR, though. Will revert this.
charts/selenium-grid/values.yaml
Outdated
@@ -375,17 +376,17 @@ chromeNode:
   # failureThreshold: 120
   # periodSeconds: 5
   # Time to wait for pod termination
- terminationGracePeriodSeconds: 30
+ terminationGracePeriodSeconds: 3600
Why?
If the autoscaler chooses to kill a pod that's currently running a test, we would have at most one hour for the running test to complete rather than 30 seconds.
If the test doesn't complete within the configured grace period it would result in a failure, so giving it the maximum allowed time helps reduce these failures.
Wouldn't this affect the setup of users who already have this configuration? Isn't there an alternative to this hard-coded value? One of the reasons we added the draining after X sessions feature was to enable this use case, so the pod could live until the session is done.
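For context, the draining-after-X-sessions feature mentioned here is exposed through the node container environment (treat the variable name and values layout below as assumptions, not the chart's actual keys):

```yaml
# Sketch only: ask each node to drain itself after one session so the
# container exits once the test finishes.
chromeNode:
  extraEnvironmentVariables:
    - name: SE_DRAIN_AFTER_SESSION_COUNT  # assumed docker-selenium variable name
      value: "1"
```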
To expand on what @diemol writes:
Wouldn't it be more straight forward to use ScaledJob combined with draining after one session?
I wrote about it here:
@diemol Killing after x sessions wouldn't help. Consider this scenario where you create two sessions, one that runs for 60 mins and another for 1 min. As soon as the first is done, the HPA will try to bring down one of the pods irrespective of the configured x sessions. We have no control over which pod is chosen by the HPA. This is not the current flow, but it will happen when KEDA scales down.
Also, I don't understand the implication that would be caused by increasing it even for current users. No one would want their tests to be interrupted. Also, this value has no effect unless we have the preStop lifecycle hook enabled. Even if that is enabled, it's only going to delay the pod termination until the test session is completed, which I think is the desired behaviour.
@prashanth-volvocars draining after X sessions will make the container exit and ideally the pod should stop, am I missing something?
I'd say that for most users ScaledJobs would be the best fit so this should be the default in the helm chart. Creating a pod is normally not expensive in Kubernetes; Fargate is a special case. But sure, you can skip that and I guess it's me or @diemol that will implement it.
@diemol yes, we no longer need X sessions as the pod can handle sessions as long as there are pending sessions in the queue.
@msvticket The default should be what works for every environment, not for a specific few. Jobs work for a few, not all, but pods work for all. If you think a scaled job suits your needs best, it's available as an option that you can very well enable, but making it the default will entirely break it for specific environments. Hope you understand.
What option? I don't see any mention of it in the chart.
```yaml
lifecycle:
  preStop:
    exec:
      command:
        - bash
        - -c
        - |
          curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
          while curl 127.0.0.1:5555/status; do sleep 1; done;
```
Shouldn't this `preStop` be enabled only if `autoscalingEnabled` is `true`?
Makes sense. Will add the check
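A minimal sketch of that check, assuming standard Helm templating (the exact values path is illustrative, not the final template):

```yaml
# Hypothetical deployment template fragment: render the drain hook
# only when autoscaling is turned on.
{{- if .Values.autoscalingEnabled }}
lifecycle:
  preStop:
    exec:
      command:
        - bash
        - -c
        - |
          curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
          while curl 127.0.0.1:5555/status; do sleep 1; done;
{{- end }}
```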
```yaml
autoscalingEnabled: false
maxReplicaCount: 8
hpa:
  url: http://selenium-hub.default:4444/graphql # Replace with your http graphql url here
```
Regarding `selenium-hub.default`: can this be retrieved from the Hub or Router service or pod name?
I had the same thought too, but created the PR in the excitement of sharing it😁. Will add this too.
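One way this could look, assuming the hub Service name can be derived from the release and namespace (the naming here is illustrative, not the final template):

```yaml
# Hypothetical template fragment: build the GraphQL URL from the
# release name and namespace instead of hard-coding selenium-hub.default.
hpa:
  url: 'http://{{ .Release.Name }}-selenium-hub.{{ .Release.Namespace }}:4444/graphql'
```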
charts/selenium-grid/values.yaml
Outdated
@@ -475,7 +485,7 @@ firefoxNode:
   # Custom annotations for service
   annotations: {}
   # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
- dshmVolumeSizeLimit: 1Gi
+ dshmVolumeSizeLimit: 2Gi
Same comment as above.
```yaml
url: http://selenium-hub.default:4444/graphql # Replace with your http graphql url here
browserName: chrome
# browserVersion: '91.0' # Optional. Only required when supporting multiple versions of browser in your Selenium Grid.
unsafeSsl: 'true' # Optional
```
What does `unsafeSsl` mean?
@diemol https://keda.sh/docs/2.8/scalers/selenium-grid-scaler/
"Skip certificate validation when connecting over HTTPS."
charts/selenium-grid/values.yaml
Outdated
@@ -487,17 +497,17 @@ firefoxNode:
   # failureThreshold: 120
   # periodSeconds: 5
   # Time to wait for pod termination
- terminationGracePeriodSeconds: 30
+ terminationGracePeriodSeconds: 3600
Same comment as above.
```yaml
lifecycle:
  preStop:
    exec:
      command:
        - bash
        - -c
        - |
          curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
          while curl 127.0.0.1:5555/status; do sleep 1; done;
```
Same comment as above.
```yaml
maxReplicaCount: 8
hpa:
  url: http://selenium-hub.default:4444/graphql # Replace with your http graphql url here
  browserName: MicrosoftEdge
```
If you want Edge to work, you will need to have "sessionBrowserName" set to "msedge" as well.
Docs: https://keda.sh/docs/2.8/scalers/selenium-grid-scaler/
Issue regarding this: kedacore/keda#2709
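Per the linked KEDA selenium-grid scaler docs, the Edge trigger needs both names; a sketch of the metadata (layout illustrative, the two browser values are the documented ones):

```yaml
# KEDA scaler metadata for Edge: browserName matches the requested
# capability, sessionBrowserName matches what the Grid reports for sessions.
hpa:
  url: http://selenium-hub.default:4444/graphql
  browserName: MicrosoftEdge
  sessionBrowserName: msedge
```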
@prashanth-volvocars Any news regarding this PR?
@prashanth-volvocars @diemol Any updates on this?
@diemol I have added/addressed all the concerns raised. Please take a look.
I will have a look after releasing 4.10.0, so in June.
Description
Autoscale selenium browser nodes running in Kubernetes based on the requests pending in the session queue using KEDA. Toggle autoscaling on/off using the 'autoscalingEnabled' option in the Helm chart.
Motivation and Context
Autoscaling Selenium Grid was a problem that had been pending to be solved for a long time, so I took it up when there was a requirement at my current workplace. KEDA seemed to be the best candidate for the job, and I wrote a new scaler for Selenium Grid a year ago. I would like to have this enabled by default in our charts so everyone could use it.
Types of changes
Checklist