Autoscaling Selenium Grid on Kubernetes #1714

Closed
21 changes: 21 additions & 0 deletions charts/selenium-grid/README.md
@@ -29,6 +29,11 @@ helm install selenium-grid docker-selenium/selenium-grid --version <version>
helm install selenium-grid --set ingress.hostname=selenium-grid.k8s.local docker-selenium/chart/selenium-grid/.
```

## Enable Selenium Grid Autoscaling
Selenium Grid has the ability to autoscale browser nodes up/down based on the requests pending in session queue. You can enable it setting 'autoscalingEnabled' to `true`. You need to install KEDA by following the [instructions](https://keda.sh/docs/2.8/deploy/#helm) in order for autoscaling to work.
Member (suggested change):
Selenium Grid has the ability to autoscale browser nodes up/down based on the pending requests in the session queue. You can enable it by setting `autoscalingEnabled` to `true`. You need to install KEDA by following the [instructions](https://keda.sh/docs/2.8/deploy/#helm) in order for autoscaling to work.
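For reference, a hedged sketch of the full sequence this implies. The KEDA commands follow the linked KEDA 2.8 docs, the chart repo and release names are the ones used earlier in this README, and `chromeNode.autoscalingEnabled` is the value added by this PR:

```bash
# Install KEDA first (required before the chart's ScaledObjects can do anything)
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace

# Install the grid with autoscaling enabled for Chrome nodes
helm install selenium-grid docker-selenium/selenium-grid \
  --set chromeNode.autoscalingEnabled=true
```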


The hpa.url value is configured to work for grid installed in `default` namespace. If you are installing the grid in some other namespace make sure to update the value of hpa.url accordingly.
Member (suggested change):
The `hpa.url` value is configured to work for Grid when installed in the `default` namespace. If you are installing the Grid in some other namespace make sure to update the value of `hpa.url` accordingly.
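As a sketch, assuming a hypothetical namespace named `selenium`, the override could look like this (the hostname follows the usual `<service>.<namespace>` Kubernetes DNS form):

```bash
helm install selenium-grid docker-selenium/selenium-grid \
  --namespace selenium \
  --set chromeNode.autoscalingEnabled=true \
  --set chromeNode.hpa.url=http://selenium-hub.selenium:4444/graphql
```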


## Updating Selenium-Grid release

Once you have a new chart version, you can update your selenium-grid running:
@@ -66,6 +71,7 @@ This table contains the configuration parameters of the chart and their default
| `ingress.annotations` | `{}` | Custom annotations for ingress resource |
| `ingress.hostname` | `selenium-grid.local` | Default host for the ingress resource |
| `ingress.tls` | `[]` | TLS backend configuration for ingress resource |
| `ingress.path` | `/` | Default path for ingress resource |
| `busConfigMap.annotations` | `{}` | Custom annotations for configmap |
| `chromeNode.enabled` | `true` | Enable chrome nodes |
| `chromeNode.deploymentEnabled` | `true` | Enable creation of Deployment for chrome nodes |
@@ -95,6 +101,11 @@ This table contains the configuration parameters of the chart and their default
| `chromeNode.lifecycle` | `{}` | hooks to make pod correctly shutdown or started |
| `chromeNode.extraVolumeMounts` | `[]` | Extra mounts of declared ExtraVolumes into pod |
| `chromeNode.extraVolumes` | `[]` | Extra Volumes declarations to be used in the pod (can be any supported volume type: ConfigMap, Secret, PVC, NFS, etc.) |
| `chromeNode.autoscalingEnabled` | `false` | Enable/disable autoscaling of browser nodes |
| `chromeNode.hpa.url` | `http://selenium-hub.default:4444/graphql` | GraphQL URL of the hub or the router |
| `chromeNode.hpa.browserName` | `chrome` | `browserName` from the capability |
| `chromeNode.hpa.browserVersion` | `` | `browserVersion` from the capability |
| `chromeNode.maxReplicaCount` | `8` | Maximum number of replicas this browser node can auto-scale up to |
| `firefoxNode.enabled` | `true` | Enable firefox nodes |
| `firefoxNode.deploymentEnabled` | `true` | Enable creation of Deployment for firefox nodes |
| `firefoxNode.replicas` | `1` | Number of firefox nodes |
@@ -123,6 +134,11 @@ This table contains the configuration parameters of the chart and their default
| `firefoxNode.lifecycle` | `{}` | hooks to make pod correctly shutdown or started |
| `firefoxNode.extraVolumeMounts` | `[]` | Extra mounts of declared ExtraVolumes into pod |
| `firefoxNode.extraVolumes` | `[]` | Extra Volumes declarations to be used in the pod (can be any supported volume type: ConfigMap, Secret, PVC, NFS, etc.) |
| `firefoxNode.autoscalingEnabled` | `false` | Enable/disable autoscaling of browser nodes |
| `firefoxNode.hpa.url` | `http://selenium-hub.default:4444/graphql` | GraphQL URL of the hub or the router |
| `firefoxNode.hpa.browserName` | `firefox` | `browserName` from the capability |
| `firefoxNode.hpa.browserVersion` | `` | `browserVersion` from the capability |
| `firefoxNode.maxReplicaCount` | `8` | Maximum number of replicas this browser node can auto-scale up to |
| `edgeNode.enabled` | `true` | Enable edge nodes |
| `edgeNode.deploymentEnabled` | `true` | Enable creation of Deployment for edge nodes |
| `edgeNode.replicas` | `1` | Number of edge nodes |
@@ -151,6 +167,11 @@ This table contains the configuration parameters of the chart and their default
| `edgeNode.lifecycle` | `{}` | hooks to make pod correctly shutdown or started |
| `edgeNode.extraVolumeMounts` | `[]` | Extra mounts of declared ExtraVolumes into pod |
| `edgeNode.extraVolumes` | `[]` | Extra Volumes declarations to be used in the pod (can be any supported volume type: ConfigMap, Secret, PVC, NFS, etc.) |
| `edgeNode.autoscalingEnabled` | `false` | Enable/disable autoscaling of browser nodes |
| `edgeNode.hpa.url` | `http://selenium-hub.default:4444/graphql` | GraphQL URL of the hub or the router |
| `edgeNode.hpa.browserName` | `edge` | `browserName` from the capability |
| `edgeNode.hpa.browserVersion` | `` | `browserVersion` from the capability |
| `edgeNode.maxReplicaCount` | `8` | Maximum number of replicas this browser node can auto-scale up to |
| `customLabels` | `{}` | Custom labels for k8s resources |
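Taken together, the autoscaling-related values above could be collected in a custom values file. A hedged sketch, using the chart defaults introduced by this PR (the file name is arbitrary):

```yaml
# autoscaling-values.yaml (illustrative file name)
chromeNode:
  autoscalingEnabled: true
  maxReplicaCount: 8
  hpa:
    url: http://selenium-hub.default:4444/graphql
    browserName: chrome
firefoxNode:
  autoscalingEnabled: true
  maxReplicaCount: 8
  hpa:
    url: http://selenium-hub.default:4444/graphql
    browserName: firefox
```

It could then presumably be applied with `helm install selenium-grid docker-selenium/selenium-grid -f autoscaling-values.yaml`.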


18 changes: 18 additions & 0 deletions charts/selenium-grid/templates/chrome-node-hpa.yaml
@@ -0,0 +1,18 @@
{{- if and .Values.chromeNode.enabled .Values.chromeNode.autoscalingEnabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-grid-chrome-scaledobject
  namespace: {{ .Release.Namespace }}
  labels:
    deploymentName: {{ template "seleniumGrid.chromeNode.fullname" . }}
spec:
  maxReplicaCount: {{ .Values.chromeNode.maxReplicaCount }}
  scaleTargetRef:
    name: {{ template "seleniumGrid.chromeNode.fullname" . }}
  triggers:
    - type: selenium-grid
      {{- with .Values.chromeNode.hpa }}
      metadata: {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}
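For orientation, this is roughly what the template above renders to with the default values from this PR, assuming the release is installed in the `default` namespace. The deployment name is hypothetical, since the real value comes from the `seleniumGrid.chromeNode.fullname` helper; the edge and firefox templates below render analogously:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-grid-chrome-scaledobject
  namespace: default
  labels:
    deploymentName: selenium-chrome-node   # hypothetical; actual value comes from the fullname helper
spec:
  maxReplicaCount: 8
  scaleTargetRef:
    name: selenium-chrome-node             # hypothetical; actual value comes from the fullname helper
  triggers:
    - type: selenium-grid
      metadata:
        url: http://selenium-hub.default:4444/graphql
        browserName: chrome
        unsafeSsl: 'true'
```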
18 changes: 18 additions & 0 deletions charts/selenium-grid/templates/edge-node-hpa.yaml
@@ -0,0 +1,18 @@
{{- if and .Values.edgeNode.enabled .Values.edgeNode.autoscalingEnabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-grid-edge-scaledobject
  namespace: {{ .Release.Namespace }}
  labels:
    deploymentName: {{ template "seleniumGrid.edgeNode.fullname" . }}
spec:
  maxReplicaCount: {{ .Values.edgeNode.maxReplicaCount }}
  scaleTargetRef:
    name: {{ template "seleniumGrid.edgeNode.fullname" . }}
  triggers:
    - type: selenium-grid
      {{- with .Values.edgeNode.hpa }}
      metadata: {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}
18 changes: 18 additions & 0 deletions charts/selenium-grid/templates/firefox-node-hpa.yaml
@@ -0,0 +1,18 @@
{{- if and .Values.firefoxNode.enabled .Values.firefoxNode.autoscalingEnabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-grid-firefox-scaledobject
  namespace: {{ .Release.Namespace }}
  labels:
    deploymentName: {{ template "seleniumGrid.firefoxNode.fullname" . }}
spec:
  maxReplicaCount: {{ .Values.firefoxNode.maxReplicaCount }}
  scaleTargetRef:
    name: {{ template "seleniumGrid.firefoxNode.fullname" . }}
  triggers:
    - type: selenium-grid
      {{- with .Values.firefoxNode.hpa }}
      metadata: {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}
4 changes: 4 additions & 0 deletions charts/selenium-grid/templates/ingress.yaml
@@ -45,7 +45,11 @@ spec:
  - http:
  {{- end }}
      paths:
      {{- if $.Values.ingress.path }}
      - path: {{ .Values.ingress.path }}
      {{- else }}
      - path: /
      {{- end }}
        pathType: Prefix
        backend:
          service:
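A hedged usage example of the new value; the `/selenium` path is just an illustration, not a chart default:

```bash
helm install selenium-grid docker-selenium/selenium-grid \
  --set ingress.hostname=selenium-grid.k8s.local \
  --set ingress.path=/selenium
```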
88 changes: 55 additions & 33 deletions charts/selenium-grid/values.yaml
@@ -22,6 +22,7 @@ ingress:
  hostname: selenium-grid.local
  # TLS backend configuration for ingress resource
  tls: []
  path: /

# ConfigMap that contains SE_EVENT_BUS_HOST, SE_EVENT_BUS_PUBLISH_PORT and SE_EVENT_BUS_SUBSCRIBE_PORT variables
busConfigMap:
@@ -363,7 +364,7 @@ chromeNode:
    # Custom annotations for service
    annotations: {}
  # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
  dshmVolumeSizeLimit: 1Gi
  dshmVolumeSizeLimit: 2Gi
Member:
Why? This would affect people who are already using the chart; their clusters have been provisioned with the existing values.

Contributor Author (@prashanth-volvocars, Nov 15, 2022):
We had some issues while running it alongside video recording, which vanished as soon as we increased the limit. That said, it shouldn't be part of this PR. Will revert this.

  # Priority class name for chrome-node pods
  priorityClassName: ""

@@ -375,17 +376,17 @@ chromeNode:
  # failureThreshold: 120
  # periodSeconds: 5
  # Time to wait for pod termination
  terminationGracePeriodSeconds: 30
  terminationGracePeriodSeconds: 3600
Member:
Why?

Contributor Author:
If the autoscaler chooses to kill a pod that is currently running a test, we would have at most one hour for the running test to complete rather than 30 seconds. If the test doesn't complete within the configured grace period it would result in a failure, so giving it the maximum allowed time helps reduce these failures.

Member:
Wouldn't this affect the setup of users who already have this configuration? Isn't there an alternative to this hard-coded value? One of the reasons we added the draining-after-X-sessions feature was to enable this use case, so the pod could live until the session is done.

Contributor:
To expand on what @diemol writes: wouldn't it be more straightforward to use ScaledJob combined with draining after one session? I wrote about it here: SeleniumHQ/selenium#9845 (comment)

Contributor Author:
@diemol Killing after X sessions wouldn't help. Consider a scenario where you create two sessions, one that runs for 60 minutes and another for 1 minute. As soon as the first is done, the HPA will try to bring down one of the pods irrespective of the configured X sessions, and we have no control over which pod it chooses. This is not the current flow, but it will happen when KEDA scales down.

Also, I don't understand the implication this would have even for current users; no one would want their tests to be interrupted. This value also has no effect unless the preStop lifecycle hook is enabled, and even then it only delays pod termination until the test session is completed, which I think is the desired behaviour.

Member:
@prashanth-volvocars Draining after X sessions will make the container exit, and ideally the pod should then stop; am I missing something?

Contributor:
I'd say that for most users ScaledJobs would be the best fit, so this should be the default in the Helm chart. Creating a pod is normally not expensive in Kubernetes; Fargate is a special case. But sure, you can skip that and I guess it's me or @diemol that will implement it.

Contributor Author:
@diemol Yes, we no longer need X sessions, as the pod can handle sessions as long as there are pending sessions in the queue.

Contributor Author:
@msvticket The default should be what works for every environment, not for a specific few. Jobs work for a few, not all, but pods work for all. If you think a ScaledJob suits your needs best, it's available as an option that you can enable, but making it the default would entirely break it for specific environments. Hope you understand.

Contributor:
What option? I don't see any mention of it in the chart.
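For context, a hedged sketch of the ScaledJob-plus-drain-after-one-session alternative discussed above. It is not part of this PR; the image tag, names, and env values are illustrative, and `SE_DRAIN_AFTER_SESSION_COUNT` is the docker-selenium setting for draining a node after N sessions:

```yaml
# Illustrative only; not chart code from this PR
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: selenium-chrome-node-scaledjob
spec:
  maxReplicaCount: 8
  triggers:
    - type: selenium-grid
      metadata:
        url: 'http://selenium-hub.default:4444/graphql'
        browserName: 'chrome'
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: selenium-node-chrome
            image: selenium/node-chrome:latest
            env:
              - name: SE_EVENT_BUS_HOST
                value: selenium-hub
              - name: SE_EVENT_BUS_PUBLISH_PORT
                value: "4442"
              - name: SE_EVENT_BUS_SUBSCRIBE_PORT
                value: "4443"
              # Drain after one session so the Job completes when the test is done
              - name: SE_DRAIN_AFTER_SESSION_COUNT
                value: "1"
```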

  # Allow pod correctly shutdown
  lifecycle: {}
  # preStop:
  #   exec:
  #     command:
  #       - bash
  #       - -c
  #       - |
  #         curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
  #         while curl 127.0.0.1:5555/status; do sleep 1; done
  lifecycle:
    preStop:
      exec:
        command:
          - bash
          - -c
          - |
            curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
            while curl 127.0.0.1:5555/status; do sleep 1; done;
Comment on lines +381 to +389

Member:
Shouldn't this preStop be enabled only if autoscalingEnabled is true?

Contributor Author:
Makes sense. Will add the check.
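A possible shape for that check in the node deployment template, sketched here rather than taken from the chart:

```yaml
# Hypothetical template snippet; the chart's actual deployment template may differ
{{- if .Values.chromeNode.autoscalingEnabled }}
lifecycle:
  preStop:
    exec:
      command:
        - bash
        - -c
        - |
          curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
          while curl 127.0.0.1:5555/status; do sleep 1; done
{{- end }}
```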


  extraVolumeMounts: []
  # - name: my-extra-volume

@@ -398,6 +399,15 @@ chromeNode:
  #   persistentVolumeClaim:
  #     claimName: my-pv-claim

  # KEDA scaled object configuration
  autoscalingEnabled: false
  maxReplicaCount: 8
  hpa:
    url: http://selenium-hub.default:4444/graphql # Replace with your HTTP GraphQL URL here
Member:
Can `selenium-hub.default` be retrieved from the Hub or Router service or pod name?

Contributor Author:
I had the same thought too, but created the PR in the excitement of sharing it 😁. Will add this too.
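One way the chart could derive this default instead of hard-coding `selenium-hub.default` would be in the ScaledObject template rather than in values.yaml (Helm values themselves are not templated); the helper name and port here are assumptions, not the chart's actual code:

```yaml
# Hypothetical snippet in the ScaledObject trigger, falling back to the hub service when hpa.url is unset
metadata:
  url: {{ .Values.chromeNode.hpa.url | default (printf "http://%s.%s:4444/graphql" (include "seleniumGrid.hub.fullname" .) .Release.Namespace) }}
  browserName: {{ .Values.chromeNode.hpa.browserName }}
```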

    browserName: chrome
    # browserVersion: '91.0' # Optional. Only required when supporting multiple versions of browser in your Selenium Grid.
    unsafeSsl: 'true' # Optional
Member:
What does this mean? unsafeSsl?

Reply:
@diemol https://keda.sh/docs/2.8/scalers/selenium-grid-scaler/
"Skip certificate validation when connecting over HTTPS."


# Configuration for firefox nodes
firefoxNode:
  # Enable firefox nodes

@@ -475,7 +485,7 @@ firefoxNode:
    # Custom annotations for service
    annotations: {}
  # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
  dshmVolumeSizeLimit: 1Gi
  dshmVolumeSizeLimit: 2Gi
Member:
Same comment as above.

  # Priority class name for firefox-node pods
  priorityClassName: ""

@@ -487,17 +497,17 @@ firefoxNode:
  # failureThreshold: 120
  # periodSeconds: 5
  # Time to wait for pod termination
  terminationGracePeriodSeconds: 30
  terminationGracePeriodSeconds: 3600
Member:
Same comment as above.

  # Allow pod correctly shutdown
  lifecycle: {}
  # preStop:
  #   exec:
  #     command:
  #       - bash
  #       - -c
  #       - |
  #         curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
  #         while curl 127.0.0.1:5555/status; do sleep 1; done
  lifecycle:
    preStop:
      exec:
        command:
          - bash
          - -c
          - |
            curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
            while curl 127.0.0.1:5555/status; do sleep 1; done;
Comment on lines +502 to +510

Member:
Same comment as above.


  extraVolumeMounts: []
  # - name: my-extra-volume

@@ -509,6 +519,12 @@ firefoxNode:
  # - name: my-extra-volume-from-pvc
  #   persistentVolumeClaim:
  #     claimName: my-pv-claim
  # KEDA scaled object configuration
  autoscalingEnabled: false
  maxReplicaCount: 8
  hpa:
    url: http://selenium-hub.default:4444/graphql # Replace with your HTTP GraphQL URL here
    browserName: firefox

# Configuration for edge nodes
edgeNode:

@@ -587,7 +603,7 @@ edgeNode:
    annotations:
      hello: world
  # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
  dshmVolumeSizeLimit: 1Gi
  dshmVolumeSizeLimit: 2Gi
  # Priority class name for edge-node pods
  priorityClassName: ""

@@ -599,17 +615,17 @@ edgeNode:
  # failureThreshold: 120
  # periodSeconds: 5
  # Time to wait for pod termination
  terminationGracePeriodSeconds: 30
  terminationGracePeriodSeconds: 3600
  # Allow pod correctly shutdown
  lifecycle: {}
  # preStop:
  #   exec:
  #     command:
  #       - bash
  #       - -c
  #       - |
  #         curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
  #         while curl 127.0.0.1:5555/status; do sleep 1; done
  lifecycle:
    preStop:
      exec:
        command:
          - bash
          - -c
          - |
            curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
            while curl 127.0.0.1:5555/status; do sleep 1; done;

  extraVolumeMounts: []
  # - name: my-extra-volume

@@ -621,6 +637,12 @@ edgeNode:
  # - name: my-extra-volume-from-pvc
  #   persistentVolumeClaim:
  #     claimName: my-pv-claim
  # KEDA scaled object configuration
  autoscalingEnabled: false
  maxReplicaCount: 8
  hpa:
    url: http://selenium-hub.default:4444/graphql # Replace with your HTTP GraphQL URL here
    browserName: MicrosoftEdge
Comment:
If you want Edge to work, you will also need to set "sessionBrowserName" to "msedge".

Docs: https://keda.sh/docs/2.8/scalers/selenium-grid-scaler/
Issue regarding this: kedacore/keda#2709
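A hedged sketch of what that could look like in the edge values; `sessionBrowserName` is a parameter of the KEDA selenium-grid scaler referenced above, not a value documented by this PR:

```yaml
edgeNode:
  autoscalingEnabled: true
  hpa:
    url: http://selenium-hub.default:4444/graphql
    browserName: MicrosoftEdge
    sessionBrowserName: msedge   # per the KEDA selenium-grid scaler docs linked above
```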


# Custom labels for k8s resources
customLabels: {}