# Autoscaling Selenium Grid on Kubernetes (#1714)
@@ -29,6 +29,11 @@ helm install selenium-grid docker-selenium/selenium-grid --version <version>

```
helm install selenium-grid --set ingress.hostname=selenium-grid.k8s.local docker-selenium/chart/selenium-grid/.
```

## Enable Selenium Grid Autoscaling

Selenium Grid can autoscale browser nodes up and down based on the requests pending in the session queue. You can enable it by setting `autoscalingEnabled` to `true`. You also need to install KEDA by following its [deployment instructions](https://keda.sh/docs/2.8/deploy/#helm) for autoscaling to work.

The `hpa.url` value is configured for a grid installed in the `default` namespace. If you install the grid in another namespace, make sure to update the value of `hpa.url` accordingly.
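For example, a minimal sketch of both steps (the KEDA commands follow KEDA's own Helm docs; the `selenium` namespace and release names are illustrative assumptions, not chart defaults):

```bash
# Install KEDA, per https://keda.sh/docs/2.8/deploy/#helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace

# Install the grid with Chrome node autoscaling enabled. Because the grid
# lands in the "selenium" namespace here (not "default"), hpa.url must be
# overridden to point at the hub service in that namespace.
helm install selenium-grid docker-selenium/selenium-grid \
  --namespace selenium --create-namespace \
  --set chromeNode.autoscalingEnabled=true \
  --set chromeNode.hpa.url=http://selenium-hub.selenium:4444/graphql
```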
## Updating Selenium-Grid release

Once you have a new chart version, you can update your Selenium Grid by running:
@@ -66,6 +71,7 @@ This table contains the configuration parameters of the chart and their default

| `ingress.annotations` | `{}` | Custom annotations for ingress resource |
| `ingress.hostname` | `selenium-grid.local` | Default host for the ingress resource |
| `ingress.tls` | `[]` | TLS backend configuration for ingress resource |
| `ingress.path` | `/` | Default path for ingress resource |
| `busConfigMap.annotations` | `{}` | Custom annotations for configmap |
| `chromeNode.enabled` | `true` | Enable chrome nodes |
| `chromeNode.deploymentEnabled` | `true` | Enable creation of Deployment for chrome nodes |
@@ -95,6 +101,11 @@ This table contains the configuration parameters of the chart and their default

| `chromeNode.lifecycle` | `{}` | hooks to make pod correctly shutdown or started |
| `chromeNode.extraVolumeMounts` | `[]` | Extra mounts of declared ExtraVolumes into pod |
| `chromeNode.extraVolumes` | `[]` | Extra Volumes declarations to be used in the pod (can be any supported volume type: ConfigMap, Secret, PVC, NFS, etc.) |
| `chromeNode.autoscalingEnabled` | `false` | Enable/disable autoscaling of browser nodes |
| `chromeNode.hpa.url` | `http://selenium-hub.default:4444/graphql` | GraphQL URL of the hub or the router |
| `chromeNode.hpa.browserName` | `chrome` | `browserName` from the capability |
| `chromeNode.hpa.browserVersion` | `` | `browserVersion` from the capability |
| `chromeNode.maxReplicaCount` | `8` | Max number of replicas this browser node can scale up to |
| `firefoxNode.enabled` | `true` | Enable firefox nodes |
| `firefoxNode.deploymentEnabled` | `true` | Enable creation of Deployment for firefox nodes |
| `firefoxNode.replicas` | `1` | Number of firefox nodes |
@@ -123,6 +134,11 @@ This table contains the configuration parameters of the chart and their default

| `firefoxNode.lifecycle` | `{}` | hooks to make pod correctly shutdown or started |
| `firefoxNode.extraVolumeMounts` | `[]` | Extra mounts of declared ExtraVolumes into pod |
| `firefoxNode.extraVolumes` | `[]` | Extra Volumes declarations to be used in the pod (can be any supported volume type: ConfigMap, Secret, PVC, NFS, etc.) |
| `firefoxNode.autoscalingEnabled` | `false` | Enable/disable autoscaling of browser nodes |
| `firefoxNode.hpa.url` | `http://selenium-hub.default:4444/graphql` | GraphQL URL of the hub or the router |
| `firefoxNode.hpa.browserName` | `firefox` | `browserName` from the capability |
| `firefoxNode.hpa.browserVersion` | `` | `browserVersion` from the capability |
| `firefoxNode.maxReplicaCount` | `8` | Max number of replicas this browser node can scale up to |
| `edgeNode.enabled` | `true` | Enable edge nodes |
| `edgeNode.deploymentEnabled` | `true` | Enable creation of Deployment for edge nodes |
| `edgeNode.replicas` | `1` | Number of edge nodes |
@@ -151,6 +167,11 @@ This table contains the configuration parameters of the chart and their default

| `edgeNode.lifecycle` | `{}` | hooks to make pod correctly shutdown or started |
| `edgeNode.extraVolumeMounts` | `[]` | Extra mounts of declared ExtraVolumes into pod |
| `edgeNode.extraVolumes` | `[]` | Extra Volumes declarations to be used in the pod (can be any supported volume type: ConfigMap, Secret, PVC, NFS, etc.) |
| `edgeNode.autoscalingEnabled` | `false` | Enable/disable autoscaling of browser nodes |
| `edgeNode.hpa.url` | `http://selenium-hub.default:4444/graphql` | GraphQL URL of the hub or the router |
| `edgeNode.hpa.browserName` | `edge` | `browserName` from the capability |
| `edgeNode.hpa.browserVersion` | `` | `browserVersion` from the capability |
| `edgeNode.maxReplicaCount` | `8` | Max number of replicas this browser node can scale up to |
| `customLabels` | `{}` | Custom labels for k8s resources |
**New file: `ScaledObject` template for Chrome nodes** (@@ -0,0 +1,18 @@)
```yaml
{{- if and .Values.chromeNode.enabled .Values.chromeNode.autoscalingEnabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-grid-chrome-scaledobject
  namespace: {{ .Release.Namespace }}
  labels:
    deploymentName: {{ template "seleniumGrid.chromeNode.fullname" . }}
spec:
  maxReplicaCount: {{ .Values.chromeNode.maxReplicaCount }}
  scaleTargetRef:
    name: {{ template "seleniumGrid.chromeNode.fullname" . }}
  triggers:
    - type: selenium-grid
      {{- with .Values.chromeNode.hpa }}
      metadata: {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}
```
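With autoscaling enabled and the default `hpa` values, this template should render to roughly the following. This is a sketch: the `deploymentName` and `scaleTargetRef` names actually come from the chart's `seleniumGrid.chromeNode.fullname` helper, so the names below are illustrative.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-grid-chrome-scaledobject
  namespace: default
  labels:
    deploymentName: selenium-grid-chrome-node   # illustrative name
spec:
  maxReplicaCount: 8
  scaleTargetRef:
    name: selenium-grid-chrome-node             # illustrative name
  triggers:
    - type: selenium-grid
      metadata:            # populated from .Values.chromeNode.hpa
        url: http://selenium-hub.default:4444/graphql
        browserName: chrome
        unsafeSsl: 'true'
```

KEDA watches the session queue through the `selenium-grid` trigger and drives an HPA for the target Deployment, scaling it up to `maxReplicaCount`.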
**New file: `ScaledObject` template for Edge nodes** (@@ -0,0 +1,18 @@)
```yaml
{{- if and .Values.edgeNode.enabled .Values.edgeNode.autoscalingEnabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-grid-edge-scaledobject
  namespace: {{ .Release.Namespace }}
  labels:
    deploymentName: {{ template "seleniumGrid.edgeNode.fullname" . }}
spec:
  maxReplicaCount: {{ .Values.edgeNode.maxReplicaCount }}
  scaleTargetRef:
    name: {{ template "seleniumGrid.edgeNode.fullname" . }}
  triggers:
    - type: selenium-grid
      {{- with .Values.edgeNode.hpa }}
      metadata: {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}
```
**New file: `ScaledObject` template for Firefox nodes** (@@ -0,0 +1,18 @@)
```yaml
{{- if and .Values.firefoxNode.enabled .Values.firefoxNode.autoscalingEnabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-grid-firefox-scaledobject
  namespace: {{ .Release.Namespace }}
  labels:
    deploymentName: {{ template "seleniumGrid.firefoxNode.fullname" . }}
spec:
  maxReplicaCount: {{ .Values.firefoxNode.maxReplicaCount }}
  scaleTargetRef:
    name: {{ template "seleniumGrid.firefoxNode.fullname" . }}
  triggers:
    - type: selenium-grid
      {{- with .Values.firefoxNode.hpa }}
      metadata: {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}
```
**Changes to the chart's `values.yaml`**
```diff
@@ -22,6 +22,7 @@ ingress:
   hostname: selenium-grid.local
   # TLS backend configuration for ingress resource
   tls: []
+  path: /

 # ConfigMap that contains SE_EVENT_BUS_HOST, SE_EVENT_BUS_PUBLISH_PORT and SE_EVENT_BUS_SUBSCRIBE_PORT variables
 busConfigMap:
```
```diff
@@ -363,7 +364,7 @@ chromeNode:
   # Custom annotations for service
   annotations: {}
   # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
-  dshmVolumeSizeLimit: 1Gi
+  dshmVolumeSizeLimit: 2Gi

   # Priority class name for chrome-node pods
   priorityClassName: ""
```

> **Reviewer:** Why? This would affect people who are already using the chart; their clusters have been provisioned with the existing values.
>
> **Author:** We had some issues while running it along with video recording, which vanished as soon as we increased the limit. I think it shouldn't be part of this PR, though. Will revert this.
```diff
@@ -375,17 +376,17 @@ chromeNode:
   # failureThreshold: 120
   # periodSeconds: 5
   # Time to wait for pod termination
-  terminationGracePeriodSeconds: 30
+  terminationGracePeriodSeconds: 3600
```

> **Reviewer:** Why?
>
> **Author:** If the autoscaler chooses to kill a pod that is currently running a test, we would have at most one hour for the running test to complete, rather than 30 seconds.
>
> **@diemol:** Wouldn't this affect the setup of users who already have this configuration? Isn't there an alternative to this hard-coded value? One of the reasons we added the drain-after-X-sessions feature was to enable this use case, so the pod could live until the session is done.
>
> **@msvticket:** To expand on what @diemol writes: wouldn't it be more straightforward to use `ScaledJob` combined with draining after one session?
>
> **Author:** @diemol Killing after X sessions wouldn't help. Consider a scenario where you create two sessions, one running for 60 minutes and another for 1 minute. As soon as the first one finishes, the HPA will try to bring down one of the pods, irrespective of the configured X sessions, and we have no control over which pod the HPA chooses. This is not the current flow, but it will happen when KEDA scales down. Also, I don't see what impact increasing this value has on current users; no one wants their tests interrupted. Besides, this value has no effect unless the preStop lifecycle hook is enabled, and even then it only delays pod termination until the test session is completed, which I think is the desired behaviour.
>
> **@diemol:** @prashanth-volvocars Draining after X sessions will make the container exit, and ideally the pod should then stop. Am I missing something?
>
> **@msvticket:** I'd say that for most users `ScaledJob`s would be the best fit, so this should be the default in the helm chart. Creating a pod is normally not expensive in Kubernetes; Fargate is a special case. But sure, you can skip that, and I guess it's me or @diemol who will implement it.
>
> **Author:** @diemol Yes, we no longer need X sessions, as the pod can handle sessions as long as there are pending sessions in the queue.
>
> **Author:** @msvticket The default should be what works for every environment, not for a specific few. Jobs work for a few environments, not all, but pods work for all. If you think a `ScaledJob` suits your needs best, it's available as an option you can enable, but making it the default would entirely break things for specific environments. Hope you understand.
>
> **@msvticket:** What option? I don't see any mention of it in the chart.
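To make the alternative under discussion concrete, here is a hedged sketch (not part of this PR) of a KEDA `ScaledJob` combined with draining after one session, assuming docker-selenium's `SE_DRAIN_AFTER_SESSION_COUNT` variable and an illustrative image tag; event-bus environment variables and resource settings are omitted for brevity:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: selenium-chrome-node-scaledjob
spec:
  maxReplicaCount: 8
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: selenium-node-chrome
            image: selenium/node-chrome:4.5.0   # illustrative tag
            env:
              # The node drains itself after serving one session, the
              # container exits, and the Job completes -- so scale-down
              # never has to pick a pod that is still running a test.
              - name: SE_DRAIN_AFTER_SESSION_COUNT
                value: "1"
  triggers:
    - type: selenium-grid
      metadata:
        url: http://selenium-hub.default:4444/graphql
        browserName: chrome
```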
```diff

   # Allow pod correctly shutdown
-  lifecycle: {}
-  # preStop:
-  #   exec:
-  #     command:
-  #       - bash
-  #       - -c
-  #       - |
-  #         curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
-  #         while curl 127.0.0.1:5555/status; do sleep 1; done
+  lifecycle:
+    preStop:
+      exec:
+        command:
+          - bash
+          - -c
+          - |
+            curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
+            while curl 127.0.0.1:5555/status; do sleep 1; done;

   extraVolumeMounts: []
   # - name: my-extra-volume
```

> **Reviewer (on lines +381 to +389):** Shouldn't this …
>
> **Author:** Makes sense. Will add the check.
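The check the author agrees to add would plausibly live in the node deployment template rather than in `values.yaml` (values are not templated). A hypothetical sketch, assuming the hook should only apply when autoscaling is on:

```yaml
# Hypothetical template guard (not in this diff): only inject the drain
# preStop hook when autoscaling is enabled, so non-autoscaled installs
# keep their current shutdown behaviour.
{{- if .Values.chromeNode.autoscalingEnabled }}
lifecycle:
  preStop:
    exec:
      command:
        - bash
        - -c
        - |
          curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
          while curl 127.0.0.1:5555/status; do sleep 1; done;
{{- end }}
```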
```diff
@@ -398,6 +399,15 @@ chromeNode:
   #   persistentVolumeClaim:
   #     claimName: my-pv-claim

+  # Keda scaled object configuration
+  autoscalingEnabled: false
+  maxReplicaCount: 8
+  hpa:
+    url: http://selenium-hub.default:4444/graphql # Replace with your GraphQL URL here
+    browserName: chrome
+    # browserVersion: '91.0' # Optional. Only required when supporting multiple versions of browser in your Selenium Grid.
+    unsafeSsl: 'true' # Optional

 # Configuration for firefox nodes
 firefoxNode:
   # Enable firefox nodes
```

> **Author (replying to a suggestion on the `url` line):** I had the same thought too, but created the PR in the excitement of sharing it 😁. Will add this too.

> **@diemol (on `unsafeSsl`):** What does this mean?
>
> **Author:** https://keda.sh/docs/2.8/scalers/selenium-grid-scaler/ — "Skip certificate validation when connecting over HTTPS."
```diff
@@ -475,7 +485,7 @@ firefoxNode:
   # Custom annotations for service
   annotations: {}
   # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
-  dshmVolumeSizeLimit: 1Gi
+  dshmVolumeSizeLimit: 2Gi

   # Priority class name for firefox-node pods
   priorityClassName: ""
```

> **Reviewer:** Same comment as above.
```diff
@@ -487,17 +497,17 @@ firefoxNode:
   # failureThreshold: 120
   # periodSeconds: 5
   # Time to wait for pod termination
-  terminationGracePeriodSeconds: 30
+  terminationGracePeriodSeconds: 3600

   # Allow pod correctly shutdown
-  lifecycle: {}
-  # preStop:
-  #   exec:
-  #     command:
-  #       - bash
-  #       - -c
-  #       - |
-  #         curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
-  #         while curl 127.0.0.1:5555/status; do sleep 1; done
+  lifecycle:
+    preStop:
+      exec:
+        command:
+          - bash
+          - -c
+          - |
+            curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
+            while curl 127.0.0.1:5555/status; do sleep 1; done;

   extraVolumeMounts: []
   # - name: my-extra-volume
```

> **Reviewer (on `terminationGracePeriodSeconds`):** Same comment as above.

> **Reviewer (on lines +502 to +510):** Same comment as above.
```diff
@@ -509,6 +519,12 @@ firefoxNode:
   # - name: my-extra-volume-from-pvc
   #   persistentVolumeClaim:
   #     claimName: my-pv-claim
+  # Keda scaled object configuration
+  autoscalingEnabled: false
+  maxReplicaCount: 8
+  hpa:
+    url: http://selenium-hub.default:4444/graphql # Replace with your GraphQL URL here
+    browserName: firefox

 # Configuration for edge nodes
 edgeNode:
```
```diff
@@ -587,7 +603,7 @@ edgeNode:
   annotations:
     hello: world
   # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
-  dshmVolumeSizeLimit: 1Gi
+  dshmVolumeSizeLimit: 2Gi
   # Priority class name for edge-node pods
   priorityClassName: ""
```
```diff
@@ -599,17 +615,17 @@ edgeNode:
   # failureThreshold: 120
   # periodSeconds: 5
   # Time to wait for pod termination
-  terminationGracePeriodSeconds: 30
+  terminationGracePeriodSeconds: 3600
   # Allow pod correctly shutdown
-  lifecycle: {}
-  # preStop:
-  #   exec:
-  #     command:
-  #       - bash
-  #       - -c
-  #       - |
-  #         curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
-  #         while curl 127.0.0.1:5555/status; do sleep 1; done
+  lifecycle:
+    preStop:
+      exec:
+        command:
+          - bash
+          - -c
+          - |
+            curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
+            while curl 127.0.0.1:5555/status; do sleep 1; done;

   extraVolumeMounts: []
   # - name: my-extra-volume
```
```diff
@@ -621,6 +637,12 @@ edgeNode:
   # - name: my-extra-volume-from-pvc
   #   persistentVolumeClaim:
   #     claimName: my-pv-claim
+  # Keda scaled object configuration
+  autoscalingEnabled: false
+  maxReplicaCount: 8
+  hpa:
+    url: http://selenium-hub.default:4444/graphql # Replace with your GraphQL URL here
+    browserName: MicrosoftEdge
```

> **Reviewer:** If you want Edge to work, you will need to set `sessionBrowserName` to `msedge` as well. Docs: https://keda.sh/docs/2.8/scalers/selenium-grid-scaler/
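Per the linked KEDA scaler docs, Edge needs `sessionBrowserName` because the capability name (`MicrosoftEdge`) differs from the name the browser reports in running sessions (`msedge`). A sketch of the suggested values:

```yaml
edgeNode:
  autoscalingEnabled: true
  hpa:
    url: http://selenium-hub.default:4444/graphql
    browserName: MicrosoftEdge
    # Running sessions report the browser as "msedge", so the scaler
    # needs this to match active sessions as well as queued requests.
    sessionBrowserName: msedge
```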
```diff

 # Custom labels for k8s resources
 customLabels: {}
```