Add option to enable worker graceful shutdown
sdaberdaku committed Sep 25, 2024
1 parent 455a1a8 commit ebb5efa
Showing 11 changed files with 184 additions and 13 deletions.
27 changes: 25 additions & 2 deletions charts/trino/README.md
@@ -494,9 +494,11 @@ Fast distributed SQL query engine for big data analytics that helps you explore

Allows mounting additional Trino configuration files from Kubernetes secrets on the coordinator node.
Example:
```yaml
- name: sample-secret
  secretName: sample-secret
  path: /secrets/sample.json
```
* `worker.jvm.maxHeapSize` - string, default: `"8G"`
* `worker.jvm.gcMethod.type` - string, default: `"UseG1GC"`
* `worker.jvm.gcMethod.g1.heapRegionSize` - string, default: `"32M"`
@@ -550,13 +552,34 @@ Fast distributed SQL query engine for big data analytics that helps you explore
```
* `worker.lifecycle` - object, default: `{}`

To enable [graceful shutdown](https://trino.io/docs/current/admin/graceful-shutdown.html), define a lifecycle preStop like bellow, Set the `terminationGracePeriodSeconds` to a value greater than or equal to the configured `shutdown.grace-period`. Configure `shutdown.grace-period` in `additionalConfigProperties` as `shutdown.grace-period=2m` (default is 2 minutes). Also configure `accessControl` because the `default` system access control does not allow graceful shutdowns.
Worker container [lifecycle events](https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/)
Example:
```yaml
preStop:
exec:
command: ["/bin/sh", "-c", "curl -v -X PUT -d '\"SHUTTING_DOWN\"' -H \"Content-type: application/json\" http://localhost:8081/v1/info/state"]
command: ["/bin/sh", "-c", "sleep 120"]
```
If provided, it will override the `preStop` lifecycle event configured by `gracefulShutdown`.
* `worker.gracefulShutdown` - object, default: `{"accessControl":{"configFile":"graceful-shutdown-rules.json","user":"admin"},"enabled":false,"gracePeriod":"2m"}`

Configure [graceful shutdown](https://trino.io/docs/current/admin/graceful-shutdown.html)
Example:
```yaml
gracefulShutdown:
  enabled: true
  gracePeriod: 2m
  accessControl:
    user: admin
    configFile: graceful-shutdown-rules.json
```
Enabling this feature will:
1) Add a `preStop` lifecycle event to all worker Pods;
2) Set the `shutdown.grace-period` configuration property to `gracePeriod`;
3) Configure the workers' `accessControl` since the `default` system access control [does not allow graceful
shutdowns](https://trino.io/docs/current/admin/graceful-shutdown.html).
The user must set the `terminationGracePeriodSeconds` to a value of at least two times the configured `gracePeriod`.
The worker that receives the graceful shutdown request [will sleep for `gracePeriod` twice](https://trino.io/docs/current/admin/graceful-shutdown.html#shutdown-behavior).
Setting `worker.lifecycle` will override the `preStop` event set by this configuration.
* `worker.terminationGracePeriodSeconds` - int, default: `30`
* `worker.nodeSelector` - object, default: `{}`
* `worker.tolerations` - list, default: `[]`
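
As a worked example of the `terminationGracePeriodSeconds` requirement documented above: with `gracePeriod: 2m` the worker sleeps for 120 seconds twice, so Kubernetes needs to allow at least 240 seconds before it kills the Pod. The chart default of `30` would therefore cut the shutdown short. The override below is only a sketch; the 30-second margin is an assumption for illustration, not something the chart prescribes.

```yaml
# Hypothetical values override; the numbers are illustrative.
worker:
  gracefulShutdown:
    enabled: true
    gracePeriod: 2m   # 120 seconds
  # Must be at least 2 x gracePeriod = 240 seconds.
  # The extra 30 seconds is an assumed safety margin.
  terminationGracePeriodSeconds: 270
```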
25 changes: 25 additions & 0 deletions charts/trino/templates/configmap-access-control-worker.yaml
@@ -0,0 +1,25 @@
{{- if .Values.worker.gracefulShutdown.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "trino.fullname" . }}-access-control-volume-worker
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "trino.labels" . | nindent 4 }}
    app.kubernetes.io/component: worker
data:
  {{- with .Values.worker.gracefulShutdown.accessControl }}
  {{ .configFile }}: >-
    {
      "system_information": [
        {
          "allow": [
            "read",
            "write"
          ],
          "user": "{{ .user }}"
        }
      ]
    }
  {{- end }}
{{- end }}
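
For orientation, this is roughly what the template above should render to with the default `accessControl` values (`user: admin`, `configFile: graceful-shutdown-rules.json`). The name prefix `my-trino` and the namespace `default` are assumptions made for the sketch, and the labels are left out for brevity.

```yaml
# Illustrative render; "my-trino" and "default" are assumed names.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-trino-access-control-volume-worker
  namespace: default
data:
  # Key comes from worker.gracefulShutdown.accessControl.configFile
  graceful-shutdown-rules.json: >-
    {
      "system_information": [
        {
          "allow": [
            "read",
            "write"
          ],
          "user": "admin"
        }
      ]
    }
```

The single `system_information` rule grants the configured user read and write access, which is what permits the `PUT` to `/v1/info/state`.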
3 changes: 3 additions & 0 deletions charts/trino/templates/configmap-coordinator.yaml
@@ -76,6 +76,9 @@ data:
jmx.rmiregistry.port={{ .Values.jmx.registryPort }}
jmx.rmiserver.port={{ .Values.jmx.serverPort }}
{{- end }}
{{- if .Values.worker.gracefulShutdown.enabled }}
shutdown.grace-period={{ .Values.worker.gracefulShutdown.gracePeriod }}
{{- end }}
{{- if .Values.server.coordinatorExtraConfig }}
{{- .Values.server.coordinatorExtraConfig | nindent 4 }}
{{- end }}
9 changes: 9 additions & 0 deletions charts/trino/templates/configmap-worker.yaml
@@ -57,10 +57,19 @@ data:
{{- range $configValue := .Values.additionalConfigProperties }}
{{ $configValue }}
{{- end }}
{{- if .Values.worker.gracefulShutdown.enabled }}
shutdown.grace-period={{ .Values.worker.gracefulShutdown.gracePeriod }}
{{- end }}
{{- if .Values.server.workerExtraConfig }}
{{- .Values.server.workerExtraConfig | nindent 4 }}
{{- end }}
{{- if .Values.worker.gracefulShutdown.enabled }}
access-control.properties: |
access-control.name=file
security.config-file={{ .Values.server.config.path }}/access-control/{{ .Values.worker.gracefulShutdown.accessControl.configFile }}
{{- end }}

{{- if .Values.server.exchangeManager }}
exchange-manager.properties: |
exchange-manager.name={{ .Values.server.exchangeManager.name }}
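
A sketch of the `access-control.properties` entry this hunk adds to the worker ConfigMap once rendered with the default `configFile`. The path `/etc/trino` is an assumed value for `server.config.path`; substitute whatever the release actually uses.

```yaml
# Illustrative excerpt of the rendered worker ConfigMap data.
# /etc/trino is an assumed value for server.config.path.
data:
  access-control.properties: |
    access-control.name=file
    security.config-file=/etc/trino/access-control/graceful-shutdown-rules.json
```

The `security.config-file` path has to line up with the `access-control` volume mount added to the worker Deployment, since that mount is where the rules file appears inside the container.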
2 changes: 1 addition & 1 deletion charts/trino/templates/deployment-coordinator.yaml
@@ -18,7 +18,7 @@ spec:
metadata:
annotations:
{{- if and (eq .Values.accessControl.type "configmap") (not .Values.accessControl.refreshPeriod) }}
checksum/access-control-config: {{ include (print $.Template.BasePath "/configmap-access-control.yaml") . | sha256sum }}
checksum/access-control-config: {{ include (print $.Template.BasePath "/configmap-access-control-coordinator.yaml") . | sha256sum }}
{{- end }}
checksum/catalog-config: {{ include (print $.Template.BasePath "/configmap-catalog.yaml") . | sha256sum }}
checksum/coordinator-config: {{ include (print $.Template.BasePath "/configmap-coordinator.yaml") . | sha256sum }}
26 changes: 26 additions & 0 deletions charts/trino/templates/deployment-worker.yaml
@@ -23,6 +23,9 @@ spec:
annotations:
checksum/catalog-config: {{ include (print $.Template.BasePath "/configmap-catalog.yaml") . | sha256sum }}
checksum/worker-config: {{ include (print $.Template.BasePath "/configmap-worker.yaml") . | sha256sum }}
{{- if .Values.worker.gracefulShutdown.enabled }}
checksum/access-control-config: {{ include (print $.Template.BasePath "/configmap-access-control-worker.yaml") . | sha256sum }}
{{- end }}
{{- if .Values.worker.annotations }}
{{- tpl (toYaml .Values.worker.annotations) . | nindent 8 }}
{{- end }}
@@ -51,6 +54,11 @@ spec:
- name: schemas-volume
configMap:
name: {{ template "trino.fullname" . }}-schemas-volume-worker
{{- if .Values.worker.gracefulShutdown.enabled }}
- name: access-control-volume
configMap:
name: {{ template "trino.fullname" . }}-access-control-volume-worker
{{- end }}
{{- range .Values.configMounts }}
- name: {{ .name }}
configMap:
@@ -98,6 +106,10 @@ spec:
name: catalog-volume
- mountPath: {{ .Values.kafka.mountPath }}
name: schemas-volume
{{- if .Values.worker.gracefulShutdown.enabled }}
- mountPath: {{ .Values.server.config.path }}/access-control
name: access-control-volume
{{- end }}
{{- range .Values.configMounts }}
- name: {{ .name }}
mountPath: {{ .path }}
@@ -144,7 +156,21 @@ spec:
failureThreshold: {{ .Values.worker.readinessProbe.failureThreshold | default 6 }}
successThreshold: {{ .Values.worker.readinessProbe.successThreshold | default 1 }}
lifecycle:
{{- if .Values.worker.lifecycle }}
{{- toYaml .Values.worker.lifecycle | nindent 12 }}
{{- else if .Values.worker.gracefulShutdown.enabled }}
preStop:
exec:
command:
- "/bin/sh"
- "-c"
- >-
curl -v -X PUT
-d '"SHUTTING_DOWN"'
-H 'Content-type: application/json'
-H 'X-Trino-User: {{ .Values.worker.gracefulShutdown.accessControl.user }}'
http://localhost:{{- .Values.service.port -}}/v1/info/state
{{- end }}
resources:
{{- toYaml .Values.worker.resources | nindent 12 }}
{{- if .Values.sidecarContainers.worker }}
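
To make the `lifecycle` branch above concrete, here is approximately what the worker container gets when `worker.lifecycle` is empty and `worker.gracefulShutdown.enabled` is `true`. The user `admin` is the chart default from this commit; port `8080` is an assumed value for `service.port`.

```yaml
# Illustrative render of the generated preStop hook.
# 8080 is an assumed service.port; admin is the default accessControl.user.
lifecycle:
  preStop:
    exec:
      command:
        - "/bin/sh"
        - "-c"
        - >-
          curl -v -X PUT
          -d '"SHUTTING_DOWN"'
          -H 'Content-type: application/json'
          -H 'X-Trino-User: admin'
          http://localhost:8080/v1/info/state
```

Because the template checks `worker.lifecycle` first, any non-empty `worker.lifecycle` value replaces this hook entirely rather than being merged with it.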
51 changes: 51 additions & 0 deletions charts/trino/templates/tests/test-graceful-shutdown.yaml
@@ -0,0 +1,51 @@
{{- if .Values.worker.gracefulShutdown.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: {{ include "trino.fullname" . }}-workers
  labels:
    {{- include "trino.labels" . | nindent 4 }}
    app.kubernetes.io/component: test
    test: graceful-shutdown
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-weight": "1"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  type: ClusterIP
  ports:
    - port: {{ .Values.service.port }}
      targetPort: http
      protocol: TCP
      name: http
  selector:
    {{- include "trino.selectorLabels" . | nindent 4 }}
    app.kubernetes.io/component: worker
---
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "trino.fullname" . }}-test-graceful-shutdown
  labels:
    {{- include "trino.labels" . | nindent 4 }}
    app.kubernetes.io/component: test
    test: graceful-shutdown
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-weight": "2"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  containers:
    - name: graceful-shutdown
      image: curlimages/curl:latest
      command: ["sh", "-c"]
      args:
        - >-
          curl -v -X PUT
          -d '"SHUTTING_DOWN"'
          -H 'Content-type: application/json'
          -H 'X-Trino-User: {{ .Values.worker.gracefulShutdown.accessControl.user }}'
          --fail-with-body
          http://{{ include "trino.fullname" . }}-workers:{{- .Values.service.port -}}/v1/info/state
  restartPolicy: Never
{{- end }}
42 changes: 33 additions & 9 deletions charts/trino/values.yaml
@@ -574,9 +574,11 @@ coordinator:
# files from Kubernetes secrets on the coordinator node.
# @raw
# Example:
# ```yaml
# - name: sample-secret
#   secretName: sample-secret
#   path: /secrets/sample.json
# ```

worker:
jvm:
@@ -649,21 +651,43 @@ worker:
# ```

lifecycle: {}
# worker.lifecycle -- To enable [graceful
# shutdown](https://trino.io/docs/current/admin/graceful-shutdown.html),
# define a lifecycle preStop like bellow, Set the
# `terminationGracePeriodSeconds` to a value greater than or equal to the
# configured `shutdown.grace-period`. Configure `shutdown.grace-period` in
# `additionalConfigProperties` as `shutdown.grace-period=2m` (default is 2
# minutes). Also configure `accessControl` because the `default` system
# access control does not allow graceful shutdowns.
# worker.lifecycle -- Worker container [lifecycle
# events](https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/)
# @raw
# Example:
# ```yaml
# preStop:
# exec:
# command: ["/bin/sh", "-c", "curl -v -X PUT -d '\"SHUTTING_DOWN\"' -H \"Content-type: application/json\" http://localhost:8081/v1/info/state"]
# command: ["/bin/sh", "-c", "sleep 120"]
# ```
# If provided, it will override the `preStop` lifecycle event configured by `gracefulShutdown`.

gracefulShutdown:
enabled: false
gracePeriod: 2m
accessControl:
user: admin
configFile: graceful-shutdown-rules.json
# worker.gracefulShutdown -- Configure [graceful
# shutdown](https://trino.io/docs/current/admin/graceful-shutdown.html)
# @raw
# Example:
# ```yaml
# gracefulShutdown:
#   enabled: true
#   gracePeriod: 2m
#   accessControl:
#     user: admin
#     configFile: graceful-shutdown-rules.json
# ```
# Enabling this feature will:
# 1) Add a `preStop` lifecycle event to all worker Pods;
# 2) Set the `shutdown.grace-period` configuration property to `gracePeriod`;
# 3) Configure the workers' `accessControl` since the `default` system access control [does not allow graceful
# shutdowns](https://trino.io/docs/current/admin/graceful-shutdown.html).
# The user must set the `terminationGracePeriodSeconds` to a value of at least two times the configured `gracePeriod`.
# The worker that receives the graceful shutdown request [will sleep for `gracePeriod` twice](https://trino.io/docs/current/admin/graceful-shutdown.html#shutdown-behavior).
# Setting `worker.lifecycle` will override the `preStop` event set by this configuration.

terminationGracePeriodSeconds: 30

9 changes: 9 additions & 0 deletions test-graceful-shutdown-values.yaml
@@ -0,0 +1,9 @@
worker:
  gracefulShutdown:
    enabled: true
    gracePeriod: 1m
    accessControl:
      user: admin
      configFile: graceful-shutdown-rules.json

  terminationGracePeriodSeconds: 130
3 changes: 2 additions & 1 deletion test.sh
@@ -9,6 +9,7 @@ declare -A testCases=(
[overrides]="--set coordinatorNameOverride=coordinator-overridden,workerNameOverride=worker-overridden,nameOverride=overridden"
[access_control_properties_values]="--values test-access-control-properties-values.yaml"
[exchange_manager_values]="--values test-exchange-manager-values.yaml"
[graceful_shutdown]="--values test-graceful-shutdown-values.yaml"
)

function join_by {
Expand All @@ -23,7 +24,7 @@ NAMESPACE=trino-$(LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 6 || true)
HELM_EXTRA_SET_ARGS=
CT_ARGS=(--charts=charts/trino --skip-clean-up --helm-extra-args="--timeout 2m")
CLEANUP_NAMESPACE=true
TEST_NAMES=(default single_node complete_values access_control_properties_values exchange_manager_values)
TEST_NAMES=(default single_node complete_values access_control_properties_values exchange_manager_values graceful_shutdown)

usage() {
cat <<EOF 1>&2