Pods Stuck in CrashLoopBackoff on Fresh Deployment to Fresh Kubernetes Cluster #470
Comments
Seems to be the same issue as this: https://community.temporal.io/t/errors-while-setting-up-temporal-on-local-environment-using-helm-chart/3088/4 |
Just ran into this; you can run this to get up and running:
git reset --hard 1e5ac0c |
Thanks for the tip, this got the chart working for me @myst3k |
I have run into the same exact problem. What should the fix be?
|
It looks to me like there's a problem with the configmaps that get generated...
Command we use to install temporal:
helm install \
--set server.replicaCount=1 \
--set cassandra.config.cluster_size=1 \
--set prometheus.enabled=false \
--set grafana.enabled=false \
--set elasticsearch.enabled=false \
my-test . --timeout 15m --namespace temporal
The error
When I look at the configmaps...
---
# Source: temporal/templates/server-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: "my-test-temporal-worker-config"
  labels:
    app.kubernetes.io/name: temporal
    helm.sh/chart: temporal-0.38.1
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: my-test
    app.kubernetes.io/version: 1.23.1
    app.kubernetes.io/part-of: temporal
data:
  config_template.yaml: |-
    log:
      stdout: true
      level: "debug,info"
    persistence:
      defaultStore: default
      visibilityStore: visibility
      numHistoryShards: 512
      datastores:
        default:
          cassandra:
            hosts: "my-test-cassandra.temporal.svc.cluster.local,"
            port: 9042
            password: "{{ .Env.TEMPORAL_STORE_PASSWORD }}"
            consistency:
              default:
                consistency: local_quorum
                serialConsistency: local_serial
            keyspace: temporal
            replicationFactor: 1
            user: user
        visibility:
    global:
      membership:
        name: temporal
        maxJoinDuration: 30s
        broadcastAddress: {{ default .Env.POD_IP "0.0.0.0" }}
      pprof:
        port: 7936
      metrics:
        tags:
          type: worker
        prometheus:
          timerType: histogram
          listenAddress: "0.0.0.0:9090"
    services:
      frontend:
        rpc:
          grpcPort: 7233
          membershipPort: 6933
          bindOnIP: "0.0.0.0"
      history:
        rpc:
          grpcPort: 7234
          membershipPort: 6934
          bindOnIP: "0.0.0.0"
      matching:
        rpc:
          grpcPort: 7235
          membershipPort: 6935
          bindOnIP: "0.0.0.0"
      worker:
        rpc:
          grpcPort: 7239
          membershipPort: 6939
          bindOnIP: "0.0.0.0"
    clusterMetadata:
      enableGlobalDomain: false
      failoverVersionIncrement: 10
      masterClusterName: "active"
      currentClusterName: "active"
      clusterInformation:
        active:
          enabled: true
          initialFailoverVersion: 1
          rpcName: "temporal-frontend"
          rpcAddress: "127.0.0.1:7933"
    dcRedirectionPolicy:
      policy: "noop"
      toDC: ""
    archival:
      status: "disabled"
    publicClient:
      hostPort: "my-test-temporal-frontend:7233"
    dynamicConfigClient:
      filepath: "/etc/temporal/dynamic_config/dynamic_config.yaml"
      pollInterval: "10s"
And it looks - specifically - like the problem is here:
datastores:
  default:
    cassandra:
      hosts: "my-test-cassandra.temporal.svc.cluster.local,"
      port: 9042
      password: "{{ .Env.TEMPORAL_STORE_PASSWORD }}"
      consistency:
        default:
          consistency: local_quorum
          serialConsistency: local_serial
      keyspace: temporal
      replicationFactor: 1
      user: user
  visibility:
  # ^^^ PROBLEM
There's a key for visibility, but it is empty. I will have to look into the helm chart more deeply to understand why this is the case... but it appears to be the real source of the problem. |
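One way to check a render for this symptom before installing is to pipe the chart output through a small filter. This is a rough sketch, not part of the chart: `find_empty_visibility` is a hypothetical helper name, and it assumes the rendered YAML is indented with spaces, as `helm template` output normally is.

```shell
# Hypothetical helper (not part of the chart): reads rendered YAML on stdin and
# reports any "visibility:" key whose next non-blank line is not nested under it.
find_empty_visibility() {
  awk '
    # Number of leading spaces on a line.
    function indent(s,   n) { n = s; sub(/^ +/, "", n); return length(s) - length(n) }
    # First non-blank line after a bare "visibility:" key: is it nested deeper?
    pending && NF {
      if (indent($0) <= key_indent) print "empty visibility key at line " key_line
      pending = 0
    }
    /^ *visibility: *$/ { pending = 1; key_indent = indent($0); key_line = NR }
    # A bare key on the very last line is empty too.
    END { if (pending) print "empty visibility key at line " key_line }
  '
}

# Example use (needs helm and the chart checkout, so it is left commented out):
# helm template my-test . --set elasticsearch.enabled=false | find_empty_visibility
```

If the filter prints nothing, every `visibility:` key in the render has something nested under it.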
For anyone who cares, this seems to work around the issue...just create it with elasticsearch enabled, with a single elasticsearch replica...and voila, it works
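For reference, the install command quoted earlier in the thread with Elasticsearch switched on might look like the sketch below. It only prints the command (remove the `echo` to actually run it). The `elasticsearch.replicas` value is an assumption about the bundled Elasticsearch subchart, so check it against the values of the chart version you have.

```shell
# Dry run: prints the helm command instead of executing it.
# Assumption: elasticsearch.replicas is the subchart value controlling replica count.
print_install_cmd() {
  echo helm install \
    --set server.replicaCount=1 \
    --set cassandra.config.cluster_size=1 \
    --set elasticsearch.enabled=true \
    --set elasticsearch.replicas=1 \
    --set prometheus.enabled=false \
    --set grafana.enabled=false \
    my-test . --timeout 15m --namespace temporal
}

print_install_cmd
```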
|
Cassandra is not a valid visibility backend. You will have to enable ES visibility if you are using Cassandra for persistence currently. The chart does not yet support using SQL for visibility if you use Cassandra for persistence. I will file an issue to see if we can get the error message updated so that it does not unhelpfully suggest that you can use Cassandra for visibility. |
Unfortunately the project README does not provide any such information. |
Added a PR to clarify. |
* Update README to mention you cannot do Cassandra only anymore. (temporalio#499) Related: temporalio#470.
What are you really trying to do?
Install Temporal locally on Docker Desktop with Kubernetes enabled.
I ran through the steps outlined in the README:
helm dependencies update
then the minimal setup.
This results in 4 pods stuck in the CrashLoopBackOff state.
Describe the bug
There seems to be an extra step that is missing from the README. Looking at the logs for temporaltest-frontend, I get this error. It seems like I need to configure some sort of data source? I see a Cassandra instance running...
Environment/Versions
Additional context