Initial deployment fails when using nodeSelector in values.yaml #1108

Closed
hashildy opened this issue Mar 21, 2022 · 2 comments · Fixed by #1164
Labels
type/bug Something isn't working

hashildy commented Mar 21, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

When using a dedicated nodepool called consulpool1 and making use of the nodeSelector for clients and servers, the initial deployment fails as the Consul Federation Secret pod does not successfully complete. Upon further investigation, it appears that this pod is scheduled on the default system nodepool and not the dedicated nodepool: consulpool1.
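For readers trying to reproduce this, the setup described above might look roughly like the following values.yaml excerpt. This is only a sketch: the reporter's actual file is not included in the issue, the `agentpool` label key is an assumption about how the AKS node pool is labeled, and other settings that federation normally needs (such as TLS and mesh gateways) are left out for brevity.

```yaml
# Hypothetical values.yaml excerpt; the reporter's actual file is not shown in the issue.
global:
  name: consul
  federation:
    enabled: true
    # Enables the create-federation-secret job discussed in this issue.
    createFederationSecret: true

server:
  # Assumed label key for the dedicated node pool; adjust to your cluster's labels.
  # The chart expects nodeSelector as a YAML-formatted multi-line string.
  nodeSelector: |
    agentpool: consulpool1

client:
  nodeSelector: |
    agentpool: consulpool1
```

With values like these, servers and clients are pinned to consulpool1, while any pod whose template does not read a nodeSelector value can still be scheduled onto other pools.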

Reproduction Steps

Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition

Logs

output from kubectl logs consul-create-federation-secret-mffk4:
2022-03-21T21:12:29.179Z [ERROR] Error retrieving current datacenter, retrying: err="Get "https://192.168.32.4:8501/v1/agent/self": dial tcp 192.168.32.4:8501: connect: connection refused"

Expected behavior

Deploying clients and servers to a dedicated nodepool should not result in a failed deployment. It looks like the create-federation-secret job does not check the client nodeSelector value: https://github.com/hashicorp/consul-k8s/blob/main/charts/consul/templates/create-federation-secret-job.yaml. Would it be possible to implement something similar to the following, so that this value is checked before the pod is scheduled? https://github.com/hashicorp/consul-k8s/blob/main/charts/consul/templates/client-snapshot-agent-deployment.yaml#L211-L213.
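As an illustration of the request, a conditional block along these lines could be added to the job's pod spec. This is a hedged sketch of the general pattern, not a copy of the linked lines and not necessarily the change that was eventually merged; the indentation and the choice of `.Values.client.nodeSelector` are assumptions.

```yaml
# Hypothetical addition to the pod spec in create-federation-secret-job.yaml.
# Indentation is assumed and must match the surrounding template.
      {{- if .Values.client.nodeSelector }}
      nodeSelector:
        {{ tpl .Values.client.nodeSelector . | indent 8 | trim }}
      {{- end }}
```

The `if` guard keeps the rendered manifest unchanged for users who do not set a client nodeSelector, so the job would only be constrained to the dedicated pool when one is configured.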

Environment details

  • consul-k8s version: chart version: 0.41.1 App version: 1.11.3
  • values.yaml used to deploy the helm chart: see above

Additionally, please provide details regarding the Kubernetes Infrastructure, as shown below:

  • Kubernetes version: v1.21.9
  • Cloud Provider: Azure/AKS
  • Networking CNI plugin in use: N/A (Kubenet)
hashildy added the type/bug (Something isn't working) label on Mar 21, 2022

t-eckert (Contributor) commented

@hashildy, thank you for bringing this to our attention and finding a potential cause. With a cursory look, I don't see any reason why we wouldn't be able to modify the job you mentioned so that it uses the nodePool value. I will confirm this with my team and if we are able to make this change, we will.

hashildy (Author) commented

Perfect, thanks so much @t-eckert! I appreciate the quick response.
