Coordinator is not created by druid operator #105
If the cluster was updated, you can check events from the operator. Can you do `kubectl get druid -n namespace -o yaml` and check the status? It should show the coordinator deployment.
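For reference, a check along these lines (the cluster name and namespace are placeholders) would confirm whether the operator ever reconciled the coordinator:

```sh
# Inspect the Druid CR's status; the operator records the resources
# it has created there, so a reconciled coordinator statefulset
# should be listed. "tiny-cluster" and "druid" are placeholder names.
kubectl get druid tiny-cluster -n druid -o yaml

# The coordinator statefulset should also appear in the namespace itself.
kubectl get sts -n druid
```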
IMHO we should not have any breaking change. Please confirm @itamar-marom @cyril-corbon.
Might the defaults have changed?
@itamar-marom which defaults?
RollingUpdate as true
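If this refers to the operator's rolling-deploy default, it can be pinned explicitly in the CR rather than relying on the built-in default. A sketch, with the remaining required spec fields omitted:

```yaml
apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: tiny-cluster
spec:
  rollingDeploy: true  # make the behaviour explicit instead of relying on the default
  # ... remaining required fields (image, common.runtime.properties, nodes, etc.)
```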
@chn217 how do you deploy a cluster? Is it possible that you're using Terraform?
The status of the command output:

The druid operator pod logs:

```
Insecure listen address will be removed.
The ability to run kube-rbac-proxy without TLS certificates will be removed.
For more information, please go to brancz/kube-rbac-proxy#187
===============================================
I0908 02:26:40.299907 1 main.go:218] Valid token audiences:
```

Can we show the logs on the creation of resources?
The cluster is deployed to AWS EKS, and the infrastructure code has been automated via AWS CDK. IMHO, I don't think Terraform or CDK could be the reason. Behind the scenes, kubectl is used to deploy the cluster manifest. As I mentioned, our code has been working for several months. There are no other code changes other than the druid operator upgrade.
I'm using the StatefulSet type for all druid components (router/broker/coordinator/overlord/historical/middleManager).
@chn217 you are checking the logs of the sidecar proxy; please check the logs of the druid operator container instead. The operator emits event logs, so I am sure there is some log printed out. Make sure you check the right container.
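Since the operator pod runs a kube-rbac-proxy sidecar, the container has to be selected explicitly. A sketch, assuming the kubebuilder-default container name `manager` (it may differ in your deployment):

```sh
# List the containers in the operator pod, then tail the operator itself
# rather than the kube-rbac-proxy sidecar.
kubectl get pod <operator-pod> -n <operator-namespace> \
  -o jsonpath='{.spec.containers[*].name}'
kubectl logs <operator-pod> -n <operator-namespace> -c manager -f
```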
Hi @AdheipSingh, sorry, I didn't realise that there is a sidecar container for the operator now.
@chn217 statefulset not found, so you'll need to audit who deleted the statefulset. The operator performs a non-cascading deletion of statefulsets when expanding a druid cluster's storage vertically. Even in that case, logs are emitted for each action.
@AdheipSingh The issue happened for one of the new deployments (new k8s cluster + new druid cluster). As we were unable to recover it, we went ahead with recreating the node group and redeploying the druid cluster manifest. The coordinator statefulset was seen after that. We haven't increased the storage configuration.
@AdheipSingh I've come across this issue in a new cluster again:
Any idea?
@chn217 how come the broker got deleted? There is no log on the operator side. Do you have an audit log? BTW, is this a managed k8s offering?
@AdheipSingh I don't really think the broker got deleted. The druid operator log was captured in my previous comment; that's where the error message is. This is Amazon EKS. Is there any specific command that I can use to show the audit log? Thanks
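There is no single kubectl command for this. On EKS, if control-plane audit logging is enabled for the cluster, the API-server audit trail lands in CloudWatch, and deletions can be traced from there. A sketch using the AWS CLI; the log group name follows the EKS convention, and the cluster name is a placeholder:

```sh
# Search the EKS API-server audit trail for StatefulSet deletions.
# Requires "audit" logging enabled on the cluster's control plane.
aws logs filter-log-events \
  --log-group-name /aws/eks/<cluster-name>/cluster \
  --log-stream-name-prefix kube-apiserver-audit \
  --filter-pattern '{ ($.verb = "delete") && ($.objectRef.resource = "statefulsets") }' \
  --query 'events[].message'
```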
ah ok, so just to confirm: you did not see any broker deletion by the operator? I agree race conditions can exist; the operator acts like a state machine (on observed state), and due to the abstractions it deals with across the overall system, it is eventually consistent.
Thanks. I didn't see any broker sts get deleted. The problem here was that the coordinator was never created (never reconciled, unfortunately). All the pods stay in CrashLoopBackOff status because Druid pods need to talk to the coordinator for the /status/health APIs to work (readiness probes).
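For context, the probes in question are plain HTTP checks against Druid's health endpoint. A minimal sketch of such a readiness probe in a pod spec; the port is service-specific and 8088 (the router default) is used here only as an example:

```yaml
# Sketch of a Druid readiness probe; the port depends on the service
# (e.g. 8088 for the router, 8082 for the broker).
readinessProbe:
  httpGet:
    path: /status/health
    port: 8088
  initialDelaySeconds: 30
  periodSeconds: 10
```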
@chn217 did the coordinator issue occur again?
@AdheipSingh The symptom looks exactly the same. The coordinator didn't show up in the pod/sts list (checked with `kubectl get pod`).
Also, the router sts is not showing up. Not sure whether changing from sts to deployment for query/master would help?
did this occur when you did an upgrade?
@AdheipSingh No, it happened for a new cluster. Here are the steps that I took to create a new cluster:
This issue seems to start occurring from v1.2.0. Previously we were on v1.0.0, where we hadn't seen this issue. BTW, this issue is intermittent (another symptom of a race condition).
@chn217 the operator will log if it deletes any node.
Feel free to re-open and provide sufficient logs showing that the operator deleted the node. You can find them in the operator logs and in the operator events when describing the current CR.
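Concretely, the events and logs referred to here can be pulled with (names are placeholders):

```sh
# Events the operator recorded against the Druid CR
kubectl describe druid <cluster-name> -n <namespace>

# Or filter namespace events down to the Druid kind
kubectl get events -n <namespace> --field-selector involvedObject.kind=Druid
```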
We recently performed an upgrade of the Druid operator from version 1.0.0 to version 1.2.0, and during the process, we encountered an issue when attempting to create a new Druid cluster. It's worth noting that there were no changes made to the cluster manifest.
The specific problem we encountered was the absence of a coordinator created by the Druid operator. Upon inspecting the resource list, we noticed that there was no coordinator statefulset present. Strangely, there were no error messages recorded in the Druid operator log. This issue appears to be intermittent, as we have successfully used the Druid operator to create multiple clusters without encountering this problem, and it was only observed in one particular cluster.
Additionally, the Druid operator log does not seem to contain particularly useful information, and the pod logs offer little of value either.