fix: drop namespace from zeebe advertisedHost and initialContactPoints #2170
LGTM, thank you 👍
I saw your comment that this might hurt deployments spread over multiple namespaces. Could this affect multi-region?
Other than that this change seems fine to me, it'll result in fewer failing DNS resolutions for the most common case.
I am concerned about multi-region deployments breaking as a result of this patch. However, I don't think that concern is valid, because in multi-region deployments those variables are explicitly set. Something could still break from this, perhaps in a patch version, but in that scenario we'd just revert the patch and release again, so it's not a huge deal. Personally, though, I don't think DNS failures are a huge deal either. If it were just me deciding, I'd say let's close the PR and not merge. I'm looking for someone to tell me they explicitly want it.
Agreed on the multi-region side: we would be overriding this explicitly anyway, so I wouldn't worry too much there. About existing deployments, I'm not sure this would break anything. Do you have a specific case in mind? As mentioned in the issue, the main benefit is removing a source of noise which may hide real DNS errors in our metrics, and also avoiding pointless work. The fix does exactly that. However, the impact is low, so if it were really to break anything, it may not be worth it, depending on what it breaks. As I said, it's not clear to me what would break. Say we publish a patch with the above and it gets rolled out progressively. The DNS name should be resolved regardless, assuming a standard K8S installation, right? The nodes are matched by their member IDs, so as long as we don't have two nodes with different advertised hosts but the same ID running at the same time, it should be fine. This is something we can quickly test, however.
Nope, it's just a risk.
Good enough for me. Let's merge.
I'm not concerned about this scenario. I feel confident in the node + member ID matching.
@jessesimpson36 while investigating this issue https://camunda.slack.com/archives/C0486FPV076/p1723476618760859 I've discovered this PR and this may be the culprit. While you're right that in dual region we overwrite/specify |
Verified my hypothesis in camunda/c8-multi-region@14659bc: that actually fixed the dual-region test issue for 10.3.0 /* haven't tried with the snapshot yet */ (workflow run). @jessesimpson36 @npepinpe @lenaschoenburg what do you think about this PR retrospectively, is this something you want to keep? Otherwise, customers with a multi-region setup will face a slight complication, since they will have to adjust an additional env var to make C8 work via the Helm chart.
I initially thought the problem @maxdanilov posed would only affect test environments where multi-region was enabled (one cluster with multiple namespaces), but you're right that it would affect more than that, because they all rely on namespaces in the DNS request for the DNS forwarding to work between cluster1 and cluster2. I think we need to revert this and release a 10.3.1 version of the Helm chart. The Zeebe problem that prompted us to make this patch doesn't seem worth the trouble of breaking multi-region customers (especially since we have to take into account customers using multi-region without the c8-multi-region repo). @npepinpe @lenaschoenburg, let me know if you have objections.
As this breaks existing deployments, let's revert it. We can look into software-based solutions for our issue. That said, FYI, I believe it doesn't make sense for deployments to specify the host (which is the binding host); they should instead specify just the advertised host (i.e. where other nodes should reply to).
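For readers following the multi-region concern in this thread, here is a minimal sketch of why a name without the namespace cannot be resolved from another namespace. It only models the standard Kubernetes search list for a pod; the pod, service, and namespace names (`camunda-zeebe-0`, `camunda-zeebe`, `region-a`, `region-b`) are hypothetical and not taken from the chart, and the cross-cluster DNS forwarding itself is not modelled.

```python
def search_candidates(name, client_namespace, cluster_domain="cluster.local"):
    """Names a resolver would try for `name` from a pod in `client_namespace`.

    Assumes the usual Kubernetes search list and a name with fewer dots
    than ndots (5 by default), so the search list is tried before the bare name.
    """
    search = [
        f"{client_namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return [f"{name}.{domain}." for domain in search] + [f"{name}."]


def resolves(name, client_namespace, records):
    """True if any candidate name matches an existing DNS record."""
    return any(c in records for c in search_candidates(name, client_namespace))


# Hypothetical: the broker lives in namespace "region-a".
RECORDS = {"camunda-zeebe-0.camunda-zeebe.region-a.svc.cluster.local."}

short_name = "camunda-zeebe-0.camunda-zeebe"               # namespace dropped
qualified_name = "camunda-zeebe-0.camunda-zeebe.region-a.svc"  # namespace kept

print(resolves(short_name, "region-a", RECORDS))      # same namespace
print(resolves(short_name, "region-b", RECORDS))      # other namespace
print(resolves(qualified_name, "region-b", RECORDS))  # other namespace, qualified
```

In this model the short name only resolves for clients in the broker's own namespace, while the namespace-qualified name resolves from either side, which matches the reason given above for reverting.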
Which problem does the PR fix?
closes #1754
What's in this PR?
This change drops the namespace and svc from the initialContactPoints and advertisedHost.
The idea behind this change is that most Kubernetes deployments configure DNS search domains in their resolv.conf, so the first and second DNS search attempts will fail, since the namespace and svc will be duplicated.
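The duplicated lookups described above can be sketched as follows. This is a simplified model of resolver search-list expansion, assuming the usual Kubernetes pod search list (`<namespace>.svc.cluster.local`, `svc.cluster.local`, `cluster.local`) and `ndots:5`; the pod, service, and namespace names are hypothetical placeholders, not values from the chart.

```python
def search_candidates(name, search_domains, ndots=5):
    """Return the fully qualified names a resolver would try, in order.

    Names with fewer dots than `ndots` are tried against each search
    domain first and as an absolute name last; otherwise the absolute
    name is tried first. A trailing dot disables the search list.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


def failed_lookups(name, search_domains, existing_fqdn):
    """Count the candidates tried before the existing record is reached."""
    return search_candidates(name, search_domains).index(existing_fqdn)


# Hypothetical namespace "camunda" with a headless service "camunda-zeebe".
SEARCH = ["camunda.svc.cluster.local", "svc.cluster.local", "cluster.local"]
RECORD = "camunda-zeebe-0.camunda-zeebe.camunda.svc.cluster.local."

long_name = "camunda-zeebe-0.camunda-zeebe.camunda.svc"  # before this PR
short_name = "camunda-zeebe-0.camunda-zeebe"             # after this PR

for candidate in search_candidates(long_name, SEARCH):
    print(candidate)

print(failed_lookups(long_name, SEARCH, RECORD))   # 2 failed attempts
print(failed_lookups(short_name, SEARCH, RECORD))  # 0 failed attempts
```

Under these assumptions the first two candidates for the long name contain `camunda.svc.camunda.svc` and `camunda.svc.svc`, which is the duplication this PR is trying to avoid.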
Checklist
Please make sure to follow our Contributing Guide.
Before opening the PR:
make go.update-golden-only
After opening the PR: