-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
discovery: preserve results from other resolve calls #7886
Conversation
Properly preserve results from other resolve calls. There is an assumption that resolve() is always called with the same addresses but that is not true with gRPC and `--endpoint-group`. Without this fix, multiple resolves could happen at the same time but some of the callers will not be able to retrieve the results leading to random errors. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@@ -68,7 +73,7 @@ func (r *resolver) ResolveNow(_ grpcresolver.ResolveNowOptions) {} | |||
func (r *resolver) resolve() error { | |||
ctx, cancel := context.WithTimeout(r.ctx, r.interval) | |||
defer cancel() | |||
return r.provider.Resolve(ctx, []string{r.target}) | |||
return r.provider.Resolve(ctx, []string{r.target}, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @GiedriusS, can you please help me understand this change. This is the only place where we set it to false. Why it is the case?
Can we just enable it to true everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we cannot, because in gRPC the same resolver is reused between --endpoint-group
s. In other words, Build() above is called from multiple places in gRPC but they all reuse the same resolver. They first resolve and then fetch the values from cache. If we flush here then some of the results are lost and the Query component will not connected to some of the endpoints.
Some addresses can be shared between --endpoint-group
s so that's why I opted to reuse the same resolver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GiedriusS
I don't know if it is intended but the behavior here is different when there is a resolve error.
Let's say worst case DNS resolve for all addresses failed. The previous resolved addresses are {"A": ["1"], "B": ["2"]}
. It tries to resolve addresses B
and C
now.
Before the change, we will have resolve results {"B": ["2"], "C": nil}
.
After this change, we will have resolved results {"A": ["1"], "B": ["2"]}
as it only keeps previous old records and flushOld
makes no difference when there is an error.
Is this intended? I expect flushOld
to only change resolved addresses when there is no error but this implementation changes the behavior with error the same time.
Properly preserve results from other resolve calls. There is an assumption that resolve() is always called with the same addresses but that is not true with gRPC and
--endpoint-group
. Without this fix, multiple resolves could happen at the same time but some of the callers will not be able to retrieve the results leading to random errors.