-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-34532: [Java][FlightSQL] Change JDBC to handle multi-endpoints #38521
Conversation
cfec8d0
to
16cd5b2
Compare
This work is partially based off #13900. |
16cd5b2
to
3b8f7bd
Compare
// Clone the builder and then set the new endpoint on it. | ||
final URI endpointUri = endpoint.getLocations().get(0).getUri(); | ||
final Builder builderForEndpoint = new Builder(ArrowFlightSqlClientHandler.this.builder) | ||
.withHost(endpointUri.getHost()) | ||
.withPort(endpointUri.getPort()) | ||
.withEncryption(endpointUri.getScheme().equals(LocationSchemes.GRPC_TLS)); | ||
|
||
final ArrowFlightSqlClientHandler endpointHandler = builderForEndpoint.build(); | ||
try { | ||
endpoints.add(new CloseableEndpointStreamPair( | ||
endpointHandler.sqlClient.getStream(endpoint.getTicket(), | ||
endpointHandler.getOptions()), endpointHandler.sqlClient)); | ||
} catch (Exception ex) { | ||
AutoCloseables.close(endpointHandler); | ||
throw ex; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some future improvements for this would be to test each of the locations and use the first one that works; ignore locations with unknown URI schemes for forwards compatibility; fall back to the current client if none of the locations work. Could we file issue(s) for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, having a cache of clients instead of opening new ones every time (e.g. via https://github.com/ben-manes/caffeine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #38573 and #38574 for these topics.
- Create new clients to connect to new locations in endpoints. - If no location is reported using the current connection. - Rework ArrowFlightSqlHandler.Builder to avoid changing its state during build() so that it can be re-used. - Add a copy constructor to ArrowFlightSqlHandler.Builder to make it easy to replicate connection settings when connecting to different locations. - Add unit test infrastructure for building a distributed FlightSql server
3b8f7bd
to
9a0bd79
Compare
private Builder(Builder original) { | ||
this.middlewareFactories.addAll(original.middlewareFactories); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to verify, this would ensure the CookiesMiddleware gets copied over for the Endpoint clients, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't think about that, but yes, it should mean that the middleware gets shared between the clients. That might be good or bad depending on the server implementation but we can follow up if it turns out to be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new CookieMiddlewareFactory is created for subsequent connections (same with the AuthMiddlewareFactory). This was intentional to give child connections a 'blank-slate' but should actually be configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #38576 . This one can cause breakage for existing users that are relying on the same cookie/auth token from being sent so I'll pick it up next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thanks! we depend on cookies to maintain state and expect the callOptions we set to go out with every request so that will be important for us!
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 2fb7fd9. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
…ts (apache#38521) ### Rationale for this change The Flight SQL JDBC Driver currently doesn't fetch at multiple endpoints correctly when the data is not at the same location as the original connection. ### What changes are included in this PR? - Create new clients to connect to new locations in endpoints. - If no location is reported using the current connection. - Make ArrowFlightSqlClientHandler's builder's build() function to be idempodent. - Add functionality to clone ArrowFlightSqClientHandler's builder so that it can be used for temporary connections to locations returned by getFlightInfo(). - Add utility classes in unit tests for constructing a distributed Flight SQL Server ### Are these changes tested? Yes. ### Are there any user-facing changes? The behavior for when there are reported endpoints from getFlightInfo is now fixed. However if users relied on the previous behavior of just getting streams from the same node, and their server only ever reported the original node, they may observe more Flight client connections opening and closing than before (since new connections get spawned for each partition that has at least one Location now). * Closes: apache#34532 Authored-by: James Duong <james.duong@improving.com> Signed-off-by: David Li <li.davidm96@gmail.com>
…ts (apache#38521) ### Rationale for this change The Flight SQL JDBC Driver currently doesn't fetch at multiple endpoints correctly when the data is not at the same location as the original connection. ### What changes are included in this PR? - Create new clients to connect to new locations in endpoints. - If no location is reported using the current connection. - Make ArrowFlightSqlClientHandler's builder's build() function to be idempodent. - Add functionality to clone ArrowFlightSqClientHandler's builder so that it can be used for temporary connections to locations returned by getFlightInfo(). - Add utility classes in unit tests for constructing a distributed Flight SQL Server ### Are these changes tested? Yes. ### Are there any user-facing changes? The behavior for when there are reported endpoints from getFlightInfo is now fixed. However if users relied on the previous behavior of just getting streams from the same node, and their server only ever reported the original node, they may observe more Flight client connections opening and closing than before (since new connections get spawned for each partition that has at least one Location now). * Closes: apache#34532 Authored-by: James Duong <james.duong@improving.com> Signed-off-by: David Li <li.davidm96@gmail.com>
Rationale for this change
The Flight SQL JDBC Driver currently doesn't fetch at multiple endpoints correctly when the data is not at the same location as the original connection.
What changes are included in this PR?
Are these changes tested?
Yes.
Are there any user-facing changes?
The behavior for when there are reported endpoints from getFlightInfo is now fixed. However if users relied on the previous behavior of just getting streams from the same node, and their server only ever reported the original node, they may observe more Flight client connections opening and closing than before (since new connections get spawned for each partition that has at least one Location now).