Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-34532: [Java][FlightSQL] Change JDBC to handle multi-endpoints #38521

Merged
merged 1 commit into from
Nov 3, 2023

Conversation

jduo
Copy link
Member

@jduo jduo commented Oct 30, 2023

Rationale for this change

The Flight SQL JDBC Driver currently doesn't fetch at multiple endpoints correctly when the data is not at the same location as the original connection.

What changes are included in this PR?

  • Create new clients to connect to new locations in endpoints.
  • If no location is reported using the current connection.
  • Make ArrowFlightSqlClientHandler's builder's build() function to be idempodent.
  • Add functionality to clone ArrowFlightSqClientHandler's builder so that it can be used for temporary connections to locations returned by getFlightInfo().
  • Add utility classes in unit tests for constructing a distributed Flight SQL Server

Are these changes tested?

Yes.

Are there any user-facing changes?

The behavior for when there are reported endpoints from getFlightInfo is now fixed. However if users relied on the previous behavior of just getting streams from the same node, and their server only ever reported the original node, they may observe more Flight client connections opening and closing than before (since new connections get spawned for each partition that has at least one Location now).

@jduo jduo force-pushed the 34532-jdbc-flight-endpoints branch from cfec8d0 to 16cd5b2 Compare October 31, 2023 18:37
@jduo
Copy link
Member Author

jduo commented Oct 31, 2023

This work is partially based off #13900.

@jduo jduo force-pushed the 34532-jdbc-flight-endpoints branch from 16cd5b2 to 3b8f7bd Compare November 3, 2023 02:03
@jduo jduo marked this pull request as ready for review November 3, 2023 02:07
@jduo jduo requested a review from lidavidm as a code owner November 3, 2023 02:07
@jduo
Copy link
Member Author

jduo commented Nov 3, 2023

@lidavidm

Comment on lines 116 to 138
// Clone the builder and then set the new endpoint on it.
final URI endpointUri = endpoint.getLocations().get(0).getUri();
final Builder builderForEndpoint = new Builder(ArrowFlightSqlClientHandler.this.builder)
.withHost(endpointUri.getHost())
.withPort(endpointUri.getPort())
.withEncryption(endpointUri.getScheme().equals(LocationSchemes.GRPC_TLS));

final ArrowFlightSqlClientHandler endpointHandler = builderForEndpoint.build();
try {
endpoints.add(new CloseableEndpointStreamPair(
endpointHandler.sqlClient.getStream(endpoint.getTicket(),
endpointHandler.getOptions()), endpointHandler.sqlClient));
} catch (Exception ex) {
AutoCloseables.close(endpointHandler);
throw ex;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some future improvements for this would be to test each of the locations and use the first one that works; ignore locations with unknown URI schemes for forwards compatibility; fall back to the current client if none of the locations work. Could we file issue(s) for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, having a cache of clients instead of opening new ones every time (e.g. via https://github.com/ben-manes/caffeine)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added #38573 and #38574 for these topics.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Nov 3, 2023
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Rework ArrowFlightSqlHandler.Builder to avoid changing its state
during build() so that it can be re-used.
- Add a copy constructor to ArrowFlightSqlHandler.Builder to make it
easy to replicate connection settings when connecting to different
locations.
- Add unit test infrastructure for building a distributed FlightSql
server
@jduo jduo force-pushed the 34532-jdbc-flight-endpoints branch from 3b8f7bd to 9a0bd79 Compare November 3, 2023 16:02
@lidavidm lidavidm merged commit 2fb7fd9 into apache:main Nov 3, 2023
14 checks passed
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Nov 3, 2023
Comment on lines +427 to +428
private Builder(Builder original) {
this.middlewareFactories.addAll(original.middlewareFactories);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to verify, this would ensure the CookiesMiddleware gets copied over for the Endpoint clients, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't think about that, but yes, it should mean that the middleware gets shared between the clients. That might be good or bad depending on the server implementation but we can follow up if it turns out to be an issue.

Copy link
Member Author

@jduo jduo Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new CookieMiddlewareFactory is created for subsequent connections (same with the AuthMiddlewareFactory). This was intentional to give child connections a 'blank-slate' but should actually be configurable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added #38576 . This one can cause breakage for existing users that are relying on the same cookie/auth token from being sent so I'll pick it up next.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thanks! we depend on cookies to maintain state and expect the callOptions we set to go out with every request so that will be important for us!

Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 2fb7fd9.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

prmoore77 added a commit to prmoore77/arrow that referenced this pull request Nov 6, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…ts (apache#38521)

### Rationale for this change
The Flight SQL JDBC Driver currently doesn't fetch at multiple endpoints correctly when the data is not at the same location as the original connection.

### What changes are included in this PR?
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Make ArrowFlightSqlClientHandler's builder's build() function to be idempodent.
- Add functionality to clone ArrowFlightSqClientHandler's builder so that it can be used for temporary connections to locations returned by getFlightInfo().
- Add utility classes in unit tests for constructing a distributed Flight SQL Server

### Are these changes tested?
Yes.

### Are there any user-facing changes?
The behavior for when there are reported endpoints from getFlightInfo is now fixed. However if users relied on the previous behavior of just getting streams from the same node, and their server only ever reported the original node, they may observe more Flight client connections opening and closing than before (since new connections get spawned for each partition that has at least one Location now).

* Closes: apache#34532

Authored-by: James Duong <james.duong@improving.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ts (apache#38521)

### Rationale for this change
The Flight SQL JDBC Driver currently doesn't fetch at multiple endpoints correctly when the data is not at the same location as the original connection.

### What changes are included in this PR?
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Make ArrowFlightSqlClientHandler's builder's build() function to be idempodent.
- Add functionality to clone ArrowFlightSqClientHandler's builder so that it can be used for temporary connections to locations returned by getFlightInfo().
- Add utility classes in unit tests for constructing a distributed Flight SQL Server

### Are these changes tested?
Yes.

### Are there any user-facing changes?
The behavior for when there are reported endpoints from getFlightInfo is now fixed. However if users relied on the previous behavior of just getting streams from the same node, and their server only ever reported the original node, they may observe more Flight client connections opening and closing than before (since new connections get spawned for each partition that has at least one Location now).

* Closes: apache#34532

Authored-by: James Duong <james.duong@improving.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Java] JDBC Flight SQL driver should follow Flight RPC spec
3 participants