-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support endpoints with different URLs than initial connect #42
Support endpoints with different URLs than initial connect #42
Conversation
Thanks for looking into this @avantgardnerio . |
try { | ||
ArrayList<FlightStream> streams = new ArrayList<>(); | ||
for (FlightEndpoint ep : flightInfo.getEndpoints()) { | ||
for (Location loc : ep.getLocations()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not supposed to iterate over locations. For a given endpoint/ticket, multiple locations can be returned for redundancy and load balancing reasons and the caller can arbitrarily choose one.
It's also legal for no location to be returned, in which case the caller should just use the same node that getFlightInfo was called on. Note that if at least one location is returned, the caller cannot assume that getStream() can be called on the same node as getFlightInfo().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this info! I'll put together a more serious PR and undraft it, taking this into account.
I should point out that if you're interested in ODBC as well, the ODBC driver has a similar issue: |
How does the latest code look? |
...bc-driver/src/main/java/org/apache/arrow/driver/jdbc/client/ArrowFlightSqlClientHandler.java
Outdated
Show resolved
Hide resolved
...bc-driver/src/main/java/org/apache/arrow/driver/jdbc/client/ArrowFlightSqlClientHandler.java
Outdated
Show resolved
Hide resolved
...bc-driver/src/main/java/org/apache/arrow/driver/jdbc/client/ArrowFlightSqlClientHandler.java
Outdated
Show resolved
Hide resolved
for (FlightEndpoint ep : flightInfo.getEndpoints()) { | ||
URI uri = ep.getLocations().size() > 0 ? ep.getLocations().get(0).getUri() : null; | ||
FlightSqlClient sqlClient = this.connectionManager.getConnection(uri); | ||
streams.add(sqlClient.getStream(ep.getTicket(), getOptions())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really apparent how the sqlClient gets returned to the pool. It should get returned after the rest of the driver is done with the stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without trying to excuse the behavior, I was following the existing pattern: the builder created a FlightSqlClient
and passed ownership of it to ArrowFlightSqlClientHandler
, which closes it when it is closed itself.
This PR changes it so that: the builder creates a ConnectionManager
instead of a FlightSqlClient
, but all else remains the same.
I see that the resulting streams are AutoClosable
. I think we could wrap them in an AOP proxy and return the client to the pool when the stream is closed. To be honest, calling what is there a "pool" is pretty strong. Presently there's no notion of outstanding leases, return, etc. If this behavior is desired, I can look into an apache-commons
Pool
or similar. (It's probably more of a factory / cache?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started to make a FlightStreamWrapper
that could track the number of open FlightStreams
for a given client, but I'm at a bit of a loss as to the intended life-cycle:
- Do we want to open
FlightClient
s to each of these endpoints and keep them open? - Should do we want to close the
FlightClient
to an endpoint as soon as it's stream is closed? - Do we want to put in some kind of timeout where we close it after some amount of inactivity?
Just let me know what's desired and I can make it happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response. I've been thinking about this and I think the right thing to do is to actually change the protocol such that when a Flight Server responds to getFlightInfo, it can optionally add an optimization hint telling the caller that endpoints are long-lived and potentially re-usable. Perhaps indicating as well if contacting an endpoint can just re-use information from the originating request (such as cookies or authentication tokens).
For this PR, let's pessimistically assume that we should not reuse FlightClients and close them with the stream. And discuss how/if the server should provide lifetime information about endpoints in the dev mailing list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the PR shortly. I think the easiest way to do this is:
- Update
ConnectionManager.acquireConnection()
to create a newFlightSqlClient
each time and return ownership - Create a
FlightStreamWrapper
class that intercepts theclose()
method - Update
ArrowFlightSqlClientHandler.getStreams()
to wrap the returned streams inFlightStreamWrapper
s - Have
FlightStreamWrapper
callclose
on theFlightSqlClient
whenclose()
is called on the stream
Sound good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this sounds like a good plan.
@avantgardnerio would it be possible to make a PR to this branch here instead https://github.com/Bit-Quill/arrow/tree/BQ-flight-jdbc-driver thank you! |
Yep, I have one in the works. |
Wait @jayhomn-bitquill you caught me off guard.. I was expecting https://github.com/apache/arrow/tree/flight-sql-jdbc ? |
Ahh okay that is fine then! |
This has been moved to apache#13900 |
Clearly this is not ready for merge, butI tested it and it does allow me to use the JDBC driver through DataGrip to Apache Arrow Ballista:I thought I would post this to get the discussion started about the ways in which a real fix would differ from what I have here.