Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][FlightRPC] FlightClient.authenticate is not thread safe #41919

Open
jcferretti opened this issue Jun 1, 2024 · 5 comments
Open

[C++][FlightRPC] FlightClient.authenticate is not thread safe #41919

jcferretti opened this issue Jun 1, 2024 · 5 comments

Comments

@jcferretti
Copy link

jcferretti commented Jun 1, 2024

Describe the bug, including details regarding any error messages, version, and platform.

This is the cython code (pyx) implementation of flight_client.authenticate:

def authenticate(self, auth_handler, options: FlightCallOptions = None):

   1440     def authenticate(self, auth_handler, options: FlightCallOptions = None):
   1441         """Authenticate to the server.
   1442
   1443         Parameters
   1444         ----------
   1445         auth_handler : ClientAuthHandler
   1446             The authentication mechanism to use.
   1447         options : FlightCallOptions
   1448             Options for this call.
   1449         """
   1450         cdef:
   1451             unique_ptr[CClientAuthHandler] handler
   1452             CFlightCallOptions* c_options = FlightCallOptions.unwrap(options)
   1453
   1454         if not isinstance(auth_handler, ClientAuthHandler):
   1455             raise TypeError(
   1456                 "FlightClient.authenticate takes a ClientAuthHandler, "
   1457                 "not '{}'".format(type(auth_handler)))
   1458         handler.reset((<ClientAuthHandler> auth_handler).to_handler())
   1459         with nogil:
   1460             check_flight_status(
   1461                 self.client.get().Authenticate(deref(c_options),
   1462                                                move(handler)))

Note the object being stored inside the handler object is returned by to_handler defined later on the same file:

cdef PyClientAuthHandler* to_handler(self):

   2482 cdef class ClientAuthHandler(_Weakrefable):
   2483     """Authentication plugin for a client."""
   2484 
   [...]
   2500 
   2501     cdef PyClientAuthHandler* to_handler(self):
   2502         cdef PyClientAuthHandlerVtable vtable
   2503         vtable.authenticate = _client_authenticate
   2504         vtable.get_token = _get_token
   2505         return new PyClientAuthHandler(self, vtable)

The call to Authenticate on _flight.pyx line 1461 is to grpc_client.cc line 860:

Status Authenticate(const FlightCallOptions& options,

    860   Status Authenticate(const FlightCallOptions& options,
    861                       std::unique_ptr<ClientAuthHandler> auth_handler) override {
    862     auth_handler_ = std::move(auth_handler);
    863     ClientRpc rpc(options);
    864     return AuthenticateInternal(rpc);
    865   }

Many calls in the same file use the auth_handler_ member of the struct

std::shared_ptr<ClientAuthHandler> auth_handler_;

that is being assigned on line 862 above, eg, DoPut:

RETURN_NOT_OK(rpc->SetToken(auth_handler_.get()));

    997   Status DoPut(const FlightCallOptions& options,
    998                std::unique_ptr<internal::ClientDataStream>* out) override {
    999     using GrpcStream = ::grpc::ClientReaderWriter<pb::FlightData, pb::PutResult>;
   1000
   1001     auto rpc = std::make_shared<ClientRpc>(options);
   1002     RETURN_NOT_OK(rpc->SetToken(auth_handler_.get()));
   1003     std::shared_ptr<GrpcStream> stream = stub_->DoPut(&rpc->context);
   1004     *out = std::make_unique<GrpcClientPutStream>(std::move(rpc), std::move(stream));
   1005     return Status::OK();
   1006   }

SetToken on the same file:

Status SetToken(ClientAuthHandler* auth_handler) {

     86   /// \brief Add an auth token via an auth handler
     87   Status SetToken(ClientAuthHandler* auth_handler) {
     88     if (auth_handler) {
     89       std::string token;
     90       RETURN_NOT_OK(auth_handler->GetToken(&token));
     91       context.AddMetadata(kGrpcAuthHeader, token);
     92     }
     93     return Status::OK();
     94   }

There is a race between calling auth_handler_.get() to get out a raw pointer out of the auth_handler_ shared_ptr and using it inside SetToken, and another thread changing the value of auth_handler_ via the its assignment operator call in Authenticate, which can trigger the deletion of the previously held pointer value. That deletion can happen in another thread after the call to auth_handler_.get() to get the raw pointer value and before SetToken using that raw pointer value.

My team builds a server and client libraries based on flight. One of our customers ran into an issue while using hundreds of concurrent sessions to our service. Our client library was using calls to FlightClient.authenticate triggered by a timer once every 5 minutes; that concurrently with hundreds of calls to DoPut triggered the problem. We were able to reproduce the problem by artificially increasing the frequency of calls to FlightClient.authenticate to 3 seconds; we saw the same problem with either concurrent DoPut or DoGet. We saw the problem on pyarrow 16.0.0 on Ubuntu Linux 22.04.

More details about the symptoms we saw here: deephaven/deephaven-core#5489

Component(s)

FlightRPC

@jcferretti
Copy link
Author

Likely related: #38565

@kou kou changed the title FlightClient.authenticate is not thread safe [FlightRPC] FlightClient.authenticate is not thread safe Jun 1, 2024
@kou kou changed the title [FlightRPC] FlightClient.authenticate is not thread safe [C++][FlightRPC] FlightClient.authenticate is not thread safe Jun 1, 2024
@kou
Copy link
Member

kou commented Jun 1, 2024

This will avoid the crash but it seems that it may still use wrong authenticate handler.

@jcferretti
Copy link
Author

This will avoid the crash but it seems that it may still use wrong authenticate handler.

By "this" I assume you mean the suggestion I made in the linked issue to change SetToken to take a shared_ptr argument by value instead of a raw pointer.

it may still use wrong authenticate handler.

In what my team is trying to do there is no "wrong" authentication handler. One thread is trying to use the currently registered handler, and another thread is trying to change that handler to a new one, which may or may not use the same authentication token. If the old handler is used, that will use a valid token. If the new handler is used, that will use a valid token (which may or may not be a different authentication token; our server rotates authentication tokens after a period of time, and keeps the old one valid for a while after it has been rotated, because there is no way to prevent several RPCs in flight from different threads from racing with each other, unless the client stopped sending RPCs altogether before trying to get a new token, which it doesn't in our case).

Independently of what my team is trying to do and more generally, if some threads are trying to use the flight client to do operations like DoPut and DoGet, and another thread is changing the authentication handler, there is no way to guarantee an ordering unless the user themselves do some kind of order protection between those threads. What I am trying to elaborate here is that the idea that there is a "right" authentication handler only makes sense if a particular DoGet or DoPut or any other flight operation was done in a context with an expectation on what handler would be used; for that expectation to hold, a user would have to do their own synchronization to ensure that desired ordering, eg, stop doing DoGets or DoPuts or any other operations, change the handler, and then resume. As far as I can tell, there is nothing the flight client itself can, or should do. What I think the flight client can do is to ensure a pointer that was already deleted is not used; since things running from multiple threads can happen in any ordering, if one thread is changing the authentication handler and another is calling DoPut, the thread calling DoPut can't have an expectation on whether the old handler or the new handler will be used. What it can have is an expectation that either of those will be used correctly (as opposed to crashing).

@kou
Copy link
Member

kou commented Jun 2, 2024

By "this" I assume you mean the suggestion I made in the linked issue to change SetToken to take a shared_ptr argument by value instead of a raw pointer.

Right.

our server rotates authentication tokens after a period of time, and keeps the old one valid for a while after it has been rotated

If a server does it and a client is used only for one user, it doesn't use wrong authentication as you said.

But, in general, FlightClient can't assume them. For example, a client may be shared with multiple users.

I don't object this approach because it's better than crashing. Could you open a PR?

But I think that we still need to update our documentation even with this change: #38565 (comment)

@jcferretti
Copy link
Author

I don't object this approach because it's better than crashing. Could you open a PR?

Thanks. Done: #41927

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants