-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python][FlightRPC] Segmentation Fault when invoking authenticate concurrently over a same FlightClient #38565
Comments
Could you share full backtrace? |
Hi @kou this is related to #38519 ticket in a way, I was trying to debug the issue myself, but @jdanap took over the investigation now. Considering I am not able to go any deeper, I can just say that the backtrace looks like this: libc++abi: terminating due to uncaught exception of type std::__1::bad_function_call: std::exception
Process 95253 stopped
* thread #36, stop reason = signal SIGABRT
frame #0: 0x00000001a7b60744 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`:
-> 0x1a7b60744 <+8>: b.lo 0x1a7b60764 ; <+40>
0x1a7b60748 <+12>: pacibsp
0x1a7b6074c <+16>: stp x29, x30, [sp, #-0x10]!
0x1a7b60750 <+20>: mov x29, sp
Target 0: (Python) stopped.
(lldb) bt
* thread #36, stop reason = signal SIGABRT
* frame #0: 0x00000001a7b60744 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x00000001a7b97c28 libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x00000001a7aa5ae8 libsystem_c.dylib`abort + 180
frame #3: 0x00000001a7b50b84 libc++abi.dylib`abort_message + 132
frame #4: 0x00000001a7b403b4 libc++abi.dylib`demangling_terminate_handler() + 320
frame #5: 0x00000001a7816e68 libobjc.A.dylib`_objc_terminate() + 160
frame #6: 0x00000001a7b4ff48 libc++abi.dylib`std::__terminate(void (*)()) + 16
frame #7: 0x00000001a7b52d34 libc++abi.dylib`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 36
frame #8: 0x00000001a7b52ce0 libc++abi.dylib`__cxa_throw + 140
frame #9: 0x000000010570e9d8 libarrow_python_flight.dylib`std::__1::__throw_bad_function_call[abi:v15006]() + 56
frame #10: 0x000000010570eee8 libarrow_python_flight.dylib`std::__1::__function::__value_func<arrow::Status (_object*, arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*)>::operator()[abi:v15006](_object*&&, arrow::flight::ClientAuthSender*&&, arrow::flight::ClientAuthReader*&&) const + 68
frame #11: 0x000000010570ee98 libarrow_python_flight.dylib`std::__1::function<arrow::Status (_object*, arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*)>::operator()(_object*, arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*) const + 68
frame #12: 0x000000010570ed54 libarrow_python_flight.dylib`arrow::py::flight::PyClientAuthHandler::Authenticate(arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*)::$_2::operator()() const + 80
frame #13: 0x0000000105703c24 libarrow_python_flight.dylib`decltype(fp()) arrow::py::SafeCallIntoPython<arrow::py::flight::PyClientAuthHandler::Authenticate(arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*)::$_2>(arrow::py::flight::PyClientAuthHandler::Authenticate(arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*)::$_2&&) + 80
frame #14: 0x0000000105703bc8 libarrow_python_flight.dylib`arrow::py::flight::PyClientAuthHandler::Authenticate(arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*) + 64
frame #15: 0x000000013d346fc8 libarrow_flight.1400.dylib`arrow::flight::transport::grpc::(anonymous namespace)::GrpcClientImpl::AuthenticateInternal(this=0x0000000103534be0, rpc=0x0000000179c9e398)::ClientRpc&) at grpc_client.cc:1058:7
frame #16: 0x000000013d332b60 libarrow_flight.1400.dylib`arrow::flight::transport::grpc::(anonymous namespace)::GrpcClientImpl::Authenticate(this=0x0000000103534be0, options=0x0000000123ed3f00, auth_handler=<unavailable>) at grpc_client.cc:864:12
frame #17: 0x000000013d2d9540 libarrow_flight.1400.dylib`arrow::flight::FlightClient::Authenticate(this=0x0000000103542e30, options=0x0000000123ed3f00, auth_handler=nullptr) at client.cc:567:22
frame #18: 0x0000000123e0c130 _flight.cpython-311-darwin.so`__pyx_pf_7pyarrow_7_flight_12FlightClient_8authenticate(__pyx_obj_7pyarrow_7_flight_FlightClient*, _object*, __pyx_obj_7pyarrow_7_flight_FlightCallOptions*) + 1112
frame #19: 0x0000000123e068bc _flight.cpython-311-darwin.so`__pyx_pw_7pyarrow_7_flight_12FlightClient_9authenticate(_object*, _object* const*, long, _object*) + 1152
frame #20: 0x0000000123e7f488 _flight.cpython-311-darwin.so`__Pyx_CyFunction_Vectorcall_FASTCALL_KEYWORDS(_object*, _object* const*, unsigned long, _object*) + 208
frame #21: 0x000000010063fde0 Python`_PyEval_EvalFrameDefault + 40368
frame #22: 0x0000000100644564 Python`_PyEval_Vector + 116
frame #23: 0x0000000100641cdc Python`_PyEval_EvalFrameDefault + 48300
frame #24: 0x0000000100644564 Python`_PyEval_Vector + 116
frame #25: 0x0000000100641cdc Python`_PyEval_EvalFrameDefault + 48300
frame #26: 0x0000000100644564 Python`_PyEval_Vector + 116
frame #27: 0x000000010056a198 Python`method_vectorcall + 380
frame #28: 0x00000001006f6ad4 Python`thread_run + 168
frame #29: 0x00000001006981cc Python`pythread_wrapper + 48
frame #30: 0x00000001a7b97fa8 libsystem_pthread.dylib`_pthread_start + 148
The issue is happening exactly as @jdanap mentioned, which should be here
Let me know if there is anything else you need |
Thanks. Could you execute the following commands in the debugger session?
|
Ah,
@lidavidm We don't want to recommend |
Thanks for finding that! We could/should probably add a lock for this to be consistent with other methods, but yeah, I've been slowly marking the old authentication method deprecated (not for removal) anyways because of its statefulness. |
I wish I could be able to see the symbols :D that has been the problem described in #38519. @kou @lidavidm do you have any suggestions on what should be the recommended approach here to avoid the problem? |
Do you need to authenticate concurrently with other calls/with itself? I would expect a client to authenticate once, up front, and then enter concurrent usage |
So our scenario is as follows:
In this scenario, unless we put a lock in the code in service B to avoid the authenticate being called at the same time (which I remember trying, but didn't seem to help, still I want to validate that again), don't think there is any other way, unless I am missing something. Do you agree @lidavidm ?? |
No, then each user/request should use its own client. Authenticate is explicitly stateful, and is not appropriate if you are using multiple authentication contexts. |
I think that we can't do it... |
Ah...we'd have to put a reader/writer lock over every method - I'm not sure that's worth the tradeoff. We should document the requirements for Authenticate instead, then. |
+1 |
Documenting the requirements/limitations for Authenticate makes sense to me. Correct me if I am wrong - but I believe even if we can make |
There is likely no need for locking, just making SetToken take a shared_ptr instead of a raw pointer may be enough, see linked issue. |
Actually that's wrong, there is no guarantee that a mutation and a read of a shared_ptr is consistent; thanks to lidavidm for pointing out |
Describe the bug, including details regarding any error messages, version, and platform.
Hello. I encountered issue with segmentation fault/crashing
pyarrow.flight.FlightClient
whenever the client is used concurrently.As we have narrowed down the cause of the failure to the FlightClient's
authenticate
function, I have simplified the example reproducible usage scenario to the following:With lower level of concurrency, we are less likely to encounter the crash. Instead, we would get 404 or DNS resolution error as expected (or in real use case, be able to get through the authentication successfully just fine). But set it 20 and I have been able to reproduce the crash consistently, albeit after varying number of
authenticate
calls before the script above exits with a Segmentation Fault in Unix environment.Debugging in dev mode, the point of failure was located to be here, where it generates std::bad_function_call
We could work around this by always instantiating a new client instead of reusing a singleton one. However, to my understanding, reusing the same client is encouraged.
I am wondering what the original source of failure is, and/or if re-authenticating over/reusing the same client is not the intended usage in this case.
Version(s)
pyarrow=12.0.0 and pyarrow=14.0.0
Component(s)
FlightRPC, Python
The text was updated successfully, but these errors were encountered: