-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Ship threaded send #10513
Ship threaded send #10513
Conversation
8bb6a99
to
42dd93a
Compare
std::thread thr; | ||
std::atomic<bool> send_thread_has_exception = false; | ||
std::exception_ptr eptr; | ||
boost::asio::io_context ctx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
io_context
per connection seems a bit heavy. Since most/all uses of SHiP are likely to have only a few connections it is likely ok. Could have one io_context
and share it across all connections. Just FYI, I'm ok with leaving it as is.
return; | ||
} | ||
self->socket_stream->binary(true); | ||
self->start_read(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this is not thread safe. You are not allowed to access a socket from more than one thread at a time. Here you have the main thread doing an async_read
and the internal thread doing an write
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what I understand about socket programming. According to https://stackoverflow.com/questions/7418093/use-one-socket-in-two-threads, it should be safe for TCP. AFAIK, websocket is full-duplex communication channels over a single TCP connection, it should also be safe for read/write from different threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See thread-safety at the bottom of the page: https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/ip__tcp/socket.html
Also:
https://stackoverflow.com/questions/7362894/boostasiosocket-thread-safety
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you really want to do it, you can dup() the fd and create two boost sockets referring to the same underlying TCP connection. Then use one boost socket for read, one for write, on two separate threads. Not that i'm recommending doing that here, but that's how it can be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case anyone finds this: my brain was thinking asio socket. The semantics of dup()ing a socket backed by a websocket::stream are almost certainly no good!
}); | ||
}); | ||
void send(T obj, ::std::optional<::fc::zipkin_span>&& span) { | ||
if (!send_thread_has_exception) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should at least log that we are ignoring this send.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception is reported via post at the end of thread function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems racy. send_thread_has_exception
won't be set true until some period of time after the io_context has stopped running. So it seems possible for send()
to queue up another post()
that will never be dispatched. If that's not a problem (and it might not be, I'm not sure on cursory look), then I don't see why this check needs to be here in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop in the thread will break when the first task throw an exception, all the subsequent tasks would not be able to be executed. The check here is not strictly needed, just intended to be save some cycles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to review this further so I'm just putting this in for now to block merging. In particular I'm concerned about the change to a sync write()
The sync write was already a significant shut down concern, but I get the impression that the mutex introduced to fix the threading violation now makes the sync write even more dangerous. If the sync write is blocked, but data is received, the main thread then becomes blocked because the async_read cannot start until the blocked sync write completes. Would like to understand more why this sync write is being used instead of async. I've heard performance several times but tests I've performed show async write being able to send north of 6GB/sec over a websocket. A single thread with a strand per connection would seem more than capable of handling the demand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would still like to understand why async_write
performs so poorly in this code where it seems perfectly fine in a simple test program. Hopefully a follow on task can be worked to understand.
Change Description
This PR
Select ONE:
Testing Changes
Select ANY that apply:
Consensus Changes
API Changes
Documentation Additions