
fix(network): use multi io_services in asio #1016

Merged: 12 commits, Jan 26, 2022
Conversation

@Smityz Smityz commented Jan 10, 2022

Background

Related issue: apache/incubator-pegasus#307
That issue reports a bug where a core dump can happen in certain scenarios. It is caused by a race condition when multiple threads read, write, or close the same socket.

How to fix this bug

Original way

Epoll
+-------------------------------+
| socket1  socket2  socket3     |
|  +--+     +--+     +--+       |
|  +--+     +--+     +--+       |
|                               |
+------------^------------------+
             |
io_service   |polling
+------------+------------------+
| task_queue                    |
| +-----------+--------------+  |
| | epollwait | call_back    |  |
| +-----+-----+--+--------+--+  |
|       |        |        |     |
+-------+--------+--------+-----+
Thread1 |  2     |     3  |
      +-v-+    +-v-+    +-v-+
      |   |    |   |    |   |
      +---+    +---+    +---+

In the past, we used multiple threads to execute polling and callbacks in a single event loop. But as the coredump information shows, a use-after-free can happen under high traffic. It is hard to add a mutex to prevent this problem, so this PR changes the way we use ASIO.

New way

+-----------------------------------------------+
|Linux kernel                                   |
| +-----------+   +-----------+   +-----------+ |
| |   Epoll1  |   |   Epoll2  |   |   Epoll3  | |
| +-----^-----+   +-----^-----+   +-----^-----+ |
+-------|---------------|---------------|-------+
  +-----------+   +-----------+   +-----------+
  |  polling  |   |  polling  |   |  polling  |
  | +-------+ |   | +-------+ |   | +-------+ |
  | |Thread1| |   | |Thread2| |   | |Thread3| |
  | +-------+ |   | +-------+ |   | +-------+ |
  |io_service1|   |io_service2|   |io_service3|
  +-----------+   +-----------+   +-----------+

This PR uses a one-loop-per-thread model in the network service: all operations on a given socket are executed in a single thread, so we no longer need to worry about race conditions.
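
For illustration, here is a minimal sketch of the one-loop-per-thread model (not the exact PR code; the class and member names are placeholders): each io_service is driven by exactly one thread, and every socket is bound to a single io_service, so all of its handlers run on that one thread.

#include <atomic>
#include <cstdint>
#include <memory>
#include <thread>
#include <vector>

#include <boost/asio.hpp>

// Minimal io_service pool sketch: one io_service per thread, so all handlers
// of a socket bound to a given io_service run on exactly one thread.
class io_service_pool
{
public:
    explicit io_service_pool(int worker_count)
    {
        for (int i = 0; i < worker_count; ++i) {
            auto ios = std::make_shared<boost::asio::io_service>();
            // The work guard keeps run() from returning while the queue is empty.
            _works.push_back(std::make_shared<boost::asio::io_service::work>(*ios));
            _io_services.push_back(ios);
            _threads.emplace_back([ios]() { ios->run(); });
        }
    }

    // Pick an io_service for a new socket; every operation on that socket is
    // then posted to this same io_service, i.e. executed by one thread only.
    boost::asio::io_service &get_io_service()
    {
        uint32_t idx = _next.fetch_add(1, std::memory_order_relaxed);
        return *_io_services[idx % _io_services.size()];
    }

    ~io_service_pool()
    {
        _works.clear(); // let run() return once pending handlers are done
        for (auto &t : _threads) {
            t.join();
        }
    }

private:
    std::vector<std::shared_ptr<boost::asio::io_service>> _io_services;
    std::vector<std::shared_ptr<boost::asio::io_service::work>> _works;
    std::vector<std::thread> _threads;
    std::atomic<uint32_t> _next{0};
};

Each RPC session would then keep a reference to the io_service it was created on and issue all of its async reads, writes, and closes through it, which is what removes the cross-thread races on a single socket.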

Benchmark

Original benchmark

+--------------------------+----------+------------+--------+------------+--------------+----------------------------------------------+-----------------------------------------------+
|      operation_case      | run_time | throughput | length | read_write | thread_count |     read(qps|ave|min|max|95|99|999|9999)     |     write(qps|ave|min|max|95|99|999|9999)     |
+--------------------------+----------+------------+--------+------------+--------------+----------------------------------------------+-----------------------------------------------+
| write=single,read=single | 8284     | 36210      | 1000   | 0 : 1      | 15           | {0 0 0 0 0 0 0 0}                            | {36212 1240 365 236287 2386 5349 12119 18929} |
| write=single,read=single | 3124     | 289183     | 1000   | 1 : 0      | 50           | {289233 518 116 163924 853 1500 15120 21369} | {0 0 0 0 0 0 0 0}                             |
| write=single,read=single | 3151     | 76148      | 1000   | 1 : 1      | 30           | {38078 712 116 179583 1892 8137 38959 51305} | {38075 1643 368 322815 3995 6908 18257 32436} |
| write=single,read=single | 3317     | 45208      | 1000   | 1 : 3      | 15           | {11306 568 121 148287 1191 5044 31439 39956} | {33909 1133 378 253524 1992 4940 16777 27567} |
| write=single,read=single | 2348     | 38309      | 1000   | 1 : 30     | 15           | {1234 588 148 99583 1204 4399 29561 38324}   | {37083 1190 374 242303 2189 5061 12209 22660} |
| write=single,read=single | 2498     | 120131     | 1000   | 3 : 1      | 30           | {90117 535 115 149247 1083 4484 29825 45375} | {30040 1379 375 303273 2717 5896 30233 50068} |
| write=single,read=single | 3393     | 267123     | 1000   | 30 : 1     | 50           | {258529 543 113 177535 957 2168 16620 25764} | {8615 1121 418 266537 1585 4980 23660 48585}  |
+--------------------------+----------+------------+--------+------------+--------------+----------------------------------------------+-----------------------------------------------+

After change

+--------------------------+----------+------------+--------+------------+--------------+-----------------------------------------------+-----------------------------------------------+
|      operation_case      | run_time | throughput | length | read_write | thread_count |     read(qps|ave|min|max|95|99|999|9999)      |     write(qps|ave|min|max|95|99|999|9999)     |
+--------------------------+----------+------------+--------+------------+--------------+-----------------------------------------------+-----------------------------------------------+
| write=single,read=single | 8000     | 37495      | 1000   | 0 : 1      | 15           | {0 0 0 0 0 0 0 0}                             | {37497 1197 359 210644 2199 5193 16247 23100} |
| write=single,read=single | 3076     | 293209     | 1000   | 1 : 0      | 50           | {293270 510 112 176489 848 1346 16665 24001}  | {0 0 0 0 0 0 0 0}                             |
| write=single,read=single | 3056     | 78516      | 1000   | 1 : 1      | 30           | {39266 720 118 170367 1954 8243 38601 51241}  | {39262 1564 353 253609 3555 6564 17687 34548} |
| write=single,read=single | 3213     | 46676      | 1000   | 1 : 3      | 15           | {11671 561 121 157055 1211 5136 30281 38335}  | {35013 1093 357 215679 1844 4887 11745 22825} |
| write=single,read=single | 2292     | 39252      | 1000   | 1 : 30     | 15           | {1266 600 150 119081 1258 4555 29892 38025}   | {37994 1161 367 242260 2003 4985 10463 19873} |
| write=single,read=single | 2422     | 123899     | 1000   | 3 : 1      | 30           | {92937 527 114 127593 1072 4433 23495 37716}  | {30979 1310 364 227199 2375 5739 17865 35332} |
| write=single,read=single | 3808     | 241281     | 1000   | 30 : 1     | 50           | {233521 613 110 186708 1264 2553 16615 24703} | {7784 1148 400 244820 1914 5060 24692 47156}  |
+--------------------------+----------+------------+--------+------------+--------------+-----------------------------------------------+-----------------------------------------------+

The change has no significant performance impact.

Actual performance

This change has fixed the bug in our production environment.

@Smityz Smityz changed the title from "fix(network): fix asio core dump" to "refactor(network): use multi io_services in asio" on Jan 24, 2022
@neverchanje neverchanje left a comment


Have you reproduced the bug? And how do you ensure the new code will fix it?

Smityz commented Jan 24, 2022

Have you reproduced the bug? And how do you ensure the new code will fix it?

I did a grayscale (canary) test in the production environment: machines running the old version still had this problem, but the machines running the new version did not.

The cause of the coredump is the race condition, and in the new design a single socket is never accessed from multiple threads.

@acelyc111 (Member) commented:

If this is a bugfix PR, explicitly use fix instead of refactor in the PR title.

@neverchanje neverchanje changed the title from "refactor(network): use multi io_services in asio" to "fix(network): use multi io_services in asio" on Jan 25, 2022
levy5307 previously approved these changes Jan 25, 2022

private:
friend class asio_rpc_session;
friend class asio_network_provider_test;

std::shared_ptr<boost::asio::ip::tcp::acceptor> _acceptor;
boost::asio::io_service _io_service;
int _next_io_service = 0;

Reviewer comment (Contributor):

This number must be atomic since multiple threads may concurrently modify it. I would recommend randomly choosing the io_service so that we can get rid of the concurrent conflict entirely.

@Smityz Smityz Jan 25, 2022

  1. Operations on an int are naturally atomic.
  2. A random function costs a lot of time.

@neverchanje neverchanje Jan 25, 2022

  1. This statement really disappoints me, but the overall idea of keeping it efficient may work. I won't give it a +1; others might.

https://stackoverflow.com/questions/54188/are-c-reads-and-writes-of-an-int-atomic

@Smityz Smityz replied:

Yes, you are right. And I'm sorry for my irresponsible statement.

[screenshot: Godbolt assembly output]
I did an experiment on Godbolt before, and I found that i++ is a single line in assembly:
add eax, 1. So I thought this operation was thread-safe.
But optimizations on modern processors may make it more complex, and I learned a lot from this answer.

I think modifying an int from multiple threads is safe in the sense that it won't core dump, but it cannot keep coherence across multiple processors.

Anyway, it is undefined behavior to access non-atomic variables from multiple threads in C++, and I have changed my code. You can continue reviewing it now.

Comment on lines 412 to 420
++_next_io_service;
if (_next_io_service >= FLAGS_io_service_worker_count) {
    _next_io_service = 0;
}

int tmp = _next_io_service;
if (tmp >= FLAGS_io_service_worker_count) {
    tmp = 0;
}

Reviewer comment (Member):

How about ensuring FLAGS_io_service_worker_count is 2^N and io_service_worker_mask = FLAGS_io_service_worker_count - 1? Then the code can be simplified as:

uint32_t idx = _next_io_service.fetch_add(1);
return *_io_services[idx & io_service_worker_mask];

@Smityz Smityz replied:

Good idea, I have thought about it too. But limiting FLAGS_io_service_worker_count to a power of two is rather strict. I'll do a speed test later to check whether it's faster than the two addition operations.
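
For reference, a rough sketch of the two selection variants discussed above, assuming an std::atomic<uint32_t> counter and a vector of io_service pointers; the declarations below are illustrative stand-ins, not the actual PR code:

#include <atomic>
#include <cstdint>
#include <memory>
#include <vector>

#include <boost/asio.hpp>

// Illustrative stand-ins mirroring the names used in the PR.
static int32_t FLAGS_io_service_worker_count = 8;
static std::vector<std::unique_ptr<boost::asio::io_service>> _io_services;
static std::atomic<uint32_t> _next_io_service{0};

// Variant 1: plain modulo, works for any worker count.
boost::asio::io_service &get_io_service_mod()
{
    uint32_t idx = _next_io_service.fetch_add(1, std::memory_order_relaxed);
    return *_io_services[idx % static_cast<uint32_t>(FLAGS_io_service_worker_count)];
}

// Variant 2: bit-mask modulo; requires FLAGS_io_service_worker_count to be 2^N.
boost::asio::io_service &get_io_service_mask()
{
    const uint32_t io_service_worker_mask =
        static_cast<uint32_t>(FLAGS_io_service_worker_count) - 1;
    uint32_t idx = _next_io_service.fetch_add(1, std::memory_order_relaxed);
    return *_io_services[idx & io_service_worker_mask];
}

Both variants avoid the data race on the counter; the mask only saves the cost of an integer modulo, at the price of constraining the worker count to a power of two.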

Smityz commented Jan 26, 2022

I did a benchmark in a multi-threaded environment again and found that random is the quickest way, so I decided to adopt it.

round_robin_1(add):
time = 825987.000 ns
round_robin_2(MOD):
time = 421049.000 ns
round_robin_3(BIT MOD):
time = 504586.000 ns
round_robin_4(dsn::rand):
time = 420370.000 ns

CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

https://gist.github.com/Smityz/00426f49544348676d4ddd8b0b0eb253
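
For context, this is roughly what the adopted random strategy looks like; the thread_local generator below is a generic stand-in for the project's rand utility (an assumption for illustration, not the gist's exact code):

#include <memory>
#include <random>
#include <vector>

#include <boost/asio.hpp>

// Randomly pick an io_service for a new socket. With a thread_local generator
// there is no shared counter at all, so threads never contend on selection.
boost::asio::io_service &
pick_io_service(std::vector<std::unique_ptr<boost::asio::io_service>> &io_services)
{
    thread_local std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<size_t> dist(0, io_services.size() - 1);
    return *io_services[dist(gen)];
}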
