Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](partial update) duplicate key occur when BE restart after conflict concurrent partial update #35739

Merged

Conversation

zhannngchen
Copy link
Contributor

@zhannngchen zhannngchen commented May 31, 2024

Proposed changes

Issue Number: close #xxx

  1. In [fix](partial update) mishandling of exceptions in the publish phase may result in data loss #30366 , in order to avoid that some incomplete delete bitmap left in txn_info->delete_bitmap when publish failed, we make a copy of txn_info->delete_bitmap before we start to compute the delete bitmap
  2. this copy is not updated back to txn_info->delete_bitmap after rowset->rowset_meta()->merge_rowset_meta() is successful
  3. txnManager::publish_txn() saves the contents of txn_info->delete_bitmap to RocksDB after the call to update_delete_bitmap(), due to the issue in step 2, bitmap generated during publish is not saved to RocksDB, so if BE restarts at this point, this part of the incremental delete bitmap will be lost
  4. it will result in duplicated keys on querying

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zhannngchen
Copy link
Contributor Author

run buildall

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@@ -1114,7 +1114,7 @@ void BaseTablet::_remove_sentinel_mark_from_delete_bitmap(DeleteBitmapPtr delete
}
}

Status BaseTablet::update_delete_bitmap(const BaseTabletSPtr& self, const TabletTxnInfo* txn_info,
Status BaseTablet::update_delete_bitmap(const BaseTabletSPtr& self, TabletTxnInfo* txn_info,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: function 'update_delete_bitmap' has cognitive complexity of 58 (threshold 50) [readability-function-cognitive-complexity]

Status BaseTablet::update_delete_bitmap(const BaseTabletSPtr& self, TabletTxnInfo* txn_info,
                   ^
Additional context

be/src/olap/base_tablet.cpp:1127: +1, including nesting penalty of 0, nesting level increased to 1

    if (txn_info->partial_update_info && txn_info->partial_update_info->is_partial_update) {
    ^

be/src/olap/base_tablet.cpp:1127: +1

    if (txn_info->partial_update_info && txn_info->partial_update_info->is_partial_update) {
                                      ^

be/src/olap/base_tablet.cpp:1128: nesting level increased to 2

        transient_rs_writer = DORIS_TRY(self->create_transient_rowset_writer(
                              ^

be/src/common/status.h:694: expanded from macro 'DORIS_TRY'

    ({                                           \
    ^

be/src/olap/base_tablet.cpp:1128: +3, including nesting penalty of 2, nesting level increased to 3

        transient_rs_writer = DORIS_TRY(self->create_transient_rowset_writer(
                              ^

be/src/common/status.h:697: expanded from macro 'DORIS_TRY'

        if (!res.has_value()) [[unlikely]] {     \
        ^

be/src/olap/base_tablet.cpp:1140: +1, including nesting penalty of 0, nesting level increased to 1

    RETURN_IF_ERROR(std::dynamic_pointer_cast<BetaRowset>(rowset)->load_segments(&segments));
    ^

be/src/common/status.h:612: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/olap/base_tablet.cpp:1140: +2, including nesting penalty of 1, nesting level increased to 2

    RETURN_IF_ERROR(std::dynamic_pointer_cast<BetaRowset>(rowset)->load_segments(&segments));
    ^

be/src/common/status.h:614: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/olap/base_tablet.cpp:1146: +1, including nesting penalty of 0, nesting level increased to 1

        if (self->tablet_state() == TABLET_NOTREADY) {
        ^

be/src/olap/base_tablet.cpp:1151: +1, including nesting penalty of 0, nesting level increased to 1

        RETURN_IF_ERROR(self->get_all_rs_id_unlocked(cur_version - 1, &cur_rowset_ids));
        ^

be/src/common/status.h:612: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/olap/base_tablet.cpp:1151: +2, including nesting penalty of 1, nesting level increased to 2

        RETURN_IF_ERROR(self->get_all_rs_id_unlocked(cur_version - 1, &cur_rowset_ids));
        ^

be/src/common/status.h:614: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/olap/base_tablet.cpp:1170: +1, including nesting penalty of 0, nesting level increased to 1

    if (segments.size() <= 1) {
    ^

be/src/olap/base_tablet.cpp:1171: +2, including nesting penalty of 1, nesting level increased to 2

        RETURN_IF_ERROR(calc_delete_bitmap(self, rowset, segments, specified_rowsets, delete_bitmap,
        ^

be/src/common/status.h:612: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/olap/base_tablet.cpp:1171: +3, including nesting penalty of 2, nesting level increased to 3

        RETURN_IF_ERROR(calc_delete_bitmap(self, rowset, segments, specified_rowsets, delete_bitmap,
        ^

be/src/common/status.h:614: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/olap/base_tablet.cpp:1174: +1, nesting level increased to 1

    } else {
      ^

be/src/olap/base_tablet.cpp:1176: +2, including nesting penalty of 1, nesting level increased to 2

        RETURN_IF_ERROR(calc_delete_bitmap(self, rowset, segments, specified_rowsets, delete_bitmap,
        ^

be/src/common/status.h:612: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/olap/base_tablet.cpp:1176: +3, including nesting penalty of 2, nesting level increased to 3

        RETURN_IF_ERROR(calc_delete_bitmap(self, rowset, segments, specified_rowsets, delete_bitmap,
        ^

be/src/common/status.h:614: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/olap/base_tablet.cpp:1179: +2, including nesting penalty of 1, nesting level increased to 2

        RETURN_IF_ERROR(token->wait());
        ^

be/src/common/status.h:612: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/olap/base_tablet.cpp:1179: +3, including nesting penalty of 2, nesting level increased to 3

        RETURN_IF_ERROR(token->wait());
        ^

be/src/common/status.h:614: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/olap/base_tablet.cpp:1183: +1, including nesting penalty of 0, nesting level increased to 1

    if (watch.get_elapse_time_us() < 1 * 1000 * 1000) {
    ^

be/src/olap/base_tablet.cpp:1185: +1, nesting level increased to 1

    } else {
      ^

be/src/olap/base_tablet.cpp:1193: nesting level increased to 1

            [](size_t sum, const segment_v2::SegmentSharedPtr& s) { return sum += s->num_rows(); });
            ^

be/src/olap/base_tablet.cpp:1200: +1, including nesting penalty of 0, nesting level increased to 1

    if (config::enable_merge_on_write_correctness_check && rowset->num_rows() != 0) {
    ^

be/src/olap/base_tablet.cpp:1200: +1

    if (config::enable_merge_on_write_correctness_check && rowset->num_rows() != 0) {
                                                        ^

be/src/olap/base_tablet.cpp:1205: +2, including nesting penalty of 1, nesting level increased to 2

        if (!st.ok()) {
        ^

be/src/olap/base_tablet.cpp:1210: +1, including nesting penalty of 0, nesting level increased to 1

    if (transient_rs_writer) {
    ^

be/src/olap/base_tablet.cpp:1211: +2, including nesting penalty of 1, nesting level increased to 2

        DBUG_EXECUTE_IF("Tablet.update_delete_bitmap.partial_update_write_rowset_fail", {
        ^

be/src/util/debug_points.h:36: expanded from macro 'DBUG_EXECUTE_IF'

    if (UNLIKELY(config::enable_debug_points)) {                              \
    ^

be/src/olap/base_tablet.cpp:1211: +3, including nesting penalty of 2, nesting level increased to 3

        DBUG_EXECUTE_IF("Tablet.update_delete_bitmap.partial_update_write_rowset_fail", {
        ^

be/src/util/debug_points.h:38: expanded from macro 'DBUG_EXECUTE_IF'

        if (dp) {                                                             \
        ^

be/src/olap/base_tablet.cpp:1212: +4, including nesting penalty of 3, nesting level increased to 4

            if (rand() % 100 < (100 * dp->param("percent", 0.5))) {
            ^

be/src/olap/base_tablet.cpp:1220: +2, including nesting penalty of 1, nesting level increased to 2

        RETURN_IF_ERROR(transient_rs_writer->flush());
        ^

be/src/common/status.h:612: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/olap/base_tablet.cpp:1220: +3, including nesting penalty of 2, nesting level increased to 3

        RETURN_IF_ERROR(transient_rs_writer->flush());
        ^

be/src/common/status.h:614: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/olap/base_tablet.cpp:1222: +2, including nesting penalty of 1, nesting level increased to 2

        RETURN_IF_ERROR(transient_rs_writer->build(transient_rowset));
        ^

be/src/common/status.h:612: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/olap/base_tablet.cpp:1222: +3, including nesting penalty of 2, nesting level increased to 3

        RETURN_IF_ERROR(transient_rs_writer->build(transient_rowset));
        ^

be/src/common/status.h:614: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/olap/base_tablet.cpp:1231: +1, including nesting penalty of 0, nesting level increased to 1

    RETURN_IF_ERROR(self->save_delete_bitmap(txn_info, txn_id, delete_bitmap,
    ^

be/src/common/status.h:612: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/olap/base_tablet.cpp:1231: +2, including nesting penalty of 1, nesting level increased to 2

    RETURN_IF_ERROR(self->save_delete_bitmap(txn_info, txn_id, delete_bitmap,
    ^

be/src/common/status.h:614: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

@@ -1114,7 +1114,7 @@
}
}

Status BaseTablet::update_delete_bitmap(const BaseTabletSPtr& self, const TabletTxnInfo* txn_info,
Status BaseTablet::update_delete_bitmap(const BaseTabletSPtr& self, TabletTxnInfo* txn_info,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: function 'update_delete_bitmap' exceeds recommended size/complexity thresholds [readability-function-size]

Status BaseTablet::update_delete_bitmap(const BaseTabletSPtr& self, TabletTxnInfo* txn_info,
                   ^
Additional context

be/src/olap/base_tablet.cpp:1116: 117 lines including whitespace and comments (threshold 80)

Status BaseTablet::update_delete_bitmap(const BaseTabletSPtr& self, TabletTxnInfo* txn_info,
                   ^

@doris-robot
Copy link

TPC-H: Total hot run time: 40497 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d4e66ba48eb2867395497e904bc631e5beeca9c3, data reload: false

------ Round 1 ----------------------------------
q1	17626	4393	4314	4314
q2	2034	205	193	193
q3	10509	1299	1167	1167
q4	10703	845	856	845
q5	7584	2700	2669	2669
q6	221	135	140	135
q7	992	622	607	607
q8	9491	2133	2124	2124
q9	9462	6702	6636	6636
q10	9120	3770	3723	3723
q11	458	249	242	242
q12	439	234	220	220
q13	17779	2974	3050	2974
q14	255	220	231	220
q15	523	463	468	463
q16	511	386	391	386
q17	962	667	715	667
q18	8078	7621	7552	7552
q19	4469	1612	1512	1512
q20	648	322	311	311
q21	5036	3212	3913	3212
q22	379	325	333	325
Total cold run time: 117279 ms
Total hot run time: 40497 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4373	4238	4235	4235
q2	361	271	266	266
q3	3003	2747	2764	2747
q4	1889	1608	1564	1564
q5	5273	5308	5309	5308
q6	216	124	126	124
q7	2148	1780	1761	1761
q8	3214	3356	3339	3339
q9	8414	8375	8423	8375
q10	3908	3644	3723	3644
q11	572	478	497	478
q12	771	598	573	573
q13	17437	3012	2981	2981
q14	301	265	275	265
q15	513	490	464	464
q16	495	431	436	431
q17	1798	1511	1482	1482
q18	7708	7443	7489	7443
q19	1688	1491	1524	1491
q20	2010	1766	1793	1766
q21	4846	4761	4815	4761
q22	592	529	536	529
Total cold run time: 71530 ms
Total hot run time: 54027 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.28% (9231/25442)
Line Coverage: 27.63% (75705/273964)
Region Coverage: 26.85% (39195/145991)
Branch Coverage: 23.60% (19894/84284)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d4e66ba48eb2867395497e904bc631e5beeca9c3_d4e66ba48eb2867395497e904bc631e5beeca9c3/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 41270 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d4e66ba48eb2867395497e904bc631e5beeca9c3, data reload: false

------ Round 1 ----------------------------------
q1	17627	4362	4310	4310
q2	2040	201	201	201
q3	10431	1230	1160	1160
q4	10201	821	856	821
q5	7454	2714	2641	2641
q6	241	134	133	133
q7	968	627	631	627
q8	9222	2159	2181	2159
q9	9523	6756	6769	6756
q10	9499	3953	3922	3922
q11	446	242	259	242
q12	434	227	243	227
q13	18628	3177	3290	3177
q14	275	235	202	202
q15	511	465	496	465
q16	485	395	393	393
q17	1005	694	600	600
q18	8460	7796	7858	7796
q19	3616	1410	1395	1395
q20	662	323	322	322
q21	5218	3362	4260	3362
q22	409	359	369	359
Total cold run time: 117355 ms
Total hot run time: 41270 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4650	4557	4617	4557
q2	373	288	277	277
q3	3176	2902	2910	2902
q4	2109	1658	1642	1642
q5	5416	5532	5513	5513
q6	228	129	131	129
q7	2225	1869	1881	1869
q8	3256	3448	3390	3390
q9	8702	8726	8716	8716
q10	4214	3880	3871	3871
q11	581	491	477	477
q12	777	611	618	611
q13	17078	3128	3136	3128
q14	330	270	279	270
q15	519	485	489	485
q16	483	422	427	422
q17	1847	1522	1475	1475
q18	8132	7750	7555	7555
q19	1782	1611	1653	1611
q20	3025	1796	1796	1796
q21	9556	4748	4870	4748
q22	668	529	519	519
Total cold run time: 79127 ms
Total hot run time: 55963 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 169926 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d4e66ba48eb2867395497e904bc631e5beeca9c3, data reload: false

query1	907	382	374	374
query2	6456	2379	2359	2359
query3	6654	202	204	202
query4	19098	17527	17382	17382
query5	4126	472	460	460
query6	243	167	158	158
query7	4597	300	285	285
query8	315	286	288	286
query9	8664	2431	2401	2401
query10	429	301	264	264
query11	10602	10059	10214	10059
query12	142	94	94	94
query13	1630	369	370	369
query14	9684	7833	6528	6528
query15	221	193	186	186
query16	7776	268	262	262
query17	1348	519	537	519
query18	1951	271	293	271
query19	199	160	152	152
query20	94	86	90	86
query21	206	126	130	126
query22	4153	4007	4051	4007
query23	33888	33029	33235	33029
query24	11147	2867	2888	2867
query25	607	361	362	361
query26	945	159	158	158
query27	2320	311	321	311
query28	6306	2055	2071	2055
query29	855	624	617	617
query30	283	150	153	150
query31	945	768	763	763
query32	94	53	59	53
query33	764	288	279	279
query34	968	486	489	486
query35	747	655	593	593
query36	1087	911	922	911
query37	137	76	67	67
query38	2932	2817	2807	2807
query39	854	792	807	792
query40	206	128	126	126
query41	51	58	57	57
query42	117	98	96	96
query43	592	570	542	542
query44	1214	722	744	722
query45	201	169	171	169
query46	1083	721	710	710
query47	1856	1762	1805	1762
query48	378	293	289	289
query49	985	416	399	399
query50	782	389	401	389
query51	6841	6744	6633	6633
query52	108	89	94	89
query53	353	289	291	289
query54	893	461	456	456
query55	77	72	73	72
query56	285	293	256	256
query57	1112	1031	993	993
query58	250	253	237	237
query59	3463	3363	3344	3344
query60	290	274	273	273
query61	90	86	89	86
query62	644	457	457	457
query63	330	297	301	297
query64	8789	2295	1738	1738
query65	3212	3104	3131	3104
query66	887	322	325	322
query67	15215	14766	14960	14766
query68	5724	544	536	536
query69	606	454	364	364
query70	1076	1155	1148	1148
query71	439	282	280	280
query72	7396	2731	2548	2548
query73	768	328	324	324
query74	5982	5627	5699	5627
query75	3698	2642	2651	2642
query76	3134	978	857	857
query77	688	303	298	298
query78	10234	9811	9731	9731
query79	2034	512	510	510
query80	763	474	464	464
query81	540	218	223	218
query82	1354	108	104	104
query83	194	168	169	168
query84	251	87	85	85
query85	1313	287	265	265
query86	450	314	308	308
query87	3278	3142	3122	3122
query88	4084	2425	2437	2425
query89	507	400	383	383
query90	1812	197	195	195
query91	129	101	98	98
query92	62	50	49	49
query93	2125	510	504	504
query94	1189	203	207	203
query95	431	405	325	325
query96	590	266	274	266
query97	3149	3089	3002	3002
query98	247	222	217	217
query99	1294	858	864	858
Total cold run time: 270219 ms
Total hot run time: 169926 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 30.62 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d4e66ba48eb2867395497e904bc631e5beeca9c3, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.08	0.09
query5	0.51	0.49	0.49
query6	1.13	0.73	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.55	0.50	0.49
query10	0.54	0.55	0.54
query11	0.16	0.12	0.11
query12	0.15	0.12	0.12
query13	0.58	0.59	0.60
query14	0.79	0.77	0.78
query15	0.82	0.82	0.80
query16	0.36	0.37	0.36
query17	0.95	0.95	0.96
query18	0.21	0.25	0.22
query19	1.77	1.71	1.72
query20	0.02	0.01	0.00
query21	15.50	0.69	0.66
query22	4.14	7.75	1.87
query23	18.26	1.39	1.26
query24	1.44	0.37	0.23
query25	0.14	0.09	0.08
query26	0.26	0.17	0.17
query27	0.08	0.08	0.08
query28	13.27	1.03	1.01
query29	13.62	3.36	3.36
query30	0.24	0.06	0.07
query31	2.86	0.40	0.38
query32	3.27	0.47	0.47
query33	2.85	2.92	2.91
query34	17.18	4.38	4.44
query35	4.45	4.55	4.58
query36	0.69	0.46	0.46
query37	0.18	0.16	0.15
query38	0.16	0.14	0.15
query39	0.04	0.04	0.03
query40	0.17	0.13	0.15
query41	0.09	0.05	0.05
query42	0.05	0.05	0.04
query43	0.04	0.03	0.03
Total cold run time: 109.62 s
Total hot run time: 30.62 s

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

github-actions bot commented Jun 1, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Jun 1, 2024
Copy link
Contributor

github-actions bot commented Jun 1, 2024

PR approved by anyone and no changes requested.

@dataroaring dataroaring merged commit 9a82c9f into apache:master Jun 1, 2024
28 of 31 checks passed
xiaokang added a commit to xiaokang/doris that referenced this pull request Jun 1, 2024
xiaokang added a commit that referenced this pull request Jun 2, 2024
dataroaring pushed a commit that referenced this pull request Jun 4, 2024
…ict concurrent partial update (#35739)

## Proposed changes

Issue Number: close #xxx

1. In #30366 , in order to avoid that some incomplete delete bitmap left
in `txn_info->delete_bitmap` when publish failed, we make a copy of
`txn_info->delete_bitmap` before we start to compute the delete bitmap
2. this copy is not updated back to `txn_info->delete_bitmap` after
`rowset->rowset_meta()->merge_rowset_meta()` is successful
3. `txnManager::publish_txn()` saves the contents of
`txn_info->delete_bitmap` to RocksDB after the call to
`update_delete_bitmap()`, due to the issue in step 2, bitmap generated
during publish is not saved to RocksDB, so if BE restarts at this point,
this part of the incremental delete bitmap will be lost
4. it will result in duplicated keys on querying
seawinde pushed a commit to seawinde/doris that referenced this pull request Jun 5, 2024
…ict concurrent partial update (apache#35739)

## Proposed changes

Issue Number: close #xxx

1. In apache#30366 , in order to avoid that some incomplete delete bitmap left
in `txn_info->delete_bitmap` when publish failed, we make a copy of
`txn_info->delete_bitmap` before we start to compute the delete bitmap
2. this copy is not updated back to `txn_info->delete_bitmap` after
`rowset->rowset_meta()->merge_rowset_meta()` is successful
3. `txnManager::publish_txn()` saves the contents of
`txn_info->delete_bitmap` to RocksDB after the call to
`update_delete_bitmap()`, due to the issue in step 2, bitmap generated
during publish is not saved to RocksDB, so if BE restarts at this point,
this part of the incremental delete bitmap will be lost
4. it will result in duplicated keys on querying
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants