Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](memtable) do not reset arena in to_block() #46988

Closed
wants to merge 1 commit into from

Conversation

kaijchen
Copy link
Contributor

@kaijchen kaijchen commented Jan 14, 2025

What problem does this PR solve?

Issue Number: DORIS-18080

Related PR: #40912

Problem Summary:

Do not reset _arena in MemTable::to_block(), because it is still used in ~MemTable() when releasing agg places

Fix the following use-after-free

Use:

==3628099==ERROR: AddressSanitizer: heap-use-after-free on address 0x52100381be60 at pc 0x5648f30893f8 bp 0x7f8842433310 sp 0x7f8842433308
READ of size 8 at 0x52100381be60 thread T4767 (wg_flush_broker)
    #0 0x5648f30893f7 in phmap::priv::raw_hash_set<phmap::priv::FlatHashSetPolicy<unsigned long>, phmap::Hash<unsigned long>, phmap::EqualTo<unsigned long>, std::allocator<unsigned long>>::destroy_slots() doris/thirdparty/installed/include/parallel_hashmap/phmap.h:1992:14
    #1 0x5648f30936f6 in phmap::priv::raw_hash_set<phmap::priv::FlatHashSetPolicy<unsigned long>, phmap::Hash<unsigned long>, phmap::EqualTo<unsigned long>, std::allocator<unsigned long>>::~raw_hash_set() doris/thirdparty/installed/include/parallel_hashmap/phmap.h:1236:23
    #2 0x5648f3089276 in phmap::flat_hash_set<unsigned long, phmap::Hash<unsigned long>, phmap::EqualTo<unsigned long>, std::allocator<unsigned long>>::~flat_hash_set() doris/thirdparty/installed/include/parallel_hashmap/phmap.h:4577:7
    #3 0x5648f308922a in doris::BitmapValue::~BitmapValue() doris/be/src/util/bitmap_value.h:824:7
    #4 0x56490d319fa6 in doris::vectorized::AggregateFunctionBitmapData<doris::vectorized::AggregateFunctionBitmapUnionOp>::~AggregateFunctionBitmapData() doris/be/src/vec/aggregate_functions/aggregate_function_bitmap.h:127:8
    #5 0x56490d49636a in doris::vectorized::IAggregateFunctionDataHelper<doris::vectorized::AggregateFunctionBitmapData<doris::vectorized::AggregateFunctionBitmapUnionOp>, doris::vectorized::AggregateFunctionBitmapOp<doris::vectorized::AggregateFunctionBitmapUnionOp>>::destroy(char*) const doris/be/src/vec/aggregate_functions/aggregate_function.h:563:92
    #6 0x5648f68376e9 in doris::MemTable::~MemTable() doris/be/src/olap/memtable.cpp:159:27

Free:

0x52100381be60 is located 352 bytes inside of 4096-byte region [0x52100381bd00,0x52100381cd00)
freed by thread T4767 (wg_flush_broker) here:
    #0 0x5648f2f3ee46 in free (doris/output/be/lib/doris_be+0x57418e46) (BuildId: 298b9c91a1ec8fe0)
    #1 0x5648f3080dfc in DefaultMemoryAllocator::free(void*) doris/be/src/vec/common/allocator.h:108:41
    #2 0x5648f3080b3f in Allocator<false, false, false, DefaultMemoryAllocator>::free(void*, unsigned long) doris/be/src/vec/common/allocator.h:323:13
    #3 0x5648f30b6dee in doris::vectorized::Arena::Chunk::~Chunk() doris/be/src/vec/common/arena.h:77:31
    #4 0x5648f30b6d1f in doris::vectorized::Arena::~Arena() doris/be/src/vec/common/arena.h:151:16
    #5 0x5648f30b695a in std::default_delete<doris::vectorized::Arena>::operator()(doris::vectorized::Arena*) const env/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:99:2
    #6 0x5648f30b67c8 in std::__uniq_ptr_impl<doris::vectorized::Arena, std::default_delete<doris::vectorized::Arena>>::reset(doris::vectorized::Arena*) env/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:211:4
    #7 0x5648f30b5d8c in std::unique_ptr<doris::vectorized::Arena, std::default_delete<doris::vectorized::Arena>>::reset(doris::vectorized::Arena*) env/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:509:7
    #8 0x5648f684253b in doris::MemTable::_to_block(std::unique_ptr<doris::vectorized::Block, std::default_delete<doris::vectorized::Block>>*) doris/be/src/olap/memtable.cpp:522:12
    #9 0x5648f6842ac5 in doris::MemTable::to_block(std::unique_ptr<doris::vectorized::Block, std::default_delete<doris::vectorized::Block>>*) doris/be/src/olap/memtable.cpp:528:5
    #10 0x5648f6907a72 in doris::FlushToken::_do_flush_memtable(doris::MemTable*, int, long*) doris/be/src/olap/memtable_flush_executor.cpp:144:9
    #11 0x5648f690932c in doris::FlushToken::_flush_memtable(std::shared_ptr<doris::MemTable>, int, long) doris/be/src/olap/memtable_flush_executor.cpp:183:16
    #12 0x5648f6915d18 in doris::MemtableFlushTask::run() doris/be/src/olap/memtable_flush_executor.cpp:60:20

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Copy link
Contributor

Thearas commented Jan 14, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaijchen
Copy link
Contributor Author

run buildall

Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 14, 2025
Copy link
Contributor

PR approved by anyone and no changes requested.

@doris-robot
Copy link

TPC-H: Total hot run time: 32870 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c3560ef15169b2248748aea063923ec0391c5c6a, data reload: false

------ Round 1 ----------------------------------
q1	17605	6167	6123	6123
q2	2059	313	175	175
q3	10552	1211	757	757
q4	10234	883	428	428
q5	8051	2203	2039	2039
q6	220	186	148	148
q7	909	762	599	599
q8	9250	1393	1200	1200
q9	5178	4930	4904	4904
q10	6828	2280	1844	1844
q11	493	284	250	250
q12	343	359	214	214
q13	17774	3697	3068	3068
q14	230	227	226	226
q15	565	534	533	533
q16	634	627	574	574
q17	562	859	324	324
q18	6906	6481	6536	6481
q19	1719	958	543	543
q20	306	306	184	184
q21	2774	2238	1952	1952
q22	361	340	304	304
Total cold run time: 103553 ms
Total hot run time: 32870 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6315	6295	6311	6295
q2	241	331	238	238
q3	2269	2662	2327	2327
q4	1407	1788	1396	1396
q5	4386	4729	4848	4729
q6	192	180	143	143
q7	2089	2044	1827	1827
q8	2638	2839	2732	2732
q9	7299	7257	7251	7251
q10	3072	3314	2751	2751
q11	571	505	496	496
q12	679	770	620	620
q13	3548	3862	3295	3295
q14	296	302	255	255
q15	588	552	511	511
q16	654	678	645	645
q17	1229	1781	1257	1257
q18	7788	7479	7289	7289
q19	781	1170	1086	1086
q20	1979	2039	1898	1898
q21	5648	5216	4957	4957
q22	608	613	568	568
Total cold run time: 54277 ms
Total hot run time: 52566 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 40.57% (10573/26062)
Line Coverage: 31.24% (89484/286439)
Region Coverage: 30.36% (45775/150764)
Branch Coverage: 26.65% (23274/87344)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c3560ef15169b2248748aea063923ec0391c5c6a_c3560ef15169b2248748aea063923ec0391c5c6a/report/index.html

@doris-robot
Copy link

TPC-DS: Total hot run time: 195941 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c3560ef15169b2248748aea063923ec0391c5c6a, data reload: false

query1	1305	992	927	927
query2	6132	2027	2037	2027
query3	11009	4540	4475	4475
query4	61551	29638	23743	23743
query5	5481	614	466	466
query6	436	202	185	185
query7	5559	522	311	311
query8	340	236	226	226
query9	8349	2772	2767	2767
query10	466	309	246	246
query11	18132	15287	15516	15287
query12	162	111	113	111
query13	1419	562	431	431
query14	11246	7354	7159	7159
query15	215	200	189	189
query16	7327	666	474	474
query17	1170	729	563	563
query18	1872	402	315	315
query19	196	192	161	161
query20	114	110	138	110
query21	201	118	104	104
query22	4643	4630	4644	4630
query23	34333	33598	33386	33386
query24	5747	2338	2287	2287
query25	458	464	405	405
query26	644	280	157	157
query27	1925	474	328	328
query28	4232	2488	2459	2459
query29	537	556	440	440
query30	218	201	157	157
query31	958	872	832	832
query32	66	59	57	57
query33	451	359	316	316
query34	764	859	528	528
query35	797	846	759	759
query36	1031	1052	973	973
query37	121	100	80	80
query38	4205	4311	4184	4184
query39	1483	1484	1440	1440
query40	204	111	98	98
query41	50	51	48	48
query42	125	105	102	102
query43	512	544	503	503
query44	1379	863	856	856
query45	180	176	165	165
query46	878	1082	649	649
query47	1920	1989	1885	1885
query48	389	417	322	322
query49	727	491	389	389
query50	651	659	396	396
query51	7075	7168	7069	7069
query52	103	102	93	93
query53	225	256	196	196
query54	470	507	418	418
query55	83	79	88	79
query56	264	273	242	242
query57	1212	1234	1185	1185
query58	259	232	243	232
query59	3090	3166	2905	2905
query60	268	261	254	254
query61	118	111	113	111
query62	789	777	698	698
query63	224	189	189	189
query64	1304	1009	675	675
query65	3278	3168	3156	3156
query66	722	403	310	310
query67	16066	15767	15664	15664
query68	4491	825	528	528
query69	493	288	263	263
query70	1206	1133	1139	1133
query71	395	275	255	255
query72	6080	3824	3791	3791
query73	758	774	358	358
query74	10340	9000	8829	8829
query75	3225	3148	2661	2661
query76	3551	1178	751	751
query77	460	359	280	280
query78	10166	10080	9463	9463
query79	2769	804	604	604
query80	1699	547	437	437
query81	574	275	239	239
query82	350	154	130	130
query83	266	165	155	155
query84	289	99	83	83
query85	774	367	291	291
query86	518	306	305	305
query87	4488	4501	4291	4291
query88	3583	2230	2179	2179
query89	402	322	284	284
query90	1616	186	187	186
query91	131	143	115	115
query92	67	57	56	56
query93	2747	881	525	525
query94	747	413	302	302
query95	322	270	251	251
query96	492	627	284	284
query97	2812	2862	2749	2749
query98	235	198	199	198
query99	1396	1509	1381	1381
Total cold run time: 315302 ms
Total hot run time: 195941 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 32.29 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c3560ef15169b2248748aea063923ec0391c5c6a, data reload: false

query1	0.03	0.03	0.04
query2	0.07	0.03	0.04
query3	0.24	0.07	0.06
query4	1.62	0.10	0.11
query5	0.39	0.42	0.40
query6	1.13	0.65	0.65
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.51	0.51
query10	0.56	0.56	0.55
query11	0.15	0.10	0.10
query12	0.15	0.12	0.11
query13	0.61	0.61	0.60
query14	2.74	2.83	2.71
query15	0.90	0.83	0.83
query16	0.39	0.35	0.38
query17	1.06	1.05	1.04
query18	0.23	0.21	0.20
query19	1.87	1.91	2.01
query20	0.02	0.01	0.02
query21	15.38	0.95	0.60
query22	0.75	0.72	0.82
query23	15.18	1.48	0.64
query24	2.94	1.73	1.77
query25	0.13	0.24	0.20
query26	0.25	0.15	0.14
query27	0.04	0.06	0.05
query28	14.47	1.45	1.04
query29	12.59	3.94	3.24
query30	0.26	0.09	0.07
query31	2.83	0.60	0.38
query32	3.23	0.56	0.46
query33	2.97	3.10	3.02
query34	16.86	5.24	4.47
query35	4.50	4.52	4.49
query36	0.64	0.49	0.50
query37	0.09	0.06	0.06
query38	0.04	0.03	0.03
query39	0.04	0.02	0.02
query40	0.15	0.14	0.13
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.3 s
Total hot run time: 32.29 s

@wm1581066 wm1581066 added the p0_c label Jan 14, 2025
Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kaijchen
Copy link
Contributor Author

superseded by #46997

@kaijchen kaijchen closed this Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. p0_c reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants