Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TiFlash may fail to restart after FAP is enabled #9307

Closed
JaySon-Huang opened this issue Aug 9, 2024 · 4 comments · Fixed by #9319
Closed

TiFlash may fail to restart after FAP is enabled #9307

JaySon-Huang opened this issue Aug 9, 2024 · 4 comments · Fixed by #9319
Labels
component/storage severity/major type/bug The issue is confirmed as a bug.

Comments

@JaySon-Huang
Copy link
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

[2024/08/01 02:02:17.654 +00:00] [ERROR] [Exception.cpp:96] ["Code: 49, e.displayText() = DB::Exception: Check last_iter != entries.end() && last_iter->second.isEntry() 
failed: {type:VAR_ENTRY, create_ver: 0.0, is_deleted: false, delete_ver: 0.0, ori_page_id: 0x.0, being_ref_count: 1, num_entries: 2}:  
[type=UPDATE_DATA_FROM_REMOTE] [page_id=0x78000001746C0000000000000096.15253] [ver=3267624.0], e.what() = DB::Exception, Stack trace:


  0x629b32a8d4c3        StackTrace::StackTrace() [tiflash+34882755]
                        dbms/src/Common/StackTrace.cpp:23
  0x629b32a8abd6        DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int) [tiflash+34872278]
                        dbms/src/Common/Exception.h:46
  0x629b394a67e3        DB::PS::V3::VersionedPageEntries<DB::PS::V3::universal::PageDirectoryTrait>::updateLocalCacheForRemotePage(DB::PS::V3::PageVersion const&, DB::PS::V3::PageEntryV3 const&) [tiflash+146135011]
                        dbms/src/Storages/Page/V3/PageDirectory.cpp:331
  0x629b394cbc82        DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::applyRecord(std::__1::unique_ptr<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>, std::__1::default_delete<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>>> const&, DB::PS::V3::PageEntriesEdit<DB::UniversalPageId>::EditRecord const&, bool) [tiflash+146287746]
                        dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:299
  0x629b394cabfe        DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::loadEdit(std::__1::unique_ptr<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>, std::__1::default_delete<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>>> const&, DB::PS::V3::PageEntriesEdit<DB::UniversalPageId> const&, bool, unsigned long) [tiflash+146283518]
                        dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:0
  0x629b394ca1f9        DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::loadFromDisk(std::__1::unique_ptr<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>, std::__1::default_delete<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>>> const&, std::__1::shared_ptr<DB::PS::V3::WALStoreReader>&&) [tiflash+146280953]
                        dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:408
  0x629b394c9cb9        DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::createFromReader(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::PS::V3::WALStoreReader>, std::__1::unique_ptr<DB::PS::V3::WALStore, std::__1::default_delete<DB::PS::V3::WALStore>>) [tiflash+146279609]
                        dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:55
  0x629b394c9b85        DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::create(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::FileProvider>&, std::__1::shared_ptr<DB::PSDiskDelegator>&, DB::PS::V3::WALConfig const&) [tiflash+146279301]
                        dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:45
  0x629b39516bcd        DB::UniversalPageStorage::restore() [tiflash+146594765]
                        dbms/src/Storages/Page/V3/Universal/UniversalPageStorage.cpp:89
  0x629b3952c56f        DB::UniversalPageStorageService::create(DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::PSDiskDelegator>, DB::PageStorageConfig const&) [tiflash+146683247]
                        dbms/src/Storages/Page/V3/Universal/UniversalPageStorageService.cpp:57
  0x629b384b71ad        DB::Context::initializeWriteNodePageStorageIfNeed(DB::PathPool const&) [tiflash+129425837]
                        dbms/src/Interpreters/Context.cpp:1927
  0x629b32b17781        DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&) [tiflash+35448705]
                        dbms/src/Server/Server.cpp:1362
  0x629b3a41503a        Poco::Util::Application::run() [tiflash+162316346]
                        contrib/poco/Util/src/Application.cpp:335
  0x629b32b12574        DB::Server::run() [tiflash+35427700]
                        dbms/src/Server/Server.cpp:263
  0x629b3a4278b7        Poco::Util::ServerApplication::run(int, char**) [tiflash+162392247]
                        contrib/poco/Util/src/ServerApplication.cpp:618
  0x629b32b1eae6        mainEntryClickHouseServer(int, char**) [tiflash+35478246]
                        dbms/src/Server/Server.cpp:1947
  0x629b32a5e83b        main [tiflash+34691131]
                        dbms/src/Server/main.cpp:173
  0x7edc67829d90        <unknown symbol> [libc.so.6+171408]
  0x7edc67829e40        __libc_start_main [libc.so.6+171584]
  0x629b32a5da25        _start [tiflash+34687525]"] [source="void DB::Context::initializeWriteNodePageStorageIfNeed(const DB::PathPool &)"] [thread_id=1]

4. What is your TiFlash version? (Required)

based on v7.5.3

@JaySon-Huang
Copy link
Contributor Author

Mark it as major because

  • It happens under disagg arch and FAP enabled
  • FAP is not released as GA yet

ti-chi-bot bot added a commit that referenced this issue Aug 12, 2024
…cheForRemotePage (#9311)

ref #9307

Co-authored-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: JaySon <tshent@qq.com>
@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented Aug 13, 2024

[2024/08/13 09:45:23.590 +00:00] [ERROR] [Exception.cpp:96] ["Code: 49, e.displayText() = DB::Exception: Check last_iter != entries.end() && last_iter->second.isEntry() failed:
 this={type:VAR_ENTRY, create_ver: 0.0, is_deleted: false, delete_ver: 0.0, ori_page_id: 0x.0, being_ref_count: 1, num_entries: 2},
 entries={676072.0: {is_delete:false, entry:optional(PageEntry{file: 0, offset: 0x0, size: 4264565, checksum: 0x0, tag: 0, field_offsets: [0,38,2208,5409,8610],
                     checkpoint_info: {local_data_reclaimed: true, data_location: {data_file_id: lock/s152/dat_110_1.lock_s153_115, offset_in_file: 86418244, size_in_file: 4264565}}}), being_ref_count:7}, 
          685265.0: {is_delete:true, entry:none, being_ref_count:1}
          },
 ver=693272.0,
 entry=PageEntry{file: 310, offset: 0x4491AE8, size: 4264565, checksum: 0x818273C89AB5736E, tag: 0, field_offsets: [], checkpoint_info: invalid}:
 [type=UPDATE_DATA_FROM_REMOTE] [page_id=0x78000001746C0000000000000060.3497] [ver=729397.0], e.what() = DB::Exception, Stack trace:
  0x561b29c75173    StackTrace::StackTrace() [tiflash+34484595]
                    dbms/src/Common/StackTrace.cpp:23
  0x561b29c72846    DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int) [tiflash+34474054]
                    dbms/src/Common/Exception.h:46
  0x561b3046cd37    DB::PS::V3::VersionedPageEntries<DB::PS::V3::universal::PageDirectoryTrait>::updateLocalCacheForRemotePage(DB::PS::V3::PageVersion const&, DB::PS::V3::PageEntryV3 const&) [tiflash+143502647]
                    dbms/src/Storages/Page/V3/PageDirectory.cpp:332
  0x561b30493156    DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::applyRecord(std::__1::unique_ptr<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>, std::__1::default_delete<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>>> const&, DB::PS::V3::PageEntriesEdit<DB::UniversalPageId>::EditRecord const&, bool) [tiflash+143659350]
                    dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:299
  0x561b30492072    DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::loadEdit(std::__1::unique_ptr<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>, std::__1::default_delete<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>>> const&, DB::PS::V3::PageEntriesEdit<DB::UniversalPageId> const&, bool, unsigned long) [tiflash+143655026]
                    dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:0
  0x561b3049165c    DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::loadFromDisk(std::__1::unique_ptr<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>, std::__1::default_delete<DB::PS::V3::PageDirectory<DB::PS::V3::universal::PageDirectoryTrait>>> const&, std::__1::shared_ptr<DB::PS::V3::WALStoreReader>&&) [tiflash+143652444]
                    dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:408
  0x561b304910f9    DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::createFromReader(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::PS::V3::WALStoreReader>, std::__1::unique_ptr<DB::PS::V3::WALStore, std::__1::default_delete<DB::PS::V3::WALStore>>) [tiflash+143651065]
                    dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:55
  0x561b30490fb7    DB::PS::V3::PageDirectoryFactory<DB::PS::V3::universal::FactoryTrait>::create(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::FileProvider>&, std::__1::shared_ptr<DB::PSDiskDelegator>&, DB::PS::V3::WALConfig const&) [tiflash+143650743]
                    dbms/src/Storages/Page/V3/PageDirectoryFactory.cpp:45
  0x561b304e000d    DB::UniversalPageStorage::restore() [tiflash+143974413]
                    dbms/src/Storages/Page/V3/Universal/UniversalPageStorage.cpp:89
  0x561b304f659a    DB::UniversalPageStorageService::create(DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::PSDiskDelegator>, DB::PageStorageConfig const&) [tiflash+144065946]
                    dbms/src/Storages/Page/V3/Universal/UniversalPageStorageService.cpp:57
  0x561b2f43177c    DB::Context::initializeWriteNodePageStorageIfNeed(DB::PathPool const&) [tiflash+126482300]
                    dbms/src/Interpreters/Context.cpp:1927
  0x561b29cffdf9    DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&) [tiflash+35053049]
                    dbms/src/Server/Server.cpp:1362
  0x561b3143bdb8    Poco::Util::Application::run() [tiflash+160079288]
                    contrib/poco/Util/src/Application.cpp:335
  0x561b29cfac15    DB::Server::run() [tiflash+35032085]
                    dbms/src/Server/Server.cpp:263
  0x561b29d07136    mainEntryClickHouseServer(int, char**) [tiflash+35082550]
                    dbms/src/Server/Server.cpp:1947
  0x561b29c46ff8    main [tiflash+34295800]
                    dbms/src/Server/main.cpp:173
  0x7ff63d029d90    <unknown symbol> [libc.so.6+171408]
  0x7ff63d029e40    __libc_start_main [libc.so.6+171584]
  0x561b29c46225    _start [tiflash+34292261]"] [source="void DB::Context::initializeWriteNodePageStorageIfNeed(const PathPool &)"] [thread_id=1]

@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented Aug 13, 2024

> tiflash pagectl -V 4 --mode=5 -P /tidb-deploy/tiflash-9000/jayson_debug/write/ > all_entries.log 2>&1

--- The page_id '0x78000001746C0000000000000060.3497' actually is a ref to '0x78000001746C0000000000000060.3359'
> grep -e 'Open' -e '0x78000001746C0000000000000060.3497' all_entries.log
[2024/08/13 09:50:37.295 +00:00] [DEBUG] [WALReader.cpp:177] ["Open log file for reading, file=/tidb-deploy/tiflash-9000/jayson_debug/write//wal/log_34_1_710960 size=241171"] [source=PageStorageControlV3] [thread_id=1]
[2024/08/13 09:50:37.376 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:VAR_REF, page_id:0x78000001746C0000000000000060.3497, ori_id:0x78000001746C0000000000000060.3359, version:693272.0, entry:PageEntry{file: 0, offset: 0x0, size: 0, checksum: 0x0, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.378 +00:00] [DEBUG] [WALReader.cpp:177] ["Open log file for reading, file=/tidb-deploy/tiflash-9000/jayson_debug/write//wal/log_35_0 size=2119599"] [source=PageStorageControlV3] [thread_id=1]
[2024/08/13 09:50:37.526 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:UPDATE_DATA_FROM_REMOTE, page_id:0x78000001746C0000000000000060.3497, ori_id:0x.0, version:729397.0, entry:PageEntry{file: 310, offset: 0x4491AE8, size: 4264565, checksum: 0x818273C89AB5736E, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.528 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:UPDATE_DATA_FROM_REMOTE, page_id:0x78000001746C0000000000000060.3497, ori_id:0x.0, version:729403.0, entry:PageEntry{file: 310, offset: 0x48A2D5D, size: 4264565, checksum: 0x818273C89AB5736E, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.582 +00:00] [DEBUG] [WALReader.cpp:177] ["Open log file for reading, file=/tidb-deploy/tiflash-9000/jayson_debug/write//wal/log_36_0 size=2174444"] [source=PageStorageControlV3] [thread_id=1]
[2024/08/13 09:50:37.748 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:UPDATE_DATA_FROM_REMOTE, page_id:0x78000001746C0000000000000060.3497, ori_id:0x.0, version:756391.0, entry:PageEntry{file: 410, offset: 0xA26E689, size: 4264565, checksum: 0x818273C89AB5736E, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.818 +00:00] [DEBUG] [WALReader.cpp:177] ["Open log file for reading, file=/tidb-deploy/tiflash-9000/jayson_debug/write//wal/log_37_0 size=609806"] [source=PageStorageControlV3] [thread_id=1]
[2024/08/13 09:50:37.876 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:DEL    , page_id:0x78000001746C0000000000000060.3497, ori_id:0x.0, version:773317.0, entry:PageEntry{file: 0, offset: 0x0, size: 0, checksum: 0x0, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]

--- The page_id '0x78000001746C0000000000000060.3359'
---  created @ ver=676072, epoch=0
---  ref by .3439 @ ver=685132
---  ref by .3441 @ ver=685134
---  deleted @ ver=685265
---  ref by .3493 @ ver=693249
---  ref by .3495 @ ver=693251
---  ref by .3497 @ ver=693272
---  ref by .3499 @ ver=693274
--- Then 'xxx.3497' get read, UPDATE_DATA_FROM_REMOTE @ ver=729397
> grep -e 'Open' -e '0x78000001746C0000000000000060.3359' all_entries.log
[2024/08/13 09:50:37.295 +00:00] [DEBUG] [WALReader.cpp:177] ["Open log file for reading, file=/tidb-deploy/tiflash-9000/jayson_debug/write//wal/log_34_1_710960 size=241171"] [source=PageStorageControlV3] [thread_id=1]
[2024/08/13 09:50:37.374 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:VAR_ENT, page_id:0x78000001746C0000000000000060.3359, ori_id:0x.0, version:676072.0, entry:PageEntry{file: 0, offset: 0x0, size: 4264565, checksum: 0x0, tag: 0, field_offsets: [0,38,2208,5409,8610], checkpoint_info: {local_data_reclaimed: true, data_location: {data_file_id: lock/s152/dat_110_1.lock_s153_115, offset_in_file: 86418244, size_in_file: 4264565}}}, being_ref_count:7}"] [thread_id=1]
[2024/08/13 09:50:37.374 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:VAR_DEL, page_id:0x78000001746C0000000000000060.3359, ori_id:0x.0, version:685265.0, entry:PageEntry{file: 0, offset: 0x0, size: 0, checksum: 0x0, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.375 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:VAR_REF, page_id:0x78000001746C0000000000000060.3439, ori_id:0x78000001746C0000000000000060.3359, version:685132.0, entry:PageEntry{file: 0, offset: 0x0, size: 0, checksum: 0x0, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.375 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:VAR_REF, page_id:0x78000001746C0000000000000060.3441, ori_id:0x78000001746C0000000000000060.3359, version:685134.0, entry:PageEntry{file: 0, offset: 0x0, size: 0, checksum: 0x0, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.376 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:VAR_REF, page_id:0x78000001746C0000000000000060.3493, ori_id:0x78000001746C0000000000000060.3359, version:693249.0, entry:PageEntry{file: 0, offset: 0x0, size: 0, checksum: 0x0, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.376 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:VAR_REF, page_id:0x78000001746C0000000000000060.3495, ori_id:0x78000001746C0000000000000060.3359, version:693251.0, entry:PageEntry{file: 0, offset: 0x0, size: 0, checksum: 0x0, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.376 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:VAR_REF, page_id:0x78000001746C0000000000000060.3497, ori_id:0x78000001746C0000000000000060.3359, version:693272.0, entry:PageEntry{file: 0, offset: 0x0, size: 0, checksum: 0x0, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.376 +00:00] [INFO] [PageDirectoryFactory.cpp:225] ["{type:VAR_REF, page_id:0x78000001746C0000000000000060.3499, ori_id:0x78000001746C0000000000000060.3359, version:693274.0, entry:PageEntry{file: 0, offset: 0x0, size: 0, checksum: 0x0, tag: 0, field_offsets: [], checkpoint_info: invalid}, being_ref_count:1}"] [thread_id=1]
[2024/08/13 09:50:37.378 +00:00] [DEBUG] [WALReader.cpp:177] ["Open log file for reading, file=/tidb-deploy/tiflash-9000/jayson_debug/write//wal/log_35_0 size=2119599"] [source=PageStorageControlV3] [thread_id=1]
[2024/08/13 09:50:37.582 +00:00] [DEBUG] [WALReader.cpp:177] ["Open log file for reading, file=/tidb-deploy/tiflash-9000/jayson_debug/write//wal/log_36_0 size=2174444"] [source=PageStorageControlV3] [thread_id=1]
[2024/08/13 09:50:37.818 +00:00] [DEBUG] [WALReader.cpp:177] ["Open log file for reading, file=/tidb-deploy/tiflash-9000/jayson_debug/write//wal/log_37_0 size=609806"] [source=PageStorageControlV3] [thread_id=1]

tiflash.log.gz
all_entries.log.tar.gz

@JaySon-Huang
Copy link
Contributor Author

kind of similar to #5570

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/storage severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants