Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support local indexes #25

Open
OmerCerrahoglu opened this issue Jan 27, 2018 · 1 comment
Open

Support local indexes #25

OmerCerrahoglu opened this issue Jan 27, 2018 · 1 comment
Assignees
Labels
kind/new-feature This is a request for a completely new feature

Comments

@OmerCerrahoglu
Copy link
Contributor

OmerCerrahoglu commented Jan 27, 2018

Implement local indexes.

Motivation

Provide better performance for queries with WHERE conditions that fully specify hash keys.

Requirements

A local secondary index is a distributed index. It has the following characteristics:

  • It is split and co-located with the data in the individual partitions
  • Index update is a local transaction (not distributed across nodes) and is very efficient.
  • Index lookup is a local operation (not a distributed txn) and is also very efficient
  • Designed for use as index of the data within the partition (i.e. local index). Thus the queries as expected to specify the entire hash partition key(s) to take advantage of this index.
  • Queries using the index alone (without specifying the hash partition keys) become full-table (cluster) scan

Initial plan is to implement this as:

  • Single indexed column only.
  • Can index on keys/elements of a set/list/map

Syntax

Auto-inferred when the index is created using the same hash partition keys as the table.

CREATE INDEX [IF NOT EXISTS] index_name
    ON [keyspace_name.]table_name ( (index_hash_column, ...), index_cluster_column, ... )
    [WITH CLUSTERING ORDER BY (index_cluster_column { ASC | DESC }, ...)]
    [COVERING (covered_column, ...)]
i((hash_columns (from original)), 

Note: we might not want/need to have covering columns, but I guess it can’t hurt (though I am currently not planning on using them in any way, though it shouldn’t be too hard to make use of them: just have the semantic analyzer understand if he index is enough to answer the query)
Note: as the local index already uses the hash keys from the original table as hash keys, we might want not to allow any additional hash keys, but allowing it should not break anything.

Allow additional:

  • t(h1 h2 r r2 c1 c2)
  • i((h1 h2), c1 r1 r2
@rkarthik007 rkarthik007 added the kind/enhancement This is an enhancement of an existing feature label Jan 27, 2018
@kmuthukk
Copy link
Collaborator

moving this to your queue Robert for now.

@rkarthik007 rkarthik007 added kind/new-feature This is a request for a completely new feature and removed kind/enhancement This is an enhancement of an existing feature labels Apr 8, 2018
mbautin pushed a commit that referenced this issue Jun 20, 2019
jasonyb pushed a commit that referenced this issue Dec 15, 2023
Summary:
YB Seq Scan code path is not hit because Foreign Scan is the default and
pg hint plan does not work.  Upcoming merge with YB master will bring in
master commit 465ee2c which changes the
default to YB Seq Scan.

To test YB Seq Scan, a temporary patch is needed (see the test plan).
With that, two bugs are encountered: fix them.

1. FailedAssertion("TTS_IS_VIRTUAL(slot)"

   On simple test case

       create table t (i int primary key, j int);
       select * from t;

   get

       TRAP: FailedAssertion("TTS_IS_VIRTUAL(slot)", File: "../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c", Line: 3473, PID: 2774450)

   Details:

       #0  0x00007fd52616eacf in raise () from /lib64/libc.so.6
       #1  0x00007fd526141ea5 in abort () from /lib64/libc.so.6
       #2  0x0000000000af33ad in ExceptionalCondition (conditionName=conditionName@entry=0xc2938d "TTS_IS_VIRTUAL(slot)", errorType=errorType@entry=0xc01498 "FailedAssertion",
           fileName=fileName@entry=0xc28f18 "../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c", lineNumber=lineNumber@entry=3473)
           at ../../../../../../../src/postgres/src/backend/utils/error/assert.c:69
       #3  0x00000000005c26bd in ybFetchNext (handle=0x2600ffc43680, slot=slot@entry=0x2600ff6c2980, relid=16384)
           at ../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c:3473
       #4  0x00000000007de444 in YbSeqNext (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:156
       #5  0x000000000078b3c6 in ExecScanFetch (node=node@entry=0x2600ff6c2778, accessMtd=accessMtd@entry=0x7de2b9 <YbSeqNext>, recheckMtd=recheckMtd@entry=0x7de26e <YbSeqRecheck>)
           at ../../../../../../src/postgres/src/backend/executor/execScan.c:133
       #6  0x000000000078b44e in ExecScan (node=0x2600ff6c2778, accessMtd=accessMtd@entry=0x7de2b9 <YbSeqNext>, recheckMtd=recheckMtd@entry=0x7de26e <YbSeqRecheck>)
           at ../../../../../../src/postgres/src/backend/executor/execScan.c:182
       #7  0x00000000007de298 in ExecYbSeqScan (pstate=<optimized out>) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:191
       #8  0x00000000007871ef in ExecProcNodeFirst (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/backend/executor/execProcnode.c:480
       #9  0x000000000077db0e in ExecProcNode (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/include/executor/executor.h:285
       #10 ExecutePlan (execute_once=<optimized out>, dest=0x2600ff6b1a10, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT,
           use_parallel_mode=<optimized out>, planstate=0x2600ff6c2778, estate=0x2600ff6c2128) at ../../../../../../src/postgres/src/backend/executor/execMain.c:1650
       #11 standard_ExecutorRun (queryDesc=0x2600ff675128, direction=<optimized out>, count=0, execute_once=<optimized out>)
           at ../../../../../../src/postgres/src/backend/executor/execMain.c:367
       #12 0x000000000077dbfe in ExecutorRun (queryDesc=queryDesc@entry=0x2600ff675128, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>)
           at ../../../../../../src/postgres/src/backend/executor/execMain.c:308
       #13 0x0000000000982617 in PortalRunSelect (portal=portal@entry=0x2600ff90e128, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x2600ff6b1a10)
           at ../../../../../../src/postgres/src/backend/tcop/pquery.c:954
       #14 0x000000000098433c in PortalRun (portal=portal@entry=0x2600ff90e128, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true,
           dest=dest@entry=0x2600ff6b1a10, altdest=altdest@entry=0x2600ff6b1a10, qc=0x7fffc14a13c0) at ../../../../../../src/postgres/src/backend/tcop/pquery.c:786
       #15 0x000000000097e65b in exec_simple_query (query_string=0x2600ffdc6128 "select * from t;") at ../../../../../../src/postgres/src/backend/tcop/postgres.c:1321
       #16 yb_exec_simple_query_impl (query_string=query_string@entry=0x2600ffdc6128) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5060
       #17 0x000000000097b7a5 in yb_exec_query_wrapper_one_attempt (exec_context=exec_context@entry=0x2600ffdc6000, restart_data=restart_data@entry=0x7fffc14a1640,
           functor=functor@entry=0x97e033 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2600ffdc6128, attempt=attempt@entry=0, retry=retry@entry=0x7fffc14a15ff)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5028
       #18 0x000000000097d077 in yb_exec_query_wrapper (exec_context=exec_context@entry=0x2600ffdc6000, restart_data=restart_data@entry=0x7fffc14a1640,
           functor=functor@entry=0x97e033 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2600ffdc6128)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5052
       #19 0x000000000097d0ca in yb_exec_simple_query (query_string=query_string@entry=0x2600ffdc6128 "select * from t;", exec_context=exec_context@entry=0x2600ffdc6000)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5075
       #20 0x000000000097fe8a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5794
       #21 0x00000000008c8354 in BackendRun (port=0x2600ff8423c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4791
       #22 BackendStartup (port=0x2600ff8423c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4491
       #23 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1878
       #24 0x00000000008caa55 in PostmasterMain (argc=argc@entry=25, argv=argv@entry=0x2600ffdc01a0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1533
       #25 0x0000000000804ba8 in PostgresServerProcessMain (argc=25, argv=0x2600ffdc01a0) at ../../../../../../src/postgres/src/backend/main/main.c:208
       #26 0x0000000000804bc8 in main ()

       3469    ybFetchNext(YBCPgStatement handle,
       3470                            TupleTableSlot *slot, Oid relid)
       3471    {
       3472            Assert(slot != NULL);
       3473            Assert(TTS_IS_VIRTUAL(slot));

       (gdb) p *slot
       $2 = {type = T_TupleTableSlot, tts_flags = 18, tts_nvalid = 0, tts_ops = 0xeaf5e0 <TTSOpsHeapTuple>, tts_tupleDescriptor = 0x2600ff6416c0, tts_values = 0x2600ff6c2a00, tts_isnull = 0x2600ff6c2a10, tts_mcxt = 0x2600ff6c2000, tts_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 0, yb_item = {ybctid = 0}}, tts_tableOid = 0, tts_yb_insert_oid = 0}

   Fix by making YB Seq Scan always use virtual slot.  This is similar
   to what is done for YB Foreign Scan.

2. segfault in ending scan

   Same simple test case gives segfault at a later stage.

   Details:

       #0  0x00000000007de762 in table_endscan (scan=0x3debfe3ab88) at ../../../../../../src/postgres/src/include/access/tableam.h:997
       #1  ExecEndYbSeqScan (node=node@entry=0x3debfe3a778) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:298
       #2  0x0000000000787a75 in ExecEndNode (node=0x3debfe3a778) at ../../../../../../src/postgres/src/backend/executor/execProcnode.c:649
       #3  0x000000000077ffaf in ExecEndPlan (estate=0x3debfe3a128, planstate=<optimized out>) at ../../../../../../src/postgres/src/backend/executor/execMain.c:1489
       #4  standard_ExecutorEnd (queryDesc=0x2582fdc88928) at ../../../../../../src/postgres/src/backend/executor/execMain.c:503
       #5  0x00000000007800f8 in ExecutorEnd (queryDesc=queryDesc@entry=0x2582fdc88928) at ../../../../../../src/postgres/src/backend/executor/execMain.c:474
       #6  0x00000000006f140c in PortalCleanup (portal=0x2582ff900128) at ../../../../../../src/postgres/src/backend/commands/portalcmds.c:305
       #7  0x0000000000b3c36a in PortalDrop (portal=portal@entry=0x2582ff900128, isTopCommit=isTopCommit@entry=false)
           at ../../../../../../../src/postgres/src/backend/utils/mmgr/portalmem.c:514
       #8  0x000000000097e667 in exec_simple_query (query_string=0x2582ffdc6128 "select * from t;") at ../../../../../../src/postgres/src/backend/tcop/postgres.c:1331
       #9  yb_exec_simple_query_impl (query_string=query_string@entry=0x2582ffdc6128) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5060
       #10 0x000000000097b79a in yb_exec_query_wrapper_one_attempt (exec_context=exec_context@entry=0x2582ffdc6000, restart_data=restart_data@entry=0x7ffc81c0e7d0,
           functor=functor@entry=0x97e028 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2582ffdc6128, attempt=attempt@entry=0, retry=retry@entry=0x7ffc81c0e78f)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5028
       #11 0x000000000097d06c in yb_exec_query_wrapper (exec_context=exec_context@entry=0x2582ffdc6000, restart_data=restart_data@entry=0x7ffc81c0e7d0,
           functor=functor@entry=0x97e028 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2582ffdc6128)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5052
       #12 0x000000000097d0bf in yb_exec_simple_query (query_string=query_string@entry=0x2582ffdc6128 "select * from t;", exec_context=exec_context@entry=0x2582ffdc6000)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5075
       #13 0x000000000097fe7f in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5794
       #14 0x00000000008c8349 in BackendRun (port=0x2582ff8403c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4791
       #15 BackendStartup (port=0x2582ff8403c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4491
       #16 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1878
       #17 0x00000000008caa4a in PostmasterMain (argc=argc@entry=25, argv=argv@entry=0x2582ffdc01a0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1533
       #18 0x0000000000804b9d in PostgresServerProcessMain (argc=25, argv=0x2582ffdc01a0) at ../../../../../../src/postgres/src/backend/main/main.c:208
       #19 0x0000000000804bbd in main ()

       294             /*
       295              * close heap scan
       296              */
       297             if (tsdesc != NULL)
       298                     table_endscan(tsdesc);

   Reason is initial merge 55782d5
   incorrectly merges end of ExecEndYbSeqScan.  Upstream PG
   9ddef36278a9f676c07d0b4d9f33fa22e48ce3b5 removes code, but initial
   merge duplicates lines.  Remove those lines.

Test Plan:
Apply the following patch to activate YB Seq Scan:

    diff --git a/src/postgres/src/backend/optimizer/path/allpaths.c b/src/postgres/src/backend/optimizer/path/allpaths.c
    index 8a4c38a965..854d84a648 100644
    --- a/src/postgres/src/backend/optimizer/path/allpaths.c
    +++ b/src/postgres/src/backend/optimizer/path/allpaths.c
    @@ -576,7 +576,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                     else
                     {
                         /* Plain relation */
    -                    if (IsYBRelationById(rte->relid))
    +                    if (false)
                         {
                             /*
                              * Using a foreign scan which will use the YB FDW by

On almalinux 8,

    ./yb_build.sh fastdebug --gcc11
    pg15_tests/run_all_tests.sh fastdebug --gcc11 --sj --sp --scb

fails the following tests:

- test_D29546
- test_pg15_regress: yb_pg15
- test_types_geo: yb_pg_box
- test_hash_in_queries: yb_hash_in_queries

Manually check to see that they are due to YB Seq Scan explain output
differences.

Reviewers: aagrawal, tfoucher

Reviewed By: tfoucher

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D31139
jasonyb pushed a commit that referenced this issue Jun 11, 2024
Major refactoring to support multiple new features.
es1024 added a commit that referenced this issue Sep 25, 2024
Summary:
It is possible for tablet peer's `tablet_` to be null when a rocksdb flush finishes. We call `tablet_->MaxPersistentOpId()` after flush to clean up recently applied transaction state, and this causes a SIGSEGV:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_string(this="", __str=<unavailable>) at string:898:9
    frame #1: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] yb::RWOperationCounter::resource_name(this=0x0000000000000378) const at operation_counter.h:95:12
    frame #2: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=0x0000000000000378, abort_status_holder=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.cc:190:62
    frame #3: 0x000055885b247ea6 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.h:140:9
    frame #4: 0x000055885b247e9f yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::tablet::Tablet::CreateScopedRWOperationBlockingRocksDbShutdownStart(this=0x0000000000000000, deadline=yb::CoarseTimePoint @ 0x00007f9455305d98) const at tablet.cc:3375:10
    frame #5: 0x000055885b247e90 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(this=0x0000000000000000, invalid_if_no_new_data=<unavailable>) const at tablet.cc:3540:32
    frame #6: 0x000055885b277f5e yb-tserver`yb::tablet::TabletPeer::MaxPersistentOpId(this=<unavailable>) const at tablet_peer.cc:946:23
    frame #7: 0x000055885b278e52 yb-tserver`non-virtual thunk to yb::tablet::TabletPeer::MaxPersistentOpId() const at tablet_peer.cc:0
    frame #8: 0x000055885b2dec44 yb-tserver`yb::tablet::TransactionParticipant::Impl::DoProcessRecentlyAppliedTransactions(this=0x0000153123151500, retryable_requests_flushed_op_id=<unavailable>, persist=<unavailable>) at transaction_participant.cc:2186:22
    frame #9: 0x000055885b2e0a8e yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions() [inlined] yb::tablet::TransactionParticipant::Impl::ProcessRecentlyAppliedTransactions(this=0x0000153123151500) at transaction_participant.cc:1440:27
    frame #10: 0x000055885b2e0a63 yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions(this=<unavailable>) at transaction_participant.cc:2629:17
    frame #11: 0x000055885b226093 yb-tserver`yb::tablet::Tablet::RocksDbListener::OnFlushCompleted(this=0x0000153110c2da58, (null)=<unavailable>, (null)=<unavailable>) at tablet.cc:503:34
    frame #12: 0x000055885af0e507 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) at db_impl.cc:2121:19
    frame #13: 0x000055885af0e275 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::FlushMemTableToOutputFile(this=0x0000153123150a80, cfd=0x000015317d651600, mutable_cf_options=0x00007f94553077d8, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048) at db_impl.cc:2008:3
    frame #14: 0x000055885af0d859 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::BackgroundFlush(this=0x0000153123150a80, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048, cfd=0x000015317d651600) at db_impl.cc:3399:10
    frame #15: 0x000055885af0d21f yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(this=0x0000153123150a80, cfd=<unavailable>) at db_impl.cc:3470:31
    frame #16: 0x000055885b024a53 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() at thread_posix.cc:133:5
    frame #17: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] rocksdb::ThreadPool::StartBGThreads(this=<unavailable>)::$_0::operator()() const at thread_posix.cc:172:5
    frame #18: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_0&>()()) std::__1::__invoke[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads()::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:340:25
    frame #19: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads(__args=<unavailable>)::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:415:5
    frame #20: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)[abi:ue170006]() at function.h:192:16
    frame #21: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)() at function.h:363:12
    frame #22: 0x000055885b9c1543 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator(this=0x000015313de3b380)[abi:ue170006]() const at function.h:517:16
    frame #23: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator(this=0x000015313de3b380)() const at function.h:1168:12
    frame #24: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(arg=0x000015313de3b320) at thread.cc:866:3
    frame #25: 0x00007f94994d81ca libpthread.so.0`start_thread + 234
    frame #26: 0x00007f9499729e73 libc.so.6`__clone + 67
```

This diff adds a null check and returns `OpId::Min()` (i.e. don't clean anything up) if `tablet_` is null and we cannot call `MaxPersistentOpId`.
Jira: DB-12915

Test Plan: Jenkins

Reviewers: sergei, rthallam

Reviewed By: sergei, rthallam

Subscribers: rthallam, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38323
es1024 added a commit that referenced this issue Sep 26, 2024
…fter flush

Summary:
Original commit: 250a4d5 / D38323
It is possible for tablet peer's `tablet_` to be null when a rocksdb flush finishes. We call `tablet_->MaxPersistentOpId()` after flush to clean up recently applied transaction state, and this causes a SIGSEGV:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_string(this="", __str=<unavailable>) at string:898:9
    frame #1: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] yb::RWOperationCounter::resource_name(this=0x0000000000000378) const at operation_counter.h:95:12
    frame #2: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=0x0000000000000378, abort_status_holder=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.cc:190:62
    frame #3: 0x000055885b247ea6 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.h:140:9
    frame #4: 0x000055885b247e9f yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::tablet::Tablet::CreateScopedRWOperationBlockingRocksDbShutdownStart(this=0x0000000000000000, deadline=yb::CoarseTimePoint @ 0x00007f9455305d98) const at tablet.cc:3375:10
    frame #5: 0x000055885b247e90 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(this=0x0000000000000000, invalid_if_no_new_data=<unavailable>) const at tablet.cc:3540:32
    frame #6: 0x000055885b277f5e yb-tserver`yb::tablet::TabletPeer::MaxPersistentOpId(this=<unavailable>) const at tablet_peer.cc:946:23
    frame #7: 0x000055885b278e52 yb-tserver`non-virtual thunk to yb::tablet::TabletPeer::MaxPersistentOpId() const at tablet_peer.cc:0
    frame #8: 0x000055885b2dec44 yb-tserver`yb::tablet::TransactionParticipant::Impl::DoProcessRecentlyAppliedTransactions(this=0x0000153123151500, retryable_requests_flushed_op_id=<unavailable>, persist=<unavailable>) at transaction_participant.cc:2186:22
    frame #9: 0x000055885b2e0a8e yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions() [inlined] yb::tablet::TransactionParticipant::Impl::ProcessRecentlyAppliedTransactions(this=0x0000153123151500) at transaction_participant.cc:1440:27
    frame #10: 0x000055885b2e0a63 yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions(this=<unavailable>) at transaction_participant.cc:2629:17
    frame #11: 0x000055885b226093 yb-tserver`yb::tablet::Tablet::RocksDbListener::OnFlushCompleted(this=0x0000153110c2da58, (null)=<unavailable>, (null)=<unavailable>) at tablet.cc:503:34
    frame #12: 0x000055885af0e507 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) at db_impl.cc:2121:19
    frame #13: 0x000055885af0e275 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::FlushMemTableToOutputFile(this=0x0000153123150a80, cfd=0x000015317d651600, mutable_cf_options=0x00007f94553077d8, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048) at db_impl.cc:2008:3
    frame #14: 0x000055885af0d859 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::BackgroundFlush(this=0x0000153123150a80, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048, cfd=0x000015317d651600) at db_impl.cc:3399:10
    frame #15: 0x000055885af0d21f yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(this=0x0000153123150a80, cfd=<unavailable>) at db_impl.cc:3470:31
    frame #16: 0x000055885b024a53 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() at thread_posix.cc:133:5
    frame #17: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] rocksdb::ThreadPool::StartBGThreads(this=<unavailable>)::$_0::operator()() const at thread_posix.cc:172:5
    frame #18: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_0&>()()) std::__1::__invoke[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads()::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:340:25
    frame #19: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads(__args=<unavailable>)::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:415:5
    frame #20: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)[abi:ue170006]() at function.h:192:16
    frame #21: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)() at function.h:363:12
    frame #22: 0x000055885b9c1543 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator(this=0x000015313de3b380)[abi:ue170006]() const at function.h:517:16
    frame #23: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator(this=0x000015313de3b380)() const at function.h:1168:12
    frame #24: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(arg=0x000015313de3b320) at thread.cc:866:3
    frame #25: 0x00007f94994d81ca libpthread.so.0`start_thread + 234
    frame #26: 0x00007f9499729e73 libc.so.6`__clone + 67
```

This diff adds a null check and returns `OpId::Min()` (i.e. don't clean anything up) if `tablet_` is null and we cannot call `MaxPersistentOpId`.
Jira: DB-12915

Test Plan: Jenkins

Reviewers: sergei, rthallam

Reviewed By: rthallam

Subscribers: ybase, rthallam

Differential Revision: https://phorge.dev.yugabyte.com/D38431
Huqicheng added a commit that referenced this issue Dec 12, 2024
…on::GetDocPaths

Summary:
```
../../src/yb/docdb/conflict_resolution.cc:865:5: runtime error: load of value 4458368, which is not a valid value for type 'IsolationLevel'
[m-1] W1212 08:27:12.423846 99051 master_heartbeat_service.cc:426] Could not get YSQL db catalog versions for heartbeat response:
[m-1] W1212 08:27:12.425787 99052 master_heartbeat_service.cc:426] Could not get YSQL db catalog versions for heartbeat response:
    #0 0x7fd9ae920c0e in yb::docdb::(anonymous namespace)::GetWriteRequestIntents(std::vector<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>, std::allocator<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>>> const&, yb::dockv::KeyBytes*, yb::StronglyTypedBool<yb::dockv::PartialRangeKeyIntents_Tag>, yb::IsolationLevel) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:865:5
    #1 0x7fd9ae9190ce in yb::docdb::(anonymous namespace)::TransactionConflictResolverContext::GetRequestedIntents(yb::docdb::(anonymous namespace)::ConflictResolver*, yb::dockv::KeyBytes*) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:1130:22
    #2 0x7fd9ae90fcee in yb::docdb::(anonymous namespace)::TransactionConflictResolverContext::ReadConflicts(yb::docdb::(anonymous namespace)::ConflictResolver*) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:1164:22
    #3 0x7fd9ae8fd333 in yb::docdb::(anonymous namespace)::ConflictResolver::Resolve() ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:198:26
    #4 0x7fd9ae8fc3b2 in yb::docdb::(anonymous namespace)::WaitOnConflictResolver::TryPreWait() ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:697:25
    #5 0x7fd9ae8fc3b2 in yb::docdb::(anonymous namespace)::WaitOnConflictResolver::Run() ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:670:7
    #6 0x7fd9ae8fa47d in yb::docdb::ResolveTransactionConflicts(std::vector<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>, std::allocator<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>>> const&, yb::docdb::ConflictManagementPolicy, yb::docdb::LWKeyValueWriteBatchPB const&, yb::HybridTime, yb::HybridTime, long, unsigned long, long, yb::docdb::DocDB const&, yb::StronglyTypedBool<yb::dockv::PartialRangeKeyIntents_Tag>, yb::TransactionStatusManager*, yb::tablet::TabletMetrics*, yb::docdb::LockBatch*, yb::docdb::WaitQueue*, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, boost::function<void (yb::Result<yb::HybridTime> const&)>) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:1401:15
    #7 0x7fd9aff22ca9 in yb::tablet::WriteQuery::DoExecute() ${YB_SRC_ROOT}/src/yb/tablet/write_query.cc:801:10
    #8 0x7fd9aff1fda3 in yb::tablet::WriteQuery::Execute(std::unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery>>) ${YB_SRC_ROOT}/src/yb/tablet/write_query.cc:618:28
    #9 0x7fd9afb9b708 in yb::tablet::Tablet::AcquireLocksAndPerformDocOperations(std::unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery>>) ${YB_SRC_ROOT}/src/yb/tablet/tablet.cc:2147:3
    #10 0x7fd9afd166d7 in yb::tablet::TabletPeer::WriteAsync(std::unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery>>) ${YB_SRC_ROOT}/src/yb/tablet/tablet_peer.cc:704:12
    #11 0x7fd9b0fd4d57 in yb::tserver::TabletServiceImpl::PerformWrite(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext*) ${YB_SRC_ROOT}/src/yb/tserver/tablet_service.cc:2325:16
    #12 0x7fd9b0fd711a in yb::tserver::TabletServiceImpl::Write(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext) ${YB_SRC_ROOT}/src/yb/tserver/tablet_service.cc:2345:17
    #13 0x7fd9a91098ec in yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext)::operator()(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext) const ${BUILD_ROOT}/src/yb/tserver/tserver_service.service.cc:848:9
    #14 0x7fd9a91098ec in auto yb::rpc::HandleCall<yb::rpc::RpcCallPBParamsImpl<yb::tserver::WriteRequestPB, yb::tserver::WriteResponsePB>, yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext)>(std::shared_ptr<yb::rpc::InboundCall>, yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext)) ${YB_SRC_ROOT}/src/yb/rpc/local_call.h:126:7
    #15 0x7fd9a91098ec in yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const ${BUILD_ROOT}/src/yb/tserver/tserver_service.service.cc:846:7
    #16 0x7fd9a91098ec in decltype(std::declval<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&>()(std::declval<std::shared_ptr<yb::rpc::InboundCall>>())) std::__invoke[abi:ue170006]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>>(yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__type_traits/invoke.h:340:25
    #17 0x7fd9a91098ec in void std::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>>(yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__type_traits/invoke.h:415:5
    #18 0x7fd9a91098ec in std::__function::__alloc_func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0, std::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0>, void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ue170006](std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:192:16
    #19 0x7fd9a91098ec in std::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0, std::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0>, void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:363:12
    #20 0x7fd9a9108b34 in std::__function::__value_func<void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ue170006](std::shared_ptr<yb::rpc::InboundCall>&&) const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:517:16
    #21 0x7fd9a9108b34 in std::function<void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::shared_ptr<yb::rpc::InboundCall>) const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:1168:12
    #22 0x7fd9a9108b34 in yb::tserver::TabletServerServiceIf::Handle(std::shared_ptr<yb::rpc::InboundCall>) ${BUILD_ROOT}/src/yb/tserver/tserver_service.service.cc:831:3
    #23 0x7fd9a5b7a892 in yb::rpc::ServicePoolImpl::Handle(std::shared_ptr<yb::rpc::InboundCall>) ${YB_SRC_ROOT}/src/yb/rpc/service_pool.cc:269:19
    #24 0x7fd9a59fcea5 in yb::rpc::InboundCall::InboundCallTask::Run() ${YB_SRC_ROOT}/src/yb/rpc/inbound_call.cc:317:13
    #25 0x7fd9a5bad15d in yb::rpc::(anonymous namespace)::Worker::Execute() ${YB_SRC_ROOT}/src/yb/rpc/thread_pool.cc:115:15
    #26 0x7fd9a42ad037 in std::__function::__value_func<void ()>::operator()[abi:ue170006]() const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:517:16
    #27 0x7fd9a42ad037 in std::function<void ()>::operator()() const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:1168:12
    #28 0x7fd9a42ad037 in yb::Thread::SuperviseThread(void*) ${YB_SRC_ROOT}/src/yb/util/thread.cc:895:3
    #29 0x56176114abea in asan_thread_start(void*) ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:225:31
    #30 0x7fd99f1071c9 in start_thread (/lib64/libpthread.so.0+0x81c9) (BuildId: 1962602ac5dc3011b6d697b38b05ddc244197114)
    #31 0x7fd99eb488d2 in clone (/lib64/libc.so.6+0x398d2) (BuildId: 37e4ac6a7fb96950b0e6bf72d73d94f3296c77eb)

UndefinedBehaviorSanitizer: undefined-behavior ../../src/yb/docdb/conflict_resolution.cc:865:5 in
```

IsolationLevel is left uninitialized at `PgsqlLockOperation::GetDocPaths` so the ASAN builds can get the `not a valid value` faliure.
Jira: DB-14458

Test Plan: advisory_lock-test

Reviewers: bkolagani

Reviewed By: bkolagani

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D40638
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/new-feature This is a request for a completely new feature
Projects
None yet
Development

No branches or pull requests

6 participants