forked from yugabyte/yugabyte-db
Camunda docs #2
Merged
ashetkar
reviewed
May 17, 2022
jayant07-yb
pushed a commit
that referenced
this pull request
Sep 26, 2022
…t.cc

Summary: Addresses the following errors:

ASAN error:
```
==47493==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150014911e8 at pc 0x00000046ce8e bp 0x7f84d2a01ee0 sp 0x7f84d2a01ed8
READ of size 4 at 0x6150014911e8 thread T561 (rpc_tp_CDCConsu)
    #0 0x46ce8d in google::protobuf::internal::RepeatedPtrFieldBase::size() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220322021123-e488f7fa5b-almalinux8-x86_64-clang12/installed/asan/include/google/protobuf/repeated_field.h:1515:10
    #1 0x7f864b3e0118 in google::protobuf::RepeatedPtrField<yb::cdc::CDCRecordPB>::size() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220322021123-e488f7fa5b-almalinux8-x86_64-clang12/installed/asan/include/google/protobuf/repeated_field.h:1984:32
    #2 0x7f864b3dd9fc in yb::cdc::GetChangesResponsePB::records_size() const $YB_SRC_ROOT/build/asan-clang12-dynamic-ninja/src/yb/cdc/cdc_service.pb.h:9334:19
    #3 0x7f864b3da477 in yb::tserver::enterprise::TwoDCOutputClient::ProcessChangesStartingFromIndex(int) $YB_SRC_ROOT/build/asan-clang12-dynamic-ninja/../../ent/src/yb/tserver/twodc_output_client.cc:213:44
    #4 0x7f864b3dd131 in yb::tserver::enterprise::TwoDCOutputClient::WriteCDCRecordDone(yb::Status const&, yb::tserver::WriteResponsePB const&) $YB_SRC_ROOT/build/asan-clang12-dynamic-ninja/../../ent/src/yb/tserver/twodc_output_client.cc:408:18
```
```
0x6150014911e8 is located 360 bytes inside of 512-byte region [0x615001491080,0x615001491280)
freed by thread T772 (CDCConsumerHand) here:
    #0 0x7f8657e7465d in operator delete(void*) /opt/yb-build/llvm/yb-llvm-v12.0.1-yb-1-1633143152-bdb147e6-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:160:3
    #1 0x7f864b3e00f5 in yb::tserver::enterprise::TwoDCOutputClient::~TwoDCOutputClient() $YB_SRC_ROOT/build/asan-clang12-dynamic-ninja/../../ent/src/yb/tserver/twodc_output_client.cc:77:24
```
Fixed by adding a `shutdown_` variable that stops the processing of further records once the client is being shut down.
---

TSAN error:
```
WARNING: ThreadSanitizer: data race (pid=430)
  Write of size 4 at 0x7b5000b81168 by thread T306 (mutexes: write M1014289907445001104):
    #0 void google::protobuf::internal::RepeatedPtrFieldBase::Clear<google::protobuf::RepeatedPtrField<yb::cdc::CDCRecordPB>::TypeHandler>() /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220420172553-91c632476c-centos7-x86_64-clang12/installed/tsan/include/google/protobuf/repeated_field.h:1593:19 (libcdc_service_proto.so+0xf1df8)
    #1 google::protobuf::RepeatedPtrField<yb::cdc::CDCRecordPB>::Clear() /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220420172553-91c632476c-centos7-x86_64-clang12/installed/tsan/include/google/protobuf/repeated_field.h:2102:25 (libcdc_service_proto.so+0xe5ce9)
    #2 yb::cdc::GetChangesResponsePB::Clear() /nfusr/centos-gcp-cloud/jenkins-worker-2lx048/jenkins/jenkins-github-yugabyte-db-centos-master-clang12-tsan-175/build/tsan-clang12-dynamic-ninja/src/yb/cdc/cdc_service.pb.cc:10118:12 (libcdc_service_proto.so+0xcdaa9)
```
```
  Previous read of size 4 at 0x7b5000b81168 by thread T352:
    #0 google::protobuf::internal::RepeatedPtrFieldBase::size() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220420172553-91c632476c-centos7-x86_64-clang12/installed/tsan/include/google/protobuf/repeated_field.h:1515:10 (xcluster-tablet-split-itest+0x35072a)
    #1 google::protobuf::RepeatedPtrField<yb::cdc::CDCRecordPB>::size() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220420172553-91c632476c-centos7-x86_64-clang12/installed/tsan/include/google/protobuf/repeated_field.h:1984:32 (libtserver.so+0x591ff9)
    #2 yb::cdc::GetChangesResponsePB::records_size() const /nfusr/centos-gcp-cloud/jenkins-worker-2lx048/jenkins/jenkins-github-yugabyte-db-centos-master-clang12-tsan-175/build/tsan-clang12-dynamic-ninja/src/yb/cdc/cdc_service.pb.h:9367:19 (libtserver.so+0x59028d)
    #3 yb::tserver::enterprise::TwoDCOutputClient::ProcessChangesStartingFromIndex(int) /nfusr/centos-gcp-cloud/jenkins-worker-2lx048/jenkins/jenkins-github-yugabyte-db-centos-master-clang12-tsan-175/build/tsan-clang12-dynamic-ninja/../../ent/src/yb/tserver/twodc_output_client.cc:213:44 (libtserver.so+0x58e8ee)
    #4 yb::tserver::enterprise::TwoDCOutputClient::WriteCDCRecordDone(yb::Status const&, yb::tserver::WriteResponsePB const&) /nfusr/centos-gcp-cloud/jenkins-worker-2lx048/jenkins/jenkins-github-yugabyte-db-centos-master-clang12-tsan-175/build/tsan-clang12-dynamic-ninja/../../ent/src/yb/tserver/twodc_output_client.cc:408:18 (libtserver.so+0x58fa0e)
```
Fixed by returning immediately once ProcessSplitOp completes and there are no more records to process. Previously we would only continue the loop, which still accessed the `twodc_resp_copy_` object.

Test Plan:
```
ybd tsan --cxx-test integration-tests_xcluster-tablet-split-itest --gtest_filter XClusterTabletSplitITest.SplittingOnProducerAndConsumer
ybd asan --cxx-test integration-tests_xcluster-tablet-split-itest --gtest_filter XClusterTabletSplitITest.SplittingOnProducerAndConsumer
```

Reviewers: rahuldesirazu, nicolas
Reviewed By: nicolas
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D16955
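The two fixes described in this commit message (a shutdown flag, and returning instead of continuing the loop) can be sketched roughly as follows. This is a minimal illustration, not the actual `TwoDCOutputClient` code: the class `OutputClientSketch`, its members, and the use of plain `int` records are all assumptions made for the example. The flag check alone does not make destruction safe against a callback that is already running; the real fix may rely on additional synchronization.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an output client; not the real TwoDCOutputClient.
class OutputClientSketch {
 public:
  explicit OutputClientSketch(std::vector<int> records)
      : records_(std::move(records)) {}

  // Called before teardown so that later callbacks stop touching records_.
  void Shutdown() { shutdown_.store(true, std::memory_order_release); }

  // Processes records starting at `start`; returns how many were processed.
  std::size_t ProcessFrom(std::size_t start) {
    std::size_t processed = 0;
    for (std::size_t i = start; i < records_.size(); ++i) {
      if (shutdown_.load(std::memory_order_acquire)) {
        // Return right away rather than `continue`, so the loop condition
        // does not read records_ again after shutdown was requested.
        return processed;
      }
      ++processed;
    }
    return processed;
  }

 private:
  std::vector<int> records_;
  std::atomic<bool> shutdown_{false};
};
```

Once `Shutdown()` has been called, `ProcessFrom` does no further work on the record list.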
jayant07-yb
pushed a commit
that referenced
this pull request
Sep 26, 2022
… behavior on postgres shutdown

Summary: The diff consists of 3 parts.

**Part #1:** Multiple calls of the `PerformFuture::Get` method

When a session is closed while temp tables exist, the following stack is possible:
```
std::__1::future<yb::pggate::PerformResult>::get()
yb::pggate::PerformFuture::Get() // Second call of the PerformFuture::Get on same object
yb::pggate::PgDocResponse::Get()
yb::pggate::PgDocOp::~PgDocOp()
yb::pggate::PgDocReadOp::~PgDocReadOp()
...
yb::pggate::PgDml::~PgDml()
yb::pggate::PgDmlRead::~PgDmlRead()
yb::pggate::PgSelect::~PgSelect()
...
yb::pggate::PgMemctx::Clear()
yb::pggate::PgMemctx::~PgMemctx()
...
yb::pggate::ClearGlobalPgMemctxMap()
YBCDestroyPgGate
YBOnPostgresBackendShutdown
quickdie // Caused by interruption by the SIGQUIT signal
...
std::__1::future<yb::pggate::PerformResult>::get()
yb::pggate::PerformFuture::Get() // First call of the PerformFuture::Get
yb::pggate::PgDocResponse::Get()
yb::pggate::PgDocOp::GetResult(std::__1::list<yb::pggate::PgDocResult, std::__1::allocator<yb::pggate::PgDocResult> >*)
yb::pggate::PgDml::FetchDataFromServer()
yb::pggate::PgDml::Fetch(int, unsigned long*, bool*,
yb::pggate::PgApiImpl::DmlFetch(yb::pggate::PgStatement*, int, unsigned long*,
YBCPgDmlFetch
ybcFetchNextHeapTuple
ybc_getnext_heaptuple
ybc_systable_getnext
systable_getnext
findDependentObjects
performDeletion
RemoveTempRelations
RemoveTempRelationsCallback
shmem_exit
proc_exit_prepare
proc_exit
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
PostgresServerProcessMain
main
```
In this stack the `PerformFuture::Get` method is called twice on the same object, but a `PerformFuture` object is designed to have `Get` called only once. After the first call, `PerformFuture::Valid` should return `false` and `Get` should not be called again. In the above stack, however, the second call of `PerformFuture::Get` is made before the first one has returned (due to interruption by the `SIGQUIT` signal).
To handle this situation, the `session_` field is set to `null` at the very beginning of `PerformFuture::Get`. When `session_` is `null`, `PerformFuture::Valid` returns `false`.

**Part #2:** Access to a deleted `PgApiImpl` object

The `YBCDestroyPgGate` function destroys the `PgApiImpl` object and then calls the `ClearGlobalPgMemctxMap` function. But this function may call code that accesses fields of the `PgApiImpl` object. Here is the stack trace:
```
std::__1::unique_ptr<yb::pggate::PgClient::Impl, std::__1::default_delete<yb::pggate::PgClient::Impl> >::operator->() const
yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::AbortTransaction()
yb::pggate::PgTxnManager::~PgTxnManager()
...
yb::pggate::PgSession::~PgSession()
...
yb::pggate::PgStatement::~PgStatement()
yb::pggate::PgDml::~PgDml()
yb::pggate::PgDmlRead::~PgDmlRead()
yb::pggate::PgSelect::~PgSelect()
...
yb::pggate::PgMemctx::Clear()
yb::pggate::PgMemctx::~PgMemctx()
...
yb::pggate::ClearGlobalPgMemctxMap()
...
```
The solution is to move the PgMemctx map into the `PgApiImpl` object (this also fixes yugabyte#7216).

**Part #3:** Access to a shut-down `PgClient` object

`~PgApiImpl()` explicitly shuts down the `pg_client_` object while `pg_session_` is still alive, so the `pg_client_` object may still be used when `pg_session_` is later destroyed.
The stack is:
```
yb::pggate::PgClient::Impl::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::AbortTransaction()
yb::pggate::PgTxnManager::~PgTxnManager()
...
yb::pggate::PgSession::~PgSession()
...
yb::pggate::PgApiImpl::~PgApiImpl()
...
YBCDestroyPgGate
YBOnPostgresBackendShutdown
quickdie
...
```
The solution is to explicitly destroy `pg_session_` before shutting down `pg_client_`.

Test Plan: Jenkins

The following tests crashed without the fix:
```
./yb_build.sh asan --gtest_filter PgFKeyTest.AddFKCorrectnessOnTempTables
./yb_build.sh asan --gtest_filter PgCatalogPerfTest.CacheRefreshRPCCountWithPartitionTables
./yb_build.sh asan --gtest_filter PgMiniTest.BigInsertWithRestart
./yb_build.sh asan --gtest_filter AlterTableWithConcurrentTxnTest.TServerLeaderChange
```

Reviewers: dsrinivasan, jason, alex
Reviewed By: alex
Subscribers: mbautin, bogdan, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D17259
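The Part #1 idea (clear the validity marker before blocking, so that a re-entrant caller sees `Valid() == false`) might look roughly like the sketch below. It is only an illustration: `SingleUseResult`, the `std::function` used in place of a real future, and the opaque `session` pointer are assumptions for the example and do not reflect the real `PerformFuture` definition.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical single-use result holder, loosely following the description
// of PerformFuture in the summary above; the real class differs.
class SingleUseResult {
 public:
  SingleUseResult(std::function<int()> wait, const void* session)
      : wait_(std::move(wait)), session_(session) {}

  // A null session_ marks the object as already consumed.
  bool Valid() const { return session_ != nullptr; }

  int Get() {
    // Clear session_ before the potentially blocking call. If the wait is
    // interrupted and cleanup code inspects this object, Valid() already
    // reports false, so Get() is not entered a second time.
    session_ = nullptr;
    return wait_();
  }

 private:
  std::function<int()> wait_;  // stands in for the blocking future
  const void* session_;
};
```

Callers are expected to check `Valid()` before calling `Get()`, which is how a second call from cleanup code would be avoided in this sketch.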
jayant07-yb
pushed a commit
that referenced
this pull request
Sep 26, 2022
…le changing oom_score_adj

Summary: When a `ysqlsh` shell is opened, the postmaster tries to set the oom_score_adj to a specific value. This is done by writing the value to the file `/proc/[pid]/oom_score_adj`. However, the postmaster sometimes crashes while trying to set it, with the following error.
```
ysqlsh: could not connect to server: Connection refused\n\tIs the server running on host "172.151.31.53" and accepting\n\tTCP/IP connections on port 5433?
```
```
warning: File "/home/yugabyte/yb-software/yugabyte-2.15.1.0-b3-centos-x86_64/linuxbrew/lib/libthread_db.so.1" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
add-auto-load-safe-path /home/yugabyte/yb-software/yugabyte-2.15.1.0-b3-centos-x86_64/linuxbrew/lib/libthread_db.so.1
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the "Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: File "/home/yugabyte/yb-software/yugabyte-2.15.1.0-b3-centos-x86_64/linuxbrew/lib/libthread_db.so.1" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `/home/yugabyte/yb-software/yugabyte-2.15.1.0-b3-centos-x86_64/postgres/bin/post'.
Program terminated with signal 11, Segmentation fault.
#0 _IO_new_fclose (fp=0x0) at iofclose.c:53
53 iofclose.c: No such file or directory.
(gdb) bt
#0 _IO_new_fclose (fp=0x0) at iofclose.c:53
#1 0x00000000008ab237 in BackendStartup (port=0x1b7c1e0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4255
#2 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1767
#3 0x00000000008a7d01 in PostmasterMain (argc=<optimized out>, argv=0x1b92780) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1423
#4 0x00000000007c85c3 in PostgresServerProcessMain (argc=23, argv=0x1b92780) at ../../../../../../src/postgres/src/backend/main/main.c:234
#5 0x00000000004f5ae2 in main ()
```
This happens because fclose is called on the NULL file pointer that fopen returns when it fails (note `fp=0x0` in frame #0). The diff that caused this regression is https://phabricator.dev.yugabyte.com/D14099

The log message looks like this:
```
2022-06-09 19:12:18.685 UTC [17266] LOG: error 2: No such file or directory, unable to open file /proc/17390/oom_score_adj123
```

Test Plan: Force fopen to fail by providing a garbage path for oom_score_adj. Make sure that, with the fix, the postmaster neither restarts nor segfaults.

Reviewers: sagarwal, zyu, mihnea, smishra
Reviewed By: zyu, smishra
Subscribers: rthallam, kannan, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D17567
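The failure mode in the backtrace above (fclose reached with `fp=0x0`) is avoided by checking the result of fopen before using or closing the stream. The following is a minimal sketch of that pattern, assuming a hypothetical helper `WriteOomScoreAdj`; the real change is in `postmaster.c` and is not reproduced here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper for illustration only.
// Returns true if the value was written, false if the file could not be
// opened or written.
bool WriteOomScoreAdj(const std::string& path, int value) {
  std::FILE* fp = std::fopen(path.c_str(), "w");
  if (fp == nullptr) {
    // fopen failed: report it and return without calling fclose, because
    // passing a null pointer to fclose is what crashed in the trace above.
    std::perror("unable to open file");
    return false;
  }
  const bool written = std::fprintf(fp, "%d", value) > 0;
  const bool closed = std::fclose(fp) == 0;
  return written && closed;
}
```

With a path that cannot be opened, the helper logs the error and returns `false` instead of crashing.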
jayant07-yb
pushed a commit
that referenced
this pull request
Sep 26, 2022
…f tserver

Summary: Given a universe where data can be moved off a node, we now wait for tablets to move off a tserver that was first stopped on Platform and then removed. Previously, if a node was stopped on Platform and then removed, we did not wait for the tablets to move off and simply continued with removing the node from the universe. We now try to move tablets off a node whenever possible.

We remove the `isTServer` condition in `UpdatePlacementInfo.java` when blacklisting nodes, because if a node is stopped its `isTServer` value is set to `false`, but we would still like to blacklist this node.

Test Plan: Some things to understand beforehand:
1. When a node's tserver/master is not running, isTserver/isMaster is false.
2. If a node is stopped, the node is still alive; only the tserver/master process is not running, so `isTserverAliveOnNode` will be false on a stopped node.

Create a GCP universe with 6 nodes and rf3, with AZs comprising us-west-1, us-west-2, and us-east1. For each of the tests below, start from a clean universe in which all the nodes are live. Perform the following tests:

a) Happy path: stopping and then immediately removing a node from the universe
1. Stop a node in us-west-2
2. Immediately after the node is stopped, remove the same node from the universe
3. Go to master UI at <master-ip>:7000/tablet-servers, on a node that is currently not being removed
4. Keep refreshing the page; we should see the values for the node's `User Tablet-Peers / Leaders` slowly decrease until they hit 0 / 0
5. The node should successfully be removed

b) Edge case #1: Only removing a node from the universe
1. Remove a node in us-west-2 from the universe
2. Go to master UI at <master-ip>:7000/tablet-servers, on a node that is currently not being removed
3. Keep refreshing the page; we should see the values for the node's `User Tablet-Peers / Leaders` slowly decrease until they hit 0 / 0
4. The node should successfully be removed

c) Edge case #2: Stopping a node, waiting until tablets are moved off, then removing the node from the universe
1. Stop a node in us-west-2
2. Go to master UI at <master-ip>:7000/tablet-servers
3. Wait for around 10 - 15 mins; tablets from the stopped node should be moved off automatically after this timeframe, i.e. under the `User Tablet-Peers / Leaders` column, that node should display 0 / 0.
4. Remove the same node from the universe
5. Since the tablets are already moved off the node, node removal should not have much of a wait time
6. The node should successfully be removed

d) Edge case #3: Remove 2 nodes from the same AZ
1. Remove a node in us-west-2 from the universe
2. Remove another node from us-west-2
3. For the second node, since there is nowhere for the tablets to go, we will not wait for the tablets to move, so on the master UI we should see x / 0 under the `User Tablet-Peers / Leaders` columns, where 'x' is the number of tablet peers. The RemoveNodeFromUniverse task should finish. However, the value of 'x' should slowly decrease until it hits 0.

e) Edge case #4: Remove as many nodes as possible on Platform
1. We should only be able to remove at most 1 node with a master server on it, to maintain a majority of tablet peers (in our case we have rf3, so 3 master servers, thus we can only remove one master server).
2. We should be able to remove all nodes with only tservers

All areas that use `UpdatePlacementInfo.java` have 0 nodes to be blacklisted, except `EditKubernetesUniverse.java`, where we are already using tservers, so it is safe to remove the `isTserver` check in `UpdatePlacementInfo.java`.

Reviewers: sanketh, nsingh
Reviewed By: nsingh
Subscribers: yugaware
Differential Revision: https://phabricator.dev.yugabyte.com/D18596
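The effect of dropping the `isTServer` condition when choosing nodes to blacklist can be illustrated with a small sketch. The actual change is in Java (`UpdatePlacementInfo.java`) and is not shown in this conversation; the C++ below is only a model of the selection rule, and `NodeSketch`, its fields, and `NodesToBlacklist` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node model; field names are illustrative only.
struct NodeSketch {
  std::string name;
  bool is_tserver;     // false once the tserver process on the node is stopped
  bool to_be_removed;  // node is scheduled for removal from the universe
};

// Selects the nodes to blacklist. Per the summary above, the earlier rule
// also required is_tserver to be true, which skipped nodes that had been
// stopped first; this version ignores that flag.
std::vector<std::string> NodesToBlacklist(const std::vector<NodeSketch>& nodes) {
  std::vector<std::string> names;
  for (const NodeSketch& node : nodes) {
    if (node.to_be_removed) {
      names.push_back(node.name);
    }
  }
  return names;
}
```

A stopped node (`is_tserver == false`) that is scheduled for removal is therefore still blacklisted, so its tablets can be moved off before the removal proceeds.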
jayant07-yb
pushed a commit
that referenced
this pull request
Sep 26, 2022
Summary: BlockingQueueDeathTest.TestPointerParamsMustBeEmptyOnDestruct was failing under ASAN on Jenkins.
```
Result: died but not with expected error.
Expected: BlockingQueue holds bare pointers
Actual msg:
[ DEATH ]
```
It turned out that gtest's own unit tests in `gtest-death-test_test` also fail on Jenkins, with a common failure pattern:
```
[ DEATH ] AddressSanitizer:DEADLYSIGNAL
[ DEATH ] =================================================================
[ DEATH ] ==2262872==ERROR: AddressSanitizer: stack-overflow on address 0x7f823b324f80 (pc 0x7f823b2edc03 bp 0x7f823b325fb0 sp 0x7f823b324f80 T0)
[ DEATH ] #0 0x7f823b2edc03 (/lib64/ld-linux-x86-64.so.2+0x1c03)
[ DEATH ] #1 0x7f823a362684 (/nfusr/alma8-gcp-cloud/jenkins-worker-74vryc/jenkins/ty/yugabyte-db-thirdparty/build/asan/gmock-1.8.0/shared/googlemock/gtest/libgtest.so+0x134684)
[ DEATH ] #2 0x7f8239900dd2 (/lib64/libc.so.6+0x39dd2)
[ DEATH ]
[ DEATH ] SUMMARY: AddressSanitizer: stack-overflow (/lib64/ld-linux-x86-64.so.2+0x1c03)
[ DEATH ] ==2262872==ABORTING
[ DEATH ]
```
This is related to https://chromium.googlesource.com/external/github.com/pwnall/googletest/+/681454dae48f109abf68c424c9d2e6db9a092238. With ASAN on x86_64, ExecDeathTestChildMain has a frame size of 1728 bytes. The call to `chdir()` in `ExecDeathTestChildMain` ends up in `_dl_runtime_resolve_xsavec`, which attempts to save register state on the stack, and the `XSAVE register save area size` is larger than 1728 bytes on Jenkins nodes:
```
$ cpuid -i | grep -i xsave
...
XSAVE features (0xd/0):
bytes required by XSAVE/XRSTOR area = 0x00000a80 (2688)
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) CPU @ 2.80GHz
```
GoogleTest 1.11.0 has a fix for this issue: google/googletest@681454d. We decided to upgrade to the latest GoogleTest, 1.12.0, which also contains the fix.

Test Plan:
- Jenkins

Reviewers: esheng, jhe, mbautin
Reviewed By: esheng, jhe, mbautin
Subscribers: bogdan, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D19390
jayant07-yb
pushed a commit
that referenced
this pull request
Dec 2, 2022
…e image Summary: We observed a crash while running TPCC workload with CDCSDK enabled. The stack trace is: ``` (gdb) bt #0 0x0000557f25b11910 in yb::DatumMessagePB::MergeFrom(yb::DatumMessagePB const&) () #1 0x0000557f258a41ef in yb::cdc::PopulateBeforeImage(std::__1::shared_ptr<yb::tablet::TabletPeer> const&, yb::ReadHybridTime const&, yb::cdc::RowMessage*, std::__1::unordered_map<unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, std::__1::allocator<std::__1::pair<unsigned int const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::unordered_map<unsigned int, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB> >, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, std::__1::allocator<std::__1::pair<unsigned int const, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB> > > > > const&, yb::docdb::SubDocKey const&, yb::Schema const&, unsigned int) () #2 0x0000557f258a7304 in yb::cdc::PopulateCDCSDKIntentRecord(yb::OpId const&, yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::vector<yb::docdb::IntentKeyValueForCDC, std::__1::allocator<yb::docdb::IntentKeyValueForCDC> > const&, yb::cdc::StreamMetadata const&, std::__1::shared_ptr<yb::tablet::TabletPeer> const&, std::__1::unordered_map<unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, std::__1::allocator<std::__1::pair<unsigned int const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::unordered_map<unsigned int, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB> >, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, 
std::__1::allocator<std::__1::pair<unsigned int const, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB> > > > > const&, yb::cdc::GetChangesResponsePB*, yb::ScopedTrackedConsumption*, unsigned int*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, yb::Schema*, unsigned int, unsigned long const&) () #3 0x0000557f258aaa27 in yb::cdc::ProcessIntents(yb::OpId const&, yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, yb::cdc::StreamMetadata const&, std::__1::unordered_map<unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, std::__1::allocator<std::__1::pair<unsigned int const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::unordered_map<unsigned int, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB> >, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, std::__1::allocator<std::__1::pair<unsigned int const, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB> > > > > const&, yb::cdc::GetChangesResponsePB*, yb::ScopedTrackedConsumption*, yb::cdc::CDCSDKCheckpointPB*, std::__1::shared_ptr<yb::tablet::TabletPeer> const&, std::__1::vector<yb::docdb::IntentKeyValueForCDC, std::__1::allocator<yb::docdb::IntentKeyValueForCDC> >*, yb::docdb::ApplyTransactionState*, yb::client::YBClient*, std::__1::shared_ptr<yb::Schema>*, unsigned int*, unsigned long const&) () #4 0x0000557f258b00c1 in yb::cdc::GetChangesForCDCSDK(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, yb::cdc::CDCSDKCheckpointPB const&, yb::cdc::StreamMetadata const&, std::__1::shared_ptr<yb::tablet::TabletPeer> const&, 
std::__1::shared_ptr<yb::MemTracker> const&, std::__1::unordered_map<unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, std::__1::allocator<std::__1::pair<unsigned int const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::unordered_map<unsigned int, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB> >, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, std::__1::allocator<std::__1::pair<unsigned int const, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB> > > > > const&, yb::client::YBClient*, yb::consensus::ReplicateMsgsHolder*, yb::cdc::GetChangesResponsePB*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, std::__1::shared_ptr<yb::Schema>*, unsigned int*, yb::OpId*, long*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >) () #5 0x0000557f2586c448 in yb::cdc::CDCServiceImpl::GetChanges(yb::cdc::GetChangesRequestPB const*, yb::cdc::GetChangesResponsePB*, yb::rpc::RpcContext) () #6 0x0000557f25908246 in std::__1::__function::__func<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) () #7 0x0000557f2590a6af in yb::cdc::CDCServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) () #8 0x0000557f26227a1e in yb::rpc::ServicePoolImpl::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) () #9 0x0000557f2616db2f in yb::rpc::InboundCall::InboundCallTask::Run() () #10 0x0000557f26236583 in yb::rpc::(anonymous namespace)::Worker::Execute() () #11 
0x0000557f268698cf in yb::Thread::SuperviseThread(void*) ()
#12 0x00007fa6fce89694 in ?? ()
#13 0x0000000000000000 in ?? ()
```
The problem is in the method PopulateBeforeImage. When a column is dropped, the row no longer has data for that column, so nothing is added for it to the "old_tuple" member of RowMessage. As a result, the size of "old_tuple" does not match the number of columns in the schema, which means this line: "row_message->old_tuple(static_cast<int>(index))" could lead to an out-of-bounds access. Instead, we now keep track of the columns found in the row.

Test Plan: Running existing ctests

Reviewers: srangavajjula, sdash, skumar
Reviewed By: sdash, skumar
Differential Revision: https://phabricator.dev.yugabyte.com/D21338
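The indexing problem described above (using the schema column index to index a tuple that skips dropped columns) and the "track the found columns" remedy can be sketched as follows. This is an illustrative model only: `MapColumnsToOldTuple` and its parameters are hypothetical and do not correspond to the real `PopulateBeforeImage` signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each schema column, returns the index of its value inside the old
// tuple, or -1 when the row holds no data for that column (for example
// because the column was dropped). The old tuple only contains entries for
// columns that were actually found, so it can be shorter than the schema.
std::vector<int> MapColumnsToOldTuple(const std::vector<bool>& column_has_data,
                                      std::size_t old_tuple_size) {
  std::vector<int> mapping;
  mapping.reserve(column_has_data.size());
  std::size_t found = 0;  // advances only when a column is present in the row
  for (bool has_data : column_has_data) {
    if (has_data && found < old_tuple_size) {
      mapping.push_back(static_cast<int>(found));
      ++found;
    } else {
      mapping.push_back(-1);
    }
  }
  return mapping;
}
```

Indexing the old tuple with the running `found` counter rather than the schema index keeps every access within bounds, even when the schema has more columns than the tuple has entries.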
jayant07-yb
pushed a commit
that referenced
this pull request
Dec 7, 2022
…e image (commit message identical to the Dec 2, 2022 entry above; Differential Revision: https://phabricator.dev.yugabyte.com/D21338)
Added documentation on integrating YugabyteDB (YSQL) with Camunda.