-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Undefined behavior around PerformFuture in PG backend shutdown #12490
Labels
Comments
bmatican
added
area/ysql
Yugabyte SQL (YSQL)
kind/failing-test
Tests and testing infra
labels
May 12, 2022
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
labels
May 12, 2022
d-uspenskiy
added a commit
that referenced
this issue
Jun 3, 2022
…utdown Summary: The diff consists of 3 parts. **Part #1:** Multiple call of the `PerformFuture::Get` method In case of session close when temp tables exists the following stack is possible ``` std::__1::future<yb::pggate::PerformResult>::get() yb::pggate::PerformFuture::Get() // Second call of the PerformFuture::Get on same object yb::pggate::PgDocResponse::Get() yb::pggate::PgDocOp::~PgDocOp() yb::pggate::PgDocReadOp::~PgDocReadOp() ... yb::pggate::PgDml::~PgDml() yb::pggate::PgDmlRead::~PgDmlRead() yb::pggate::PgSelect::~PgSelect() ... yb::pggate::PgMemctx::Clear() yb::pggate::PgMemctx::~PgMemctx() ... yb::pggate::ClearGlobalPgMemctxMap() YBCDestroyPgGate YBOnPostgresBackendShutdown quickdie // Caused by interruption by the SIGQUIT signal ... std::__1::future<yb::pggate::PerformResult>::get() yb::pggate::PerformFuture::Get() // First call of the PerformFuture::Get yb::pggate::PgDocResponse::Get() yb::pggate::PgDocOp::GetResult(std::__1::list<yb::pggate::PgDocResult, std::__1::allocator<yb::pggate::PgDocResult> >*) yb::pggate::PgDml::FetchDataFromServer() yb::pggate::PgDml::Fetch(int, unsigned long*, bool*, yb::pggate::PgApiImpl::DmlFetch(yb::pggate::PgStatement*, int, unsigned long*, YBCPgDmlFetch ybcFetchNextHeapTuple ybc_getnext_heaptuple ybc_systable_getnext systable_getnext findDependentObjects performDeletion RemoveTempRelations RemoveTempRelationsCallback shmem_exit proc_exit_prepare proc_exit PostgresMain BackendRun BackendStartup ServerLoop PostmasterMain PostgresServerProcessMain main ``` In this stack the `PerformFuture::Get` method is called on same object twice. But the `PerformFuture` object is designed to call `Get` method only once. After the first call the `PerformFuture::Valid` method should return `false` and `Get` method should not be called. But in the above stack the second call of the `PerformFuture::Get`method is done before the first one is returned (due to interruption by the `SIGQUIT` signal). To handle this situation `session_` field is set to `null` at the very beginning of the `PerformFuture::Get`. In case `session_` is `null` the `PerformFuture::Valid` returns `false`. **Part #2:** Access deleted `PgApiImpl` object The `YBCDestroyPgGate` function destroys the `PgApiImpl` object and then calls the `ClearGlobalPgMemctxMap` function. But this function may call the code which access the `PgApiImpl` object fields. Here is the stack trace ``` std::__1::unique_ptr<yb::pggate::PgClient::Impl, std::__1::default_delete<yb::pggate::PgClient::Impl> >::operator->() const yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>) yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>) yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>) yb::pggate::PgTxnManager::AbortTransaction() yb::pggate::PgTxnManager::~PgTxnManager() ... yb::pggate::PgSession::~PgSession() ... yb::pggate::PgStatement::~PgStatement() yb::pggate::PgDml::~PgDml() yb::pggate::PgDmlRead::~PgDmlRead() yb::pggate::PgSelect::~PgSelect() ... yb::pggate::PgMemctx::Clear() yb::pggate::PgMemctx::~PgMemctx() ... yb::pggate::ClearGlobalPgMemctxMap() ... ``` Solution is to move PgMemctx map into the `PgApiImpl` object (this also fixes #7216) **Part #3:** Access shut down `PgClient`object The `~PgApiImpl()` explicitly shut downs `pg_client_` object but the `pg_session_` is still alive. And when it will be destroyed the pg_client_ object may be used. The stack is ``` yb::pggate::PgClient::Impl::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>) yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>) yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>) yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>) yb::pggate::PgTxnManager::AbortTransaction() yb::pggate::PgTxnManager::~PgTxnManager() ... yb::pggate::PgSession::~PgSession() ... yb::pggate::PgApiImpl::~PgApiImpl() ... YBCDestroyPgGate YBOnPostgresBackendShutdown quickdie ... ``` Solution is to explicitly destroy `pg_session_` before shutting down of `pg_client_` Test Plan: Jenkins The following tests had crashed without the fix ``` ./yb_build.sh asan --gtest_filter PgFKeyTest.AddFKCorrectnessOnTempTables ./yb_build.sh asan --gtest_filter PgCatalogPerfTest.CacheRefreshRPCCountWithPartitionTables ./yb_build.sh asan --gtest_filter PgMiniTest.BigInsertWithRestart ./yb_build.sh asan --gtest_filter AlterTableWithConcurrentTxnTest.TServerLeaderChange ``` Reviewers: dsrinivasan, jason, alex Reviewed By: alex Subscribers: mbautin, bogdan, yql Differential Revision: https://phabricator.dev.yugabyte.com/D17259
d-uspenskiy
added a commit
that referenced
this issue
Aug 22, 2022
… on postgres shutdown Summary: The diff consists of 3 parts. **Part #1:** Multiple call of the `PerformFuture::Get` method In case of session close when temp tables exists the following stack is possible ``` std::__1::future<yb::pggate::PerformResult>::get() yb::pggate::PerformFuture::Get() // Second call of the PerformFuture::Get on same object yb::pggate::PgDocResponse::Get() yb::pggate::PgDocOp::~PgDocOp() yb::pggate::PgDocReadOp::~PgDocReadOp() ... yb::pggate::PgDml::~PgDml() yb::pggate::PgDmlRead::~PgDmlRead() yb::pggate::PgSelect::~PgSelect() ... yb::pggate::PgMemctx::Clear() yb::pggate::PgMemctx::~PgMemctx() ... yb::pggate::ClearGlobalPgMemctxMap() YBCDestroyPgGate YBOnPostgresBackendShutdown quickdie // Caused by interruption by the SIGQUIT signal ... std::__1::future<yb::pggate::PerformResult>::get() yb::pggate::PerformFuture::Get() // First call of the PerformFuture::Get yb::pggate::PgDocResponse::Get() yb::pggate::PgDocOp::GetResult(std::__1::list<yb::pggate::PgDocResult, std::__1::allocator<yb::pggate::PgDocResult> >*) yb::pggate::PgDml::FetchDataFromServer() yb::pggate::PgDml::Fetch(int, unsigned long*, bool*, yb::pggate::PgApiImpl::DmlFetch(yb::pggate::PgStatement*, int, unsigned long*, YBCPgDmlFetch ybcFetchNextHeapTuple ybc_getnext_heaptuple ybc_systable_getnext systable_getnext findDependentObjects performDeletion RemoveTempRelations RemoveTempRelationsCallback shmem_exit proc_exit_prepare proc_exit PostgresMain BackendRun BackendStartup ServerLoop PostmasterMain PostgresServerProcessMain main ``` In this stack the `PerformFuture::Get` method is called on same object twice. But the `PerformFuture` object is designed to call `Get` method only once. After the first call the `PerformFuture::Valid` method should return `false` and `Get` method should not be called. But in the above stack the second call of the `PerformFuture::Get`method is done before the first one is returned (due to interruption by the `SIGQUIT` signal). To handle this situation `session_` field is set to `null` at the very beginning of the `PerformFuture::Get`. In case `session_` is `null` the `PerformFuture::Valid` returns `false`. **Part #2:** Access deleted `PgApiImpl` object The `YBCDestroyPgGate` function destroys the `PgApiImpl` object and then calls the `ClearGlobalPgMemctxMap` function. But this function may call the code which access the `PgApiImpl` object fields. Here is the stack trace ``` std::__1::unique_ptr<yb::pggate::PgClient::Impl, std::__1::default_delete<yb::pggate::PgClient::Impl> >::operator->() const yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>) yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>) yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>) yb::pggate::PgTxnManager::AbortTransaction() yb::pggate::PgTxnManager::~PgTxnManager() ... yb::pggate::PgSession::~PgSession() ... yb::pggate::PgStatement::~PgStatement() yb::pggate::PgDml::~PgDml() yb::pggate::PgDmlRead::~PgDmlRead() yb::pggate::PgSelect::~PgSelect() ... yb::pggate::PgMemctx::Clear() yb::pggate::PgMemctx::~PgMemctx() ... yb::pggate::ClearGlobalPgMemctxMap() ... ``` Solution is to move PgMemctx map into the `PgApiImpl` object (this also fixes #7216) **Part #3:** Access shut down `PgClient`object The `~PgApiImpl()` explicitly shut downs `pg_client_` object but the `pg_session_` is still alive. And when it will be destroyed the pg_client_ object may be used. The stack is ``` yb::pggate::PgClient::Impl::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>) yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>) yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>) yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>) yb::pggate::PgTxnManager::AbortTransaction() yb::pggate::PgTxnManager::~PgTxnManager() ... yb::pggate::PgSession::~PgSession() ... yb::pggate::PgApiImpl::~PgApiImpl() ... YBCDestroyPgGate YBOnPostgresBackendShutdown quickdie ... ``` Solution is to explicitly destroy `pg_session_` before shutting down of `pg_client_` Original commit: 77396db / D17259 Test Plan: Jenkins: rebase: 2.14 The following tests had crashed without the fix ``` ./yb_build.sh asan --gtest_filter PgFKeyTest.AddFKCorrectnessOnTempTables ./yb_build.sh asan --gtest_filter PgCatalogPerfTest.CacheRefreshRPCCountWithPartitionTables ./yb_build.sh asan --gtest_filter PgMiniTest.BigInsertWithRestart ./yb_build.sh asan --gtest_filter AlterTableWithConcurrentTxnTest.TServerLeaderChange ``` Reviewers: dsrinivasan, alex, jason Reviewed By: jason Subscribers: yql, bogdan, mbautin Differential Revision: https://phabricator.dev.yugabyte.com/D19048
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Jira Link: [DB-323](https://yugabyte.atlassian.net/browse/DB-323)
Description
Examples
https://jenkins.dev.yugabyte.com/job/github-yugabyte-db-alma8-master-clang12-asan/422/artifact/build/asan-clang12-dynamic-ninja/yb-test-logs/tests-pgwrapper__pg_catalog_perf-test/PgCatalogPerfTest_CacheRefreshRPCCountWithPartitionTables.log
https://jenkins.dev.yugabyte.com/job/github-yugabyte-db-alma8-master-clang12-asan/422/artifact/build/asan-clang12-dynamic-ninja/yb-test-logs/tests-pgwrapper__pg_fkey-test/PgFKeyTest_AddFKCorrectnessOnTempTables.log
https://jenkins.dev.yugabyte.com/job/github-yugabyte-db-alma8-master-clang12-asan/422/artifact/build/asan-clang12-dynamic-ninja/yb-test-logs/tests-pgwrapper__alter_table_with_concurrent_txn-test/AlterTableWithConcurrentTxnTest_TServerLeaderChange.log
https://jenkins.dev.yugabyte.com/job/github-yugabyte-db-alma8-master-clang12-asan/422/artifact/build/asan-clang12-dynamic-ninja/yb-test-logs/tests-pgwrapper__pg_mini-test/PgMiniTest_BigInsertWithRestart.log
cc @spolitov
The text was updated successfully, but these errors were encountered: