Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move PgMemctx map into PgApi in pggate #7216

Closed
mbautin opened this issue Feb 11, 2021 · 0 comments
Closed

Move PgMemctx map into PgApi in pggate #7216

mbautin opened this issue Feb 11, 2021 · 0 comments
Assignees

Comments

@mbautin
Copy link
Contributor

mbautin commented Feb 11, 2021

To reduce the number of global objects, we need to move the global PgMemctx map (postgres_process_memctxs) into PgApi and make its lifecycle management clearer, so that it is obvious that it coincides with the lifecycle of the PgApi (PgApiImpl) object.

@mbautin mbautin self-assigned this Feb 11, 2021
mbautin added a commit that referenced this issue Feb 12, 2021
Summary:
Clear the postgres_process_memctxs global PgMemctx object map in the YBCDestroyPgGate function, right after destroying the PgApiImpl object. If we don't do this, then we are leaving the destruction order of objects whose last refcount is held by the PgMemctx objects in the global map up to chance. Some of them contain Trace objects, which hold pointers to arenas allocated from a global thread-safe pool of arenas, but that thread-safe pool could have already been destroyed by the time the global PgMemctx map is being destructed. Global order of destructors in C++ is not well defined, and we should not rely on it.

Example crash stack trace: https://gist.githubusercontent.com/mbautin/37df1b73963b9e9cb92294be69560512/raw

Also see the stack traces in #7105

It would be good to move the global PgMemctx map into the PgApiImpl object in the future. That is tracked in #7216.

Test Plan: Jenkins: enable clang 11

Reviewers: neil, mihnea, dmitry

Reviewed By: dmitry

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10589
d-uspenskiy pushed a commit that referenced this issue Mar 1, 2021
…CDestroyPgGate

Summary:
Clear the postgres_process_memctxs global PgMemctx object map in the YBCDestroyPgGate function, right after destroying the PgApiImpl object. If we don't do this, then we are leaving the destruction order of objects whose last refcount is held by the PgMemctx objects in the global map up to chance. Some of them contain Trace objects, which hold pointers to arenas allocated from a global thread-safe pool of arenas, but that thread-safe pool could have already been destroyed by the time the global PgMemctx map is being destructed. Global order of destructors in C++ is not well defined, and we should not rely on it.

Example crash stack trace: https://gist.githubusercontent.com/mbautin/37df1b73963b9e9cb92294be69560512/raw

Also see the stack traces in #7105

It would be good to move the global PgMemctx map into the PgApiImpl object in the future. That is tracked in #7216.

Original commit: D10589 / f00bb9c

Test Plan: Jenkins: rebase: 2.4 enable clang 11

Reviewers: neil, mihnea

Reviewed By: mihnea

Subscribers: kannan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10740
polarweasel pushed a commit to lizayugabyte/yugabyte-db that referenced this issue Mar 9, 2021
…yPgGate

Summary:
Clear the postgres_process_memctxs global PgMemctx object map in the YBCDestroyPgGate function, right after destroying the PgApiImpl object. If we don't do this, then we are leaving the destruction order of objects whose last refcount is held by the PgMemctx objects in the global map up to chance. Some of them contain Trace objects, which hold pointers to arenas allocated from a global thread-safe pool of arenas, but that thread-safe pool could have already been destroyed by the time the global PgMemctx map is being destructed. Global order of destructors in C++ is not well defined, and we should not rely on it.

Example crash stack trace: https://gist.githubusercontent.com/mbautin/37df1b73963b9e9cb92294be69560512/raw

Also see the stack traces in yugabyte#7105

It would be good to move the global PgMemctx map into the PgApiImpl object in the future. That is tracked in yugabyte#7216.

Test Plan: Jenkins: enable clang 11

Reviewers: neil, mihnea, dmitry

Reviewed By: dmitry

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10589
@d-uspenskiy d-uspenskiy self-assigned this Jun 3, 2022
d-uspenskiy added a commit that referenced this issue Aug 22, 2022
… on postgres shutdown

Summary:
The diff consists of 3 parts.

**Part #1:** Multiple call of the `PerformFuture::Get` method
In case of session close when temp tables exists the following stack is possible

```
std::__1::future<yb::pggate::PerformResult>::get()
yb::pggate::PerformFuture::Get()                   // Second call of the PerformFuture::Get on same object
yb::pggate::PgDocResponse::Get()
yb::pggate::PgDocOp::~PgDocOp()
yb::pggate::PgDocReadOp::~PgDocReadOp()
...
yb::pggate::PgDml::~PgDml()
yb::pggate::PgDmlRead::~PgDmlRead()
yb::pggate::PgSelect::~PgSelect()
...
yb::pggate::PgMemctx::Clear()
yb::pggate::PgMemctx::~PgMemctx()
...
yb::pggate::ClearGlobalPgMemctxMap()
YBCDestroyPgGate
YBOnPostgresBackendShutdown
quickdie                                           // Caused by interruption by the SIGQUIT signal
...
std::__1::future<yb::pggate::PerformResult>::get()
yb::pggate::PerformFuture::Get()                   // First call of the PerformFuture::Get
yb::pggate::PgDocResponse::Get()
yb::pggate::PgDocOp::GetResult(std::__1::list<yb::pggate::PgDocResult, std::__1::allocator<yb::pggate::PgDocResult> >*)
yb::pggate::PgDml::FetchDataFromServer()
yb::pggate::PgDml::Fetch(int, unsigned long*, bool*,
yb::pggate::PgApiImpl::DmlFetch(yb::pggate::PgStatement*, int, unsigned long*,
YBCPgDmlFetch
ybcFetchNextHeapTuple
ybc_getnext_heaptuple
ybc_systable_getnext
systable_getnext
findDependentObjects
performDeletion
RemoveTempRelations
RemoveTempRelationsCallback
shmem_exit
proc_exit_prepare
proc_exit
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
PostgresServerProcessMain
main
```

In this stack the `PerformFuture::Get` method is called on same object twice. But the `PerformFuture` object is designed to call `Get` method only once. After the first call the `PerformFuture::Valid` method should return `false` and `Get` method should not be called. But in the above stack the second call of the `PerformFuture::Get`method is done before the first one is returned (due to interruption by the `SIGQUIT` signal).

To handle this situation `session_` field is set to `null` at the very beginning of the `PerformFuture::Get`. In case `session_` is `null` the `PerformFuture::Valid` returns `false`.

**Part #2:** Access deleted `PgApiImpl` object
The `YBCDestroyPgGate` function destroys the `PgApiImpl` object and then calls the `ClearGlobalPgMemctxMap` function. But this function may call the code which access the `PgApiImpl` object fields. Here is the stack trace

```
std::__1::unique_ptr<yb::pggate::PgClient::Impl, std::__1::default_delete<yb::pggate::PgClient::Impl> >::operator->() const
yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
   yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::AbortTransaction()
yb::pggate::PgTxnManager::~PgTxnManager()
...
yb::pggate::PgSession::~PgSession()
...
yb::pggate::PgStatement::~PgStatement()
yb::pggate::PgDml::~PgDml()
yb::pggate::PgDmlRead::~PgDmlRead()
yb::pggate::PgSelect::~PgSelect()
...
yb::pggate::PgMemctx::Clear()
yb::pggate::PgMemctx::~PgMemctx()
...
yb::pggate::ClearGlobalPgMemctxMap()
...
```

Solution is to move PgMemctx map into the `PgApiImpl` object (this also fixes #7216)

**Part #3:** Access shut down `PgClient`object

The `~PgApiImpl()` explicitly shut downs `pg_client_` object but the `pg_session_` is still alive. And when it will be destroyed the pg_client_ object may be used.

The stack is

```
yb::pggate::PgClient::Impl::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::AbortTransaction()
yb::pggate::PgTxnManager::~PgTxnManager()
...
yb::pggate::PgSession::~PgSession()
...
yb::pggate::PgApiImpl::~PgApiImpl()
...
YBCDestroyPgGate
YBOnPostgresBackendShutdown
quickdie
...
```

Solution is to explicitly destroy `pg_session_` before shutting down of `pg_client_`

Original commit: 77396db / D17259

Test Plan:
Jenkins: rebase: 2.14

The following tests had crashed without the fix

```
./yb_build.sh asan --gtest_filter PgFKeyTest.AddFKCorrectnessOnTempTables
./yb_build.sh asan --gtest_filter PgCatalogPerfTest.CacheRefreshRPCCountWithPartitionTables
./yb_build.sh asan --gtest_filter PgMiniTest.BigInsertWithRestart
./yb_build.sh asan --gtest_filter AlterTableWithConcurrentTxnTest.TServerLeaderChange
```

Reviewers: dsrinivasan, alex, jason

Reviewed By: jason

Subscribers: yql, bogdan, mbautin

Differential Revision: https://phabricator.dev.yugabyte.com/D19048
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants