Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] INTO INTO ... RETURNS STATUS AS ROW might do useless extra reads for CQL tables with primary key having range components #23330

Closed
1 task done
ttyusupov opened this issue Jul 30, 2024 · 0 comments

Comments

@ttyusupov
Copy link
Contributor

ttyusupov commented Jul 30, 2024

Jira Link: DB-12255

Description

For CQL tables with primary key which have both hash and range components, INSERT INTO INTO ... RETURNS STATUS AS ROW might perform unnecessary extra reads. In cases when then there are a lot of deleted and not yet compacted rows with the same hash key column value, there are a couple of side effects:

  1. It might read all these rows during processing INSERT., which can increase the latency of the INSERT operation.
  2. If this unnecessary read scans a large number of deletes (if the data is TTL data), then it causes the DocDB iterator to update the statistics - docdb_obsolete_keys_found and docdb_obsolete_keys_found_past_cutoff (because of huge amount of obsolete keys that are already behind history cutoff - 15 minutes by default). This can potentially schedule compactions, added as part of - [DocDB] Compaction policies should take into account deletes in SST files and compact SSTs more frequently for TTL tables. #13786. Compactions can have a latency impact on the user workload as well.

This is a regression starting from commit f68853b

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@ttyusupov ttyusupov added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage kind/perf labels Jul 30, 2024
@ttyusupov ttyusupov self-assigned this Jul 30, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jul 30, 2024
@yugabyte-ci yugabyte-ci added priority/high High Priority 2.20.6_blocker and removed priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage labels Jul 30, 2024
ttyusupov added a commit that referenced this issue Aug 1, 2024
Summary:
For CQL tables with primary key which have both hash and range components, `INSERT INTO INTO ... RETURNS STATUS AS ROW` might do useless extra reads. In case when then there are a lot of deleted (not yet compacted) rows with the same hash key column value, it might read all these rows during processing INSERT. In its turn this might cause growth of `docdb_obsolete_keys_found`, `docdb_obsolete_keys_found` metrics. When part of those deleted keys is already behind history cutoff limit (15 minutes by default) metric `docdb_obsolete_keys_found_past_cutoff` will also be increased and this can lead to triggering automatic full compactions for affected tablets resulting in additional system load.

This behaviour starts with the following code change (f68853b) inside `CreateProjections` function for CQL operations: https://github.com/yugabyte/yugabyte-db/blame/889f44fb8153b9535663542d5bf4b4824c9da983/src/yb/docdb/cql_operation.cc#L135. It affects `QLWriteOperation::ReadColumns` logic in the following way. The change causes
`static_projection` to always contain hash key columns and being non-empty even if we are processing `INTO INTO ... RETURNS STATUS AS ROW` without modifying static columns (and even if we don't have static columns at all). This was required due to updated read path logic that is now reading values for required columns only but not for all. But as a side effect, this leads to `QLWriteOperation::ReadColumns` calling `QLWriteOperation::InitializeKeys` with `hashed_key = !static_projection->columns.empty() == true`. Because of this `QLWriteOperation::InitializeKeys` sets `hashed_doc_key_` to non-empty value. And due to this `QLWriteOperation::ReadColumns` tries to find existing rows in DocDB querying only by hashed column value (even if primary key has range columns): https://github.com/yugabyte/yugabyte-db/blame/889f44fb8153b9535663542d5bf4b4824c9da983/src/yb/docdb/cql_operation.cc#L499.

Due to TTL, there could be a lot of obsolete rows with the same hashed column value and `DocRowwiseIterator` skips them during iteration when trying to find a live row.
This causes high growth of both `docdb_obsolete_keys_found` and `docdb_obsolete_keys_found_past_cutoff` (when huge amount of obsolete keys is already behind history cutoff).

Added unit-test for catching this bug and implemented fix for `QLWriteOperation::ReadColumns` function.
Jira: DB-12255

Test Plan: `ybd --cxx-test integration-tests_cql-test --gtest_filter CqlTest.InsertHashAndRangePkWithReturnsStatusAsRow -n 50 -- -p 1` for asan/tsan/debug/release.

Reviewers: sergei, rthallam, arybochkin

Reviewed By: sergei, arybochkin

Subscribers: yql, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36921
jasonyb pushed a commit that referenced this issue Aug 2, 2024
Summary:
 66890cb [PLAT-14781] Add runtime config for download metrics as pdf
 b2f24f2 [#9797] YSQL: Stubs for ysql major version upgrade RPCs
 5891faa [#23332] Docdb: Fix yb-admin to handle snapshot_retention correctly
 0614c52 [PLAT-14756] Releases API RBAC cleanup
 2cfb999 [docs] Release notes for 2024.1.1.0-b137 (#23185)
 d868a03 [PLAT-13495] Do not run dual nic steps on systemd upgrade
 10bc447 [PLAT-14798] Adding internal load balancer to devspace
 7296df8 [docs] [2.23] Add pg_cron extension docs (#22546)
 79902ae [PLAT-14678] - feat : Export All Metrics to pdf
 8a0b95c [#23322] YSQL: pg_partman: fix logic of checking existing table
 Excluded: 63f471a [#18822] YSQL: Framework to skip redundant sec index updates and fkey checks when relevant columns not modified
 3040472 [PLAT-14783] [PGParity] After Edit RR cores scale up on PG Parity enabled cluster, the RR node does not have PGP gflags
 e052089 [PLAT-14774] Per process tserver metrics is not working if YSQL is disabled
 0c664a1 [#22370] docdb: Cost Based Optimizer changes to take into account backward scans improvement
 a060877 [PLAT-13712]fix plan info for Azure VMs
 291dd40 Remove 6sense domains from CSP headers (#23354)
 75cb273 [#23330] docdb: fixed static columns handling for CQL operations

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36984
ttyusupov added a commit that referenced this issue Aug 2, 2024
…operations

Summary:
For CQL tables with primary key which have both hash and range components, `INSERT INTO ... RETURNS STATUS AS ROW` might do useless extra reads. In case when there are a lot of deleted (not yet compacted) rows with the same hash key column value, it might read all these rows during processing INSERT. In its turn this might cause growth of `docdb_obsolete_keys_found`, `docdb_obsolete_keys_found` metrics. When part of those deleted keys is already behind history cutoff limit (15 minutes by default) metric `docdb_obsolete_keys_found_past_cutoff` will also be increased and this can lead to triggering automatic full compactions for affected tablets resulting in additional system load.

This behaviour starts with the following code change (f68853b) inside `CreateProjections` function for CQL operations: https://github.com/yugabyte/yugabyte-db/blame/889f44fb8153b9535663542d5bf4b4824c9da983/src/yb/docdb/cql_operation.cc#L135. It affects `QLWriteOperation::ReadColumns` logic in the following way. The change causes
`static_projection` to always contain hash key columns and being non-empty even if we are processing `INSERT INTO ... RETURNS STATUS AS ROW` without modifying static columns (and even if we don't have static columns at all). This was required due to updated read path logic that is now reading values for required columns only but not for all. But as a side effect, this leads to `QLWriteOperation::ReadColumns` calling `QLWriteOperation::InitializeKeys` with `hashed_key = !static_projection->columns.empty() == true`. Because of this `QLWriteOperation::InitializeKeys` sets `hashed_doc_key_` to non-empty value. And due to this `QLWriteOperation::ReadColumns` tries to find existing rows in DocDB querying only by hashed column value (even if primary key has range columns): https://github.com/yugabyte/yugabyte-db/blame/889f44fb8153b9535663542d5bf4b4824c9da983/src/yb/docdb/cql_operation.cc#L499.

Due to TTL, there could be a lot of obsolete rows with the same hashed column value and `DocRowwiseIterator` skips them during iteration when trying to find a live row.
This causes high growth of both `docdb_obsolete_keys_found` and `docdb_obsolete_keys_found_past_cutoff` (when huge amount of obsolete keys is already behind history cutoff).

Added unit-test for catching this bug and implemented fix for `QLWriteOperation::ReadColumns` function.
Jira: DB-12255

Original commit: 75cb273 / D36921

Test Plan: `ybd --cxx-test integration-tests_cql-test --gtest_filter CqlTest.InsertHashAndRangePkWithReturnsStatusAsRow -n 50 -- -p 1` for asan/tsan/debug/release.

Reviewers: sergei, rthallam, arybochkin

Reviewed By: arybochkin

Subscribers: ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36982
ttyusupov added a commit that referenced this issue Aug 2, 2024
…L operations

Summary:
For CQL tables with primary key which have both hash and range components, `INSERT INTO INTO ... RETURNS STATUS AS ROW` might do useless extra reads. In case when then there are a lot of deleted (not yet compacted) rows with the same hash key column value, it might read all these rows during processing INSERT. In its turn this might cause growth of `docdb_obsolete_keys_found`, `docdb_obsolete_keys_found` metrics. When part of those deleted keys is already behind history cutoff limit (15 minutes by default) metric `docdb_obsolete_keys_found_past_cutoff` will also be increased and this can lead to triggering automatic full compactions for affected tablets resulting in additional system load.

This behaviour starts with the following code change (f68853b) inside `CreateProjections` function for CQL operations: https://github.com/yugabyte/yugabyte-db/blame/889f44fb8153b9535663542d5bf4b4824c9da983/src/yb/docdb/cql_operation.cc#L135. It affects `QLWriteOperation::ReadColumns` logic in the following way. The change causes
`static_projection` to always contain hash key columns and being non-empty even if we are processing `INTO INTO ... RETURNS STATUS AS ROW` without modifying static columns (and even if we don't have static columns at all). This was required due to updated read path logic that is now reading values for required columns only but not for all. But as a side effect, this leads to `QLWriteOperation::ReadColumns` calling `QLWriteOperation::InitializeKeys` with `hashed_key = !static_projection->columns.empty() == true`. Because of this `QLWriteOperation::InitializeKeys` sets `hashed_doc_key_` to non-empty value. And due to this `QLWriteOperation::ReadColumns` tries to find existing rows in DocDB querying only by hashed column value (even if primary key has range columns): https://github.com/yugabyte/yugabyte-db/blame/889f44fb8153b9535663542d5bf4b4824c9da983/src/yb/docdb/cql_operation.cc#L499.

Due to TTL, there could be a lot of obsolete rows with the same hashed column value and `DocRowwiseIterator` skips them during iteration when trying to find a live row.
This causes high growth of both `docdb_obsolete_keys_found` and `docdb_obsolete_keys_found_past_cutoff` (when huge amount of obsolete keys is already behind history cutoff).

Added unit-test for catching this bug and implemented fix for `QLWriteOperation::ReadColumns` function.
Jira: DB-12255

Original commit: 75cb273 / D36921

Test Plan: `ybd --cxx-test integration-tests_cql-test --gtest_filter CqlTest.InsertHashAndRangePkWithReturnsStatusAsRow -n 50 -- -p 1` for asan/tsan/debug/release.

Reviewers: sergei, rthallam, arybochkin

Reviewed By: arybochkin

Subscribers: yql, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36983
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

3 participants