[docdb] Pack columns in DocDB storage format for better performance #3520
Comments
One other potential benefit of packed columns could be reduced space amplification: because of our usage of RocksDB, the key portion is currently written over and over again, once for every column in the table. In practice we do get some prefix compression benefit on keys from RocksDB, so we'd need to test with that disabled to understand whether the space amplification is really a problem.
…lue objects Summary: Currently the code to serialize a value lives in the Value and PrimitiveValue objects. The data originally arrives in a QLValuePB object, so to store an entry in DocDB we construct PrimitiveValue and Value objects. This is unnecessary, since we could serialize QLValuePB directly. This diff extracts the serialization code into a free function and removes the creation of the unnecessary intermediate objects. Test Plan: Jenkins Reviewers: timur Reviewed By: timur Subscribers: ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D15806
Summary: This diff adds the PackedRow class, which can be used to produce an encoded packed row from column values. Row packing/unpacking is accompanied by the SchemaPacking class, which is built from the schema. We distinguish variable-length columns from fixed-length columns. A fixed-length column is a non-null column with a fixed-length encoding, for instance int32. All other columns are variable length, for instance string or nullable int64. SchemaPacking contains a ColumnPackingData entry for each column. For each variable-length column we store the offset of the end of its data, which lets us easily find the byte range used by a particular column: to find where a column's data starts, we take the end of the previous varlen column and add offset_after_prev_varlen_column to it. Then, for a fixed-length column we add its size to find the end of the data, and for a varlen column the end offset is stored directly. Each column's data is encoded as a value type plus the actual encoded data. The serialized format:

```
varint: schema_version
uint32: end_of_column_data for the 1st varlen column
...
uint32: end_of_column_data for the last varlen column
bytes: data for the 1st column
...
bytes: data for the last column
```

NULL values are stored as columns with 0 length. Since we always embed the ValueType in column data, we can easily distinguish NULL values from columns with an empty value. The rationale for this format is the ability to extract a column value with O(1) complexity. It also lets us avoid storing data common to all rows, moving it into a single schema info entry instead. Test Plan: ybd --cxx-test packed_row-test Reviewers: timur, rthallam Reviewed By: timur, rthallam Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16143
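A minimal sketch of the layout described above. All names here (PackRow, PackedColumn, the varint helper) are illustrative, not the actual DocDB classes; it only demonstrates the order of fields: varint schema version, one uint32 "end of data" offset per varlen column, then the concatenated column data.

```cpp
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {

// Appends an unsigned LEB128-style varint; the real code uses DocDB's own
// varint encoding, this is just for illustration.
void AppendVarInt(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7f) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

void AppendUInt32(uint32_t value, std::string* out) {
  for (int shift = 0; shift < 32; shift += 8) {
    out->push_back(static_cast<char>((value >> shift) & 0xff));
  }
}

struct PackedColumn {
  bool varlen;        // Variable-length (or nullable) column?
  std::string data;   // Value type byte + encoded value; empty means NULL.
};

// Builds the serialized packed row:
//   varint  schema_version
//   uint32  end_of_column_data for each varlen column
//   bytes   data for each column (fixed and varlen), in schema order
std::string PackRow(uint32_t schema_version,
                    const std::vector<PackedColumn>& columns) {
  std::string result;
  AppendVarInt(schema_version, &result);

  // First pass: record the end offset of every varlen column, measured from
  // the start of the column data section (fixed columns advance the offset
  // too, but no offset entry is written for them).
  uint32_t offset = 0;
  for (const auto& column : columns) {
    offset += static_cast<uint32_t>(column.data.size());
    if (column.varlen) {
      AppendUInt32(offset, &result);
    }
  }

  // Second pass: append the actual column data. A NULL is simply a varlen
  // column with zero-length data.
  for (const auto& column : columns) {
    result.append(column.data);
  }
  return result;
}

}  // namespace sketch
```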
Summary: This diff adds PackedRow usage during YSQL inserts; packed rows are also handled during reads. Introduced the flag max_packed_row_columns to specify the maximum number of columns in a packed row. By default its value is -1, which disables packed rows; this default is required for backward compatibility. Only value columns are counted: if a table has only key columns, its number of value columns is 0, so setting the flag to 0 would still activate packed rows for such tables. That is why -1, not 0, is used to disable the feature. Updated tablet metadata to store schema packings for old schema versions. Test Plan: PgMiniTest.PackedRow Reviewers: timur, bogdan Reviewed By: timur, bogdan Subscribers: rthallam, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16344
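A hedged sketch of how the -1/0 semantics above translate into an enable check (illustrative names only, not the actual function):

```cpp
#include <cstddef>
#include <cstdint>

// -1 disables packing entirely; 0 still enables it for tables whose only
// columns are key columns (zero value columns).
bool ShouldUsePackedRow(int64_t max_packed_row_columns, size_t num_value_columns) {
  if (max_packed_row_columns < 0) {
    return false;  // Default (-1): packed rows disabled for backward compatibility.
  }
  return num_value_columns <= static_cast<size_t>(max_packed_row_columns);
}
```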
Summary: RocksDB provides a way to patch compaction results using CompactionFilter. The current interface allows the filter to discard an entry, or update its value. This interface is not convenient for advanced usage, e.g. when multiple entries are combined into a single one in the output. Such functionality is required for packed rows. This diff adds a new compaction callback API called "compaction feed", and switches DocDB compactions from using the RocksDB compaction filter API to this new API. The user supplies a feed implementation that accepts key/value pairs from the compaction iterator, and it can choose to either ignore them or pass them through to the underlying feed, which will ultimately write them to the compaction output file. In theory multiple such feeds can be chained together, but currently there are at most two in each compaction. With this approach, the advanced modification of the key/value pair sequence required by the packed rows feature becomes possible. Test Plan: Jenkins Reviewers: timur, mbautin, bogdan Reviewed By: timur, mbautin, bogdan Subscribers: mbautin, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16440
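A rough sketch of the "feed" idea with invented names (the real DocDB interface will differ): each feed receives key/value pairs from the compaction iterator and may drop them, transform them, or forward them to the next feed, which ultimately writes the compaction output.

```cpp
#include <string>

class CompactionFeed {
 public:
  virtual ~CompactionFeed() = default;
  virtual void Feed(const std::string& key, const std::string& value) = 0;
  virtual void Flush() = 0;  // Called at the end of the compaction.
};

// Example of chaining: a filtering feed that drops expired entries and passes
// everything else through to the underlying feed.
class ExpirationFilterFeed : public CompactionFeed {
 public:
  explicit ExpirationFilterFeed(CompactionFeed* next) : next_(next) {}

  void Feed(const std::string& key, const std::string& value) override {
    if (!IsExpired(key, value)) {
      next_->Feed(key, value);
    }
  }

  void Flush() override { next_->Flush(); }

 private:
  bool IsExpired(const std::string& /*key*/, const std::string& /*value*/) {
    return false;  // Placeholder; real logic would inspect TTL / hybrid time.
  }

  CompactionFeed* next_;  // The underlying feed (eventually the output writer).
};
```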
Summary: This diff implements repacking of a packed row during compaction. When a packed row is found, it is not immediately forwarded to the underlying feed; instead, the packed row is stored and waits for column values. If such values are found, they are applied to the packed row, and the newly generated packed row is sent to the underlying feed. The following is left unimplemented and should become part of upcoming diffs: 1) Handling control field values for packed rows. 2) Cotable support. 3) Packing separate columns into a packed row. 4) GC of schema versions in table metadata. 5) Picking the correct schema for repacking, i.e. the schema that was applied before the retention period. Test Plan: pg_packe_row-test Reviewers: mbautin, bogdan, timur Reviewed By: bogdan, timur Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16489
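A sketch of that buffering flow, with invented types (Feed, RowRepacker); the actual repacking of column values into the encoded row is elided:

```cpp
#include <cstdint>
#include <map>
#include <string>

class Feed {  // Stand-in for the underlying compaction feed.
 public:
  virtual ~Feed() = default;
  virtual void Emit(const std::string& key, const std::string& value) = 0;
};

class RowRepacker {
 public:
  explicit RowRepacker(Feed* underlying) : underlying_(underlying) {}

  // Called when a packed row entry is seen: buffer it instead of emitting.
  void StartPackedRow(const std::string& row_key, const std::string& packed_value) {
    FlushPending();
    row_key_ = row_key;
    packed_value_ = packed_value;
    columns_.clear();
    has_pending_ = true;
  }

  // Called for a single-column update belonging to the buffered row.
  void ApplyColumnUpdate(uint32_t column_id, const std::string& value) {
    columns_[column_id] = value;
  }

  // Apply collected updates to the buffered packed row and send the result
  // to the underlying feed.
  void FlushPending() {
    if (!has_pending_) return;
    underlying_->Emit(row_key_, Repack(packed_value_, columns_));
    has_pending_ = false;
  }

 private:
  static std::string Repack(std::string packed,
                            const std::map<uint32_t, std::string>& updates) {
    // Placeholder: the real code would decode `packed` using its schema
    // packing, overwrite the columns present in `updates`, and re-encode.
    (void)updates;
    return packed;
  }

  Feed* underlying_;
  bool has_pending_ = false;
  std::string row_key_;
  std::string packed_value_;
  std::map<uint32_t, std::string> columns_;
};
```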
Summary: To read a packed row, we need its schema packing information, so we store all schema packings that are still in use in the tablet metadata. Rows can be repacked to a new schema version during compaction, after which old unused schema packings can be removed from the tablet metadata. This diff implements the necessary logic for such GC. The schema versions used by an SST file are stored in its user frontiers; these versions are recalculated after compaction, and when the compaction finishes, old schema versions are removed from the tablet metadata. Test Plan: PgPackedRowTest.SchemaGC Reviewers: timur, mbautin Reviewed By: timur, mbautin Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16626
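A sketch of the GC idea under the assumption that each live SST file's frontier exposes the set of schema versions it references (types and names here are illustrative):

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct SstFileFrontier {
  std::set<uint32_t> used_schema_versions;  // Recalculated after compaction.
};

// Drop any stored schema packing whose version is no longer referenced by any
// live SST file, keeping the currently active schema version.
void GcSchemaPackings(const std::vector<SstFileFrontier>& live_files,
                      uint32_t current_schema_version,
                      std::map<uint32_t, std::string>* schema_packings) {
  std::set<uint32_t> in_use;
  for (const auto& file : live_files) {
    in_use.insert(file.used_schema_versions.begin(),
                  file.used_schema_versions.end());
  }
  in_use.insert(current_schema_version);  // Never drop the active schema.

  for (auto it = schema_packings->begin(); it != schema_packings->end(); ) {
    if (in_use.count(it->first)) {
      ++it;
    } else {
      it = schema_packings->erase(it);
    }
  }
}
```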
…ng compaction Summary: This diff implements handling of cotables and colocation when a packed row is repacked during compaction. Several tables can be stored in a single tablet, each with its own schema. This diff handles that scenario and adds logic to pick the correct schema packing when there are multiple tables in a single tablet. Test Plan: PgPackedRowTest.Cotable Reviewers: timur, mbautin Reviewed By: mbautin Subscribers: dmitry, ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D16747
Summary: Currently we perform the schema version check before reading the DB. This does not prevent a situation in which the schema version changes right after the check and the actual read executes against a new schema version. This diff moves the schema version check to after the read/write operation, so we can be sure the schema version did not change in parallel with the operation. Schema version errors take precedence over any other kind of error. We also now check that either all operations in the request have the correct schema or none of them do; if this check fails, we return an error in release builds and log a fatal error in debug builds. Test Plan: Jenkins Reviewers: mbautin Reviewed By: mbautin Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16952
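A sketch of that ordering with invented names and types: the operation runs first, and the version check afterwards, so a concurrent schema change cannot slip between check and execution.

```cpp
#include <cstdint>

enum class Status { kOk, kSchemaVersionMismatch, kOtherError };

// Stand-in for the actual DocDB read/write; always "succeeds" here.
Status ExecuteOperation() { return Status::kOk; }

Status HandleRequest(uint32_t request_schema_version,
                     uint32_t current_schema_version) {
  Status op_status = ExecuteOperation();
  // The check happens after the operation, and a schema version mismatch
  // takes precedence over whatever the operation itself returned.
  if (request_schema_version != current_schema_version) {
    return Status::kSchemaVersionMismatch;
  }
  return op_status;
}
```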
Summary: There are values of fixed size, such as bools, ints, and floats. While packing a non-null column into a packed row, we expect the packed size to be 1 + the size of the value, where 1 byte is reserved for the value type. But bool values have a different encoding: depending on the value, a different value type is used (kTrue or kFalse), so the expected size of the packed row does not match the actual size. This diff fixes the issue by adjusting the size logic for bool columns. Test Plan: PgPackedRowTest.Serial Reviewers: rthallam Reviewed By: rthallam Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D17149
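An illustrative size computation for fixed-size columns (names and the exact byte sizes are assumptions for the sketch, not the real code): most fixed-size types cost one value type byte plus the value itself, while booleans are encoded entirely in the value type byte (kTrue or kFalse), so their packed size is just 1.

```cpp
#include <cstddef>

enum class FixedType { kBool, kInt32, kInt64, kFloat, kDouble };

size_t PackedFixedColumnSize(FixedType type) {
  switch (type) {
    case FixedType::kBool:   return 1;      // Value type byte encodes the value itself.
    case FixedType::kInt32:  return 1 + 4;  // Value type byte + 4 bytes of data.
    case FixedType::kFloat:  return 1 + 4;
    case FixedType::kInt64:  return 1 + 8;
    case FixedType::kDouble: return 1 + 8;
  }
  return 0;  // Unreachable for valid enum values.
}
```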
Summary: Prior to the introduction of the Packed Rows feature, all columns of a row had to be stored as separate RocksDB key/value pairs. To take advantage of the Packed Row format for existing clusters, this diff implements combining columns into a packed row during compaction. Test Plan: PgPackedRowTest.PackDuringCompaction Reviewers: mbautin Reviewed By: mbautin Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D17063
Summary: Support packed rows for CQL: 1) Add logic to pack the row during a CQL insert. 2) Skip collection columns while packing the row. To be addressed in follow-up diffs: 1) Support for control fields in the packed row. 2) Keeping the column write time during compaction. Test Plan: cql-packed-row-test Reviewers: mbautin Reviewed By: mbautin Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D17255
Summary: This diff adds the following flags to configure packed row behaviour, and also adds logic to handle a size-based packed row limit.

* ysql_enable_packed_row - whether packed rows are enabled for YSQL.
* ysql_packed_row_size_limit - packed row size limit for YSQL; 0 picks this value from the block size.
* ycql_enable_packed_row - whether packed rows are enabled for YCQL.
* ycql_packed_row_size_limit - packed row size limit for YCQL; 0 picks this value from the block size.

When the packed row is over the limit, the remaining columns are stored as separate key/value pairs. Test Plan: PgPackedRowTest.BigValue Reviewers: mbautin Reviewed By: mbautin Subscribers: kannan, rthallam, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D17409
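A sketch of the overflow behaviour described above, with illustrative names: columns are added to the packed row while it stays under the limit, and the rest fall back to ordinary per-column key/value entries.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ColumnValue {
  uint32_t column_id;
  std::string encoded;  // Value type byte + encoded value.
};

struct PackResult {
  std::vector<ColumnValue> packed;    // Goes into the packed row entry.
  std::vector<ColumnValue> separate;  // Stored as separate key/value pairs.
};

PackResult SplitBySizeLimit(const std::vector<ColumnValue>& columns,
                            size_t packed_row_size_limit) {
  PackResult result;
  size_t packed_size = 0;
  for (const auto& column : columns) {
    if (packed_size + column.encoded.size() <= packed_row_size_limit) {
      packed_size += column.encoded.size();
      result.packed.push_back(column);
    } else {
      result.separate.push_back(column);
    }
  }
  return result;
}
```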
Summary: Same change as the previous diff (original commit: 5900613/D17409): adds the ysql/ycql packed row flags and the size-based packed row limit handling. Test Plan: PgPackedRowTest.BigValue Reviewers: mbautin, rthallam Reviewed By: rthallam Subscribers: ybase, rthallam, kannan Differential Revision: https://phabricator.dev.yugabyte.com/D17529
Summary: This diff fixes the following issues related to PITR and packed rows: 1) Support reading the pg catalog version from a packed row. 2) Ignore tombstoned column updates during reads and compaction of pgsql tables. When patching the pg catalog version, we pick the max version from all entries present in the existing state, add 1 to it, and add it to the write batch after all other entries. The following cases are possible:

(1) Restoring state: `doc_key => (restoring_version1, last_breaking_version1)`. Existing state: `doc_key => (existing_version1, last_breaking_version2)`. Here we will just pick existing_version1 as the max found version.

(2) Restoring state: `doc_key => (restoring_version1, last_breaking_version1)`, `doc_key, column_id("current_version") => restoring_version2`. Existing state: `doc_key => (existing_version1, last_breaking_version2)`. Same as above, except that the entry for restoring_version2 will also be added, but with a lower write id, since we put the patched pg catalog version after the iteration.

(3) Restoring state: `doc_key => (restoring_version1, last_breaking_version1)`. Existing state: `doc_key => (existing_version2, last_breaking_version2)`, `doc_key, column_id("current_version") => existing_version3`. The max of existing_version2 and existing_version3 will be used.

Also updated FetchState to handle packed rows by maintaining a key/value stack. Test Plan: YbAdminSnapshotScheduleTest.PgsqlDropDefaultWithPackedRow Reviewers: mbautin, skedia Reviewed By: mbautin, skedia Subscribers: zdrudi, timur, ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D17834
…cked row Summary: Prior to this diff, when an updated column was repacked to a packed row during a compaction, we would lose the column's original write time. There is an API in CQL to read a column's write time ( https://docs.yugabyte.com/preview/api/ycql/expr_fcall/#writetime-function ), so the above behavior is incorrect. This diff adds logic to keep original column write time while creating a packed row. Test Plan: CqlPackedRowTest.WriteTime Reviewers: mbautin Reviewed By: mbautin Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D19161
Packed Row format support landed.
Jira Link: DB-2216
Packing columns into a single RocksDB entry per row instead of one per column (as we do currently) improves YSQL performance. Below is a table of benchmark results using a JSONB column as a proxy for a packed representation on disk.

Setup
In this setup, there are two tables A and B (each with 128 columns). For each row in A, there are 4 rows in B. Consider the following two different ways of creating A and B:

1. With all 128 columns stored as regular columns.
2. With a subset of the columns stored as regular columns, plus a final JSONB column containing the key-value attributes of the remaining columns. This last column can be used to simulate a packed column representation.

Inserts
The following was observed when performing concurrent inserts from multiple clients.
Queries