[docdb] Pack columns in DocDB storage format for better performance #3520
Comments
One other potential benefit of packed columns could be reduced space amplification: because of our usage of RocksDB, the key portion is currently written over and over again, once for every column in the table. In practice we do get some prefix compression benefit on keys from RocksDB, so we'd need to test with that disabled to understand whether the space amplification is really a problem.
…lue objects Summary: Currently the code to serialize a value lives in the Value and PrimitiveValue objects. The data originally arrives in a QLValuePB object, so to store an entry in DocDB we construct PrimitiveValue and Value objects. This is unnecessary, since we could serialize QLValuePB directly. This diff extracts the serialization code into a free function and removes the creation of the unnecessary intermediate objects. Test Plan: Jenkins Reviewers: timur Reviewed By: timur Subscribers: ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D15806
Summary: This diff adds the PackedRow class, which can be used to produce an encoded packed row from column values. Row packing/unpacking is accompanied by the SchemaPacking class, which is built from the schema. We distinguish variable-length columns from fixed-length columns. A fixed-length column is a non-null column with a fixed-length encoding, for instance int32. All other columns are variable length, for instance string or nullable int64. SchemaPacking contains a ColumnPackingData entry for each column. For each variable-length column we store the offset of the end of its data, which lets us easily find the byte range used by a particular column: to find where a column's data starts, we take the end of the previous varlen column and add offset_after_prev_varlen_column to it. Then, for a fixed-length column we add its size to find the end of the data, and for a varlen column the end offset is stored directly. Each column's data is encoded as a value type plus the actual encoded data. The serialized format:

```
varint: schema_version
uint32: end_of_column_data for the 1st varlen column
...
uint32: end_of_column_data for the last varlen column
bytes: data for the 1st column
...
bytes: data for the last column
```

NULL values are stored as columns with 0 length. Since we always embed the ValueType in column data, we can easily distinguish NULL values from columns with an empty value. The rationale for this format is the ability to extract a column value with O(1) complexity. It also lets us avoid storing data common to all rows, moving it into a single schema info entry instead. Test Plan: ybd --cxx-test packed_row-test Reviewers: timur, rthallam Reviewed By: timur, rthallam Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16143
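A minimal sketch of the layout described above. All names here (PackRow, PackedColumn, the varint helper) are illustrative, not the actual DocDB classes; it only demonstrates the order of fields: varint schema version, one uint32 "end of data" offset per varlen column, then the concatenated column data.

```cpp
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {

// Appends an unsigned LEB128-style varint; the real code uses DocDB's own
// varint encoding, this is just for illustration.
void AppendVarInt(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7f) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

void AppendUInt32(uint32_t value, std::string* out) {
  for (int shift = 0; shift < 32; shift += 8) {
    out->push_back(static_cast<char>((value >> shift) & 0xff));
  }
}

struct PackedColumn {
  bool varlen;        // Variable-length (or nullable) column?
  std::string data;   // Value type byte + encoded value; empty means NULL.
};

// Builds the serialized packed row:
//   varint  schema_version
//   uint32  end_of_column_data for each varlen column
//   bytes   data for each column (fixed and varlen), in schema order
std::string PackRow(uint32_t schema_version,
                    const std::vector<PackedColumn>& columns) {
  std::string result;
  AppendVarInt(schema_version, &result);

  // First pass: record the end offset of every varlen column, measured from
  // the start of the column data section (fixed columns advance the offset
  // too, but no offset entry is written for them).
  uint32_t offset = 0;
  for (const auto& column : columns) {
    offset += static_cast<uint32_t>(column.data.size());
    if (column.varlen) {
      AppendUInt32(offset, &result);
    }
  }

  // Second pass: append the actual column data. A NULL is simply a varlen
  // column with zero-length data.
  for (const auto& column : columns) {
    result.append(column.data);
  }
  return result;
}

}  // namespace sketch
```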
Summary: This diff adds PackedRow usage during YSQL inserts; packed rows are also handled during reads. Introduced the flag max_packed_row_columns to specify the maximum number of columns in a packed row. By default its value is -1, which disables packed rows; this default is required for backward compatibility. Only value columns are counted: if a table has only key columns, its number of value columns is 0, so setting the flag to 0 would still activate packed rows for such tables. That is why -1, not 0, is used to disable the feature. Updated tablet metadata to store schema packings for old schema versions. Test Plan: PgMiniTest.PackedRow Reviewers: timur, bogdan Reviewed By: timur, bogdan Subscribers: rthallam, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16344
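A hedged sketch of how the -1/0 semantics above translate into an enable check (illustrative names only, not the actual function):

```cpp
#include <cstddef>
#include <cstdint>

// -1 disables packing entirely; 0 still enables it for tables whose only
// columns are key columns (zero value columns).
bool ShouldUsePackedRow(int64_t max_packed_row_columns, size_t num_value_columns) {
  if (max_packed_row_columns < 0) {
    return false;  // Default (-1): packed rows disabled for backward compatibility.
  }
  return num_value_columns <= static_cast<size_t>(max_packed_row_columns);
}
```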
Summary: RocksDB provides a way to patch compaction results using CompactionFilter. The current interface allows the filter to discard an entry, or update its value. This interface is not convenient for advanced usage, e.g. when multiple entries are combined into a single one in the output. Such functionality is required for packed rows. This diff adds a new compaction callback API called "compaction feed", and switches DocDB compactions from using the RocksDB compaction filter API to this new API. The user supplies a feed implementation that accepts key/value pairs from the compaction iterator, and it can choose to either ignore them or pass them through to the underlying feed, which will ultimately write them to the compaction output file. In theory multiple such feeds can be chained together, but currently there are at most two in each compaction. With this approach, the advanced modification of the key/value pair sequence required by the packed rows feature becomes possible. Test Plan: Jenkins Reviewers: timur, mbautin, bogdan Reviewed By: timur, mbautin, bogdan Subscribers: mbautin, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16440
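A rough sketch of the "feed" idea with invented names (the real DocDB interface will differ): each feed receives key/value pairs from the compaction iterator and may drop them, transform them, or forward them to the next feed, which ultimately writes the compaction output.

```cpp
#include <string>

class CompactionFeed {
 public:
  virtual ~CompactionFeed() = default;
  virtual void Feed(const std::string& key, const std::string& value) = 0;
  virtual void Flush() = 0;  // Called at the end of the compaction.
};

// Example of chaining: a filtering feed that drops expired entries and passes
// everything else through to the underlying feed.
class ExpirationFilterFeed : public CompactionFeed {
 public:
  explicit ExpirationFilterFeed(CompactionFeed* next) : next_(next) {}

  void Feed(const std::string& key, const std::string& value) override {
    if (!IsExpired(key, value)) {
      next_->Feed(key, value);
    }
  }

  void Flush() override { next_->Flush(); }

 private:
  bool IsExpired(const std::string& /*key*/, const std::string& /*value*/) {
    return false;  // Placeholder; real logic would inspect TTL / hybrid time.
  }

  CompactionFeed* next_;  // The underlying feed (eventually the output writer).
};
```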
Summary: This diff implements repacking of a packed row during compaction. When a packed row is found, it is not immediately forwarded to the underlying feed; instead, the packed row is stored and waits for column values. If such values are found, they are applied to the packed row, and the newly generated packed row is sent to the underlying feed. The following is left unimplemented and should become part of upcoming diffs: 1) Handling control field values for packed rows. 2) Cotable support. 3) Packing separate columns into a packed row. 4) GC of schema versions in table metadata. 5) Picking the correct schema for repacking, i.e. the schema that was applied before the retention period. Test Plan: pg_packe_row-test Reviewers: mbautin, bogdan, timur Reviewed By: bogdan, timur Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16489
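A sketch of that buffering flow, with invented types (Feed, RowRepacker); the actual repacking of column values into the encoded row is elided:

```cpp
#include <cstdint>
#include <map>
#include <string>

class Feed {  // Stand-in for the underlying compaction feed.
 public:
  virtual ~Feed() = default;
  virtual void Emit(const std::string& key, const std::string& value) = 0;
};

class RowRepacker {
 public:
  explicit RowRepacker(Feed* underlying) : underlying_(underlying) {}

  // Called when a packed row entry is seen: buffer it instead of emitting.
  void StartPackedRow(const std::string& row_key, const std::string& packed_value) {
    FlushPending();
    row_key_ = row_key;
    packed_value_ = packed_value;
    columns_.clear();
    has_pending_ = true;
  }

  // Called for a single-column update belonging to the buffered row.
  void ApplyColumnUpdate(uint32_t column_id, const std::string& value) {
    columns_[column_id] = value;
  }

  // Apply collected updates to the buffered packed row and send the result
  // to the underlying feed.
  void FlushPending() {
    if (!has_pending_) return;
    underlying_->Emit(row_key_, Repack(packed_value_, columns_));
    has_pending_ = false;
  }

 private:
  static std::string Repack(std::string packed,
                            const std::map<uint32_t, std::string>& updates) {
    // Placeholder: the real code would decode `packed` using its schema
    // packing, overwrite the columns present in `updates`, and re-encode.
    (void)updates;
    return packed;
  }

  Feed* underlying_;
  bool has_pending_ = false;
  std::string row_key_;
  std::string packed_value_;
  std::map<uint32_t, std::string> columns_;
};
```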
Summary: To read a packed row, we need its schema packing information, so we store all schema packings that are still in use in the tablet metadata. Rows can be repacked to a new schema version during compaction, after which old unused schema packings can be removed from the tablet metadata. This diff implements the necessary logic for such GC. The schema versions used by an SST file are stored in its user frontiers; these versions are recalculated after compaction, and when the compaction finishes, old schema versions are removed from the tablet metadata. Test Plan: PgPackedRowTest.SchemaGC Reviewers: timur, mbautin Reviewed By: timur, mbautin Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16626
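A sketch of the GC idea under the assumption that each live SST file's frontier exposes the set of schema versions it references (types and names here are illustrative):

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct SstFileFrontier {
  std::set<uint32_t> used_schema_versions;  // Recalculated after compaction.
};

// Drop any stored schema packing whose version is no longer referenced by any
// live SST file, keeping the currently active schema version.
void GcSchemaPackings(const std::vector<SstFileFrontier>& live_files,
                      uint32_t current_schema_version,
                      std::map<uint32_t, std::string>* schema_packings) {
  std::set<uint32_t> in_use;
  for (const auto& file : live_files) {
    in_use.insert(file.used_schema_versions.begin(),
                  file.used_schema_versions.end());
  }
  in_use.insert(current_schema_version);  // Never drop the active schema.

  for (auto it = schema_packings->begin(); it != schema_packings->end(); ) {
    if (in_use.count(it->first)) {
      ++it;
    } else {
      it = schema_packings->erase(it);
    }
  }
}
```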
…ng compaction Summary: This diff implements handling of cotables and colocation when a packed row is repacked during compaction. Several tables can be stored in a single tablet, each with its own schema. This diff handles that scenario and adds logic to pick the correct schema packing when there are multiple tables in a single tablet. Test Plan: PgPackedRowTest.Cotable Reviewers: timur, mbautin Reviewed By: mbautin Subscribers: dmitry, ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D16747
Summary: Currently we perform the schema version check before reading the DB. This does not prevent a situation in which the schema version changes right after the check and the actual read executes against a new schema version. This diff moves the schema version check to after the read/write operation, so we can be sure the schema version did not change in parallel with the operation. Schema version errors take precedence over any other kind of error. We also now check that either all operations in the request have the correct schema or none of them do; if this check fails, we return an error in release builds and log a fatal error in debug builds. Test Plan: Jenkins Reviewers: mbautin Reviewed By: mbautin Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D16952
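A sketch of that ordering with invented names and types: the operation runs first, and the version check afterwards, so a concurrent schema change cannot slip between check and execution.

```cpp
#include <cstdint>

enum class Status { kOk, kSchemaVersionMismatch, kOtherError };

// Stand-in for the actual DocDB read/write; always "succeeds" here.
Status ExecuteOperation() { return Status::kOk; }

Status HandleRequest(uint32_t request_schema_version,
                     uint32_t current_schema_version) {
  Status op_status = ExecuteOperation();
  // The check happens after the operation, and a schema version mismatch
  // takes precedence over whatever the operation itself returned.
  if (request_schema_version != current_schema_version) {
    return Status::kSchemaVersionMismatch;
  }
  return op_status;
}
```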
Summary: There are values of fixed size, such as bools, ints, and floats. While packing a non-null column into a packed row, we expect the packed size to be 1 + the size of the value, where 1 byte is reserved for the value type. But bool values have a different encoding: depending on the value, a different value type is used (kTrue or kFalse), so the expected size of the packed row does not match the actual size. This diff fixes the issue by adjusting the size logic for bool columns. Test Plan: PgPackedRowTest.Serial Reviewers: rthallam Reviewed By: rthallam Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D17149
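An illustrative size computation for fixed-size columns (names and the exact byte sizes are assumptions for the sketch, not the real code): most fixed-size types cost one value type byte plus the value itself, while booleans are encoded entirely in the value type byte (kTrue or kFalse), so their packed size is just 1.

```cpp
#include <cstddef>

enum class FixedType { kBool, kInt32, kInt64, kFloat, kDouble };

size_t PackedFixedColumnSize(FixedType type) {
  switch (type) {
    case FixedType::kBool:   return 1;      // Value type byte encodes the value itself.
    case FixedType::kInt32:  return 1 + 4;  // Value type byte + 4 bytes of data.
    case FixedType::kFloat:  return 1 + 4;
    case FixedType::kInt64:  return 1 + 8;
    case FixedType::kDouble: return 1 + 8;
  }
  return 0;  // Unreachable for valid enum values.
}
```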
Summary: Prior to the introduction of the Packed Rows feature, all columns of a row had to be stored as separate RocksDB key/value pairs. To take advantage of the Packed Row format for existing clusters, this diff implements combining columns into a packed row during compaction. Test Plan: PgPackedRowTest.PackDuringCompaction Reviewers: mbautin Reviewed By: mbautin Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D17063
Summary: Support packed rows for CQL: 1) Add logic to pack the row during a CQL insert. 2) Skip collection columns while packing the row. To be addressed in follow-up diffs: 1) Support for control fields in the packed row. 2) Keeping the column write time during compaction. Test Plan: cql-packed-row-test Reviewers: mbautin Reviewed By: mbautin Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D17255
Summary: This diff adds the following flags to configure packed row behaviour, and also adds logic to handle a size-based packed row limit.

* ysql_enable_packed_row - whether packed rows are enabled for YSQL.
* ysql_packed_row_size_limit - packed row size limit for YSQL; 0 picks this value from the block size.
* ycql_enable_packed_row - whether packed rows are enabled for YCQL.
* ycql_packed_row_size_limit - packed row size limit for YCQL; 0 picks this value from the block size.

When the packed row is over the limit, the remaining columns are stored as separate key/value pairs. Test Plan: PgPackedRowTest.BigValue Reviewers: mbautin Reviewed By: mbautin Subscribers: kannan, rthallam, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D17409
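A sketch of the overflow behaviour described above, with illustrative names: columns are added to the packed row while it stays under the limit, and the rest fall back to ordinary per-column key/value entries.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ColumnValue {
  uint32_t column_id;
  std::string encoded;  // Value type byte + encoded value.
};

struct PackResult {
  std::vector<ColumnValue> packed;    // Goes into the packed row entry.
  std::vector<ColumnValue> separate;  // Stored as separate key/value pairs.
};

PackResult SplitBySizeLimit(const std::vector<ColumnValue>& columns,
                            size_t packed_row_size_limit) {
  PackResult result;
  size_t packed_size = 0;
  for (const auto& column : columns) {
    if (packed_size + column.encoded.size() <= packed_row_size_limit) {
      packed_size += column.encoded.size();
      result.packed.push_back(column);
    } else {
      result.separate.push_back(column);
    }
  }
  return result;
}
```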
Summary: Same change as the previous diff (original commit: 5900613/D17409): adds the ysql/ycql packed row flags and the size-based packed row limit handling. Test Plan: PgPackedRowTest.BigValue Reviewers: mbautin, rthallam Reviewed By: rthallam Subscribers: ybase, rthallam, kannan Differential Revision: https://phabricator.dev.yugabyte.com/D17529
Summary: This diff fixes the following issues related to PITR and packed rows: 1) Support reading the pg catalog version from a packed row. 2) Ignore tombstoned column updates during reads and compaction of pgsql tables. When patching the pg catalog version, we pick the max version from all entries present in the existing state, add 1 to it, and add it to the write batch after all other entries. The following cases are possible:

(1) Restoring state: `doc_key => (restoring_version1, last_breaking_version1)`. Existing state: `doc_key => (existing_version1, last_breaking_version2)`. Here we will just pick existing_version1 as the max found version.

(2) Restoring state: `doc_key => (restoring_version1, last_breaking_version1)`, `doc_key, column_id("current_version") => restoring_version2`. Existing state: `doc_key => (existing_version1, last_breaking_version2)`. Same as above, except that the entry for restoring_version2 will also be added, but with a lower write id, since we put the patched pg catalog version after the iteration.

(3) Restoring state: `doc_key => (restoring_version1, last_breaking_version1)`. Existing state: `doc_key => (existing_version2, last_breaking_version2)`, `doc_key, column_id("current_version") => existing_version3`. The max of existing_version2 and existing_version3 will be used.

Also updated FetchState to handle packed rows by maintaining a key/value stack. Test Plan: YbAdminSnapshotScheduleTest.PgsqlDropDefaultWithPackedRow Reviewers: mbautin, skedia Reviewed By: mbautin, skedia Subscribers: zdrudi, timur, ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D17834
…cked row Summary: Prior to this diff, when an updated column was repacked to a packed row during a compaction, we would lose the column's original write time. There is an API in CQL to read a column's write time ( https://docs.yugabyte.com/preview/api/ycql/expr_fcall/#writetime-function ), so the above behavior is incorrect. This diff adds logic to keep original column write time while creating a packed row. Test Plan: CqlPackedRowTest.WriteTime Reviewers: mbautin Reviewed By: mbautin Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D19161
Packed Row format support landed.
Jira Link: DB-2216
Packing columns into a single RocksDB entry per row instead of one per column (as we do currently) improves YSQL performance. Below is a table of benchmark results using a JSONB column as a proxy for a packed representation on disk.

Setup
In this setup, there are two tables A and B (each with 128 columns). For each row in A, there are 4 rows in B. Consider the following two different ways of creating A and B:

1. With all 128 columns stored as regular columns.
2. With a subset of the columns stored as regular columns, plus a final JSONB column containing the key-value attributes of the remaining columns. This last column can be used to simulate a packed column representation.

Inserts
The following was observed when performing concurrent inserts from multiple clients.
Queries