[docdb] Pack columns in DocDB storage format for better performance #3520

Closed
rkarthik007 opened this issue Feb 1, 2020 · 2 comments
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

rkarthik007 commented Feb 1, 2020

Jira Link: DB-2216
Packing columns into a single RocksDB entry per row, instead of one entry per column (as we do currently), improves YSQL performance. Below are benchmark results that use a JSONB column as a proxy for a packed representation on disk.

Setup

In this setup, there are two tables A and B (each with 128 columns). For each row in A, there are 4 rows in B. Consider the following two different ways of creating A and B:

  1. A1 and B1 both have 128 columns each.
  2. A2 and B2 have around 15 of the original columns each, and one additional column of type JSONB containing the key-value attributes of the remaining columns. This last column can be used to simulate a packed column representation.

Inserts

The following was observed when performing concurrent inserts from multiple clients.

| Table Name | Num Indexes | Batch Size | Num Clients | Throughput |
|---|---|---|---|---|
| A1 | 0 | 0 | 32 | 1.5K |
| A2 | 0 | 0 | 32 | 2.8K |
| A1 | 2 | 20 | 8 | 1.5K |
| A2 | 2 | 20 | 8 | 3K |

Queries

| Table Name | Operation Type | Num Clients | Throughput | Latency |
|---|---|---|---|---|
| A1 | JOIN on A1, B1 | 128 | 4K | 32ms |
| A2 | JOIN on A2, B2 | 128 | 10K | 15ms |
| A1 | PK QUERY on A1 | 128 | 14K | 6ms |
| A2 | PK QUERY on A2 | 128 | 28K | 4ms |

@rkarthik007 rkarthik007 added area/ysql Yugabyte SQL (YSQL) area/docdb YugabyteDB core features labels Feb 1, 2020
@bmatican bmatican assigned bmatican and unassigned ndeodhar Oct 24, 2020
@bmatican bmatican changed the title Pack columns in DocDB storage format for better performance [docdb] Pack columns in DocDB storage format for better performance Oct 24, 2020
bmatican commented Jun 8, 2021

Another potential benefit of packed columns is reduced space amplification: today we write the key portion over and over again, once for every column in the table, because of how we use RocksDB.

In practice, though, we get some prefix compression on keys from RocksDB, so we'd need to test with that disabled to understand whether the space amplification is really a problem.
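
To put rough numbers on the key-repetition point, here is a back-of-the-envelope sketch. The key size, column count, and per-column suffix size are made-up values, the function name is hypothetical, and the estimate deliberately ignores RocksDB prefix compression, which is exactly the effect that would need to be measured.

```python
def key_bytes_per_row(doc_key_size: int, num_columns: int,
                      column_suffix_size: int, packed: bool) -> int:
    """Bytes spent on keys for one row, before any prefix compression."""
    if packed:
        # One entry per row: the row key is written once.
        return doc_key_size
    # One entry per column: row key plus a column suffix, repeated per column.
    return num_columns * (doc_key_size + column_suffix_size)


# Hypothetical numbers: 32-byte row key, 128 columns, 2-byte column suffix.
per_column = key_bytes_per_row(32, 128, 2, packed=False)
packed = key_bytes_per_row(32, 128, 2, packed=True)
```

With these assumed sizes the uncompressed key bytes differ by two orders of magnitude, so whether prefix compression already closes most of that gap is the open question.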

@bmatican bmatican assigned rahuldesirazu and unassigned bmatican Jul 22, 2021
@bmatican bmatican assigned spolitov and rthallamko3 and unassigned rahuldesirazu Mar 4, 2022
spolitov added a commit that referenced this issue Mar 17, 2022
…lue objects

Summary:
Currently the code to serialize a value lives in the Value and PrimitiveValue objects,
while the data originally lives in a QLValuePB object.
So to store an entry in DocDB we construct PrimitiveValue and Value objects.
This is unnecessary, since we could serialize QLValuePB directly.

This diff extracts the serialization code into a free function and removes the creation of unnecessary objects.

Test Plan: Jenkins

Reviewers: timur

Reviewed By: timur

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D15806
spolitov added a commit that referenced this issue Mar 27, 2022
Summary:
This diff adds the PackedRow class, which can be used to produce an encoded packed row from column values.

The packed row format is described below.
Row packing and unpacking rely on the SchemaPacking class, which is built from the schema.

We distinguish variable-length columns from fixed-length columns.
A fixed-length column is a non-null column with a fixed-length encoding, for instance int32.
All other columns are variable length, for instance string or nullable int64.

SchemaPacking contains a ColumnPackingData for each column. See the field meanings below.

For each variable-length column we store the offset of the end of its data.
This lets us easily find the byte range used by a particular column.
I.e. to find the start of a column's data, we take the end of the previous varlen column and
add offset_after_prev_varlen_column to it.
For a fixed-length column, we then add its size to find the end of the data.
For a varlen column, the offset of the end of the data is simply stored.

The data is encoded as value type + actual encoded data.

The serialized format:
```
varint: schema_version
uint32: end_of_column_data for the 1st varlen column
...
uint32: end_of_column_data for the last varlen column
bytes: data for the 1st column
...
bytes: data for the last column
```

NULL values are stored as columns with 0 length.
Since we always embed the ValueType in column data, we can easily distinguish NULL values
from columns with an empty value.

The rationale for this format is to be able to extract a column value with O(1) complexity.
It also lets us avoid storing data that is common to all rows, by putting it into a single schema info.
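
The layout above can be sketched in a few lines of Python. This is an illustration of the described format, not the actual DocDB code: the LEB128 varint, the little-endian uint32 offsets, the one-byte value type tags (`I`, `S`), and the exact `ColumnPackingData` fields other than `offset_after_prev_varlen_column` are assumptions made for the example.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ColumnPackingData:
    num_varlen_columns_before: int        # varlen columns preceding this one
    offset_after_prev_varlen_column: int  # fixed bytes since the previous varlen end
    size: int                             # encoded size; 0 marks a varlen column


class SchemaPacking:
    def __init__(self, sizes: List[int]) -> None:
        # sizes[i] is the encoded size of column i, or 0 if it is varlen.
        self.columns: List[ColumnPackingData] = []
        varlen_seen = offset = 0
        for size in sizes:
            self.columns.append(ColumnPackingData(varlen_seen, offset, size))
            if size == 0:
                varlen_seen, offset = varlen_seen + 1, 0
            else:
                offset += size
        self.varlen_count = varlen_seen


def encode_varint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    value = shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1           # decoded value and bytes consumed
        shift += 7
    raise ValueError("truncated varint")


def pack_row(version: int, packing: SchemaPacking,
             values: List[Optional[bytes]]) -> bytes:
    ends, body = [], bytearray()
    for col, value in zip(packing.columns, values):
        body += value or b""              # NULL is stored as zero bytes
        if col.size == 0:
            ends.append(len(body))        # end offset relative to start of data
    header = encode_varint(version) + b"".join(struct.pack("<I", e) for e in ends)
    return header + bytes(body)


def get_column(row: bytes, packing: SchemaPacking, index: int) -> Optional[bytes]:
    _, pos = decode_varint(row)
    data_start = pos + 4 * packing.varlen_count
    col = packing.columns[index]

    def varlen_end(i: int) -> int:
        return struct.unpack_from("<I", row, pos + 4 * i)[0]

    before = col.num_varlen_columns_before
    start = (varlen_end(before - 1) if before else 0) + col.offset_after_prev_varlen_column
    end = varlen_end(before) if col.size == 0 else start + col.size
    return row[data_start + start:data_start + end] or None


# int32 (1 tag byte + 4), string, nullable int64 (NULL here), int32.
packing = SchemaPacking([5, 0, 0, 5])
values = [b"I" + struct.pack("<i", 7), b"Shello", None, b"I" + struct.pack("<i", 9)]
row = pack_row(3, packing, values)
```

Looking up a column touches at most two stored offsets plus the precomputed packing data, which is where the O(1) extraction comes from; none of the per-column metadata is repeated in the row itself.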

Test Plan: ybd --cxx-test packed_row-test

Reviewers: timur, rthallam

Reviewed By: timur, rthallam

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16143
spolitov added a commit that referenced this issue Apr 6, 2022
Summary:
This diff adds PackedRow usage during YSQL insert.
PackedRow is also handled during read.

Introduced the flag max_packed_row_columns to specify the max number of columns in a packed row.
Its default value is -1, which disables packed rows. This is required for backward compatibility.

Only value columns are counted. If a table has only key columns, then the number of value columns is 0.
So if we set this value to 0, packed rows will be active for such tables.
Because of that, we use -1 to disable this feature.
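
A minimal sketch of why 0 cannot serve as the "off" value. The predicate below is hypothetical and assumes that a table is packed only when its value-column count does not exceed the flag; the commit message does not spell out what happens to wider tables.

```python
def packed_row_enabled(num_value_columns: int, max_packed_row_columns: int) -> bool:
    """Hypothetical check; a negative flag value turns packing off entirely."""
    if max_packed_row_columns < 0:
        return False
    # Key columns are not counted, so a key-only table has 0 value columns
    # and still qualifies when the flag is 0.
    return num_value_columns <= max_packed_row_columns
```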

Updated tablet metadata to store schema packing for old schema versions.

Test Plan: PgMiniTest.PackedRow

Reviewers: timur, bogdan

Reviewed By: timur, bogdan

Subscribers: rthallam, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16344
spolitov added a commit that referenced this issue Apr 12, 2022
Summary:
RocksDB provides a way to patch compaction results using CompactionFilter. The current interface allows the filter to discard an entry, or update its value. This interface is not convenient for advanced usage, e.g. when multiple entries are combined into a single one in the output. Such functionality is required for packed rows.

This diff adds a new compaction callback API called "compaction feed", and switches DocDB compactions from using the RocksDB compaction filter API to this new API. The user supplies a feed implementation that accepts key/value pairs from the compaction iterator, and it can choose to either ignore them or pass them through to the underlying feed, which will ultimately write them to the compaction output file. In theory multiple such feeds can be chained together, but currently there are at most two in each compaction. With this approach, the advanced modification of the key/value pair sequence required by the packed rows feature becomes possible.
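
The chaining idea can be illustrated with a small Python sketch. The class and method names (`CompactionFeed`, `feed`, `flush`, and the two concrete feeds) are invented for this example and are not taken from the actual C++ API; an empty value plays the role of an entry to be dropped.

```python
from typing import List, Tuple


class CompactionFeed:
    """Hypothetical interface: receives key/value pairs from the compaction iterator."""

    def feed(self, key: bytes, value: bytes) -> None:
        raise NotImplementedError

    def flush(self) -> None:
        """Called once the iterator is exhausted."""


class OutputFeed(CompactionFeed):
    """Innermost feed; stands in for the writer of the compaction output file."""

    def __init__(self) -> None:
        self.written: List[Tuple[bytes, bytes]] = []

    def feed(self, key: bytes, value: bytes) -> None:
        self.written.append((key, value))


class DropEmptyFeed(CompactionFeed):
    """User-supplied feed: ignores some entries, passes the rest through."""

    def __init__(self, next_feed: CompactionFeed) -> None:
        self.next_feed = next_feed

    def feed(self, key: bytes, value: bytes) -> None:
        if value != b"":
            self.next_feed.feed(key, value)

    def flush(self) -> None:
        self.next_feed.flush()


out = OutputFeed()
chain = DropEmptyFeed(out)
for k, v in [(b"a", b"1"), (b"b", b""), (b"c", b"3")]:
    chain.feed(k, v)
chain.flush()
```

Because each feed decides when, and whether, to call the next one, a feed is free to buffer several inputs and emit a single combined entry later, which a filter that only answers keep, drop, or rewrite for one entry at a time cannot express.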

Test Plan: Jenkins

Reviewers: timur, mbautin, bogdan

Reviewed By: timur, mbautin, bogdan

Subscribers: mbautin, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16440
spolitov added a commit that referenced this issue Apr 15, 2022
Summary:
This diff implements repacking of a packed row during compaction.
When a packed row is found, it is not immediately forwarded to the underlying feed.
Instead, the packed row is stored and waits for column values.
If such values are found, they are applied to the packed row and the newly generated packed row is sent to the underlying feed.
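
A rough sketch of that hold-and-apply behaviour, under simplifying assumptions: entries are `(row_key, column, value)` tuples, a packed row has `column` set to `None` and a dict as its value, the packed row for a key arrives before its column updates, and a plain list stands in for the underlying feed. The names are hypothetical and real DocDB key ordering is more involved.

```python
from typing import Any, Dict, List, Optional, Tuple

Entry = Tuple[str, Optional[str], Any]


class RepackingFeed:
    def __init__(self, sink: List[Entry]) -> None:
        self.sink = sink                          # stands in for the underlying feed
        self.pending_key: Optional[str] = None
        self.pending_row: Dict[str, Any] = {}

    def feed(self, row_key: str, column: Optional[str], value: Any) -> None:
        if column is None:
            # A packed row: emit any previous one, then hold this one back.
            self.flush()
            self.pending_key, self.pending_row = row_key, dict(value)
        elif row_key == self.pending_key:
            # A column value for the held row: apply it to the packed row.
            self.pending_row[column] = value
        else:
            # A column with no packed row is passed through unchanged.
            self.flush()
            self.sink.append((row_key, column, value))

    def flush(self) -> None:
        if self.pending_key is not None:
            self.sink.append((self.pending_key, None, self.pending_row))
            self.pending_key, self.pending_row = None, {}


sink: List[Entry] = []
feed = RepackingFeed(sink)
feed.feed("r1", None, {"a": 1, "b": 2})
feed.feed("r1", "b", 20)
feed.feed("r2", "a", 5)
feed.feed("r3", None, {"a": 7})
feed.flush()
```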

The following is not implemented yet and should become part of upcoming diffs:
1) Control field values are not handled for packed rows.
2) Cotables are not supported.
3) Packing separate columns into a packed row.
4) Schema version GC in table metadata.
5) Picking the correct schema for repacking, i.e. we should pick the schema that was applied before the retention period.

Test Plan: pg_packed_row-test

Reviewers: mbautin, bogdan, timur

Reviewed By: bogdan, timur

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16489
spolitov added a commit that referenced this issue Apr 28, 2022
Summary:
To read a packed row, we need schema packing information.
So we store all schema packings that are still in use in the tablet metadata.

But rows could be repacked to a new schema version during compaction,
and old unused schema packings could then be removed from the tablet metadata.

This diff implements the necessary logic for such GC.
Schema versions used by an SST file are stored in user frontiers.
These versions are recalculated after compaction.
When compaction finishes, old schema versions are removed from the tablet metadata.
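
The GC rule amounts to a set computation, sketched below. The function name and the data shapes are hypothetical, and keeping the current schema version unconditionally is an assumption of this example rather than something the commit message states.

```python
from typing import Dict, Iterable, Set


def gc_schema_packings(stored: Dict[int, str],
                       sst_versions: Iterable[Set[int]],
                       current_version: int) -> Dict[int, str]:
    """Keep only packings still referenced by some SST file (or current)."""
    in_use = {current_version}
    for versions in sst_versions:      # one set per SST file, from its frontier
        in_use |= versions
    return {v: p for v, p in stored.items() if v in in_use}


stored = {1: "packing-v1", 2: "packing-v2", 3: "packing-v3"}
# After compaction, the remaining files only reference versions 2 and 3.
remaining = gc_schema_packings(stored, [{2, 3}, {3}], current_version=3)
```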

Test Plan: PgPackedRowTest.SchemaGC

Reviewers: timur, mbautin

Reviewed By: timur, mbautin

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16626
spolitov added a commit that referenced this issue May 3, 2022
…ng compaction

Summary:
This diff implements handling of cotables and colocation when a packed row is repacked during compaction.

Several tables could be stored in a single tablet, each with its own schema.
This diff adds handling for that scenario, and adds logic to pick the correct schema packing when there are multiple tables in a single tablet.

Test Plan: PgPackedRowTest.Cotable

Reviewers: timur, mbautin

Reviewed By: mbautin

Subscribers: dmitry, ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D16747
spolitov added a commit that referenced this issue May 19, 2022
Summary:
Currently we perform schema version check before reading DB.
It does not prevent a situation in which the schema version is changed right after the check and the actual read is executed with a new schema version.

This diff moves schema version check to after the read/write operation. Now we can be sure that schema version has not changed in parallel to this operation.

Schema version errors take precedence over any other kinds of errors.

Also, we are now checking that either all operations in the request have the correct schema or none of them do. If this check fails, we will return an error in release and log a fatal error in debug.

Test Plan: Jenkins

Reviewers: mbautin

Reviewed By: mbautin

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D16952
spolitov added a commit that referenced this issue May 24, 2022
Summary:
Some values have a fixed size, such as bools, ints, and floats.
When packing a non-null column into a packed row, we expect the packed size to be 1 + size of value,
where 1 is reserved for the value type.
But bool values are encoded differently: depending on the value, a different value type is used (kTrue or kFalse).
As a result, the expected size of the packed row does not match the actual size.

This diff fixes the issue by adjusting the logic for bool columns.
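
The mismatch can be reproduced with a toy encoder. The tag bytes and the value sizes below are placeholders, and `adjusted_expected_size` is only one plausible way to special-case bools; the commit message does not describe the actual fix in detail.

```python
import struct

# Placeholder tag bytes standing in for the value types.
K_INT32, K_TRUE, K_FALSE = b"H", b"T", b"F"
VALUE_SIZES = {"int32": 4, "bool": 1}   # assumed in-memory value sizes


def encode_int32(v: int) -> bytes:
    return K_INT32 + struct.pack(">i", v)   # tag byte + 4 payload bytes


def encode_bool(v: bool) -> bytes:
    return K_TRUE if v else K_FALSE         # the tag alone carries the value


def naive_expected_size(column_type: str) -> int:
    return 1 + VALUE_SIZES[column_type]     # 1 byte reserved for the value type


def adjusted_expected_size(column_type: str) -> int:
    if column_type == "bool":
        return 1                            # no payload follows the tag
    return naive_expected_size(column_type)
```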

Test Plan: PgPackedRowTest.Serial

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D17149
spolitov added a commit that referenced this issue May 26, 2022
Summary: Prior to the introduction of the Packed Rows feature, all columns of a row had to be stored as separate RocksDB key/value pairs. To take advantage of the Packed Row format for existing clusters, this diff implements combining columns into a packed row during compaction.

Test Plan: PgPackedRowTest.PackDuringCompaction

Reviewers: mbautin

Reviewed By: mbautin

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D17063
spolitov added a commit that referenced this issue Jun 5, 2022
Summary:
Support packed rows for CQL:
1) Add logic to pack row during CQL insert.
2) Skip collection columns while packing row.

Should be addressed in followup diffs:
1) Support for control fields in packed row.
2) Keep column write time during compaction.

Test Plan: cql-packed-row-test

Reviewers: mbautin

Reviewed By: mbautin

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D17255
spolitov added a commit that referenced this issue Jun 8, 2022
Summary:
This diff adds the following flags to configure packed row behaviour,
and adds logic to handle a size-based packed row limit.
* ysql_enable_packed_row - Whether packed row is enabled for YSQL.
* ysql_packed_row_size_limit - Packed row size limit for YSQL. 0 to pick this value from block size.
* ycql_enable_packed_row - Whether packed row is enabled for YCQL.
* ycql_packed_row_size_limit - Packed row size limit for YCQL. 0 to pick this value from block size.

When a packed row is over the limit, the remaining columns are stored as separate key values.
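
One way to read the overflow rule, as a sketch: columns are added to the packed row in order until the next one would exceed the limit, and everything from that point on is written separately. The function name is hypothetical, the sketch ignores header overhead, and whether a later small column could still be packed after an oversized one is not stated in the commit message.

```python
from typing import List, Tuple


def split_by_size_limit(columns: List[Tuple[str, bytes]],
                        size_limit: int) -> Tuple[List[str], List[str]]:
    """Return (columns kept in the packed row, columns stored separately)."""
    packed: List[str] = []
    separate: List[str] = []
    used = 0
    for name, encoded in columns:
        if not separate and used + len(encoded) <= size_limit:
            packed.append(name)
            used += len(encoded)
        else:
            # Once the limit is hit, all remaining columns go out separately.
            separate.append(name)
    return packed, separate


in_row, out_of_row = split_by_size_limit(
    [("a", b"xx"), ("b", b"y" * 10), ("c", b"z")], size_limit=5)
```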

Test Plan: PgPackedRowTest.BigValue

Reviewers: mbautin

Reviewed By: mbautin

Subscribers: kannan, rthallam, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D17409
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 9, 2022
spolitov added a commit that referenced this issue Jun 9, 2022
Summary:
This diff adds the following flags to configure packed row behaviour,
and adds logic to handle a size-based packed row limit.
* ysql_enable_packed_row - Whether packed row is enabled for YSQL.
* ysql_packed_row_size_limit - Packed row size limit for YSQL. 0 to pick this value from block size.
* ycql_enable_packed_row - Whether packed row is enabled for YCQL.
* ycql_packed_row_size_limit - Packed row size limit for YCQL. 0 to pick this value from block size.

When a packed row is over the limit, the remaining columns are stored as separate key values.

Original commit: 5900613/D17409

Test Plan: PgPackedRowTest.BigValue

Reviewers: mbautin, rthallam

Reviewed By: rthallam

Subscribers: ybase, rthallam, kannan

Differential Revision: https://phabricator.dev.yugabyte.com/D17529
@yugabyte-ci yugabyte-ci moved this to Done in YQL-beta Jun 29, 2022
@yugabyte-ci yugabyte-ci reopened this Jun 29, 2022
spolitov added a commit that referenced this issue Jul 15, 2022
Summary:
This diff fixes the following issues related to PITR and packed rows.
1) Support reading pg catalog version from packed row.
2) Ignore tombstoned column updates during read and compaction of pgsql tables.

When patching the pg catalog version, we pick the max version from all entries present in the existing state, add 1 to it, and add the result to the write batch after all other entries.
The following cases are possible:

(1)
Restoring state:
`doc_key => (restoring_version1, last_breaking_version1)`
Existing state:
`doc_key => (existing_version1, last_breaking_version2)`
Here we will just pick existing_version1 as max found version.

(2)
Restoring state:
`doc_key => (restoring_version1, last_breaking_version1)`
`doc_key, column_id("current_version") => restoring_version2`
Existing state:
`doc_key => (existing_version1, last_breaking_version2)`
The same as above, except that an entry for restoring_version2 will also be added, but with a lower write id, since we put the patched pg catalog version after iteration.

(3)
Restoring state:
`doc_key => (restoring_version1, last_breaking_version1)`
Existing state:
`doc_key => (existing_version1, last_breaking_version2)`
`doc_key, column_id("current_version") => existing_version1`
The max of existing_version2 and existing_version3 will be used.
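
The patching rule stated at the top (max version over all existing entries, plus 1, appended after everything else) can be sketched as follows. The entry shapes and the function name are invented for illustration; a packed entry carries `(current_version, last_breaking_version)` and a column entry carries just a version.

```python
from typing import List, Tuple, Union

# ("packed", (current_version, last_breaking_version)) or ("column", current_version)
Entry = Tuple[str, Union[Tuple[int, int], int]]


def patch_catalog_version(existing: List[Entry], write_batch: List[Entry]) -> int:
    # Collect the current version from both packed rows and separate columns.
    versions = [payload[0] if kind == "packed" else payload
                for kind, payload in existing]
    patched = max(versions) + 1
    # Appended last, so it gets the highest write id in the batch.
    write_batch.append(("column", patched))
    return patched


existing_state: List[Entry] = [("packed", (5, 3)), ("column", 7)]
batch: List[Entry] = [("packed", (4, 2))]   # restoring state already in the batch
new_version = patch_catalog_version(existing_state, batch)
```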

Also updated FetchState to handle packed rows by maintaining key value stack.

Test Plan: YbAdminSnapshotScheduleTest.PgsqlDropDefaultWithPackedRow

Reviewers: mbautin, skedia

Reviewed By: mbautin, skedia

Subscribers: zdrudi, timur, ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D17834
spolitov added a commit that referenced this issue Sep 10, 2022
…cked row

Summary:
Prior to this diff, when an updated column was repacked to a packed row during a compaction, we would lose the column's original write time.
There is an API in CQL to read a column's write time ( https://docs.yugabyte.com/preview/api/ycql/expr_fcall/#writetime-function ), so the above behavior is incorrect.
This diff adds logic to keep original column write time while creating a packed row.

Test Plan: CqlPackedRowTest.WriteTime

Reviewers: mbautin

Reviewed By: mbautin

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D19161
@yugabyte-ci yugabyte-ci removed the area/ysql Yugabyte SQL (YSQL) label Oct 12, 2022
@rthallamko3

Packed Row format support landed.
