[DocDB] Pack full row updates #13291
Labels: area/docdb (YugabyteDB core features), kind/enhancement (This is an enhancement of an existing feature), priority/medium (Medium priority issue)
Jira Link: DB-2926
Description
With the packed columns feature, we now pack all the columns in the initial insert of a row as one entry in DocDB/RocksDB rather than as separate entries.
But subsequent updates to the row are still stored in the older one-entry-per-column format. These partial updates aren't packed themselves because there could be a large number of them, and in the worst case a read for a specific column would have to look through all of them to find the latest value of that column. For instance,
// say you insert a row with 100 (non-primary key) columns for primary key k
INSERT INTO T(k, c1, c2, ..., c100) VALUES (....)
With the packed columns feature, this will get stored in DocDB as one entry:
k -> { c1: ?, c2: ?, .... c100: ? }
// say now, you do 20 updates to columns c1..c10.
UPDATE T SET c1 = ?, ..., c10 = ? where k = 'k';
.. 20 times
UPDATE T SET c1 = ?, ..., c10 = ? where k = 'k';
And now you want to read column c25 of row k.
If you store UPDATEs in exploded format, you only have to look for the presence of the most recent unpacked entry k.c25, and if that's not present, look for the most recent packed entry for k to see if it contains c25.
On the other hand, if you store these partial UPDATEs in packed format as well, you have to look at each of those 20 "partially packed" entries to see if it contains c25, and each one will be a miss (because they only contain c1 through c10). Only then do you reach the fully packed initial insert for k and find c25.
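To make the lookup cost concrete, here is a toy Python model of the two storage strategies for the example above. It is a sketch under simplifying assumptions, not DocDB's actual read path: the versions of row k are modeled as Python lists ordered oldest to newest (standing in for hybrid time), the model ignores interleaving of packed and exploded versions, and the function names and entry layout are hypothetical. It counts how many entries a reader examines to find c25 after the INSERT and the 20 UPDATEs.

```python
from typing import Dict, List, Optional, Tuple

# Toy model of the versions stored for one row key k (hypothetical, not
# DocDB's encoding). List order stands in for hybrid time: last = newest.
Packed = Dict[str, int]  # a packed entry: column name -> value


def read_exploded(packed: List[Packed], exploded: Dict[str, List[int]],
                  column: str) -> Tuple[Optional[int], int]:
    """UPDATEs stored exploded: probe k.<column>, then the newest packed entry.

    Returns (value, number of entries examined).
    """
    examined = 1  # look for the most recent unpacked entry k.<column>
    versions = exploded.get(column)
    if versions:
        return versions[-1], examined
    examined += 1  # fall back to the most recent packed entry for k
    return packed[-1].get(column), examined


def read_partially_packed(packed: List[Packed],
                          column: str) -> Tuple[Optional[int], int]:
    """UPDATEs stored packed: scan packed entries newest-first for the column."""
    examined = 0
    for entry in reversed(packed):
        examined += 1
        if column in entry:
            return entry[column], examined
    return None, examined


insert = {f"c{i}": i for i in range(1, 101)}          # INSERT of c1..c100
updates = [{f"c{i}": 1000 + n for i in range(1, 11)}  # 20 UPDATEs of c1..c10
           for n in range(20)]

# Strategy 1: the INSERT is packed, the UPDATEs are exploded per column.
exploded: Dict[str, List[int]] = {}
for u in updates:
    for col, val in u.items():
        exploded.setdefault(col, []).append(val)
print(read_exploded([insert], exploded, "c25"))          # (25, 2)

# Strategy 2: the partial UPDATEs are packed as well.
print(read_partially_packed([insert] + updates, "c25"))  # (25, 21)
```

In this model the exploded layout needs at most two probes regardless of how many partial UPDATEs exist, while the partially packed layout examines one entry per UPDATE plus the initial insert.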
Proposal
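It would be worthwhile, and reasonably simple, to recognize the special case where an UPDATE touches all columns of the row, and to optimize this case by storing such "full" UPDATEs in the packed format. Because a full UPDATE rewrites every column, its packed entry contains the latest value of any column, so a read stops at that entry and never has to scan past it.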
While the discussion above uses tables as an example, the same optimization applies to indices as well, since at the storage layer indices are similar to tables. If all INCLUDED columns of an index are being updated, we should also record the updated index entry in packed format.
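The proposed check can be sketched in a few lines of Python. This is a hypothetical illustration, not YugabyteDB code: the function names, the tuple entry shapes, and the idx_k key are invented for the example. The same test serves both cases above: for a table row the target set is all of the row's columns, and for an index entry it is the index's INCLUDED columns.

```python
from typing import Dict, Iterable, List

# Hypothetical entry shapes for this sketch (not DocDB's real encoding):
#   packed entry:   (key, {column: value, ...})
#   exploded entry: ((key, column), value)


def encode_update(key: str, updated: Dict[str, int],
                  target_columns: Iterable[str]) -> List[tuple]:
    """Choose packed or exploded format for one UPDATE.

    target_columns: all columns of a table row, or the INCLUDED columns
    of an index entry.
    """
    targets = set(target_columns)
    # Keep only the written columns that belong to this row or index entry.
    relevant = {c: v for c, v in updated.items() if c in targets}
    if set(relevant) == targets:
        # Full UPDATE: one packed entry that supersedes all earlier versions.
        return [(key, relevant)]
    # Partial UPDATE: keep the existing one-entry-per-column format.
    return [((key, col), val) for col, val in relevant.items()]


table_cols = [f"c{i}" for i in range(1, 101)]
full = {c: 7 for c in table_cols}
partial = {f"c{i}": 7 for i in range(1, 11)}

print(len(encode_update("k", full, table_cols)))     # 1 packed entry
print(len(encode_update("k", partial, table_cols)))  # 10 per-column entries

# Index with hypothetical INCLUDED columns c1 and c2; c50 is not included.
print(encode_update("idx_k", {"c1": 1, "c2": 2, "c50": 3}, ["c1", "c2"]))
# [('idx_k', {'c1': 1, 'c2': 2})]
```

Filtering to the target columns first lets one function serve both tables and indices: an UPDATE that writes other table columns can still yield a fully packed index entry, and an UPDATE that touches none of an index's columns produces no index write in this sketch.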