opt: fetch only required columns for partial index mutations #58322
We no longer derive output prune columns for Upsert, Update, and Delete ops in `DerivePruneCols`. There are no PruneCols rules for these operators, so deriving their prune columns was only performing unnecessary work. There are other rules that prune the fetch and return columns for these operators. These rules do not rely on `DerivePruneCols`. Release note: None
Previously, logic within prune columns normalization rules calculated the fetch columns required for updating indexes during a mutation. This logic has been moved to the optbuilder. This will make it easier to reduce this set of columns in the future when we can prove that a column is not needed to maintain the state of a partial index because the partial index is guaranteed not to change. In addition, this change reduces repetitive computation to calculate these fetch columns every time the `PruneMutationFetchCols` rule attempted to match an expression. For a simple `UPDATE` mutation, this can occur ~3 times. Release note: None
Previously, the `row.MakeUpdater` function had logic to determine the fetch columns required for an update operation. This is not necessary because the cost based optimizer already determines the necessary fetch columns and plumbs them to `MakeUpdater` as the `requestedCols` argument. Release note: None
Indexed columns of partial indexes are now only fetched for UPDATE and UPSERT operations when needed. They are pruned in cases where it is guaranteed that they are not needed to build old or new index entries. For example, consider the table and UPDATE:

    CREATE TABLE t (
      a INT PRIMARY KEY,
      b INT,
      c INT,
      d INT,
      INDEX (b) WHERE c > 0,
      FAMILY (a), FAMILY (b), FAMILY (c), FAMILY (d)
    )

    UPDATE t SET d = d + 1 WHERE a = 1

The partial index is guaranteed not to change with this UPDATE because neither its indexed columns nor the columns referenced in its predicate are mutating. Therefore, the existing values of b do not need to be fetched to maintain the state of the partial index. Furthermore, the primary index does not require the existing values of b because no columns in b's family are mutating. So, b can be pruned from the UPDATE's fetch columns.

Release note (performance improvement): Previously, indexed columns of partial indexes were always fetched for UPDATEs and UPSERTs. Now they are only fetched if they are required for maintaining the state of the index. If an UPDATE or UPSERT mutates columns that are neither indexed by a partial index nor referenced in a partial index predicate, the indexed columns of that partial index will no longer be fetched (assuming that they are not needed to maintain the state of other indexes, including the primary index).
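The pruning condition described above can be sketched as a small standalone Python model. This is only an illustration under simplifying assumptions, not the optbuilder code: `PartialIndex`, `may_change`, and `needed_fetch_cols` are hypothetical names, columns are plain strings, and the model ignores primary key columns that appear in secondary index keys.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class PartialIndex:
    """Hypothetical model of a partial index (not a CockroachDB type)."""
    indexed_cols: FrozenSet[str]
    predicate_cols: FrozenSet[str]


def may_change(index: PartialIndex, mutated: Set[str]) -> bool:
    """An index entry can only change if an indexed or predicate column mutates."""
    return bool((index.indexed_cols | index.predicate_cols) & mutated)


def needed_fetch_cols(
    families: Iterable[FrozenSet[str]],
    partial_indexes: Iterable[PartialIndex],
    mutated: Set[str],
) -> Set[str]:
    """Columns whose existing values are needed, under this simplified model."""
    needed: Set[str] = set()
    # Primary index: assume a column family is rewritten only when one of its
    # columns is mutating, in which case the whole family is fetched.
    for family in families:
        if family & mutated:
            needed |= family
    # Partial indexes: fetch their columns only when the index may change.
    for index in partial_indexes:
        if may_change(index, mutated):
            needed |= index.indexed_cols | index.predicate_cols
    return needed


# Table t from the example: INDEX (b) WHERE c > 0, one family per column.
families = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"}), frozenset({"d"})]
index_b = PartialIndex(indexed_cols=frozenset({"b"}), predicate_cols=frozenset({"c"}))

# UPDATE t SET d = d + 1 WHERE a = 1: only d mutates, so b is pruned.
update_d = needed_fetch_cols(families, [index_b], {"d"})

# A hypothetical UPDATE that sets c: the predicate column mutates, so b and c
# are both needed to build the old and new index entries.
update_c = needed_fetch_cols(families, [index_b], {"c"})
```

In this model the first UPDATE needs only `d`, while the second needs `b` and `c`, which matches the reasoning in the commit message.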
Are you sure that moving pruning calculations to `optbuilder` will catch all cases we care about now and in the future? What happens if we normalize a complex mutation and end up with an additional column that can be pruned? I've always avoided putting optimizations in `optbuilder` for this reason: they don't catch cases that arise after other rules have fired.
I think the only rule that would affect these particular calculations would be a rule that pruned update columns. For example, we could add a rule that prunes `a` from the update columns:
CREATE TABLE t (a INT, b INT, INDEX (a), FAMILY (a), FAMILY (b));
UPDATE t SET a = 1, b = 2 WHERE a = 1;
This implementation would prevent `a` from also being pruned from the fetch columns, because `a` would exist in the update columns during the opt builder phase. But I'm not sure pruning `a` from the fetch columns helps much in this case; we're already scanning the index for `a = 1`, so propagating the value of `a` to the update operator would be cheap.
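To make the ordering concern concrete, here is a minimal Python sketch of the example above. It is a simplified model rather than optimizer code: `fetch_cols` is a hypothetical helper, the rule that prunes `a` from the update columns does not exist, and the hidden primary key of `t` is ignored.

```python
from typing import FrozenSet, Iterable, Set


def fetch_cols(
    families: Iterable[FrozenSet[str]],
    index_cols: FrozenSet[str],
    update_cols: Set[str],
) -> Set[str]:
    """Existing column values needed for the update, under a simplified model."""
    needed: Set[str] = set()
    # A column family is fetched only if one of its columns is updated.
    for family in families:
        if family & update_cols:
            needed |= family
    # The secondary index needs its columns only if one of them is updated.
    if index_cols & update_cols:
        needed |= index_cols
    return needed


families = [frozenset({"a"}), frozenset({"b"})]
index_a = frozenset({"a"})

# Computed while building the plan: both a and b are update columns.
at_build_time = fetch_cols(families, index_a, {"a", "b"})

# Computed after a hypothetical rule has pruned a from the update columns
# (SET a = 1 is a no-op under WHERE a = 1).
after_hypothetical_rule = fetch_cols(families, index_a, {"b"})
```

Under this model the build-time result keeps `a` in the fetch columns, while recomputing after the hypothetical rule would drop it. That difference is the case a build-time calculation cannot catch.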
I've moved some of the logic to optbuilder because (1) I didn't want to build and project partial index predicate expressions only to have a new rule immediately normalize them to false and (2) I didn't want duplicate logic in `prune_cols_funcs.go` and optbuilder to determine which indexes require updates (and which columns must be fetched as a result). Also, there's already some logic within optbuilder that determines whether check constraint columns can be pruned, see #56007 (and I believe there's similar logic in optbuilder for computed columns now).
That being said, I appreciate consistent patterns, and maybe this goes too much against the grain. If we're not concerned about the unnecessary work of building and projecting the PUT/DEL partial index predicate expressions and then normalizing to false, then I'll try an implementation that moves all of this to normalization rules.
The optimizer now projects false expressions for partial index PUT and DEL columns, rather than predicate expressions, in cases where it is guaranteed that an UPDATE or UPSERT will not alter the entries of a partial index. Release note (performance improvement): UPDATE and UPSERT operations on tables with partial indexes no longer evaluate partial index predicate expressions when it is guaranteed that the operation will not alter the state of the partial index. In some cases, this can eliminate fetching the existing value of columns that are referenced in partial index predicates.
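The idea of projecting false for the PUT and DEL columns can be illustrated with a short Python sketch. It assumes that the PUT column is the predicate evaluated on the new row and the DEL column is the predicate evaluated on the existing row. All names here (`put_del_predicates`, `always_false`, `c_positive`) are hypothetical and do not come from the optimizer.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def always_false(_row: Row) -> bool:
    """Constant-false projection: reads no column values at all."""
    return False


def put_del_predicates(
    indexed_cols: FrozenSet[str],
    predicate_cols: FrozenSet[str],
    predicate: Predicate,
    mutated: Set[str],
) -> Tuple[Predicate, Predicate]:
    """Return (PUT, DEL) functions for one partial index.

    PUT is meant to be applied to the new row and DEL to the existing row.
    """
    touched = (indexed_cols | predicate_cols) & mutated
    if not touched:
        # The index is guaranteed not to change, so neither an entry write
        # nor an entry delete is needed and the predicate is never evaluated.
        return always_false, always_false
    return predicate, predicate


def c_positive(row: Row) -> bool:
    """Predicate of INDEX (b) WHERE c > 0."""
    return row["c"] > 0


b_cols = frozenset({"b"})
c_cols = frozenset({"c"})

# UPDATE t SET d = d + 1: the index cannot change, so both are constant false.
put_d, del_d = put_del_predicates(b_cols, c_cols, c_positive, {"d"})

# A hypothetical UPDATE that sets c: the real predicate must be evaluated.
put_c, del_c = put_del_predicates(b_cols, c_cols, c_positive, {"c"})
```

Because `always_false` never reads the row, the existing value of `c` is not required in the first case, which is how this change can remove predicate columns from the fetch set.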
Is this PR now superseded by #58358? Or should I review this one too?
Fixes #51623
opt: do not derive prune columns for Upsert, Update, Delete
We no longer derive output prune columns for Upsert, Update, and Delete
ops in `DerivePruneCols`. There are no PruneCols rules for these
operators, so deriving their prune columns was only performing
unnecessary work. There are other rules that prune the fetch and return
columns for these operators. These rules do not rely on
`DerivePruneCols`.
Release note: None
opt: move needed mutation fetch columns logic to optbuilder
Previously, logic within prune columns normalization rules calculated
the fetch columns required for updating indexes during a mutation. This
logic has been moved to the optbuilder.
This will make it easier to reduce this set of columns in the future
when we can prove that a column is not needed to maintain the state of a
partial index because the partial index is guaranteed not to change.
In addition, this change reduces repetitive computation to calculate
these fetch columns every time the `PruneMutationFetchCols` rule
attempted to match an expression. For a simple `UPDATE` mutation, this
can occur ~3 times.
Release note: None
sql: remove logic to determine fetch cols in row updater
Previously, the `row.MakeUpdater` function had logic to determine the
fetch columns required for an update operation. This is not necessary
because the cost based optimizer already determines the necessary fetch
columns and plumbs them to `MakeUpdater` as the `requestedCols`
argument.
Release note: None
opt: prune update/upsert fetch columns not needed for partial indexes
Indexed columns of partial indexes are now only fetched for UPDATE and
UPSERT operations when needed. They are pruned in cases where it is
guaranteed that they are not needed to build old or new index entries.
For example, consider the table and UPDATE:

    CREATE TABLE t (
      a INT PRIMARY KEY,
      b INT,
      c INT,
      d INT,
      INDEX (b) WHERE c > 0,
      FAMILY (a), FAMILY (b), FAMILY (c), FAMILY (d)
    )

    UPDATE t SET d = d + 1 WHERE a = 1

The partial index is guaranteed not to change with this UPDATE because
neither its indexed columns nor the columns referenced in its predicate
are mutating. Therefore, the existing values of b do not need to be
fetched to maintain the state of the partial index. Furthermore, the
primary index does not require the existing values of b because no
columns in b's family are mutating. So, b can be pruned from the
UPDATE's fetch columns.
Release note (performance improvement): Previously, indexed columns of
partial indexes were always fetched for UPDATEs and UPSERTs. Now they
are only fetched if they are required for maintaining the state of the
index. If an UPDATE or UPSERT mutates columns that are neither indexed by a
partial index nor referenced in a partial index predicate, the indexed
columns of that partial index will no longer be fetched (assuming that
they are not needed to maintain the state of other indexes, including
the primary index).
opt: project false for partial index PUT and DEL columns when possible
The optimizer now projects false expressions for partial index PUT and
DEL columns, rather than predicate expressions, in cases where it is
guaranteed that an UPDATE or UPSERT will not alter the entries of a
partial index.
Release note (performance improvement): UPDATE and UPSERT operations on
tables with partial indexes no longer evaluate partial index predicate
expressions when it is guaranteed that the operation will not alter the
state of the partial index. In some cases, this can eliminate fetching
the existing value of columns that are referenced in partial index
predicates.