opt: fetch only required columns for partial index mutations #58322

mgartner · 2020-12-29T02:35:28Z

opt: do not derive prune columns for Upsert, Update, Delete

We no longer derive output prune columns for Upsert, Update, and Delete
ops in DerivePruneCols. There are no PruneCols rules for these
operators, so deriving their prune columns was only performing
unnecessary work. There are other rules that prune the fetch and return
columns for these operators. These rules do not rely on
DerivePruneCols.

Release note: None

opt: move needed mutation fetch columns logic to optbuilder

Previously, logic within prune columns normalization rules calculated
the fetch columns required for updating indexes during a mutation. This
logic has been moved to the optbuilder.

This will make it easier to reduce this set of columns in the future
when we can prove that a column is not needed to maintain the state of a
partial index because the partial index is guaranteed not to change.

In addition, this change avoids recomputing these fetch columns every
time the PruneMutationFetchCols rule attempts to match an expression.
For a simple UPDATE mutation, this can happen about 3 times.

Release note: None

sql: remove logic to determine fetch cols in row updater

Previously, the row.MakeUpdater function had logic to determine the
fetch columns required for an update operation. This is not necessary
because the cost-based optimizer already determines the necessary fetch
columns and plumbs them to MakeUpdater as the requestedCols
argument.

Release note: None

opt: prune update/upsert fetch columns not needed for partial indexes

Indexed columns of partial indexes are now only fetched for UPDATE and
UPSERT operations when needed. They are pruned in cases where it is
guaranteed that they are not needed to build old or new index entries.
For example, consider the table and UPDATE:

CREATE TABLE t (
  a INT PRIMARY KEY,
  b INT,
  c INT,
  d INT,
  INDEX (b) WHERE c > 0,
  FAMILY (a), FAMILY (b), FAMILY (c), FAMILY (d)
)

UPDATE t SET d = d + 1 WHERE a = 1

The partial index is guaranteed not to change with this UPDATE because
neither its indexed columns nor the columns referenced in its predicate
are mutating. Therefore, the existing values of b do not need to be
fetched to maintain the state of the partial index. Furthermore, the
primary index does not require the existing values of b because no
columns in b's family are mutating. So, b can be pruned from the
UPDATE's fetch columns.

Release note (performance improvement): Previously, indexed columns of
partial indexes were always fetched for UPDATEs and UPSERTs. Now they
are only fetched if they are required for maintaining the state of the
index. If an UPDATE or UPSERT mutates only columns that are neither
indexed by a partial index nor referenced in its predicate, the indexed
columns of that partial index are no longer fetched (assuming that they
are not needed to maintain the state of other indexes, including the
primary index).

opt: project false for partial index PUT and DEL columns when possible

The optimizer now projects false expressions for partial index PUT and
DEL columns, rather than predicate expressions, in cases where it is
guaranteed that an UPDATE or UPSERT will not alter the entries of a
partial index.

Release note (performance improvement): UPDATE and UPSERT operations on
tables with partial indexes no longer evaluate partial index predicate
expressions when it is guaranteed that the operation will not alter the
state of the partial index. In some cases, this can eliminate fetching
the existing value of columns that are referenced in partial index
predicates.
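
A minimal Go sketch of the choice described in this commit message follows. It is a toy model, not the optimizer's implementation: partialIndex, mayChange, and putDelProjections are hypothetical names, expressions are represented as plain strings, and the model does not capture which row values (existing or new) each projected column would be evaluated against.

```go
package main

import "fmt"

// partialIndex is a toy description of a partial index.
type partialIndex struct {
	indexedCols   []string // columns in the index key
	predicateCols []string // columns referenced by the predicate
	predicate     string   // predicate text, e.g. "c > 0"
}

// mayChange reports whether a mutation of the given columns could
// alter the entries of the partial index.
func mayChange(idx partialIndex, mutated map[string]bool) bool {
	for _, c := range idx.indexedCols {
		if mutated[c] {
			return true
		}
	}
	for _, c := range idx.predicateCols {
		if mutated[c] {
			return true
		}
	}
	return false
}

// putDelProjections returns the expressions projected for the PUT and
// DEL columns of the index: constant false when the index is guaranteed
// not to change, otherwise the predicate expression.
func putDelProjections(idx partialIndex, mutated map[string]bool) (put, del string) {
	if !mayChange(idx, mutated) {
		// No predicate evaluation is needed, so the predicate columns
		// do not have to be fetched on behalf of this index.
		return "false", "false"
	}
	return idx.predicate, idx.predicate
}

// exampleIndex mirrors INDEX (b) WHERE c > 0 from the earlier example.
func exampleIndex() partialIndex {
	return partialIndex{
		indexedCols:   []string{"b"},
		predicateCols: []string{"c"},
		predicate:     "c > 0",
	}
}

func main() {
	idx := exampleIndex()

	// UPDATE t SET d = d + 1 WHERE a = 1 mutates only d.
	put, del := putDelProjections(idx, map[string]bool{"d": true})
	fmt.Printf("PUT=%s DEL=%s\n", put, del) // PUT=false DEL=false

	// A hypothetical UPDATE that sets c can move the row in or out of the index.
	put, del = putDelProjections(idx, map[string]bool{"c": true})
	fmt.Printf("PUT=%s DEL=%s\n", put, del) // PUT=c > 0 DEL=c > 0
}
```

Projecting a constant instead of the predicate is what lets the predicate's input columns become prunable in the first case.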

andy-kimball

Are you sure that moving pruning calculations to optbuilder will catch all cases we care about now and in the future? What happens if we normalize a complex mutation and end up with an additional column that can be pruned? I've always avoided putting optimizations in optbuilder for this reason: they don't catch cases that arise after other rules have fired.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)

mgartner

I think the only rule that would affect these particular calculations would be a rule that pruned update columns. For example, we could add a rule that prunes a from the update columns:

CREATE TABLE t (a INT, b INT, INDEX (a), FAMILY (a), FAMILY (b));

UPDATE t SET a = 1, b = 2 WHERE a = 1;

This implementation would prevent a from also being pruned from the fetch columns, because a would exist in the update columns during the optbuilder phase. But I'm not sure pruning a from the fetch columns helps much in this case. We're already scanning the index for a = 1, so propagating the value of a to the update operator would be cheap.

I've moved some of the logic to optbuilder because (1) I didn't want to build and project partial index predicate expressions only to have a new rule immediately normalize them to false and (2) I didn't want duplicate logic in prune_cols_funcs.go and optbuilder to determine which indexes require updates (and which columns must be fetched as a result). Also, there's already some logic within optbuilder that determines whether check constraint columns can be pruned, see #56007 (and I believe there's similar logic in optbuilder for computed columns now).

That being said, I appreciate consistent patterns, and maybe this goes too much against the grain. If we're not concerned about the unnecessary work of building and projecting the PUT/DEL partial index predicate expressions and then normalizing to false, then I'll try an implementation that moves all of this to normalization rules.


rytaft · 2020-12-30T15:41:32Z

Is this PR now superseded by #58358? Or should I review this one too?

mgartner · 2020-12-30T17:14:43Z

@rytaft don't review this. Closing in favor of #58358.

mgartner added 4 commits December 28, 2020 13:27

mgartner requested review from rytaft and RaduBerinde December 29, 2020 02:35

mgartner requested a review from a team as a code owner December 29, 2020 02:35

mgartner changed the title ~~Partial index prune~~ opt: prune columns that are guaranteed not to be needed for partial index mutations Dec 29, 2020

mgartner changed the title ~~opt: prune columns that are guaranteed not to be needed for partial index mutations~~ opt: fetch only required columns for partial index mutations Dec 29, 2020

andy-kimball reviewed Dec 29, 2020

mgartner commented Dec 29, 2020

mgartner force-pushed the partial-index-prune branch from 916be5a to e4fe162 Compare December 29, 2020 18:36

mgartner closed this Dec 30, 2020

mgartner deleted the partial-index-prune branch December 30, 2020 17:14
