opt: fetch only required columns for partial index mutations #58322

Closed
wants to merge 5 commits into from

Conversation

mgartner
Collaborator

Fixes #51623

opt: do not derive prune columns for Upsert, Update, Delete

We no longer derive output prune columns for Upsert, Update, and Delete
ops in DerivePruneCols. There are no PruneCols rules for these
operators, so deriving their prune columns was only performing
unnecessary work. There are other rules that prune the fetch and return
columns for these operators. These rules do not rely on
DerivePruneCols.

Release note: None

opt: move needed mutation fetch columns logic to optbuilder

Previously, logic within prune columns normalization rules calculated
the fetch columns required for updating indexes during a mutation. This
logic has been moved to the optbuilder.

This will make it easier to reduce this set of columns in the future
when we can prove that a column is not needed to maintain the state of a
partial index because the partial index is guaranteed not to change.

In addition, this change avoids recomputing these fetch columns every
time the PruneMutationFetchCols rule attempts to match an expression.
For a simple UPDATE mutation, this can occur ~3 times.

Release note: None

sql: remove logic to determine fetch cols in row updater

Previously, the row.MakeUpdater function had logic to determine the
fetch columns required for an update operation. This is not necessary
because the cost-based optimizer already determines the necessary fetch
columns and plumbs them to MakeUpdater as the requestedCols
argument.

Release note: None

opt: prune update/upsert fetch columns not needed for partial indexes

Indexed columns of partial indexes are now only fetched for UPDATE and
UPSERT operations when needed. They are pruned in cases where it is
guaranteed that they are not needed to build old or new index entries.
For example, consider the table and UPDATE:

CREATE TABLE t (
  a INT PRIMARY KEY,
  b INT,
  c INT,
  d INT,
  INDEX (b) WHERE c > 0,
  FAMILY (a), FAMILY (b), FAMILY (c), FAMILY (d)
)

UPDATE t SET d = d + 1 WHERE a = 1

The partial index is guaranteed not to change with this UPDATE because
neither its indexed columns nor the columns referenced in its predicate
are mutating. Therefore, the existing values of b do not need to be
fetched to maintain the state of the partial index. Furthermore, the
primary index does not require the existing values of b because no
columns in b's family are mutating. So, b can be pruned from the
UPDATE's fetch columns.

Release note (performance improvement): Previously, indexed columns of
partial indexes were always fetched for UPDATEs and UPSERTs. Now they
are only fetched if they are required for maintaining the state of the
index. If an UPDATE or UPSERT mutates only columns that are neither
indexed by a partial index nor referenced in its predicate, the indexed
columns of that partial index will no longer be fetched (assuming that
they are not needed to maintain the state of other indexes, including
the primary index).
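
The pruning decision described in this commit can be illustrated with a small, self-contained Go sketch. It is not the optimizer's code: colSet, partialIndex, and fetchCols are hypothetical names, columns are modeled as plain strings, and primary key columns (which a real implementation would keep regardless) are left out for brevity. It only models the two conditions the commit message names: a column family needs its existing values when one of its columns mutates, and a partial index needs its indexed and predicate columns when one of them mutates.

```go
package main

import (
	"fmt"
	"sort"
)

// colSet is a small set of column names, standing in for the optimizer's
// column sets. All types in this sketch are hypothetical.
type colSet map[string]bool

// intersects reports whether the two sets share at least one column.
func (s colSet) intersects(o colSet) bool {
	for c := range s {
		if o[c] {
			return true
		}
	}
	return false
}

// addAll adds every column of o to s.
func (s colSet) addAll(o colSet) {
	for c := range o {
		s[c] = true
	}
}

// partialIndex models a partial index by the columns it indexes and the
// columns its predicate references.
type partialIndex struct {
	indexed, predicate colSet
}

// fetchCols returns the non-key columns whose existing values an UPDATE of
// the mutated columns must read (primary key columns are not modeled).
func fetchCols(mutated colSet, families []colSet, indexes []partialIndex) colSet {
	needed := colSet{}
	// Primary index: a family is only touched if one of its columns mutates.
	for _, fam := range families {
		if fam.intersects(mutated) {
			needed.addAll(fam)
		}
	}
	// Partial indexes: entries can only change if an indexed column or a
	// predicate column mutates; otherwise nothing is fetched for the index.
	for _, idx := range indexes {
		if idx.indexed.intersects(mutated) || idx.predicate.intersects(mutated) {
			needed.addAll(idx.indexed)
			needed.addAll(idx.predicate)
		}
	}
	return needed
}

func main() {
	// Table t from the example: one family per column, INDEX (b) WHERE c > 0.
	families := []colSet{{"a": true}, {"b": true}, {"c": true}, {"d": true}}
	indexes := []partialIndex{{indexed: colSet{"b": true}, predicate: colSet{"c": true}}}

	// UPDATE t SET d = d + 1 WHERE a = 1 mutates only d.
	got := fetchCols(colSet{"d": true}, families, indexes)
	names := make([]string, 0, len(got))
	for c := range got {
		names = append(names, c)
	}
	sort.Strings(names)
	fmt.Println(names) // prints [d]
}
```

Under this model, b and c drop out of the fetch set for the example UPDATE, while an UPDATE that sets c would bring both b and c back in.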

opt: project false for partial index PUT and DEL columns when possible

The optimizer now projects false expressions for partial index PUT and
DEL columns, rather than predicate expressions, in cases where it is
guaranteed that an UPDATE or UPSERT will not alter the entries of a
partial index.

Release note (performance improvement): UPDATE and UPSERT operations on
tables with partial indexes no longer evaluate partial index predicate
expressions when it is guaranteed that the operation will not alter the
state of the partial index. In some cases, this can eliminate fetching
the existing value of columns that are referenced in partial index
predicates.
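
A rough sketch of that choice, again in Go with hypothetical names (partialIndex, putDelExprs) and expressions represented as plain strings rather than optimizer expressions. It assumes that the PUT column is the predicate evaluated over the row's new values and the DEL column is the predicate evaluated over its existing values; when no indexed or predicate column mutates, both are replaced with a constant false.

```go
package main

import "fmt"

// partialIndex is a hypothetical model of a partial index: its indexed
// columns, the columns its predicate references, and the predicate text.
type partialIndex struct {
	indexedCols   []string
	predicateCols []string
	predicate     string
}

// touches reports whether any of cols is in the mutated set.
func touches(cols []string, mutated map[string]bool) bool {
	for _, c := range cols {
		if mutated[c] {
			return true
		}
	}
	return false
}

// putDelExprs returns textual stand-ins for the PUT and DEL projections of
// a partial index for a mutation that changes the given columns.
func putDelExprs(idx partialIndex, mutated map[string]bool) (put, del string) {
	if !touches(idx.indexedCols, mutated) && !touches(idx.predicateCols, mutated) {
		// The index entries cannot change, so the predicate never needs to
		// be evaluated and its columns need not be fetched for this index.
		return "false", "false"
	}
	// Otherwise the predicate is projected twice: once over the new values
	// (PUT) and once over the existing values (DEL).
	return idx.predicate + " /* new values */", idx.predicate + " /* existing values */"
}

func main() {
	idx := partialIndex{
		indexedCols:   []string{"b"},
		predicateCols: []string{"c"},
		predicate:     "c > 0",
	}

	// UPDATE t SET d = d + 1: the index is untouched.
	put, del := putDelExprs(idx, map[string]bool{"d": true})
	fmt.Println(put, del) // prints false false

	// UPDATE t SET c = 5: the predicate must be evaluated for old and new rows.
	put, del = putDelExprs(idx, map[string]bool{"c": true})
	fmt.Println(put)
	fmt.Println(del)
}
```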

@mgartner mgartner requested a review from a team as a code owner December 29, 2020 02:35
@cockroach-teamcity
Member

This change is Reviewable

@mgartner mgartner changed the title Partial index prune opt: prune columns that are guaranteed not to be needed for partial index mutations Dec 29, 2020
@mgartner mgartner changed the title opt: prune columns that are guaranteed not to be needed for partial index mutations opt: fetch only required columns for partial index mutations Dec 29, 2020
Contributor

@andy-kimball andy-kimball left a comment

Are you sure that moving pruning calculations to optbuilder will catch all cases we care about now and in the future? What happens if we normalize a complex mutation and end up with an additional column that can be pruned? I've always avoided putting optimizations in optbuilder for this reason: they don't catch cases that arise after other rules have fired.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)

Collaborator Author

@mgartner mgartner left a comment

I think the only rule that would affect these particular calculations would be a rule that pruned update columns. For example, we could add a rule that prunes a from the update columns:

CREATE TABLE t (a INT, b INT, INDEX (a), FAMILY (a), FAMILY (b));

UPDATE t SET a = 1, b = 2 WHERE a = 1;

This implementation would prevent a from also being pruned from the fetch columns, because a would exist in the update columns during the opt builder phase. But I'm not sure pruning a from the fetch columns helps much in this case — we're already scanning the index for a = 1, so propagating the value of a to the update operator would be cheap.

I've moved some of the logic to optbuilder because (1) I didn't want to build and project partial index predicate expressions only to have a new rule immediately normalize them to false and (2) I didn't want duplicate logic in prune_cols_funcs.go and optbuilder to determine which indexes require updates (and which columns must be fetched as a result). Also, there's already some logic within optbuilder that determines whether check constraint columns can be pruned; see #56007 (and I believe there's similar logic in optbuilder for computed columns now).

That being said, I appreciate consistent patterns, and maybe this goes too much against the grain. If we're not concerned about the unnecessary work of building and projecting the PUT/DEL partial index predicate expressions and then normalizing to false, then I'll try an implementation that moves all of this to normalization rules.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)

@rytaft
Collaborator

rytaft commented Dec 30, 2020

Is this PR now superseded by #58358? Or should I review this one too?

@mgartner
Collaborator Author

@rytaft don't review this. Closing in favor of #58358.

@mgartner mgartner closed this Dec 30, 2020
@mgartner mgartner deleted the partial-index-prune branch December 30, 2020 17:14
Successfully merging this pull request may close these issues.

opt: prune columns that are guaranteed not to be needed for partial index mutations
4 participants