Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: fixup CTE stats on placeholder queries #99433

Merged
merged 1 commit into from
Mar 28, 2023

Conversation

cucaroach
Copy link
Contributor

@cucaroach cucaroach commented Mar 23, 2023

During optbuilder phase we copy the initial expressions stats into the
fake-rel but this value can change when placeholders are assigned so add
code in AssignPlaceholders to rebuild the cte if the stats change.

Fixes: #99389
Epic: none
Release note: none

@blathers-crl
Copy link

blathers-crl bot commented Mar 23, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cucaroach cucaroach force-pushed the placeholder-cte-stats-fix branch 2 times, most recently from ca5da91 to 9c807dc Compare March 24, 2023 11:53
@cucaroach cucaroach changed the title wip: try to fixup binding stats on normalize opt: fixup CTE stats on placeholder queries Mar 24, 2023
@cucaroach cucaroach marked this pull request as ready for review March 24, 2023 11:55
@cucaroach cucaroach requested a review from a team as a code owner March 24, 2023 11:55
@cucaroach cucaroach requested a review from msirek March 24, 2023 11:55
@cucaroach cucaroach added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-22.2.x labels Mar 24, 2023
@mgartner mgartner self-requested a review March 24, 2023 16:49
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find!

It's a bit unprecedented for a normalization rule to muck with statistics. Maybe that's ok, but I'm not 100% sure yet. I'm trying to think of a more idiomatic way to achieve the same goal as this.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @msirek)


pkg/sql/opt/norm/with_funcs.go line 301 at r2 (raw file):

}

func (c *CustomFuncs) ApplyBindingRowCount(

nit: add comment


pkg/sql/opt/norm/with_funcs.go line 315 at r2 (raw file):

}

func (c *CustomFuncs) RowCountDifferent(

nit: add comment explaining this function.


pkg/sql/opt/norm/rules/with.opt line 68 at r2 (raw file):

)

# ApplyBindingRowCountToCTE makes sure to that stats changes are propagated on

nit: typo in "makes sure to that"


pkg/sql/opt/xform/testdata/rules/cte line 23 at r2 (raw file):


# Regression test for #99389.
assign-placeholders-opt query-args=(1) format=show-stats

Can you move this test to pkg/sql/opt/norm/testdata/rules/with and use assign-placeholder-norm instead of assign-placeholders-opt?

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @mgartner)


pkg/sql/opt/norm/rules/with.opt line 79 at r2 (raw file):

=>
(RecursiveCTE
    (ApplyBindingRowCount $binding $initial)

It looks like this will update the original binding expression with a new row count.
Shouldn't an expression in the memo be treated as immutable? The updated expression would have a different hash than what was computed when it was placed in the internCache.

We may want to make a new FakeRel and update its row count.

@rytaft
Copy link
Collaborator

rytaft commented Mar 24, 2023

It's a bit unprecedented for a normalization rule to muck with statistics.

@msirek was playing around with an update to optgen -- did that work?

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible downside to using a normalization rule is use of more memory. We memoize new expressions in CopyAndReplaceDefault and then memoize again in the normalization rule.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @mgartner)

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msirek was playing around with an update to optgen -- did that work?

I haven't tested that yet, but I think it should work if we want to go that route.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @mgartner)

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach, @mgartner, and @msirek)


pkg/sql/opt/norm/rules/with.opt line 79 at r2 (raw file):

Previously, msirek (Mark Sirek) wrote…

It looks like this will update the original binding expression with a new row count.
Shouldn't an expression in the memo be treated as immutable? The updated expression would have a different hash than what was computed when it was placed in the internCache.

We may want to make a new FakeRel and update its row count.

+1 - we should make a new one and update the binding stored in the metadata. That would probably require adding a new method to opt.Metadata similar to AddWithBinding that would instead update an existing binding.


pkg/sql/opt/norm/rules/with.opt line 81 at r2 (raw file):

    (ApplyBindingRowCount $binding $initial)
    $initial
    $recursive

I think the recursive branch will also need to be reconstructed. The test case didn't catch it because the recursive branch is a contradiction that gets simplified, but a more interesting case should do the trick.

@rytaft
Copy link
Collaborator

rytaft commented Mar 24, 2023

I haven't tested that yet, but I think it should work if we want to go that route.

Would be interesting to know if that approach is simpler / would avoid the potential downsides of a normalization rule

@DrewKimball
Copy link
Collaborator

That would probably require adding a new method to opt.Metadata similar to AddWithBinding that would instead update an existing binding.

Actually, AddWithBinding already does what we need, so no need for a new method.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the regression test in this PR against the CopyAndReplaceDefault fix and got the same results.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach, @DrewKimball, and @mgartner)

@mgartner
Copy link
Collaborator

Is an optgen language change necessary? We already have the onConstructRelational hook which is called at the end of Construct... functions. Can we put this logic there?

@mgartner
Copy link
Collaborator

Forgot the code pointer:

// onConstructRelational is called as a final step by each factory method that
// constructs a relational expression, so that any custom manual pattern
// matching/replacement code can be run.
func (f *Factory) onConstructRelational(rel memo.RelExpr) memo.RelExpr {
// [SimplifyZeroCardinalityGroup]
// SimplifyZeroCardinalityGroup replaces a group with [0 - 0] cardinality
// with an empty values expression. It is placed here because it depends on
// the logical properties of the group in question.
if rel.Op() != opt.ValuesOp {
relational := rel.Relational()
// We can do this if we only contain leakproof operators. As an example of
// an immutable operator that should not be folded: a Limit on top of an
// empty input has to error out if the limit turns out to be negative.
if relational.Cardinality.IsZero() && relational.VolatilitySet.IsLeakproof() {
if f.matchedRule == nil || f.matchedRule(opt.SimplifyZeroCardinalityGroup) {
values := f.funcs.ConstructEmptyValues(relational.OutputCols)
if f.appliedRule != nil {
f.appliedRule(opt.SimplifyZeroCardinalityGroup, nil, values)
}
return values
}
}
}
return rel
}

@cucaroach
Copy link
Contributor Author

onConstructRelational

That looks like just the thing! I'll give that a whirl.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an optgen language change necessary? We already have the onConstructRelational hook which is called at the end of Construct... functions. Can we put this logic there?

onConstructRelational is called after the expression is memoized, so if we alter the expression, we'll have to memoize it again (which may add another expression to the memo), unless we swap the order of the MemoizeXXX call and onConstructRelational.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach, @DrewKimball, and @mgartner)

@cucaroach
Copy link
Contributor Author

onConstructRelational is called after the expression is memoized, so if we alter the expression, we'll have to memoize it again (which may add another expression to the memo), unless we swap the order of the MemoizeXXX call and onConstructRelational.

I think if we change something we have to re memoize and intern anyway no? FakeRel interns by props pointer value and it seems unsound to just mutate it. And like Drew said we need to reconstruct the recursive branch as well.

@cucaroach
Copy link
Contributor Author

Okay based on Drew's feedback I think we have to rebuild stuff (and trying to avoid rebuilding seems silly given that the model of assign-placeholders/replace is that if we change something all the ancestors are rebuilt). So I tried I cleaned it up and basically went with same model.

RFAL!

@cucaroach
Copy link
Contributor Author

New approach inspired by Mark's approach, what do we think?

if newInitial != rcte.Initial {
newBinding := f.ConstructFakeRel(&memo.FakeRelPrivate{
Props: MakeBindingPropsForRecursiveCTE(
props.AnyCardinality, rcte.Binding.Relational().OutputCols, newInitial.Relational().Statistics().RowCount)})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See fixup for possible improvement to cardinality arg.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I like this, AssignPlaceholders feels like the right place to solve it.

Reviewed 1 of 2 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach, @mgartner, @msirek, and @rytaft)


-- commits line 6 at r6:
I think the commit message needs an update.


pkg/sql/opt/norm/factory.go line 363 at r6 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

See fixup for possible improvement to cardinality arg.

I don't think it should be necessary to set the cardinality here, since there are already rules that should handle it (ApplyLimitToRecursiveCTEScan and TryAddLimitToRecursiveBranch).


pkg/sql/opt/norm/factory.go line 359 at r7 (raw file):

		if rcte, ok := e.(*memo.RecursiveCTEExpr); ok {
			newInitial := f.CopyAndReplaceDefault(rcte.Initial, replaceFn).(memo.RelExpr)
			if newInitial != rcte.Initial {

We could also check if the row-counts are different here.

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, @msirek, and @rytaft)


-- commits line 6 at r6:

Previously, DrewKimball (Drew Kimball) wrote…

I think the commit message needs an update.

Done.


pkg/sql/opt/norm/factory.go line 363 at r6 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I don't think it should be necessary to set the cardinality here, since there are already rules that should handle it (ApplyLimitToRecursiveCTEScan and TryAddLimitToRecursiveBranch).

👍


pkg/sql/opt/norm/factory.go line 359 at r7 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

We could also check if the row-counts are different here.

I thought about it but what if CopyAndReplaceDefault rebuilt the thing and changed something else but not the stats, doing this seemed more sane.


pkg/sql/opt/xform/testdata/rules/cte line 23 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you move this test to pkg/sql/opt/norm/testdata/rules/with and use assign-placeholder-norm instead of assign-placeholders-opt?

Done.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I agree with @DrewKimball, this seems like the most natural solution. Great work!

Reviewed 1 of 2 files at r5, 1 of 1 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @DrewKimball, @msirek, and @rytaft)


-- commits line 10 at r8:
nit: Should we have a performance improvement release note here?


pkg/sql/opt/norm/factory.go line 359 at r8 (raw file):

		}
		if rcte, ok := e.(*memo.RecursiveCTEExpr); ok {
			newInitial := f.CopyAndReplaceDefault(rcte.Initial, replaceFn).(memo.RelExpr)

nit: A comment explaining what we're doing here would be good, since it's not obvious.

During optbuilder phase we copy the initial expressions stats into the
fake-rel but this value can change when placeholders are assigned so add
code in AssignPlaceholders to rebuild the cte if the stats change.

Fixes: cockroachdb#99389
Epic: none
Release note (sql change): Prepared statements using placeholders in
recursive CTEs sometimes did not re-optimize correctly after plugging
in the parameters leading to poor plan choices, this has been fixed.
@cucaroach
Copy link
Contributor Author

TFTRs! Nice teamwork!
bors r+

@craig
Copy link
Contributor

craig bot commented Mar 27, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 28, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 28, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 28, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 28, 2023

Build failed:

@cucaroach
Copy link
Contributor Author

bors retry

@craig craig bot merged commit 0745cd4 into cockroachdb:master Mar 28, 2023
@craig
Copy link
Contributor

craig bot commented Mar 28, 2023

Build succeeded:

@blathers-crl
Copy link

blathers-crl bot commented Mar 28, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 2014cf2 to blathers/backport-release-22.1-99433: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


error creating merge commit from 2014cf2 to blathers/backport-release-22.2-99433: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: stats being different leads to different plans when using a placeholder
6 participants