Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: calculate number of rows processed when costing joins #40248

Merged
merged 2 commits into from
Sep 3, 2019

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Aug 27, 2019

This PR updates the costing of joins to take into account the number of
rows processed by the operator. This number may be larger than the
number of output rows if an additional filter is applied as part of the
ON condition that is not used to determine equality
columns for the join.

For example, consider the query
SELECT * FROM abc JOIN def ON a = e AND b = 3;

Assuming there is no index on b, if a lookup join is used to execute this
query, the number of rows processed is actually the same as the query
SELECT * FROM abc JOIN def ON a = e;

The difference is that the filter b=3 must also be applied to every row in
the first query. The coster now takes this into account when determining
the cost of joins.

Fixes #34810

Release note: None

@rytaft rytaft requested a review from a team as a code owner August 27, 2019 16:31
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@awoods187
Copy link
Contributor

Do we have any kind of perf difference here?

@rytaft
Copy link
Collaborator Author

rytaft commented Aug 27, 2019

None of the plans changed in TPC-C or TPC-H, so I wouldn't expect perf to change there.

This change should fix some cases where we are choosing a lookup join when we shouldn't be, or where we are choosing the wrong index for the lookup join. @RaduBerinde mentioned that this is currently a problem for some FK checks, so performance should improve in those cases.

@RaduBerinde
Copy link
Member


pkg/sql/opt/xform/testdata/external/tpcc, line 199 at r1 (raw file):

 └── f-k-checks
      └── f-k-checks-item: order(o_w_id,o_d_id,o_c_id) -> customer(c_w_id,c_d_id,c_id)
           └── anti-join (lookup customer@customer_idx)

I would expect this to change - it should use the FK index which has all three columns..

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)


pkg/sql/opt/xform/testdata/external/tpcc, line 735 at r1 (raw file):

 └── f-k-checks
      ├── f-k-checks-item: order_line(ol_w_id,ol_d_id,ol_o_id) -> order(o_w_id,o_d_id,o_id)
      │    └── anti-join (lookup order@order_idx)

Another instance.


pkg/sql/opt/xform/testdata/external/tpcc, line 1085 at r1 (raw file):

 └── f-k-checks
      ├── f-k-checks-item: history(h_c_w_id,h_c_d_id,h_c_id) -> customer(c_w_id,c_d_id,c_id)
      │    └── anti-join (lookup customer@customer_idx)

Another instance.

Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)


pkg/sql/opt/memo/statistics_builder.go, line 2293 at r1 (raw file):

		// Otherwise, we need to determine the row count of the join before the
		// ON conditions are applied.
		withoutOn := e.Memo().MemoizeLookupJoin(t.Input, nil /* on */, &t.LookupJoinPrivate)

is nil acceptable for empty List expressions? I had vaguely thought that they should always be non-nil, but maybe it doesn't matter

also nit: s/on/On/

Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem seems to be that we weren't calculating stats correctly for semi or anti joins. I've added another commit to fix those stats.

Two plans for TPC-H changed, so I need to update the stats quality tests for those. But in the mean time, I figured I would push this commit to see if there is any feedback. PTAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj)


pkg/sql/opt/memo/statistics_builder.go, line 2293 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

is nil acceptable for empty List expressions? I had vaguely thought that they should always be non-nil, but maybe it doesn't matter

also nit: s/on/On/

Seems like using nil isn't a problem, since len() of a nil list is 0.

What does s/on/On/ apply to? The name of the parameter in MemoizeLookupJoin is on, not On...


pkg/sql/opt/xform/testdata/external/tpcc, line 199 at r1 (raw file):

Previously, RaduBerinde wrote…

I would expect this to change - it should use the FK index which has all three columns..

Done.


pkg/sql/opt/xform/testdata/external/tpcc, line 735 at r1 (raw file):

Previously, RaduBerinde wrote…

Another instance.

Done.


pkg/sql/opt/xform/testdata/external/tpcc, line 1085 at r1 (raw file):

Previously, RaduBerinde wrote…

Another instance.

Done.

@rytaft rytaft force-pushed the lookup-join-cost branch 2 times, most recently from a1aa32f to 28d0a38 Compare August 28, 2019 16:55
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! The updated FK plans for TPCC look great!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @rytaft)


pkg/sql/opt/memo/statistics_builder.go, line 909 at r4 (raw file):

	// -----------------------------------
	s.RowCount = leftStats.RowCount * rightStats.RowCount
	inputRowCount := s.RowCount

This could use a comment, it's not easy to figure out what this value actually means.


pkg/sql/opt/memo/statistics_builder.go, line 915 at r4 (raw file):

		// calculations. It will be fixed below.
		s.RowCount = leftStats.RowCount
		inputRowCount = s.RowCount

[nit] = leftStats.RowCount is more clear (it initially seemed like a no-op)


pkg/sql/opt/memo/statistics_builder.go, line 1003 at r4 (raw file):

	}

	// Fix the stats for anti join.

Seems like colStatJoin would need some of this logic too?


pkg/sql/opt/memo/statistics_builder.go, line 1006 at r4 (raw file):

	switch h.joinType {
	case opt.AntiJoinOp, opt.AntiJoinApplyOp:
		s.RowCount = max(inputRowCount-s.RowCount, epsilon)

Would be clearer if we just used leftStats.RowCount - s.RowCount directly (that's the formula one would expect)


pkg/sql/opt/memo/statistics_builder.go, line 1013 at r4 (raw file):

			// Distinct count is tricky for anti-joins. This is a rough estimate,
			// accounting for the way distinct counts are calculated above.

[nit] can you expand on this? Not sure what aspect of the way we calculate distinct counts above this formula stems from. Also, seems like the total and semi-join row count would play a role, but can't say I can come up with a formula that's reasonable.


pkg/sql/opt/xform/coster.go, line 403 at r4 (raw file):

	// Add a cost if we have to evaluate an ON condition on every row. The more
	// leftover conditions, the more expensive it should be. We want to
	// differentiate between two lookup joins where one uses only a subset of the

This comment no longer holds (your change should fix this example).


pkg/sql/opt/xform/coster.go, line 411 at r4 (raw file):

	// then discarded).
	perRowCost += cpuCostFactor * memo.Cost(len(join.On))
	// We also add a constant "setup" cost per ON condition. Without this, the

This hacky "setup" adjustment to the cost shouldn't be needed anymore. The first one (line above) shouldn't be necessary either with your change, though I don't see a problem with keeping it (especially if removing it changes plans that we're unsure about). Strictly speaking, it's correct to account for running the filters, but then the other join types should also account for that, and Select should account for it similarly (it doesn't look at how many filters we have to evaluate).

@rytaft rytaft force-pushed the lookup-join-cost branch 4 times, most recently from cc6dd6a to 1014541 Compare September 2, 2019 02:22
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the TPC-H stats quality tests. Here are the performance changes (averaged over 3 runs of each, standard deviation is less than 1% for all measurements):

Q18
----
old     new    change
14.36s  9.05s  -36.97%

Q22
----
old      new      change
742.1ms  833.6ms  +12.34%

As shown above, Q18 improved by 37% but Q22 regressed by 12%.

I also further improved the costing and stats of semi/anti joins.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)


pkg/sql/opt/memo/statistics_builder.go, line 909 at r4 (raw file):

Previously, RaduBerinde wrote…

This could use a comment, it's not easy to figure out what this value actually means.

Done.


pkg/sql/opt/memo/statistics_builder.go, line 915 at r4 (raw file):

Previously, RaduBerinde wrote…

[nit] = leftStats.RowCount is more clear (it initially seemed like a no-op)

Done.


pkg/sql/opt/memo/statistics_builder.go, line 1003 at r4 (raw file):

Previously, RaduBerinde wrote…

Seems like colStatJoin would need some of this logic too?

It shouldn't need any change since we're setting the selectivity here so it's correct for anti joins. For semi/anti joins, colStatJoin just applies the selectivity to the given column.


pkg/sql/opt/memo/statistics_builder.go, line 1006 at r4 (raw file):

Previously, RaduBerinde wrote…

Would be clearer if we just used leftStats.RowCount - s.RowCount directly (that's the formula one would expect)

Done.


pkg/sql/opt/memo/statistics_builder.go, line 1013 at r4 (raw file):

Previously, RaduBerinde wrote…

[nit] can you expand on this? Not sure what aspect of the way we calculate distinct counts above this formula stems from. Also, seems like the total and semi-join row count would play a role, but can't say I can come up with a formula that's reasonable.

I just removed this -- using the selectivity is easier and seems to be about as accurate.


pkg/sql/opt/xform/coster.go, line 403 at r4 (raw file):

Previously, RaduBerinde wrote…

This comment no longer holds (your change should fix this example).

Done.


pkg/sql/opt/xform/coster.go, line 411 at r4 (raw file):

Previously, RaduBerinde wrote…

This hacky "setup" adjustment to the cost shouldn't be needed anymore. The first one (line above) shouldn't be necessary either with your change, though I don't see a problem with keeping it (especially if removing it changes plans that we're unsure about). Strictly speaking, it's correct to account for running the filters, but then the other join types should also account for that, and Select should account for it similarly (it doesn't look at how many filters we have to evaluate).

Unfortunately we still need the setup cost to handle the case of a very small row count. When the row count is less than one, we cannot calculate the selectivity of each filter, so we need this hack to discriminate between fully covering indexes and partially covering indexes.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks for fixing this!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/lookup_join, line 247 at r5 (raw file):

NULL       NULL     {5}       5

# TODO(radu): this doesn't seem to be a lookup join, but it should be.

We can remove this now :)


pkg/sql/opt/memo/statistics_builder.go, line 2358 at r6 (raw file):

		}

		// Otherwise, we need to determine the row count of the join before the

[nit] not clear anymore what "otherwise" refers to

… joins

This commit updates the costing of both lookup joins and merge joins to
take into account the number of rows processed by the operator. This number
may be larger than the number of output rows if an additional filter is
applied as part of the ON condition that is not used to determine equality
columns for the join.

For example, consider the query
  `SELECT * FROM abc JOIN def ON a = e AND b = 3;`

Assuming there is no index on b, if a lookup join is used to execute this
query, the number of rows processed is actually the same as the query
  `SELECT * FROM abc JOIN def ON a = e;`

The difference is that the filter b=3 must also be applied to every row in
the first query. The coster now takes this into account when determining
the cost of lookup joins and merge joins.

Informs cockroachdb#34810

Release note: None
Prior to this commit, the statisticsBuilder always estimated that
the number of output rows for a semi or anti join was equal to the
number of rows on the left side. It ignored any ON conditions.
This commit improves the estimate by taking into account the ON
conditions.

Release note: None
@rytaft rytaft changed the title opt: calculate number of rows processed when costing lookup and merge joins opt: calculate number of rows processed when costing joins Sep 3, 2019
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj and @RaduBerinde)


pkg/sql/opt/exec/execbuilder/testdata/lookup_join, line 247 at r5 (raw file):

Previously, RaduBerinde wrote…

We can remove this now :)

Done.


pkg/sql/opt/memo/statistics_builder.go, line 2358 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] not clear anymore what "otherwise" refers to

Done.

craig bot pushed a commit that referenced this pull request Sep 3, 2019
40248: opt: calculate number of rows processed when costing joins r=rytaft a=rytaft

This PR updates the costing of joins to take into account the number of
rows processed by the operator. This number may be larger than the
number of output rows if an additional filter is applied as part of the
ON condition that is not used to determine equality
columns for the join.

For example, consider the query
  `SELECT * FROM abc JOIN def ON a = e AND b = 3;`

Assuming there is no index on b, if a lookup join is used to execute this
query, the number of rows processed is actually the same as the query
  `SELECT * FROM abc JOIN def ON a = e;`

The difference is that the filter b=3 must also be applied to every row in
the first query. The coster now takes this into account when determining
the cost of joins.

Fixes #34810

Release note: None

40431: workload: fix partition commands in tpcc import r=solongordon a=solongordon

The commands for partitioning indexes in the TPCC import were erroring
out due to a syntax change introduced in #39332. I updated them to use
`ALTER PARTITION ... OF INDEX` rather than `ALTER PARTITION ... OF
TABLE`.

Fixes #39005
Fixes #40360
Fixes #40416

Release note: None

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Sep 3, 2019

Build succeeded

@craig craig bot merged commit 3d565a2 into cockroachdb:master Sep 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: take into account ON conditions for join costing
5 participants