Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: use paired joins for left joins with non-covering indices #55452

Open
sumeerbhola opened this issue Oct 12, 2020 · 3 comments
Open

opt: use paired joins for left joins with non-covering indices #55452

sumeerbhola opened this issue Oct 12, 2020 · 3 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented Oct 12, 2020

We have recently added paired joins (#54639 #55216) which consist of an inverted join followed by the lookup join, to efficiently perform left outer/semi/anti joins using inverted indexes.

The idea can also be applied to left joins over regular (non-inverted) indexes that are non-covering, where the first join produces false positives that are eliminated by the second lookup join. This will require (small) changes to the order-preserving lookup join to function as the first join, but the rest of the changes to accomplish this are in the optimizer. AFAIK, left joins with non-covering indexes cannot currently be index accelerated.

We've mainly discussed the case where the first join is a lookup join, though if there are actual customer use cases, one could imaging extending this to hash and merge joins as the first join (I do not know how difficult the optimizer changes would be). Some made-up examples where x, y are column names from the left/right side:

  • Do merge join, lookup join pair when the expression is x1 = y1 and x3 > y3 and the chosen indexes for the merge join are (x1, x3), (y1), because x1 = y1 is the part of the expression with very low selectivity.
  • Do hash join, lookup join pair when the expression is x1 = y1 and x3 > y3 and x2 = a and y2 = b and the chosen indexes for the hash join are (x2, x3, x1), (y2, y1), and the hash join on these indexes is chosen because x2 = a, y2 = b have low selectivity.

Jira issue: CRDB-3661

@sumeerbhola sumeerbhola added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Oct 12, 2020
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 30, 2020
This is to allow lookup joins to be used for left outer/semi/anti
joins with non-covering indexes. Currently only semi joins for
this case can use the index (by doing two inner joins and a DistinctOn)

Paired joins with a non-covering index will be used as follows:
- Left outer join: will become a pair of left outer lookup joins.
- Left anti join: will be a left outer lookup join followed by
  a left anti lookup join.
- Left semi join: will be an inner lookup join followed by a
  left semi lookup join.

This PR does not contain the optimizer changes.

Informs cockroachdb#55452

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Nov 5, 2020
This is to allow lookup joins to be used for left outer/semi/anti
joins with non-covering indexes. Currently only semi joins for
this case can use the index (by doing two inner joins and a DistinctOn)

Paired joins with a non-covering index will be used as follows:
- Left outer join: will become a pair of left outer lookup joins.
- Left anti join: will be a left outer lookup join followed by
  a left anti lookup join.
- Left semi join: will be an inner lookup join followed by a
  left semi lookup join.

This PR does not contain the optimizer changes.

Informs cockroachdb#55452

Release note: None
craig bot pushed a commit that referenced this issue Nov 5, 2020
56161: sql: add support for joinReader to be the first join in paired joins r=sumeerbhola a=sumeerbhola

This is to allow lookup joins to be used for left outer/semi/anti
joins with non-covering indexes. Currently only semi joins for
this case can use the index (by doing two inner joins and a DistinctOn)

Paired joins with a non-covering index will be used as follows:
- Left outer join: will become a pair of left outer lookup joins.
- Left anti join: will be a left outer lookup join followed by
  a left anti lookup join.
- Left semi join: will be an inner lookup join followed by a
  left semi lookup join.

This PR does not contain the optimizer changes.

Informs #55452

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
rytaft pushed a commit to rytaft/cockroach that referenced this issue Nov 5, 2020
This is to allow lookup joins to be used for left outer/semi/anti
joins with non-covering indexes. Currently only semi joins for
this case can use the index (by doing two inner joins and a DistinctOn)

Paired joins with a non-covering index will be used as follows:
- Left outer join: will become a pair of left outer lookup joins.
- Left anti join: will be a left outer lookup join followed by
  a left anti lookup join.
- Left semi join: will be an inner lookup join followed by a
  left semi lookup join.

This PR does not contain the optimizer changes.

Informs cockroachdb#55452

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 23, 2020
This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs cockroachdb#55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 5, 2021
This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs cockroachdb#55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 5, 2021
This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs cockroachdb#55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 7, 2021
This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs cockroachdb#55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
rytaft pushed a commit to sumeerbhola/cockroach that referenced this issue Jan 28, 2022
This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs cockroachdb#55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.
rytaft pushed a commit to sumeerbhola/cockroach that referenced this issue Jan 28, 2022
This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs cockroachdb#55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.
rytaft pushed a commit to sumeerbhola/cockroach that referenced this issue Jan 31, 2022
This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs cockroachdb#55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.
rytaft pushed a commit to sumeerbhola/cockroach that referenced this issue Jan 31, 2022
This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs cockroachdb#55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.
rytaft pushed a commit to sumeerbhola/cockroach that referenced this issue Feb 2, 2022
This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs cockroachdb#55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.
craig bot pushed a commit that referenced this issue Feb 2, 2022
…75871

58261: opt,sql: use paired-joins with non-covering indexes for left joins r=rytaft a=sumeerbhola

This is done when the left outer/semi/anti join can use a
lookup join. Prior to this, when the non-covering index
could not fully evaluate the filter for left join we could
not generate a lookup join.

With this change:
- Left outer join becomes a pair of two left outer joins.
- Left semi join is a pair of inner join followed by left
  semi join.
- Left anti join is a pair of left outer join followed by
  left anti join.

Informs #55452

Release note (performance improvement): The optimizer can now
generate lookup joins in certain cases for non-covering
indexes, when performing a left outer/semi/anti join.

75746: dev: initialize submodules in `dev doctor` r=aayushshah15 a=aayushshah15

This commit adds a check to `dev doctor` to initialize submodules, like we do
in our `Makefile`.

Fixes #72247

Release note: None

75766: server: do not check decommission list for the tenant r=JeffSwenson a=JeffSwenson

Previously, the system tenant would return PermissionDenied if the
tenant's instance_id was equivalent to a decommissioned node's id.

Now, the system tenant does not check the decommissioned node list if
the incoming node_id belongs to a non-system tenant.

This PR feeds the request context down to the OnOutgoingPing and
OnIncomingPing callbacks. Previously the callbacks were using the
ambient context. The only use of the context was a storage.MVCCGet call
in nodeTombstoneStorage.IsDecommissioned.

Release note: None

75804: sql: support RESET ALL statement r=otan a=rafiss

fixes #75435

Release note (sql change): Support for the RESET ALL statement was
added. This statement resets the values of all session variables to
their default values.

75822: sql: error when setting timezone outside of postgres max utc offsets r=otan a=RichardJCai

Release note (sql change): Previously, users would be able to set
a UTC timezone offset of greater than 167 or less than -167. This
now returns an error.

Example:

SET TIME ZONE '168'
Gives error:
invalid value for parameter "timezone": "'168'": cannot find time zone "168": UTC timezone offset is out of range.

SET TIME ZONE '-168'
Gives error:
invalid value for parameter "timezone": "'-168'": cannot find time zone "-168": UTC timezone offset is out of range.

Fixes #75168

Note: If a user has already set a UTC timezone offset outside of these bounds, it will be unchanged. 

75843: c-deps/krb5: fix build for more recent versions of autoconf r=otan a=nicktrav

More recent versions of `autoconf`, when used to build `krb5`, generates
shell scripts with invalid syntax.

Fix by pulling in the [upstream patch][1] for the issue into our tree.

Closes #72529.

[1]:
krb5/krb5@f78edbe

75845: vendor: bump Pebble to 38b68e17aa97 r=jbowens a=nicktrav

Pebble commits:

```
38b68e17 internal/batchskl: return error on index overflow
8440f290 internal/manifest: use a line sweep to optimize NewL0Sublevels
0f5acb26 sstable: add direct block reading to suffix rewriter
26856d10 db: avoid stats flake in TestMemTableReservation
b452808f sstable: Make sstable Writer.Close idempotent
17fe1a65 sstable: add RewriteKeySuffixes function
c9e6edfc db: expose metrics on count and earliest seqnum of snapshots
b958d9a7 sstable: add a writeQueue to the sstable writer
c8dad06c db: disable automatic compactions in `MetricsTest`
015f5e38 internal/rangekey: fix range key iteration bug
```

The commit `38b68e17` contains the fix for #69906.

Closes #69906.

Release note: none

75857: sql: fix small race in distIndexBackfiller r=adityamaru a=stevendanna

This fixes a small race condition in
distIndexBackfiller. updateJobDetails calls SetResumeSpansInJob which
mutates the ResumeSpanList in the job details.  Normally, this is only
called from the periodic updater.  However, when the testing knob
AlwasyUpdateIndexBackfillDetails is set, we also update it on every
ProducerMetadata message we get back

Release note: None

75865: build: address util.log.logcrash package rename r=knz a=rail

After `util.log` was renamed to `util.log.logcrash`, the build system
stopped updating the Sentry environment variable properly. Instead of
setting it to the release version, it was falling back to the default
"development" value. As a result, all Sentry reports went to the
development environment bucket.

This patch addresses the name change.

Release note: None

75871: logictestccl: fix stale issue number in TODO r=arulajmani a=arulajmani

We closed #69265 in favour of #70558, and the only remaining work
left to address locality aware planning for tenants is captured
in #75864.

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Jeff <swenson@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: richardjcai <caioftherichard@gmail.com>
Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Rail Aliiev <rail@iqchoice.com>
Co-authored-by: arulajmani <arulajmani@gmail.com>
@rytaft rytaft removed their assignment May 22, 2022
@mgartner
Copy link
Collaborator

@rytaft I believe we can close this now, correct?

@rytaft
Copy link
Collaborator

rytaft commented Jun 22, 2022

This part of the issue description is still not implemented:

We've mainly discussed the case where the first join is a lookup join, though if there are actual customer use cases, one could imaging extending this to hash and merge joins as the first join (I do not know how difficult the optimizer changes would be). Some made-up examples where x, y are column names from the left/right side:

  • Do merge join, lookup join pair when the expression is x1 = y1 and x3 > y3 and the chosen indexes for the merge join are (x1, x3), (y1), because x1 = y1 is the part of the expression with very low selectivity.
  • Do hash join, lookup join pair when the expression is x1 = y1 and x3 > y3 and x2 = a and y2 = b and the chosen indexes for the hash join are (x2, x3, x1), (y2, y1), and the hash join on these indexes is chosen because x2 = a, y2 = b have low selectivity.

Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

4 participants