gen4: Add hash join primitive and planning #9140

Merged: 40 commits, merged Nov 22, 2021


@systay (Collaborator) commented Nov 4, 2021

Description

This PR adds hash joins to the alternatives that the gen4 planner can use when planning queries.

The current join algorithm is a nested loop join, also known as an Apply Join. It runs the query on the RHS of the join once for every row on the LHS, so the complexity is O(n*m), where n and m are the number of rows on the LHS and RHS. A hash join runs each side's query only once, so the complexity is O(n+m).
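A minimal in-memory sketch of the two strategies, for illustration only. It is not the Vitess engine code: the `Row` type and both functions are hypothetical, and in Vitess the inner loop of the apply join would be one RHS query per LHS row rather than a scan over a slice. The hash join here builds its table on the LHS and probes with the RHS, which is one common choice.

```go
package main

import "fmt"

// Row is a hypothetical, simplified row: a join key plus one payload column.
type Row struct {
	Key   int
	Value string
}

// nestedLoopJoin stands in for an apply join: for every LHS row it goes over
// the whole RHS (in Vitess, that would be one RHS query per LHS row), so the
// work grows as len(lhs)*len(rhs).
func nestedLoopJoin(lhs, rhs []Row) [][2]Row {
	var out [][2]Row
	for _, l := range lhs {
		for _, r := range rhs {
			if l.Key == r.Key {
				out = append(out, [2]Row{l, r})
			}
		}
	}
	return out
}

// hashJoin reads each input once: it builds a hash table keyed on the LHS
// join column, then probes it with every RHS row. Expected work is
// O(len(lhs)+len(rhs)) plus the size of the output.
func hashJoin(lhs, rhs []Row) [][2]Row {
	table := make(map[int][]Row, len(lhs))
	for _, l := range lhs {
		table[l.Key] = append(table[l.Key], l)
	}
	var out [][2]Row
	for _, r := range rhs {
		for _, l := range table[r.Key] {
			out = append(out, [2]Row{l, r})
		}
	}
	return out
}

func main() {
	lhs := []Row{{1, "a"}, {2, "b"}, {2, "c"}}
	rhs := []Row{{2, "x"}, {3, "y"}, {1, "z"}}
	fmt.Println(len(nestedLoopJoin(lhs, rhs)), len(hashJoin(lhs, rhs))) // 3 3
}
```

Both produce the same matching pairs; only the output order differs (LHS-major for the nested loop, RHS-major for the hash join), which is why the sketch compares counts.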

Related Issue(s)

#7280

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

systay and others added 6 commits November 4, 2021 08:34
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr>
Signed-off-by: Manan Gupta <manan@planetscale.com>
@systay changed the title from "Hash Join" to "gen4: Add hash join primitive and planning" on Nov 9, 2021
Signed-off-by: Andres Taylor <andres@planetscale.com>
@systay marked this pull request as ready for review on November 9, 2021 09:56
@systay added the "Component: Query Serving", "release notes", and "Type: Enhancement" labels on Nov 9, 2021
Signed-off-by: Andres Taylor <andres@planetscale.com>
Resolved review threads:
  • go/vt/vtgate/planbuilder/jointree.go (outdated)
  • go/vt/vtgate/planbuilder/hash_join.go
  • go/vt/vtgate/planbuilder/jointree.go (outdated)
Signed-off-by: Andres Taylor <andres@planetscale.com>
systay and others added 8 commits November 15, 2021 10:31
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr>
@systay marked this pull request as draft on November 16, 2021 18:28
@systay (Collaborator, Author) commented Nov 16, 2021:

We should hide this behind a planner hint until we are comfortable with it being used.

Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr>
@frouioui (Member) commented:

> We should hide this behind a planner hint until we are comfortable with it being used

We can now use the ALLOW_HASH_JOIN directive to allow hash joins for a query.
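A minimal sketch of how a per-query directive can be detected, assuming ALLOW_HASH_JOIN is written inside a Vitess `/*vt+ ... */` query comment the way other Vitess query directives are. The `hasDirective` helper is hypothetical and is not the sqlparser API changed in this PR; it needs Go 1.18+ for `strings.Cut`.

```go
package main

import (
	"fmt"
	"strings"
)

// hasDirective reports whether sql contains a /*vt+ ... */ comment that lists
// the directive name, either bare (NAME) or with a value (NAME=value).
// Hypothetical helper for illustration; it is not the Vitess sqlparser API.
func hasDirective(sql, name string) bool {
	const open = "/*vt+"
	rest := sql
	for {
		start := strings.Index(rest, open)
		if start < 0 {
			return false
		}
		rest = rest[start+len(open):]
		end := strings.Index(rest, "*/")
		if end < 0 {
			return false // unterminated comment
		}
		for _, field := range strings.Fields(rest[:end]) {
			// Compare only the name part of NAME=value fields.
			key, _, _ := strings.Cut(field, "=")
			if key == name {
				return true
			}
		}
		rest = rest[end+len("*/"):]
	}
}

func main() {
	q := "select /*vt+ ALLOW_HASH_JOIN */ u.id, m.id from user u join music m on u.col = m.col"
	fmt.Println(hasDirective(q, "ALLOW_HASH_JOIN"))                     // true
	fmt.Println(hasDirective("select 1 from dual", "ALLOW_HASH_JOIN")) // false
}
```

Keeping the hint opt-in per query means the planner only considers a hash join where a query explicitly asks for it, which matches the goal of hiding the feature until it is trusted.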

@systay marked this pull request as ready for review on November 17, 2021 08:08
frouioui and others added 2 commits November 17, 2021 09:09
Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Resolved review threads:
  • go/vt/vtgate/planbuilder/querytree_transformers.go (outdated)
  • go/vt/vtgate/semantics/semantic_state.go
  • go/vt/sqlparser/comments.go (outdated)
  • go/vt/vtgate/planbuilder/testdata/filter_cases.txt (outdated)
  • go/vt/vtgate/engine/join.go
  • go/vt/vtgate/planbuilder/querytree_transformers.go (outdated, 4 threads)
@harshit-gangal (Member) left a comment:

I would be interested to see 5-6 different kinds of queries in the end-to-end test.

Resolved review thread: go/vt/vtgate/engine/hash_join.go (outdated)
vmg and others added 3 commits November 17, 2021 13:12
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Comment on lines +224 to +228
typ, err := CoerceTo(v1.Type(), v2.Type()) // TODO systay we should add a method where this decision is done at plantime
if err != nil {
	return 0, err
}
v1cast, err := castTo(v1, typ)
Review comment from a Member:

Based on the TODO, what could at least be done is to pass the CoerceTo result directly to the NullsafeCompare method, so that it is calculated once in the engine.
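A minimal sketch of the suggestion: decide the comparison type once when the comparer is built, then reuse it for every row pair. Everything here is hypothetical (`Type`, `coerceTo`, `comparer`, and the "any float side means float" rule); it is not the Vitess evalengine `CoerceTo`/`NullsafeCompare` code and it omits NULL handling.

```go
package main

import (
	"fmt"
	"strconv"
)

// Type is a hypothetical, simplified column type.
type Type int

const (
	Int64 Type = iota
	Float64
)

// coerceTo picks the type both sides are compared as. The rule used here
// (any float side makes the comparison float) is an assumption.
func coerceTo(a, b Type) Type {
	if a == Float64 || b == Float64 {
		return Float64
	}
	return Int64
}

// comparer holds the coercion decision, computed once when it is built
// instead of once per compared row pair.
type comparer struct {
	typ Type
}

func newComparer(lhs, rhs Type) comparer {
	return comparer{typ: coerceTo(lhs, rhs)}
}

// order maps a pair of outcomes to -1, 0 or 1.
func order(less, greater bool) int {
	switch {
	case less:
		return -1
	case greater:
		return 1
	}
	return 0
}

// compare parses both raw values as c.typ and returns -1, 0 or 1.
func (c comparer) compare(v1, v2 string) (int, error) {
	if c.typ == Int64 {
		a, err := strconv.ParseInt(v1, 10, 64)
		if err != nil {
			return 0, err
		}
		b, err := strconv.ParseInt(v2, 10, 64)
		if err != nil {
			return 0, err
		}
		return order(a < b, a > b), nil
	}
	a, err := strconv.ParseFloat(v1, 64)
	if err != nil {
		return 0, err
	}
	b, err := strconv.ParseFloat(v2, 64)
	if err != nil {
		return 0, err
	}
	return order(a < b, a > b), nil
}

func main() {
	ints := newComparer(Int64, Int64) // decided once, reused for every row
	r, _ := ints.compare("2", "10")
	fmt.Println(r) // -1: numeric, not lexical, order

	mixed := newComparer(Int64, Float64)
	r, _ = mixed.compare("3", "2.5")
	fmt.Println(r) // 1
}
```

In a join, the column types are known before any rows are compared, so `newComparer` would run once per join column and only `compare` would sit on the per-row path.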

Signed-off-by: Andres Taylor <andres@planetscale.com>
@@ -28,37 +28,37 @@ var _ Primitive = (*Distinct)(nil)

 // Distinct Primitive is used to uniqueify results
 type Distinct struct {
-	Source Primitive
+	Source        Primitive
+	ColCollations []collations.ID
Review comment from a Contributor:

We could use a map here, which would simplify access and make it easy to prevent index-out-of-range errors.

Reply from the Collaborator (PR author):

When executing things at runtime, we want them to be as fast as possible. A map lookup is actually slower than accessing a position by offset in a slice, so I'll keep the collations as a slice here.
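A minimal sketch of one way to keep per-column collations in a slice indexed by column offset while still guarding against an out-of-range offset. The names (`CollationID`, `Binary`, `CaseInsensitive`, `distinct`, `collationFor`) are hypothetical, case-insensitivity is approximated with `strings.ToLower`, and this is not the `Distinct` primitive from the PR; it makes no claim about measured speed.

```go
package main

import (
	"fmt"
	"strings"
)

// CollationID is a hypothetical stand-in for a per-column collation.
type CollationID int

const (
	Binary          CollationID = iota // compare bytes as-is
	CaseInsensitive                    // compare after strings.ToLower
)

// distinct keeps per-column collations in a slice indexed by column offset,
// so each lookup is a bounds check plus a load, with no hashing.
type distinct struct {
	colCollations []CollationID
}

// collationFor guards the index: offsets past the end fall back to Binary
// instead of panicking with an index-out-of-range error.
func (d *distinct) collationFor(col int) CollationID {
	if col >= 0 && col < len(d.colCollations) {
		return d.colCollations[col]
	}
	return Binary
}

// key normalizes each column according to its collation and renders the row
// as an unambiguous string (%q quotes every element).
func (d *distinct) key(row []string) string {
	parts := make([]string, len(row))
	for i, v := range row {
		if d.collationFor(i) == CaseInsensitive {
			v = strings.ToLower(v)
		}
		parts[i] = v
	}
	return fmt.Sprintf("%q", parts)
}

// filter returns the rows whose collation-aware key has not been seen yet.
func (d *distinct) filter(rows [][]string) [][]string {
	seen := make(map[string]bool)
	var out [][]string
	for _, row := range rows {
		k := d.key(row)
		if !seen[k] {
			seen[k] = true
			out = append(out, row)
		}
	}
	return out
}

func main() {
	// Column 0 is case-insensitive; column 1 has no entry and falls back to Binary.
	d := &distinct{colCollations: []CollationID{CaseInsensitive}}
	rows := [][]string{{"Alice", "x"}, {"ALICE", "x"}, {"alice", "X"}, {"Bob", "x"}}
	fmt.Println(len(d.filter(rows))) // 3
}
```

The slice keeps the per-row lookup cheap, while the small accessor addresses the out-of-range concern raised in the review without switching to a map.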

Resolved review thread: go/vt/vtgate/engine/distinct.go
Signed-off-by: Andres Taylor <andres@planetscale.com>
…y on the hint for now

Signed-off-by: Andres Taylor <andres@planetscale.com>
@systay merged commit ae58d49 into vitessio:main on Nov 22, 2021
@systay deleted the hash-join branch on November 22, 2021 08:23
Labels: Component: Query Serving; Type: Enhancement (logical improvement, somewhere between a bug and a feature)

6 participants