Add sql.defaults.experimental_vectorized setting to experimental features list #4953

Closed
jseldess opened this issue Jun 20, 2019 · 4 comments · Fixed by #5026
Labels
C-doc-improvement O-support Internal source: Support P-1 High priority; must be done this release

Comments

@jseldess
Contributor

sql.defaults.experimental_vectorize                      | off                     | e            | default experimental_vectorize mode [off = 0, on = 1, always = 2]
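
The row above is one line of cluster-settings output: setting name, current value, type (`e` appears to mark an enum setting), and a description whose bracketed list gives the integer encoding of each enum value. As a minimal sketch, assuming the row is pipe-separated exactly as shown, the Python below splits the columns and recovers that mapping. The function name `parse_setting_row` is hypothetical, not part of any CockroachDB tooling.

```python
import re

# The settings row from the issue, with the column padding trimmed.
ROW = ("sql.defaults.experimental_vectorize | off | e | "
       "default experimental_vectorize mode [off = 0, on = 1, always = 2]")


def parse_setting_row(row):
    """Split a pipe-separated settings row and extract any enum encoding."""
    name, value, setting_type, description = (col.strip() for col in row.split("|"))
    enum_values = {}
    bracket = re.search(r"\[(.*)\]", description)
    if setting_type == "e" and bracket:
        # "off = 0, on = 1, always = 2" -> {"off": 0, "on": 1, "always": 2}
        for item in bracket.group(1).split(","):
            label, number = item.split("=")
            enum_values[label.strip()] = int(number)
    return {"name": name, "value": value, "type": setting_type,
            "description": description, "enum": enum_values}


setting = parse_setting_row(ROW)
print(setting["value"], setting["enum"][setting["value"]])
```

Looking up the current value in the recovered mapping shows that `off` corresponds to `0` in the encoding the description advertises.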

A customer asked about the dangers of using this setting in production, and the risks aren't clear from our docs. We should add it to https://www.cockroachlabs.com/docs/dev/experimental-features.html with some details.

I answered:

In the meantime, we just posted a blog post on vectorizing the merge joiner that might add some clarity: https://www.cockroachlabs.com/blog/vectorizing-the-merge-joiner-in-cockroachdb/

I think the experimental status pertains to the underlying vectorization engine more generally. If you like, you can track the remaining work in this GitHub issue: cockroachdb/cockroach#36507

I see this at the end of the post:

the bottleneck becomes not the joiner itself but the speed at which data can be read from the disk. This is still important as a more optimized merge joiner can free up CPU cycles to be used elsewhere, on another query perhaps.

jseldess added the A-sql, P-1, O-support, and C-doc-improvement labels Jun 20, 2019
jseldess added this to the 19.2 milestone Jun 20, 2019
@jseldess
Contributor Author

The experimental status is mostly due to the fact that vectorization is still under-tested on our end in general.

@asubiotto also called out that we don't yet monitor memory usage or spill to disk for large joins, so those queries can potentially crash a server, and that we don't yet distribute joins, which increases the chance of this problem.

@asubiotto
Contributor

Note that distribution of joins should make it in within the next couple of days (cockroachdb/cockroach#38233), so it shouldn't be called out as a limitation.

rmloveland added a commit that referenced this issue Jul 8, 2019
... along with some context about what the feature is, how it works (by
linking to George's blog post), and a list of current limitations.

Fixes #4953.
rmloveland added a commit that referenced this issue Jul 9, 2019
rmloveland added a commit that referenced this issue Jul 18, 2019
@yuzefovich
Member

A quick note: we have just merged a PR (cockroachdb/cockroach#38777) that renamed experimental_vectorize to just vectorize. It also added a new possible mode of vectorization, and now all modes are:

  • off - the vectorized engine is disabled; all queries go through the row execution engine.
  • auto (now the default) - queries consisting only of streaming operators (operators that don't require any buffering) are executed through the vectorized engine, whereas all others are run through the row execution engine.
  • experimental_on - all queries that the vectorized engine supports (both streaming and non-streaming) are run through it. It is "experimental" because we still don't have disk spilling, which means that large queries can hit an out-of-memory error and crash the node.
  • experimental_always - every query is forced to run through the vectorized engine; if the engine doesn't support a query, the query errors out. The only exception is SET statements, so that the vectorize setting can still be changed.
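
To make the four modes concrete, here is a minimal Python sketch of the routing decision they describe. It models the behavior listed above rather than reproducing CockroachDB's code: the `Operator` type with its `streaming` and `supported` flags, the function `choose_engine`, the exception name, and the operator names in the usage line are all hypothetical. Requiring every operator to be supported in `auto` mode is an assumption, and returning `"row"` for a SET statement under `experimental_always` is a simplification meaning "not forced through the vectorized engine".

```python
from dataclasses import dataclass
from enum import Enum


class VectorizeMode(Enum):
    OFF = "off"
    AUTO = "auto"
    EXPERIMENTAL_ON = "experimental_on"
    EXPERIMENTAL_ALWAYS = "experimental_always"


@dataclass(frozen=True)
class Operator:
    name: str
    streaming: bool  # True if the operator never needs to buffer its input
    supported: bool  # True if the vectorized engine implements this operator


class UnsupportedInVectorizedEngine(Exception):
    """Raised in experimental_always mode when a query cannot be vectorized."""


def choose_engine(mode, operators, is_set_statement=False):
    """Return "vectorized" or "row" for a query plan made of `operators`."""
    all_supported = all(op.supported for op in operators)
    if mode is VectorizeMode.OFF:
        return "row"
    if mode is VectorizeMode.AUTO:
        # Only plans made entirely of streaming (non-buffering) operators.
        all_streaming = all(op.streaming for op in operators)
        return "vectorized" if all_streaming and all_supported else "row"
    if mode is VectorizeMode.EXPERIMENTAL_ON:
        # Buffering operators are vectorized too, which is where the lack of
        # disk spilling can lead to out-of-memory errors.
        return "vectorized" if all_supported else "row"
    if mode is VectorizeMode.EXPERIMENTAL_ALWAYS:
        if is_set_statement:
            # Exempt so the vectorize setting can still be changed (simplified as "row").
            return "row"
        if not all_supported:
            unsupported = [op.name for op in operators if not op.supported]
            raise UnsupportedInVectorizedEngine(f"unsupported operators: {unsupported}")
        return "vectorized"
    raise ValueError(f"unknown mode: {mode!r}")


print(choose_engine(VectorizeMode("auto"), [Operator("scan", True, True)]))
```

The design point the sketch tries to capture is that `auto` and `experimental_on` differ only in whether buffering operators are allowed, while `experimental_always` turns an unsupported plan into an error instead of a fallback.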

@rmloveland
Contributor

Thanks! I just created #5141 to address this.
