-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
physicalplan: add support for multi-stage execution of aggregate functions #58347
Comments
@yuzefovich, definitely, that would be awesome to work on this, thanks!
I got the impression that it is impossible to have I'll take a look next year, so if anyone can start working immediately - please go ahead. |
I think that conversation is not directly applicable to this issue since here we are deciding on how to execute a particular logical plan (meaning after the optimizer has already chosen the plan) whereas that convo was about some "transformations" of the logical plan in order to get an equivalent one that has supposedly less cost. Possibly, my usage of a word "decomposition" is a bit misleading since |
Ok, need to dig into the code :). |
Hi @yuzefovich |
hi @piao88, have you already started working on this? May I take it? |
hi @mneverov , I have already started. |
hi @mneverov @yuzefovich I have already finished |
hi, is this issue still ongoing? can i work on this? |
Hi@henrjan ,Of course you can . |
@yuzefovich |
@yuzefovich Sorry, I am no longer interested in this issue, you can give it to someone else. |
thanks @yuzefovich , i'll take a look |
See cockroachdb#58347. Release note (performance improvement): `covar_pop` aggregate function is now evaluated more efficiently in a distributed setting
See cockroachdb#58347. Release note (performance improvement): `covar_pop` aggregate function is now evaluated more efficiently in a distributed setting.
See cockroachdb#58347. Release note (performance improvement): `covar_pop` aggregate function is now evaluated more efficiently in a distributed setting.
73062: physicalplan: add support for multi-stage execution of covar_pop. r=yuzefovich a=mneverov physicalplan: add support for multi-stage execution of covar_pop. See #58347. Release note (performance improvement): `covar_pop` aggregate function is now evaluated more efficiently in a distributed setting. Co-authored-by: Max Neverov <neverov.max@gmail.com>
See cockroachdb#58347. Release note (performance improvement): `covar_pop` aggregate function is now evaluated more efficiently in a distributed setting.
…_sxy, regr_syy. See cockroachdb#58347. Release note (performance improvement): regr_sxx, regr_sxy, regr_syy aggregate functions are now evaluated more efficiently in a distributed setting.
75619: physicalplan: multi-stage for regr_sxx, regr_sxy, regr_syy r=yuzefovich a=mneverov physicalplan: add support for multi-stage execution of regr_sxx, regr_sxy, regr_syy. See #58347. Release note (performance improvement): regr_sxx, regr_sxy, regr_syy aggregate functions are now evaluated more efficiently in a distributed setting. 75626: sql: skip TestDistSQLFlowsVirtualTables r=lidorcarmel a=lidorcarmel Refs: #75208 Reason: flaky test Generated by bin/skip-test. Release justification: non-production code changes Release note: None 75629: batcheval: ignore `MVCCStats.LastUpdateNanos` in `AddSSTable` assertions r=dt a=erikgrinaker Release note: None Co-authored-by: Max Neverov <neverov.max@gmail.com> Co-authored-by: Lidor Carmel <lidor@cockroachlabs.com> Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
…r_avgy. See cockroachdb#58347. Release note (performance improvement): regr_avgx and regr_avgy aggregate functions are now evaluated more efficiently in a distributed setting
…r_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions. See cockroachdb#58347. Release note (performance improvement): regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions are now evaluated more efficiently in a distributed setting
…r_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions. See cockroachdb#58347. Release note (performance improvement): regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions are now evaluated more efficiently in a distributed setting
75870: rfc: pgwire-compatible query cancellation r=otan,knz a=rafiss refs #41335 Release note: None 76007: physicalplan: add support for multi-stage execution of regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions. r=yuzefovich a=mneverov physicalplan: add support for multi-stage execution of regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions. See #58347. Release note (performance improvement): regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions are now evaluated more efficiently in a distributed setting 76115: sql: use 16 as default bucket count for hash index r=chengxiong-ruan a=chengxiong-ruan for some reason I forgot to modify it in my previous pr :( Release note (sql change): 16 is used as the default bucket count for hash sharded index. 76262: server: remove DeprecateBaseEncryptionRegistry r=jbowens a=jbowens Remove the migration server DeprecateBaseEncryptionRegistry RPC. It was used for the migration of the encryption-at-rest registry format in 21.2 and is no longer relevant. Missed this in #74314. Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Max Neverov <neverov.max@gmail.com> Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com> Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
sqrdiff, and regr_count aggregate functions. See cockroachdb#58347. Release note (performance improvement): corr, covar_samp, sqrdiff, and regr_count aggregate functions are now evaluated more efficiently in a distributed setting
sqrdiff, and regr_count aggregate functions. See cockroachdb#58347. Release note (performance improvement): corr, covar_samp, sqrdiff, and regr_count aggregate functions are now evaluated more efficiently in a distributed setting
sqrdiff, and regr_count aggregate functions. Fixes cockroachdb#58347. Release note (performance improvement): corr, covar_samp, sqrdiff, and regr_count aggregate functions are now evaluated more efficiently in a distributed setting
sqrdiff, and regr_count aggregate functions. Fixes cockroachdb#58347. Release note (performance improvement): corr, covar_samp, sqrdiff, and regr_count aggregate functions are now evaluated more efficiently in a distributed setting
…77019 #77045 #77047 #77049 72925: sql, cli: support basic auto complete for sql keywords r=rafiss a=RichardJCai sql: add SHOW COMPLETIONS AT offset FOR syntax Release note (sql change): Support SHOW COMPLETIONS AT OFFSET <offset> FOR <stmt> syntax that returns a set of SQL keywords that can complete the keyword at <offset> in the given <stmt>. If the offset is in the middle of a word, then it returns the full word. For example SHOW COMPLETIONS AT OFFSET 1 FOR "SELECT" returns select. cli: support autocomplete Release note (cli change): CLI now auto completes on tab by using `SHOW COMPLETIONS AT OFFSET`. 76539: cli: Enable profiles and other debug info for tenants r=rimadeodhar a=rimadeodhar This PR updates debug.zip functionality to collect goroutine stacks and profiles for all active SQL instances for a tenant. This PR also addresses a bug where the nodes.json and status.json data was not getting populated correctly due to the switch to the `NodesList` API. This bug has been addressed by using the p;d `Nodes` API when the debug zip command is run against a storage server. Release note: None 76676: telemetry,sql: remove redaction from operational sql data r=abarganier a=dhartunian Previously, when redaction was introduced into CRDB, all unidentifiable strings were marked as redacted since that was the safer approach. We expected to later return with a closer look and differentiate more carefully between what should be redacted and what shouldn't. SQL names have been identified as operational-sensitive data that should not be redacted since it provides very useful debugging information and, while user-controlled, do not typically contain user-data since those would be stored in a Datum. This commit marks names as safe from redaction for telemetry logging in cases where the `sql.telemetry.query_sampling.enabled` cluster setting is enabled. Additionally, some log tags such as client IP addresses are not to be considered sensitive and are critical to debugging operational issues. They have also been marked as safe. In order to help with documenting these cases, a helper `SafeOperational()` has been added to the `log` package. This helps us mark strings as safe while documenting *why* we're doing so. Resolves #76595 Release note (security update, ops change): When the `sql.telemetry.query_sampling.enabled` cluster setting is enabled, SQL names and client IPs are no longer redacted in telemetry logs. 76754: physicalplan: add support for multi-stage execution of corr, covar_samp, sqrdiff, and regr_count aggregate functions. r=yuzefovich a=mneverov Fixes: #58347. Release note (performance improvement): corr, covar_samp, sqrdiff, and regr_count aggregate functions are now evaluated more efficiently in a distributed setting 76908: roachtest: update 22.1 version map to v21.2.6 r=bananabrick a=bananabrick Release note: None 76948: spanconfigreconciler{ccl}: apply system span config diffs to the store r=arulajmani a=adityamaru This change teaches the reconciler about system span configs. Concretely, we make the following changes: - A full reconciliation when checking for existing span configurations now asks for SpanConfigs corresponding to the SystemTargets relevant to the tenant. For the host tenant this includes the SystemTarget for the `entire-keyspace` as well as the SystemTarget for span configs installed by the host tenant on its tenant keyspace, and on other secondary tenant keyspaces. For secondary tenants this only includes the SystemTarget for span configs installed by it on its own tenant keyspace. - During incremental reconciliation, before applying our updates to the Store, we now also check for "missing protected timestamp system targets". These correspond to protected timestamp records that target a `Cluster` or a `Tenant` but no longer exist in the system.protected_ts_records table as they have been released by the client. For every such unique missing system target we apply a spanconfig.Deletion to the Store. In order to make the above possible, this change moves the ptsStateReader from the `spanconfigsqltranslator` package, to the top level `spanconfig` package. Informs: #73727 Release note: None 76990: sql, geo: Fix upper case geohash parsing and allow NULL arguments r=otan a=RichardJCai Release note (sql change): ST_Box2DFromGeoHash now accepts NULL arguments, the precision being NULL is the same as no precision being passed in at all. Upper case characters are now parsed as lower case characters for geohash, this matches Postgis behaviour. Resolves #75537 77006: changefeedccl: Fix data race in a test. r=miretskiy a=miretskiy Fix data race in TestChangefeedSendError. Release Notes: None 77014: log: Set content type header for http sink r=dhartunian a=rimadeodhar The content type header for the output of HTTP log sink is always set to text/plain irrespective of the log format. If the log format is JSON, we should set the content type to be application/json. Release note (bug fix): The content type header for the HTTP log sink is set to application/json if the format of the log output is JSON. 77019: sql: update tpcc and tpch stats r=rharding6373 a=rharding6373 This change rewrites the stats for tpcc and tpch to include the new table statistic avg_size. Fixes: #72332 Release note: None 77045: bazel: bump `rules_go` to pick up cockroachdb/rules_go#4 r=irfansharif a=rickystewart Closes #77037. Release note: None 77047: dev: make sure we inherit `stdout` and `stderr` when appropriate r=postamar,irfansharif a=rickystewart Otherwise `bazel build`/`test` output will be ugly (won't be colored/ use `ncurses`/etc.) Release note: None 77049: sql: skip flaky schema_changer/drop_database_cascade test r=postamar a=RichardJCai Release note: None Co-authored-by: richardjcai <caioftherichard@gmail.com> Co-authored-by: rimadeodhar <rima@cockroachlabs.com> Co-authored-by: David Hartunian <davidh@cockroachlabs.com> Co-authored-by: Max Neverov <neverov.max@gmail.com> Co-authored-by: Arjun Nair <nair@cockroachlabs.com> Co-authored-by: Aditya Maru <adityamaru@gmail.com> Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com> Co-authored-by: rharding6373 <harding@cockroachlabs.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
…r_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions. See cockroachdb#58347. Release note (performance improvement): regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions are now evaluated more efficiently in a distributed setting
sqrdiff, and regr_count aggregate functions. Fixes cockroachdb#58347. Release note (performance improvement): corr, covar_samp, sqrdiff, and regr_count aggregate functions are now evaluated more efficiently in a distributed setting
Most aggregate functions support multi-stage distributed execution in which on every node we perform a local aggregation, then we route all the intermediate results to all of the nodes (according to the grouping columns), and then we perform a final aggregation that gives us the result. For some aggregate functions the decomposition into local and final stages is trivial - e.g. in order to calculate
sum
we need to perform localsum
and then the finalsum
; for other functions it is a bit more involved - e.g. in order to computeavg
we calculate partialsum
andcount
locally, then we finalizesum
andcount
at the final stage and we render the result asfinal_sum / final_count
.(I believe Postgres calls the multi-stage execution a partial mode.)
The decomposition of supported functions is defined here
cockroach/pkg/sql/physicalplan/aggregator_funcs.go
Line 87 in 9f678e7
We are currently missing the following functions:
sqrdiff
(I think it should be possible to decompose it)corr
var_pop
stddev_pop
covar_pop
covar_samp
regr_intercept
regr_r2
regr_slope
regr_sxx
regr_syy
regr_sxy
regr_count
regr_avgx
regr_avgy
avg
andvariance
are good examples of how more complicated decompositions are handled.cc @mneverov in case you're interested
Jira issue: CRDB-3407
The text was updated successfully, but these errors were encountered: