physicalplan: add support for multi-stage execution of aggregate functions #58347

yuzefovich · 2020-12-29T18:34:52Z

mneverov · 2020-12-29T18:53:16Z

@yuzefovich, definitely, that would be awesome to work on this, thanks!
Although, from the conversation:

This doesn't look right. I think this entire thing only makes sense for single-arg operators anyway,

I got the impression that it is impossible to have a merge operation for multi-arg operators.

I'll take a look next year, so if anyone can start working immediately - please go ahead.

yuzefovich · 2020-12-29T19:00:57Z

I think that conversation is not directly applicable to this issue. Here we are deciding how to execute a particular logical plan (that is, after the optimizer has already chosen the plan), whereas that conversation was about "transformations" of the logical plan to get an equivalent one with a supposedly lower cost.

Possibly my use of the word "decomposition" is a bit misleading, since AggregatesCanMerge also talks about "decomposition".

mneverov · 2020-12-29T19:12:53Z

Ok, need to dig into the code :).

piao88 · 2020-12-30T01:53:52Z

Hi @yuzefovich
I'm new to the community and would like to pick this up.

mneverov · 2021-01-05T11:41:13Z

hi @piao88, have you already started working on this? May I take it?

piao88 · 2021-01-08T01:21:34Z

hi @mneverov , I have already started.

yuzefovich · 2021-01-10T22:10:02Z

@piao88 @mneverov I think this issue is big enough for both of you to work on it, just make sure to communicate which functions you are working on.

piao88 · 2021-01-11T10:16:25Z

hi @mneverov @yuzefovich I have already finished var_pop and stddev_pop

henrjan · 2021-03-17T17:24:52Z

hi, is this issue still ongoing? can i work on this?

yuzefovich · 2021-03-17T18:06:56Z

Hi @henrjan, thanks for the interest! Let's check in with @piao88, who has already opened a PR (#59174) that addresses this issue.

@piao88 are you still interested in pushing #59174 forward or would you be ok if someone took over this issue and possibly your PR?

piao88 · 2021-03-18T01:25:30Z

Hi @henrjan, of course you can.

alimoosavi · 2021-11-18T19:03:35Z

@yuzefovich
I am new to the cockroachdb project and this issue looks attractive to me. Please assign it to me.

yuzefovich · 2021-11-18T19:08:37Z

@mneverov are you still interested in working on this issue? Looks like @piao88 made some progress in #59174, but that PR seems abandoned, so I'd assign the issue to @mneverov if they are still interested.

piao88 · 2021-11-19T02:15:58Z

@yuzefovich Sorry, I am no longer interested in this issue, you can give it to someone else.

mneverov · 2021-11-20T14:30:37Z

Thanks @yuzefovich, I'll take a look.

See cockroachdb#58347. Release note (performance improvement): `covar_pop` aggregate function is now evaluated more efficiently in a distributed setting

73062: physicalplan: add support for multi-stage execution of covar_pop. r=yuzefovich a=mneverov physicalplan: add support for multi-stage execution of covar_pop. See #58347. Release note (performance improvement): `covar_pop` aggregate function is now evaluated more efficiently in a distributed setting. Co-authored-by: Max Neverov <neverov.max@gmail.com>
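As a rough illustration of how a two-argument aggregate such as `covar_pop` can be split into stages, here is a Go sketch under stated assumptions. It is not taken from #73062. It assumes each node emits four partial sums and the final stage applies the textbook identity covar_pop = E[xy] - E[x]E[y]. The names `point`, `covarPartial`, `localCovar`, `add`, and `finalCovarPop` are hypothetical, and the sum-based formula is chosen for brevity rather than numerical stability.

```go
package main

import "fmt"

// point is one input row; the field order mirrors covar_pop(y, x).
type point struct{ y, x float64 }

// covarPartial is a hypothetical per-node intermediate state (an
// assumption for this sketch): the row count and three running sums.
type covarPartial struct {
	n, sumX, sumY, sumXY float64
}

// localCovar is the local stage: it scans only the rows held by one node.
func localCovar(rows []point) covarPartial {
	var p covarPartial
	for _, r := range rows {
		p.n++
		p.sumX += r.x
		p.sumY += r.y
		p.sumXY += r.x * r.y
	}
	return p
}

// add merges two partial states; plain sums are associative, so the
// order in which nodes report does not matter.
func (a covarPartial) add(b covarPartial) covarPartial {
	return covarPartial{
		n:     a.n + b.n,
		sumX:  a.sumX + b.sumX,
		sumY:  a.sumY + b.sumY,
		sumXY: a.sumXY + b.sumXY,
	}
}

// finalCovarPop is the final stage. The bool is false when there were
// no rows, which SQL would report as NULL.
func finalCovarPop(p covarPartial) (float64, bool) {
	if p.n == 0 {
		return 0, false
	}
	return p.sumXY/p.n - (p.sumX/p.n)*(p.sumY/p.n), true
}

func main() {
	node1 := localCovar([]point{{1, 1}, {2, 2}})
	node2 := localCovar([]point{{3, 3}, {4, 4}})
	v, ok := finalCovarPop(node1.add(node2))
	fmt.Println(v, ok)
}
```

A production implementation would more likely merge centered co-moments, since subtracting two large sums can lose precision.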

…_sxy, regr_syy. See cockroachdb#58347. Release note (performance improvement): regr_sxx, regr_sxy, regr_syy aggregate functions are now evaluated more efficiently in a distributed setting.

75619: physicalplan: multi-stage for regr_sxx, regr_sxy, regr_syy r=yuzefovich a=mneverov physicalplan: add support for multi-stage execution of regr_sxx, regr_sxy, regr_syy. See #58347. Release note (performance improvement): regr_sxx, regr_sxy, regr_syy aggregate functions are now evaluated more efficiently in a distributed setting. Co-authored-by: Max Neverov <neverov.max@gmail.com>

…r_avgy. See cockroachdb#58347. Release note (performance improvement): regr_avgx and regr_avgy aggregate functions are now evaluated more efficiently in a distributed setting

…r_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions. See cockroachdb#58347. Release note (performance improvement): regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions are now evaluated more efficiently in a distributed setting

76007: physicalplan: add support for multi-stage execution of regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions. r=yuzefovich a=mneverov physicalplan: add support for multi-stage execution of regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions. See #58347. Release note (performance improvement): regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions are now evaluated more efficiently in a distributed setting Co-authored-by: Max Neverov <neverov.max@gmail.com>

sqrdiff, and regr_count aggregate functions. See cockroachdb#58347. Release note (performance improvement): corr, covar_samp, sqrdiff, and regr_count aggregate functions are now evaluated more efficiently in a distributed setting

sqrdiff, and regr_count aggregate functions. Fixes cockroachdb#58347. Release note (performance improvement): corr, covar_samp, sqrdiff, and regr_count aggregate functions are now evaluated more efficiently in a distributed setting

76754: physicalplan: add support for multi-stage execution of corr, covar_samp, sqrdiff, and regr_count aggregate functions. r=yuzefovich a=mneverov Fixes: #58347. Release note (performance improvement): corr, covar_samp, sqrdiff, and regr_count aggregate functions are now evaluated more efficiently in a distributed setting Co-authored-by: Max Neverov <neverov.max@gmail.com>

yuzefovich added the good first issue and meta-issue (contains a list of several other issues) labels Dec 29, 2020

blathers-crl bot added the C-enhancement label (solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception) Dec 29, 2020

cockroachdb deleted a comment from blathers-crl bot Dec 29, 2020

piao88 mentioned this issue Jan 20, 2021

physicalplan: add support for multi-stage execution of aggregate func… #59174

Closed

piao88 pushed a commit to piao88/cockroach that referenced this issue Jan 25, 2021

physicalplan: add support for multi-stage execution of aggregate func…

ab09c9b

…tions cockroachdb#58347

piao88 pushed a commit to piao88/cockroach that referenced this issue Jan 26, 2021

physicalplan: add support for multi-stage execution of aggregate func…

f311efd

…tions cockroachdb#58347

yuzefovich mentioned this issue Apr 30, 2021

sql: some queries in aggregate logic test take unreasonably long time #64455

Closed

jlinder added the T-sql-queries (SQL Queries Team) label Jun 16, 2021

yuzefovich assigned mneverov Nov 20, 2021

mneverov mentioned this issue Nov 22, 2021

physicalplan: add support for multi-stage execution of covar_pop. #73062

Merged

mneverov mentioned this issue Jan 27, 2022

physicalplan: multi-stage for regr_sxx, regr_sxy, regr_syy #75619

Merged

mneverov mentioned this issue Feb 4, 2022

physicalplan: add support for multi-stage execution of regr_avgx, regr_avgy, regr_intercept, regr_r2, and regr_slope aggregate functions. #76007

Merged

mneverov mentioned this issue Feb 17, 2022

physicalplan: add support for multi-stage execution of corr, covar_samp, sqrdiff, and regr_count aggregate functions. #76754

Merged

craig bot closed this as completed in 6fb88ef Feb 25, 2022

jlinder added the sync-me-3 label May 24, 2022

jlinder unassigned mneverov May 25, 2022

jlinder added the sync-me-4 label May 25, 2022
