SQL-2622: Implement multi-column COUNT #28

mattChiaravalloti · 2025-02-19T17:48:34Z

This PR adds support for multi-column COUNT, for both distinct and non-distinct COUNT. It also updates the desugarer to properly handle single-column COUNT when that column MAY or MUST be a document, and to avoid conditionally checking literal values. Finally, it adds some new exprs and MQL functions to the air, since the desugaring needs them ($map, $objectToArray, $allElementsTrue, etc.; you'll see them when you review).

This ended up being a bit larger than originally anticipated mainly because, in addition to implementing multi-column COUNT, I covered the other things described above. These changes touch a lot of files but the core of the new work is still in the accumulators.rs desugarer, the query spec tests, and a new rewriter pass.

mattChiaravalloti · 2025-02-19T17:51:25Z

agg-ast/ast/src/definitions.rs

@@ -440,6 +440,7 @@ pub enum GroupAccumulatorExpr {
    SQLAccumulator {
        distinct: bool,
        var: Box<Expression>,
+        arg_is_possibly_doc: Option<String>,


Since the $sql-prefixed accumulators are not "real" MQL syntax, I feel comfortable adding this for testing convenience. Although this file appears first in the diff, I advise you not to review it first. Consider starting with the spec tests and the desugarer tests and working out from there.

mattChiaravalloti · 2025-02-19T17:52:04Z

mongosql/src/air/agg_ast/ast_definitions.rs

+                            let arg_is_possibly_doc = match arg_is_possibly_doc {
+                                Some(s) if s.to_lowercase() == "must" => Satisfaction::Must,
+                                Some(s) if s.to_lowercase() == "may" => Satisfaction::May,
+                                _ => Satisfaction::Not,


This is just for testing, so I think it is ok to play a little fast and loose and use a wildcard.
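For readers skimming the thread, the wildcard parse above can be sketched in isolation. This is a minimal sketch, not the PR's code: the `Satisfaction` enum here is a hypothetical stand-in with only the three variants named in the hunk, and `parse_possibly_doc` is an illustrative helper name.

```rust
/// Hypothetical stand-in for the Satisfaction type referenced in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Satisfaction {
    Not,
    May,
    Must,
}

/// Illustrative helper (name assumed): case-insensitive parse of the test-only
/// field. Anything unrecognized, including an absent value, falls to `Not`.
fn parse_possibly_doc(arg: Option<&str>) -> Satisfaction {
    match arg.map(str::to_lowercase).as_deref() {
        Some("must") => Satisfaction::Must,
        Some("may") => Satisfaction::May,
        // The wildcard also swallows typos such as "mst", which is the
        // "fast and loose" trade-off accepted for test-only syntax.
        _ => Satisfaction::Not,
    }
}
```

The cost of the wildcard is that a misspelled value in a test file silently becomes `Not` rather than failing to parse.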

mattChiaravalloti · 2025-02-19T17:58:58Z

mongosql/src/air/desugarer/accumulators.rs

+            // No need to create a conditional if the argument is a literal value.
+            // We know null will result in the then case and non-null will result in the else.
+            Literal(LiteralValue::Null) => then,
+            Literal(_) => r#else,


This is just a little drive-by optimization that helps avoid things like desugaring COUNT(1) into {$cond: [{$in: [{$type: {$literal: 1}}, ["missing", "null"]]}, 0, 1]}. There's really no need to use the conditional when we statically know the value is not null or missing.

There is also an obvious future extension here: deciding at the mir level whether we need null-checking semantics at all. That's for another time.
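The literal short-circuit can be sketched on a toy expression type. This is a sketch under stated assumptions: `Expr`, `IsNullish`, and `count_contribution` are hypothetical names for illustration and do not mirror the real air types.

```rust
/// Toy expression type (hypothetical; the real air is much richer).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Null,
    Int(i64),
    FieldRef(String),
    /// Placeholder for the "type is missing or null" check.
    IsNullish(Box<Expr>),
    /// Cond(condition, then, else)
    Cond(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Per-row contribution of a non-distinct COUNT argument: 0 when the
/// argument is nullish, 1 otherwise.
fn count_contribution(arg: Expr) -> Expr {
    let then = Expr::Int(0);
    let r#else = Expr::Int(1);
    match arg {
        // A literal's nullness is known statically, so no conditional is built.
        Expr::Null => then,
        Expr::Int(_) => r#else,
        // Everything else still needs the runtime check.
        other => Expr::Cond(
            Box::new(Expr::IsNullish(Box::new(other))),
            Box::new(then),
            Box::new(r#else),
        ),
    }
}
```

With this shape, a COUNT(1)-style argument reduces to the constant 1 per row, while a field reference keeps the conditional.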

mattChiaravalloti · 2025-02-19T18:01:53Z

mongosql/src/air/desugarer/testdata/desugar_accumulators.yml

+                    "$cond":
+                      [
+                        {
+                          "$or":


I flipped this from a conjunction of negative conditions to a disjunction of positive ones, swapping the branches accordingly. This just helped line up the single- and multiple-column cases more easily, especially now that we account for single columns that may be documents.
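The flip is an application of De Morgan's law with the two branches swapped. The sketch below is illustrative only: it assumes the two conditions are "is missing" and "is null", which the hunk above does not show, and the function names are made up.

```rust
/// Old shape (assumed): count the value only if it is neither missing nor null.
fn conjunction_of_negatives(is_missing: bool, is_null: bool) -> i32 {
    if !is_missing && !is_null { 1 } else { 0 }
}

/// New shape (assumed): skip the value if it is missing or null, else count it.
fn disjunction_of_positives(is_missing: bool, is_null: bool) -> i32 {
    if is_missing || is_null { 0 } else { 1 }
}
```

Both functions agree on every input, so the change affects only how the expected pipeline reads, not what it computes.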

mattChiaravalloti · 2025-02-19T18:03:08Z

mongosql/src/algebrizer/test.rs

@@ -5177,6 +5177,7 @@ mod aggregation {
                function: mir::AggregationFunction::Count,
                distinct: false,
                arg: mir::Expression::Literal(mir::LiteralValue::Integer(42)).into(),
+                arg_is_possibly_doc: Satisfaction::Not,


This value is irrelevant to tests in most of the codebase. As noted where the field is defined, it only matters at desugar time, which is why you keep seeing it set to Not in most unit tests.

mattChiaravalloti · 2025-02-19T18:06:05Z

mongosql/src/ast/rewrites/aggregate.rs

@@ -165,6 +174,59 @@ impl Visitor for AggregateUsageCheckVisitor {
    }
 }

+#[derive(Default)]
+pub struct MultiArgCountVisitor {


This was not in the design doc. The algebrizer rejects aggregations with multiple arguments, and the mir and air only support single-argument accumulator exprs. I figured the smallest change was to syntactically rewrite multi-column COUNTs into COUNT(<doc>), thus making it an aggregation with a single argument. The previous ticket already enabled COUNT with document arguments so this change requires no further work down the line except desugarer updates.
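The shape of that syntactic rewrite can be sketched on a toy AST. This is a rough sketch, not the actual `MultiArgCountVisitor`: the types and the function name are hypothetical, and using each identifier's name as the document key is an assumption based on the `key` binding visible in the diff.

```rust
/// Toy AST (hypothetical; the real ast::Expression has many more variants).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Identifier(String),
    Integer(i64),
    /// Document literal as ordered (key, value) pairs.
    Document(Vec<(String, Expr)>),
}

#[derive(Debug, Clone, PartialEq)]
struct CountCall {
    distinct: bool,
    args: Vec<Expr>,
}

#[derive(Debug, PartialEq)]
enum RewriteError {
    InvalidMultiArgCountArg,
}

/// Rewrites COUNT(a, b, ...) into COUNT({a: a, b: b, ...}) so later stages
/// only ever see a single-argument aggregation.
fn rewrite_multi_arg_count(call: CountCall) -> Result<CountCall, RewriteError> {
    if call.args.len() <= 1 {
        return Ok(call); // nothing to rewrite
    }
    let fields = call
        .args
        .into_iter()
        .map(|arg| match arg {
            // Assumption: the field name doubles as the document key.
            Expr::Identifier(key) => Ok((key.clone(), Expr::Identifier(key))),
            // Only plain field references are accepted as multiple arguments.
            _ => Err(RewriteError::InvalidMultiArgCountArg),
        })
        .collect::<Result<Vec<_>, _>>()?;
    Ok(CountCall {
        distinct: call.distinct,
        args: vec![Expr::Document(fields)],
    })
}
```

Because the output is an ordinary single-argument COUNT, the algebrizer, mir, and air in this sketch's model need no multi-argument awareness.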

mattChiaravalloti · 2025-02-19T18:07:34Z

mongosql/src/ast/rewrites/aggregate.rs

+                        .into_iter()
+                        .map(|arg| {
+                            let ast::Expression::Identifier(key) = arg.clone() else {
+                                self.error = Some(Error::InvalidMultiArgCountArg);


I decided that, for now, we will only support field references as the multiple arguments. It is doubtful anyone will want to do something like COUNT(a+1, CASE b WHEN 7 THEN 'yes' ELSE c END, x). If someone requests this, we can support it then. For now, this is a very convenient restriction.

Yeah that's a good call, no need to add too much complexity if it's not requested.

mattChiaravalloti · 2025-02-19T18:08:22Z

tests/spec_tests/query_tests/group_by.yml

@@ -9,7 +9,7 @@ catalog_data:

    multi:
      - { "_id": 1, "a": 1, "b": 2, "c": 1 }
-      - { "_id": 2, "a": 2, "b": 2, "c": 2 }
+      - { "_id": 2, "a": 2, "b": 2, "c": 3 }


Made this change for the sake of DISTINCT testing.

pmeredit

LGTM! I really like the syntax rewrite here, and now I'm pretty convinced that is how we should handle the new Unwind support, also.

Although, we don't have schema info so we might end up with extra unwinds if we go that way.

bucaojit

The changes look great and the tests are thorough.
I had one question about supporting SELECT COUNT(DISTINCT col,...).

bucaojit · 2025-02-21T21:45:12Z

mongosql/src/algebrizer/definitions.rs

@@ -1106,15 +1107,18 @@ impl<'a> Algebrizer<'a> {
                return Err(Error::StarInNonCount);
            }
            ast::FunctionArguments::Args(ve) => {
+                let arg = if ve.len() != 1 {


I'm testing out the change and I see that it is still throwing an error if I execute the query:

SELECT COUNT(DISTINCT a,b) from multi

I see that we support the SELECT COUNT(DISTINCT *) so I think we'd also want to support the multi-column here by having a separate case for the Count function to allow multiple arguments.

Oh that's surprising! I wonder if it's a rewriter order thing 🤔 Like we do the accumulator rewrite before the select clause one. This spec test obviously passes:

- description: COUNT distinct multi column correctness test -- only counts rows with at least one non-nullish value
  query: "SELECT * FROM foo.multi AS m GROUP BY a AS a AGGREGATE COUNT(DISTINCT b, c) AS gcount"

so I'll need to figure out what's causing the COUNT(<multi>) in the select position to fail! Great catch, thank you!
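For reference, the semantics named in that spec test description ("only counts rows with at least one non-nullish value") can be modeled in a few lines. This is a sketch under assumptions: null and missing are both modeled as `None`, column values are plain integers, and DISTINCT is taken to compare whole tuples. None of that is confirmed by the thread beyond the test description, and `count_multi` is a made-up name.

```rust
use std::collections::HashSet;

/// One row's values for the counted columns; None models null or missing.
type Row = Vec<Option<i64>>;

/// Reference model of multi-column COUNT for a single group.
fn count_multi(rows: &[Row], distinct: bool) -> usize {
    // Keep only rows with at least one non-nullish column.
    let counted = rows.iter().filter(|row| row.iter().any(Option::is_some));
    if distinct {
        // Assumption: distinctness is decided on the whole tuple.
        counted.collect::<HashSet<_>>().len()
    } else {
        counted.count()
    }
}
```

A model like this is only a cross-check for spec test expectations; it says nothing about why the select-position query fails.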

bucaojit · 2025-02-21T22:08:52Z

mongosql/src/air/desugarer/accumulators.rs

-            MQLSemanticOperator(MQLSemanticOperator {
-                op: MQLOperator::In,
+// Condition used to desugar a $sqlCount when the argument is not a document. This condition asserts
+// that the arg is null or missing.


Nice use of macros, that helped with keeping the code easy to understand 👍

bucaojit · 2025-02-21T22:13:32Z

mongosql/src/ast/rewrites/aggregate.rs

+                        .into_iter()
+                        .map(|arg| {
+                            let ast::Expression::Identifier(key) = arg.clone() else {
+                                self.error = Some(Error::InvalidMultiArgCountArg);


Yeah that's a good call, no need to add too much complexity if it's not requested.

mattChiaravalloti added 14 commits February 19, 2025 12:49

SQL-2622: Add COUNT multi-column spec tests and update all COUNT spec…

6245515

… tests

SQL-2622: Add allElementsTrue, map, and objectToArray to air, agg_ast…

fb70722

…, and codegen

SQL-2622: Update desugarer to support multi-column count

be7c0c4

SQL-2622: Update desugarer with comments

cd6e499

SQL-2622: Add unit tests for ast rewriter for multi-arg count

52233f7

SQL-2622: Implement ast rewriter for multi-arg count

56ceb9a

SQL-2622: Optimize desugarer to not use conditional when single arg i…

a1faf56

…s a literal value

SQL-2622: Introduce arg_is_possible_doc field to AggFunc

3426679

SQL-2622: Fix map codegen

67a20f4

SQL-2622: Update name and set value at algebrization time

ef1c550

SQL-2622: Introduce arg_is_possibly_doc field to air

e81e114

SQL-2622: Refactor is_possibly_doc from bool to Satisfaction value

dd80e4c

SQL-2622: Update desugarer to ensure count works for single doc colum…

24e4285

…n, single non-doc column, and multi column

SQL-2622: Fix lingering unit test and clippy mistakes

a4bf705

mattChiaravalloti force-pushed the SQL-2622 branch from c4839a9 to a4bf705 Compare February 19, 2025 17:50

mattChiaravalloti commented Feb 19, 2025

mattChiaravalloti requested review from pmeredit and bucaojit February 19, 2025 18:11

pmeredit approved these changes Feb 21, 2025

bucaojit requested changes Feb 21, 2025
