Aggregation filter using JOIN #4009

egor-ryashin · 2024-02-09T18:33:37Z

No description provided.

begelundmuller · 2024-02-13T11:19:47Z

runtime/queries/metricsview_aggregation.go

+	joinConditions := make([]string, 0, len(q.Dimensions))
+	selfJoinCols := make([]string, 0, len(q.Dimensions)+1)
+	selfJoinTableAlias := tempName("self_join")
+	nonNullValue := tempName("non_null")
 	var args []any
+	var subSelectArgs []any


I'm afraid this takes the code to a barely readable level and will make future changes too hard and error-prone. I don't have a great proposal, but somehow this filter case needs to be isolated better in the code (and not interleaved with the existing implementation with various if statements throughout) and also have an appropriate docstring explaining the case being handled.

I wonder if it can either be refactored to a single if statement at the end of the buildMetricsAggregationSQL function, or otherwise we may need to have a dedicated buildMetricsAggregationSingleMeasureFilterSQL function that gets invoked instead?

The args collection prevents refactoring it into a single if statement.
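
One possible way around the args problem, sketched here under assumptions: if every SQL fragment carries its own positional args, a dedicated builder can return its SQL and args together, and the caller only has to append them in placeholder order. `sqlFragment` and `joinFragments` are hypothetical names used for illustration, not part of the runtime, and the SQL text is simplified.

```go
package main

import (
	"fmt"
	"strings"
)

// sqlFragment pairs a SQL snippet with the positional args for its "?" placeholders.
// Hypothetical type for illustration only.
type sqlFragment struct {
	SQL  string
	Args []any
}

// joinFragments concatenates the non-empty fragments with sep and collects
// their args in the same order as the placeholders appear in the SQL.
func joinFragments(sep string, frags ...sqlFragment) sqlFragment {
	parts := make([]string, 0, len(frags))
	var args []any
	for _, f := range frags {
		if f.SQL == "" {
			continue
		}
		parts = append(parts, f.SQL)
		args = append(args, f.Args...)
	}
	return sqlFragment{SQL: strings.Join(parts, sep), Args: args}
}

func main() {
	base := sqlFragment{SQL: "WHERE country = ?", Args: []any{"US"}}
	filtered := sqlFragment{SQL: "WHERE country = ? AND browser = ?", Args: []any{"US", "Safari"}}
	q := joinFragments(" ",
		sqlFragment{SQL: "SELECT t.d1, t2.m1 FROM (SELECT DISTINCT d1 FROM t"},
		base,
		sqlFragment{SQL: ") t LEFT JOIN (SELECT d1, sum(a) AS m1 FROM t"},
		filtered,
		sqlFragment{SQL: "GROUP BY d1) t2 ON (t.d1 = t2.d1)"},
	)
	fmt.Println(q.SQL)
	fmt.Println(q.Args) // [US US Safari]
}
```

Because each fragment owns its args, the single-measure-filter case could live in its own function and be selected by one `if`, without interleaving `subSelectArgs` bookkeeping through the main builder.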

begelundmuller · 2024-02-13T11:23:09Z

runtime/queries/metricsview_aggregation.go

+				SELECT %[1]s FROM (
+					SELECT %[10]s FROM %[2]s %[3]s %[4]s %[5]s %[6]s 
+				) %[2]s 
+				LEFT JOIN (
+					SELECT %[10]s FROM %[2]s %[3]s %[9]s %[5]s %[6]s
+				) %[7]s 
+				ON (%[8]s) %[13]s %[11]s  OFFSET %[12]d
+				`,
+				strings.Join(selfJoinCols, ", "),      // 1
+				safeName(mv.Table),                    // 2
+				strings.Join(unnestClauses, ""),       // 3
+				whereClause,                           // 4
+				groupClause,                           // 5
+				havingClause,                          // 6


Wouldn't the HAVING clause need to be applied after the left join?

begelundmuller · 2024-02-13T11:23:19Z

runtime/queries/metricsview_aggregation.go

+			)
+
+			args = append(args, subSelectArgs...)
+			fmt.Println("sql ", sql, args)


Leftover print statement

begelundmuller

This looks cleaner. Have you also validated that it works on Druid?

begelundmuller · 2024-02-13T16:55:22Z

runtime/queries/metricsview_aggregation.go

+			This JOIN mirrors functionality of SELECT d1, d2, d3, m1 FILTER (WHERE d4 = 'Safari') FROM t WHERE... GROUP BY d1, d2, d3
+			bacause FILTER cannot be applied for arbitrary measure, ie sum(a)/1000
+		*/
+		if len(q.Measures) == 1 && q.Measures[0].Filter != nil {


We should return an error if more than one measure has a filter (since that's not supported) or if a measure filter is used with pivot_on.

begelundmuller · 2024-02-13T16:56:33Z

runtime/queries/metricsview_aggregation.go

+					SELECT %[1]s FROM (
+						SELECT %[10]s FROM %[2]s %[3]s %[4]s %[5]s %[6]s 
+					) %[2]s 
+					LEFT JOIN (
+						SELECT %[10]s FROM %[2]s %[3]s %[9]s %[5]s %[6]s
+					) %[7]s 
+					ON (%[8]s) %[13]s %[11]s  OFFSET %[12]d


I might be missing something, but the HAVING clause seems to be applied in the sub-selects; shouldn't it be applied outside, after the join? For example, with HAVING measure1 < 10, some dimension values might be excluded from the first sub-select even though they are included in the second sub-select.

begelundmuller · 2024-02-13T16:57:52Z

runtime/queries/metricsview_aggregation.go

+				selfJoinTableAlias,                    // 7
+				strings.Join(joinConditions, " AND "), // 8
+				measureWhereClause,                    // 9
+				strings.Join(selectCols, ", "),        // 10


If both sub-selects use the same selectCols, won't the measure be computed twice, once without the measure filter and once with it? The unfiltered value is never used, so it should be cheaper not to compute it.

This comment is still unaddressed

I'm not sure it would be much more performant than SELECT DISTINCT, and it requires additional branching in the algorithm.

begelundmuller · 2024-02-15T12:17:16Z

runtime/queries/metricsview.go

+				splits := strings.Split(alias.Name, ".")
+				if len(splits) > 1 {
+					return safeName(splits[0]) + "." + safeName(splits[1]), true
+				}


This isn't safe, since the name itself can contain dots.

Otherwise, I need to refactor it to pass along the information that it's a complex name.
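
A sketch of the refactor described above, under assumptions: instead of recovering the table alias by splitting on `.` (which turns an unqualified column named `revenue.usd` into `"revenue"."usd"`), the reference keeps the alias and column as separate fields and quotes each one. `columnRef` and `quoteIdent` are hypothetical helpers; `quoteIdent` is not claimed to match the runtime's `safeName`.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent double-quotes a single SQL identifier, escaping embedded quotes.
// Hypothetical helper for illustration.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// columnRef keeps the table alias and the column name apart, so neither has to
// be recovered by splitting a combined string on ".". Hypothetical type.
type columnRef struct {
	Table  string // empty when the column is unqualified
	Column string
}

// SQL renders the reference, quoting each part separately.
func (c columnRef) SQL() string {
	if c.Table == "" {
		return quoteIdent(c.Column)
	}
	return quoteIdent(c.Table) + "." + quoteIdent(c.Column)
}

func main() {
	fmt.Println(columnRef{Table: "t2", Column: "revenue.usd"}.SQL()) // "t2"."revenue.usd"
	fmt.Println(columnRef{Column: `odd"name`}.SQL())                 // "odd""name"
}
```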

begelundmuller · 2024-02-15T13:41:16Z

runtime/queries/metricsview_aggregation.go

+		/*
+			Example:
+			SELECT t.d1, t.d2, t.d3, t2.m1 (SELECT d1, d2, d3, m1 FROM t WHERE ...  GROUP BY d1, d2, d3 HAVING m1 > 10 ) t LEFT JOIN (
+				SELECT d1, d2, d3, m1 FROM t WHERE ... AND (d4 = 'Safari') GROUP BY d1, d2, d3 HAVING m1 > 10
+			)  t2 ON (COALESCE(t.d1, 'val') = COALESCE(t2.d1, 'val') and COALESCE(t.d2, 'val') = COALESCE(t2.d2, 'val') and ...)
+			WHERE t2.m1 > 10
+			ORDER BY ...
+			LIMIT 100
+			OFFSET 0
+
+			This JOIN mirrors functionality of SELECT d1, d2, d3, m1 FILTER (WHERE d4 = 'Safari') FROM t WHERE... GROUP BY d1, d2, d3
+			bacause FILTER cannot be applied for arbitrary measure, ie sum(a)/1000
+		*/
+		if filterCount == 1 {


Can we pull this case into a separate function? The level of nesting and complexity here makes it hard to follow

It would end up as a function with a long signature.

begelundmuller · 2024-02-15T13:43:02Z

runtime/queries/metricsview_aggregation.go

+				selfJoinTableAlias,                    // 7
+				strings.Join(joinConditions, " AND "), // 8
+				measureWhereClause,                    // 9
+				strings.Join(selectCols, ", "),        // 10


This comment is still unaddressed

begelundmuller · 2024-02-15T13:45:03Z

runtime/queries/metricsview_aggregation.go

+					SELECT %[1]s FROM (
+						SELECT %[10]s FROM %[2]s %[3]s %[4]s %[5]s %[6]s 
+					) %[2]s 
+					LEFT JOIN (
+						SELECT %[10]s FROM %[2]s %[3]s %[9]s %[5]s %[6]s


%[6]s represents the HAVING clause. Does it make sense to have it in the sub-queries when it will be enforced after the left join?

At least in the first sub-query, we should be able to completely remove calculation of the measure (see other comment), so having it there seems weird.

Assuming Druid won't push down the WHERE, the sub-select would generate more rows for the join.

…4009) * aggregation filter with join * aggregation filter with join * aggregation filter with join * aggregation filter with join * aggregation filter with join * aggregation filter with join * aggregation filter with join * aggregation filter with join * aggregation filter with join * aggregation filter with join * aggregation filter with join * aggregation filter with join --------- Co-authored-by: Egor Ryashin <egor.ryashin@rilldata.com>

aggregation filter with join

5df0189

egor-ryashin requested a review from begelundmuller February 9, 2024 18:33

Egor Ryashin added 2 commits February 12, 2024 21:08

aggregation filter with join

97a2231

aggregation filter with join

ad71485

egor-ryashin marked this pull request as ready for review February 12, 2024 18:44

begelundmuller requested changes Feb 13, 2024

Egor Ryashin added 2 commits February 13, 2024 14:26

aggregation filter with join

f0de75e

aggregation filter with join

8a82d71

begelundmuller reviewed Feb 13, 2024

Egor Ryashin added 4 commits February 13, 2024 21:08

aggregation filter with join

d733f20

aggregation filter with join

ca6bda6

aggregation filter with join

9b488cb

aggregation filter with join

9272e27

begelundmuller requested changes Feb 15, 2024

Egor Ryashin added 3 commits February 15, 2024 17:32

aggregation filter with join

d4beb64

aggregation filter with join

331a8a8

aggregation filter with join

31df406

begelundmuller approved these changes Feb 15, 2024

begelundmuller merged commit 18caa05 into main Feb 15, 2024
4 checks passed

begelundmuller deleted the agg-filter-with-join branch February 15, 2024 15:59

djbarnwal mentioned this pull request Feb 15, 2024

Integrate measure row filters to pivot #4062

Merged
