Time dimension comparison for advanced measures #5475

egor-ryashin · 2024-08-19T15:50:02Z

No description provided.

begelundmuller · 2024-08-20T12:55:44Z

runtime/metricsview/ast.go


+func (a *AST) isTime(qd Dimension) bool {


nit: Add a docstring and move it toward the bottom of the file together with the other util funcs

begelundmuller · 2024-08-20T13:08:39Z

runtime/metricsview/astsql.go

+		if comp && f.Time {
+			intv, err := b.ast.interval()
+			if err != nil {
+				return err
+			}
+
+			// example: base.ts IS NOT DISTINCT FROM comparison.ts - (? - ?) -- <comparison-start> - <base-start>
+			rhs = fmt.Sprintf("(%s - INTERVAL %s MILLISECONDS)", rhs, intv)
+		}


I think we can move this into the ast.addTimeComparisonMeasure function, where we could add the subtraction to the dimension expression in the comparison select instead, around here:

rill/runtime/metricsview/ast.go

Lines 803 to 807 in ec87121

	csn, err := a.buildBaseSelect("comparison", a.query.ComparisonTimeRange)
	if err != nil {
		return err
	}
	n.JoinComparisonSelect = csn

This would allow us to keep the regular IS NOT DISTINCT FROM check here, and avoid accessing the ast too much from the SQL builder phase.

begelundmuller · 2024-08-20T13:11:42Z

runtime/metricsview/astsql.go

+	diff := a.query.ComparisonTimeRange.Start.Sub(a.query.TimeRange.Start)
+	return fmt.Sprint(diff.Milliseconds()), nil


Unfortunately, it's not sufficient to do millisecond delta comparisons. For example, if the time granularity is monthly, the delta will be different for each month, so you need to use INTERVAL 1 MONTH in SQL instead of milliseconds.

It may need to scan the dimensions for the biggest granularity used in TimeFloor and use that to determine the interval type. Also, see the issue description for some other thoughts: #5361

begelundmuller · 2024-08-26T16:48:09Z

runtime/metricsview/ast.go

-		DimFields: a.dimFields,
 		Unnests:   a.unnests,
 		Group:     true,
 		FromTable: a.underlyingTable,
 		Where:     a.underlyingWhere,
 	}
+	n.DimFields = make([]FieldNode, len(a.dimFields))
+	copy(n.DimFields, a.dimFields)


Use slices.Clone instead
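For reference, a small sketch of the suggested replacement, assuming Go 1.21+ where the standard `slices` package is available. `FieldNode` here is a simplified stand-in for the real type, and `cloneFields` is a hypothetical wrapper.

```go
package main

import (
	"fmt"
	"slices"
)

// FieldNode is a simplified stand-in for the AST field type (assumption).
type FieldNode struct {
	Name string
	Expr string
}

// cloneFields returns a shallow copy, equivalent in effect to the
// make+copy pair in the diff but in a single call.
func cloneFields(fields []FieldNode) []FieldNode {
	return slices.Clone(fields)
}

func main() {
	dimFields := []FieldNode{{Name: "ts", Expr: "date_trunc('day', ts)"}}
	comparison := cloneFields(dimFields)

	// Mutating the copy leaves the original slice elements untouched.
	comparison[0].Expr = "date_trunc('day', ts + INTERVAL 1 MONTH)"
	fmt.Println(dimFields[0].Expr)
	fmt.Println(comparison[0].Expr)
}
```

The copy is shallow: struct values are duplicated, but any pointers or slices inside the elements would still be shared.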

begelundmuller · 2024-08-26T16:54:42Z

runtime/drivers/olap.go

+// unit - ie second, minute, ...
+func (d Dialect) DateDiff(unit string, t1, t2 time.Time) (string, error) {


Might make sense to accept runtimev1.TimeGrain here instead and call d.ConvertToDateTruncSpecifier internally

runtime/metricsview/query.go

begelundmuller · 2024-08-26T16:56:40Z

Also note the failing CI

egor-ryashin · 2024-08-27T12:23:19Z

All done.

begelundmuller · 2024-08-27T20:01:35Z

@egor-ryashin There are still CI failures on the PR – can you take a look at them?

egor-ryashin · 2024-08-28T10:06:12Z

Done.

begelundmuller

@egor-ryashin Looking at the time adjustment logic, it's not clear to me that it handles the case where two levels of granularities are present and the base and comparison time ranges cross boundaries of both granularities.

For example, if you group by granularities year and month, and have timeRange: {start: "2024-02-01", end: "2024-05-01"} and comparisonTimeRange: {start: "2023-11-01", end: "2024-02-01"} (three months offset), I wonder if it will erroneously adjust the value 2024-01-01 in the comparison time range to 2025-01-01 in the year-granularity field? Can you try adding a test case for that and check if that's the case or not?

egor-ryashin · 2024-08-28T11:56:43Z

Is it different from this unit-test?

func TestMetricsViewsAggregation_comparison_time_dim(t *testing.T) {
	rt, instanceID := testruntime.NewInstanceForProject(t, "ad_bids")

	limit := int64(10)
	q := &queries.MetricsViewAggregation{
		MetricsViewName: "ad_bids_metrics",
		Dimensions: []*runtimev1.MetricsViewAggregationDimension{
			{
				Name: "pub",
			},
			{
				Name: "dom",
			},

			{
				Name:      "timestamp",
				TimeGrain: runtimev1.TimeGrain_TIME_GRAIN_DAY,
			},
			{
				Name:      "timestamp",
				TimeGrain: runtimev1.TimeGrain_TIME_GRAIN_YEAR,
				Alias:     "timestamp_year",
			},
		},
		Measures: []*runtimev1.MetricsViewAggregationMeasure{
			{
				Name: "measure_0",
			},
			{
				Name: "measure_1",
			},
			{
				Name: "m1",
			},
			{
				Name: "measure_0__p",
				Compute: &runtimev1.MetricsViewAggregationMeasure_ComparisonValue{
					ComparisonValue: &runtimev1.MetricsViewAggregationMeasureComputeComparisonValue{
						Measure: "measure_0",
					},
				},
			},
		},
		Where: expressionpb.AndAll(
			expressionpb.Eq("pub", "Google"),
			expressionpb.Eq("dom", "news.google.com"),
		),
		Having: expressionpb.Gt("measure_0__p", 0.0),
		Sort: []*runtimev1.MetricsViewAggregationSort{
			{
				Name: "pub",
			},
			{
				Name: "dom",
			},
			{
				Name: "timestamp",
			},
			{
				Name: "timestamp_year",
			},
			{
				Name: "measure_1",
			},
		},

		TimeRange: &runtimev1.TimeRange{
			Start: timestamppb.New(time.Date(2022, 1, 1, 0, 0, 0, 0, time.UTC)),
			End:   timestamppb.New(time.Date(2022, 1, 3, 0, 0, 0, 0, time.UTC)),
		},
		ComparisonTimeRange: &runtimev1.TimeRange{
			Start: timestamppb.New(time.Date(2022, 1, 3, 0, 0, 0, 0, time.UTC)),
			End:   timestamppb.New(time.Date(2022, 1, 5, 0, 0, 0, 0, time.UTC)),

begelundmuller · 2024-08-28T12:34:31Z

Is it different from this unit-test?

Yeah because the unit test doesn’t cross a year-boundary in the comparison time range case.

egor-ryashin · 2024-08-28T13:48:02Z

I found out that we hide the reference to the timestamp column here:

func (q *MetricsViewAggregation) rewriteToMetricsViewQuery(export bool) (*metricsview.Query, error) {
	qry := &metricsview.Query{MetricsView: q.MetricsViewName}

	for _, d := range q.Dimensions {
		res := metricsview.Dimension{Name: d.Name}
		if d.Alias != "" { 
			res.Name = d.Alias // <<<<<<<
		}
		if d.TimeZone != "" {
			qry.TimeZone = d.TimeZone
		}

From that point on, we cannot tell how it should participate in the join.

begelundmuller · 2024-08-28T14:00:31Z

I found out that we hide the reference to the timestamp column here:

The time dimension is captured in the Compute.TimeFloor a few lines below:

		if d.TimeGrain != runtimev1.TimeGrain_TIME_GRAIN_UNSPECIFIED {
			res.Compute = &metricsview.DimensionCompute{
				TimeFloor: &metricsview.DimensionComputeTimeFloor{
					Dimension: d.Name,
					Grain:     metricsview.TimeGrainFromProto(d.TimeGrain),
				},
			}
		}

Isn't that sufficient info to incorporate it correctly into the join? Though it seems it might not be checked correctly currently.

egor-ryashin · 2024-08-28T14:04:41Z

We can have non-primary timestamp columns (which should be in the join as is). I cannot find where we distinguish them. You mean they don't have time grains, right?

begelundmuller · 2024-08-28T14:07:11Z

We can have non-primary timestamp columns (which should be in the join as is). I cannot find where we distinguish them. You mean they don't have time grains, right?

Yes we can, and they can also have time grains. However, an offset should NOT be added for them because the ComparisonTimeRange currently only applies to the primary time dimension.
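The rule described here can be sketched as a small predicate. All names below (`Dimension`, `TimeFloorDimension`, `needsComparisonOffset`) are hypothetical simplifications of the structures quoted earlier in this thread, not the actual runtime types. The sketch also assumes that only dimensions with a time floor are candidates for the offset.

```go
package main

import "fmt"

// Dimension is a simplified, hypothetical view of a query dimension.
// Name may hold an alias; TimeFloorDimension holds the underlying column
// name when a time grain is applied, and is empty otherwise.
type Dimension struct {
	Name               string
	TimeFloorDimension string
}

// needsComparisonOffset reports whether the comparison offset should be
// applied: only for time-floored dimensions whose underlying column is the
// metrics view's primary time dimension.
func needsComparisonOffset(d Dimension, primaryTimeDim string) bool {
	return d.TimeFloorDimension != "" && d.TimeFloorDimension == primaryTimeDim
}

func main() {
	dims := []Dimension{
		{Name: "pub"},
		{Name: "timestamp_year", TimeFloorDimension: "timestamp"},
		{Name: "created_day", TimeFloorDimension: "created_at"},
	}
	for _, d := range dims {
		fmt.Println(d.Name, needsComparisonOffset(d, "timestamp"))
	}
}
```

Checking the underlying column rather than `Name` is what makes the aliased `timestamp_year` case work.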

begelundmuller · 2024-08-28T14:12:05Z

I think both of the bugs identified above can be fixed by generating a separate ast.comparisonDimFields property alongside ast.dimFields and using that for comparisons. That would allow both:

Adding the interval to the time dimension expression before wrapping it with date_trunc (i.e. the expression becomes date_trunc(<time dim> + INTERVAL ...) instead of date_trunc(<time dim>) + INTERVAL ...). That would address the first bug identified above.
Accessing the non-aliased time dimension field name, so it can correctly add the interval only for the metrics view's primary time dimension.

egor-ryashin · 2024-08-29T09:49:52Z

Done.

begelundmuller · 2024-08-29T15:26:17Z

runtime/metricsview/ast.go

+	} else {
+		// larger time grain values can change as well
+		res, err := a.dialect.DateDiff(mg.ToProto(), start1, start2)
+		if err != nil {
+			return "", err
+		}
+		dateDiff = res
+
+		// DATE_TRUNC('year', t - INTERVAL (DATE_DIFF(start, end)) day)
+		tc := a.dialect.EscapeIdentifier(a.metricsView.TimeDimension)
+		expr := fmt.Sprintf("(%s - INTERVAL (%s) %s)", tc, dateDiff, a.dialect.ConvertToDateTruncSpecifier(mg.ToProto()))
+		dim := &runtimev1.MetricsViewSpec_DimensionV2{
+			Expression: expr,
+		}
+		expr, err = a.dialect.DateTruncExpr(dim, g.ToProto(), a.query.TimeZone, int(a.metricsView.FirstDayOfWeek), int(a.metricsView.FirstMonthOfYear))
+		if err != nil {
+			return "", fmt.Errorf(`failed to compute time floor: %w`, err)
+		}
+		return expr, nil
+	}


This case completely discards the input expr, and changes to using a.metricsView.TimeDimension. What if the input expr was not that of the a.metricsView.TimeDimension (like when we support multiple time dimensions, as previously discussed)? This could be fixed by moving the handling to resolveDim and returning a comparisonExpression from there or something like that. Or at least there should be a // TODO here explaining the inconsistency.

I expect a.metricsView.TimeDimension contains the required time column from a list of possible time columns.
From that point of view it will work.

I think at least a comment about that would still be nice – it's pretty convoluted logic

begelundmuller · 2024-08-29T15:32:08Z

runtime/queries/metricsview_aggregation_test.go

+		TimeRange: &runtimev1.TimeRange{
+			Start: timestamppb.New(time.Date(2022, 1, 1, 23, 0, 0, 0, time.UTC)),
+			End:   timestamppb.New(time.Date(2022, 1, 2, 2, 0, 0, 0, time.UTC)),
+		},
+		ComparisonTimeRange: &runtimev1.TimeRange{
+			Start: timestamppb.New(time.Date(2022, 1, 2, 1, 0, 0, 0, time.UTC)),
+			End:   timestamppb.New(time.Date(2022, 1, 2, 3, 0, 0, 0, time.UTC)),
+		},


I don't think this test case addresses the concern described in this comment: #5475 (review)?

The goal is not to test overlapping base and comparison time ranges. The goal is to test that if the comparison time range has start: "2023-11-01" and the base time range has start: "2024-02-01", then it should correct the comparison values for e.g. 2024-01-15 to 2024-04-15, not 2025-04-15 (accidentally adding a year, which is the year-interval datediff between the two start times).

We fixed the bug where we didn't subtract anything from larger grain times. In abstract language:

Assuming format yy-mm and base and comparison values:
base: [23-12 24-01 24-02]
base_year: [23 24 24]
comparison_month: [24-02 24-03 24-04]
comparison_year: [24 24 24]
date_diff(23-12, 24-02, month) = 2
new_comparison = comparison - date_diff
new_comparison_year = date_trunc('year', comparison - date_diff)
Subtracting date diff from comparison:
new_comparison: [23-12 24-01 24-02]
new_comparison_year: [23 24 24]

The same will work for hour and day and any other combinations. Not sure I'm following.

Oh, I see what you mean, let me check.

It's irrelevant: we don't check that record. It was a typo in the test, now fixed.

time dim comparison for advanced measures

40a911f

egor-ryashin requested a review from begelundmuller August 19, 2024 15:50

time dim comparison for advanced measures

9218a80

begelundmuller requested changes Aug 20, 2024

Egor Ryashin added 6 commits August 23, 2024 18:22

time dim comparison for advanced measures

5e52e5c

time dim comparison for advanced measures

3643236

time dim comparison for advanced measures - multiple dims

0592463

time dim comparison for advanced measures - multiple dims

fa9f2dd

time dim comparison for advanced measures - style

1c32fac

Merge remote-tracking branch 'origin/main' into time-dim-comparison-f…

a780be4

…or-am

egor-ryashin marked this pull request as ready for review August 26, 2024 09:32

egor-ryashin requested a review from begelundmuller August 26, 2024 09:32

begelundmuller requested changes Aug 26, 2024

Egor Ryashin added 2 commits August 27, 2024 11:43

time dim comparison for advanced measures - time fix

03e22ed

time dim comparison for advanced measures - refactorings

c664e7e

egor-ryashin requested a review from begelundmuller August 27, 2024 12:23

time dim comparison for advanced measures - refactorings

b723d81

Egor Ryashin added 2 commits August 28, 2024 12:46

time dim comparison for advanced measures - refactorings

a33b229

Merge remote-tracking branch 'origin/main' into time-dim-comparison-f…

ed65433

…or-am

Review

3b61d5a

begelundmuller requested changes Aug 28, 2024

Egor Ryashin added 4 commits August 29, 2024 12:06

time dim comparison for advanced measures - multiple dims - cross bou…

b42b2ad

…ndaries

time dim comparison for advanced measures - multiple dims - cross bou…

9af4482

…ndaries

time dim comparison for advanced measures - multiple dims - cross bou…

447c535

…ndaries

time dim comparison for advanced measures - multiple dims - cross bou…

60cf79d

…ndaries

begelundmuller added 2 commits August 29, 2024 16:08

Review

07ba166

Review 2

f47b971

begelundmuller requested changes Aug 29, 2024

Egor Ryashin added 2 commits August 29, 2024 18:59

time dim comparison for advanced measures - cross boundaries - typo

a13cc66

time dim comparison for advanced measures - comments

318bc27

begelundmuller approved these changes Aug 30, 2024

begelundmuller merged commit 14c4378 into main Aug 30, 2024
7 checks passed

begelundmuller deleted the time-dim-comparison-for-am branch August 30, 2024 14:21

begelundmuller mentioned this pull request Sep 2, 2024

Support comparisons with time as a dimension #5361

Closed
