Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time dimension comparison for advanced measures #5475

Merged
merged 22 commits into from
Aug 30, 2024

Conversation

egor-ryashin
Copy link
Contributor

No description provided.

Comment on lines 230 to 231

func (a *AST) isTime(qd Dimension) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add a docstring and move it toward the bottom of the file together with the other util funcs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 265 to 273
if comp && f.Time {
intv, err := b.ast.interval()
if err != nil {
return err
}

// example: base.ts IS NOT DISTINCT FROM comparison.ts - (? - ?) -- <comparison-start> - <base-start>
rhs = fmt.Sprintf("(%s - INTERVAL %s MILLISECONDS)", rhs, intv)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can move this into the ast.addTimeComparisonMeasure function, where we could add the subtraction to the dimension expression in the comparison select instead, around here:

csn, err := a.buildBaseSelect("comparison", a.query.ComparisonTimeRange)
if err != nil {
return err
}
n.JoinComparisonSelect = csn

This would allow us to keep the regular IS NOT DISTINCT FROM check here, and avoid accessing the ast too much from the SQL builder phase.

Comment on lines 234 to 235
diff := a.query.ComparisonTimeRange.Start.Sub(a.query.TimeRange.Start)
return fmt.Sprint(diff.Milliseconds()), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it's not sufficient to do millisecond delta comparisons. For example, if the time granularity is monthly, the delta will be different for each month, so you need to use INTERVAL 1 MONTH in SQL instead of microseconds.

It may need to scan the dimensions for the biggest granularity used in TimeFloor and use that to determine the interval type. Also, see the issue description for some other thoughts: #5361

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@egor-ryashin egor-ryashin marked this pull request as ready for review August 26, 2024 09:32
runtime/metricsview/astsql.go Outdated Show resolved Hide resolved
runtime/metricsview/astsql.go Outdated Show resolved Hide resolved
Comment on lines 565 to 598
DimFields: a.dimFields,
Unnests: a.unnests,
Group: true,
FromTable: a.underlyingTable,
Where: a.underlyingWhere,
}
n.DimFields = make([]FieldNode, len(a.dimFields))
copy(n.DimFields, a.dimFields)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use slices.Clone instead

runtime/metricsview/ast.go Outdated Show resolved Hide resolved
runtime/metricsview/ast.go Outdated Show resolved Hide resolved
runtime/metricsview/ast.go Outdated Show resolved Hide resolved
Comment on lines 453 to 454
// unit - ie second, minute, ...
func (d Dialect) DateDiff(unit string, t1, t2 time.Time) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make sense to accept runtimev1.TimeGrain here instead and call d.ConvertToDateTruncSpecifier internally

runtime/metricsview/query.go Outdated Show resolved Hide resolved
@begelundmuller
Copy link
Contributor

Also note the failing CI

@egor-ryashin
Copy link
Contributor Author

All done.

@begelundmuller
Copy link
Contributor

@egor-ryashin There are still CI failures on the PR – can you take a look at them?

@egor-ryashin
Copy link
Contributor Author

Done.

Copy link
Contributor

@begelundmuller begelundmuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@egor-ryashin Looking at the time adjustment logic, it's not clear to me that it handles the case where two levels of granularities are present and the base and comparison time ranges cross boundaries of both granularities.

For example, if you group by granularities year and month, and have timeRange: {start: "2024-02-01", end: "2024-05-01"} and comparisonTimeRange: {start: "2023-11-01", end: "2024-02-01"} (three months offset), I wonder if it will erroneously adjust the value 2024-01-01 in the comparison time range to 2025-01-01 in the year-granularity field? Can you try adding a test case for that and check if that's the case or not?

@egor-ryashin
Copy link
Contributor Author

Is it different from this unit-test?

func TestMetricsViewsAggregation_comparison_time_dim(t *testing.T) {
	rt, instanceID := testruntime.NewInstanceForProject(t, "ad_bids")

	limit := int64(10)
	q := &queries.MetricsViewAggregation{
		MetricsViewName: "ad_bids_metrics",
		Dimensions: []*runtimev1.MetricsViewAggregationDimension{
			{
				Name: "pub",
			},
			{
				Name: "dom",
			},

			{
				Name:      "timestamp",
				TimeGrain: runtimev1.TimeGrain_TIME_GRAIN_DAY,
			},
			{
				Name:      "timestamp",
				TimeGrain: runtimev1.TimeGrain_TIME_GRAIN_YEAR,
				Alias:     "timestamp_year",
			},
		},
		Measures: []*runtimev1.MetricsViewAggregationMeasure{
			{
				Name: "measure_0",
			},
			{
				Name: "measure_1",
			},
			{
				Name: "m1",
			},
			{
				Name: "measure_0__p",
				Compute: &runtimev1.MetricsViewAggregationMeasure_ComparisonValue{
					ComparisonValue: &runtimev1.MetricsViewAggregationMeasureComputeComparisonValue{
						Measure: "measure_0",
					},
				},
			},
		},
		Where: expressionpb.AndAll(
			expressionpb.Eq("pub", "Google"),
			expressionpb.Eq("dom", "news.google.com"),
		),
		Having: expressionpb.Gt("measure_0__p", 0.0),
		Sort: []*runtimev1.MetricsViewAggregationSort{
			{
				Name: "pub",
			},
			{
				Name: "dom",
			},
			{
				Name: "timestamp",
			},
			{
				Name: "timestamp_year",
			},
			{
				Name: "measure_1",
			},
		},

		TimeRange: &runtimev1.TimeRange{
			Start: timestamppb.New(time.Date(2022, 1, 1, 0, 0, 0, 0, time.UTC)),
			End:   timestamppb.New(time.Date(2022, 1, 3, 0, 0, 0, 0, time.UTC)),
		},
		ComparisonTimeRange: &runtimev1.TimeRange{
			Start: timestamppb.New(time.Date(2022, 1, 3, 0, 0, 0, 0, time.UTC)),
			End:   timestamppb.New(time.Date(2022, 1, 5, 0, 0, 0, 0, time.UTC)),
	

@begelundmuller
Copy link
Contributor

Is it different from this unit-test?

Yeah because the unit test doesn’t cross a year-boundary in the comparison time range case.

@egor-ryashin
Copy link
Contributor Author

egor-ryashin commented Aug 28, 2024

I found out that we hide the reference to the timestamp column here:

func (q *MetricsViewAggregation) rewriteToMetricsViewQuery(export bool) (*metricsview.Query, error) {
	qry := &metricsview.Query{MetricsView: q.MetricsViewName}

	for _, d := range q.Dimensions {
		res := metricsview.Dimension{Name: d.Name}
		if d.Alias != "" { 
			res.Name = d.Alias // <<<<<<<
		}
		if d.TimeZone != "" {
			qry.TimeZone = d.TimeZone
		}

From that point we cannot say how it should participate in join.

@begelundmuller
Copy link
Contributor

begelundmuller commented Aug 28, 2024

I found out that we hide the reference to the timestamp column here:

The time dimension is captured in the Compute.TimeFloor a few lines below:

		if d.TimeGrain != runtimev1.TimeGrain_TIME_GRAIN_UNSPECIFIED {
			res.Compute = &metricsview.DimensionCompute{
				TimeFloor: &metricsview.DimensionComputeTimeFloor{
					Dimension: d.Name,
					Grain:     metricsview.TimeGrainFromProto(d.TimeGrain),
				},
			}
		}

Isn't that sufficient info to incorporate it correctly into the join? Though it seems it might not be checked correctly currently.

@egor-ryashin
Copy link
Contributor Author

We can have non-primary timestamp columns (which should be in join as is) I cannot find where we distinguish them - you mean they don't have time-grains, right?

@begelundmuller
Copy link
Contributor

We can have non-primary timestamp columns (which should be in join as is) I cannot find where we distinguish them - you mean they don't have time-grains, right?

Yes we can, and they can also have time grains. However, an offset should NOT be added for them because the ComparisonTimeRange currently only applies to the primary time dimension.

@begelundmuller
Copy link
Contributor

I think both of the bugs identified above can be fixed by generating a separate ast.comparisonDimFields property alongside ast.dimFields and using that for comparisons. That would allow both:

  • Adding the interval to the time dimension expression before wrapping it with date_trunc (i.e. the expression becomes date_trunc(<time dim> + INTERVAL ...) instead of date_trunc(<time dim>) + INTERVAL. That would address the first bug identified above.
  • Accessing the non-aliased time dimension field name, so it can correctly add the interval only for the metrics view's primary time dimension.

@egor-ryashin
Copy link
Contributor Author

Done.

Comment on lines +1174 to +1193
} else {
// larger time grain values can change as well
res, err := a.dialect.DateDiff(mg.ToProto(), start1, start2)
if err != nil {
return "", err
}
dateDiff = res

// DATE_TRUNC('year', t - INTERVAL (DATE_DIFF(start, end)) day)
tc := a.dialect.EscapeIdentifier(a.metricsView.TimeDimension)
expr := fmt.Sprintf("(%s - INTERVAL (%s) %s)", tc, dateDiff, a.dialect.ConvertToDateTruncSpecifier(mg.ToProto()))
dim := &runtimev1.MetricsViewSpec_DimensionV2{
Expression: expr,
}
expr, err = a.dialect.DateTruncExpr(dim, g.ToProto(), a.query.TimeZone, int(a.metricsView.FirstDayOfWeek), int(a.metricsView.FirstMonthOfYear))
if err != nil {
return "", fmt.Errorf(`failed to compute time floor: %w`, err)
}
return expr, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case completely discards the input expr, and changes to using a.metricsView.TimeDimension. What if the input expr was not that of the a.metricsView.TimeDimension (like when we support multiple time dimensions, as previously discussed)? This could be fixed by moving the handling to resolveDim and returning a comparisonExpression from there or something like that. Or at least there should be a // TODO here explaining the inconsistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect a.metricsView.TimeDimension contains the required time column from a list of possible time columns.
From that point of view it will work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least a comment about that would still be nice – it's pretty convoluted logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 2547 to 2554
TimeRange: &runtimev1.TimeRange{
Start: timestamppb.New(time.Date(2022, 1, 1, 23, 0, 0, 0, time.UTC)),
End: timestamppb.New(time.Date(2022, 1, 2, 2, 0, 0, 0, time.UTC)),
},
ComparisonTimeRange: &runtimev1.TimeRange{
Start: timestamppb.New(time.Date(2022, 1, 2, 1, 0, 0, 0, time.UTC)),
End: timestamppb.New(time.Date(2022, 1, 2, 3, 0, 0, 0, time.UTC)),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test case addresses the concern described in this comment: #5475 (review)?

The goal is not to test overlapping base and comparison time ranges. The goal is to test that if the comparison time range has start: "2023-11-01" and the base time range has start: "2024-02-01", then it should correct the comparison values for e.g. 2024-01-15 to 2024-04-15, not 2025-04-15 (accidentally adding a year, which is the year-interval datediff between the two start times).

Copy link
Contributor Author

@egor-ryashin egor-ryashin Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We fixed the bug where we didn't subtract anything from larger grain times. I abstract language:

Assuming format yy-mm and base and comparison values:
base: [23-12 24-01 24-02]
 base_year: [23 24 24]

comparison_month: [24-02 24-03 24-04]
  comparison_year: [24 24 24]

date_diff(23-12,24-02,month) = 2
new_comparison = comparison - date_diff
new_comparison_year = date_trunc('year', comparison - date_diff))

Subtracting date diff from comparison:
new_comparison: [23-12 24-01 24-02]
  new_comparison_year: [23 24 24]

The same will work for hour and day and any other combinations. Not sure I'm following.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you mean, let me check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's irrelevant, we don't check that record, it's a test typo - fixed.

@begelundmuller begelundmuller merged commit 14c4378 into main Aug 30, 2024
7 checks passed
@begelundmuller begelundmuller deleted the time-dim-comparison-for-am branch August 30, 2024 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants