-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time dimension comparison for advanced measures #5475
Conversation
runtime/metricsview/ast.go
Outdated
|
||
func (a *AST) isTime(qd Dimension) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add a docstring and move it toward the bottom of the file together with the other util funcs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
runtime/metricsview/astsql.go
Outdated
if comp && f.Time { | ||
intv, err := b.ast.interval() | ||
if err != nil { | ||
return err | ||
} | ||
|
||
// example: base.ts IS NOT DISTINCT FROM comparison.ts - (? - ?) -- <comparison-start> - <base-start> | ||
rhs = fmt.Sprintf("(%s - INTERVAL %s MILLISECONDS)", rhs, intv) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move this into the ast.addTimeComparisonMeasure
function, where we could add the subtraction to the dimension expression in the comparison select instead, around here:
rill/runtime/metricsview/ast.go
Lines 803 to 807 in ec87121
csn, err := a.buildBaseSelect("comparison", a.query.ComparisonTimeRange) | |
if err != nil { | |
return err | |
} | |
n.JoinComparisonSelect = csn |
This would allow us to keep the regular IS NOT DISTINCT FROM
check here, and avoid accessing the ast
too much from the SQL builder phase.
runtime/metricsview/astsql.go
Outdated
diff := a.query.ComparisonTimeRange.Start.Sub(a.query.TimeRange.Start) | ||
return fmt.Sprint(diff.Milliseconds()), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it's not sufficient to do millisecond delta comparisons. For example, if the time granularity is monthly, the delta will be different for each month, so you need to use INTERVAL 1 MONTH
in SQL instead of microseconds.
It may need to scan the dimensions for the biggest granularity used in TimeFloor
and use that to determine the interval type. Also, see the issue description for some other thoughts: #5361
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
runtime/metricsview/ast.go
Outdated
DimFields: a.dimFields, | ||
Unnests: a.unnests, | ||
Group: true, | ||
FromTable: a.underlyingTable, | ||
Where: a.underlyingWhere, | ||
} | ||
n.DimFields = make([]FieldNode, len(a.dimFields)) | ||
copy(n.DimFields, a.dimFields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use slices.Clone
instead
runtime/drivers/olap.go
Outdated
// unit - ie second, minute, ... | ||
func (d Dialect) DateDiff(unit string, t1, t2 time.Time) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might make sense to accept runtimev1.TimeGrain
here instead and call d.ConvertToDateTruncSpecifier
internally
Also note the failing CI |
All done. |
@egor-ryashin There are still CI failures on the PR – can you take a look at them? |
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@egor-ryashin Looking at the time adjustment logic, it's not clear to me that it handles the case where two levels of granularities are present and the base and comparison time ranges cross boundaries of both granularities.
For example, if you group by granularities year
and month
, and have timeRange: {start: "2024-02-01", end: "2024-05-01"}
and comparisonTimeRange: {start: "2023-11-01", end: "2024-02-01"}
(three months offset), I wonder if it will erroneously adjust the value 2024-01-01
in the comparison time range to 2025-01-01
in the year-granularity field? Can you try adding a test case for that and check if that's the case or not?
Is it different from this unit-test?
|
Yeah because the unit test doesn’t cross a year-boundary in the comparison time range case. |
I found out that we hide the reference to the timestamp column here: func (q *MetricsViewAggregation) rewriteToMetricsViewQuery(export bool) (*metricsview.Query, error) {
qry := &metricsview.Query{MetricsView: q.MetricsViewName}
for _, d := range q.Dimensions {
res := metricsview.Dimension{Name: d.Name}
if d.Alias != "" {
res.Name = d.Alias // <<<<<<<
}
if d.TimeZone != "" {
qry.TimeZone = d.TimeZone
} From that point we cannot say how it should participate in join. |
The time dimension is captured in the if d.TimeGrain != runtimev1.TimeGrain_TIME_GRAIN_UNSPECIFIED {
res.Compute = &metricsview.DimensionCompute{
TimeFloor: &metricsview.DimensionComputeTimeFloor{
Dimension: d.Name,
Grain: metricsview.TimeGrainFromProto(d.TimeGrain),
},
}
} Isn't that sufficient info to incorporate it correctly into the join? Though it seems it might not be checked correctly currently. |
We can have non-primary timestamp columns (which should be in join as is) I cannot find where we distinguish them - you mean they don't have time-grains, right? |
Yes we can, and they can also have time grains. However, an offset should NOT be added for them because the |
I think both of the bugs identified above can be fixed by generating a separate
|
Done. |
} else { | ||
// larger time grain values can change as well | ||
res, err := a.dialect.DateDiff(mg.ToProto(), start1, start2) | ||
if err != nil { | ||
return "", err | ||
} | ||
dateDiff = res | ||
|
||
// DATE_TRUNC('year', t - INTERVAL (DATE_DIFF(start, end)) day) | ||
tc := a.dialect.EscapeIdentifier(a.metricsView.TimeDimension) | ||
expr := fmt.Sprintf("(%s - INTERVAL (%s) %s)", tc, dateDiff, a.dialect.ConvertToDateTruncSpecifier(mg.ToProto())) | ||
dim := &runtimev1.MetricsViewSpec_DimensionV2{ | ||
Expression: expr, | ||
} | ||
expr, err = a.dialect.DateTruncExpr(dim, g.ToProto(), a.query.TimeZone, int(a.metricsView.FirstDayOfWeek), int(a.metricsView.FirstMonthOfYear)) | ||
if err != nil { | ||
return "", fmt.Errorf(`failed to compute time floor: %w`, err) | ||
} | ||
return expr, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case completely discards the input expr
, and changes to using a.metricsView.TimeDimension
. What if the input expr
was not that of the a.metricsView.TimeDimension
(like when we support multiple time dimensions, as previously discussed)? This could be fixed by moving the handling to resolveDim
and returning a comparisonExpression
from there or something like that. Or at least there should be a // TODO
here explaining the inconsistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect a.metricsView.TimeDimension
contains the required time column from a list of possible time columns.
From that point of view it will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at least a comment about that would still be nice – it's pretty convoluted logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
TimeRange: &runtimev1.TimeRange{ | ||
Start: timestamppb.New(time.Date(2022, 1, 1, 23, 0, 0, 0, time.UTC)), | ||
End: timestamppb.New(time.Date(2022, 1, 2, 2, 0, 0, 0, time.UTC)), | ||
}, | ||
ComparisonTimeRange: &runtimev1.TimeRange{ | ||
Start: timestamppb.New(time.Date(2022, 1, 2, 1, 0, 0, 0, time.UTC)), | ||
End: timestamppb.New(time.Date(2022, 1, 2, 3, 0, 0, 0, time.UTC)), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this test case addresses the concern described in this comment: #5475 (review)?
The goal is not to test overlapping base and comparison time ranges. The goal is to test that if the comparison time range has start: "2023-11-01"
and the base time range has start: "2024-02-01"
, then it should correct the comparison values for e.g. 2024-01-15
to 2024-04-15
, not 2025-04-15
(accidentally adding a year, which is the year-interval datediff between the two start times).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We fixed the bug where we didn't subtract anything from larger grain times. I abstract language:
Assuming format yy-mm and base and comparison values:
base: [23-12 24-01 24-02]
base_year: [23 24 24]
comparison_month: [24-02 24-03 24-04]
comparison_year: [24 24 24]
date_diff(23-12,24-02,month) = 2
new_comparison = comparison - date_diff
new_comparison_year = date_trunc('year', comparison - date_diff))
Subtracting date diff from comparison:
new_comparison: [23-12 24-01 24-02]
new_comparison_year: [23 24 24]
The same will work for hour and day and any other combinations. Not sure I'm following.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see what you mean, let me check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's irrelevant, we don't check that record, it's a test typo - fixed.
No description provided.