SQL: Use timestamp_floor when granularity is not safe. #13206
Conversation
PR apache#12944 added a check at the execution layer to avoid materializing excessive amounts of time-granular buckets. This patch modifies the SQL planner to avoid generating queries that would throw such errors, by switching certain plans to use the timestamp_floor function instead of granularities. This applies both to the Timeseries query type and the GroupBy timestampResultFieldGranularity feature.

The patch also goes one step further: we switch to timestamp_floor not just in the ETERNITY + non-ALL case, but also if the estimated number of time-granular buckets exceeds 100,000.

Finally, the patch modifies the timestampResultFieldGranularity field to consistently be a String rather than a Granularity. This ensures that it can be round-trip serialized and deserialized, which is useful when trying to execute the results of "EXPLAIN PLAN FOR" with GroupBy queries that use the timestampResultFieldGranularity feature.
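The bucket-threshold decision described above can be sketched roughly as follows. This is an illustrative Python sketch, not Druid's actual code; `MAX_TIME_BUCKETS` and `should_use_timestamp_floor` are hypothetical names standing in for the planner's check:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant mirroring the 100,000-bucket threshold described above.
MAX_TIME_BUCKETS = 100_000

def should_use_timestamp_floor(start, end, bucket_duration):
    """Return True when the plan should use timestamp_floor instead of a
    query granularity, based on the estimated number of time buckets."""
    estimated_buckets = (end - start) / bucket_duration
    return estimated_buckets > MAX_TIME_BUCKETS

# One year of one-second buckets (~31.5 million) exceeds the threshold;
# one year of one-day buckets (365) does not.
start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(should_use_timestamp_floor(start, end, timedelta(seconds=1)))  # True
print(should_use_timestamp_floor(start, end, timedelta(days=1)))     # False
```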
There are two better solutions that I didn't implement in this patch because they are more complex:
Personally, I think path (2) is the best one for the future. That being said, there is a need to have these queries execute properly today, hence the present patch.
Thanks a lot for the changes! 👍 Mostly LGTM; some comments.

Regarding the long-term solution: I agree that making timestamp_floor faster could be the way to go. The only doubt in my mind is that supporting various time-grain-level operations, like limit and ranking, might need a new construct (such as a new Sequence). That can be discussed further when we decide to do it.
theContext.put(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD_INDEX, timestampDimensionIndexInDimensions);
theContext.put(GroupByQuery.CTX_TIMESTAMP_RESULT_FIELD_GRANULARITY, queryGranularity);

if (canUseQueryGranularity(dataSource, filtration, queryGranularity)) {
Maybe we can use this on L1272 with an if (granularity == null) check? Because this sets the queryGranularity to skip other grouping columns, but may not set the context.
Given the code structure, I think it's OK, but I rearranged it a bit anyway to hopefully make the logic clearer.
Thanks! There are some CI failures, though, which look legit.
// Validate the interval against MAX_TIME_GRAINS_NON_DRUID_TABLE.
// Estimate based on the size of the first bucket, to avoid computing them all. (That's what we're
// trying to avoid!)
final Interval firstBucket = queryGranularity.bucket(filtrationInterval.getStart());
Maybe the Granularity object should have a duration method as well, but that can be a follow-up later.
This logic is here because not all Granularities have a fixed duration. There is DurationGranularity, which does, but also PeriodGranularity, which does not. For example, P1Y changes durations on leap years and P1D changes durations for daylight savings time.
Oh, that's a good point. I missed those cases.
Thanks for the review @rohangarg. I pushed up some changes based on your comments.
This seems like a good longer-term solution. Can we go further? Is there any reason to create buckets that will be empty? If we're doing time-based grouping, and data is time-ordered, we can create buckets on the fly as we read the data. This is the classic streaming grouping solution: one doesn't normally enumerate all possible customers, say, to group sales by customer. There would have to be code to handle empty groups, if we want to fill empty time slots. But this can be done by enumerating the time buckets as we return results: if the next result is greater than the current time bucket, return zeros. For this, the set of buckets need not be materialized, just enumerated.
For fixed-length intervals (week or less), the time floor should be a simple mod-then-subtract. For variable-length intervals, the logic is more complex. Can we split the implementations for those two cases? Week-or-less is super fast; month-or-more pays the extra compute cost?
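The mod-then-subtract fast path described above can be sketched like this (illustrative Python on millisecond timestamps; `floor_fixed` is a hypothetical name, not Druid's implementation):

```python
def floor_fixed(timestamp_ms, period_ms, origin_ms=0):
    # Mod-then-subtract: drop the remainder past the last period boundary.
    # Python's % returns a non-negative remainder for a positive modulus,
    # so this also floors correctly for timestamps before the origin.
    return timestamp_ms - (timestamp_ms - origin_ms) % period_ms

HOUR_MS = 3_600_000

# 2000-01-01T00:00:10.500Z floored to the hour is 2000-01-01T00:00:00Z.
print(floor_fixed(946_684_810_500, HOUR_MS))  # 946684800000
```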
That's exactly what I meant by "let the cursor drive the bucketing". I meant only generate buckets based on timestamps we actually see. (Unless the user specifically requests that empty buckets be zero-filled.)
In general, today, we don't materialize the buckets; we just enumerate them. The issue we see is that even with this approach, the amount of time it takes to enumerate buckets can be prohibitively large. (People raise bugs saying that queries "hang". They don't actually hang, but it seems that way to the user, due to the large number of buckets that are being enumerated.) By letting the cursor drive the bucketing, we can avoid this completely for the case where we aren't zero-filling. The zero-fill case still poses an issue, but we can address that some other way (limit on number of zero-filled buckets)?
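A sketch of what cursor-driven bucketing could look like, with zero-fill enumerating only the gaps between observed buckets (illustrative Python; not Druid code):

```python
from itertools import groupby

def stream_buckets(rows, period_ms, zero_fill=False):
    """Group time-ordered (timestamp_ms, value) rows into fixed-period buckets
    on the fly, deriving buckets only from timestamps actually seen."""
    out = []
    prev_bucket = None
    for bucket, group in groupby(rows, key=lambda r: r[0] - r[0] % period_ms):
        if zero_fill and prev_bucket is not None:
            # Zero-fill only the gap between adjacent non-empty buckets.
            for t in range(prev_bucket + period_ms, bucket, period_ms):
                out.append((t, 0))
        out.append((bucket, sum(v for _, v in group)))
        prev_bucket = bucket
    return out

rows = [(0, 1), (500, 2), (3500, 5)]  # time-ordered millisecond timestamps
print(stream_buckets(rows, 1000))                  # [(0, 3), (3000, 5)]
print(stream_buckets(rows, 1000, zero_fill=True))  # [(0, 3), (1000, 0), (2000, 0), (3000, 5)]
```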
Ah. In practice the main perf hit isn't from the evaluation speed of the
LGTM, modulo the doubt comment.
.build()
),
NullHandling.sqlCompatible()
? ImmutableList.of(new Object[]{946684800000L, "", 1L}, new Object[]{946771200000L, "10.1", 1L})
Is this because the data in the test table is actually `""` and not `null`? So, in the SQL-compatible mode, `null` and `""` are considered different, and hence the `""` row comes in the join output? Whereas in the non-compatible mode, `""` is treated as `null` and hence the join ignores it?
Yes, that's exactly what is happening.
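The distinction can be sketched in a few lines (illustrative Python; not Druid's actual null-handling code):

```python
def join_matches(left_keys, right_keys, sql_compatible):
    """Sketch of how '' join keys behave under the two null-handling modes."""
    def norm(k):
        # In non-SQL-compatible (replace-with-default) mode, '' is coerced
        # to None, and None keys never match in a join.
        return k if sql_compatible else (k or None)
    right = {norm(k) for k in right_keys} - {None}
    return [k for k in left_keys if norm(k) in right]

keys = ["", "10.1"]
print(join_matches(keys, keys, sql_compatible=True))   # ['', '10.1']
print(join_matches(keys, keys, sql_compatible=False))  # ['10.1']
```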
This function is notorious for causing memory exhaustion and excessive CPU usage; so much so that it was valuable to work around it in the SQL planner in apache#13206. Hopefully, a warning comment will encourage developers to stay away and come up with solutions that do not involve computing all possible buckets.
Fixes #13182.