Perform range optimization for BETWEEN predicate on date_trunc and temporal casts #14390
CI hit #11140
…expression

This change allows the engine to infer that, for instance, given t::timestamp(6)

    date_trunc('day', t) BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 00:00:00'

can be rewritten as

    t BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 23:59:59.999999'

The change applies to the temporal types:
- date
- timestamp
- timestamp with time zone

A range predicate (BetweenPredicate) can be transformed into a `TupleDomain` and thus help with predicate pushdown. A range-based `TupleDomain` representation is critical for connectors that have min/max-based metadata (like Iceberg manifest lists, which play a key role in partition pruning, or Iceberg data files), as ranges allow for intersection tests, something that is hard to do in a generic manner for `ConnectorExpression`.
This change allows the engine to infer that, for instance, given t::timestamp(6)

    cast(t as date) BETWEEN DATE '2022-01-01' AND DATE '2022-01-02'

can be rewritten as

    t BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 23:59:59.999999'

The change applies to the temporal types:
- date
- timestamp
- timestamp with time zone

A range predicate (BetweenPredicate) can be transformed into a `TupleDomain` and thus help with predicate pushdown. A range-based `TupleDomain` representation is critical for connectors that have min/max-based metadata (like Iceberg manifest lists, which play a key role in partition pruning, or Iceberg data files), as ranges allow for intersection tests, something that is hard to do in a generic manner for `ConnectorExpression`.
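For illustration only, the rewrite of the upper bound in the commit messages above can be sketched with plain `java.time`. This is not the code in this PR: the class `DayRangeSketch` and its helpers are hypothetical names, and the sketch assumes a zone-less timestamp(6) whose value is held as epoch microseconds. The inclusive end of the day that starts at the upper bound is the start of the next day minus one microsecond.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch, not Trino code: computes the last microsecond t for which
// date_trunc('day', t) still equals a given day start.
final class DayRangeSketch
{
    private static final long MICROS_PER_SECOND = 1_000_000L;

    // Epoch microseconds of a zone-less timestamp, interpreting it in UTC
    static long toEpochMicros(LocalDateTime value)
    {
        return value.toEpochSecond(ZoneOffset.UTC) * MICROS_PER_SECOND + value.getNano() / 1_000;
    }

    // Inverse of toEpochMicros
    static LocalDateTime fromEpochMicros(long epochMicros)
    {
        long seconds = Math.floorDiv(epochMicros, MICROS_PER_SECOND);
        int nanos = (int) Math.floorMod(epochMicros, MICROS_PER_SECOND) * 1_000;
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    // Start of the next day minus one microsecond
    static long dayEndInclusiveMicros(LocalDateTime dayStart)
    {
        if (!dayStart.equals(dayStart.truncatedTo(ChronoUnit.DAYS))) {
            throw new IllegalArgumentException("value is not truncated to a day: " + dayStart);
        }
        return toEpochMicros(dayStart.plusDays(1)) - 1;
    }

    public static void main(String[] args)
    {
        LocalDateTime upperBound = LocalDateTime.parse("2022-01-02T00:00:00");
        System.out.println(fromEpochMicros(dayEndInclusiveMicros(upperBound)));
    }
}
```

The lower bound needs no adjustment in this example because it is already aligned to the start of a day.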
  }
  LongTimestamp longTimestamp = (LongTimestamp) rangeStart;
  verify(longTimestamp.getPicosOfMicro() == 0, "Unexpected picos in %s, value not rounded to %s", rangeStart, rangeUnit);
- long endInclusiveMicros = (long) calculateRangeEndInclusive(longTimestamp.getEpochMicros(), createTimestampType(6), rangeUnit);
- return new LongTimestamp(endInclusiveMicros, toIntExact(PICOSECONDS_PER_MICROSECOND - scaleFactor(timestampType.getPrecision(), 12)));
+ long endInclusiveMicros = (long) calculateRangeEndInclusive(longTimestamp.getEpochMicros(), createTimestampType(TimestampType.MAX_SHORT_PRECISION), rangeUnit);
the variable name is "endInclusiveMicros".
the code used 6, and it's known that 10^(-6) s is a microsecond.
after the change the code uses TimestampType.MAX_SHORT_PRECISION. it's not obvious that this is correct (is short precision actually microseconds?). so this change actually decreases readability.
- long endInclusiveMicros = (long) calculateRangeEndInclusive(longTimestamp.getEpochMicros(), createTimestampType(6), rangeUnit);
- return new LongTimestamp(endInclusiveMicros, toIntExact(PICOSECONDS_PER_MICROSECOND - scaleFactor(timestampType.getPrecision(), 12)));
+ long endInclusiveMicros = (long) calculateRangeEndInclusive(longTimestamp.getEpochMicros(), createTimestampType(TimestampType.MAX_SHORT_PRECISION), rangeUnit);
+ return new LongTimestamp(endInclusiveMicros, toIntExact(PICOSECONDS_PER_MICROSECOND - scaleFactor(timestampType.getPrecision(), TimestampType.MAX_PRECISION)));
similar here. the use of PICOSECONDS_PER_MICROSECOND implies that we know we're dealing with picoseconds, i.e. 10^(-12) s, so it matched the corresponding 12 on this line.
after the change, we reference the "max precision" constant, but we still rely on it having an actual value of 12.
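To make the arithmetic under discussion concrete, here is a minimal standalone sketch of the fractional part computed on the `return` line. It is an assumption-laden stand-in, not the Trino implementation: `PicosOfMicroSketch` and its constants are hypothetical, and `scaleFactor` is reimplemented here as 10^(toPrecision - fromPrecision) only so the example runs on its own. For a timestamp of precision p between 7 and 12, the largest picos-of-micro value representable at that precision is 10^6 - 10^(12 - p).

```java
// Hypothetical sketch, not Trino code: the largest picos-of-micro value that is
// still a multiple of the smallest unit representable at a given precision.
final class PicosOfMicroSketch
{
    static final int MICROSECOND_PRECISION = 6;  // 10^-6 s
    static final int PICOSECOND_PRECISION = 12;  // 10^-12 s
    static final long PICOSECONDS_PER_MICROSECOND = 1_000_000L;

    // Stand-in helper: 10^(toPrecision - fromPrecision)
    static long scaleFactor(int fromPrecision, int toPrecision)
    {
        if (fromPrecision > toPrecision) {
            throw new IllegalArgumentException("fromPrecision must not exceed toPrecision");
        }
        long factor = 1;
        for (int i = fromPrecision; i < toPrecision; i++) {
            factor *= 10;
        }
        return factor;
    }

    // Only meaningful for precisions that carry a sub-microsecond part
    static int maxPicosOfMicro(int precision)
    {
        if (precision <= MICROSECOND_PRECISION || precision > PICOSECOND_PRECISION) {
            throw new IllegalArgumentException("precision must be in 7..12: " + precision);
        }
        return Math.toIntExact(PICOSECONDS_PER_MICROSECOND - scaleFactor(precision, PICOSECOND_PRECISION));
    }

    public static void main(String[] args)
    {
        for (int precision = 7; precision <= 12; precision++) {
            System.out.println(precision + " -> " + maxPicosOfMicro(precision));
        }
    }
}
```

Naming the constants after the unit they stand for (microsecond, picosecond) keeps the link to `endInclusiveMicros` and `PICOSECONDS_PER_MICROSECOND` visible, which is the readability concern raised in the comments above.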
core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/UnwrapCastInComparison.java
@findinpath let's have unwrapping of CASTs and date_trunc as separate PRs.
Description
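This change allows the engine to infer that, for instance,
given t::timestamp(6)
`date_trunc('day', t) BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 00:00:00'`
or
`cast(t as date) BETWEEN DATE '2022-01-01' AND DATE '2022-01-02'`
can be rewritten as
`t BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 23:59:59.999999'`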
The change applies to the temporal types:
- date
- timestamp
- timestamp with time zone

A range predicate (BetweenPredicate) can be transformed into a `TupleDomain` and thus help with predicate pushdown.
A range-based `TupleDomain` representation is critical for connectors that have min/max-based metadata (like Iceberg manifest lists, which play a key role in partition pruning, or Iceberg data files), as ranges allow for intersection tests, something that is hard to do in a generic manner for `ConnectorExpression`.

Fixes #14293
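As a rough illustration of why ranges matter for min/max-based metadata, the sketch below checks whether a predicate range overlaps the min/max statistics of a file. It does not use the Trino SPI: `MinMaxPruningSketch`, `MicrosRange`, and the sample file statistics are hypothetical, and timestamps are assumed to be epoch microseconds interpreted in UTC.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch, not the Trino SPI: a file can be skipped when the predicate
// range and the file's [min, max] statistics do not intersect.
final class MinMaxPruningSketch
{
    static final class MicrosRange
    {
        final long minInclusive;
        final long maxInclusive;

        MicrosRange(long minInclusive, long maxInclusive)
        {
            if (minInclusive > maxInclusive) {
                throw new IllegalArgumentException("min must not exceed max");
            }
            this.minInclusive = minInclusive;
            this.maxInclusive = maxInclusive;
        }

        // Two closed intervals intersect when each one starts before the other ends
        boolean overlaps(MicrosRange other)
        {
            return minInclusive <= other.maxInclusive && other.minInclusive <= maxInclusive;
        }
    }

    // Epoch microseconds of an ISO-8601 local timestamp, interpreted in UTC
    static long micros(String timestamp)
    {
        LocalDateTime value = LocalDateTime.parse(timestamp);
        return value.toEpochSecond(ZoneOffset.UTC) * 1_000_000L + value.getNano() / 1_000;
    }

    public static void main(String[] args)
    {
        // t BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 23:59:59.999999'
        MicrosRange predicate = new MicrosRange(micros("2022-01-01T00:00:00"), micros("2022-01-02T23:59:59.999999"));
        // Made-up file statistics for the example
        MicrosRange fileA = new MicrosRange(micros("2022-01-02T12:00:00"), micros("2022-01-04T00:00:00"));
        MicrosRange fileB = new MicrosRange(micros("2022-01-03T00:00:00"), micros("2022-01-05T00:00:00"));
        System.out.println("read fileA: " + predicate.overlaps(fileA));
        System.out.println("read fileB: " + predicate.overlaps(fileB));
    }
}
```

With the predicate still wrapped in `date_trunc` or a cast, no such interval is available for the comparison, which is the motivation given in the description above.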
Non-technical explanation
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: