Add warning comments to Granularity.getIterable.
This function is notorious for causing memory exhaustion and excessive
CPU usage; so much so that it was valuable to work around it in the
SQL planner in apache#13206. Hopefully, a warning comment will encourage
developers to stay away and come up with solutions that do not involve
computing all possible buckets.
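
To give a sense of scale, here is a rough sketch of the bucket-count arithmetic for the ten-year, second-granularity case named in the Javadoc below. It uses plain `java.time` rather than Druid's `Granularity` or Joda-Time types; the class `BucketCountEstimate` and the helper `countBuckets` are hypothetical names introduced only for illustration, and the calculation assumes fixed-width buckets with a bucket-aligned start.

```java
import java.time.Duration;
import java.time.Instant;

public class BucketCountEstimate {
    // Hypothetical helper (not part of Druid): number of fixed-width buckets
    // needed to cover [start, end), assuming start is aligned to a bucket boundary.
    static long countBuckets(Instant start, Instant end, Duration bucketSize) {
        long spanMillis = Duration.between(start, end).toMillis();
        long bucketMillis = bucketSize.toMillis();
        // Round up so that a trailing partial bucket is still counted.
        return (spanMillis + bucketMillis - 1) / bucketMillis;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2010-01-01T00:00:00Z");
        Instant end = Instant.parse("2020-01-01T00:00:00Z");
        // Ten calendar years (two leap days) at one-second buckets.
        System.out.println(countBuckets(start, end, Duration.ofSeconds(1)));
        // prints 315532800
    }
}
```

Even before any per-bucket work, iterating over roughly 315 million intervals is expensive, and materializing them into a collection is worse.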
gianm committed Mar 7, 2023
1 parent 38b6373 commit ad2bdcf
Showing 2 changed files with 15 additions and 0 deletions.
@@ -209,6 +209,19 @@ final Integer[] getDateValues(String filePath, Formatter formatter)
return vals;
}

/**
* Return an iterable of granular buckets that overlap a particular interval.
*
* In cases where the number of granular buckets is very large, the Iterable returned by this method will take
* an excessive amount of time to compute, and materializing it into a collection will take an excessive amount
* of memory. For example, this happens in the extreme case of an input interval of
* {@link org.apache.druid.java.util.common.Intervals#ETERNITY} and any granularity other than
* {@link Granularities#ALL}, as well as cases like an input interval of ten years with {@link Granularities#SECOND}.
*
* To avoid issues stemming from large numbers of buckets, this method should be avoided, and code that uses
* this method should be rewritten to use some other approach. For example: rather than computing all possible
* buckets in a wide time range, only process buckets related to actual data points that appear.
*/
public Iterable<Interval> getIterable(final Interval input)
{
return new IntervalIterable(input);
@@ -871,6 +871,8 @@ private static Filtration toFiltration(DimFilter filter, VirtualColumnRegistry v
* <p>
* Necessary because some combinations are unsafe, mainly because they would lead to the creation of too many
* time-granular buckets during query processing.
*
* @see Granularity#getIterable(Interval) the problematic method call we are trying to avoid
*/
private static boolean canUseQueryGranularity(
final DataSource dataSource,

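The alternative suggested in the new Javadoc, processing only the buckets that actual data points fall into, might look like the following minimal sketch. It is not Druid code: `SparseBuckets` and `countByBucket` are hypothetical names, timestamps are plain epoch milliseconds, and buckets are assumed to be fixed-width, which does not hold for calendar-based granularities such as month or year.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SparseBuckets {
    // Hypothetical helper (not part of Druid): derive each row's bucket from its
    // own timestamp, so work scales with the number of rows rather than with
    // the width of the queried interval.
    static Map<Long, Integer> countByBucket(List<Long> timestampsMillis, long bucketMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMillis) {
            // floorDiv keeps bucket starts correct for timestamps before the epoch.
            long bucketStart = Math.floorDiv(ts, bucketMillis) * bucketMillis;
            counts.merge(bucketStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four rows spread across ten years of one-second buckets.
        List<Long> rows = List.of(0L, 999L, 1000L, 315_532_800_000L);
        System.out.println(countByBucket(rows, 1000L));
        // prints {0=2, 1000=1, 315532800000=1}
    }
}
```

Only three buckets are ever created here, even though the rows span an interval that contains hundreds of millions of possible one-second buckets.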