
WITH RECURSIVE is inefficient #4771

Closed
lhofhansl opened this issue Aug 10, 2020 · 3 comments
lhofhansl (Member) commented Aug 10, 2020

First off, I am excited that Presto supports this now, and I get why it works the way it does. Just filing this for discussion.

Looking here: https://github.com/prestosql/presto/blob/340/presto-main/src/main/java/io/prestosql/sql/planner/QueryPlanner.java#L235, a single plan is built with size O(n^2), and it is always expanded all the way to maxRecursionDepth.
This is not a complaint, just pointing out that, as is, the feature is only partially useful.
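To make the quadratic size concrete, here is roughly what the expansion amounts to (an illustrative sketch in SQL; the planner of course builds plan nodes, not SQL text). Each recursion step re-inlines the full expansion of the previous step, so step k has size proportional to k, and n steps sum to O(n^2). For a simple counter CTE with a depth limit of 2:

-- WITH RECURSIVE t (n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM t WHERE n < 10)
SELECT 1                                 -- step 0: the base case
UNION ALL
SELECT n + 1 FROM (SELECT 1) AS t (n)    -- step 1 inlines step 0
WHERE n < 10
UNION ALL
SELECT n + 1 FROM (                      -- step 2 inlines step 1,
  SELECT n + 1 FROM (SELECT 1) AS t (n)  -- which inlines step 0 again
  WHERE n < 10
) AS t (n)
WHERE n < 10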

As an engineer I totally get it: Presto has no place to store intermediate results.
(If we had one, we could create a temp table with an extra current_depth column and store/query intermediate results that way; we'd also need "looping" query execution. At the end we could query the whole temp table for UNION ALL, or do a DISTINCT scan over all columns but the depth for UNION.)
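A sketch of that hypothetical evaluation strategy, using the Fibonacci example further below (the table and column names are made up; Presto has neither temp tables nor the looping statement execution this requires):

CREATE TABLE tmp_fib AS
SELECT 0 AS cur, 1 AS prev, 0 AS current_depth;

-- "loop": re-run with an incremented depth until no rows are inserted
INSERT INTO tmp_fib
SELECT cur + prev, cur, current_depth + 1
FROM tmp_fib
WHERE current_depth = 0             -- the previous iteration's depth
  AND cur + prev < 1000000;

-- UNION ALL semantics: read everything back
SELECT cur, prev FROM tmp_fib;
-- UNION semantics: DISTINCT over all columns but the depth
SELECT DISTINCT cur, prev FROM tmp_fib;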

As a user, though, I would try something like the query below and be either surprised or disappointed.
The user is forced to anticipate the recursion depth fairly closely, or the query will either be inefficient (limit too high) or fail (limit too low).

WITH RECURSIVE fib (cur, prev) AS
(
  SELECT 0, 1
    UNION ALL 
  SELECT cur + prev, cur FROM fib WHERE cur + prev < 1000000
)
SELECT cur FROM fib ORDER BY cur;

In PostgreSQL:

  cur   
--------
      0
 [...]
 832040
(31 rows)

Time: 0.836 ms

In Presto:

Query 20200810_212624_00016_iampk, FAILED, 1 node
Splits: 18 total, 0 done (0.00%)
1.91 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20200810_212624_00016_iampk failed: Recursion depth limit exceeded (10). Use 'max_recursion_depth' session property to modify the limit.

Ok... So set session max_recursion_depth=100;

Now:

Query 20200810_212703_00018_iampk failed: statement is too large (stack overflow during analysis)

Ok... So set session max_recursion_depth=40;

Now it works, but it takes almost 35s:

  cur   
--------
      0 
 [...]
 832040 
(31 rows)

Query 20200810_212808_00020_iampk, FINISHED, 1 node
Splits: 685 total, 685 done (100.00%)
34.90 [0 rows, 0B] [0 rows/s, 0B/s]

And now, even when I change the 1000000 limit to 10, it still takes ~35s, since the plan is still expanded to the max depth property regardless of the predicate.
I don't know offhand how to improve this, unless we invent a place to store intermediate data; but at the very least we should document the limitation very carefully.

martint (Member) commented Aug 10, 2020

Indeed. That's why the feature is experimental :)

I wrote some details some time ago on what we need to implement it efficiently: #1122 (comment)

Basically, it boils down to:

  • Temporary tables/storage
  • Multi-step query plans, with strict execution sequencing

There are also probably some opportunities for algebraic optimization of recursive plans to remove the recursion altogether. Something to investigate.
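One possible example of such a rewrite (a sketch, not a committed design): a recursive CTE that merely enumerates integers produces the same rows as a recursion-free plan over sequence() and UNNEST:

WITH RECURSIVE t (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM t WHERE n < 100
)
SELECT n FROM t;

-- recursion-free equivalent:
SELECT n FROM UNNEST(sequence(1, 100)) AS t (n);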

lhofhansl (Member, Author) commented

I see. I missed your comment in #1122 (even though I commented on that issue earlier myself).
Fully expanding the plan is nice for trying the feature out, at least.

dain (Member) commented Aug 13, 2020

Let's consolidate the discussion in #1122
