Segfault on hypertable query in lateral join with aggregate #579

fvannee · 2018-06-30T11:01:27Z

On TimescaleDb 0.9.2 (PG10.4) we observed a segfault on certain queries. I haven't tested on 0.10.0 yet, but I think the part of the code that crashes has not changed in the new version. I've narrowed the query down to the following:

explain
select * FROM normal_table ik
LEFT JOIN LATERAL (SELECT max(time0::bigint) FROM hypertable s WHERE ik.name = 'value' and s.time0 > now()) s ON TRUE

The query is a bit weird, as it has a WHERE clause inside the subquery which references the outer query (ik.name = 'value'). It seems the combination of ik.name='value' and the max aggregate function makes TimescaleDb crash. Removing either one results in a valid plan. Removing the condition on s.time0 > now() also yields a valid plan, but this is just because the constraint_aware_append is not triggered without this condition.

The first part of the stack trace:
#0 list_nth_cell (list=0x2d44bd0, n=47494368) at list.c:399
#1 0x00000000006239e2 in list_nth (list=, n=) at list.c:413
#2 0x00007f58d865d6bb in ca_append_begin (node=0x2d4a550, estate=0x2d486e8, eflags=17) at /home/florisvannee/timescaledb/src/constraint_aware_append.c:165
#3 0x00000000005f1712 in ExecInitCustomScan (cscan=cscan@entry=0x2d440b8, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at nodeCustom.c:105
#4 0x00000000005e54fd in ExecInitNode (node=0x2d440b8, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at execProcnode.c:282
#5 0x00000000005fe644 in ExecInitResult (node=node@entry=0x2d44408, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at nodeResult.c:223
#6 0x00000000005e52fd in ExecInitNode (node=0x2d44408, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at execProcnode.c:164

It crashes on the following line in ca_append_begin:
RangeTblEntry *rte = rt_fetch(scan->scanrelid, estate->es_range_table);

The es_range_table is something with 6 entries and scanrelid is a very large number in a T_RESULT node. It tries to access the 47484376th element of the range table list.

(gdb) print *estate->es_range_table
$1 = {type = T_List, length = 6, head = 0x2d44bb0, tail = 0x2d458a0}
(gdb) print *scan
$2 = {plan = {type = T_Result, startup_cost = 0.14999999999999999, total_cost = 7.2587499999999991, plan_rows = 1, plan_width = 8, parallel_aware = 0 '\000', parallel_safe = 0 '\000', plan_node_id = 7,
targetlist = 0x2d4adc8, qual = 0x0, lefttree = 0x2d4aeb8, righttree = 0x0, initPlan = 0x0, extParam = 0x2d4b4b8, allParam = 0x2d4b4d0}, scanrelid = 47494376}

I believe the aggregate function turns the children of the Append node into a T_RESULT (where the T_RESULT children contain the actual sequential/index scan on the relation). This is the plan when I change the query to use Postres partition exclusion instead of the constraint aware append:

Nested Loop Left Join (cost=14.02..10131.40 rows=720 width=93)
-> Seq Scan on normal_table ik (cost=0.00..17.20 rows=720 width=85)
-> Aggregate (cost=14.02..14.03 rows=1 width=8)
-> Append (cost=0.00..13.68 rows=135 width=8)
-> Result (cost=0.00..0.00 rows=1 width=8)
One-Time Filter: (ik.name = 'value'::text)
-> Seq Scan on hyper_table s (cost=0.00..0.00 rows=1 width=8)
Filter: (time0 > '2018-05-20 00:00:00.0 +0200'::timestamp9)
-> Result (cost=0.00..13.68 rows=134 width=8)
One-Time Filter: (ik.name = 'value'::text)
-> Seq Scan on _hyper_383_10587_chunk s_1 (cost=0.00..13.68 rows=134 width=8)
Filter: (time0 > '2018-05-20 00:00:00.0 +0200'::timestamp9)

To me it looks like a quick fix to avoid segfaults would be to move:

		AppendRelInfo *appinfo = lfirst(lc_info);
		RangeTblEntry *rte = rt_fetch(scan->scanrelid, estate->es_range_table);

Inside the case block. This avoids the segfault. However, this is not the behavior you'd like, as it will not prune chunks from the plan. Ideally, there'd be a special case for T_RESULT which then looks at the children of T_RESULT to determine if the chunk can be pruned or not. This would result in correct behavior.

Any ideas on how best to approach this?

The text was updated successfully, but these errors were encountered:

fvannee · 2018-06-30T12:26:24Z

I verified that the proposed quick fix to move the lfirst/rt_fetch calls inside the case-block works (eg. the database does not crash anymore).

mfreed · 2018-07-01T18:26:12Z

Thanks for the detailed bug report, @fvannee . We will try to replicate with your instructions.

fvannee · 2018-07-12T20:24:28Z

Fixed in 0.10.1 :)

fvannee changed the title ~~Segfault on hypertable query in lateral join~~ Segfault on hypertable query in lateral join with aggregate Jun 30, 2018

mfreed added the bug label Jul 1, 2018

RobAtticus added this to the 0.10.1 milestone Jul 9, 2018

amytai mentioned this issue Jul 10, 2018

Fix CustomScan pruning whenever the subplan is NOT of a Scan type #590

Merged

mfreed assigned amytai Jul 11, 2018

fvannee closed this as completed Jul 12, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Segfault on hypertable query in lateral join with aggregate #579

Segfault on hypertable query in lateral join with aggregate #579

fvannee commented Jun 30, 2018

fvannee commented Jun 30, 2018

mfreed commented Jul 1, 2018

fvannee commented Jul 12, 2018

Segfault on hypertable query in lateral join with aggregate #579

Segfault on hypertable query in lateral join with aggregate #579

Comments

fvannee commented Jun 30, 2018

fvannee commented Jun 30, 2018

mfreed commented Jul 1, 2018

fvannee commented Jul 12, 2018