Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault on hypertable query in lateral join with aggregate #579

Closed
fvannee opened this issue Jun 30, 2018 · 3 comments
Closed

Segfault on hypertable query in lateral join with aggregate #579

fvannee opened this issue Jun 30, 2018 · 3 comments
Assignees
Labels
Milestone

Comments

@fvannee
Copy link
Contributor

fvannee commented Jun 30, 2018

On TimescaleDb 0.9.2 (PG10.4) we observed a segfault on certain queries. I haven't tested on 0.10.0 yet, but I think the part of the code that crashes has not changed in the new version. I've narrowed the query down to the following:

explain
select * FROM normal_table ik
LEFT JOIN LATERAL (SELECT max(time0::bigint) FROM hypertable s WHERE ik.name = 'value' and s.time0 > now()) s ON TRUE

The query is a bit weird, as it has a WHERE clause inside the subquery which references the outer query (ik.name = 'value'). It seems the combination of ik.name='value' and the max aggregate function makes TimescaleDb crash. Removing either one results in a valid plan. Removing the condition on s.time0 > now() also yields a valid plan, but this is just because the constraint_aware_append is not triggered without this condition.

The first part of the stack trace:
#0 list_nth_cell (list=0x2d44bd0, n=47494368) at list.c:399
#1 0x00000000006239e2 in list_nth (list=, n=) at list.c:413
#2 0x00007f58d865d6bb in ca_append_begin (node=0x2d4a550, estate=0x2d486e8, eflags=17) at /home/florisvannee/timescaledb/src/constraint_aware_append.c:165
#3 0x00000000005f1712 in ExecInitCustomScan (cscan=cscan@entry=0x2d440b8, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at nodeCustom.c:105
#4 0x00000000005e54fd in ExecInitNode (node=0x2d440b8, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at execProcnode.c:282
#5 0x00000000005fe644 in ExecInitResult (node=node@entry=0x2d44408, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at nodeResult.c:223
#6 0x00000000005e52fd in ExecInitNode (node=0x2d44408, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at execProcnode.c:164

It crashes on the following line in ca_append_begin:
RangeTblEntry *rte = rt_fetch(scan->scanrelid, estate->es_range_table);

The es_range_table is something with 6 entries and scanrelid is a very large number in a T_RESULT node. It tries to access the 47484376th element of the range table list.

(gdb) print *estate->es_range_table
$1 = {type = T_List, length = 6, head = 0x2d44bb0, tail = 0x2d458a0}
(gdb) print *scan
$2 = {plan = {type = T_Result, startup_cost = 0.14999999999999999, total_cost = 7.2587499999999991, plan_rows = 1, plan_width = 8, parallel_aware = 0 '\000', parallel_safe = 0 '\000', plan_node_id = 7,
targetlist = 0x2d4adc8, qual = 0x0, lefttree = 0x2d4aeb8, righttree = 0x0, initPlan = 0x0, extParam = 0x2d4b4b8, allParam = 0x2d4b4d0}, scanrelid = 47494376}

I believe the aggregate function turns the children of the Append node into a T_RESULT (where the T_RESULT children contain the actual sequential/index scan on the relation). This is the plan when I change the query to use Postres partition exclusion instead of the constraint aware append:

Nested Loop Left Join (cost=14.02..10131.40 rows=720 width=93)
-> Seq Scan on normal_table ik (cost=0.00..17.20 rows=720 width=85)
-> Aggregate (cost=14.02..14.03 rows=1 width=8)
-> Append (cost=0.00..13.68 rows=135 width=8)
-> Result (cost=0.00..0.00 rows=1 width=8)
One-Time Filter: (ik.name = 'value'::text)
-> Seq Scan on hyper_table s (cost=0.00..0.00 rows=1 width=8)
Filter: (time0 > '2018-05-20 00:00:00.0 +0200'::timestamp9)
-> Result (cost=0.00..13.68 rows=134 width=8)
One-Time Filter: (ik.name = 'value'::text)
-> Seq Scan on _hyper_383_10587_chunk s_1 (cost=0.00..13.68 rows=134 width=8)
Filter: (time0 > '2018-05-20 00:00:00.0 +0200'::timestamp9)

To me it looks like a quick fix to avoid segfaults would be to move:

		AppendRelInfo *appinfo = lfirst(lc_info);
		RangeTblEntry *rte = rt_fetch(scan->scanrelid, estate->es_range_table);

Inside the case block. This avoids the segfault. However, this is not the behavior you'd like, as it will not prune chunks from the plan. Ideally, there'd be a special case for T_RESULT which then looks at the children of T_RESULT to determine if the chunk can be pruned or not. This would result in correct behavior.

Any ideas on how best to approach this?

@fvannee fvannee changed the title Segfault on hypertable query in lateral join Segfault on hypertable query in lateral join with aggregate Jun 30, 2018
@fvannee
Copy link
Contributor Author

fvannee commented Jun 30, 2018

I verified that the proposed quick fix to move the lfirst/rt_fetch calls inside the case-block works (eg. the database does not crash anymore).

@mfreed mfreed added the bug label Jul 1, 2018
@mfreed
Copy link
Member

mfreed commented Jul 1, 2018

Thanks for the detailed bug report, @fvannee . We will try to replicate with your instructions.

@fvannee
Copy link
Contributor Author

fvannee commented Jul 12, 2018

Fixed in 0.10.1 :)

@fvannee fvannee closed this as completed Jul 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants