[Bug]: Crash in planner for JOIN queries. #4148

Closed
svenklemm opened this issue Mar 7, 2022 · 5 comments · Fixed by #4161

svenklemm (Member) commented Mar 7, 2022

What type of bug is this?

Crash

What subsystems and features are affected?

Other

What happened?

TimescaleDB crashes on certain queries. This seems to be very easy to trigger: I get roughly 30 coredumps per minute with sqlsmith.

I will update the ticket with more information once I have a reproducer.

TimescaleDB version affected

master

PostgreSQL version used

14.2

What operating system did you use?

Archlinux 2022.03.01 x64

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

#0  0x0000560c4cb103df GetMemoryChunkContext (postgres + 0x8333df)
#1  0x0000560c4cb12763 pfree (postgres + 0x835763)
#2  0x0000560c4c469cb1 btendscan (postgres + 0x18ccb1)
#3  0x0000560c4c452d0e index_endscan (postgres + 0x175d0e)
#4  0x00007fac2beaa435 index_scanner_endscan (timescaledb-2.7.0-dev.so + 0x63435)
#5  0x00007fac2beaa91f ts_scanner_end_scan (timescaledb-2.7.0-dev.so + 0x6391f)
#6  0x00007fac2beaaf71 ts_scan_iterator_close (timescaledb-2.7.0-dev.so + 0x63f71)
#7  0x00007fac2be7ba70 ts_chunk_scan_by_constraints (timescaledb-2.7.0-dev.so + 0x34a70)
#8  0x00007fac2be93f1a ts_hypertable_restrict_info_get_chunks (timescaledb-2.7.0-dev.so + 0x4cf1a)
#9  0x00007fac2be9c46c find_children_chunks (timescaledb-2.7.0-dev.so + 0x5546c)
#10 0x00007fac2be9ccb6 get_chunks (timescaledb-2.7.0-dev.so + 0x55cb6)
#11 0x00007fac2be9d367 ts_plan_expand_hypertable_chunks (timescaledb-2.7.0-dev.so + 0x56367)
#12 0x00007fac2be9890a reenable_inheritance (timescaledb-2.7.0-dev.so + 0x5190a)
#13 0x00007fac2be9919d timescaledb_set_rel_pathlist (timescaledb-2.7.0-dev.so + 0x5219d)
#14 0x0000560c4c783129 set_rel_pathlist (postgres + 0x4a6129)
#15 0x0000560c4c782c8e set_base_rel_pathlists (postgres + 0x4a5c8e)
#16 0x0000560c4c78299f make_one_rel (postgres + 0x4a599f)
#17 0x0000560c4c7c2399 query_planner (postgres + 0x4e5399)
#18 0x0000560c4c7c4b91 grouping_planner (postgres + 0x4e7b91)
#19 0x0000560c4c7c423a subquery_planner (postgres + 0x4e723a)
#20 0x0000560c4c785d98 set_subquery_pathlist (postgres + 0x4a8d98)
#21 0x0000560c4c782df8 set_rel_size (postgres + 0x4a5df8)
#22 0x0000560c4c782bd1 set_base_rel_sizes (postgres + 0x4a5bd1)
#23 0x0000560c4c782891 make_one_rel (postgres + 0x4a5891)
#24 0x0000560c4c7c2399 query_planner (postgres + 0x4e5399)
#25 0x0000560c4c7c4b91 grouping_planner (postgres + 0x4e7b91)
#26 0x0000560c4c7c423a subquery_planner (postgres + 0x4e723a)
#27 0x0000560c4c7c2aa1 standard_planner (postgres + 0x4e5aa1)
#28 0x00007fac2be97f4f timescaledb_planner (timescaledb-2.7.0-dev.so + 0x50f4f)
#29 0x0000560c4c7c27ce planner (postgres + 0x4e57ce)
#30 0x0000560c4c91181e pg_plan_query (postgres + 0x63481e)
#31 0x0000560c4c91196e pg_plan_queries (postgres + 0x63496e)
#32 0x0000560c4c911d5d exec_simple_query (postgres + 0x634d5d)
#33 0x0000560c4c916d48 PostgresMain (postgres + 0x639d48)
#34 0x0000560c4c83d4db BackendRun (postgres + 0x5604db)
#35 0x0000560c4c83cd42 BackendStartup (postgres + 0x55fd42)
#36 0x0000560c4c838acb ServerLoop (postgres + 0x55bacb)
#37 0x0000560c4c838240 PostmasterMain (postgres + 0x55b240)
#38 0x0000560c4c729d26 main (postgres + 0x44cd26)
#39 0x00007fac3536fb25 __libc_start_main (libc.so.6 + 0x27b25)
#40 0x0000560c4c3a176e _start (postgres + 0xc476e)

How can we reproduce the bug?

The database used is the one from regresscheck-shared.

select * from devices inner join metrics_space_compressed on (true);
svenklemm added the bug, planner, and segfault labels on Mar 7, 2022
svenklemm changed the title from "[Bug]: Crash in planner for certain queries." to "[Bug]: Crash in planner for JOIN queries." on Mar 7, 2022
pmwkaa self-assigned this on Mar 7, 2022
erimatnor (Contributor) commented Mar 8, 2022

@svenklemm I cannot reproduce the crash with the DB and query provided. Is it possible to provide a stack trace with debug symbols?

Or, run addr2line on frame 7 in the stack trace.
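
For reference, a typical invocation would look like this (the module path here is illustrative; the offset is frame 7 from the trace above):

addr2line -f -e /path/to/timescaledb-2.7.0-dev.so 0x34a70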

svenklemm (Member, Author) commented:

@erimatnor

#0  0x000055e0877f33df in GetMemoryChunkContext (pointer=0x55e0887dbe30) at ../../../../src/include/utils/memutils.h:131
#1  0x000055e0877f5763 in pfree (pointer=0x55e0887dbe30) at mcxt.c:1177
#2  0x000055e08714ccb1 in btendscan (scan=0x55e08881f4e0) at nbtree.c:472
#3  0x000055e087135d0e in index_endscan (scan=0x55e08881f4e0) at indexam.c:336
#4  0x00007f34fe803445 in index_scanner_endscan (ctx=0x7ffcdac73740) at /home/sven/projects/tsmaster/src/scanner.c:129
#5  0x00007f34fe80392f in ts_scanner_end_scan (ctx=0x7ffcdac73740) at /home/sven/projects/tsmaster/src/scanner.c:325
#6  0x00007f34fe803f81 in ts_scan_iterator_close (iterator=0x7ffcdac73740) at /home/sven/projects/tsmaster/src/scan_iterator.c:26
#7  0x00007f34fe7d4a80 in ts_chunk_scan_by_constraints (hs=0x55e0887215e8, dimension_vecs=0x55e0888f14f8, chunk_lockmode=1, numchunks=0x7ffcdac73d40) at /home/sven/projects/tsmaster/src/chunk_scan.c:261
#8  0x00007f34fe7ecf2a in ts_hypertable_restrict_info_get_chunks (hri=0x55e0888f11d0, ht=0x55e0887213d0, lockmode=1, num_chunks=0x7ffcdac73d40) at /home/sven/projects/tsmaster/src/hypertable_restrict_info.c:566
#9  0x00007f34fe7ed047 in ts_hypertable_restrict_info_get_chunks_ordered (hri=0x55e0888f11d0, ht=0x55e0887213d0, chunks=0x0, lockmode=1, reverse=true, nested_oids=0x55e0888eedc0, num_chunks=0x7ffcdac73d40)
    at /home/sven/projects/tsmaster/src/hypertable_restrict_info.c:618
#10 0x00007f34fe7f5ca9 in get_chunks (ctx=0x7ffcdac73df0, root=0x55e0888ed3d8, rel=0x55e0888ee528, ht=0x55e0887213d0, num_chunks=0x7ffcdac73d40) at /home/sven/projects/tsmaster/src/plan_expand_hypertable.c:1124
#11 0x00007f34fe7f6377 in ts_plan_expand_hypertable_chunks (ht=0x55e0887213d0, root=0x55e0888ed3d8, rel=0x55e0888ee528) at /home/sven/projects/tsmaster/src/plan_expand_hypertable.c:1332
#12 0x00007f34fe7f191a in reenable_inheritance (root=0x55e0888ed3d8, rel=0x55e0888ee528, rti=1, rte=0x55e0888ed960) at /home/sven/projects/tsmaster/src/planner.c:697
#13 0x00007f34fe7f21ad in timescaledb_set_rel_pathlist (root=0x55e0888ed3d8, rel=0x55e0888ee528, rti=1, rte=0x55e0888ed960) at /home/sven/projects/tsmaster/src/planner.c:916
#14 0x000055e087466129 in set_rel_pathlist (root=0x55e0888ed3d8, rel=0x55e0888ee528, rti=1, rte=0x55e0888ed960) at allpaths.c:541
#15 0x000055e087465c8e in set_base_rel_pathlists (root=0x55e0888ed3d8) at allpaths.c:353
#16 0x000055e08746599f in make_one_rel (root=0x55e0888ed3d8, joinlist=0x55e0888ef938) at allpaths.c:223
#17 0x000055e0874a5399 in query_planner (root=0x55e0888ed3d8, qp_callback=0x55e0874a4e6b <minmax_qp_callback>, qp_extra=0x0) at planmain.c:276
#18 0x000055e0874a4d29 in build_minmax_path (root=0x55e0888ec840, mminfo=0x55e0888ed328, eqop=1320, sortop=1324, nulls_first=true) at planagg.c:421
#19 0x000055e0874a44ab in preprocess_minmax_aggregates (root=0x55e0888ec840) at planagg.c:176
#20 0x000055e0874a7a7d in grouping_planner (root=0x55e0888ec840, tuple_fraction=0) at planner.c:1410
#21 0x000055e0874a723a in subquery_planner (glob=0x55e0888cd048, parse=0x55e0888e7118, parent_root=0x55e088631da0, hasRecursion=false, tuple_fraction=0) at planner.c:1019
#22 0x000055e0874b97a9 in make_subplan (root=0x55e088631da0, orig_subquery=0x55e0888892c0, subLinkType=EXPR_SUBLINK, subLinkId=0, testexpr=0x0, isTopQual=false) at subselect.c:222
#23 0x000055e0874bc953 in process_sublinks_mutator (node=0x55e0888d0140, context=0x7ffcdac74910) at subselect.c:1951
#24 0x000055e087437099 in expression_tree_mutator (node=0x55e0888d0198, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac74910) at nodeFuncs.c:3109
#25 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d0198, context=0x7ffcdac74d30) at subselect.c:2047
#26 0x000055e087435f65 in expression_tree_mutator (node=0x55e0888d03a8, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac74d30) at nodeFuncs.c:2765
#27 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d03a8, context=0x7ffcdac75150) at subselect.c:2047
#28 0x000055e087436847 in expression_tree_mutator (node=0x55e0888d0508, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac75150) at nodeFuncs.c:2940
#29 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d0508, context=0x7ffcdac75570) at subselect.c:2047
#30 0x000055e087437099 in expression_tree_mutator (node=0x55e0888d0540, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac75570) at nodeFuncs.c:3109
#31 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d0540, context=0x7ffcdac75990) at subselect.c:2047
#32 0x000055e08743679c in expression_tree_mutator (node=0x55e0888d06a0, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac75990) at nodeFuncs.c:2929
#33 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d06a0, context=0x7ffcdac75db0) at subselect.c:2047
#34 0x000055e087437099 in expression_tree_mutator (node=0x55e0888d06f8, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac75db0) at nodeFuncs.c:3109
#35 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d06f8, context=0x7ffcdac761d0) at subselect.c:2047
#36 0x000055e087436a91 in expression_tree_mutator (node=0x55e0888d0998, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac761d0) at nodeFuncs.c:2983
#37 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d0998, context=0x7ffcdac765f0) at subselect.c:2047
#38 0x000055e087437099 in expression_tree_mutator (node=0x55e0888d09d0, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac765f0) at nodeFuncs.c:3109
#39 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d09d0, context=0x7ffcdac76a10) at subselect.c:2047
#40 0x000055e087435e71 in expression_tree_mutator (node=0x55e0888d0a28, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac76a10) at nodeFuncs.c:2745
#41 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d0a28, context=0x7ffcdac76e30) at subselect.c:2047
#42 0x000055e087436d9e in expression_tree_mutator (node=0x55e0888d00e8, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac76e30) at nodeFuncs.c:3045
#43 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888d00e8, context=0x7ffcdac77250) at subselect.c:2047
#44 0x000055e087437099 in expression_tree_mutator (node=0x55e0888cff30, mutator=0x55e0874bc8ab <process_sublinks_mutator>, context=0x7ffcdac77250) at nodeFuncs.c:3109
#45 0x000055e0874bcc4c in process_sublinks_mutator (node=0x55e0888cff30, context=0x7ffcdac772b0) at subselect.c:2047
#46 0x000055e0874bc895 in SS_process_sublinks (root=0x55e088631da0, expr=0x55e0888cff30, isQual=false) at subselect.c:1924
#47 0x000055e0874a736c in preprocess_expression (root=0x55e088631da0, expr=0x55e0888cff30, kind=1) at planner.c:1126
#48 0x000055e0874a68d9 in subquery_planner (glob=0x55e0888cd048, parse=0x55e088887d80, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:781
#49 0x000055e0874a5aa1 in standard_planner (parse=0x55e088887d80,
    query_string=0x55e0886325d0 "select  \n  (select pg_catalog.bit_and(device_id) from public.metrics_space)\n     as c0, \n  pg_catalog.pg_control_system() as c1, \n  ref_0.value as c2, \n  pg_catalog.macaddr8_set7bit(\n    cast(cast(coalesce(case when (select pg_catalog.max(time) from public.metrics_space_compressed)\n             <> case when cast(null as tid) <= cast(null as tid) then (select ts from public.distinct_on_distributed limit 1 offset 5)\n               else (select ts from public.distinct_on_distributed limit 1 offset 5)\n               end\n             then pg_catalog.trunc(\n          cast(cast(null as macaddr8) as macaddr8)) else pg_catalog.trunc(\n          cast(cast(null as macaddr8) as macaddr8)) end\n        ,\n      case when false then case when EXISTS (\n            select  \n                sample_4.name as c0, \n                sample_3.v2 as c1, \n                sample_4.sensor_id as c2, \n                sample_4.name as c3, \n                ref_0.value as c4\n              from \n                public.metrics_tstz as sample_3 tablesample system (5.9) \n                  right join public.sensors as sample_4 tablesample bernoulli (5.2) \n                  on ((((select pg_catalog.variance(num_replicas) from timescaledb_experimental.chunk_replication_status)\n                             >= (select pg_catalog.var_samp(num_replicas) from timescaledb_experimental.chunk_replication_status)\n                            ) \n                        or (true)) \n                      and (((true) \n                          and (false)) \n                        and (false)))\n              where true) then cast(null as macaddr8) else cast(null as macaddr8) end\n           else case when EXISTS (\n            select  \n                sample_4.name as c0, \n                sample_3.v2 as c1, \n                sample_4.sensor_id as c2, \n                sample_4.name as c3, \n                ref_0.value as c4\n              from \n                public.metrics_tstz as sample_3 tablesample system (5.9) \n                  right join public.sensors as sample_4 tablesample bernoulli (5.2) \n                  on ((((select pg_catalog.variance(num_replicas) from timescaledb_experimental.chunk_replication_status)\n                             >= (select pg_catalog.var_samp(num_replicas) from timescaledb_experimental.chunk_replication_status)\n                            ) \n                        or (true)) \n                      and (((true) \n                          and (false)) \n                        and (false)))\n              where true) then cast(null as macaddr8) else cast(null as macaddr8) end\n           end\n        ) as macaddr8) as macaddr8)) as c3, \n  timescaledb_experimental.time_bucket_ng(\n    cast(cast(coalesce(\n        pg_catalog.max(\n          cast(cast(null as \"interval\") as \"interval\")) over (partition by ref_0.sensor_id order by ref_0.time),\n      case when case when cast(null as xid) <> ref_0.device_id then cast(null as oidvector) else cast(null as oidvector) end\n             >= cast(null as oidvector) then cast(nullif(case when ref_0.device_id is not NULL then cast(null as \"interval\") else cast(null as \"interval\") end\n            ,\n          case when (ref_0.sensor_id is NULL) \n              or (cast(null as tsquery) <@ cast(null as tsquery)) then cast(null as \"interval\") else cast(null as \"interval\") end\n            ) as \"interval\") else cast(nullif(case when ref_0.device_id is not NULL then cast(null as 
\"interval\") else cast(null as \"interval\") end\n            ,\n          case when (ref_0.sensor_id is NULL) \n              or (cast(null as tsquery) <@ cast(null as tsquery)) then cast(null as \"interval\") else cast(null as \"interval\") end\n            ) as \"interval\") end\n        ) as \"interval\") as \"interval\"),\n    cast((select pg_catalog.max(ts) from public.distinct_on_distributed)\n       as \"timestamp\"),\n    cast((select ts from public.distinct_on_hypertable limit 1 offset 3)\n       as \"timestamp\")) as c4, \n  ref_0.value as c5\nfrom \n  public.metrics_int as ref_0\nwhere cast(null as timestamptz) >= cast(null as timestamptz)", cursorOptions=2048, boundParams=0x0) at planner.c:400

erimatnor (Contributor) commented:

I think I have found the issue, but I need to verify the fix. Essentially, the problem is that a rescan is initiated on a per-tuple memory context.
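
A minimal sketch of that failure mode, in illustrative PostgreSQL-style C (this is not the actual TimescaleDB code; per_tuple_ctx, scan, scankeys, and nkeys are assumed names):

/*
 * If a scan is started or restarted while a short-lived per-tuple
 * context is the current memory context, the index AM may allocate
 * private scan state there.  Resetting that context between tuples
 * frees the state, so the pfree() inside btendscan() later runs on
 * freed memory -- matching frames #0-#2 of the trace above.
 */
MemoryContext oldctx = MemoryContextSwitchTo(per_tuple_ctx);
index_rescan(scan, scankeys, nkeys, NULL, 0); /* may palloc in per_tuple_ctx */
MemoryContextSwitchTo(oldctx);

MemoryContextReset(per_tuple_ctx); /* scan state is freed here */
index_endscan(scan);               /* pfree() on freed memory -> crash */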

pmwkaa assigned erimatnor and unassigned pmwkaa on Mar 9, 2022
erimatnor added a commit to erimatnor/timescaledb that referenced this issue Mar 12, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes timescale#4148, timescale#4145
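
As a hedged illustration of the pattern this commit message describes (scan_mcxt, heap_rel, index_rel, snapshot, and slot are assumed names, not the actual TimescaleDB change):

/* Run scan setup and teardown in a context that lives as long as the
 * scan itself; keep only per-tuple work in the per-tuple context. */
MemoryContext oldctx = MemoryContextSwitchTo(scan_mcxt);
IndexScanDesc scan = index_beginscan(heap_rel, index_rel, snapshot,
                                     nkeys, 0 /* norderbys */);
index_rescan(scan, scankeys, nkeys, NULL, 0);
MemoryContextSwitchTo(oldctx);

while (index_getnext_slot(scan, ForwardScanDirection, slot))
{
    /* per-tuple work may freely use and reset a per-tuple context */
}

index_endscan(scan); /* scan state is still valid: it lives in scan_mcxt */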
svenklemm (Member, Author) commented:

@erimatnor if you have a branch you can point me to, I can check whether it is fixed.

erimatnor added a commit to erimatnor/timescaledb that referenced this issue Mar 14, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes timescale#4148, timescale#4145
erimatnor (Contributor) commented:

@erimatnor if you have a branch you can point me to, I can check whether it is fixed.

There's a PR, linked above.

erimatnor added a commit to erimatnor/timescaledb that referenced this issue Mar 15, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes timescale#4148, timescale#4145
erimatnor added a commit to erimatnor/timescaledb that referenced this issue Mar 16, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes timescale#4148, timescale#4145
erimatnor added a commit to erimatnor/timescaledb that referenced this issue Mar 17, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes timescale#4148, timescale#4145
erimatnor added a commit that referenced this issue Mar 18, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes #4148, #4145
RafiaSabih pushed a commit to RafiaSabih/timescaledb that referenced this issue Apr 5, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes timescale#4148, timescale#4145
RafiaSabih pushed a commit to RafiaSabih/timescaledb that referenced this issue Apr 8, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes timescale#4148, timescale#4145
mkindahl pushed a commit to RafiaSabih/timescaledb that referenced this issue Apr 8, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes timescale#4148, timescale#4145
svenklemm pushed a commit to RafiaSabih/timescaledb that referenced this issue Apr 11, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes timescale#4148, timescale#4145
svenklemm pushed a commit that referenced this issue Apr 11, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes #4148, #4145
svenklemm pushed a commit that referenced this issue Apr 11, 2022
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that need to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.

Fixes #4148, #4145