Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Segmentation Fault in PrometheusWriter::WriteSingleEntry within PostgreSQL Process #19426

Open
1 task done
agsh-yb opened this issue Oct 6, 2023 · 1 comment
Open
1 task done
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue QA_Blocker

Comments

@agsh-yb
Copy link
Contributor

agsh-yb commented Oct 6, 2023

Jira Link: DB-8220

Description

Core dump is observed in test_ysql_snapshots_daily_test. Observed this failure in on 2.21.0.0-b42

Steps

  1. Create 20 YSQL tables
  2. Delete existing schedules if any
  3. Load data - using a large key multiplier-500 and Create a snapshot schedule
  4. Execute Nemesis - Failure moment- After Schedule-RESTART_VM, ADD_DROP_COLUMN
  5. Capture time for restore point
  6. Execute 20 restores in a row
  7. Evaluate nemesis after restore- RESTART_TSERVER, RESTART_MASTER, PARTITION_NETWORK, SLOW_NETWORK,
  8. RESTART_VM
  9. Assert no. of rows of data
  10. Check for FATALs and CORRUPT files

Environment setup

instance_type="c5.2xlarge",
master_gflags={
"ysql_num_shards_per_tserver": "10",
"yb_num_shards_per_tserver": "10",
},
tserver_gflags={
"ysql_num_shards_per_tserver": "10",
"yb_num_shards_per_tserver": "10",
'client_read_write_timeout_ms': str(30 * 60 * 1000),
'yb_client_admin_operation_timeout_sec': str(30 * 60),
}

Observations:

  1. Error:Found schema version mismatch for table 000033f3000030008000000000004000: expected 2, got 1
  2. Core dumps are found
  3. SST File size spiked
    echarts
2023-10-06 00:02:13,040:DEBUG: > 172.151.20.168 > /home/yugabyte/tserver/bin/ysqlsh --c "select count(k) as count from postgresqlkeyvalue0 where yb_hash_code(k) >= 26212 and yb_hash_code(k) <= 32765 " -h 172.151.17.9,172.151.17.20,172.151.20.168 -d postgres -A 2>&1
2023-10-06 00:02:13,040:DEBUG: < 172.151.20.168 < 
2023-10-06 00:02:13,041:DEBUG: > 172.151.20.168 > /home/yugabyte/tserver/bin/ysqlsh --c "select count(k) as count from postgresqlkeyvalue0 where yb_hash_code(k) >= 39318 and yb_hash_code(k) <= 45871 " -h 172.151.17.20,172.151.20.168,172.151.17.9 -d postgres -A 2>&1
2023-10-06 00:02:13,041:DEBUG: > 172.151.17.9 > /home/yugabyte/tserver/bin/ysqlsh --c "select count(k) as count from postgresqlkeyvalue0 where yb_hash_code(k) >= 45871 and yb_hash_code(k) <= 52424 " -h 172.151.17.20,172.151.20.168,172.151.17.9 -d postgres -A 2>&1
2023-10-06 00:02:13,041:DEBUG: > 172.151.17.9 > /home/yugabyte/tserver/bin/ysqlsh --c "select count(k) as count from postgresqlkeyvalue0 where yb_hash_code(k) >= 32765 and yb_hash_code(k) <= 39318 " -h 172.151.17.9,172.151.20.168,172.151.17.20 -d postgres -A 2>&1
2023-10-06 00:02:35,822:DEBUG: < 172.151.20.168 < ERROR:  schema version mismatch for table 000033f3000030008000000000004000: expected 2, got 1

2023-10-06 00:02:35,823:DEBUG: > 172.151.20.168 > /home/yugabyte/tserver/bin/ysqlsh --c "select count(k) as count from postgresqlkeyvalue0 where yb_hash_code(k) >= 39318 and yb_hash_code(k) <= 45871 " -h 172.151.17.20,172.151.20.168,172.151.17.9 -d postgres -A 2>&1
2023-10-06 00:02:36,057:DEBUG: < 172.151.20.168 < ERROR:  schema version mismatch for table 000033f3000030008000000000004000: expected 2, got 1

2023-10-06 00:02:36,058:DEBUG: > 172.151.17.9 > /home/yugabyte/tserver/bin/ysqlsh --c "select count(k) as count from postgresqlkeyvalue0 where yb_hash_code(k) >= 39318 and yb_hash_code(k) <= 45871 " -h 172.151.17.20,172.151.20.168,172.151.17.9 -d postgres -A 2>&1
2023-10-06 00:04:24,111:DEBUG: < 172.151.17.9 < 
2023-10-06 00:04:24,111:DEBUG: > 172.151.17.20 > /home/yugabyte/tserver/bin/ysqlsh --c "select count(k) as count from postgresqlkeyvalue0 where yb_hash_code(k) >= 32765 and yb_hash_code(k) <= 39318 " -h 172.151.17.20,172.151.17.9,172.151.20.168 -d postgres -A 2>&1
2023-10-06 00:04:24,111:DEBUG: < 172.151.17.9 < 
2023-10-06 00:04:24,112:DEBUG: > 172.151.20.168 > /home/yugabyte/tserver/bin/ysqlsh --c "select count(k) as count from postgresqlkeyvalue0 where yb_hash_code(k) >= 45871 and yb_hash_code(k) <= 52424 " -h 172.151.17.20,172.151.20.168,172.151.17.9 -d postgres -A 2>&1
2023-10-06 00:04:27,296:DEBUG: < 172.151.20.168 < count
25535
(1 row)

Logs
Stack traceback:

(lldb) target create "/home/yugabyte/yb-software/yugabyte-2.21.0.0-b42-centos-x86_64/postgres/bin/postgres" --core "/home/yugabyte/cores/core_12771_1696550474_!home!yugabyte!yb-software!yugabyte-2.21.0.0-b42-centos-x86_64!postgres!bin!postgres"
Core file '/home/yugabyte/cores/core_12771_1696550474_!home!yugabyte!yb-software!yugabyte-2.21.0.0-b42-centos-x86_64!postgres!bin!postgres' (x86_64) was loaded.
(lldb) bt all
* thread #1, name = 'postgres', stop reason = signal SIGSEGV
  * frame #0: 0x00007fca39a97b67 libyb_util.so`std::__1::__hash_const_iterator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, void*>*> std::__1::__hash_table<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>::find<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>(this=<unavailable>, __k=<unavailable>) const at __hash_table:2343:31
    frame #1: 0x00007fca39a977f6 libyb_util.so`yb::PrometheusWriter::WriteSingleEntry(std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, long, yb::AggregationFunction, char const*, char const*) [inlined] std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>::find[abi:v160006](this=0x00007fca38ed6f68, __k="table_id") const at unordered_map:1445:69
    frame #2: 0x00007fca39a977e0 libyb_util.so`yb::PrometheusWriter::WriteSingleEntry(this=0x00007fca2e4a17a8, attr=0x00007fca38ed6f68, name="handler_latency_yb_ysqlserver_SQLProcessor_SelectStmt_count", value=32326, aggregation_function=kSum, type="", description="") at metrics_writer.cc:125:18
    frame #3: 0x00007fca38ecf2d2 libyb_pggate_webserver.so`yb::pggate::PgPrometheusMetricsHandler(req=<unavailable>, resp=<unavailable>) at pgsql_webserver_wrapper.cc:452:5
    frame #4: 0x00007fca3af67430 libserver_process.so`yb::Webserver::Impl::RunPathHandler(yb::Webserver::Impl::PathHandler const&, sq_connection*, sq_request_info*) [inlined] std::__1::__function::__value_func<void (yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*)>::operator(this=0x000007d77fdc2c80, __args=0x00007fca2e4a3b58, __args=0x00007fca2e4a3a30)[abi:v160006](yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*&&) const at function.h:510:16
    frame #5: 0x00007fca3af67417 libserver_process.so`yb::Webserver::Impl::RunPathHandler(yb::Webserver::Impl::PathHandler const&, sq_connection*, sq_request_info*) [inlined] std::__1::function<void (yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*)>::operator(this= Function = yb::pggate::PgPrometheusMetricsHandler(yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*) , __arg=0x00007fca2e4a3b58, __arg=0x00007fca2e4a1a30)(yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*) const at function.h:1156:12
    frame #6: 0x00007fca3af67417 libserver_process.so`yb::Webserver::Impl::RunPathHandler(this=0x000007d77fd7ac80, handler=0x000007d77fddc730, connection=0x000007d77f4c2000, request_info=<unavailable>) at webserver.cc:625:5
    frame #7: 0x00007fca3af66f9b libserver_process.so`yb::Webserver::Impl::BeginRequestCallback(this=0x000007d77fd7ac80, connection=0x000007d77f4c2000, request_info=0x000007d77f4c2000) at webserver.cc:557:10
    frame #8: 0x00007fca3af73c88 libserver_process.so`worker_thread + 5400
    frame #9: 0x00007fca3d7a01cf libpthread.so.0`start_thread + 239
    frame #10: 0x00007fca3d40be73 libc.so.6`__clone + 67
  thread #2, stop reason = signal 0
    frame #0: 0x00007fca3d7a645c libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 508
    frame #1: 0x00007fca3af726e5 libserver_process.so`master_thread + 1301
    frame #2: 0x00007fca3d7a01cf libpthread.so.0`start_thread + 239
    frame #3: 0x00007fca3d40be73 libc.so.6`__clone + 67
  thread #3, stop reason = signal 0
    frame #0: 0x00007fca3ed76d08 ld-linux-x86-64.so.2`.annobin_dl_fini.c at dl-fini.c:84:12
    frame #1: 0x00007fca3d42329c libc.so.6`__run_exit_handlers + 316
    frame #2: 0x00007fca3d4233d0 libc.so.6`exit + 32
    frame #3: 0x00005592c2aada82 postgres`proc_exit(code=1) at ipc.c:157:2
    frame #4: 0x00005592c2c96271 postgres`errfinish(dummy=<unavailable>) at elog.c:801:3
    frame #5: 0x00005592c2a0d31d postgres`bgworker_die(postgres_signal_arg=<unavailable>) at bgworker.c:672:2
    frame #6: 0x00007fca3d7aacf0 libpthread.so.0`__restore_rt
    frame #7: 0x00007fca3d502027 libc.so.6`epoll_wait + 87
    frame #8: 0x00005592c2aafa66 postgres`WaitEventSetWait [inlined] WaitEventSetWaitBlock(set=0x000007d77fd95e88, cur_timeout=-1, occurred_events=0x00007ffdd14a05a8, nevents=1) at latch.c:1062:7
    frame #9: 0x00005592c2aafa56 postgres`WaitEventSetWait(set=0x000007d77fd95e88, timeout=-1, occurred_events=<unavailable>, nevents=1, wait_event_info=<unavailable>) at latch.c:1014:8
    frame #10: 0x00005592c2aaf503 postgres`WaitLatchOrSocket(latch=0x00007fca3804988c, wakeEvents=<unavailable>, sock=-1, timeout=<unavailable>, wait_event_info=117440512) at latch.c:399:7
    frame #11: 0x00005592c2aaf400 postgres`WaitLatch(latch=<unavailable>, wakeEvents=<unavailable>, timeout=<unavailable>, wait_event_info=<unavailable>) at latch.c:353:9 [artificial]
    frame #12: 0x00007fca38eb71e0 yb_pg_metrics.so`webserver_worker_main(unused=<unavailable>) at yb_pg_metrics.c:375:3
    frame #13: 0x00005592c2a0d1b2 postgres`StartBackgroundWorker at bgworker.c:841:2
    frame #14: 0x00005592c2a260eb postgres`maybe_start_bgworkers [inlined] do_start_bgworker(rw=0x000007d77fd7a500) at postmaster.c:6005:4
    frame #15: 0x00005592c2a2608e postgres`maybe_start_bgworkers at postmaster.c:6231:9
    frame #16: 0x00005592c2a228ad postgres`PostmasterMain(argc=25, argv=0x000007d77fd061a0) at postmaster.c:1429:2
    frame #17: 0x00005592c2925fb6 postgres`PostgresServerProcessMain(argc=25, argv=0x000007d77fd061a0) at main.c:234:3
    frame #18: 0x00005592c25e7952 postgres`main + 34
    frame #19: 0x00007fca3d40cd85 libc.so.6`__libc_start_main + 229
    frame #20: 0x00005592c25e7869 postgres`_start + 41

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@agsh-yb agsh-yb added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Oct 6, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Oct 6, 2023
@agsh-yb agsh-yb changed the title [YSQL] Postgres crash while performing select statement [YSQL] Crashed postgres process due to a segmentation fault SIGSEGV in PITR test Oct 6, 2023
@sushantrmishra
Copy link

Possible duplicate of #18948 . We should reverify once fix for #18948 has landed.

@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Oct 11, 2023
@yugabyte-ci yugabyte-ci changed the title [YSQL] Crashed postgres process due to a segmentation fault SIGSEGV in PITR test [YSQL] Segmentation Fault in PrometheusWriter::WriteSingleEntry within PostgreSQL Process Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue QA_Blocker
Projects
None yet
Development

No branches or pull requests

6 participants