CI Failure (retries) in rptest.tests.e2e_iam_role_test #11107

Closed
twmb opened this issue May 30, 2023 · 8 comments · Fixed by #11155
Labels
area/cloud-storage Shadow indexing subsystem ci-failure kind/bug Something isn't working

Comments

@twmb
Contributor

twmb commented May 30, 2023

https://buildkite.com/redpanda/redpanda/builds/30205#01886e06-e1ae-4b62-bd19-fe4c80a7082c

Module: rptest.tests.e2e_iam_role_test
Class:  STSRoleFetchTests
Method: test_write
====================================================================================================
test_id:    rptest.tests.e2e_iam_role_test.STSRoleFetchTests.test_write
status:     FAIL
run time:   1 minute 14.178 seconds


    ConnectionError(MaxRetryError("HTTPConnectionPool(host='docker-rp-20', port=9644): Max retries exceeded with url: /v1/cloud_storage/status/panda-topic/0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f30aa19f610>: Failed to establish a new connection: [Errno 111] Connection refused'))"))
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f30aa19f610>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='docker-rp-20', port=9644): Max retries exceeded with url: /v1/cloud_storage/status/panda-topic/0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f30aa19f610>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 155, in wrapped
    self.redpanda.stop_and_scrub_object_storage()
  File "/root/tests/rptest/services/redpanda.py", line 3563, in stop_and_scrub_object_storage
    wait_until(all_partitions_uploaded_manifest,
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 53, in wait_until
    raise e
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 44, in wait_until
    if condition():
  File "/root/tests/rptest/services/redpanda.py", line 3537, in all_partitions_uploaded_manifest
    status = self._admin.get_partition_cloud_storage_status(
  File "/root/tests/rptest/services/admin.py", line 933, in get_partition_cloud_storage_status
    return self._request("GET",
  File "/root/tests/rptest/services/admin.py", line 334, in _request
    r = self._session.request(verb, url, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='docker-rp-20', port=9644): Max retries exceeded with url: /v1/cloud_storage/status/panda-topic/0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f30aa19f610>: Failed to establish a new connection: [Errno 111] Connection refused'))
@twmb twmb added kind/bug Something isn't working ci-failure labels May 30, 2023
@twmb twmb changed the title CI Failure (retries) in STSRoleFetchTests.test_write CI Failure (retries) in rptest.tests.e2e_iam_role_test May 30, 2023
@twmb
Contributor Author

twmb commented May 30, 2023

A nearly identical error appears below, in the same build:

Module: rptest.tests.e2e_iam_role_test
Class:  ShortLivedCredentialsTests
Method: test_short_lived_credentials

====================================================================================================
test_id: rptest.tests.e2e_iam_role_test.ShortLivedCredentialsTests.test_short_lived_credentials
status: FAIL
run time: 1 minute 12.219 seconds

ConnectionError(MaxRetryError("HTTPConnectionPool(host='docker-rp-20', port=9644): Max retries exceeded with url: /v1/cloud_storage/status/__consumer_offsets/1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4bd2c9e920>: Failed to establish a new connection: [Errno 111] Connection refused'))"))

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 159, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.10/http/client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
self.send(msg)
File "/usr/lib/python3.10/http/client.py", line 975, in send
self.connect()
File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 187, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 171, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f4bd2c9e920>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 446, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='docker-rp-20', port=9644): Max retries exceeded with url: /v1/cloud_storage/status/__consumer_offsets/1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4bd2c9e920>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
data = self.run_test()
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
return self.test_context.function(self.test)
File "/root/tests/rptest/services/cluster.py", line 155, in wrapped
self.redpanda.stop_and_scrub_object_storage()
File "/root/tests/rptest/services/redpanda.py", line 3563, in stop_and_scrub_object_storage
wait_until(all_partitions_uploaded_manifest,
File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 53, in wait_until
raise e
File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 44, in wait_until
if condition():
File "/root/tests/rptest/services/redpanda.py", line 3537, in all_partitions_uploaded_manifest
status = self._admin.get_partition_cloud_storage_status(
File "/root/tests/rptest/services/admin.py", line 933, in get_partition_cloud_storage_status
return self._request("GET",
File "/root/tests/rptest/services/admin.py", line 334, in _request
r = self._session.request(verb, url, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='docker-rp-20', port=9644): Max retries exceeded with url: /v1/cloud_storage/status/__consumer_offsets/1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4bd2c9e920>: Failed to establish a new connection: [Errno 111] Connection refused'))

@twmb
Contributor Author

twmb commented May 30, 2023

And below in AWSRoleFetchTests.test_write

@rockwotj
Contributor

rockwotj commented May 31, 2023

At least for my PR, this is an ASAN violation:
https://buildkite.com/redpanda/redpanda/builds/30234#01886fc6-3c51-4b29-a3a3-c94f213f1652

/usr/bin/llvm-symbolizer: /opt/redpanda_installs/ci/lib/libc.so.6: version `GLIBC_2.36' not found (required by /usr/lib/llvm-15/bin/../lib/libLLVM-15.so.1)
/usr/bin/llvm-symbolizer: /opt/redpanda_installs/ci/lib/libc.so.6: version `GLIBC_2.36' not found (required by /lib/x86_64-linux-gnu/libstdc++.so.6)
==2423==WARNING: Can't read from symbolizer at fd 58
==2423==WARNING: Can't write to symbolizer at fd 61
==2423==WARNING: Failed to use and restart external symbolizer!
    #0 0x7f400c71ac6c  (/opt/redpanda_installs/ci/lib/libv_v_cloud_roles.so+0xb70c6c) (BuildId: 812437d1ec951318ed1191b6387fa903a4eed59a)
    #1 0x7f400c71ab31  (/opt/redpanda_installs/ci/lib/libv_v_cloud_roles.so+0xb70b31) (BuildId: 812437d1ec951318ed1191b6387fa903a4eed59a)
    #2 0x7f400c71aaa3  (/opt/redpanda_installs/ci/lib/libv_v_cloud_roles.so+0xb70aa3) (BuildId: 812437d1ec951318ed1191b6387fa903a4eed59a)
    #3 0x7f400c71a953  (/opt/redpanda_installs/ci/lib/libv_v_cloud_roles.so+0xb70953) (BuildId: 812437d1ec951318ed1191b6387fa903a4eed59a)
    #4 0x7f400c71a8d7  (/opt/redpanda_installs/ci/lib/libv_v_cloud_roles.so+0xb708d7) (BuildId: 812437d1ec951318ed1191b6387fa903a4eed59a)
    #5 0x7f400c7163c3  (/opt/redpanda_installs/ci/lib/libv_v_cloud_roles.so+0xb6c3c3) (BuildId: 812437d1ec951318ed1191b6387fa903a4eed59a)
    #6 0x7f3ff046be84  (/opt/redpanda_installs/ci/lib/libseastar.so+0x4223e84) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #7 0x7f3ff0380ccb  (/opt/redpanda_installs/ci/lib/libseastar.so+0x4138ccb) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #8 0x7f3ff0361448  (/opt/redpanda_installs/ci/lib/libseastar.so+0x4119448) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #9 0x7f3ff04cbbf9  (/opt/redpanda_installs/ci/lib/libseastar.so+0x4283bf9) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #10 0x7f3ff04cd2f5  (/opt/redpanda_installs/ci/lib/libseastar.so+0x42852f5) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #11 0x7f3ff04ccc86  (/opt/redpanda_installs/ci/lib/libseastar.so+0x4284c86) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #12 0x7f3ff06cd864  (/opt/redpanda_installs/ci/lib/libseastar.so+0x4485864) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #13 0x7f3ff06da981  (/opt/redpanda_installs/ci/lib/libseastar.so+0x4492981) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #14 0x7f3ff06e0a3c  (/opt/redpanda_installs/ci/lib/libseastar.so+0x4498a3c) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #15 0x7f3ff06de5c5  (/opt/redpanda_installs/ci/lib/libseastar.so+0x44965c5) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #16 0x7f3ff0182a6c  (/opt/redpanda_installs/ci/lib/libseastar.so+0x3f3aa6c) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #17 0x7f3ff0180466  (/opt/redpanda_installs/ci/lib/libseastar.so+0x3f38466) (BuildId: 1b2148e54305c08fed377c9a74db2364b97b7f21)
    #18 0x7f405a20ba01  (/opt/redpanda_installs/ci/lib/libv_v_application.so+0x6323a01) (BuildId: bc536e0723ad63ca3f58c894f5b61f41f97ebbab)
    #19 0x564222aba0e3  (/opt/redpanda_installs/ci/libexec/redpanda+0x17e0e3) (BuildId: e1dd05939ec05ee7c968eac0e2f40480e58ec8e7)
    #20 0x7f3feb42858f  (/opt/redpanda_installs/ci/lib/libc.so.6+0x2d58f) (BuildId: 6e7b96dfb83f0bdcb6a410469b82f86415e5ada3)
    #21 0x7f3feb428648  (/opt/redpanda_installs/ci/lib/libc.so.6+0x2d648) (BuildId: 6e7b96dfb83f0bdcb6a410469b82f86415e5ada3)
    #22 0x5642229e30d4  (/opt/redpanda_installs/ci/libexec/redpanda+0xa70d4) (BuildId: e1dd05939ec05ee7c968eac0e2f40480e58ec8e7)

Address 0x7f3fe18631f8 is a wild pointer inside of access range of size 0x000000000008.
SUMMARY: AddressSanitizer: stack-use-after-return (/opt/redpanda_installs/ci/lib/libv_v_cloud_roles.so+0xb70c6c) (BuildId: 812437d1ec951318ed1191b6387fa903a4eed59a) 
Shadow bytes around the buggy address:
  0x7f3fe1862f00: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x7f3fe1862f80: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x7f3fe1863000: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x7f3fe1863080: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x7f3fe1863100: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
=>0x7f3fe1863180: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5[f5]
  0x7f3fe1863200: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x7f3fe1863280: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x7f3fe1863300: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x7f3fe1863380: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x7f3fe1863400: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==2423==ABORTING
/usr/bin/llvm-symbolizer: /opt/redpanda_installs/ci/lib/libc.so.6: version `GLIBC_2.36' not found (required by /usr/lib/llvm-15/bin/../lib/libLLVM-15.so.1)
/usr/bin/llvm-symbolizer: /opt/redpanda_installs/ci/lib/libc.so.6: version `GLIBC_2.36' not found (required by /lib/x86_64-linux-gnu/libstdc++.so.6)

And decoding that looks like:

[Backtrace #0]
cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1::operator()() const at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-00f2bd96a3b3b6635-1/redpanda/redpanda/src/v/cloud_roles/probe.cc:34
std::__1::function<seastar::metrics::impl::metric_value ()> seastar::metrics::impl::make_function<cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, void>(cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, seastar::metrics::impl::data_type)::'lambda'()::operator()() const at /vectorized/include/seastar/core/metrics.hh:411
decltype(std::declval<cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1>()()) std::__1::__invoke[abi:v160004]<std::__1::function<seastar::metrics::impl::metric_value ()> seastar::metrics::impl::make_function<cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, void>(cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, seastar::metrics::impl::data_type)::'lambda'()&>(cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1&&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:394
seastar::metrics::impl::metric_value std::__1::__invoke_void_return_wrapper<seastar::metrics::impl::metric_value, false>::__call<std::__1::function<seastar::metrics::impl::metric_value ()> seastar::metrics::impl::make_function<cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, void>(cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, seastar::metrics::impl::data_type)::'lambda'()&>(std::__1::function<seastar::metrics::impl::metric_value ()> seastar::metrics::impl::make_function<cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, void>(cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, seastar::metrics::impl::data_type)::'lambda'()&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:478
std::__1::__function::__alloc_func<std::__1::function<seastar::metrics::impl::metric_value ()> seastar::metrics::impl::make_function<cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, void>(cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, seastar::metrics::impl::data_type)::'lambda'(), std::__1::allocator<std::__1::function<seastar::metrics::impl::metric_value ()> seastar::metrics::impl::make_function<cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, void>(cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, seastar::metrics::impl::data_type)::'lambda'()>, seastar::metrics::impl::metric_value ()>::operator()[abi:v160004]() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:185
std::__1::__function::__func<std::__1::function<seastar::metrics::impl::metric_value ()> seastar::metrics::impl::make_function<cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, void>(cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, seastar::metrics::impl::data_type)::'lambda'(), std::__1::allocator<std::__1::function<seastar::metrics::impl::metric_value ()> seastar::metrics::impl::make_function<cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, void>(cloud_roles::auth_refresh_probe::auth_refresh_probe()::$_1, seastar::metrics::impl::data_type)::'lambda'()>, seastar::metrics::impl::metric_value ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:356
std::__1::__function::__value_func<seastar::metrics::impl::metric_value ()>::operator()[abi:v160004]() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:510
std::__1::function<seastar::metrics::impl::metric_value ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:1156
seastar::metrics::impl::get_values(int) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/metrics.cc:424
auto seastar::prometheus::get_map_value(seastar::prometheus::metrics_families_per_shard&, int)::$_0::operator()<unsigned int>(unsigned int) const::'lambda'()::operator()() const at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/prometheus.cc:224
seastar::future<seastar::foreign_ptr<seastar::shared_ptr<seastar::metrics::impl::values_copy>>> seastar::futurize<seastar::foreign_ptr<seastar::shared_ptr<seastar::metrics::impl::values_copy>>>::invoke<auto seastar::prometheus::get_map_value(seastar::prometheus::metrics_families_per_shard&, int)::$_0::operator()<unsigned int>(unsigned int) const::'lambda'()&>(unsigned int&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:2001
seastar::smp_message_queue::async_work_item<auto seastar::prometheus::get_map_value(seastar::prometheus::metrics_families_per_shard&, int)::$_0::operator()<unsigned int>(unsigned int) const::'lambda'()>::run_and_dispose() at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/smp.hh:240
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2557
seastar::reactor::run_some_tasks() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3020
seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3189

Which is this line:

[this] { return _fetch_errors; },

@rockwotj
Contributor

Possibly related to our replicated metrics patch? /cc @VladLazar @dotnwat

@VladLazar
Contributor

This test actually exercises the cloud roles code paths, and implicitly the probe. While it's possible that something's wrong with the metrics patches, I'd expect that to crop up more often and in various places. I'd take a good look at the lifetime of this probe first.

@rockwotj
Contributor

rockwotj commented Jun 1, 2023

I'm still learning how the probes and Seastar metrics fully work, but it seems like the metrics are supposed to be unregistered when the probe's dtor is called, right? I'm not sure how the probe's lifetime could cause anything here.
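
One way the probe's lifetime can matter even when the destructor unregisters the metric is sketched below. This is a toy model, not Seastar's or Redpanda's code: `toy_registry`, `toy_probe`, and `stale_read_after_move` are hypothetical names, and the only piece taken from the thread is the `[this] { return _fetch_errors; }` getter. The assumption being illustrated is that the probe gets moved after it has registered its callback, so the callback keeps the address of the old object.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a metrics registry; this is NOT the Seastar API.
class toy_registry {
public:
    int add(std::function<int()> getter) {
        int id = _next_id++;
        _getters.emplace(id, std::move(getter));
        return id;
    }
    void remove(int id) { _getters.erase(id); }
    int read(int id) const { return _getters.at(id)(); }
    std::size_t size() const { return _getters.size(); }

private:
    int _next_id = 0;
    std::map<int, std::function<int()>> _getters;
};

// Hypothetical probe that registers a getter capturing `this`.
class toy_probe {
public:
    explicit toy_probe(toy_registry& r)
      : _registry(&r)
      , _id(r.add([this] { return _fetch_errors; })) {}

    // The move hands over ownership of the registration, but it cannot
    // rewrite the `this` already captured inside the stored callback.
    toy_probe(toy_probe&& other) noexcept
      : _registry(std::exchange(other._registry, nullptr))
      , _id(other._id)
      , _fetch_errors(other._fetch_errors) {}

    // Only the current owner of the registration unregisters it.
    ~toy_probe() {
        if (_registry) {
            _registry->remove(_id);
        }
    }

    void fetch_failed() { ++_fetch_errors; }
    int id() const { return _id; }

private:
    toy_registry* _registry;
    int _id;
    int _fetch_errors = 0;
};

// Moves a probe, records three errors on the new object, then reads the
// metric through the registry. The callback still points at `original`.
int stale_read_after_move() {
    toy_registry registry;
    toy_probe original(registry);
    toy_probe moved(std::move(original));
    moved.fetch_failed();
    moved.fetch_failed();
    moved.fetch_failed();
    return registry.read(moved.id());
}
```

Here `stale_read_after_move()` returns 0 rather than 3, because the registered callback reads the moved-from object. The sketch keeps `original` alive so that the read is well defined. In the ASAN report above the storage behind the captured pointer had already been released (stack-use-after-return), so the equivalent read there is undefined behaviour rather than merely stale.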

@rockwotj rockwotj self-assigned this Jun 1, 2023
rockwotj added a commit to rockwotj/redpanda that referenced this issue Jun 1, 2023
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
BenPope added a commit to BenPope/redpanda that referenced this issue Jun 1, 2023
Limit the lifetime of the metrics to the async lifetime of
`refresh_credentials`.

Fixes redpanda-data#11095
Fixes redpanda-data#11107

Signed-off-by: Ben Pope <ben@redpanda.com>
@rockwotj rockwotj removed their assignment Jun 1, 2023
@twmb
Contributor Author

twmb commented Jun 2, 2023

rockwotj added a commit to rockwotj/redpanda that referenced this issue Jun 2, 2023
Probes register metrics by capturing `this`, so it's not safe to move
them after that. In a lot of places this is fine because their lifetime
is directly tied to a service which lives for the whole program's lifetime,
but any small move of that object, even during initialization, can break
things (see redpanda-data#11155, redpanda-data#11095, redpanda-data#11107).

This takes a big-hammer approach to removing this footgun by making all
probes immovable and wrapping them in `std::unique_ptr`.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
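
The approach described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual patch: `pinned_probe`, `refresher`, and `read_after_owner_move` are hypothetical names, a plain `std::vector` of callbacks stands in for the metrics registry, and unregistration is left out for brevity.

```cpp
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical probe; the name and the vector-based "registry" are
// illustrative only and do not mirror Redpanda's classes.
class pinned_probe {
public:
    explicit pinned_probe(std::vector<std::function<int()>>& getters) {
        // The callback captures `this`, so the object must never relocate.
        getters.push_back([this] { return _fetch_errors; });
    }

    // Deleting copy and move makes an accidental relocation a compile error.
    pinned_probe(const pinned_probe&) = delete;
    pinned_probe& operator=(const pinned_probe&) = delete;
    pinned_probe(pinned_probe&&) = delete;
    pinned_probe& operator=(pinned_probe&&) = delete;

    void fetch_failed() { ++_fetch_errors; }

private:
    int _fetch_errors = 0;
};

static_assert(!std::is_move_constructible_v<pinned_probe>);
static_assert(!std::is_copy_constructible_v<pinned_probe>);

// Hypothetical owner that stays movable: moving it transfers the pointer,
// while the probe itself stays at the same heap address.
struct refresher {
    explicit refresher(std::vector<std::function<int()>>& getters)
      : probe(std::make_unique<pinned_probe>(getters)) {}

    std::unique_ptr<pinned_probe> probe;
};

// Moves the owner, records two errors, and reads through the callback
// that was registered before the move.
int read_after_owner_move() {
    std::vector<std::function<int()>> getters;
    refresher first(getters);
    refresher second(std::move(first));
    second.probe->fetch_failed();
    second.probe->fetch_failed();
    return getters.front()();
}
```

In this sketch `read_after_owner_move()` returns 2: the owner was moved, but the captured `this` still refers to the live probe on the heap. The cost is one extra allocation and indirection per probe, which matches the "big hammer" wording in the message.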
rockwotj added a commit to rockwotj/redpanda that referenced this issue Jun 3, 2023
rockwotj added a commit to rockwotj/redpanda that referenced this issue Jun 5, 2023
rockwotj added a commit to rockwotj/redpanda that referenced this issue Jun 12, 2023