
stuck cloud-storage readers with chunked reads enabled #16465

Closed
jason-da-redpanda opened this issue Feb 3, 2024 · 3 comments
Labels: area/cloud-storage (Shadow indexing subsystem), kind/bug (Something isn't working), sev/high (loss of availability, pathological performance degradation, recoverable corruption)

Comments


jason-da-redpanda commented Feb 3, 2024

Version & Environment

23.2.24

What went wrong?

Cloud-storage readers were getting stuck.

More in-depth analysis has been done internally (please add it here; also, please change the title if the cause turns out not to be chunked reads).

Cloud storage creates a reader, and that reader tries to acquire units from the segment_reader_units semaphore. This semaphore limits the number of segment readers that can exist simultaneously.
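For illustration only, here is a minimal sketch of the failure mode: a bounded pool of reader units wedges when units are never returned. This is not Redpanda code (the actual implementation is built on Seastar primitives, not std::counting_semaphore), and names such as max_segment_readers are hypothetical.

```cpp
// Illustrative sketch only -- not Redpanda code. Models the failure mode:
// a fixed budget of reader "units" is shared by all segment readers, so a
// reader that never returns its unit (e.g. its stop operation is stuck)
// starves every later reader.
#include <chrono>
#include <iostream>
#include <semaphore>

// Hypothetical limit on concurrently created segment readers.
constexpr int max_segment_readers = 2;
std::counting_semaphore<max_segment_readers> segment_reader_units{max_segment_readers};

struct segment_reader {
    bool holds_unit = false;

    // Acquire a unit before reading; blocks (here: times out) when the
    // budget is exhausted.
    bool create() {
        using namespace std::chrono_literals;
        holds_unit = segment_reader_units.try_acquire_for(100ms);
        return holds_unit;
    }

    // A correctly stopped reader returns its unit to the pool.
    void stop() {
        if (holds_unit) {
            segment_reader_units.release();
            holds_unit = false;
        }
    }
};

int main() {
    segment_reader r1, r2, r3;
    r1.create();
    r2.create(); // budget of 2 is now exhausted
    // If r1 and r2 are never stopped, no unit is returned and the next
    // reader cannot be created -- it blocks on segment_reader_units.
    if (!r3.create()) {
        std::cout << "reader creation blocked on segment_reader_units\n";
    }
    r1.stop();
    r2.stop();
}
```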

Disabling chunked reads via cloud_storage_disable_chunk_reads=true and restarting the broker cleared the issue.
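For reference, a sketch of applying that mitigation with rpk (assuming rpk is available against the cluster; the property name is taken from the report above, and a broker restart may still be required as noted):

```
rpk cluster config set cloud_storage_disable_chunk_reads true
```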

What should have happened instead?

Readers should not get stuck when chunked reads are enabled.

How to reproduce the issue?

The issue has not been reproduced at this point.

Additional information

Messages such as the following were observed with DEBUG logging enabled:
DEBUG 2024-02-02 08:39:49,166 [shard 20] cloud_storage - materialized_resources.cc:379 - Materialized segment {kafka/request/95} base-offset 321713857 is not stale: 2 readers=0

JIRA Link: CORE-1752

jason-da-redpanda added the kind/bug label on Feb 3, 2024
piyushredpanda added the area/cloud-storage label on Feb 3, 2024
jason-da-redpanda (Author) commented


A variation of this was seen in a recent issue:

ERROR 2024-03-26 13:46:13,269 [shard 4:main] cloud_storage - remote_segment.cc:1470 - remote_segment_batch_reader {"927653cc/kafka/my-topic.traces/1_16024049/14914798-14952049-9074378-23-v1.log.23"} stop operation stuck
ERROR 2024-03-26 13:46:13,269 [shard 4:main] cloud_storage - remote_partition.cc:920 - Eviction loop for partition {kafka/my-topic.traces/1} stuck

Disabling chunked reads again mitigated the issue.

abhijat mentioned this issue on Apr 4, 2024
dotnwat added the sev/high label on Apr 5, 2024
piyushredpanda (Contributor) commented

We have now RCAed this fully. Closing.
