
stuck cloud-storage readers with chunked reads enabled #16465

Closed
jason-da-redpanda opened this issue Feb 3, 2024 · 3 comments
Labels: area/cloud-storage (Shadow indexing subsystem), kind/bug (Something isn't working), sev/high (loss of availability, pathological performance degradation, recoverable corruption)

Comments


jason-da-redpanda commented Feb 3, 2024

Version & Environment

23.2.24

What went wrong?

Cloud-storage readers were getting stuck.

More in-depth analysis has been done internally (please add it here; also, please change the title if the cause turns out not to be chunked reads).

Cloud storage creates a reader, and that reader tries to acquire units from the segment_reader_units semaphore. This semaphore limits the number of segment readers that can exist simultaneously.
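For illustration only, here is a minimal sketch of the failure mode: a bounded pool of reader units wedges when units are never returned. This is not Redpanda code (the actual implementation is built on Seastar primitives, not std::counting_semaphore), and names such as max_segment_readers are hypothetical.

```cpp
// Illustrative sketch only -- not Redpanda code. Models the failure mode:
// a fixed budget of reader "units" is shared by all segment readers, so a
// reader that never returns its unit (e.g. its stop operation is stuck)
// starves every later reader.
#include <chrono>
#include <iostream>
#include <semaphore>

// Hypothetical limit on concurrently created segment readers.
constexpr int max_segment_readers = 2;
std::counting_semaphore<max_segment_readers> segment_reader_units{max_segment_readers};

struct segment_reader {
    bool holds_unit = false;

    // Acquire a unit before reading; blocks (here: times out) when the
    // budget is exhausted.
    bool create() {
        using namespace std::chrono_literals;
        holds_unit = segment_reader_units.try_acquire_for(100ms);
        return holds_unit;
    }

    // A correctly stopped reader returns its unit to the pool.
    void stop() {
        if (holds_unit) {
            segment_reader_units.release();
            holds_unit = false;
        }
    }
};

int main() {
    segment_reader r1, r2, r3;
    r1.create();
    r2.create(); // budget of 2 is now exhausted
    // If r1 and r2 are never stopped, no unit is returned and the next
    // reader cannot be created -- it blocks on segment_reader_units.
    if (!r3.create()) {
        std::cout << "reader creation blocked on segment_reader_units\n";
    }
    r1.stop();
    r2.stop();
}
```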

Disabling chunked reads via cloud_storage_disable_chunk_reads=true and restarting the broker cleared the issue.
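For reference, a sketch of applying that mitigation with rpk (assuming rpk is available against the cluster; the property name is taken from the report above, and a broker restart may still be required as noted):

```
rpk cluster config set cloud_storage_disable_chunk_reads true
```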

What should have happened instead?

Readers should not get stuck when chunked reads are enabled.

How to reproduce the issue?

The issue has not been reproduced at this point.

Additional information

Messages such as the following were observed with DEBUG logging enabled:
DEBUG 2024-02-02 08:39:49,166 [shard 20] cloud_storage - materialized_resources.cc:379 - Materialized segment {kafka/request/95} base-offset 321713857 is not stale: 2 readers=0

JIRA Link: CORE-1752

jason-da-redpanda added the kind/bug label on Feb 3, 2024
piyushredpanda added the area/cloud-storage label on Feb 3, 2024
jason-da-redpanda (Author) commented


A variation of this was seen in a recent issue:

ERROR 2024-03-26 13:46:13,269 [shard 4:main] cloud_storage - remote_segment.cc:1470 - remote_segment_batch_reader {"927653cc/kafka/my-topic.traces/1_16024049/14914798-14952049-9074378-23-v1.log.23"} stop operation stuck
ERROR 2024-03-26 13:46:13,269 [shard 4:main] cloud_storage - remote_partition.cc:920 - Eviction loop for partition {kafka/my-topic.traces/1} stuck

Disabling chunked reads again mitigated the issue.

abhijat mentioned this issue on Apr 4, 2024
dotnwat added the sev/high label on Apr 5, 2024
piyushredpanda (Contributor) commented

We have now RCAed this fully. Closing.
