CI Failure (KgoVerifier failed waiting for worker) in CloudStorageChunkReadTest.test_read_when_cache_smaller_than_segment_size
#11449
Comments
In this one the consumer seems completely stuck:
Ugh, that is sev/high then, @abhijat?
No, I think it might be a problem with the consumer, but I will be able to say more after investigating.
The consumer doesn't seem to send any fetch requests to broker:
Compared with a passing run of the test where the fetch requests are sent to broker and processed and responded to:
There seem to be a couple of reasons for the recent increased frequency of this test failing:
With the change in coarse index entry calculation #11705 we are creating chunks much closer to the requested size of 1MB:
Before the above change, an offset index with 0 rows (see …) caused an entire segment to be treated as a chunk. That makes the consumer quite fast because it just has to download the segment once, overshooting the cache size (for this test the cache is 1MiB and the segment is 5MiB) but resulting in very few calls to cloud storage. With the new calculation we actually download 1MiB chunks, which results in more calls to cloud storage and a wait after each chunk finishes before the next one is fetched. Additionally, with a 1MiB chunk and only a 4MiB cache, we now also see frequent throttling of cache trimming:
These 5-second waits slow down the consumer further. A fix here would be to change the assertions so that we verify that we can read data from cloud storage, but not necessarily through to the end. Since the purpose of the test is to ensure that the read path works in the extreme case where the cache is smaller than a segment, being able to read from the cache should be enough.
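To make the numbers above concrete, here is a rough sketch of the download count per segment before and after the chunking change, plus what a relaxed assertion could look like. This is not the actual test code: `downloads_per_segment` and `read_path_works` are hypothetical helpers made up for illustration, the sketch assumes chunks are exactly the requested size, and it ignores the cache trim throttling waits.

```python
import math

MiB = 1024 * 1024


def downloads_per_segment(segment_size: int, chunk_size: int) -> int:
    """Cloud storage downloads needed to read one full segment.

    Hypothetical helper: assumes every chunk is exactly chunk_size bytes,
    except possibly the last one.
    """
    return math.ceil(segment_size / chunk_size)


def read_path_works(messages_read: int, min_messages: int = 1) -> bool:
    """Relaxed check (hypothetical): pass once some data has been read
    from cloud storage, without requiring the consumer to finish."""
    return messages_read >= min_messages


segment = 5 * MiB  # segment size quoted for this test

# Old behaviour: the whole segment is treated as a single chunk.
print(downloads_per_segment(segment, segment))  # 1

# New behaviour: chunks close to the requested 1MiB size.
print(downloads_per_segment(segment, 1 * MiB))  # 5
```

Under these assumptions a full read of each segment goes from one download to five, and each of those can additionally be delayed by a throttled trim, which is consistent with the consumer appearing much slower. A check along the lines of `read_path_works` would still exercise the read path without depending on how long the full consume takes.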
https://buildkite.com/redpanda/redpanda/builds/31319#0188bbd1-22a0-41f9-b16e-153d93522399
failure similar to #9544