[v24.2.x] CORE-8082 cloud_io: add missing error handling to #24080

Merged

@pgellert pgellert commented Nov 8, 2024

Backport of PR #24059

Fixes #24076

Cherry pick conflicts:

  • remote.cc has moved to a different path

@pgellert pgellert added this to the v24.2.x-next milestone Nov 8, 2024
@pgellert pgellert added the kind/backport PRs targeting a stable branch label Nov 8, 2024
@pgellert pgellert self-assigned this Nov 8, 2024
@pgellert pgellert requested review from Lazin, nvartolomei, a team and michael-redpanda and removed request for a team November 8, 2024 15:51
@pgellert pgellert marked this pull request as ready for review November 8, 2024 16:34

vbotbuildovich commented Nov 8, 2024

the below tests from https://buildkite.com/redpanda/redpanda/builds/57850#01930ca1-436d-4a31-8472-4ad835437a5a have failed and will be retried

gtest_raft_rpunit

the below tests from https://buildkite.com/redpanda/redpanda/builds/57893#01931a48-90e4-4dfd-b982-b89bfe05b33d have failed and will be retried

partition_balancer_simulator_test_rpunit

vbotbuildovich commented Nov 8, 2024

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/57850#01930ce4-eb4b-4808-ad55-fd762312e833:

"rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.ms"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/57893#01931a8b-4f31-46c7-aa82-ff40b8069da4:

"rptest.tests.partition_balancer_test.PartitionBalancerTest.test_unavailable_nodes"

@vbotbuildovich

Retry command for Build#57850

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/archive_retention_test.py::CloudArchiveRetentionTest.test_delete@{"cloud_storage_type":2,"retention_type":"retention.ms"}

This is to allow passing in a `retry_chain_logger` which does not
inherit from `ss::logger` but wraps it.

(cherry picked from commit f388831)
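
A minimal sketch of the idea in this commit message, assuming the change amounts to templating a helper on its logger type so that a wrapper can be passed where a plain logger was expected. `base_logger`, `prefixed_logger`, and `report_failure` are hypothetical stand-ins for illustration, not Redpanda or Seastar types:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical stand-in for a plain logger (the role `ss::logger` plays).
class base_logger {
public:
    explicit base_logger(std::ostream& out) : _out(out) {}
    void warn(std::string_view msg) { _out << "WARN " << msg << '\n'; }

private:
    std::ostream& _out;
};

// Hypothetical stand-in for a wrapper such as `retry_chain_logger`:
// it holds a reference to a logger and adds context, but does NOT inherit.
class prefixed_logger {
public:
    prefixed_logger(base_logger& log, std::string prefix)
      : _log(log), _prefix(std::move(prefix)) {}
    void warn(std::string_view msg) { _log.warn(_prefix + std::string(msg)); }

private:
    base_logger& _log;
    std::string _prefix;
};

// Templating on the logger type accepts anything that exposes `warn`,
// so both the plain logger and the wrapper can be passed in.
template<typename Logger>
void report_failure(Logger& log, std::string_view what) {
    log.warn(std::string("request failed: ") + std::string(what));
}

// Usage: the same helper serves both logger kinds.
inline void example() {
    std::ostringstream out;
    base_logger base(out);
    prefixed_logger wrapped(base, "[fiber1] ");
    report_failure(base, "timeout");
    report_failure(wrapped, "timeout");
    assert(out.str() == "WARN request failed: timeout\n"
                        "WARN [fiber1] request failed: timeout\n");
}
```

A function taking `base_logger&` would reject `prefixed_logger` because there is no inheritance relationship; the template only requires a compatible `warn` member.
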
The call to `drain_response_stream` may throw various transport related
errors (see one example below of a Broken Pipe error observed in CI).
These errors should be handled inside the `remote::download_object`
method because the caller's expectation is that download-related errors
are communicated via the `download_result` return type rather than
through an exception. Some of these errors (like the broken pipe error
below) are also retryable, but the previous implementation never
retried them.

These exceptions are often ignored by the caller and may be printed as
"Exceptional future ignored" log lines, which cause CI failures and are
less useful for debugging.

Below is an example of one such ignored exceptional future in the
remote partition finalizing background fiber:
```
INFO  2024-10-29 12:41:17,708 [shard 1:main] cloud_storage - [fiber474 kafka/fuzzy-operator-6356-dzxvff/4] - remote_partition.cc:1406 - Finalizing remote storage state...
DEBUG 2024-10-29 12:41:17,723 [shard 1:main] cloud_io - [fiber819~0|1|19984ms] - remote.cc:430 - Receive OK response from "37836c6f-30b0-482f-bb4e-0f3dffdb5cbe/meta/kafka/fuzzy-operator-6356-dzxvff/1_3447/manifest.bin"
WARN  2024-10-29 12:41:17,723 [shard 1:main] http - /37836c6f-30b0-482f-bb4e-0f3dffdb5cbe/meta/kafka/fuzzy-operator-6356-dzxvff/1_3447/manifest.bin - client.cc:414 - receive error std::__1::system_error (error generic:32, System error during SSL read: [error:FFFFFFFF80000020:system library::Broken pipe]: Broken pipe)
WARN  2024-10-29 12:41:17,723 [shard 1:main] seastar - Exceptional future ignored: std::__1::system_error (error generic:32, System error during SSL read: [error:FFFFFFFF80000020:system library::Broken pipe]: Broken pipe), backtrace: 0xa73be23 0xa392e05 0x360a6b8 0x9352157 0x360a71a 0xa48cc6f 0xa49045c 0xa4e77ca 0xa402f3f /opt/redpanda/lib/libc.so.6+0x961b6 /opt/redpanda/lib/libc.so.6+0x11839b
```

(cherry picked from commit ad14537)
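
The pattern this commit message describes can be sketched roughly as follows: catch transport exceptions inside the download routine, retry the ones that look transient, and report everything through a result enum instead of letting the exception escape. This is an illustrative, synchronous sketch and not the actual `remote::download_object` code. The enumerators of `download_result`, the `classify` helper, and the choice to report exhausted retries as `timedout` are all assumptions:

```cpp
#include <cassert>
#include <functional>
#include <system_error>

// Hypothetical result type named after the `download_result` in the commit
// message; the enumerators are assumptions.
enum class download_result { success, timedout, failed };

// Hypothetical classification of a transport error.
enum class error_outcome { retry, fail };

// Assumption: a broken pipe is transient and worth retrying; anything else
// is treated as a hard failure in this sketch.
inline error_outcome classify(const std::system_error& e) {
    return e.code() == std::errc::broken_pipe ? error_outcome::retry
                                              : error_outcome::fail;
}

// `drain` stands in for draining the response stream and may throw.
// No exception leaves this function; every outcome is a download_result.
inline download_result
download_with_retries(const std::function<void()>& drain, int max_attempts) {
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        try {
            drain();
            return download_result::success;
        } catch (const std::system_error& e) {
            if (classify(e) == error_outcome::fail) {
                return download_result::failed;
            }
            // Retryable error: fall through to the next attempt.
        } catch (...) {
            // Unknown error: report it via the result, do not rethrow.
            return download_result::failed;
        }
    }
    // Assumption: exhausted retries are reported as a timeout.
    return download_result::timedout;
}
```

Because the caller only ever sees a `download_result`, there is no exceptional future left for it to ignore, which is the failure mode shown in the log above.
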
@pgellert pgellert force-pushed the manual-backport-24059-v24.2.x-793 branch from 52f30a9 to 453ea58 Compare November 11, 2024 08:11
@pgellert

force-push: no-op; rebased onto the latest v24.2.x and dropped the merge commit (52f30a9) from the branch

@vbotbuildovich

Retry command for Build#57893

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/partition_balancer_test.py::PartitionBalancerTest.test_unavailable_nodes

@pgellert pgellert merged commit d2fe194 into redpanda-data:v24.2.x Nov 11, 2024
16 checks passed
@piyushredpanda piyushredpanda modified the milestones: v24.2.x-next, v24.2.12 Nov 26, 2024
Labels
area/redpanda kind/backport PRs targeting a stable branch