Add proper exception handler to around eviction_stm's call to replicate #12959

graphcareful · 2023-08-23T14:27:16Z

The issue was observed as a bad-log-lines failure in a delete-records test. A higher-level exception handler in the DeleteRecords request handling logic caught an exception and logged it at error level, which made the test fail.

Investigating the cause showed that log_eviction_stm::truncate calls consensus::replicate without handling the exceptions that call may throw. Note that this is not a severe issue: in its current state, the code eventually catches the exception further up and returns the proper error code to the client.
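A minimal, synchronous sketch of the failure mode described above. All names here (fake_consensus, gate_closed_exception, errc, truncate_without_handler, handle_delete_records) are hypothetical stand-ins rather than Redpanda's actual types, and the real code runs on Seastar futures, which this sketch omits. Because the truncate step only inspects the returned error code, an exception thrown by replicate() escapes to the outer handler.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Seastar's gate_closed_exception.
struct gate_closed_exception : std::runtime_error {
    gate_closed_exception() : std::runtime_error("gate closed") {}
};

// Hypothetical subset of the cluster error codes.
enum class errc { success, replication_error };

// Hypothetical stand-in for consensus: replicate() throws once the gate is closed.
struct fake_consensus {
    bool gate_closed = false;
    errc replicate() {
        if (gate_closed) {
            throw gate_closed_exception{};
        }
        return errc::success;
    }
};

// Pre-fix shape: only the returned error code is checked, so an exception
// thrown by replicate() propagates out of this function unhandled.
errc truncate_without_handler(fake_consensus& c) {
    errc result = c.replicate();
    return result;
}

// Mimics the higher-level handler in the delete-records path, which catches
// whatever escaped (the real handler logs it at error level).
std::string handle_delete_records(fake_consensus& c) {
    try {
        return truncate_without_handler(c) == errc::success ? "ok" : "error_code";
    } catch (const std::exception& e) {
        return std::string("escaped: ") + e.what();
    }
}
```

With gate_closed set to true, handle_delete_records returns "escaped: gate closed": the client still gets a response, but only after an error-level log line in the outer handler.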

Fixes #12950

Backports Required

Release Notes

none

- replicate() may throw if the gate belonging to the consensus instance has been closed. The eviction_stm never had exception handling covering this case; this change adds a catch-all exception clause around the call to _c->replicate(). Fixes: redpanda-data#12950

- This message is logged when replicate fails, so it should be logged at warn (or error) rather than info.
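A minimal sketch of the catch-all pattern the commits describe, assuming the same simplified, synchronous stand-ins (fake_consensus, gate_closed_exception, errc and truncate_with_handler are hypothetical names, not the real Redpanda API). Any exception from replicate() is logged at warning level and converted to errc::replication_error, so callers only ever see error codes. The std::cerr line is only a placeholder for the vlog(_log.warn, ...) call shown in the review diff.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>

// Hypothetical stand-in for Seastar's gate_closed_exception.
struct gate_closed_exception : std::runtime_error {
    gate_closed_exception() : std::runtime_error("gate closed") {}
};

// Hypothetical subset of the cluster error codes.
enum class errc { success, replication_error };

// Hypothetical stand-in for consensus: replicate() throws once the gate is closed.
struct fake_consensus {
    bool gate_closed = false;
    errc replicate() {
        if (gate_closed) {
            throw gate_closed_exception{};
        }
        return errc::success;
    }
};

// Post-fix shape: a catch-all around replicate() turns any exception into
// errc::replication_error instead of letting it escape.
errc truncate_with_handler(fake_consensus& c) {
    try {
        return c.replicate();
    } catch (...) {
        // Placeholder for the warn-level log of std::current_exception().
        std::cerr << "Replicating prefix_truncate failed with exception\n";
        return errc::replication_error;
    }
}
```

A catch-all (rather than catching only gate_closed_exception) keeps the function's contract simple: whatever goes wrong inside replicate(), the caller receives an error code it already knows how to translate.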

mmaslankaprv · 2023-08-24T12:44:27Z

src/v/cluster/log_eviction_stm.cc

+          _log.warn,
+          "Replicating prefix_truncate failed with exception: {}",
+          std::current_exception());
+        result = errc::replication_error;


Just a quick question: is this going to be propagated to the Kafka handler?

graphcareful · 2023-08-24

Yes, it will eventually be translated into a kafka::error_code here: https://github.com/redpanda-data/redpanda/blob/dev/src/v/kafka/server/replicated_partition.cc#L399. That error code is then propagated back to the client by the delete_records handler.

andrwng · 2023-08-29

From the linked snippet, wouldn't this return a USE (unknown_server_error)? I'm curious why we'd return that instead of explicitly handling the is_shutdown_exception case with raft::errc::shutting_down, which (I think) results in a retriable timeout error.
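A minimal sketch of the error translation being asked about, encoding only what this thread states: a generic replication failure surfaces as unknown_server_error, while shutting_down is (per the reviewer's recollection) reported as a retriable timeout. The enums and map_error are hypothetical; the actual mapping lives in replicated_partition.cc and may differ.

```cpp
#include <cassert>

// Hypothetical subset of the cluster/raft error codes discussed in this thread.
enum class errc { success, replication_error, shutting_down };

// Hypothetical subset of Kafka error codes.
enum class kafka_error_code { none, request_timed_out, unknown_server_error };

// Assumed mapping: shutdown becomes a retriable timeout; any other failure
// falls through to the non-specific unknown_server_error (USE).
kafka_error_code map_error(errc ec) {
    switch (ec) {
    case errc::success:
        return kafka_error_code::none;
    case errc::shutting_down:
        return kafka_error_code::request_timed_out;
    default:
        return kafka_error_code::unknown_server_error;
    }
}
```

The distinction matters to clients: a timeout invites a retry, whereas USE carries no hint about whether retrying will help.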

graphcareful · 2023-08-29

consensus::replicate returns errc::shutting_down as an error code, not as an exception, so that path is already covered. What was happening is that a gate_closed_exception was thrown and not caught; you can see this in the bad-log-lines error quoted in the PR description.

Now that I think about it, there may be a missing gate_closed_exception handler clause somewhere within the replicate logic.
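A minimal sketch of the alternative floated above: handling the gate closure inside replicate itself, so it reports errc::shutting_down through the same error-code path instead of throwing. This is not what the PR merged (it adds the catch-all in the eviction_stm), and fake_gate, fake_consensus and errc are hypothetical stand-ins; the only assumption about Seastar is that entering a closed gate throws gate_closed_exception.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for Seastar's gate_closed_exception.
struct gate_closed_exception : std::runtime_error {
    gate_closed_exception() : std::runtime_error("gate closed") {}
};

// Hypothetical subset of the raft error codes.
enum class errc { success, shutting_down };

// Hypothetical gate: entering after it has been closed throws.
struct fake_gate {
    bool closed = false;
    void enter() {
        if (closed) {
            throw gate_closed_exception{};
        }
    }
};

struct fake_consensus {
    fake_gate gate;
    bool shutting_down = false;

    errc replicate() {
        // Existing path per the thread: shutdown is reported as an error code.
        if (shutting_down) {
            return errc::shutting_down;
        }
        try {
            gate.enter();
        } catch (const gate_closed_exception&) {
            // The possibly missing clause: map the gate closure to the same
            // error code so callers never observe a thrown exception.
            return errc::shutting_down;
        }
        return errc::success;
    }
};
```

Handling it at this level would fix every caller of replicate at once, at the cost of touching the consensus hot path; the catch-all in the eviction_stm is the narrower change.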

BenPope · 2023-11-10T13:20:46Z

Seen on v23.2.x: https://buildkite.com/redpanda/redpanda/builds/40824#018bb8fe-0a15-4e62-8007-4c6b13d9bec0

BenPope · 2023-11-10T13:20:55Z

/backport v23.2.x

Rob Blafford added 2 commits August 23, 2023 10:19

cluster: Modify log from info to warn eviction_stm

a8eda91

- This message is logged when replicate fails, so it should be logged at warn (or error) rather than info.

graphcareful requested review from dotnwat, andrwng and mmaslankaprv August 23, 2023 14:27

github-actions bot added the area/redpanda label Aug 23, 2023

rockwotj approved these changes Aug 23, 2023

mmaslankaprv reviewed Aug 24, 2023

mmaslankaprv approved these changes Aug 24, 2023

graphcareful merged commit a21e8e7 into redpanda-data:dev Aug 24, 2023

This was referenced Nov 10, 2023

[v23.2.x] CI Failure (gate closed) in DeleteRecordsTest.test_delete_records_concurrent_truncations #14880

Closed

[v23.2.x] Add proper exception handler to around eviction_stm's call to replicate #14881

Merged

BenPope mentioned this pull request Nov 10, 2023

[v23.2.x] schema_registry: Support the compatible format for CONFIG value #14875

Merged

graphcareful deleted the eviction-stm-exc-handle branch November 10, 2023 16:19
