
io.camunda.zeebe.broker.client.api.BrokerErrorException: Received error from broker (INTERNAL_ERROR): Processing paused for partition '3' #22928

Closed
korthout opened this issue Oct 1, 2024 · 2 comments · Fixed by #23654
Labels: component/zeebe (Related to the Zeebe component/team), kind/bug (Categorizes an issue or PR as a bug), severity/low (Marks a bug as having little to no noticeable impact for the user), version:8.7.0-alpha1 (Label that represents issues released on version 8.7.0-alpha1)

Comments


korthout commented Oct 1, 2024

Describe the bug

io.camunda.zeebe.broker.client.api.BrokerErrorException: Received error from broker (INTERNAL_ERROR): Processing paused for partition '3'

This error was logged about 40 times in a short timespan.
(Screenshot attached: "Screenshot 2024-10-01 at 16 03 44")

To Reproduce

Not sure, but it likely involves pausing a partition while it is processing.

Expected behavior

This is probably just noise. We should log at INFO level that the partition is paused, and:

  • no longer accept requests for a paused partition
  • log at WARN level information about the requests that can no longer be handled (a rough sketch follows below)
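For illustration only, a minimal sketch of that behavior. The class and method names here are hypothetical, not the actual Zeebe request-handler API; it just shows "log the pause at INFO, reject further requests for that partition and log them at WARN":

```java
// Hypothetical sketch only: PausedPartitionGuard is not a real Zeebe class.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class PausedPartitionGuard {
  private static final Logger LOG = LoggerFactory.getLogger(PausedPartitionGuard.class);

  private final Set<Integer> pausedPartitions = ConcurrentHashMap.newKeySet();

  void onProcessingPaused(final int partitionId) {
    pausedPartitions.add(partitionId);
    LOG.info("Processing paused for partition '{}'", partitionId);
  }

  void onProcessingResumed(final int partitionId) {
    pausedPartitions.remove(partitionId);
    LOG.info("Processing resumed for partition '{}'", partitionId);
  }

  /** Returns true if the request may be handled; otherwise logs the rejected request at WARN. */
  boolean accept(final int partitionId, final String requestType) {
    if (pausedPartitions.contains(partitionId)) {
      LOG.warn(
          "Rejecting '{}' request: processing is paused for partition '{}'",
          requestType,
          partitionId);
      return false;
    }
    return true;
  }
}
```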

Log/Stacktrace

Full logs are available on Google Drive.

Full Stacktrace

io.camunda.zeebe.broker.client.api.BrokerErrorException: Received error from broker (INTERNAL_ERROR): Processing paused for partition '3'

	at io.camunda.zeebe.broker.client.impl.BrokerRequestManager.handleResponse(BrokerRequestManager.java:195)
	at io.camunda.zeebe.broker.client.impl.BrokerRequestManager.lambda$sendRequestInternal$2(BrokerRequestManager.java:144)
	at io.camunda.zeebe.scheduler.future.FutureContinuationRunnable.run(FutureContinuationRunnable.java:28)
	at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:85)
	at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:42)
	at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:122)
	at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:130)
	at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:108)
	at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:227)

Environment:

  • OS:
  • Zeebe Version:
  • Configuration:
korthout added the kind/bug and component/zeebe labels on Oct 1, 2024

npepinpe commented Oct 3, 2024

INTERNAL_ERROR is also likely the wrong error code, honestly. We probably want to return something like UNAVAILABLE, indicating that the system is currently unavailable but may become available again eventually. That, or something like INVALID_STATE or FAILED_PRECONDITION. I would opt for unavailable, personally.

I would propose introducing a new error code, mapped to 503/SERVICE UNAVAILABLE (for REST) and 14/UNAVAILABLE (for gRPC), and logged at debug rather than as an error (as with other temporarily unavailable conditions).

I would also argue this is more of a rejection than an error. There is no error here, and the command may be perfectly well formed; we're simply declining to process it. However, I understand we only have an ErrorResponseWriter in the command API, and it would be quite a bit of refactoring to return a rejection here 🤷

So acceptance criteria are:

  • Add a new PARTITION_UNAVAILABLE error code (as in, to the SBE generated ErrorCode enum), which is documented as meaning that the command cannot be processed because the processor is temporarily unavailable.
  • Update the CommandApiRequestHandler to return this error, instead of the current INTERNAL_ERROR.
  • Map the PARTITION_UNAVAILABLE error code in GrpcErrorMapper such that the error is logged as debug, and the mapped gRPC error is UNAVAILABLE.
  • Map the PARTITION_UNAVAILABLE error code in RestErrorMapper such that the error is logged as debug, and the mapped HTTP code is 503 (SERVICE_UNAVAILABLE). (Both mappings are sketched below.)
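The two mapper bullets could roughly look like the following. This is only a sketch under simplified assumptions: the real GrpcErrorMapper and RestErrorMapper switch over the full ErrorCode enum and have different signatures than shown here.

```java
// Sketch only: shows the intended PARTITION_UNAVAILABLE -> UNAVAILABLE / 503 mapping,
// logged at debug; the real Zeebe mappers have different shapes.
import io.grpc.Status;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ProblemDetail;

final class PartitionUnavailableMappingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(PartitionUnavailableMappingSketch.class);

  // gRPC side: PARTITION_UNAVAILABLE -> 14/UNAVAILABLE.
  static Status toGrpcStatus(final String brokerErrorMessage) {
    LOG.debug("Partition is temporarily unavailable: {}", brokerErrorMessage);
    return Status.UNAVAILABLE.withDescription(brokerErrorMessage);
  }

  // REST side: PARTITION_UNAVAILABLE -> 503/SERVICE_UNAVAILABLE.
  static ProblemDetail toProblemDetail(final String brokerErrorMessage) {
    LOG.debug("Partition is temporarily unavailable: {}", brokerErrorMessage);
    return ProblemDetail.forStatusAndDetail(HttpStatus.SERVICE_UNAVAILABLE, brokerErrorMessage);
  }
}
```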

As far as tests go, you should write a (parameterized) integration/QA regression test which verifies the above behavior (e.g. create a node, pause processing, send a request, make sure you get the appropriate code — see the skeleton below), and some unit tests for the mappers. Unit tests alone are likely not enough here, as we want to ensure that pausing the processing actually causes such errors to be returned. Use the RegressionTest annotation :)
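A skeleton of the kind of regression test meant here; all harness calls are placeholders, since the real test would use Zeebe's QA test utilities and the RegressionTest annotation:

```java
// Skeleton only: the cluster/pause helpers mentioned below are placeholders, not real Zeebe QA APIs.
import org.junit.jupiter.api.Test;

final class PausedPartitionRegressionTest {

  @Test // the real test would instead carry Zeebe's RegressionTest annotation referencing #22928
  void shouldRejectRequestsWithUnavailableWhenPartitionIsPaused() {
    // given: a running broker with processing paused on the target partition
    //   e.g. cluster.pausePartitionProcessing(partitionId)  <- placeholder helper
    // when: a command is sent to that partition via the gRPC or REST client
    // then: the gRPC client surfaces UNAVAILABLE (14), the REST API returns 503,
    //       and the broker no longer logs the rejection at ERROR level
  }
}
```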

npepinpe added the severity/low label on Oct 3, 2024
filipecampos self-assigned this on Oct 4, 2024
github-merge-queue bot pushed a commit that referenced this issue Oct 22, 2024
…essing (#23654)

## Description
Create a new `PARTITION_UNAVAILABLE` error code corresponding to when a
partition pauses processing requests.

## Checklist

- [x] Add a new `PARTITION_UNAVAILABLE` error code (as in, to the SBE
generated `ErrorCode` enum), which is documented as meaning that the
command cannot be processed because the processor is temporarily
unavailable.
- [x] Update the `CommandApiRequestHandler` to return this error,
instead of the current `INTERNAL_ERROR`.
- [x] Map `PARTITION_UNAVAILABLE` error code in `GrpcErrorMapper` such
that the error is logged as debug, and the mapped gRPC error
is `UNAVAILABLE`.
- [x] Map `PARTITION_UNAVAILABLE` error code in `RestErrorMapper` such
that the error is logged as debug, and the mapped HTTP code is 503
(`SERVICE_UNAVAILABLE`).
- [x] Write test for `GrpcErrorMapper`
- [x] Write test for `RestErrorMapper` in `ErrorMapperTest`
- [x] Updated integration/QA regression tests

## Related issues

closes #22928
camundait added the version:8.7.0-alpha1 label on Nov 5, 2024
ana-vinogradova-camunda (Contributor) commented:

Happened again here. Please feel free to let me know if you think it is a different issue.
