Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix scenarios where error messages aren't propagated correctly #424

Merged
merged 3 commits into from
Dec 12, 2024

Conversation

felippeduran
Copy link
Member

Currently, there are three scenarios where pitaya errors aren't propagated correctly:

  • When pitaya errors are wrapped: errors.NewError doesn't make proper use of errors.As, triggering the issue.
  • When using a custom serializer: in this case, utils.GetErrorFromPayload looks for explicit implementations, breaking polymorphism and inducing the bug.
  • When using a json serializer: this was caused by utils.GetErrorPayload, that would convert any error to the protobuf version of a pitaya error, which would then fail to get parsed by utils.GetErrorFromPayload

For my specific case, I encountered these issues when trying to return pitaya errors in the routing function. The first one would convert it to a generic pitaya error Internal before returning to clients, and the other two would make error messages not propagate correctly to metrics, having errors as Unknown in Prometheus.

This MR fixes all three scenarios, with the following changes:

  • utils.GetErrorFromPayload now always assumes error payloads are in protobuf schema. This is aligned with the utils.GetErrorPayload, which always serializes with the same schema.
  • errors.NewError now uses errors.As when trying to merge pitaya errors.

I also took the time to improve some test cases, implement a few more and test things against pitaya-cli.

Notes:
An alternative would be to make the serialization methods use a non-protobuf schema for JSON. However, the C# PitayaClient always assumes errors to be in the protobuf schema, with code and msg (see here). I don't know if that was intentional, or an accidental behavior due to the previously asymmetric utils.GetErrorFromPayload and utils.GetErrorPayload.

Copy link
Collaborator

@hspedro hspedro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hspedro hspedro merged commit c37f8a0 into v2 Dec 12, 2024
@hspedro hspedro deleted the fix/routing-error-propagation branch December 12, 2024 14:48
hspedro pushed a commit that referenced this pull request Dec 16, 2024
* Improve tests for agent and service/remote to expose bad behaviors
* Breakdown test into multiple ones to make them more clear
* Fix scenarios where error messages aren't propagated correctly

---------

Co-authored-by: Felippe Durán <felippe.duran@wildlifestudios.com>
hspedro added a commit that referenced this pull request Dec 18, 2024
…#425)

* Fix scenarios where error messages aren't propagated correctly (#424)

* Improve tests for agent and service/remote to expose bad behaviors
* Breakdown test into multiple ones to make them more clear
* Fix scenarios where error messages aren't propagated correctly

---------

Co-authored-by: Felippe Durán <felippe.duran@wildlifestudios.com>

* fix(groups): add cancelFunc and Close callback

When running tests we need to ensure a graceful cleanup by
terminating/closing the context by calling the Close() method,
which should call the cancelFunc of the group service context.
This should graceful handle goroutines like groupTTLCleanup
with a sane context cancellation

---------

Co-authored-by: Felippe Durán <felippe.duran@gmail.com>
Co-authored-by: Felippe Durán <felippe.duran@wildlifestudios.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants