report intermediate error messages during request forwarding #20643

Merged
merged 2 commits into from
May 18, 2023

hghaf099
Contributor

@hghaf099 hghaf099 commented May 17, 2023

OSS version of https://github.com/hashicorp/vault-enterprise/pull/3937

Addresses VAULT-15375
Core elides intermediate error messages during request forwarding.

Alex and I found two main places where errors need to be reported during request forwarding. The first is in request handling, where we initially forward the request after getting a read-only error. The second is in the replication code here.

After aggregating the errors in those two places, we noticed that the reported error contained many duplicates. Below is a sample of the existing behaviour:

    backend_revocation_queue_ent_test.go:391: error revoking leaf cert, Error making API request.
        
        Namespace: ns1/
        URL: PUT https://127.0.0.1:59296/v1/pki/revoke
        Code: 500. Errors:
        
        * 2 errors occurred:
        	* errors from both primary and secondary; primary error was 2 errors occurred:
        	* 2 errors occurred:
        	* error from primary active: failed to write WAL entries for Delta CRLs: failed to write cross-cluster delta WAL entry: error saving delta CRL WAL entry: forwarded writer lacked replication client: cannot write to readonly storage
        	* error from perf secondary active: error persisting cross-cluster revocation request: refusing to write to write-forwarded storage when not the active node: cannot write to readonly storage
        This may occur when the active node of the primary performance replication cluster is unavailable.
        
        
        	* error from a standby node: error persisting cross-cluster revocation request: refusing to write to write-forwarded storage when not the active node: cannot write to readonly storage
        This may occur when the active node of the primary performance replication cluster is unavailable.
        
        ; secondary errors follow
        	* 2 errors occurred:
        	* 2 errors occurred:
        	* error from primary active: failed to write WAL entries for Delta CRLs: failed to write cross-cluster delta WAL entry: error saving delta CRL WAL entry: forwarded writer lacked replication client: cannot write to readonly storage
        	* error from perf secondary active: error persisting cross-cluster revocation request: refusing to write to write-forwarded storage when not the active node: cannot write to readonly storage
        This may occur when the active node of the primary performance replication cluster is unavailable.
        
        
        	* error from a standby node: error persisting cross-cluster revocation request: refusing to write to write-forwarded storage when not the active node: cannot write to readonly storage
        This may occur when the active node of the primary performance replication cluster is unavailable.

Reporting such an error to the client would be very confusing. We traced the duplication to the respondErrorCommon code, where the callback passed to errwrap.Walk aggregates the same errors multiple times. With that fixed, the reported error becomes:

    backend_revocation_queue_ent_test.go:391: error revoking leaf cert, Error making API request.
        
        Namespace: ns1/
        URL: PUT https://127.0.0.1:63885/v1/pki/revoke
        Code: 500. Errors:
        
         errors from both primary and secondary; primary error was 2 errors occurred:
        	* error from primary active: failed to write WAL entries for Delta CRLs: failed to write cross-cluster delta WAL entry: error saving delta CRL WAL entry: forwarded writer lacked replication client: cannot write to readonly storage
        	* error from perf secondary active: error persisting cross-cluster revocation request: refusing to write to write-forwarded storage when not the active node: cannot write to readonly storage
        
        ; secondary errors follow: error from a standby node: error persisting cross-cluster revocation request: refusing to write to write-forwarded storage when not the active node: cannot write to readonly storage

@hghaf099 hghaf099 requested a review from a team May 17, 2023 22:13
Contributor

@cipherboy cipherboy left a comment

Thank you, thank you, Hamid! Great to see this landing. :-)
