Quorum driver take server requested retry interval into consideration #17520

halfprice · 2024-05-06T04:33:59Z

Description

A ValidatorOverloadedRetryAfter error carries a server-suggested retry-after duration. When the quorum driver retries after a SystemOverloadRetryAfter error, it should take that suggested duration into account.
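
A minimal sketch of the idea, not the actual quorum driver code: one way to take the server's suggestion into account is to wait for the larger of the client's own exponential backoff and the suggested duration. The function name `next_retry_delay`, the base delay, and the cap are hypothetical, and the description does not state how the two values are actually combined.

```rust
use std::time::Duration;

/// Hypothetical helper: picks the delay before the next retry.
/// `attempt` is the zero-based retry count; `server_retry_after` is the
/// duration suggested by the overloaded validators, if any.
fn next_retry_delay(attempt: u32, server_retry_after: Option<Duration>) -> Duration {
    // Assumed backoff parameters, chosen only for illustration.
    const BASE: Duration = Duration::from_millis(500);
    const MAX_BACKOFF: Duration = Duration::from_secs(30);

    // Exponential backoff: BASE * 2^attempt, capped at MAX_BACKOFF.
    let factor = 2u32.saturating_pow(attempt);
    let backoff = BASE.saturating_mul(factor).min(MAX_BACKOFF);

    // Never retry sooner than the server asked for.
    match server_retry_after {
        Some(suggested) => backoff.max(suggested),
        None => backoff,
    }
}
```

Under these assumed parameters, a first retry with a 5-second suggestion waits 5 seconds, while a later retry whose backoff already exceeds the suggestion keeps the backoff value.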

Test plan

Unit tests added.

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

longbowlu · 2024-05-06T22:29:00Z

crates/sui-core/src/authority_aggregator.rs

@@ -1119,7 +1128,8 @@ where
                                    //
                                    // TODO: currently retryable overload and above overload error look redundant. We want to have a unified
                                    // code path to handle both overload scenarios.
-                                    state.retryable_overloaded_stake += weight;
+                                    state.retryable_overload_info.stake += weight;
+                                    state.retryable_overload_info.requested_retry_after = state.retryable_overload_info.requested_retry_after.max(Duration::from_secs(err.retry_after_secs()));


Just thinking out loud: should we use the 67th percentile of all suggested values instead of the highest one? Otherwise a Byzantine validator could set it to MAX.

halfprice · 2024-05-07

Very good point! For a second, I was thinking a malicious validator could go both ways: by sending a duration that is too big or too small. But I guess the too-big case is more damaging, since we already perform exponential backoff in the client.

I changed the code so that the returned retry_after is the duration reached once validators, taken in order of smallest requested retry-after duration, add up to a quorum threshold.
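
The approach described in this reply can be sketched as follows. This is an illustrative sketch rather than the merged code: the `RetryAfterTally` type, its methods, and the stake numbers in the note below are hypothetical. It buckets stake by requested duration, walks the buckets from smallest to largest, and returns the first duration at which the accumulated stake reaches the quorum threshold, so a single validator reporting a huge value cannot set the result on its own.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical tally of stake per requested retry-after duration.
#[derive(Default)]
struct RetryAfterTally {
    // BTreeMap keeps durations sorted in ascending order.
    stake_by_duration: BTreeMap<Duration, u64>,
}

impl RetryAfterTally {
    /// Records that validators holding `stake` asked for `retry_after`.
    fn add(&mut self, stake: u64, retry_after: Duration) {
        *self.stake_by_duration.entry(retry_after).or_insert(0) += stake;
    }

    /// Returns the smallest duration such that validators requesting that
    /// duration or less hold at least `quorum_threshold` stake. If the
    /// threshold is never reached, falls back to the largest duration seen,
    /// or zero when nothing was recorded.
    fn quorum_retry_after(&self, quorum_threshold: u64) -> Duration {
        let mut accumulated = 0u64;
        for (duration, stake) in &self.stake_by_duration {
            accumulated += stake;
            if accumulated >= quorum_threshold {
                return *duration;
            }
        }
        self.stake_by_duration
            .keys()
            .next_back()
            .copied()
            .unwrap_or(Duration::ZERO)
    }
}
```

For example, with four validators of 2,500 stake each requesting 1 s, 2 s, 3 s, and 1,000 s, and an assumed threshold of 6,667, the result is 3 s: the outlier is ignored, whereas taking the maximum would have returned 1,000 s.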

mystenmark · 2024-05-10

Looks good!

halfprice force-pushed the zhewu/quorum-driver-respect-retry-delay branch from c4e7de1 to 19df793 Compare May 6, 2024 19:59

halfprice force-pushed the zhewu/quorum-driver-respect-retry-delay branch from 19df793 to 30bae9d Compare May 6, 2024 20:10

halfprice force-pushed the zhewu/quorum-driver-respect-retry-delay branch from 30bae9d to 24fbd0a Compare May 6, 2024 21:11

Quorum driver take server requested retry interval into consideration

362aa13

halfprice force-pushed the zhewu/quorum-driver-respect-retry-delay branch from 24fbd0a to 362aa13 Compare May 6, 2024 21:16

halfprice requested review from lxfind, longbowlu, mystenmark and mwtian May 6, 2024 21:38

halfprice marked this pull request as ready for review May 6, 2024 21:38

longbowlu reviewed May 6, 2024

halfprice force-pushed the zhewu/quorum-driver-respect-retry-delay branch from 9871adc to 182f901 Compare May 7, 2024 04:40

halfprice requested a review from longbowlu May 7, 2024 04:42

mystenmark approved these changes May 10, 2024

Only return retry after duration that corresponding to the quorum threshold of validators

03d600e

halfprice force-pushed the zhewu/quorum-driver-respect-retry-delay branch from 182f901 to 03d600e Compare May 12, 2024 01:10

halfprice merged commit 93fdf03 into MystenLabs:main May 12, 2024
42 of 45 checks passed

halfprice deleted the zhewu/quorum-driver-respect-retry-delay branch May 12, 2024 14:39
