
support backup request to alleviate the long tail problem #251

Closed
qinzuoyan opened this issue Jan 9, 2019 · 3 comments
Labels: type/enhancement (indicates new feature requests)

@qinzuoyan (Contributor):

The backup requests will be sent to the secondary replicas.

@neverchanje (Contributor) commented Feb 10, 2020

Proposal: Backup Request

Background

For now, Pegasus only supports reading from the primary replica. This proposal aims to support reading from the secondaries as well. This helps in situations where the primary is unstable (e.g., during rebalancing or under hotspot writes) while the secondaries can serve with lower latency, so reading from secondaries leads to better availability.

Backup request is also known as "hedged request", first introduced in "The Tail at Scale" (Dean and Barroso, 2013). BRPC also implements this mechanism.

Design

Backup request is designed to be driven directly by the client: when the client wants to read from a secondary, it can do so immediately, without registering with the meta server or the replica server, and without any additional communication with the server.

The client sends a read request to a secondary with the option is_backup_request set in the RPC header. When the replica handling the request sees that this option is true, it proceeds with the database retrieval; otherwise, it responds with ERR_INVALID_STATE.
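To make the dispatch rule concrete, here is a minimal, self-contained C++ sketch of the replica-side check. All names here (`partition_status`, `read_request`, `do_local_read`) are illustrative stand-ins, not the actual Pegasus types:

```cpp
#include <iostream>

// Illustrative stand-ins only; the real Pegasus definitions differ.
enum class partition_status { PRIMARY, SECONDARY };
enum class error_code { ERR_OK, ERR_INVALID_STATE };

struct read_request {
    bool is_backup_request = false;  // the option carried in the RPC header
};

// Stub for the local RocksDB retrieval.
error_code do_local_read(const read_request&) { return error_code::ERR_OK; }

// A primary always serves reads; a secondary serves a read only when the
// backup-request flag is set, otherwise it answers ERR_INVALID_STATE.
error_code handle_read(const read_request& req, partition_status status) {
    if (status == partition_status::PRIMARY) return do_local_read(req);
    if (req.is_backup_request) return do_local_read(req);
    return error_code::ERR_INVALID_STATE;
}

int main() {
    read_request plain;
    read_request backup;
    backup.is_backup_request = true;

    std::cout << (handle_read(plain, partition_status::SECONDARY) ==
                  error_code::ERR_INVALID_STATE)   // 1: plain read is rejected
              << (handle_read(backup, partition_status::SECONDARY) ==
                  error_code::ERR_OK)              // 1: backup read is served
              << std::endl;
}
```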

TODO

@neverchanje (Contributor) commented Feb 16, 2020

Here are the relevant paragraphs describing hedged requests in "The Tail at Scale":

Hedged requests. A simple way to curb latency variability is to issue the same request to multiple replicas and use the results from whichever replica responds first. We term such requests “hedged requests” because a client first sends one request to the replica believed to be the most appropriate, but then falls back on sending a secondary request after some brief delay. The client cancels remaining outstanding requests once the first result is received. Although naive implementations of this technique typically add unacceptable additional load, many variations exist that give most of the latency-reduction effects while increasing load only modestly.

One such approach is to defer sending a secondary request until the first request has been outstanding for more than the 95th-percentile expected latency for this class of requests. This approach limits the additional load to approximately 5% while substantially shortening the latency tail. A further refinement is not to choose between servers at all but rather to enqueue copies of a request in multiple servers simultaneously and allow the servers to communicate updates on the status of these copies to each other. We call requests where servers perform cross-server status updates "tied requests." The simplest form of a tied request has the client send the request to two different servers, each tagged with the identity of the other server ("tied"). When a request begins execution, it sends a cancellation message to its counterpart. The corresponding request, if still enqueued in the other server, can be aborted immediately or deprioritized substantially.

There is a brief window of one average network message delay where both servers may start executing the request while the cancellation messages are both in flight to the other server. A common case where this situation can occur is if both server queues are completely empty. It is useful therefore for the client to introduce a small delay of two times the average network message delay (1ms or less in modern data-center networks) between sending the first request and sending the second request.

Google’s implementation of this technique in the context of its cluster-level distributed file system is effective at reducing both median and tail latencies. Table 2 lists the times for servicing a small read request from a BigTable where the data is not cached in memory but must be read from the underlying file system; each file chunk has three replicas on distinct machines. The table includes read latencies observed with and without tied requests for two scenarios: The first is a cluster in which the benchmark is running in isolation, in which case latency variability is mostly from self-interference and regular cluster-management activities. In it, sending a tied request that does cross-server cancellation to another file system replica following 1ms reduces median latency by 16% and is increasingly effective along the tail of the latency distribution, achieving nearly 40% reduction at the 99.9th-percentile latency. The second scenario is like the first except there is also a large, concurrent sorting job running on the same cluster contending for the same disk resources in the shared file system. Although overall latencies are somewhat higher due to higher utilization, similar reductions in the latency profile are achieved with the tied-request technique discussed earlier. The latency profile with tied requests while running a concurrent large sorting job is nearly identical to the latency profile of a mostly idle cluster without tied requests. Tied requests allow the workloads to be consolidated into a single cluster, resulting in dramatic computing cost reductions. In both Table 2 scenarios, the overhead of tied requests in disk utilization is less than 1%, indicating the cancellation strategy is effective at eliminating redundant reads.

An alternative to the tied-request and hedged-request schemes is to probe remote queues first, then submit the request to the least-loaded server. It can be beneficial but is less effective than submitting work to two queues simultaneously for three main reasons: load levels can change between probe and request time; request service times can be difficult to estimate due to underlying system and hardware variability; and clients can create temporary hot spots by all clients picking the same (least-loaded) server at the same time. The Distributed Shortest-Positioning Time First system uses another variation in which the request is sent to one server and forwarded to replicas only if the initial server does not have it in its cache and uses cross-server cancellations.

Worth noting is this technique is not restricted to simple replication but is also applicable in more-complex coding schemes (such as Reed-Solomon) where a primary request is sent to the machine with the desired data block, and, if no response is received following a brief delay, a collection of requests is issued to a subset of the remaining replication group sufficient to reconstruct the desired data, with the whole ensemble forming a set of tied requests.

Note, too, the class of techniques described here is effective only when the phenomena that causes variability does not tend to simultaneously affect multiple request replicas. We expect such uncorrelated phenomena are rather common in large-scale systems.
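For reference, the hedging loop the paper describes (send to the preferred replica first, fall back to a secondary after a brief delay, keep whichever reply arrives first) can be sketched in a few lines of C++. This is a minimal illustration, not the Pegasus client implementation; `read_from` is a hypothetical stand-in for an asynchronous RPC, and the delays are arbitrary:

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <iostream>
#include <string>
#include <thread>

// Hypothetical stand-in for an RPC to one replica; a real client would
// issue an asynchronous read instead of sleeping.
std::string read_from(const std::string& replica, int latency_ms) {
    std::this_thread::sleep_for(std::chrono::milliseconds(latency_ms));
    return "value from " + replica;
}

int main() {
    std::promise<std::string> first_reply;
    std::future<std::string> result = first_reply.get_future();
    std::atomic<bool> done{false};

    // Deliver a reply only if it is the first one to arrive.
    auto respond = [&](const std::string& v) {
        if (!done.exchange(true)) first_reply.set_value(v);
    };

    // 1. Send the request to the preferred replica (the primary).
    std::thread primary([&] { respond(read_from("primary", 50)); });

    // 2. After a brief hedging delay, send a backup request to a secondary,
    //    but only if no reply has arrived yet.
    std::thread secondary([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        if (!done.load()) respond(read_from("secondary", 5));
    });

    std::cout << result.get() << std::endl;  // whichever reply came first
    primary.join();
    secondary.join();
}
```

A production client would also cancel the slower outstanding request rather than merely ignoring its reply, which is the faster-cancellation point the paper emphasizes for tied requests.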

@levy5307 (Contributor) commented Mar 16, 2020

Performance test

Set/Get operations:

| test case | enable backup request | read/write proportion | qps | read avg | read p95 | read p99 | read p999 | read p9999 | write avg | write p95 | write p99 | write p999 | write p9999 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3-clients 15-threads | no | 1 : 3 | 7076 | 880.6512836149132 | 428.0 | 727.0 | 138495.0 | 988671.0 | 2495.0710801540517 | 6319.0 | 9023.0 | 36319.0 | 531455.0 |
| 3-clients 15-threads | yes, delay 138ms | 1 : 3 | 6987 | 1010.1412488662884 | 403.0 | 7747.0 | 138751.0 | 153599.0 | 2476.104380444753 | 6859.0 | 9119.0 | 13759.0 | 185855.0 |
| 3-clients 100-threads | no | 1 : 0 | 140607 | 707.98960978 | 1474.0 | 2731.0 | 5511.0 | 167551.0 | -- | -- | -- | -- | -- |
| 3-clients 100-threads | yes, delay 5ms | 1 : 0 | 77429 | 1288.01461934 | 2935.0 | 3487.0 | 6323.0 | 71743.0 | -- | -- | -- | -- | -- |
| 3-clients 30-threads | no | 30 : 1 | 87198 | 306.9600544730426 | 513.0 | 805.0 | 4863.0 | 28271.0 | 1369.4669874672938 | 2661.0 | 5795.0 | 22319.0 | 51359.0 |
| 3-clients 30-threads | yes, delay 5ms | 30 : 1 | 88541 | 298.22470022339127 | 493.0 | 711.0 | 4483.0 | 18479.0 | 1467.6130963728997 | 3263.0 | 6411.0 | 17439.0 | 50975.0 |

Multi-Get/Batch-Set operations:

| test case | enable backup request | read/write proportion | qps | read avg | read p95 | read p99 | read p999 | read p9999 | write avg | write p95 | write p99 | write p999 | write p9999 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3-clients 7-threads | no | 20 : 1 | 24113 | 200.37956913733476 | 277.0 | 410.0 | 2317.0 | 21647.0 | 2034.1923768463382 | 4283.0 | 6427.0 | 18271.0 | 62687.0 |
| 3-clients 7-threads | yes, delay 2ms | 20 : 1 | 23756 | 197.48540031650361 | 268.0 | 351.0 | 2173.0 | 5759.0 | 2187.199077764627 | 4531.0 | 6551.0 | 21551.0 | 63999.0 |
| 3-clients 15-threads | no | 20 : 1 | 30980 | 236.7482510418767 | 348.0 | 526.0 | 3535.0 | 25695.0 | 5361.380053671262 | 14087.0 | 20223.0 | 40639.0 | 90815.0 |
| 3-clients 15-threads | yes, delay 3ms | 20 : 1 | 30483 | 244.1182599024727 | 386.0 | 540.0 | 3105.0 | 13287.0 | 5377.992155339365 | 14119.0 | 19535.0 | 31311.0 | 103103.0 |
