This repository has been archived by the owner on May 10, 2022. It is now read-only.

support backup request on client side #84

Closed
levy5307 opened this issue Jan 6, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@levy5307
Contributor

levy5307 commented Jan 6, 2020

The proposal resides in apache/incubator-pegasus#251.

Background

The backup request feature mitigates the long tail of read latency that applications see when the service jitters. It is suitable for users with weak consistency requirements.

Design

  1. Add a configuration option so users can choose whether to enable backup request.
  2. Add a Map<rpc_address, ReplicaSession> allSessions member to ReplicaConfiguration to hold all sessions.
  3. In TableHandler::initTableConfiguration:
    1. If backup request is disabled, create a session only for the primary replica and put it into allSessions.
    2. If backup request is enabled, also create sessions for the secondaries and store all of them in allSessions.
  4. In TableHandler::call:
    1. For a write, keep the existing logic and send the request to the primary.
    2. For a read, send the request to every session in allSessions and accept only the fastest response, ignoring the rest (when backup request is disabled, allSessions holds only the primary session, so this degenerates to sending to the primary alone).
      1. Add an is_success flag (boolean), initially false, meaning no response has returned yet, and pass it to TableHandler::onRpcReply.
      2. If isEmpty(), the connection to this partition has not been established yet, so call tryQueryMeta and reconnect.
  5. In TableHandler::onRpcReply, handle the response:
    1. If is_success is true, an earlier response has already been processed, so ignore this one.
    2. If is_success is false, this response is the first to return; handle it with the existing logic, then set is_success to true.

Notes:

  • Access to is_success must be protected by a lock.
  • Add a field to client_operator marking whether the operation is a read or a write.
  • Steps 4 and 5 are transparent to whether backup request is enabled; no extra checks are needed there. Only the session creation in step 3 needs to branch on it.
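The first-response-wins rule of steps 4 and 5 can be sketched as follows. This is an illustrative sketch, not the actual client code: an AtomicBoolean plays the role of the locked is_success flag, and compareAndSet makes "only the first reply is processed" atomic without an explicit lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Minimal sketch of the first-response-wins logic from steps 4 and 5.
// One AtomicBoolean per client request replaces the locked is_success flag.
public class FirstResponseWins {
    private final AtomicBoolean isSuccess = new AtomicBoolean(false);

    // Called once per replica reply (the role of TableHandler::onRpcReply).
    // Returns true only for the first reply; later replies are ignored.
    public boolean onRpcReply(String response, Consumer<String> handler) {
        if (isSuccess.compareAndSet(false, true)) {
            handler.accept(response); // process only the fastest response
            return true;
        }
        return false; // a response was already handled; drop this one
    }
}
```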
@neverchanje changed the title from "backup request方案讨论" ("backup request design discussion") to "backup request on client side" Feb 15, 2020
@neverchanje changed the title from "backup request on client side" to "support backup request on client side" Feb 15, 2020
@neverchanje neverchanje added the enhancement New feature or request label Feb 15, 2020
@neverchanje

neverchanje commented Feb 15, 2020

> For a read, send the request to every session in allSessions and accept only the fastest response, ignoring the rest (when backup request is disabled, allSessions holds only the primary session, so this degenerates to sending to the primary alone).

> Add an is_success flag (boolean), initially false, meaning no response has returned yet, and pass it to TableHandler::onRpcReply.

> If isEmpty(), the connection to this partition has not been established yet, so call tryQueryMeta and reconnect.

Your first idea is to broadcast read requests to all the replicas in a group. Apparently, this is unacceptable: it triples the read throughput, and the higher load will certainly increase overall latency. Actually, the most difficult part of "backup request" is to keep the overhead as small as possible. Tripling or doubling the workload should not be an option.

One scheme ("hedged request") is to defer the secondary request for a short period, often the desired p999 latency, e.g. 15 ms. If the first request is not answered within that period, a second request is sent. Theoretically, this solution requires only about 0.1% additional load, which is cost-effective for our latency-sensitive users. The scheme is easy to implement, but one problem is that the user has to learn how to set the period appropriately. In BRPC's implementation of hedged requests it is an option called backup_request_ms. Maybe we can make it adaptive in the future.
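The hedged-request scheme can be sketched like this. The request callables and class names here are hypothetical stand-ins for the real ReplicaSession calls: send to the primary first, and only if no reply has arrived within backupRequestMs, send one backup request; the first reply completes the future and the later one is simply ignored, so the extra load stays confined to the tail.

```java
import java.util.concurrent.*;

// Sketch of a hedged read (illustrative names, not the real client API).
public class HedgedRead {
    // Daemon timer thread so the sketch does not keep the JVM alive.
    private static final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "hedge-timer");
                t.setDaemon(true);
                return t;
            });

    public static CompletableFuture<String> read(Callable<String> primary,
                                                 Callable<String> secondary,
                                                 long backupRequestMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        ExecutorService pool = ForkJoinPool.commonPool();
        pool.submit(() -> complete(result, primary));
        // Fire the backup request only if the primary is still outstanding.
        timer.schedule(() -> {
            if (!result.isDone()) {
                pool.submit(() -> complete(result, secondary));
            }
        }, backupRequestMs, TimeUnit.MILLISECONDS);
        return result;
    }

    private static void complete(CompletableFuture<String> f, Callable<String> call) {
        try {
            f.complete(call.call()); // only the first complete() takes effect
        } catch (Exception e) {
            f.completeExceptionally(e);
        }
    }
}
```

Setting backupRequestMs to the observed p999 is what bounds the extra load to roughly 0.1% of requests.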

For more readings, I copied here the paragraphs related to "hedged request" in "The tail at scale". Take a look.

@levy5307
Contributor Author

levy5307 commented Feb 18, 2020

There are two ways to implement backup request.

Hedged requests

A client first sends one request to the replica believed to be the most appropriate, but then falls back on sending a secondary request after the first request has been outstanding for more than the 95th-percentile (or 99th-percentile, etc.) expected latency. The client cancels remaining outstanding requests once the first result is received. This approach limits the additional load to approximately 5% (or 1%) while substantially shortening the latency tail.

This approach limits the benefits to only a small fraction of requests (the tail of the latency distribution).

Tied requests

The client sends the request to two different servers, each tagged with the identity of the other server ("tied"). When a request begins execution, it sends a cancellation message to its counterpart. The corresponding request, if still enqueued in the other server, can be aborted immediately or deprioritized substantially.

There is another variation in which the request is sent to one server and forwarded to replicas only if the initial server does not have it in its cache, using cross-server cancellations.

This approach extends the benefits beyond the tail to the median of the latency distribution, but it results in higher network load.
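As a toy model of the tied-request idea (all names hypothetical; this is not Pegasus code), two server-side queues hold the same request id, each tagged with its peer, and whichever server begins execution first sends a cancellation that removes the peer's copy:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of tied requests: the same request id is enqueued on two
// servers, and the one that starts executing first cancels the other copy.
public class TiedRequestServer {
    private final Map<Long, TiedRequestServer> queue = new ConcurrentHashMap<>();

    // Enqueue a request tagged with the identity of the other server.
    public void enqueue(long requestId, TiedRequestServer peer) {
        queue.put(requestId, peer);
    }

    // Begin execution: cancel the tied copy on the peer, if still enqueued.
    // Returns false if this copy was already cancelled by the peer.
    public boolean execute(long requestId) {
        TiedRequestServer peer = queue.remove(requestId);
        if (peer == null) {
            return false; // aborted by the peer's cancellation message
        }
        peer.cancel(requestId);
        return true;
    }

    private void cancel(long requestId) {
        queue.remove(requestId);
    }
}
```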

@neverchanje

neverchanje commented Feb 18, 2020

My first choice is "hedged request", because it is apparently simpler. We can leave optimization until after the initial version.

To dig deeper into the final design, several problems remain:

> then falls back on sending a secondary request after the first request has been outstanding for more than the 95th-percentile (or 99th-percentile, etc.) expected latency.

Since we have two secondaries, we can pick one at random (50:50) for the second request.

Another question is how to design the API for configuring the period to wait before sending the backup request (call it backup_request_ms).

One way I suggest is to add an argument to PegasusClient.openTable and pass backup_request_ms through it.
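The suggested API shape could look like the sketch below. All names (TableOptions, pickSecondary) are hypothetical, not the final client API: openTable would take a per-table option carrying backup_request_ms, and the backup target is chosen 50:50 between the two secondaries.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical per-table options object passed to PegasusClient.openTable.
public class TableOptions {
    private final long backupRequestMs; // <= 0 disables backup requests

    public TableOptions(long backupRequestMs) {
        this.backupRequestMs = backupRequestMs;
    }

    public boolean backupRequestEnabled() {
        return backupRequestMs > 0;
    }

    public long backupRequestMs() {
        return backupRequestMs;
    }

    // Choose one of the secondaries with equal probability (50:50 for two).
    public static int pickSecondary(int secondaryCount) {
        return ThreadLocalRandom.current().nextInt(secondaryCount);
    }
}
```

A call might then look like client.openTable("temp", new TableOptions(20)); again, these names are illustrative only.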

@levy5307
Contributor Author

levy5307 commented Mar 16, 2020

It is more effective to send to one secondary chosen at random than to all of the secondaries. According to the performance test, sending to both secondaries increases p95, because it adds significant load on the servers.

@levy5307
Contributor Author

levy5307 commented Mar 16, 2020

performance test

set/get operation:

| test case | enable backup request | read/write proportion | qps | read avg | read p95 | read p99 | read p999 | read p9999 | write avg | write p95 | write p99 | write p999 | write p9999 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3-clients 15-threads | no | 1 : 3 | 7076 | 880.6512836149132 | 428.0 | 727.0 | 138495.0 | 988671.0 | 2495.0710801540517 | 6319.0 | 9023.0 | 36319.0 | 531455.0 |
| 3-clients 15-threads | yes, delay 138ms | 1 : 3 | 6987 | 1010.1412488662884 | 403.0 | 7747.0 | 138751.0 | 153599.0 | 2476.104380444753 | 6859.0 | 9119.0 | 13759.0 | 185855.0 |
| 3-clients 100-threads | no | 1 : 0 | 140607 | 707.98960978 | 1474.0 | 2731.0 | 5511.0 | 167551.0 | ---- | ---- | ---- | ---- | ---- |
| 3-clients 100-threads | yes, delay 5ms | 1 : 0 | 77429 | 1288.01461934 | 2935.0 | 3487.0 | 6323.0 | 71743.0 | ---- | ---- | ---- | ---- | ---- |
| 3-clients 30-threads | no | 30 : 1 | 87198 | 306.9600544730426 | 513.0 | 805.0 | 4863.0 | 28271.0 | 1369.4669874672938 | 2661.0 | 5795.0 | 22319.0 | 51359.0 |
| 3-clients 30-threads | yes, delay 5ms | 30 : 1 | 88541 | 298.22470022339127 | 493.0 | 711.0 | 4483.0 | 18479.0 | 1467.6130963728997 | 3263.0 | 6411.0 | 17439.0 | 50975.0 |

Multi-get/Batch-Set operation:

| test case | enable backup request | read/write proportion | qps | read avg | read p95 | read p99 | read p999 | read p9999 | write avg | write p95 | write p99 | write p999 | write p9999 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3-clients 7-threads | no | 20 : 1 | 24113 | 200.37956913733476 | 277.0 | 410.0 | 2317.0 | 21647.0 | 2034.1923768463382 | 4283.0 | 6427.0 | 18271.0 | 62687.0 |
| 3-clients 7-threads | yes, delay 2ms | 20 : 1 | 23756 | 197.48540031650361 | 268.0 | 351.0 | 2173.0 | 5759.0 | 2187.199077764627 | 4531.0 | 6551.0 | 21551.0 | 63999.0 |
| 3-clients 15-threads | no | 20 : 1 | 30980 | 236.7482510418767 | 348.0 | 526.0 | 3535.0 | 25695.0 | 5361.380053671262 | 14087.0 | 20223.0 | 40639.0 | 90815.0 |
| 3-clients 15-threads | yes, delay 3ms | 20 : 1 | 30483 | 244.1182599024727 | 386.0 | 540.0 | 3105.0 | 13287.0 | 5377.992155339365 | 14119.0 | 19535.0 | 31311.0 | 103103.0 |
