Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial draft of the batch requests API HLD. #959

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mint570
Copy link
Collaborator

@mint570 mint570 commented Mar 9, 2022

Initial draft of the batch requests API HLD.

Repo PR title State
sonic-swss-common Added batched APIs in ProducerStateTable  GitHub issue/pull request detail
sonic-pins Use batching API to send requests to the AppDb tables  GitHub issue/pull request detail
sonic-buildimage [submodule] Advance sonic-p4rt/sonic-pins pointer  GitHub issue/pull request detail

@mint570 mint570 marked this pull request as ready for review March 10, 2022 18:37
* In the current proposed APIs, a batched request can only contain similar entries - for example - either all SET or all DEL entries. A mix of SET and DEL in a batch is not supported.
* A batched request can only apply to a single APPL DB table. Requests to different tables cannot be batched.
* Order is not maintained in a batch request. The entries within a single batched request will be handled in any order.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this not cause a problem for SET case, if order is not maintained for SET, two sets on same object can result in random state at the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two sets of the same object will have the final result of the combined attributes.

Order-less is the limitation of the current ProducerStateTable APIs, for both batch and non-batch APIs. Even in the non-batch APIs, if the producer writes multiple requests in a short time, the consumer might pop multiple entries at the time it wakes up. The reason of the order-less is that the API uses a Redis set to store the new keys, which is order-less.

@marian-pritsak
Copy link
Contributor

What is the measured performance difference? And what are the memory usage implications?

@mint570
Copy link
Collaborator Author

mint570 commented Mar 17, 2022

What is the measured performance difference? And what are the memory usage implications?

About performance difference, our data shows about 8% improvement in batch write into the DB. This is just measuring the time to write to the APPL DB. End-to-end performance improvement will be even better if the orchagent supports bulk SAI calls. (But that is also platform dependent.)
About memory usage, there should be no difference than the existing non-batch API, since all the entries will eventually be written in Redis DB.


## Overview

When an application writes an entry into APPL DB, it uses the ProducerStateTable API. Currently, the ProducerStateTable API does not support batched writes. In the current form, the application needs to call the API multiple times for programming a batch of requests. This HLD proposes a new API in ProducerStateTable that can allow applications to program multiple entries in a single invocation. This can reduce the number of Redis DB notifications and potentially reduce the latency for responses when the response path is enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain "potentially reduce the latency for responses when the response path is enabled"?

  1. "response path" is not well defined.
  2. For latency issue, there is a pipeline feature developed. Please investigate if it fits your use case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Response path is referring to Initial version of APPL STATE DB & Response Path HLD #846. It is end-to-end sync mode from the application to the hardware. Should we add a reference here? The HLD pr is not merged yet.
  2. The latency reduction in on orchagent making bulk SAI call. (For example, the p4orch route manager makes bulk SAI calls to program routes.) This feature will send one final notification at the end. Comparing to the individual calls, which will send notification for each call, this new batch API will prevent orchagent to wake up prematurely, so that orchagent can take full advantage of the batch in the bulk SAI call.

qiluo-msft pushed a commit to sonic-net/sonic-swss-common that referenced this pull request Apr 14, 2022
swss-common: Added batched APIs in ProducerStateTable.
HLD: sonic-net/SONiC#959
@yxieca yxieca force-pushed the master branch 2 times, most recently from 8498931 to 8837dc2 Compare April 15, 2022 16:51
@mint570
Copy link
Collaborator Author

mint570 commented May 10, 2022

@bocon13
I saw you had a comment but I couldn't see it here.
The prs in your comments are not related to the batch requests.

itamar-talmon pushed a commit to itamar-talmon/sonic-swss-common that referenced this pull request Jul 19, 2022
swss-common: Added batched APIs in ProducerStateTable.
HLD: sonic-net/SONiC#959
@yuxuehong
Copy link

The previous design uses m_queueLength to ensure that each data will be completely consumed. Of course, this will also lead to a certain performance, because after the pop has been completed, the follow-up still needs to have io operations with redis server. This batch design, the scenario which the amount of data is greater than the amount that orch can pop, cause it will only be published once, and will there be residual data that is not consumed by the orch?

@mint570
Copy link
Collaborator Author

mint570 commented Mar 3, 2023

Orch will pop all data available: https://github.com/sonic-net/sonic-swss/blob/ebe8de73e74cf3d9fd298b8b74b5da797c889c40/orchagent/orch.cpp#L228

Of course, that might have other issues like starvation, which is what the pop size is for.
In a real scenario, the user will need to use a pop size that has balance of performance and scheduling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants