Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ResponsePublisher] add pipeline support #2511

Merged

Conversation

stepanblyschak
Copy link
Contributor

Signed-off-by: Stepan Blyschak stepanb@nvidia.com

DEPENDS ON sonic-net/sonic-swss-common#708

What I did

I added ResponsePublisher support for redis pipeline

Why I did it

I did it to improve performance when sending many responses. Responses are buffered in redis client before beeing sent out to redis server, while orchagent has no more pending tasks to do, responses are flushed to redis.

How I verified it

I did manual tests and verified togather with bgp suppress feature.

Details if related

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@liat-grozovik
Copy link
Collaborator

@marian-pritsak @prsunny could you please help to review?

ResponsePublisher::ResponsePublisher() : m_db("APPL_STATE_DB", 0)
ResponsePublisher::ResponsePublisher(bool buffered) :
m_db(std::make_unique<swss::DBConnector>("APPL_STATE_DB", 0)),
m_pipe(std::make_unique<swss::RedisPipeline>(m_db.get())),
Copy link
Contributor

@qiluo-msft qiluo-msft Jan 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_db

You are assuming m_pipe initialized after m_db. The sequence is tricky. Suggest move m_pipe to function body. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
m_notifiers[table] = std::make_unique<swss::NotificationProducer>(&m_db, response_channel);
}
swss::NotificationProducer notificationProducer{m_pipe.get(), response_channel, m_buffered};
Copy link
Contributor

@qiluo-msft qiluo-msft Jan 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotificationProducer

Is it too expensive to create a new NotificationProducer for each function call? #Closed

Copy link
Contributor Author

@stepanblyschak stepanblyschak Jan 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft I find it cheap, just arguments copy. It might be cheaper then a hash table lookup, depending on hash function and a load factor, so without it it takes a bit more deterministic time.

In my testing old and new implementation show the same result (measured published responses/sec), probably bound to redis IO anyway.

{
m_tables[table] = std::make_unique<swss::Table>(&m_db, table);
}
swss::Table applStateTable{m_pipe.get(), table, m_buffered};
Copy link
Contributor

@qiluo-msft qiluo-msft Jan 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applStateTable

Is it too expensive to create a new Table for each function call? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the argument is "arguments copy". Seem the constructor is not just arguments copy. For example: TableBase::TableBase() includes a map find.

Copy link
Contributor Author

@stepanblyschak stepanblyschak Jan 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft So it's one find with an integer key + arguments copy vs 2 finds with string keys.
I did a benchmark to prove my point and got these results:

static void BM_newResponsePublisher(benchmark::State& state) {
    ResponsePublisher publisher{};

    for (auto _ : state) {
        publisher.publish("SOME_TABLE", "SOME_KEY", {}, ReturnCode(SAI_STATUS_SUCCESS));
    }
}

static void BM_oldResponsePublisher(benchmark::State& state) {
    old::ResponsePublisher publisher{};

    for (auto _ : state) {
        publisher.publish("SOME_TABLE", "SOME_KEY", {}, ReturnCode(SAI_STATUS_SUCCESS));
    }
}

Compiled with:

g++ -std=c++14 -O2 -I ../orchagent/ -I /usr/include/swss -I /usr/include/sai/ ../orchagent/response_publisher.cpp ../orchagent/response_publisher_old.cpp main.cpp -o main -lswsscommon -lhiredis -lbenchmark

Full code is here - https://github.com/stepanblyschak/sonic-swss/blob/response-publisher-benchmarks/benchmarks/main.cpp

stepanb@333f1eb97224:/sonic/src/sonic-swss/benchmarks$ ./main 
2023-01-29T17:42:27+00:00
Running ./main
Run on (12 X 1600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x12)
  L1 Instruction 32 KiB (x12)
  L2 Unified 256 KiB (x12)
  L3 Unified 15360 KiB (x2)
Load Average: 0.55, 0.32, 0.30
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_newResponsePublisher      61831 ns        46291 ns        14984
BM_oldResponsePublisher      63855 ns        47892 ns        15570

Most of the time it is IO bound of course, but I constantly get a little bit better result with new implementation every time a run the benchmark. So, the caching makes it even a little bit worse and the new implementation is 3.16 % faster. Please note, that both don't use redis pipeline as the new implementation will clearly be a winner by a couple of times.

@qiluo-msft qiluo-msft self-requested a review January 27, 2023 19:43
Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add new comment

Copy link
Contributor

@mint570 mint570 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How frequent is the OrchDaemon::flush being called?

Also, I saw a PR on doing pipeline on the notification API. Won't that covers this one already since the publisher just uses the notification producer.

@stepanblyschak
Copy link
Contributor Author

How frequent is the OrchDaemon::flush being called?

Also, I saw a PR on doing pipeline on the notification API. Won't that covers this one already since the publisher just uses the notification producer.

@mint570 It is at least every second. It is not just about adding pipeline support to NotificationProducer. Someone must flush the pipeline when it is not full, otherwise events stay in the buffer until the buffer itself is not full.
It completelly follows the design of orchagent's async sairedis mode.

qiluo-msft
qiluo-msft previously approved these changes Jan 30, 2023
@stepanblyschak
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mint570
Copy link
Contributor

mint570 commented Jan 30, 2023

How frequent is the OrchDaemon::flush being called?
Also, I saw a PR on doing pipeline on the notification API. Won't that covers this one already since the publisher just uses the notification producer.

@mint570 It is at least every second. It is not just about adding pipeline support to NotificationProducer. Someone must flush the pipeline when it is not full, otherwise events stay in the buffer until the buffer itself is not full. It completelly follows the design of orchagent's async sairedis mode.

This change will help in the batch cases, but it will take at most 1 second in a single request case.

@stepanblyschak
Copy link
Contributor Author

@mint570 Orch can call Orch::flushResponses() on demand

mint570
mint570 previously approved these changes Jan 31, 2023
@stepanblyschak
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

… response-publisher-pipeline

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@stepanblyschak
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft
Copy link
Contributor

            copp_ut.cpp \

Why remove existing test file?


Refers to: tests/mock_tests/Makefile.am:33 in 235bbde. [](commit_id = 235bbde, deletion_comment = True)

@qiluo-msft
Copy link
Contributor

qiluo-msft commented Feb 6, 2023

            warmrestartassist_ut.cpp \

Why remove existing test file?


In reply to: 1419521504


Refers to: tests/mock_tests/Makefile.am:52 in 235bbde. [](commit_id = 235bbde, deletion_comment = True)

@stepanblyschak
Copy link
Contributor Author

@qiluo-msft I didn't touch these files

@stepanblyschak
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak stepanblyschak requested review from qiluo-msft and mint570 and removed request for prsunny, marian-pritsak, qiluo-msft and mint570 February 7, 2023 16:26
@prsunny prsunny merged commit 4df5cab into sonic-net:master Feb 9, 2023
prsunny pushed a commit that referenced this pull request Feb 16, 2023
*Merge remote-tracking branch 'upstream/master' into dash (#2663)
* Modify coppmgr mergeConfig to support preserving copp tables through reboot. (#2548)
* Avoid aborting orchagent when setting TUNNEL attributes (#2591)
* Handle Mac address 'none' (#2593)
* Increase diff coverage to 80% (#2599)
* Add missing parameter to on_switch_shutdown_request method. (#2567)
* Add ZMQ based ProducerStateTable and CustomerStateTable.
* Revert "[voq][chassis]Add show fabric counters port/queue commands (#2522)" (#2611)
* Added new attributes for Vnet and Vxlan ecmp configurations. (#2584)
* added support for monitoring, primary and adv_prefix and overlay_dmac.
* [routesync] Fix for stale dynamic neighbor (#2553)
* [MuxOrch] Enabling neighbor when adding in active state (#2601)
* Changed the BFD default detect multiplier to 10x (#2614)
* Remove TODO comments that are no longer relevant (#2622)
* Fix issue #13341 ARP entry can be out of sync between kernel and APPL_DB if multiple updates are received from RTNL (#2619)
* [refactor]Refactoring sai handle status (#2621)
* Vxlan tunnel endpoint custom monitoring APPL DB table. (#2589)
* added support for monitoring, primary and adv_prefix. changed filter_mac to overlay_dmac
* Data Structures and code to write APP_DB VNET_MONITOR table entries for custom monitoring of Vxlan tunnel endpoints.
* [bfdorch] add local discriminator to state DB (#2629)
* [acl] Add new ACL key BTH_OPCODE and AETH_SYNDROME  (#2617)
* [voq][chassis] Remove created ports from the default vlan. (#2607)
* [EVPN]Handling race condition when remote VNI arrives before tunnel map entry (#2642)
*Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point.
* [test_mux] add sleep in test_NH (#2648)
* [autoneg]Fixing adv interface types to be set when AN is disabled (#2638)
* [hash]: Add UT infra. (#2660)
*Added UT infra for Generic Hash feature
*Aligned PBH tests with Generic Hash UT infra
* [sai_failure_dump]Invoking dump during SAI failure (#2644)
* [ResponsePublisher] add pipeline support  (#2511)
* [dash] Fix compilation issue caused by missing include.
AntonHryshchuk added a commit to AntonHryshchuk/sonic-buildimage that referenced this pull request Feb 22, 2023
Update sonic-swss submodule pointer to include the following:
* f66abed Support for tc-dot1p and tc-dscp qosmap ([sonic-net#2559](sonic-net/sonic-swss#2559))
* 35385ad [RouteOrch] Record ROUTE_TABLE entry programming status to APPL_STATE_DB ([sonic-net#2512](sonic-net/sonic-swss#2512))
* 0704f78 [Workaround] EvpnRemoteVnip2pOrch warmboot check failure ([sonic-net#2626](sonic-net/sonic-swss#2626))
* 4df5cab [ResponsePublisher] add pipeline support  ([sonic-net#2511](sonic-net/sonic-swss#2511))

Signed-off-by: AntonHryshchuk <antonh@nvidia.com>
dprital added a commit to dprital/sonic-buildimage that referenced this pull request Feb 23, 2023
Update sonic-swss submodule pointer to include the following:
* baa302e Do not allow to add port to .1Q bridge while router port deletion is not completed  ([sonic-net#2669](sonic-net/sonic-swss#2669))
* f66abed Support for tc-dot1p and tc-dscp qosmap ([sonic-net#2559](sonic-net/sonic-swss#2559))
* 35385ad [RouteOrch] Record ROUTE_TABLE entry programming status to APPL_STATE_DB ([sonic-net#2512](sonic-net/sonic-swss#2512))
* 0704f78 [Workaround] EvpnRemoteVnip2pOrch warmboot check failure ([sonic-net#2626](sonic-net/sonic-swss#2626))
* 4df5cab [ResponsePublisher] add pipeline support  ([sonic-net#2511](sonic-net/sonic-swss#2511))

Signed-off-by: dprital <drorp@nvidia.com>
prsunny pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Feb 23, 2023
Update sonic-swss submodule pointer to include the following:
* baa302e Do not allow to add port to .1Q bridge while router port deletion is not completed  ([#2669](sonic-net/sonic-swss#2669))
* f66abed Support for tc-dot1p and tc-dscp qosmap ([#2559](sonic-net/sonic-swss#2559))
* 35385ad [RouteOrch] Record ROUTE_TABLE entry programming status to APPL_STATE_DB ([#2512](sonic-net/sonic-swss#2512))
* 0704f78 [Workaround] EvpnRemoteVnip2pOrch warmboot check failure ([#2626](sonic-net/sonic-swss#2626))
* 4df5cab [ResponsePublisher] add pipeline support  ([#2511](sonic-net/sonic-swss#2511))
theasianpianist pushed a commit to theasianpianist/sonic-swss that referenced this pull request Jul 21, 2023
)

*Merge remote-tracking branch 'upstream/master' into dash (sonic-net#2663)
* Modify coppmgr mergeConfig to support preserving copp tables through reboot. (sonic-net#2548)
* Avoid aborting orchagent when setting TUNNEL attributes (sonic-net#2591)
* Handle Mac address 'none' (sonic-net#2593)
* Increase diff coverage to 80% (sonic-net#2599)
* Add missing parameter to on_switch_shutdown_request method. (sonic-net#2567)
* Add ZMQ based ProducerStateTable and CustomerStateTable.
* Revert "[voq][chassis]Add show fabric counters port/queue commands (sonic-net#2522)" (sonic-net#2611)
* Added new attributes for Vnet and Vxlan ecmp configurations. (sonic-net#2584)
* added support for monitoring, primary and adv_prefix and overlay_dmac.
* [routesync] Fix for stale dynamic neighbor (sonic-net#2553)
* [MuxOrch] Enabling neighbor when adding in active state (sonic-net#2601)
* Changed the BFD default detect multiplier to 10x (sonic-net#2614)
* Remove TODO comments that are no longer relevant (sonic-net#2622)
* Fix issue #13341 ARP entry can be out of sync between kernel and APPL_DB if multiple updates are received from RTNL (sonic-net#2619)
* [refactor]Refactoring sai handle status (sonic-net#2621)
* Vxlan tunnel endpoint custom monitoring APPL DB table. (sonic-net#2589)
* added support for monitoring, primary and adv_prefix. changed filter_mac to overlay_dmac
* Data Structures and code to write APP_DB VNET_MONITOR table entries for custom monitoring of Vxlan tunnel endpoints.
* [bfdorch] add local discriminator to state DB (sonic-net#2629)
* [acl] Add new ACL key BTH_OPCODE and AETH_SYNDROME  (sonic-net#2617)
* [voq][chassis] Remove created ports from the default vlan. (sonic-net#2607)
* [EVPN]Handling race condition when remote VNI arrives before tunnel map entry (sonic-net#2642)
*Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point.
* [test_mux] add sleep in test_NH (sonic-net#2648)
* [autoneg]Fixing adv interface types to be set when AN is disabled (sonic-net#2638)
* [hash]: Add UT infra. (sonic-net#2660)
*Added UT infra for Generic Hash feature
*Aligned PBH tests with Generic Hash UT infra
* [sai_failure_dump]Invoking dump during SAI failure (sonic-net#2644)
* [ResponsePublisher] add pipeline support  (sonic-net#2511)
* [dash] Fix compilation issue caused by missing include.
theasianpianist pushed a commit to theasianpianist/sonic-swss that referenced this pull request Jul 25, 2023
)

*Merge remote-tracking branch 'upstream/master' into dash (sonic-net#2663)
* Modify coppmgr mergeConfig to support preserving copp tables through reboot. (sonic-net#2548)
* Avoid aborting orchagent when setting TUNNEL attributes (sonic-net#2591)
* Handle Mac address 'none' (sonic-net#2593)
* Increase diff coverage to 80% (sonic-net#2599)
* Add missing parameter to on_switch_shutdown_request method. (sonic-net#2567)
* Add ZMQ based ProducerStateTable and CustomerStateTable.
* Revert "[voq][chassis]Add show fabric counters port/queue commands (sonic-net#2522)" (sonic-net#2611)
* Added new attributes for Vnet and Vxlan ecmp configurations. (sonic-net#2584)
* added support for monitoring, primary and adv_prefix and overlay_dmac.
* [routesync] Fix for stale dynamic neighbor (sonic-net#2553)
* [MuxOrch] Enabling neighbor when adding in active state (sonic-net#2601)
* Changed the BFD default detect multiplier to 10x (sonic-net#2614)
* Remove TODO comments that are no longer relevant (sonic-net#2622)
* Fix issue #13341 ARP entry can be out of sync between kernel and APPL_DB if multiple updates are received from RTNL (sonic-net#2619)
* [refactor]Refactoring sai handle status (sonic-net#2621)
* Vxlan tunnel endpoint custom monitoring APPL DB table. (sonic-net#2589)
* added support for monitoring, primary and adv_prefix. changed filter_mac to overlay_dmac
* Data Structures and code to write APP_DB VNET_MONITOR table entries for custom monitoring of Vxlan tunnel endpoints.
* [bfdorch] add local discriminator to state DB (sonic-net#2629)
* [acl] Add new ACL key BTH_OPCODE and AETH_SYNDROME  (sonic-net#2617)
* [voq][chassis] Remove created ports from the default vlan. (sonic-net#2607)
* [EVPN]Handling race condition when remote VNI arrives before tunnel map entry (sonic-net#2642)
*Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point.
* [test_mux] add sleep in test_NH (sonic-net#2648)
* [autoneg]Fixing adv interface types to be set when AN is disabled (sonic-net#2638)
* [hash]: Add UT infra. (sonic-net#2660)
*Added UT infra for Generic Hash feature
*Aligned PBH tests with Generic Hash UT infra
* [sai_failure_dump]Invoking dump during SAI failure (sonic-net#2644)
* [ResponsePublisher] add pipeline support  (sonic-net#2511)
* [dash] Fix compilation issue caused by missing include.
theasianpianist pushed a commit to theasianpianist/sonic-swss that referenced this pull request Jul 25, 2023
)

*Merge remote-tracking branch 'upstream/master' into dash (sonic-net#2663)
* Modify coppmgr mergeConfig to support preserving copp tables through reboot. (sonic-net#2548)
* Avoid aborting orchagent when setting TUNNEL attributes (sonic-net#2591)
* Handle Mac address 'none' (sonic-net#2593)
* Increase diff coverage to 80% (sonic-net#2599)
* Add missing parameter to on_switch_shutdown_request method. (sonic-net#2567)
* Add ZMQ based ProducerStateTable and CustomerStateTable.
* Revert "[voq][chassis]Add show fabric counters port/queue commands (sonic-net#2522)" (sonic-net#2611)
* Added new attributes for Vnet and Vxlan ecmp configurations. (sonic-net#2584)
* added support for monitoring, primary and adv_prefix and overlay_dmac.
* [routesync] Fix for stale dynamic neighbor (sonic-net#2553)
* [MuxOrch] Enabling neighbor when adding in active state (sonic-net#2601)
* Changed the BFD default detect multiplier to 10x (sonic-net#2614)
* Remove TODO comments that are no longer relevant (sonic-net#2622)
* Fix issue #13341 ARP entry can be out of sync between kernel and APPL_DB if multiple updates are received from RTNL (sonic-net#2619)
* [refactor]Refactoring sai handle status (sonic-net#2621)
* Vxlan tunnel endpoint custom monitoring APPL DB table. (sonic-net#2589)
* added support for monitoring, primary and adv_prefix. changed filter_mac to overlay_dmac
* Data Structures and code to write APP_DB VNET_MONITOR table entries for custom monitoring of Vxlan tunnel endpoints.
* [bfdorch] add local discriminator to state DB (sonic-net#2629)
* [acl] Add new ACL key BTH_OPCODE and AETH_SYNDROME  (sonic-net#2617)
* [voq][chassis] Remove created ports from the default vlan. (sonic-net#2607)
* [EVPN]Handling race condition when remote VNI arrives before tunnel map entry (sonic-net#2642)
*Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point.
* [test_mux] add sleep in test_NH (sonic-net#2648)
* [autoneg]Fixing adv interface types to be set when AN is disabled (sonic-net#2638)
* [hash]: Add UT infra. (sonic-net#2660)
*Added UT infra for Generic Hash feature
*Aligned PBH tests with Generic Hash UT infra
* [sai_failure_dump]Invoking dump during SAI failure (sonic-net#2644)
* [ResponsePublisher] add pipeline support  (sonic-net#2511)
* [dash] Fix compilation issue caused by missing include.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants