Xcvrd crash and restart should not cause link flap on platforms needing custom NPU SI settings #541
…ng custom SI settings Signed-off-by: Mihir Patel <patelmi@microsoft.com>
@@ -2437,6 +2439,30 @@ def wait_for_port_config_done(self, namespace):
if key in ["PortConfigDone", "PortInitDone"]:
break

"""
Initialize NPU_SI_SETTINGS_SYNC_STATUS_KEY field in STATE_DB PORT_TABLE|<lport>
@mihirpat1 where do we delete the key when optics is removed?
@prgeor We don't delete the key when optics is removed. We instead mark it as NPU_SI_SETTINGS_DEFAULT.
This change is at lines 2163-2164 in xcvrd.py.
@bingwang-ms Can you please help cherry-pick this PR to the 202405 branch?
…ng custom NPU SI settings (sonic-net#541) * Xcvrd crash and restart should not cause link flap on platforms needing custom SI settings Signed-off-by: Mihir Patel <patelmi@microsoft.com> * Improved code coverage --------- Signed-off-by: Mihir Patel <patelmi@microsoft.com>
Cherry-pick PR to 202405: #547
…ng custom NPU SI settings (#541) * Xcvrd crash and restart should not cause link flap on platforms needing custom SI settings Signed-off-by: Mihir Patel <patelmi@microsoft.com> * Improved code coverage --------- Signed-off-by: Mihir Patel <patelmi@microsoft.com>
Description
Ensure that an xcvrd crash and restart is handled gracefully on platforms that need custom SI settings.
The goal is to avoid link flap on platforms that need custom SI settings during xcvrd restart.
Motivation and Context
Changeset 1
This changeset is a partial implementation of the changes required for sonic-net/SONiC#1432.
Specifically, the current change adds the NPU_SI_SETTINGS_SYNC_STATUS key to the PORT_TABLE of STATE_DB. Following is the usage of the NPU_SI_SETTINGS_SYNC_STATUS key.
Please note that the above table needs to be further modified while implementing the complete OA crash HLD. The table assumes that NPU SI settings are always applied from the SfpStateUpdateTask thread (unlike the HLD, wherein CMIS transceivers have the NPU SI settings sent from CmisManagerTask).
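The initialization described in the diff docstring, together with the review reply about optics removal, can be sketched roughly as follows. This is a minimal illustration only: the dictionary stands in for STATE_DB (it is not the swsscommon API), the function names are hypothetical, the value NPU_SI_SETTINGS_DONE is an assumed example of a non-default status, and preserving an existing value on restart is an assumption about the intended behavior.

```python
# Hypothetical in-memory stand-in for STATE_DB PORT_TABLE; not the real DB API.
NPU_SI_SETTINGS_SYNC_STATUS_KEY = "NPU_SI_SETTINGS_SYNC_STATUS"
NPU_SI_SETTINGS_DEFAULT = "NPU_SI_SETTINGS_DEFAULT"


def init_sync_status(port_table, logical_ports):
    """Add the sync-status field for each port, keeping any existing value.

    Assumption: a value that survived an xcvrd restart must not be reset,
    otherwise the restart would look like a fresh start for that port.
    """
    for lport in logical_ports:
        fields = port_table.setdefault(lport, {})
        fields.setdefault(NPU_SI_SETTINGS_SYNC_STATUS_KEY, NPU_SI_SETTINGS_DEFAULT)


def on_optics_removed(port_table, lport):
    """On removal the field is not deleted; it is marked as default again."""
    port_table.setdefault(lport, {})[NPU_SI_SETTINGS_SYNC_STATUS_KEY] = NPU_SI_SETTINGS_DEFAULT


# Ethernet0 carries an assumed non-default status left over from before a restart.
port_table = {"Ethernet0": {NPU_SI_SETTINGS_SYNC_STATUS_KEY: "NPU_SI_SETTINGS_DONE"}}
init_sync_status(port_table, ["Ethernet0", "Ethernet4"])
status_after_init = {p: f[NPU_SI_SETTINGS_SYNC_STATUS_KEY] for p, f in port_table.items()}

on_optics_removed(port_table, "Ethernet0")
status_after_removal = port_table["Ethernet0"][NPU_SI_SETTINGS_SYNC_STATUS_KEY]
```

In this sketch a port that already has a status keeps it across initialization, while a port seen for the first time, or one whose optics was removed, ends up at NPU_SI_SETTINGS_DEFAULT.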
Changeset 2
In addition to the above change, the TRANSCEIVER_INFO table will not be deleted as part of xcvrd deinit.
On some platforms, the TRANSCEIVER_INFO table is used to detect transceiver presence in order to handle host_tx_ready behavior. Deleting the TRANSCEIVER_INFO table would therefore cause host_tx_ready to change to false during xcvrd shutdown.
Impact due to skipping TRANSCEIVER_INFO table deletion
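The deinit behavior from Changeset 2 might look something like the sketch below. It uses a plain dictionary in place of STATE_DB, the function names are hypothetical, and the list of tables that are still cleared is illustrative rather than taken from xcvrd.py.

```python
# Illustrative only: tables still cleared at deinit in this sketch.
# TRANSCEIVER_INFO is intentionally absent from this list.
TABLES_CLEARED_ON_DEINIT = ("TRANSCEIVER_DOM_SENSOR", "TRANSCEIVER_STATUS")


def deinit_port(state_db, lport):
    """Remove per-port entries at shutdown, leaving TRANSCEIVER_INFO in place."""
    for table in TABLES_CLEARED_ON_DEINIT:
        state_db.get(table, {}).pop(lport, None)


def transceiver_present(state_db, lport):
    """Presence check as a platform might derive it from TRANSCEIVER_INFO."""
    return lport in state_db.get("TRANSCEIVER_INFO", {})


state_db = {
    "TRANSCEIVER_INFO": {"Ethernet0": {"type": "QSFP-DD"}},
    "TRANSCEIVER_STATUS": {"Ethernet0": {"status": "1"}},
}
deinit_port(state_db, "Ethernet0")
present_after_deinit = transceiver_present(state_db, "Ethernet0")
```

Because the presence check still succeeds after deinit, a consumer that derives host_tx_ready from it would see no change while xcvrd is down.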
Changeset 3
PHYSICAL_PORT_NOT_EXIST has been defined in media_settings_parser.py to prevent a crash due to an undefined value.
When media_settings_parser.py was created, the line below was moved from xcvrd.py. However, the definition of PHYSICAL_PORT_NOT_EXIST was never ported, so it is added now.
sonic-platform-daemons/sonic-xcvrd/xcvrd/xcvrd_utilities/media_settings_parser.py
Line 280 in 8c89f6b
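The failure mode behind Changeset 3 is ordinary Python behavior: a module-level name that is referenced but never defined only fails when the referencing line actually runs. The sketch below illustrates this with hypothetical function names; the value -1 for PHYSICAL_PORT_NOT_EXIST is an assumption, and the code is not the referenced line 280.

```python
def lookup_without_definition(physical_port):
    # MISSING_SENTINEL is deliberately never defined anywhere in this module,
    # so this line raises NameError only when the function is called.
    return physical_port == MISSING_SENTINEL  # noqa: F821


# Defining the constant at module level removes the latent crash.
PHYSICAL_PORT_NOT_EXIST = -1  # assumed value, for illustration only


def port_exists(physical_port):
    return physical_port != PHYSICAL_PORT_NOT_EXIST


try:
    lookup_without_definition(1)
    raised_name_error = False
except NameError:
    raised_name_error = True
```

This also suggests why such a gap can go unnoticed: importing the module succeeds, and the error appears only on the code path that touches the missing name.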
How Has This Been Tested?
On platforms which do not restart xcvrd upon swss/syncd crash/restart, the following behavior is observed:
The first xcvrd crash after swss/syncd restarts will cause a link flap for ports which require NPU SI settings.
This is expected because, as part of swss crash handling, the entire PORT_TABLE of STATE_DB is deleted.
Since xcvrd is not restarted as part of swss/syncd crash handling, the NPU_SI_SETTINGS_SYNC_STATUS field is never repopulated. Hence, when xcvrd crashes/restarts for the first time after the swss/syncd crash, xcvrd updates the NPU SI settings in APPL_DB, which in turn forces the port through a link flap while OA configures the NPU SI settings.
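The sequence above can be reproduced with a small simulation. This is a simplified sketch under stated assumptions: the dictionary stands in for STATE_DB, the function names are hypothetical, NPU_SI_SETTINGS_DONE is an assumed terminal status, and the handshake with OA is collapsed into a single step.

```python
KEY = "NPU_SI_SETTINGS_SYNC_STATUS"
DEFAULT = "NPU_SI_SETTINGS_DEFAULT"
DONE = "NPU_SI_SETTINGS_DONE"  # assumed name for the "already applied" status


def swss_restart(state_db):
    """swss crash handling deletes the entire PORT_TABLE of STATE_DB."""
    state_db["PORT_TABLE"] = {}


def xcvrd_start(state_db, logical_ports):
    """Return the ports whose SI settings get pushed (each push implies a link flap)."""
    pushed = []
    for lport in logical_ports:
        fields = state_db["PORT_TABLE"].setdefault(lport, {})
        if fields.get(KEY, DEFAULT) != DONE:
            pushed.append(lport)   # stands in for writing SI settings to APPL_DB
            fields[KEY] = DONE     # simplified: OA acknowledgement not modelled
    return pushed


state_db = {"PORT_TABLE": {}}
ports = ["Ethernet0"]

first_boot = xcvrd_start(state_db, ports)         # settings applied once
plain_restart = xcvrd_start(state_db, ports)      # status survived, nothing pushed
swss_restart(state_db)                            # xcvrd keeps running, field is lost
restart_after_swss = xcvrd_start(state_db, ports) # settings pushed again
```

Here a plain xcvrd restart pushes nothing, while the first xcvrd restart after the PORT_TABLE was wiped pushes the settings again, which corresponds to the expected link flap.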
Test summary
Additional Information (Optional)
MSFT ADO - 29278409