From f4533cded1e0152d2baa18cc9e2a0406b1bf9f89 Mon Sep 17 00:00:00 2001 From: Mihir Patel Date: Tue, 25 Jul 2023 06:53:14 +0000 Subject: [PATCH 01/16] OA crash handling to reinitialize port through xcvrd --- .../Interface-Link-bring-up-sequence.md | 231 +++++++++++++++++- 1 file changed, 230 insertions(+), 1 deletion(-) diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md index 9a993ad6cc..7beb7cebb4 100644 --- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md +++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md @@ -17,6 +17,7 @@ Deterministic Approach for Interface Link bring-up sequence * [Pre-requisite](#pre-requisite) * [Breakout handling](#breakout-handling) * [Proposed Work-Flows](#proposed-work-flows) + * [Port reinitialization during syncd/swss/orchagent crash](#port-reinitialization-during-syncdswssorchagent-crash) # List of Tables * [Table 1: Definitions](#table-1-definitions) @@ -184,7 +185,235 @@ if transceiver is not present: - All the workflows mentioned above will reamin same ( or get exercised) till host_tx_ready field update - xcvrd will not perform any action on receiving host_tx_ready field update - +# Port reinitialization during syncd/swss/orchagent crash +## Overview: + +When syncd/swss/orchagent crashes, all ports in the corresponding namespace will be reinitialized by xcvrd irrespective of its current state. +If just xcvrd crashes and restarts, then forced reinitialization (CMIS reinit + media settings notify) of port will not be performed. +Following infra will ensure port reinitialization by xcvrd in case of syncd/swss/orchagent crash: + +1. XCVRD main thread init + - XCVRD main thread creates the key CMIS_REINIT_REQUIRED in PORT_TABLE:\ (APPL_DB) with value as true for ports which do NOT have this key present + - XCVRD main thread creates the key MEDIA_SETTINGS_SYNC_STATUS in PORT_TABLE:\ (APPL_DB) with value MEDIA_SETTINGS_DEFAULT for ports which do NOT have this key present. 
+ Following are the possible values for MEDIA_SETTINGS_SYNC_STATUS + - MEDIA_SETTINGS_DEFAULT - xcvrd main thread creates this after cold start and sets to this after transceiver removal + - MEDIA_SETTINGS_NOTIFIED - SfpStateUpdateTask sets this during boot-up and transceiver insertion + - MEDIA_SETTINGS_DONE - OA sets this after applying SI settings + +2. SfpStateUpdateTask thread will notify the media settings to OA based on the value of PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS +If PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE, media settings sync will be invoked and will be set to MEDIA_SETTINGS_NOTIFIED for a port supporting media settings. +3. The OA upon receiving media settings will + - Disable port admin status + - Apply SI settings + - PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DONE +4. In the CMIS_STATE_INSERTED state, if 'admin_status' is up and 'host_tx_ready' is true, CmisManagerTask thread will check if + - the port supports media settings (will be checked using g_dict and finding valid SI values) and + - MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE +If all the above conditions are true, CMIS SM transitions to CMIS_STATE_MEDIA_SETTINGS_WAIT state. +If port doesn't require media settings to be applied, CMIS SM will proceed with normal code flow (transitions to CMIS_STATE_DP_DEINIT) +Overall, no functionality change related to CMIS SM transitions is intended for ports not supporting media settings +5. CMIS_STATE_MEDIA_SETTINGS_WAIT state will wait for MEDIA_SETTINGS_DONE and upon reaching to MEDIA_SETTINGS_DONE, CMIS SM will transition to CMIS_STATE_DP_DEINIT. +There will be a timeout of 5s for every retry +6. The CmisManagerTask thread will set "CMIS_REINIT_REQUIRED" to false after the CMIS SM reaches a steady state (CMIS_STATE_UNKNOWN, CMIS_STATE_FAILED, CMIS_STATE_READY or CMIS_STATE_REMOVED) for the corresponding port +7. 
XCVRD will subscribe to PORT_TABLE in APPL_DB and trigger self-restart if the PORT_TABLE is deleted for the namespace. +All threads will be gracefully terminated and xcvrd deinit will be performed followed by issuing a SIGABRT to ensure XCVRD is restarted automatically by supervisord. After respawn, CMIS re-init and media_settings notified is triggered for the ports belonging to the affected namespace +8. syncd/swss/orchagent restart clears the entire APPL-DB, including “MEDIA_SETTINGS_SYNC_STATUS” and "CMIS_REINIT_REQUIRED" in PORT_TABLE + +## XCVRD init sequence to support port reinitialization during syncd/swss/orchagent crash + +```mermaid +sequenceDiagram + participant APPL_DB + participant XCVRDMT as XCVRD main thread + participant CmisManagerTask + participant SfpStateUpdateTask + participant DomInfoUpdateTask + + Note over XCVRDMT: Load new platform specific api class,
sfputil class and load namespace details + XCVRDMT ->> XCVRDMT: Wait for port config completion + loop lport in logical_port_list + alt if CMIS_REINIT_REQUIRED not in PORT_TABLE: + XCVRDMT ->> APPL_DB: PORT_TABLE:.CMIS_REINIT_REQUIRED = true + end + alt if MEDIA_SETTINGS_SYNC_STATUS not in PORT_TABLE: + XCVRDMT ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DEFAULT + end + end + Note over APPL_DB: PORT_TABLE:
CMIS_REINIT_REQUIRED : true/false
MEDIA_SETTINGS_SYNC_STATUS : MEDIA_SETTINGS_DEFAULT/MEDIA_SETTINGS_NOTIFIED/MEDIA_SETTINGS_DONE + XCVRDMT ->> CmisManagerTask: Spawns + XCVRDMT ->> DomInfoUpdateTask: Spawns + XCVRDMT ->> SfpStateUpdateTask: Spawns + par XCVRDMT, CmisManagerTask, SfpStateUpdateTask, DomInfoUpdateTask + loop Wait for stop_event else poll every 60s + DomInfoUpdateTask->>DomInfoUpdateTask: Update TRANSCEIVER_DOM_SENSOR,
TRANSCEIVER_STATUS (HW section)
TRANSCEIVER_PM tables + end + loop Wait for stop_event + XCVRDMT->>XCVRDMT: Check for changes in PORT_TABLE and act upon receiving DEL event + end + Note over CmisManagerTask: Subscribe to CONFIG_DB:PORT,
STATE_DB:TRANSCEIVER_INFO and STATE_DB:PORT_TABLE + loop Wait for stop_event + Note over CmisManagerTask: Start the CMIS SM and act based on subscribed DB related changes + end + Note over SfpStateUpdateTask: _post_port_sfp_info_and_dom_thr_to_db_once
_init_port_sfp_status_tbl
Subscribe to CONFIG_DB:PORT + loop Wait for stop_event + SfpStateUpdateTask ->> SfpStateUpdateTask: Handle config change event
retry_eeprom_reading()
_wrapper_get_transceiver_change_event + end + end +``` + +## SfpStateUpdateTask's role to notify media settings to OA + +```mermaid +sequenceDiagram + participant OA + participant APPL_DB + participant SfpStateUpdateTask + + Note over SfpStateUpdateTask: Subscribe to CONFIG_DB:PORT,
STATE_DB:TRANSCEIVER_INFO and STATE_DB:PORT_TABLE + Note over SfpStateUpdateTask: _post_port_sfp_info_and_dom_thr_to_db_once + loop lport in logical_port_list + alt post_port_sfp_info_to_db != SFP_EEPROM_NOT_READY + Note over SfpStateUpdateTask: post_port_dom_threshold_info_to_db + opt PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE + opt if lport supports media settings + SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_NOTIFIED + APPL_DB -->> OA: Notify media settings for ports + Note over OA: Disable admin status
setPortSerdesAttribute + OA ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DONE + Note over OA: initHostTxReadyState + end + end + else + Note over SfpStateUpdateTask: retry_eeprom_set.add(lport) + end + end + Note over SfpStateUpdateTask: _init_port_sfp_status_tbl
Subscribe to CONFIG_DB + loop Wait for stop_event + SfpStateUpdateTask ->> SfpStateUpdateTask: Handle config change event
retry_eeprom_reading()
_wrapper_get_transceiver_change_event + end +``` + +## CMIS State machine with CMIS_STATE_MEDIA_SETTINGS_WAIT state + +The below state machine is a high level flow and doesn't capture details for states other than CMIS_STATE_MEDIA_SETTINGS_WAIT + +```mermaid +stateDiagram + [*] --> CMIS_STATE_INSERTED + state if_state <> + state if_state2 <> + CMIS_STATE_INSERTED --> if_state + if_state --> CMIS_STATE_READY : if host_tx_ready != True or
admin_status != up
Action - disable TX + if_state --> if_state2 : if host_tx_ready == True and
admin_status == up + if_state2 --> CMIS_STATE_DP_DEINIT : if PORT_TABLE.port.CMIS_REINIT_REQUIRED == true or
is_cmis_application_update_required + if_state2 --> CMIS_STATE_MEDIA_SETTINGS_WAIT : if is_media_settings_supported and
MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE + note left of CMIS_STATE_READY : PORT_TABLE.port.CMIS_REINIT_REQUIRED = false + if_state2 --> CMIS_STATE_FAILED : if appl < 1 or
host_lanes_mask <= 0 or
media_lanes_mask <= 0 + note left of CMIS_STATE_FAILED : PORT_TABLE.port.CMIS_REINIT_REQUIRED = false + + CMIS_STATE_MEDIA_SETTINGS_WAIT --> CMIS_STATE_DP_DEINIT : if PORT_TABLE<port>.MEDIA_SETTINGS_SYNC_STATUS == MEDIA_SETTINGS_DONE + CMIS_STATE_MEDIA_SETTINGS_WAIT --> CMIS_STATE_INSERTED : Through force_cmis_reinit upon reaching timeout + note right of CMIS_STATE_MEDIA_SETTINGS_WAIT + Checks if PORT_TABLE<port>.MEDIA_SETTINGS_SYNC_STATUS == MEDIA_SETTINGS_DONE + After 5s timeout, force_cmis_reinit will be called + end note + + CMIS_STATE_DP_DEINIT --> CMIS_STATE_AP_CONF + CMIS_STATE_AP_CONF --> CMIS_STATE_DP_INIT + CMIS_STATE_DP_INIT --> CMIS_STATE_DP_TXON + CMIS_STATE_DP_TXON --> CMIS_STATE_DP_ACTIVATE + CMIS_STATE_DP_ACTIVATE --> CMIS_STATE_READY +``` + +## Transceiver OIR handling + +```mermaid +sequenceDiagram + participant STATE_DB + participant OA + participant APPL_DB + participant CmisManagerTask + participant SfpStateUpdateTask + + SfpStateUpdateTask ->> SfpStateUpdateTask : event = SFP_STATUS_REMOVED + SfpStateUpdateTask -x STATE_DB : Delete TRANSCEIVER_INFO table for the port + par CmisManagerTask, SfpStateUpdateTask + CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_REMOVED + SfpStateUpdateTask ->> APPL_DB : PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS =
MEDIA_SETTINGS_DEFAULT + end + + SfpStateUpdateTask ->> SfpStateUpdateTask : event = SFP_STATUS_INSERTED + SfpStateUpdateTask ->> STATE_DB : Create TRANSCEIVER_INFO table for the port + par CmisManagerTask, SfpStateUpdateTask + CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_INSERTED + SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS =
MEDIA_SETTINGS_NOTIFIED + activate OA + SfpStateUpdateTask ->> OA: Notify media settings for ports + Note over OA: Disable admin status
setPortSerdesAttribute + OA ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DONE + Note over OA: initHostTxReadyState + deactivate OA + end +``` + +## XCVRD termination during syncd/swss/orchagent crash + +The below sequence diagram captures the termination of XCVRD during syncd/swss/orchagent crash. +
supervisord will respawn XCVRD after termination as xcvrd is killed using SIGABRT signal + +```mermaid +sequenceDiagram + participant OA + participant APPL_DB + participant XCVRDMT as XCVRD main thread + participant CmisManagerTask + participant DomInfoUpdateTask + participant SfpStateUpdateTask + + activate OA + activate XCVRDMT + activate CmisManagerTask + activate DomInfoUpdateTask + activate SfpStateUpdateTask + OA -x OA: Crashes while handling a routine + deactivate OA + OA ->> APPL_DB : DEL PORT_TABLE + + XCVRDMT -x APPL_DB : XCVRD main thread processes DEL event of APPL_DB PORT_TABLE + Note over XCVRDMT: generate_sigabrt = True + alt If one or more threads are dead + XCVRDMT -x XCVRDMT : Kill XCVRD with SIGKILL + end + XCVRDMT -x CmisManagerTask : Stop CmisManagerTask + deactivate CmisManagerTask + XCVRDMT -x DomInfoUpdateTask : Stop DomInfoUpdateTask + deactivate DomInfoUpdateTask + XCVRDMT -x SfpStateUpdateTask : Stop SfpStateUpdateTask + deactivate SfpStateUpdateTask + Note over XCVRDMT : deinit() + alt self.sfp_error_event.is_set() + XCVRDMT -x XCVRDMT : sys.exit(SFP_SYSTEM_ERROR) + else if generate_sigabrt is True + XCVRDMT -x XCVRDMT : Kill XCVRD with SIGABRT + + else + XCVRDMT -x XCVRDMT : Graceful exit + end + deactivate XCVRDMT +``` + +## Test plan and expectation +| Event | APPL_DB cleared | Xcvrd restarted | Media renotify | MEDIA_SETTINGS_SYNC_DONE value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap | +|:----------------:|:---------------:|:---------------:|:--------------:|:-----------------------------------------------------------------------------:|:----------------------:|:---------:| +| Xcvrd restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | +| Pmon restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | +| Swss restart | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | Y | +| Syncd restart | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | Y | +| config reload | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | Y | +| Cold reboot | Y | Y 
| Y | MEDIA_SETTINGS_DEFAULT | Y | Y | +| Config shut | N | N | N | MEDIA_SETTINGS_DONE | N | Y | +| Config no shut | N | N | N | MEDIA_SETTINGS_DONE | N | Y | +| Warm reboot | N | Y | N | MEDIA_SETTINGS_DONE | N | N | # Out of Scope Following items are not in the scope of this document. They would be taken up separately 1. xcvrd restart From de1c2766b34573f77f50b48187cc0f25773e6d2b Mon Sep 17 00:00:00 2001 From: Mihir Patel Date: Tue, 25 Jul 2023 21:20:15 +0000 Subject: [PATCH 02/16] Added table to list values for MEDIA_SETTINGS_SYNC_STATUS --- .../Interface-Link-bring-up-sequence.md | 34 +++++++++---------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md index 7beb7cebb4..3bf29ba505 100644 --- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md +++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md @@ -186,22 +186,28 @@ if transceiver is not present: - xcvrd will not perform any action on receiving host_tx_ready field update # Port reinitialization during syncd/swss/orchagent crash -## Overview: +## Overview -When syncd/swss/orchagent crashes, all ports in the corresponding namespace will be reinitialized by xcvrd irrespective of its current state. +When syncd/swss/orchagent crashes, all ports in the corresponding namespace will be reinitialized by xcvrd irrespective of the current state of the port. If just xcvrd crashes and restarts, then forced reinitialization (CMIS reinit + media settings notify) of port will not be performed. Following infra will ensure port reinitialization by xcvrd in case of syncd/swss/orchagent crash: 1. 
XCVRD main thread init - XCVRD main thread creates the key CMIS_REINIT_REQUIRED in PORT_TABLE:\ (APPL_DB) with value as true for ports which do NOT have this key present - XCVRD main thread creates the key MEDIA_SETTINGS_SYNC_STATUS in PORT_TABLE:\ (APPL_DB) with value MEDIA_SETTINGS_DEFAULT for ports which do NOT have this key present. - Following are the possible values for MEDIA_SETTINGS_SYNC_STATUS - - MEDIA_SETTINGS_DEFAULT - xcvrd main thread creates this after cold start and sets to this after transceiver removal - - MEDIA_SETTINGS_NOTIFIED - SfpStateUpdateTask sets this during boot-up and transceiver insertion - - MEDIA_SETTINGS_DONE - OA sets this after applying SI settings + - For transceivers which do not support media settings, MEDIA_SETTINGS_SYNC_STATUS will stay with value MEDIA_SETTINGS_DEFAULT + + Following table describes the various values for MEDIA_SETTINGS_SYNC_STATUS + +| Value | Modifier thread and event | Consumer thread and purpose | +|:-----------------------:|:------------------------------------------------------:|:--------------------------------------------------------------------------------------------:| +| MEDIA_SETTINGS_DEFAULT | XCVRD main thread during cold start of xcvrd | XCVRD main thread during boot-up for deciding to notify media settings | +| | SfpStateUpdateTask during transceiver removal | | +| MEDIA_SETTINGS_NOTIFIED | SfpStateUpdateTask while updating the media settings | Not being used currently | +| MEDIA_SETTINGS_DONE | Orchagent after applying the SI settings | CmisManagerTask for proceeding to CMIS_STATE_DP_DEINIT from CMIS_STATE_MEDIA_SETTINGS_WAIT | 2. SfpStateUpdateTask thread will notify the media settings to OA based on the value of PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS -If PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE, media settings sync will be invoked and will be set to MEDIA_SETTINGS_NOTIFIED for a port supporting media settings. 
+If PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE, notify media settings will be invoked and will be set to MEDIA_SETTINGS_NOTIFIED for a port supporting media settings. 3. The OA upon receiving media settings will - Disable port admin status - Apply SI settings @@ -270,7 +276,7 @@ sequenceDiagram participant SfpStateUpdateTask Note over SfpStateUpdateTask: Subscribe to CONFIG_DB:PORT,
STATE_DB:TRANSCEIVER_INFO and STATE_DB:PORT_TABLE - Note over SfpStateUpdateTask: _post_port_sfp_info_and_dom_thr_to_db_once + Note over SfpStateUpdateTask: Following loop represents _post_port_sfp_info_and_dom_thr_to_db_once loop lport in logical_port_list alt post_port_sfp_info_to_db != SFP_EEPROM_NOT_READY Note over SfpStateUpdateTask: post_port_dom_threshold_info_to_db @@ -348,7 +354,7 @@ sequenceDiagram CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_INSERTED SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS =
MEDIA_SETTINGS_NOTIFIED activate OA - SfpStateUpdateTask ->> OA: Notify media settings for ports + APPL_DB -->> OA: Notify media settings for ports Note over OA: Disable admin status
setPortSerdesAttribute OA ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DONE Note over OA: initHostTxReadyState @@ -416,15 +422,9 @@ sequenceDiagram | Warm reboot | N | Y | N | MEDIA_SETTINGS_DONE | N | N | # Out of Scope Following items are not in the scope of this document. They would be taken up separately -1. xcvrd restart - - If the xcvrd goes for restart, then all the DB events will be replayed. - Here the Datapath init/activate for CMIS compliant optical modules, tx-disable register set (for SFF complaint optical modules), will be a no-op if the optics is already in that state -2. syncd/gbsyncd/swss docker container restart - - Cleanup scenario - Check if the host_tx_ready field in STATE-DB need to be updated to “False” for any use-case, either in going down or coming up path - - Discuss further on the possible use-cases -3. CMIS API feature is not part of this design and the APIs will be used in this design. For CMIS HLD, Please refer to: +1. CMIS API feature is not part of this design and the APIs will be used in this design. For CMIS HLD, Please refer to: https://github.com/sonic-net/SONiC/blob/9d480087243fd1158e785e3c2f4d35b73c6d1317/doc/sfp-cmis/cmis-init.md -4. Error handling of SAI attributes +2. Error handling of SAI attributes a) At present, If there is a set attribute failure, orch agent will exit. Refer the error handling API : https://github.com/sonic-net/sonic-swss/blob/master/orchagent/orch.cpp#L885 b) Error handling for SET_ADMIN_STATUS attribute will be added in future. 
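The key handling introduced by this change can be sketched as follows. This is only a minimal illustration, assuming a plain Python dict as a stand-in for the APPL_DB PORT_TABLE; `init_port_keys` and `needs_media_notify` are hypothetical helper names chosen for this sketch and are not taken from xcvrd.

```python
# Hypothetical stand-in for APPL_DB PORT_TABLE: {port_name: {field: value}}.
# The real xcvrd talks to Redis; a dict is assumed here for illustration only.
MEDIA_SETTINGS_DEFAULT = "MEDIA_SETTINGS_DEFAULT"
MEDIA_SETTINGS_NOTIFIED = "MEDIA_SETTINGS_NOTIFIED"
MEDIA_SETTINGS_DONE = "MEDIA_SETTINGS_DONE"


def init_port_keys(port_table, logical_ports):
    """Main-thread init: create both keys only for ports that lack them."""
    for lport in logical_ports:
        entry = port_table.setdefault(lport, {})
        # setdefault leaves an existing value untouched, so an xcvrd-only
        # restart does not force reinitialization.
        entry.setdefault("CMIS_REINIT_REQUIRED", "true")
        entry.setdefault("MEDIA_SETTINGS_SYNC_STATUS", MEDIA_SETTINGS_DEFAULT)


def needs_media_notify(port_table, lport, requires_media_settings):
    """Notify media settings unless OA already reported MEDIA_SETTINGS_DONE."""
    status = port_table[lport]["MEDIA_SETTINGS_SYNC_STATUS"]
    return requires_media_settings and status != MEDIA_SETTINGS_DONE


if __name__ == "__main__":
    table = {}
    init_port_keys(table, ["Ethernet0"])
    print(table["Ethernet0"], needs_media_notify(table, "Ethernet0", True))
```

Because the keys are written only when absent, clearing the table (as a syncd/swss/orchagent restart does) is what causes the next xcvrd start to reinitialize the ports, while a restart of xcvrd alone keeps the existing values.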
From b5713ecb4456c78dc72205a1410e561838982690 Mon Sep 17 00:00:00 2001 From: Mihir Patel Date: Tue, 25 Jul 2023 22:11:16 +0000 Subject: [PATCH 03/16] Fixed typo --- doc/sfp-cmis/Interface-Link-bring-up-sequence.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md index 3bf29ba505..4e82a35bcc 100644 --- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md +++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md @@ -409,7 +409,7 @@ sequenceDiagram ``` ## Test plan and expectation -| Event | APPL_DB cleared | Xcvrd restarted | Media renotify | MEDIA_SETTINGS_SYNC_DONE value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap | +| Event | APPL_DB cleared | Xcvrd restarted | Media renotify | MEDIA_SETTINGS_SYNC_STATUS value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap | |:----------------:|:---------------:|:---------------:|:--------------:|:-----------------------------------------------------------------------------:|:----------------------:|:---------:| | Xcvrd restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | | Pmon restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | From 2fc0904e6678856c9ab18ec79e6c35a4964a89b9 Mon Sep 17 00:00:00 2001 From: Mihir Patel Date: Fri, 28 Jul 2023 15:06:22 +0000 Subject: [PATCH 04/16] Addressed PR comment --- doc/sfp-cmis/Interface-Link-bring-up-sequence.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md index 4e82a35bcc..45f873c705 100644 --- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md +++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md @@ -195,7 +195,7 @@ Following infra will ensure port reinitialization by xcvrd in case of syncd/swss 1. 
XCVRD main thread init - XCVRD main thread creates the key CMIS_REINIT_REQUIRED in PORT_TABLE:\ (APPL_DB) with value as true for ports which do NOT have this key present - XCVRD main thread creates the key MEDIA_SETTINGS_SYNC_STATUS in PORT_TABLE:\ (APPL_DB) with value MEDIA_SETTINGS_DEFAULT for ports which do NOT have this key present. - - For transceivers which do not support media settings, MEDIA_SETTINGS_SYNC_STATUS will stay with value MEDIA_SETTINGS_DEFAULT + - For transceivers which do not require media settings, MEDIA_SETTINGS_SYNC_STATUS will stay with value MEDIA_SETTINGS_DEFAULT Following table describes the various values for MEDIA_SETTINGS_SYNC_STATUS From 04f841733ada64f63da72b3be7bdd1c8edeabb7c Mon Sep 17 00:00:00 2001 From: Mihir Patel Date: Tue, 1 Aug 2023 18:03:33 +0000 Subject: [PATCH 05/16] Addressed PR comments and separated media notification between SfpStateUpdateTask and CmisManagerTask threads --- .../Interface-Link-bring-up-sequence.md | 111 ++++++++++-------- 1 file changed, 65 insertions(+), 46 deletions(-) diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md index 45f873c705..b01745ba9e 100644 --- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md +++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md @@ -53,6 +53,8 @@ Interface link bring-up sequence and workflows for use-cases around it | gbsyncd | Gearbox (External PHY) docker container | | DPInit | Data-Path Initialization | | QSFP-DD | QSFP-Double Density (i.e. 400G) optical module | +| OIR | Online Insertion and Removal | +| SM | State Machine | # References @@ -189,8 +191,12 @@ if transceiver is not present: ## Overview When syncd/swss/orchagent crashes, all ports in the corresponding namespace will be reinitialized by xcvrd irrespective of the current state of the port. -If just xcvrd crashes and restarts, then forced reinitialization (CMIS reinit + media settings notify) of port will not be performed. 
-Following infra will ensure port reinitialization by xcvrd in case of syncd/swss/orchagent crash: +If just xcvrd crashes and restarts, then forced reinitialization (CMIS reinit + media settings notify) of ports will not be performed. +The CMIS_REINIT_REQUIRED and MEDIA_SETTINGS_SYNC_STATUS keys in APPL_DB PORT_TABLE:\ are used to determine whether port reinitialization is required. + - The CMIS_REINIT_REQUIRED key states whether CMIS reinitialization is required for a port after xcvrd is spawned. It mainly drives CMIS reinitialization after a syncd/swss/orchagent crash, since it allows reinitializing the ports that belong to the namespace of the crashed process. This key is not planned to drive CMIS initialization after transceiver insertion. + - The MEDIA_SETTINGS_SYNC_STATUS key communicates the status of applying media settings for a transceiver requiring media settings. It is used to share the media settings application status between SfpStateUpdateTask, CmisManagerTask and Orchagent. In case of warm reboot or xcvrd restart, this key prevents media settings from being applied again on a port where they are already applied. In case of transceiver insertion, media settings will be applied irrespective of the media settings application status for the port. + +Following infra will ensure port reinitialization by xcvrd in case of syncd/swss/orchagent crash 1. 
XCVRD main thread init - XCVRD main thread creates the key CMIS_REINIT_REQUIRED in PORT_TABLE:\ (APPL_DB) with value as true for ports which do NOT have this key present @@ -198,44 +204,47 @@ Following infra will ensure port reinitialization by xcvrd in case of syncd/swss - For transceivers which do not require media settings, MEDIA_SETTINGS_SYNC_STATUS will stay with value MEDIA_SETTINGS_DEFAULT Following table describes the various values for MEDIA_SETTINGS_SYNC_STATUS +A transceiver is classified as CMIS SM driven transceiver if its module type is CMIS and it does not have flat memory + +| Value | Modifier thread and event | Consumer thread and purpose | +| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | +| MEDIA_SETTINGS_DEFAULT | 1\. XCVRD main thread during cold start of XCVRD
2. SfpStateUpdateTask during transceiver removal | XCVRD main thread during boot-up for deciding to notify media settings | +| MEDIA_SETTINGS_NOTIFIED | 1\. SfpStateUpdateTask while updating and notifying the media settings for non-CMIS SM driven transceivers
2. CmisManagerTask while updating and notifying the media settings for CMIS SM driven transceivers | Not being used currently | +| MEDIA_SETTINGS_DONE | Orchagent after applying the SI settings | CmisManagerTask for proceeding from CMIS_STATE_MEDIA_SETTINGS_WAIT to CMIS_STATE_AP_CONF state | -| Value | Modifier thread and event | Consumer thread and purpose | -|:-----------------------:|:------------------------------------------------------:|:--------------------------------------------------------------------------------------------:| -| MEDIA_SETTINGS_DEFAULT | XCVRD main thread during cold start of xcvrd | XCVRD main thread during boot-up for deciding to notify media settings | -| | SfpStateUpdateTask during transceiver removal | | -| MEDIA_SETTINGS_NOTIFIED | SfpStateUpdateTask while updating the media settings | Not being used currently | -| MEDIA_SETTINGS_DONE | Orchagent after applying the SI settings | CmisManagerTask for proceeding to CMIS_STATE_DP_DEINIT from CMIS_STATE_MEDIA_SETTINGS_WAIT | +2. Update and notify media settings to OA + For non-CMIS SM driven transceivers, SfpStateUpdateTask thread will update and notify the media settings to OA based on the value of PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS + If PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE, notify media settings will be invoked and will be set to MEDIA_SETTINGS_NOTIFIED for a port requiring media settings. + For CMIS SM driven transceivers, CmisManagerTask thread will update and notify the media settings to OA based on the value of PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS -2. SfpStateUpdateTask thread will notify the media settings to OA based on the value of PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS -If PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE, notify media settings will be invoked and will be set to MEDIA_SETTINGS_NOTIFIED for a port supporting media settings. 3. 
The OA upon receiving media settings will - Disable port admin status - - Apply SI settings - - PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DONE -4. In the CMIS_STATE_INSERTED state, if 'admin_status' is up and 'host_tx_ready' is true, CmisManagerTask thread will check if - - the port supports media settings (will be checked using g_dict and finding valid SI values) and - - MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE -If all the above conditions are true, CMIS SM transitions to CMIS_STATE_MEDIA_SETTINGS_WAIT state. -If port doesn't require media settings to be applied, CMIS SM will proceed with normal code flow (transitions to CMIS_STATE_DP_DEINIT) -Overall, no functionality change related to CMIS SM transitions is intended for ports not supporting media settings -5. CMIS_STATE_MEDIA_SETTINGS_WAIT state will wait for MEDIA_SETTINGS_DONE and upon reaching to MEDIA_SETTINGS_DONE, CMIS SM will transition to CMIS_STATE_DP_DEINIT. + - Request SAI-SDK to apply the media settings via syncd + - If SAI-SDK returns success, OA will update the PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS to MEDIA_SETTINGS_DONE + - In case of failure, OA will log an error message and proceed to handle the next port + +4. For transceivers requiring media settings, media settings will be updated and notified to OA in the CMIS_STATE_AP_PRE_CONF state. Then, CMIS SM will transition to CMIS_STATE_MEDIA_SETTINGS_WAIT state. + If the port doesn't require media settings to be applied, CMIS SM will transition to CMIS_STATE_AP_CONF state. + +5. The CMIS_STATE_MEDIA_SETTINGS_WAIT state waits for MEDIA_SETTINGS_SYNC_STATUS to become MEDIA_SETTINGS_DONE; once it does, CMIS SM will transition to CMIS_STATE_AP_CONF state. There will be a timeout of 5s for every retry 6. The CmisManagerTask thread will set "CMIS_REINIT_REQUIRED" to false after the CMIS SM reaches a steady state (CMIS_STATE_UNKNOWN, CMIS_STATE_FAILED, CMIS_STATE_READY or CMIS_STATE_REMOVED) for the corresponding port 7. 
XCVRD will subscribe to PORT_TABLE in APPL_DB and trigger self-restart if the PORT_TABLE is deleted for the namespace. All threads will be gracefully terminated and xcvrd deinit will be performed followed by issuing a SIGABRT to ensure XCVRD is restarted automatically by supervisord. After respawn, CMIS re-init and media_settings notified is triggered for the ports belonging to the affected namespace 8. syncd/swss/orchagent restart clears the entire APPL-DB, including “MEDIA_SETTINGS_SYNC_STATUS” and "CMIS_REINIT_REQUIRED" in PORT_TABLE +9. In case of warm reboot, the APPL_DB is not cleared and hence, once xcvrd is spawned after the reboot, the ports are not initialized again. ## XCVRD init sequence to support port reinitialization during syncd/swss/orchagent crash ```mermaid sequenceDiagram - participant APPL_DB + participant APPL_DB as APPL_DB@asic_n participant XCVRDMT as XCVRD main thread participant CmisManagerTask participant SfpStateUpdateTask participant DomInfoUpdateTask - Note over XCVRDMT: Load new platform specific api class,
sfputil class and load namespace details + Note over XCVRDMT: Load new platform specific api class,
sfputil class and load namespace details
load_media_settings() XCVRDMT ->> XCVRDMT: Wait for port config completion loop lport in logical_port_list alt if CMIS_REINIT_REQUIRED not in PORT_TABLE: @@ -254,7 +263,7 @@ sequenceDiagram DomInfoUpdateTask->>DomInfoUpdateTask: Update TRANSCEIVER_DOM_SENSOR,
TRANSCEIVER_STATUS (HW section)
TRANSCEIVER_PM tables end loop Wait for stop_event - XCVRDMT->>XCVRDMT: Check for changes in PORT_TABLE and act upon receiving DEL event + XCVRDMT->>XCVRDMT: Check for changes in APPL_DB:PORT_TABLE and act upon receiving DEL event end Note over CmisManagerTask: Subscribe to CONFIG_DB:PORT,
STATE_DB:TRANSCEIVER_INFO and STATE_DB:PORT_TABLE loop Wait for stop_event @@ -267,12 +276,12 @@ sequenceDiagram end ``` -## SfpStateUpdateTask's role to notify media settings to OA +## SfpStateUpdateTask's role to notify media settings to OA during xcvrd boot-up ```mermaid sequenceDiagram participant OA - participant APPL_DB + participant APPL_DB as APPL_DB@asic_n participant SfpStateUpdateTask Note over SfpStateUpdateTask: Subscribe to CONFIG_DB:PORT,
STATE_DB:TRANSCEIVER_INFO and STATE_DB:PORT_TABLE @@ -281,7 +290,8 @@ sequenceDiagram alt post_port_sfp_info_to_db != SFP_EEPROM_NOT_READY Note over SfpStateUpdateTask: post_port_dom_threshold_info_to_db opt PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE - opt if lport supports media settings + opt if lport requires media settings + SfpStateUpdateTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE: SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_NOTIFIED APPL_DB -->> OA: Notify media settings for ports Note over OA: Disable admin status
setPortSerdesAttribute @@ -299,33 +309,38 @@ sequenceDiagram end ``` -## CMIS State machine with CMIS_STATE_MEDIA_SETTINGS_WAIT state +## CMIS State machine with introduction of CMIS_STATE_AP_PRE_CONF and CMIS_STATE_MEDIA_SETTINGS_WAIT states -The below state machine is a high level flow and doesn't capture details for states other than CMIS_STATE_MEDIA_SETTINGS_WAIT +The below state machine is a high level flow and doesn't capture details for states other than CMIS_STATE_AP_PRE_CONF, CMIS_STATE_MEDIA_SETTINGS_WAIT and CMIS_STATE_AP_CONF ```mermaid stateDiagram [*] --> CMIS_STATE_INSERTED state if_state <> state if_state2 <> + state if_state3 <> CMIS_STATE_INSERTED --> if_state if_state --> CMIS_STATE_READY : if host_tx_ready != True or
admin_status != up
Action - disable TX if_state --> if_state2 : if host_tx_ready == True and
admin_status == up if_state2 --> CMIS_STATE_DP_DEINIT : if PORT_TABLE.port.CMIS_REINIT_REQUIRED == true or
is_cmis_application_update_required - if_state2 --> CMIS_STATE_MEDIA_SETTINGS_WAIT : if is_media_settings_supported and
MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE note left of CMIS_STATE_READY : PORT_TABLE.port.CMIS_REINIT_REQUIRED = false if_state2 --> CMIS_STATE_FAILED : if appl < 1 or
host_lanes_mask <= 0 or
media_lanes_mask <= 0 note left of CMIS_STATE_FAILED : PORT_TABLE.port.CMIS_REINIT_REQUIRED = false - CMIS_STATE_MEDIA_SETTINGS_WAIT --> CMIS_STATE_DP_DEINIT : if PORT_TABLE<port>.MEDIA_SETTINGS_SYNC_STATUS == MEDIA_SETTINGS_DONE + CMIS_STATE_DP_DEINIT --> CMIS_STATE_AP_PRE_CONF + + note left of CMIS_STATE_AP_PRE_CONF : Ensure current states are ModuleReady and DataPathDeactivated
Configure laser frequency for ZR module
Update host and media SI settings and notify to OA + CMIS_STATE_AP_PRE_CONF --> if_state3 + if_state3 --> CMIS_STATE_MEDIA_SETTINGS_WAIT : if module_requires_media_settings + if_state3 --> CMIS_STATE_AP_CONF : if not module_requires_media_settings + CMIS_STATE_MEDIA_SETTINGS_WAIT --> CMIS_STATE_AP_CONF : if PORT_TABLE<port>.MEDIA_SETTINGS_SYNC_STATUS == MEDIA_SETTINGS_DONE CMIS_STATE_MEDIA_SETTINGS_WAIT --> CMIS_STATE_INSERTED : Through force_cmis_reinit upon reaching timeout note right of CMIS_STATE_MEDIA_SETTINGS_WAIT Checks if PORT_TABLE<port>.MEDIA_SETTINGS_SYNC_STATUS == MEDIA_SETTINGS_DONE After 5s timeout, force_cmis_reinit will be called end note - - CMIS_STATE_DP_DEINIT --> CMIS_STATE_AP_CONF CMIS_STATE_AP_CONF --> CMIS_STATE_DP_INIT + note right of CMIS_STATE_AP_CONF: set_application CMIS_STATE_DP_INIT --> CMIS_STATE_DP_TXON CMIS_STATE_DP_TXON --> CMIS_STATE_DP_ACTIVATE CMIS_STATE_DP_ACTIVATE --> CMIS_STATE_READY @@ -335,9 +350,9 @@ stateDiagram ```mermaid sequenceDiagram - participant STATE_DB + participant STATE_DB as STATE_DB_@asic_n participant OA - participant APPL_DB + participant APPL_DB as APPL_DB@asic_n participant CmisManagerTask participant SfpStateUpdateTask @@ -352,7 +367,10 @@ sequenceDiagram SfpStateUpdateTask ->> STATE_DB : Create TRANSCEIVER_INFO table for the port par CmisManagerTask, SfpStateUpdateTask CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_INSERTED - SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS =
MEDIA_SETTINGS_NOTIFIED + opt does_xcvr_require_cmis_sm and module_requires_media_settings + SfpStateUpdateTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE: + SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS =
MEDIA_SETTINGS_NOTIFIED + end activate OA APPL_DB -->> OA: Notify media settings for ports Note over OA: Disable admin status
setPortSerdesAttribute @@ -370,7 +388,7 @@ The below sequence diagram captures the termination of XCVRD during syncd/swss/o ```mermaid sequenceDiagram participant OA - participant APPL_DB + participant APPL_DB as APPL_DB@asic_n participant XCVRDMT as XCVRD main thread participant CmisManagerTask participant DomInfoUpdateTask @@ -409,17 +427,18 @@ sequenceDiagram ``` ## Test plan and expectation -| Event | APPL_DB cleared | Xcvrd restarted | Media renotify | MEDIA_SETTINGS_SYNC_STATUS value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap | -|:----------------:|:---------------:|:---------------:|:--------------:|:-----------------------------------------------------------------------------:|:----------------------:|:---------:| -| Xcvrd restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | -| Pmon restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | -| Swss restart | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | Y | -| Syncd restart | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | Y | -| config reload | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | Y | -| Cold reboot | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | Y | -| Config shut | N | N | N | MEDIA_SETTINGS_DONE | N | Y | -| Config no shut | N | N | N | MEDIA_SETTINGS_DONE | N | Y | -| Warm reboot | N | Y | N | MEDIA_SETTINGS_DONE | N | N | +| Event | APPL_DB_ cleared | Xcvrd restarted | Media settings renotify | MEDIA_SETTINGS_SYNC_STATUS value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap | +|:--------------:|:------------------------:|:---------------:|:-----------------------:|:-------------------------------------------------------------------------------:|:----------------------:|:---------:| +| Xcvrd restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | +| Pmon restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | +| Swss restart | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | N/A | +| Syncd restart | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | N/A | +| config reload | Y 
| Y | Y | MEDIA_SETTINGS_DEFAULT | Y | N/A | +| Cold reboot | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | N/A | +| Config shut | N | N | N | MEDIA_SETTINGS_DONE | N | N/A | +| Config no shut | N | N | N | MEDIA_SETTINGS_DONE | N | N/A | +| Warm reboot | N | Y | N | MEDIA_SETTINGS_DONE | N | N | + # Out of Scope Following items are not in the scope of this document. They would be taken up separately 1. CMIS API feature is not part of this design and the APIs will be used in this design. For CMIS HLD, Please refer to: From 2aadb30659006f815448011b22d5acaca6aa6f66 Mon Sep 17 00:00:00 2001 From: Mihir Patel Date: Thu, 3 Aug 2023 17:33:11 +0000 Subject: [PATCH 06/16] Modified media settings to NPU SI settings for clearly differentiating module v/s NPU SI settings in future --- .../Interface-Link-bring-up-sequence.md | 148 +++++++++--------- 1 file changed, 78 insertions(+), 70 deletions(-) diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md index b01745ba9e..c4417507dc 100644 --- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md +++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md @@ -191,48 +191,50 @@ if transceiver is not present: ## Overview When syncd/swss/orchagent crashes, all ports in the corresponding namespace will be reinitialized by xcvrd irrespective of the current state of the port. -If just xcvrd crashes and restarts, then forced reinitialization (CMIS reinit + media settings notify) of ports will not be performed. -CMIS_REINIT_REQUIRED and MEDIA_SETTINGS_SYNC_STATUS keys in APPL_DB PORT_TABLE:\ are used to determine if port reinitialization is required or not. +If just xcvrd crashes and restarts, then forced reinitialization (CMIS reinit + NPU SI settings notification) of ports will not be performed. +CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS keys in PORT_TABLE:\ (APPL_DB) are used to determine if port reinitialization is required or not. 
- CMIS_REINIT_REQUIRED key states if CMIS reinitialization is required for a port after xcvrd is spawned. CMIS_REINIT_REQUIRED helps in mainly driving CMIS reinitialization after syncd/swss/orchagent crash since it will allow reinitializing ports belonging to the relevant namespace of the crashing process. This key is not planned to drive CMIS initialization after transceiver insertion. - - MEDIA_SETTINGS_SYNC_STATUS key is used as a means to communicate the status of applying media settings for a transceiver requiring media settings. This key is used to update the media settings application status between SfpStateUpdateTask, CmisManagerTask and Orchagent. In case of warm reboot or xcvrd restart, this key will prevent application of media settings on the port if media settings are already applied. In case of transceiver insertion, media settings will be applied irrespective of the media settings application status for the port. + - NPU_SI_SETTINGS_SYNC_STATUS key is used as a means to communicate the status of applying NPU SI settings for a transceiver requiring NPU SI settings. This key is used to update the NPU SI settings application status between SfpStateUpdateTask, CmisManagerTask and Orchagent. In case of warm reboot or xcvrd restart, this key will prevent application of NPU SI settings on the port if the settings are already applied. In case of transceiver insertion, NPU SI settings will be applied irrespective of the NPU SI settings application status for the port. Following infra will ensure port reinitialization by xcvrd in case of syncd/swss/orchagent crash 1. XCVRD main thread init - XCVRD main thread creates the key CMIS_REINIT_REQUIRED in PORT_TABLE:\ (APPL_DB) with value as true for ports which do NOT have this key present - - XCVRD main thread creates the key MEDIA_SETTINGS_SYNC_STATUS in PORT_TABLE:\ (APPL_DB) with value MEDIA_SETTINGS_DEFAULT for ports which do NOT have this key present. 
- - For transceivers which do not require media settings, MEDIA_SETTINGS_SYNC_STATUS will stay with value MEDIA_SETTINGS_DEFAULT + - XCVRD main thread creates the key NPU_SI_SETTINGS_SYNC_STATUS in PORT_TABLE:\ (APPL_DB) with value NPU_SI_SETTINGS_DEFAULT for ports which do NOT have this key present. + - For transceivers which do not require NPU SI settings, NPU_SI_SETTINGS_SYNC_STATUS will stay with value NPU_SI_SETTINGS_DEFAULT - Following table describes the various values for MEDIA_SETTINGS_SYNC_STATUS + Following table describes the various values for NPU_SI_SETTINGS_SYNC_STATUS A transceiver is classified as CMIS SM driven transceiver if its module type is CMIS and it does not have flat memory -| Value | Modifier thread and event | Consumer thread and purpose | -| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | -| MEDIA_SETTINGS_DEFAULT | 1\. XCVRD main thread during cold start of XCVRD
2. SfpStateUpdateTask during transceiver removal | XCVRD main thread during boot-up for deciding to notify media settings | -| MEDIA_SETTINGS_NOTIFIED | 1\. SfpStateUpdateTask while updating and notifying the media settings for non-CMIS SM driven transceivers
2. CmisManagerTask while updating and notifying the media settings for CMIS SM driven transceivers | Not being used currently | -| MEDIA_SETTINGS_DONE | Orchagent after applying the SI settings | CmisManagerTask for proceeding from CMIS_STATE_MEDIA_SETTINGS_WAIT to CMIS_STATE_AP_CONF state | +| Value | Modifier thread and event | Consumer thread and purpose | +| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | +| NPU_SI_SETTINGS_DEFAULT | 1\. XCVRD main thread during cold start of XCVRD
2. SfpStateUpdateTask during transceiver removal | XCVRD main thread during boot-up for deciding to notify NPU SI settings | +| NPU_SI_SETTINGS_NOTIFIED | 1\. SfpStateUpdateTask while updating and notifying the NPU SI settings for non-CMIS SM driven transceivers
2. CmisManagerTask while updating and notifying the NPU SI settings for CMIS SM driven transceivers | Not being used currently | +| NPU_SI_SETTINGS_DONE | Orchagent after applying the SI settings | CmisManagerTask for proceeding from CMIS_STATE_NPU_SI_SETTINGS_WAIT to CMIS_STATE_DP_INIT state | -2. Update and notify media settings to OA - For non-CMIS SM driven transceivers, SfpStateUpdateTask thread will update and notify the media settings to OA based on the value of PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS - If PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE, notify media settings will be invoked and will be set to MEDIA_SETTINGS_NOTIFIED for a port requiring media settings. - For CMIS SM driven transceivers, CmisManagerTask thread will update and notify the media settings to OA based on the value of PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS -3. The OA upon receiving media settings will +2. Update and notify NPU SI settings to OA + For non-CMIS SM driven transceivers, SfpStateUpdateTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and notify to OA based on the value of PORT_TABLE:\.NPU_SI_SETTINGS_SYNC_STATUS + If PORT_TABLE:\.NPU_SI_SETTINGS_SYNC_STATUS != NPU_SI_SETTINGS_DONE, update and notify NPU SI settings will be invoked and will be set to NPU_SI_SETTINGS_NOTIFIED for a port requiring NPU SI settings. + For CMIS SM driven transceivers, based on the value of PORT_TABLE:\.NPU_SI_SETTINGS_SYNC_STATUS, CmisManagerTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and notify to OA for a port requiring NPU SI settings. The CMIS SM will then transition from CMIS_STATE_AP_CONF to CMIS_STATE_NPU_SI_SETTINGS_WAIT. If port doesn't require NPU SI settings, CMIS SM will transition to CMIS_STATE_DP_INIT state. + +3. 
The OA upon receiving NPU SI settings will - Disable port admin status - - Request SAI-SDK to apply the media settings via syncd - - If SAI-SDK returns success, OA will update the PORT_TABLE:\.MEDIA_SETTINGS_SYNC_STATUS to MEDIA_SETTINGS_DONE + - Request SAI-SDK to apply the NPU SI settings via syncd + - If SAI-SDK returns success, OA will update the PORT_TABLE:\.NPU_SI_SETTINGS_SYNC_STATUS to NPU_SI_SETTINGS_DONE - In case of failure, OA will log an error message and proceed to handling the next port -4. For transceivers requiring media settings, media settings will be updated and notified to OA in the CMIS_STATE_AP_PRE_CONF state. Then, CMIS SM will transition to CMIS_STATE_MEDIA_SETTINGS_WAIT state. - If port doesn't require media settings to be applied, CMIS SM will transition to CMIS_STATE_AP_CONF state. - -5. CMIS_STATE_MEDIA_SETTINGS_WAIT state will wait for MEDIA_SETTINGS_DONE and upon reaching to MEDIA_SETTINGS_DONE, CMIS SM will transition to CMIS_STATE_AP_CONF state. +4. CMIS_STATE_NPU_SI_SETTINGS_WAIT state will wait for NPU_SI_SETTINGS_DONE and upon reaching to NPU_SI_SETTINGS_DONE, CMIS SM will transition to CMIS_STATE_DP_INIT state. There will be a timeout of 5s for every retry -6. The CmisManagerTask thread will set “CMIS_REINIT_REQUIRED" to false after CMIS SM reaches to a steady state (CMIS_STATE_UNKNOWN, CMIS_STATE_FAILED, CMIS_STATE_READY and CMIS_STATE_REMOVED) for the corresponding port -7. XCVRD will subscribe to PORT_TABLE in APPL_DB and trigger self-restart if the PORT_TABLE is deleted for the namespace. -All threads will be gracefully terminated and xcvrd deinit will be performed followed by issuing a SIGABRT to ensure XCVRD is restarted automatically by supervisord. After respawn, CMIS re-init and media_settings notified is triggered for the ports belonging to the affected namespace -8. syncd/swss/orchagent restart clears the entire APPL-DB, including “MEDIA_SETTINGS_SYNC_STATUS” and "CMIS_REINIT_REQUIRED" in PORT_TABLE -9. 
In case of warm reboot, the APPL_DB is not cleared and hence, once xcvrd is spawned after the reboot, the ports are not initialized again. + +5. The CmisManagerTask thread will set "CMIS_REINIT_REQUIRED" to false after the CMIS SM reaches a steady state (CMIS_STATE_UNKNOWN, CMIS_STATE_FAILED, CMIS_STATE_READY or CMIS_STATE_REMOVED) for the corresponding port + +6. XCVRD will subscribe to PORT_TABLE in APPL_DB and trigger self-restart if the PORT_TABLE is deleted for the namespace. +All threads will be gracefully terminated and xcvrd deinit will be performed followed by issuing a SIGABRT to ensure XCVRD is restarted automatically by supervisord. After respawn, CMIS re-init and NPU SI settings notification are triggered for the ports belonging to the affected namespace + +7. syncd/swss/orchagent restart clears the entire APPL_DB, including "NPU_SI_SETTINGS_SYNC_STATUS" and "CMIS_REINIT_REQUIRED" in PORT_TABLE + +8. In case of warm reboot, the APPL_DB is not cleared and hence, once xcvrd is spawned after the reboot, the ports are not initialized again. ## XCVRD init sequence to support port reinitialization during syncd/swss/orchagent crash @@ -244,21 +246,21 @@ sequenceDiagram participant SfpStateUpdateTask participant DomInfoUpdateTask - Note over XCVRDMT: Load new platform specific api class,
sfputil class and load namespace details
load_media_settings() + Note over XCVRDMT: Load new platform specific api class,
sfputil class and load namespace details
load_media_settings() XCVRDMT ->> XCVRDMT: Wait for port config completion loop lport in logical_port_list alt if CMIS_REINIT_REQUIRED not in PORT_TABLE: XCVRDMT ->> APPL_DB: PORT_TABLE:.CMIS_REINIT_REQUIRED = true end - alt if MEDIA_SETTINGS_SYNC_STATUS not in PORT_TABLE: - XCVRDMT ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DEFAULT + alt if NPU_SI_SETTINGS_SYNC_STATUS not in PORT_TABLE: + XCVRDMT ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DEFAULT end end - Note over APPL_DB: PORT_TABLE:
CMIS_REINIT_REQUIRED : true/false
MEDIA_NOTIFY_REQUIRED : true/false + Note over APPL_DB: PORT_TABLE:
CMIS_REINIT_REQUIRED : true/false
NPU_SI_SETTINGS_SYNC_STATUS : NPU_SI_SETTINGS_DEFAULT/NPU_SI_SETTINGS_NOTIFIED/NPU_SI_SETTINGS_DONE XCVRDMT ->> CmisManagerTask: Spawns XCVRDMT ->> DomInfoUpdateTask: Spawns XCVRDMT ->> SfpStateUpdateTask: Spawns - par XCVRDMT, CmisManagerTask, SfpStateUpdateTask, DomInfoUpdateTask + par XCVRD main thread, CmisManagerTask, SfpStateUpdateTask, DomInfoUpdateTask loop Wait for stop_event else poll every 60s DomInfoUpdateTask->>DomInfoUpdateTask: Update TRANSCEIVER_DOM_SENSOR,
TRANSCEIVER_STATUS (HW section)
TRANSCEIVER_PM tables end @@ -276,7 +278,7 @@ sequenceDiagram end ``` -## SfpStateUpdateTask's role to notify media settings to OA during xcvrd boot-up +## SfpStateUpdateTask's role to notify NPU SI settings to OA during xcvrd boot-up ```mermaid sequenceDiagram @@ -289,13 +291,13 @@ sequenceDiagram loop lport in logical_port_list alt post_port_sfp_info_to_db != SFP_EEPROM_NOT_READY Note over SfpStateUpdateTask: post_port_dom_threshold_info_to_db - opt PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS != MEDIA_SETTINGS_DONE - opt if lport requires media settings - SfpStateUpdateTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE: - SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_NOTIFIED - APPL_DB -->> OA: Notify media settings for ports + opt if is_module_cmis_sm_driven and PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS != NPU_SI_SETTINGS_DONE + opt if module_requires_npu_si_settings + SfpStateUpdateTask ->> APPL_DB: Update SI params from NPU_SI_SETTINGS.json to PORT_TABLE: + SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_NOTIFIED + APPL_DB -->> OA: Notify NPU SI settings for ports Note over OA: Disable admin status
setPortSerdesAttribute - OA ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DONE + OA ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE Note over OA: initHostTxReadyState end end @@ -309,9 +311,9 @@ sequenceDiagram end ``` -## CMIS State machine with introduction of CMIS_STATE_AP_PRE_CONF and CMIS_STATE_MEDIA_SETTINGS_WAIT states +## CMIS State machine with introduction of the CMIS_STATE_NPU_SI_SETTINGS_WAIT state -The below state machine is a high level flow and doesn't capture details for states other than CMIS_STATE_AP_PRE_CONF, CMIS_STATE_MEDIA_SETTINGS_WAIT and CMIS_STATE_AP_CONF +The state machine below is a high-level flow and doesn't capture details for states other than CMIS_STATE_AP_CONF and CMIS_STATE_NPU_SI_SETTINGS_WAIT ```mermaid stateDiagram @@ -324,23 +326,21 @@ stateDiagram if_state --> if_state2 : if host_tx_ready == True and
admin_status == up if_state2 --> CMIS_STATE_DP_DEINIT : if PORT_TABLE.port.CMIS_REINIT_REQUIRED == true or
is_cmis_application_update_required note left of CMIS_STATE_READY : PORT_TABLE.port.CMIS_REINIT_REQUIRED = false - if_state2 --> CMIS_STATE_FAILED : if appl < 1 or
host_lanes_mask <= 0 or
media_lanes_mask <= 0 + if_state2 --> CMIS_STATE_FAILED : if appl < 1 or
host_lanes_mask <= 0 or
media_lanes_mask <= 0 note left of CMIS_STATE_FAILED : PORT_TABLE.port.CMIS_REINIT_REQUIRED = false - CMIS_STATE_DP_DEINIT --> CMIS_STATE_AP_PRE_CONF + CMIS_STATE_DP_DEINIT --> CMIS_STATE_AP_CONF - note left of CMIS_STATE_AP_PRE_CONF : Ensure current states are ModuleReady and DataPathDeactivated
Configure laser frequency for ZR module
Update host and media SI settings and notify to OA - CMIS_STATE_AP_PRE_CONF --> if_state3 - if_state3 --> CMIS_STATE_MEDIA_SETTINGS_WAIT : if module_requires_media_settings - if_state3 --> CMIS_STATE_AP_CONF : if not module_requires_media_settings - CMIS_STATE_MEDIA_SETTINGS_WAIT --> CMIS_STATE_AP_CONF : if PORT_TABLE<port>.MEDIA_SETTINGS_SYNC_STATUS == MEDIA_SETTINGS_DONE - CMIS_STATE_MEDIA_SETTINGS_WAIT --> CMIS_STATE_INSERTED : Through force_cmis_reinit upon reaching timeout - note right of CMIS_STATE_MEDIA_SETTINGS_WAIT - Checks if PORT_TABLE<port>.MEDIA_SETTINGS_SYNC_STATUS == MEDIA_SETTINGS_DONE + note left of CMIS_STATE_AP_CONF : Ensure current states are ModuleReady and DataPathDeactivated
Configure laser frequency for ZR module
Apply module SI settings
Update NPU SI settings to PORT_TABLE (APPL_DB) and notify to OA
set_application + CMIS_STATE_AP_CONF --> if_state3 + if_state3 --> CMIS_STATE_NPU_SI_SETTINGS_WAIT : if module_requires_npu_si_settings + if_state3 --> CMIS_STATE_DP_INIT : if not module_requires_npu_si_settings + CMIS_STATE_NPU_SI_SETTINGS_WAIT --> CMIS_STATE_DP_INIT : if PORT_TABLE<port>.NPU_SI_SETTINGS_SYNC_STATUS == NPU_SI_SETTINGS_DONE + CMIS_STATE_NPU_SI_SETTINGS_WAIT --> CMIS_STATE_INSERTED : Through force_cmis_reinit upon reaching timeout + note right of CMIS_STATE_NPU_SI_SETTINGS_WAIT + Checks if PORT_TABLE<port>.NPU_SI_SETTINGS_SYNC_STATUS == NPU_SI_SETTINGS_DONE After 5s timeout, force_cmis_reinit will be called end note - CMIS_STATE_AP_CONF --> CMIS_STATE_DP_INIT - note right of CMIS_STATE_AP_CONF: set_application CMIS_STATE_DP_INIT --> CMIS_STATE_DP_TXON CMIS_STATE_DP_TXON --> CMIS_STATE_DP_ACTIVATE CMIS_STATE_DP_ACTIVATE --> CMIS_STATE_READY @@ -360,23 +360,31 @@ sequenceDiagram SfpStateUpdateTask -x STATE_DB : Delete TRANSCEIVER_INFO table for the port par CmisManagerTask, SfpStateUpdateTask CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_REMOVED - SfpStateUpdateTask ->> APPL_DB : PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS =
MEDIA_SETTINGS_DEFAULT + SfpStateUpdateTask ->> APPL_DB : PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS =
NPU_SI_SETTINGS_DEFAULT end SfpStateUpdateTask ->> SfpStateUpdateTask : event = SFP_STATUS_INSERTED SfpStateUpdateTask ->> STATE_DB : Create TRANSCEIVER_INFO table for the port par CmisManagerTask, SfpStateUpdateTask CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_INSERTED - opt does_xcvr_require_cmis_sm and module_requires_media_settings - SfpStateUpdateTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE: - SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS =
MEDIA_SETTINGS_NOTIFIED + CmisManagerTask ->> CmisManagerTask : Eventually, CMIS SM transitions to CMIS_STATE_AP_CONF + opt not is_module_cmis_sm_driven and module_requires_npu_si_settings + SfpStateUpdateTask ->> APPL_DB: Update SI params from NPU_SI_SETTINGS.json to PORT_TABLE: + SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS =
NPU_SI_SETTINGS_NOTIFIED + end + opt is_module_cmis_sm_driven and module_requires_npu_si_settings + CmisManagerTask ->> APPL_DB: Update SI params from NPU_SI_SETTINGS.json to PORT_TABLE: + CmisManagerTask ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS =
NPU_SI_SETTINGS_NOTIFIED + CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_NPU_SI_SETTINGS_WAIT end activate OA - APPL_DB -->> OA: Notify media settings for ports + APPL_DB -->> OA: Notify NPU SI settings for ports Note over OA: Disable admin status
setPortSerdesAttribute - OA ->> APPL_DB: PORT_TABLE:.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DONE + OA ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE + APPL_DB --> CmisManagerTask: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE Note over OA: initHostTxReadyState deactivate OA + CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_DP_INIT end ``` @@ -427,17 +435,17 @@ sequenceDiagram ``` ## Test plan and expectation -| Event | APPL_DB_ cleared | Xcvrd restarted | Media settings renotify | MEDIA_SETTINGS_SYNC_STATUS value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap | -|:--------------:|:------------------------:|:---------------:|:-----------------------:|:-------------------------------------------------------------------------------:|:----------------------:|:---------:| -| Xcvrd restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | -| Pmon restart | N | Y | N | MEDIA_SETTINGS_DONE | N | N | -| Swss restart | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | N/A | -| Syncd restart | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | N/A | -| config reload | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | N/A | -| Cold reboot | Y | Y | Y | MEDIA_SETTINGS_DEFAULT | Y | N/A | -| Config shut | N | N | N | MEDIA_SETTINGS_DONE | N | N/A | -| Config no shut | N | N | N | MEDIA_SETTINGS_DONE | N | N/A | -| Warm reboot | N | Y | N | MEDIA_SETTINGS_DONE | N | N | +| Event | APPL_DB_ cleared | Xcvrd restarted | NPU SI settings renotify | NPU_SI_SETTINGS_SYNC_STATUS value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap | +| -------------- | ------------------------ | --------------- | ------------------------ | ------------------------------------------------------------------------------ | ---------------------- | --------- | +| Xcvrd restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N | +| Pmon restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N | +| Swss restart | Y 
| Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A | +| Syncd restart | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A | +| config reload | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A | +| Cold reboot | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A | +| Config shut | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A | +| Config no shut | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A | +| Warm reboot | N | Y | N | NPU_SI_SETTINGS_DONE | N | N | # Out of Scope Following items are not in the scope of this document. They would be taken up separately From cb4e2d205e75a850f922cf8dfcedf91edf83a85e Mon Sep 17 00:00:00 2001 From: Mihir Patel Date: Thu, 10 Aug 2023 20:33:14 +0000 Subject: [PATCH 07/16] Fixed typo --- doc/sfp-cmis/Interface-Link-bring-up-sequence.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md index c4417507dc..c89427efbc 100644 --- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md +++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md @@ -291,7 +291,7 @@ sequenceDiagram loop lport in logical_port_list alt post_port_sfp_info_to_db != SFP_EEPROM_NOT_READY Note over SfpStateUpdateTask: post_port_dom_threshold_info_to_db - opt if is_module_cmis_sm_driven and PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS != NPU_SI_SETTINGS_DONE + opt if not is_module_cmis_sm_driven and PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS != NPU_SI_SETTINGS_DONE opt if module_requires_npu_si_settings SfpStateUpdateTask ->> APPL_DB: Update SI params from NPU_SI_SETTINGS.json to PORT_TABLE: SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_NOTIFIED From 49acc4c77eff2cce1928017f7ccf5437d51b50bb Mon Sep 17 00:00:00 2001 From: Mihir Patel Date: Fri, 22 Sep 2023 05:20:54 +0000 Subject: [PATCH 08/16] Addressed PR comments --- .../Interface-Link-bring-up-sequence.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git 
a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md index c89427efbc..8c2d6f59cf 100644 --- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md +++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md @@ -17,7 +17,7 @@ Deterministic Approach for Interface Link bring-up sequence * [Pre-requisite](#pre-requisite) * [Breakout handling](#breakout-handling) * [Proposed Work-Flows](#proposed-work-flows) - * [Port reinitialization during syncd/swss/orchagent crash](#port-reinitialization-during-syncdswssorchagent-crash) + * [Port re-initialization during syncd/swss/orchagent crash](#port-re-initialization-during-syncdswssorchagent-crash) # List of Tables * [Table 1: Definitions](#table-1-definitions) @@ -126,7 +126,7 @@ Plan is to follow this high-level work-flow sequence to accomplish the Objective - deterministic approach to bring the interface will eliminate any link stability issue which will be difficult to chase in the production network e.g. If there is a PHY device in between, and this 'deterministic approach' is not followed, PHY may adapt to a bad signal or interface flaps may occur when the optics tx/rx enabled during PHY initialization. - there is a possibility of interface link flaps with non-quiescent optical modules if this 'deterministic approach' is not followed - - It helps bring down the optical module laser when interface is adminstiratively shutdown. Per the workflow here, this is acheived by xcvrd listening to host_tx_ready field from PORT_TABLE of STATE_DB. Turning the laser off would reduce the power consumption and avoid any lab hazard + - It helps bring down the optical module laser when interface is adminstiratively shutdown. Per the workflow here, this is achieved by xcvrd listening to host_tx_ready field from PORT_TABLE of STATE_DB. 
Turning the laser off would reduce the power consumption and avoid any lab hazard - Additionally provides uniform workflow (from SONiC NOS) across all interface types with or without module presence. - This synchronization will also benefit SFP+ optical modules as they are "plug N play" and may not have quiescent functionality. (xcvrd can use the optional 'soft tx disable' ctrl reg to disable the tx) @@ -187,16 +187,17 @@ if transceiver is not present: - All the workflows mentioned above will reamin same ( or get exercised) till host_tx_ready field update - xcvrd will not perform any action on receiving host_tx_ready field update -# Port reinitialization during syncd/swss/orchagent crash +# Port re-initialization during syncd/swss/orchagent crash ## Overview -When syncd/swss/orchagent crashes, all ports in the corresponding namespace will be reinitialized by xcvrd irrespective of the current state of the port. -If just xcvrd crashes and restarts, then forced reinitialization (CMIS reinit + NPU SI settings notification) of ports will not be performed. -CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS keys in PORT_TABLE:\ (APPL_DB) are used to determine if port reinitialization is required or not. - - CMIS_REINIT_REQUIRED key states if CMIS reinitialization is required for a port after xcvrd is spawned. CMIS_REINIT_REQUIRED helps in mainly driving CMIS reinitialization after syncd/swss/orchagent crash since it will allow reinitializing ports belonging to the relevant namespace of the crashing process. This key is not planned to drive CMIS initialization after transceiver insertion. +When syncd/swss/orchagent crashes, all ports in the corresponding namespace will be reinitialized by xcvrd irrespective of the current state of the port. All the corresponding ports are expected to experience link down until the initialization is complete. 
+If just xcvrd crashes and restarts, then forced re-initialization (CMIS reinit + NPU SI settings notification) of ports will not be performed. Hence, the ports will not experience link downtime during scenario.
+CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS keys in PORT_TABLE:\<port\> (APPL_DB) are used to determine if port re-initialization is required or not.
+ - CMIS_REINIT_REQUIRED key states if CMIS re-initialization is required for a port after xcvrd is spawned. CMIS_REINIT_REQUIRED helps in mainly driving CMIS re-initialization after syncd/swss/orchagent crash since it will allow reinitializing ports belonging to the relevant namespace of the crashing process. This key is not planned to drive CMIS initialization after transceiver insertion.
  - NPU_SI_SETTINGS_SYNC_STATUS key is used as a means to communicate the status of applying NPU SI settings for a transceiver requiring NPU SI settings. This key is used to update the NPU SI settings application status between SfpStateUpdateTask, CmisManagerTask and Orchagent. In case of warm reboot or xcvrd restart, this key will prevent application of NPU SI settings on the port if the settings are already applied. In case of transceiver insertion, NPU SI settings will be applied irrespective of the NPU SI settings application status for the port.
+In case of continuous restart of xcvrd, both the keys will still hold the same value as before the restart. This would ensure that the port re-initialization is resumed from the last known state.
-Following infra will ensure port reinitialization by xcvrd in case of syncd/swss/orchagent crash
+Following infra will ensure port re-initialization by xcvrd in case of syncd/swss/orchagent crash
 1. XCVRD main thread init
  - XCVRD main thread creates the key CMIS_REINIT_REQUIRED in PORT_TABLE:\<port\> (APPL_DB) with value as true for ports which do NOT have this key present
@@ -236,7 +237,7 @@ All threads will be gracefully terminated and xcvrd deinit will be performed fol
 8. In case of warm reboot, the APPL_DB is not cleared and hence, once xcvrd is spawned after the reboot, the ports are not initialized again.
-## XCVRD init sequence to support port reinitialization during syncd/swss/orchagent crash
+## XCVRD init sequence to support port re-initialization during syncd/swss/orchagent crash ```mermaid sequenceDiagram
From 2b298bf97396abbdb7d1fd842d9ac77fd4af6a12 Mon Sep 17 00:00:00 2001
From: Mihir Patel
Date: Fri, 22 Sep 2023 05:26:03 +0000
Subject: [PATCH 09/16] Reverted unrelated changeset
---
 doc/sfp-cmis/Interface-Link-bring-up-sequence.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
index 8c2d6f59cf..b4b8214f70 100644
--- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
+++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
@@ -126,7 +126,7 @@ Plan is to follow this high-level work-flow sequence to accomplish the Objective
  - deterministic approach to bring the interface will eliminate any link stability issue which will be difficult to chase in the production network e.g. If there is a PHY device in between, and this 'deterministic approach' is not followed, PHY may adapt to a bad signal or interface flaps may occur when the optics tx/rx enabled during PHY initialization.
  - there is a possibility of interface link flaps with non-quiescent optical modules if this 'deterministic approach' is not followed
- - It helps bring down the optical module laser when interface is adminstiratively shutdown. Per the workflow here, this is achieved by xcvrd listening to host_tx_ready field from PORT_TABLE of STATE_DB. Turning the laser off would reduce the power consumption and avoid any lab hazard
+ - It helps bring down the optical module laser when interface is adminstiratively shutdown. Per the workflow here, this is acheived by xcvrd listening to host_tx_ready field from PORT_TABLE of STATE_DB. Turning the laser off would reduce the power consumption and avoid any lab hazard
  - Additionally provides uniform workflow (from SONiC NOS) across all interface types with or without module presence.
  - This synchronization will also benefit SFP+ optical modules as they are "plug N play" and may not have quiescent functionality. (xcvrd can use the optional 'soft tx disable' ctrl reg to disable the tx)
From b1082ba01d9ad93895d1095a177f5fea9f87b401 Mon Sep 17 00:00:00 2001
From: Mihir Patel
Date: Fri, 22 Sep 2023 21:21:00 +0000
Subject: [PATCH 10/16] Addressed PR comments
---
 .../Interface-Link-bring-up-sequence.md | 29 ++++++++++++-------
 1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
index b4b8214f70..ec8a1c55c5 100644
--- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
+++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
@@ -233,7 +233,7 @@ There will be a timeout of 5s for every retry
 6. XCVRD will subscribe to PORT_TABLE in APPL_DB and trigger self-restart if the PORT_TABLE is deleted for the namespace.
 All threads will be gracefully terminated and xcvrd deinit will be performed followed by issuing a SIGABRT to ensure XCVRD is restarted automatically by supervisord. After respawn, CMIS re-init and NPU_SI_SETTINGS notified is triggered for the ports belonging to the affected namespace
-7. syncd/swss/orchagent restart clears the entire APPL-DB, including “NPU_SI_SETTINGS_SYNC_STATUS” and "CMIS_REINIT_REQUIRED" in PORT_TABLE
+7. syncd/swss/orchagent restart (restart triggered due to docker container crash) clears the entire APPL-DB, including “NPU_SI_SETTINGS_SYNC_STATUS” and "CMIS_REINIT_REQUIRED" in PORT_TABLE
 8. In case of warm reboot, the APPL_DB is not cleared and hence, once xcvrd is spawned after the reboot, the ports are not initialized again.
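The self-restart path in step 6 (terminate threads gracefully, run deinit, then raise SIGABRT so that supervisord respawns xcvrd) can be sketched roughly as below. This is an illustration only and not code from this patch: `restart_on_port_table_delete`, its parameters, and the injectable `abort` callback are hypothetical names, and how xcvrd actually detects the PORT_TABLE deletion is not modelled here.

```python
import os
import signal
import threading


def restart_on_port_table_delete(stop_event, workers, deinit, abort=None):
    """Hypothetical helper: stop workers, deinit, then abort the process."""
    stop_event.set()                  # ask every worker loop to exit
    for worker in workers:
        worker.join()                 # wait for a graceful exit
    deinit()                          # release resources before going down
    if abort is None:
        # Assumed restart trigger: SIGABRT to ourselves, so that the
        # process supervisor sees an abnormal exit and respawns us.
        abort = lambda: os.kill(os.getpid(), signal.SIGABRT)
    abort()


# Dry run with a stand-in for SIGABRT so the ordering can be observed.
calls = []
stop = threading.Event()
worker = threading.Thread(
    target=lambda: (stop.wait(), calls.append("worker exited"))
)
worker.start()
restart_on_port_table_delete(
    stop,
    [worker],
    deinit=lambda: calls.append("deinit"),
    abort=lambda: calls.append("abort"),
)
```

The `abort` hook exists only so the ordering can be exercised without killing the interpreter; leaving it unset sends the real signal.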
@@ -436,18 +436,25 @@ sequenceDiagram ``` ## Test plan and expectation
+
+**Process and device crash/restart and interface config command handling testplan**
 | Event | APPL_DB_ cleared | Xcvrd restarted | NPU SI settings renotify | NPU_SI_SETTINGS_SYNC_STATUS value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap |
 | -------------- | ------------------------ | --------------- | ------------------------ | ------------------------------------------------------------------------------ | ---------------------- | --------- |
-| Xcvrd restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N |
-| Pmon restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N |
-| Swss restart | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
-| Syncd restart | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
-| config reload | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
-| Cold reboot | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
-| Config shut | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A |
-| Config no shut | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A |
-| Warm reboot | N | Y | N | NPU_SI_SETTINGS_DONE | N | N |
-
+| Xcvrd restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N |
+| Pmon restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N |
+| Swss restart | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
+| Syncd restart | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
+| config reload | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
+| Cold reboot | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
+| Warm reboot | N | Y | N | NPU_SI_SETTINGS_DONE | N | N |
+| config interface shut | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A |
+| config interface no shut | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A |
+
+**Transceiver OIR testplan**
+| Event | APPL_DB_ cleared | Xcvrd restarted | NPU SI settings notified | NPU_SI_SETTINGS_SYNC_STATUS value upon event completion | CMIS init triggered |
+| -------------- | ------------------------ | --------------- | ------------------------ | ------------------------------------------------------------------------------ | ---------------------- |
+| Transceiver Removal | N | N | Y | NPU_SI_SETTINGS_DEFAULT | Y |
+| Transceiver Insertion | N | N | Y | NPU_SI_SETTINGS_DONE | Y |
 # Out of Scope
 Following items are not in the scope of this document. They would be taken up separately
 1. CMIS API feature is not part of this design and the APIs will be used in this design. For CMIS HLD, Please refer to:
From 2d1e79b6d7b19643704dedee51d15a20008838d7 Mon Sep 17 00:00:00 2001
From: Mihir Patel
Date: Fri, 29 Sep 2023 21:53:45 +0000
Subject: [PATCH 11/16] Added pre-requisite section and added testcase for OA restart
---
 doc/sfp-cmis/Interface-Link-bring-up-sequence.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
index ec8a1c55c5..2093f80126 100644
--- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
+++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
@@ -199,6 +199,13 @@ In case of continuous restart of xcvrd, both the keys will still hold the same v
 Following infra will ensure port re-initialization by xcvrd in case of syncd/swss/orchagent crash
+Pre-requisites for the infra to work:
+ - APPL_DB should be cleared after syncd/swss/orchagent crash/restart, config reload and cold reboot
+ - APPL_DB should be retained after
+     - pmon crash/restart
+     - xcvrd crash/restart (does not apply to xcvrd restart triggered due to a different docker container crash/restart)
+     - warm reboot
+
 1. XCVRD main thread init
  - XCVRD main thread creates the key CMIS_REINIT_REQUIRED in PORT_TABLE:\<port\> (APPL_DB) with value as true for ports which do NOT have this key present
  - XCVRD main thread creates the key NPU_SI_SETTINGS_SYNC_STATUS in PORT_TABLE:\<port\> (APPL_DB) with value NPU_SI_SETTINGS_DEFAULT for ports which do NOT have this key present.
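The create-if-absent behaviour of the XCVRD main thread init in step 1 is what ties the pre-requisites to the outcome: a cleared table makes the keys reappear with their defaults (forcing re-initialization), while a retained table keeps the last known values. A minimal sketch, assuming PORT_TABLE is modelled as a plain dict; `init_port_keys` and the dict layout are hypothetical, and only the key and value names come from the design text.

```python
# Hypothetical model: PORT_TABLE as {port_name: {field: value}}.
# Only the field and value names below are taken from the design text.
CMIS_REINIT_REQUIRED = "CMIS_REINIT_REQUIRED"
NPU_SI_SETTINGS_SYNC_STATUS = "NPU_SI_SETTINGS_SYNC_STATUS"
NPU_SI_SETTINGS_DEFAULT = "NPU_SI_SETTINGS_DEFAULT"


def init_port_keys(port_table, ports):
    """Create the two keys only for ports that do not already have them."""
    for port in ports:
        fields = port_table.setdefault(port, {})
        # setdefault leaves an existing value untouched, so a plain xcvrd
        # restart (table retained) does not force re-initialization.
        fields.setdefault(CMIS_REINIT_REQUIRED, "true")
        fields.setdefault(NPU_SI_SETTINGS_SYNC_STATUS, NPU_SI_SETTINGS_DEFAULT)
    return port_table


# Table retained (xcvrd or pmon restart, warm reboot): values are preserved.
retained = init_port_keys(
    {"Ethernet0": {CMIS_REINIT_REQUIRED: "false",
                   NPU_SI_SETTINGS_SYNC_STATUS: "NPU_SI_SETTINGS_DONE"}},
    ["Ethernet0"],
)

# Table cleared (syncd/swss/orchagent crash, config reload, cold reboot):
# the keys are recreated with their defaults, which requests a re-init.
cleared = init_port_keys({}, ["Ethernet0"])
```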
@@ -442,13 +449,14 @@ sequenceDiagram
 | -------------- | ------------------------ | --------------- | ------------------------ | ------------------------------------------------------------------------------ | ---------------------- | --------- |
 | Xcvrd restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N |
 | Pmon restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N |
+| orchagent restart | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
 | Swss restart | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
 | Syncd restart | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
 | config reload | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
 | Cold reboot | Y | Y | Y | NPU_SI_SETTINGS_DEFAULT | Y | N/A |
 | Warm reboot | N | Y | N | NPU_SI_SETTINGS_DONE | N | N |
-| config interface shut | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A |
-| config interface no shut | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A |
+| config interface shutdown | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A |
+| config interface startup | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A |
 **Transceiver OIR testplan**
 | Event | APPL_DB_ cleared | Xcvrd restarted | NPU SI settings notified | NPU_SI_SETTINGS_SYNC_STATUS value upon event completion | CMIS init triggered |
From b68d2923f6c04caac961a7ec60636a4b31e83813 Mon Sep 17 00:00:00 2001
From: Mihir Patel
Date: Thu, 5 Oct 2023 21:43:37 +0000
Subject: [PATCH 12/16] Corrected typo in XCVRD init sequence diagram
---
 doc/sfp-cmis/Interface-Link-bring-up-sequence.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
index 2093f80126..c77514e005 100644
--- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
+++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
@@ -224,7 +224,7 @@ A transceiver is classified as CMIS SM driven transceiver if its module type is
 2. Update and notify NPU SI settings to OA
  For non-CMIS SM driven transceivers, SfpStateUpdateTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and notify to OA based on the value of PORT_TABLE:\<port\>.NPU_SI_SETTINGS_SYNC_STATUS
  If PORT_TABLE:\<port\>.NPU_SI_SETTINGS_SYNC_STATUS != NPU_SI_SETTINGS_DONE, update and notify NPU SI settings will be invoked and will be set to NPU_SI_SETTINGS_NOTIFIED for a port requiring NPU SI settings.
- For CMIS SM driven transceivers, based on the value of PORT_TABLE:\<port\>.NPU_SI_SETTINGS_SYNC_STATUS, CmisManagerTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and notify to OA for a port requiring NPU SI settings. The CMIS SM will then transition from CMIS_STATE_AP_CONF to CMIS_STATE_NPU_SI_SETTINGS_WAIT. If port doesn't require NPU SI settings, CMIS SM will transition to CMIS_STATE_DP_INIT state.
+ For CMIS SM driven transceivers, if the value of PORT_TABLE:\<port\>.NPU_SI_SETTINGS_SYNC_STATUS is NPU_SI_SETTINGS_DEFAULT and the module requires NPU SI settings, CmisManagerTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and notify to OA. The CMIS SM will then transition from CMIS_STATE_AP_CONF to CMIS_STATE_NPU_SI_SETTINGS_WAIT. If port doesn't require NPU SI settings, CMIS SM will transition to CMIS_STATE_DP_INIT state.
 3. The OA upon receiving NPU SI settings will
  - Disable port admin status
@@ -264,7 +264,7 @@ sequenceDiagram
  XCVRDMT ->> APPL_DB: PORT_TABLE:<port>.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DEFAULT
  end
  end
- Note over APPL_DB: PORT_TABLE:<port><br>CMIS_REINIT_REQUIRED : true/false<br>NPU SI_NOTIFY_REQUIRED : true/false
+ Note over APPL_DB: PORT_TABLE:<port><br>CMIS_REINIT_REQUIRED : true/false<br>NPU_SI_SETTINGS_SYNC_STATUS : NPU_SI_SETTINGS_DEFAULT/NPU_SI_SETTINGS_NOTIFIED/NPU_SI_SETTINGS_DONE
  XCVRDMT ->> CmisManagerTask: Spawns
  XCVRDMT ->> DomInfoUpdateTask: Spawns
  XCVRDMT ->> SfpStateUpdateTask: Spawns
From 2bfc7796bf13bc89c02f79303894ea43a322b80d Mon Sep 17 00:00:00 2001
From: Mihir Patel
Date: Tue, 10 Oct 2023 00:16:28 +0000
Subject: [PATCH 13/16] Added description and functioning related to is_npu_si_settings_update_required
---
 .../Interface-Link-bring-up-sequence.md | 39 ++++++++++---------
 1 file changed, 21 insertions(+), 18 deletions(-)
diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
index c77514e005..8f081dda11 100644
--- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
+++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
@@ -222,9 +222,14 @@ A transceiver is classified as CMIS SM driven transceiver if its module type is
 2. Update and notify NPU SI settings to OA
- For non-CMIS SM driven transceivers, SfpStateUpdateTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and notify to OA based on the value of PORT_TABLE:\<port\>.NPU_SI_SETTINGS_SYNC_STATUS
- If PORT_TABLE:\<port\>.NPU_SI_SETTINGS_SYNC_STATUS != NPU_SI_SETTINGS_DONE, update and notify NPU SI settings will be invoked and will be set to NPU_SI_SETTINGS_NOTIFIED for a port requiring NPU SI settings.
- For CMIS SM driven transceivers, if the value of PORT_TABLE:\<port\>.NPU_SI_SETTINGS_SYNC_STATUS is NPU_SI_SETTINGS_DEFAULT and the module requires NPU SI settings, CmisManagerTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and notify to OA. The CMIS SM will then transition from CMIS_STATE_AP_CONF to CMIS_STATE_NPU_SI_SETTINGS_WAIT. If port doesn't require NPU SI settings, CMIS SM will transition to CMIS_STATE_DP_INIT state.
+ The API is_npu_si_settings_update_required will return true if a module requires NPU SI settings and PORT_TABLE:\<port\>.NPU_SI_SETTINGS_SYNC_STATUS == NPU_SI_SETTINGS_DEFAULT.
It will return false in other cases
+
+ For non-CMIS SM driven transceivers, if is_npu_si_settings_update_required returns true, SfpStateUpdateTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and OA will be notified eventually. Also, NPU_SI_SETTINGS_SYNC_STATUS will be set to NPU_SI_SETTINGS_NOTIFIED.
+
+ For CMIS SM driven transceivers, if is_npu_si_settings_update_required returns true, CmisManagerTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and OA will be notified eventually. Also, NPU_SI_SETTINGS_SYNC_STATUS will be set to NPU_SI_SETTINGS_NOTIFIED.
+ The CMIS SM will then transition from CMIS_STATE_AP_CONF to CMIS_STATE_NPU_SI_SETTINGS_WAIT. If port doesn't require NPU SI settings, CMIS SM will transition to CMIS_STATE_DP_INIT state.
+
+
 3. The OA upon receiving NPU SI settings will
  - Disable port admin status
@@ -299,15 +304,13 @@ sequenceDiagram
  loop lport in logical_port_list
  alt post_port_sfp_info_to_db != SFP_EEPROM_NOT_READY
  Note over SfpStateUpdateTask: post_port_dom_threshold_info_to_db
- opt if not is_module_cmis_sm_driven and PORT_TABLE:<port>.NPU_SI_SETTINGS_SYNC_STATUS != NPU_SI_SETTINGS_DONE
- opt if module_requires_npu_si_settings
- SfpStateUpdateTask ->> APPL_DB: Update SI params from NPU_SI_SETTINGS.json to PORT_TABLE:<port>
- SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:<port>.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_NOTIFIED
- APPL_DB -->> OA: Notify NPU SI settings for ports
- Note over OA: Disable admin status<br>setPortSerdesAttribute
- OA ->> APPL_DB: PORT_TABLE:<port>.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE
- Note over OA: initHostTxReadyState
- end
+ opt if not is_module_cmis_sm_driven and is_npu_si_settings_update_required
+ SfpStateUpdateTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE:<port>
+ SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:<port>.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_NOTIFIED
+ APPL_DB -->> OA: Notify NPU SI settings for ports
+ Note over OA: Disable admin status<br>setPortSerdesAttribute
+ OA ->> APPL_DB: PORT_TABLE:<port>.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE
+ Note over OA: initHostTxReadyState
  end
  else
  Note over SfpStateUpdateTask: retry_eeprom_set.add(lport)
@@ -341,8 +344,8 @@ stateDiagram
  note left of CMIS_STATE_AP_CONF : Ensure current states are ModuleReady and DataPathDeactivated<br>Configure laser frequency for ZR module<br>Apply module SI settings<br>Update NPU SI settings to PORT_TABLE (APPL_DB) and notify to OA<br>set_application
  CMIS_STATE_AP_CONF --> if_state3
- if_state3 --> CMIS_STATE_NPU_SI_SETTINGS_WAIT : if module_requires_npu_si_settings
- if_state3 --> CMIS_STATE_DP_INIT : if not module_requires_npu_si_settings
+ if_state3 --> CMIS_STATE_NPU_SI_SETTINGS_WAIT : if is_npu_si_settings_update_required
+ if_state3 --> CMIS_STATE_DP_INIT : if not is_npu_si_settings_update_required
  CMIS_STATE_NPU_SI_SETTINGS_WAIT --> CMIS_STATE_DP_INIT : if PORT_TABLE<port>.NPU_SI_SETTINGS_SYNC_STATUS == NPU_SI_SETTINGS_DONE
  CMIS_STATE_NPU_SI_SETTINGS_WAIT --> CMIS_STATE_INSERTED : Through force_cmis_reinit upon reaching timeout
  note right of CMIS_STATE_NPU_SI_SETTINGS_WAIT
@@ -376,12 +379,12 @@ sequenceDiagram
  par CmisManagerTask, SfpStateUpdateTask
  CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_INSERTED
  CmisManagerTask ->> CmisManagerTask : Eventually, CMIS SM transitions to CMIS_STATE_AP_CONF
- opt is_module_cmis_sm_driven and module_requires_npu_si_settings
- SfpStateUpdateTask ->> APPL_DB: Update SI params from NPU_SI_SETTINGS.json to PORT_TABLE:<port>
+ opt not is_module_cmis_sm_driven and is_npu_si_settings_update_required
+ SfpStateUpdateTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE:<port>
  SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:<port>.NPU_SI_SETTINGS_SYNC_STATUS = <br>NPU_SI_SETTINGS_NOTIFIED
  end
- opt not is_module_cmis_sm_driven and module_requires_npu_si_settings
- CmisManagerTask ->> APPL_DB: Update SI params from NPU_SI_SETTINGS.json to PORT_TABLE:<port>
+ opt is_module_cmis_sm_driven and is_npu_si_settings_update_required
+ CmisManagerTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE:<port>
  CmisManagerTask ->> APPL_DB: PORT_TABLE:<port>.NPU_SI_SETTINGS_SYNC_STATUS = <br>NPU_SI_SETTINGS_NOTIFIED
  CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_NPU_SI_SETTINGS_WAIT
  end
From 2fe94b5454a6f9c8b28e52712d437232f286ddff Mon Sep 17 00:00:00 2001
From: Mihir Patel
Date: Tue, 10 Oct 2023 00:19:40 +0000
Subject: [PATCH 14/16] Fixed typo
---
 doc/sfp-cmis/Interface-Link-bring-up-sequence.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
index 8f081dda11..b295ddeab6 100644
--- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
+++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
@@ -464,7 +464,7 @@ sequenceDiagram
 **Transceiver OIR testplan**
 | Event | APPL_DB_ cleared | Xcvrd restarted | NPU SI settings notified | NPU_SI_SETTINGS_SYNC_STATUS value upon event completion | CMIS init triggered |
 | -------------- | ------------------------ | --------------- | ------------------------ | ------------------------------------------------------------------------------ | ---------------------- |
-| Transceiver Removal | N | N | Y | NPU_SI_SETTINGS_DEFAULT | Y |
+| Transceiver Removal | N | N | Y | NPU_SI_SETTINGS_DEFAULT | N/A |
 | Transceiver Insertion | N | N | Y | NPU_SI_SETTINGS_DONE | Y |
 # Out of Scope
 Following items are not in the scope of this document.
They would be taken up separately
From 23b7aa3c156a8c7660d7059ceaad24d20c17f7bf Mon Sep 17 00:00:00 2001
From: Mihir Patel
Date: Fri, 20 Oct 2023 18:28:04 +0000
Subject: [PATCH 15/16] Addressed review comments
---
 doc/sfp-cmis/Interface-Link-bring-up-sequence.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
index b295ddeab6..882d53691e 100644
--- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
+++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
@@ -35,6 +35,7 @@ Deterministic Approach for Interface Link bring-up sequence
 | 0.7 | 02/02/2022 | Jaganathan Anbalagan | Added Breakout Handling
 | 0.8 | 02/16/2022 | Shyam Kumar | Updated feature-enablement workflow
 | 0.9 | 04/05/2022 | Shyam Kumar | Addressed review comments |
+| 1.0 | 10/20/2023 | Mihir Patel | Added Port re-initialization during syncd/swss/orchagent crash section |
 # About this Manual
@@ -192,13 +193,17 @@ if transceiver is not present:
 When syncd/swss/orchagent crashes, all ports in the corresponding namespace will be reinitialized by xcvrd irrespective of the current state of the port. All the corresponding ports are expected to experience link down until the initialization is complete.
 If just xcvrd crashes and restarts, then forced re-initialization (CMIS reinit + NPU SI settings notification) of ports will not be performed. Hence, the ports will not experience link downtime during scenario.
-CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS keys in PORT_TABLE:\<port\> (APPL_DB) are used to determine if port re-initialization is required or not.
+CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS keys in PORT_TABLE:\<port\> (APPL_DB) are used to determine if port re-initialization is required or not.
  - CMIS_REINIT_REQUIRED key states if CMIS re-initialization is required for a port after xcvrd is spawned. CMIS_REINIT_REQUIRED helps in mainly driving CMIS re-initialization after syncd/swss/orchagent crash since it will allow reinitializing ports belonging to the relevant namespace of the crashing process. This key is not planned to drive CMIS initialization after transceiver insertion.
  - NPU_SI_SETTINGS_SYNC_STATUS key is used as a means to communicate the status of applying NPU SI settings for a transceiver requiring NPU SI settings. This key is used to update the NPU SI settings application status between SfpStateUpdateTask, CmisManagerTask and Orchagent. In case of warm reboot or xcvrd restart, this key will prevent application of NPU SI settings on the port if the settings are already applied. In case of transceiver insertion, NPU SI settings will be applied irrespective of the NPU SI settings application status for the port.
+ Also, this key helps in preventing additional link flap which can happen if CMIS state machine initializes a port before NPU SI settings are applied (since application of NPU SI settings involves disabling the port followed by enabling it).
 In case of continuous restart of xcvrd, both the keys will still hold the same value as before the restart. This would ensure that the port re-initialization is resumed from the last known state.
-Following infra will ensure port re-initialization by xcvrd in case of syncd/swss/orchagent crash
+Rationale for choosing APPL_DB over STATE_DB for storing CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS keys
+ - APPL_DB is the first DB which gets cleared as part of SWSS crash handling. This inturn helps XCVRD to detect DB removal and terminate child threads and itself earlier than other DBs.
+ - The PORT_TABLE in STATE_DB has multiple fields which can dynamically change (such as netdev_oper_status and host_tx_ready) more often than PORT_TABLE in APPL_DB. Hence, XCVRD will receive less events if it watches PORT_TABLE in APPL_DB. The watch is needed to kill XCVRD and its child threads during syncd/swss/orcahagent crash
+Following infra will ensure port re-initialization by xcvrd in case of syncd/swss/orchagent crash
 Pre-requisites for the infra to work:
  - APPL_DB should be cleared after syncd/swss/orchagent crash/restart, config reload and cold reboot
  - APPL_DB should be retained after
@@ -217,7 +222,7 @@ A transceiver is classified as CMIS SM driven transceiver if its module type is
 | Value | Modifier thread and event | Consumer thread and purpose |
 | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
 | NPU_SI_SETTINGS_DEFAULT | 1\. XCVRD main thread during cold start of XCVRD<br>2. SfpStateUpdateTask during transceiver removal | XCVRD main thread during boot-up for deciding to notify NPU SI settings |
-| NPU_SI_SETTINGS_NOTIFIED | 1\. SfpStateUpdateTask while updating and notifying the NPU SI settings for non-CMIS SM driven transceivers<br>2. CmisManagerTask while updating and notifying the NPU SI settings for CMIS SM driven transceivers | Not being used currently |
+| NPU_SI_SETTINGS_NOTIFIED | 1\. SfpStateUpdateTask while updating and notifying the NPU SI settings for non-CMIS SM driven transceivers (this approach was chosen to preserve the existing behavior for non-CMIS SM driven transceivers)<br>2. CmisManagerTask while updating and notifying the NPU SI settings for CMIS SM driven transceivers | Not being used currently |
 | NPU_SI_SETTINGS_DONE | Orchagent after applying the SI settings | CmisManagerTask for proceeding from CMIS_STATE_NPU_SI_SETTINGS_WAIT to CMIS_STATE_DP_INIT state |
From ce1712180c9fa5b6e4a06a999b8393b0c36b1016 Mon Sep 17 00:00:00 2001
From: Mihir Patel
Date: Fri, 5 Jan 2024 06:50:24 +0000
Subject: [PATCH 16/16] Replaced usage of PORT_TABLE for storing the keys from APPL_DB to STATE_DB
---
 .../Interface-Link-bring-up-sequence.md | 75 +++++++++----------
 1 file changed, 36 insertions(+), 39 deletions(-)
diff --git a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
index 882d53691e..351fcb8dc5 100644
--- a/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
+++ b/doc/sfp-cmis/Interface-Link-bring-up-sequence.md
@@ -192,28 +192,24 @@ if transceiver is not present:
 ## Overview
 When syncd/swss/orchagent crashes, all ports in the corresponding namespace will be reinitialized by xcvrd irrespective of the current state of the port. All the corresponding ports are expected to experience link down until the initialization is complete.
-If just xcvrd crashes and restarts, then forced re-initialization (CMIS reinit + NPU SI settings notification) of ports will not be performed. Hence, the ports will not experience link downtime during scenario.
-CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS keys in PORT_TABLE:\<port\> (APPL_DB) are used to determine if port re-initialization is required or not.
+If just xcvrd crashes and restarts, then forced re-initialization (CMIS reinit + NPU SI settings notification) of ports will not be performed. Hence, the ports will not experience link downtime during this scenario.
+CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS keys in PORT_TABLE|\<port\> (STATE_DB) are used to determine if port re-initialization is required or not.
  - CMIS_REINIT_REQUIRED key states if CMIS re-initialization is required for a port after xcvrd is spawned. CMIS_REINIT_REQUIRED helps in mainly driving CMIS re-initialization after syncd/swss/orchagent crash since it will allow reinitializing ports belonging to the relevant namespace of the crashing process. This key is not planned to drive CMIS initialization after transceiver insertion.
  - NPU_SI_SETTINGS_SYNC_STATUS key is used as a means to communicate the status of applying NPU SI settings for a transceiver requiring NPU SI settings. This key is used to update the NPU SI settings application status between SfpStateUpdateTask, CmisManagerTask and Orchagent. In case of warm reboot or xcvrd restart, this key will prevent application of NPU SI settings on the port if the settings are already applied. In case of transceiver insertion, NPU SI settings will be applied irrespective of the NPU SI settings application status for the port.
- Also, this key helps in preventing additional link flap which can happen if CMIS state machine initializes a port before NPU SI settings are applied (since application of NPU SI settings involves disabling the port followed by enabling it).
+ Also, this key helps in preventing additional link flap which can happen if CMIS state machine initializes a port before NPU SI settings are applied (since application of NPU SI settings involves disable followed by enable of it).
 In case of continuous restart of xcvrd, both the keys will still hold the same value as before the restart. This would ensure that the port re-initialization is resumed from the last known state.
-Rationale for choosing APPL_DB over STATE_DB for storing CMIS_REINIT_REQUIRED and NPU_SI_SETTINGS_SYNC_STATUS keys
- - APPL_DB is the first DB which gets cleared as part of SWSS crash handling. This inturn helps XCVRD to detect DB removal and terminate child threads and itself earlier than other DBs.
- - The PORT_TABLE in STATE_DB has multiple fields which can dynamically change (such as netdev_oper_status and host_tx_ready) more often than PORT_TABLE in APPL_DB. Hence, XCVRD will receive less events if it watches PORT_TABLE in APPL_DB. The watch is needed to kill XCVRD and its child threads during syncd/swss/orcahagent crash
-
-Following infra will ensure port re-initialization by xcvrd in case of syncd/swss/orchagent crash
+Following infra will ensure port re-initialization by xcvrd in case of syncd/swss/orchagent crash
 Pre-requisites for the infra to work:
- - APPL_DB should be cleared after syncd/swss/orchagent crash/restart, config reload and cold reboot
- - APPL_DB should be retained after
+ - All PORT_TABLE|Ethernet* keys of STATE_DB should be cleared after syncd/swss/orchagent crash/restart, config reload and cold reboot
+ - All PORT_TABLE|Ethernet* keys of STATE_DB should be retained after
      - pmon crash/restart
      - xcvrd crash/restart (does not apply to xcvrd restart triggered due to a different docker container crash/restart)
      - warm reboot
 1. XCVRD main thread init
- - XCVRD main thread creates the key CMIS_REINIT_REQUIRED in PORT_TABLE:\<port\> (APPL_DB) with value as true for ports which do NOT have this key present
- - XCVRD main thread creates the key NPU_SI_SETTINGS_SYNC_STATUS in PORT_TABLE:\<port\> (APPL_DB) with value NPU_SI_SETTINGS_DEFAULT for ports which do NOT have this key present.
+ - XCVRD main thread creates the key CMIS_REINIT_REQUIRED in PORT_TABLE|\<port\> (STATE_DB) with value as true for ports which do NOT have this key present
+ - XCVRD main thread creates the key NPU_SI_SETTINGS_SYNC_STATUS in PORT_TABLE|\<port\> (STATE_DB) with value NPU_SI_SETTINGS_DEFAULT for ports which do NOT have this key present.
  - For transceivers which do not require NPU SI settings, NPU_SI_SETTINGS_SYNC_STATUS will stay with value NPU_SI_SETTINGS_DEFAULT
 Following table describes the various values for NPU_SI_SETTINGS_SYNC_STATUS
@@ -227,11 +223,11 @@ A transceiver is classified as CMIS SM driven transceiver if its module type is
 2. Update and notify NPU SI settings to OA
- The API is_npu_si_settings_update_required will return true if a module requires NPU SI settings and PORT_TABLE:\<port\>.NPU_SI_SETTINGS_SYNC_STATUS == NPU_SI_SETTINGS_DEFAULT. It will return false in other cases
+ The API is_npu_si_settings_update_required will return true if a module requires NPU SI settings and PORT_TABLE|\<port\>.NPU_SI_SETTINGS_SYNC_STATUS == NPU_SI_SETTINGS_DEFAULT. It will return false in other cases
- For non-CMIS SM driven transceivers, if is_npu_si_settings_update_required returns true, SfpStateUpdateTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and OA will be notified eventually. Also, NPU_SI_SETTINGS_SYNC_STATUS will be set to NPU_SI_SETTINGS_NOTIFIED.
+ For non-CMIS SM driven transceivers, if is_npu_si_settings_update_required returns true, SfpStateUpdateTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and OA will be notified eventually. Also, NPU_SI_SETTINGS_SYNC_STATUS will be set to NPU_SI_SETTINGS_NOTIFIED in PORT_TABLE (STATE_DB).
- For CMIS SM driven transceivers, if is_npu_si_settings_update_required returns true, CmisManagerTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and OA will be notified eventually. Also, NPU_SI_SETTINGS_SYNC_STATUS will be set to NPU_SI_SETTINGS_NOTIFIED.
+ For CMIS SM driven transceivers, if is_npu_si_settings_update_required returns true, CmisManagerTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and OA will be notified eventually. Also, NPU_SI_SETTINGS_SYNC_STATUS will be set to NPU_SI_SETTINGS_NOTIFIED in PORT_TABLE (STATE_DB).
The CMIS SM will then transition from CMIS_STATE_AP_CONF to CMIS_STATE_NPU_SI_SETTINGS_WAIT. If port doesn't require NPU SI settings, CMIS SM will transition to CMIS_STATE_DP_INIT state. @@ -239,7 +235,7 @@ A transceiver is classified as CMIS SM driven transceiver if its module type is 3. The OA upon receiving NPU SI settings will - Disable port admin status - Request SAI-SDK to apply the NPU SI settings via syncd - - If SAI-SDK returns success, OA will update the PORT_TABLE:\.NPU_SI_SETTINGS_SYNC_STATUS to NPU_SI_SETTINGS_DONE + - If SAI-SDK returns success, OA will update the PORT_TABLE|\.NPU_SI_SETTINGS_SYNC_STATUS to NPU_SI_SETTINGS_DONE in STATE_DB - In case of failure, OA will log an error message and proceed to handling the next port 4. CMIS_STATE_NPU_SI_SETTINGS_WAIT state will wait for NPU_SI_SETTINGS_DONE and upon reaching to NPU_SI_SETTINGS_DONE, CMIS SM will transition to CMIS_STATE_DP_INIT state. @@ -247,18 +243,18 @@ There will be a timeout of 5s for every retry 5. The CmisManagerTask thread will set “CMIS_REINIT_REQUIRED" to false after CMIS SM reaches to a steady state (CMIS_STATE_UNKNOWN, CMIS_STATE_FAILED, CMIS_STATE_READY and CMIS_STATE_REMOVED) for the corresponding port -6. XCVRD will subscribe to PORT_TABLE in APPL_DB and trigger self-restart if the PORT_TABLE is deleted for the namespace. +6. XCVRD will subscribe to PORT_TABLE in STATE_DB and trigger self-restart if the PORT_TABLE|Ethernet* is deleted for the namespace. All threads will be gracefully terminated and xcvrd deinit will be performed followed by issuing a SIGABRT to ensure XCVRD is restarted automatically by supervisord. After respawn, CMIS re-init and NPU_SI_SETTINGS notified is triggered for the ports belonging to the affected namespace -7. syncd/swss/orchagent restart (restart triggered due to docker container crash) clears the entire APPL-DB, including “NPU_SI_SETTINGS_SYNC_STATUS” and "CMIS_REINIT_REQUIRED" in PORT_TABLE +7. 
syncd/swss/orchagent restart (restart triggered due to docker container crash) clears the entire APPL-DB and PORT_TABLE|Ethernet* of STATE_DB (including “NPU_SI_SETTINGS_SYNC_STATUS” and "CMIS_REINIT_REQUIRED" keys in PORT_TABLE of STATE_DB) -8. In case of warm reboot, the APPL_DB is not cleared and hence, once xcvrd is spawned after the reboot, the ports are not initialized again. +8. In case of warm reboot, the PORT_TABLE in STATE_DB is not cleared. Hence, once xcvrd is spawned after the device reboot, the ports are not initialized again. ## XCVRD init sequence to support port re-initialization during syncd/swss/orchagent crash ```mermaid sequenceDiagram - participant APPL_DB as APPL_DB@asic_n + participant STATE_DB as STATE_DB@asic_n participant XCVRDMT as XCVRD main thread participant CmisManagerTask participant SfpStateUpdateTask @@ -267,14 +263,14 @@ sequenceDiagram Note over XCVRDMT: Load new platform specific api class,
sfputil class and load namespace details
load_NPU_SI_SETTINGS() XCVRDMT ->> XCVRDMT: Wait for port config completion loop lport in logical_port_list - alt if CMIS_REINIT_REQUIRED not in PORT_TABLE: - XCVRDMT ->> APPL_DB: PORT_TABLE:.CMIS_REINIT_REQUIRED = true + alt if CMIS_REINIT_REQUIRED not in PORT_TABLE| + XCVRDMT ->> STATE_DB: PORT_TABLE|.CMIS_REINIT_REQUIRED = true end - alt if NPU_SI_SETTINGS_SYNC_STATUS not in PORT_TABLE: - XCVRDMT ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DEFAULT + alt if NPU_SI_SETTINGS_SYNC_STATUS not in PORT_TABLE| + XCVRDMT ->> STATE_DB: PORT_TABLE|.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DEFAULT end end - Note over APPL_DB: PORT_TABLE:
CMIS_REINIT_REQUIRED : true/false
NPU_SI_SETTINGS_SYNC_STATUS : NPU_SI_SETTINGS_DEFAULT/NPU_SI_SETTINGS_NOTIFIED/NPU_SI_SETTINGS_DONE + Note over STATE_DB: PORT_TABLE|
CMIS_REINIT_REQUIRED : true/false
NPU_SI_SETTINGS_SYNC_STATUS : NPU_SI_SETTINGS_DEFAULT/NPU_SI_SETTINGS_NOTIFIED/NPU_SI_SETTINGS_DONE XCVRDMT ->> CmisManagerTask: Spawns XCVRDMT ->> DomInfoUpdateTask: Spawns XCVRDMT ->> SfpStateUpdateTask: Spawns @@ -283,7 +279,7 @@ sequenceDiagram DomInfoUpdateTask->>DomInfoUpdateTask: Update TRANSCEIVER_DOM_SENSOR,
TRANSCEIVER_STATUS (HW section)
TRANSCEIVER_PM tables end loop Wait for stop_event - XCVRDMT->>XCVRDMT: Check for changes in APPL_DB:PORT_TABLE and act upon receiving DEL event + XCVRDMT->>XCVRDMT: Check for changes in STATE_DB:PORT_TABLE and act upon receiving DEL event end Note over CmisManagerTask: Subscribe to CONFIG_DB:PORT,
STATE_DB:TRANSCEIVER_INFO and STATE_DB:PORT_TABLE loop Wait for stop_event @@ -302,6 +298,7 @@ sequenceDiagram sequenceDiagram participant OA participant APPL_DB as APPL_DB@asic_n + participant STATE_DB as STATE_DB@asic_n participant SfpStateUpdateTask Note over SfpStateUpdateTask: Subscribe to CONFIG_DB:PORT,
STATE_DB:TRANSCEIVER_INFO and STATE_DB:PORT_TABLE @@ -311,10 +308,10 @@ sequenceDiagram Note over SfpStateUpdateTask: post_port_dom_threshold_info_to_db opt if not is_module_cmis_sm_driven and is_npu_si_settings_update_required SfpStateUpdateTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE: - SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_NOTIFIED + SfpStateUpdateTask ->> STATE_DB: PORT_TABLE|.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_NOTIFIED APPL_DB -->> OA: Notify NPU SI settings for ports Note over OA: Disable admin status
setPortSerdesAttribute - OA ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE + OA ->> STATE_DB: PORT_TABLE|.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE Note over OA: initHostTxReadyState end else @@ -347,7 +344,7 @@ stateDiagram CMIS_STATE_DP_DEINIT --> CMIS_STATE_AP_CONF - note left of CMIS_STATE_AP_CONF : Ensure current states are ModuleReady and DataPathDeactivated
Configure laser frequency for ZR module
Apply module SI settings
Update NPU SI settings to PORT_TABLE (APPL_DB) and notify to OA
set_application + note left of CMIS_STATE_AP_CONF : Ensure current states are ModuleReady and DataPathDeactivated
Configure laser frequency for ZR module
Apply module SI settings
Update NPU SI settings to PORT_TABLE (APPL_DB) and notify OA
Set PORT_TABLE|lport.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_NOTIFIED (STATE_DB)
set_application CMIS_STATE_AP_CONF --> if_state3 if_state3 --> CMIS_STATE_NPU_SI_SETTINGS_WAIT : if is_npu_si_settings_update_required if_state3 --> CMIS_STATE_DP_INIT : if not is_npu_si_settings_update_required @@ -376,7 +373,7 @@ sequenceDiagram SfpStateUpdateTask -x STATE_DB : Delete TRANSCEIVER_INFO table for the port par CmisManagerTask, SfpStateUpdateTask CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_REMOVED - SfpStateUpdateTask ->> APPL_DB : PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS =
NPU_SI_SETTINGS_DEFAULT + SfpStateUpdateTask ->> STATE_DB : PORT_TABLE|.NPU_SI_SETTINGS_SYNC_STATUS =
NPU_SI_SETTINGS_DEFAULT end SfpStateUpdateTask ->> SfpStateUpdateTask : event = SFP_STATUS_INSERTED @@ -386,18 +383,18 @@ sequenceDiagram CmisManagerTask ->> CmisManagerTask : Eventually, CMIS SM transitions to CMIS_STATE_AP_CONF opt not is_module_cmis_sm_driven and is_npu_si_settings_update_required SfpStateUpdateTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE: - SfpStateUpdateTask ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS =
NPU_SI_SETTINGS_NOTIFIED + SfpStateUpdateTask ->> STATE_DB: PORT_TABLE|.NPU_SI_SETTINGS_SYNC_STATUS =
NPU_SI_SETTINGS_NOTIFIED end opt is_module_cmis_sm_driven and is_npu_si_settings_update_required CmisManagerTask ->> APPL_DB: Update SI params from media_settings.json to PORT_TABLE: - CmisManagerTask ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS =
NPU_SI_SETTINGS_NOTIFIED + CmisManagerTask ->> STATE_DB: PORT_TABLE|.NPU_SI_SETTINGS_SYNC_STATUS =
NPU_SI_SETTINGS_NOTIFIED CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_NPU_SI_SETTINGS_WAIT end activate OA APPL_DB -->> OA: Notify NPU SI settings for ports Note over OA: Disable admin status
setPortSerdesAttribute - OA ->> APPL_DB: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE - APPL_DB --> CmisManagerTask: PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE + OA ->> STATE_DB: PORT_TABLE|.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE + STATE_DB --> CmisManagerTask: PORT_TABLE|.NPU_SI_SETTINGS_SYNC_STATUS = NPU_SI_SETTINGS_DONE Note over OA: initHostTxReadyState deactivate OA CmisManagerTask ->> CmisManagerTask : Transition CMIS SM to CMIS_STATE_DP_INIT @@ -412,7 +409,7 @@ The below sequence diagram captures the termination of XCVRD during syncd/swss/o ```mermaid sequenceDiagram participant OA - participant APPL_DB as APPL_DB@asic_n + participant STATE_DB as STATE_DB@asic_n participant XCVRDMT as XCVRD main thread participant CmisManagerTask participant DomInfoUpdateTask @@ -425,9 +422,9 @@ sequenceDiagram activate SfpStateUpdateTask OA -x OA: Crashes while handling a routine deactivate OA - OA ->> APPL_DB : DEL PORT_TABLE + OA ->> STATE_DB : DEL PORT_TABLE - XCVRDMT -x APPL_DB : XCVRD main thread proecesses DEL event of APPL_DB PORT_TABLE + XCVRDMT -x STATE_DB : XCVRD main thread processes DEL event of STATE_DB PORT_TABLE Note over XCVRDMT: generate_sigabrt = True alt If threads > 0 are dead XCVRDMT -x XCVRDMT : Kill XCVRD with SIGKILL @@ -453,7 +450,7 @@ sequenceDiagram ## Test plan and expectation **Process and device crash/restart and interface config command handling testplan** -| Event | APPL_DB_ cleared | Xcvrd restarted | NPU SI settings renotify | NPU_SI_SETTINGS_SYNC_STATUS value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap | +| Event | STATE_DB_ cleared | Xcvrd restarted | NPU SI settings renotify | NPU_SI_SETTINGS_SYNC_STATUS value on xcvrd boot-up for initialized transceiver | CMIS re-init triggered | Link flap | | -------------- | ------------------------ | --------------- | ------------------------ | 
------------------------------------------------------------------------------ | ---------------------- | --------- | | Xcvrd restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N | | Pmon restart | N | Y | N | NPU_SI_SETTINGS_DONE | N | N | @@ -467,7 +464,7 @@ sequenceDiagram | config interface startup | N | N | N | NPU_SI_SETTINGS_DONE | N | N/A | **Transceiver OIR testplan** -| Event | APPL_DB_ cleared | Xcvrd restarted | NPU SI settings notified | NPU_SI_SETTINGS_SYNC_STATUS value upon event completion | CMIS init triggered | +| Event | STATE_DB_ cleared | Xcvrd restarted | NPU SI settings notified | NPU_SI_SETTINGS_SYNC_STATUS value upon event completion | CMIS init triggered | | -------------- | ------------------------ | --------------- | ------------------------ | ------------------------------------------------------------------------------ | ---------------------- | | Transceiver Removal | N | N | Y | NPU_SI_SETTINGS_DEFAULT | N/A | | Transceiver Insertion | N | N | Y | NPU_SI_SETTINGS_DONE | Y |