From ee69977618cee1747ea2f0ef6375bde29b66b9ad Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Thu, 9 Feb 2023 17:35:53 -0500 Subject: [PATCH 01/19] Update CMIS_and_C-CMIS_support_for_ZR.md --- .../CMIS_and_C-CMIS_support_for_ZR.md | 101 ++++++++++++++++++ 1 file changed, 101 insertions(+) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index c462481429..7eb988636c 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1632,3 +1632,104 @@ def write_cdb(port,cmd): write_reg(port, LPLPAGE, INIT_OFFSET+CMDLEN, len(cmd)-CMDLEN, cmd[CMDLEN:]) write_reg(port, LPLPAGE, INIT_OFFSET, CMDLEN, cmd[:CMDLEN]) ``` + +### 7.Performance Monitoring in 400-G ZR module +#### 7.1 Overview +Performance monitoring in 400G ZR/CCMIS optical modules is essential for detecting link degradation and correction. It can be used to compare optical link performance against desired parameters and benchmarks, providing valuable insight into the overall health of the interface link. Below sub-sections will walk through the CLI syntax, output format and high level design. + +#### 7.2 PM parameters +Please refer to the [2.1.5 Transceiver PM Table](https://github.com/sonic-net/SONiC/edit/master/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md#215-transceiver-pm-table) for the parameters that will be monitored with this CLI. + + +#### 7.3 CLI Sub-options and Syntax +``` +#show int transceiver ​ +commands:​ +pm show interface transceiver performance monitoring​ +​ +#Show int trans pm ​ +commands:​ +Current show current pm data​ +history show historical pm data​ + +​ +#show int trans pm current ​ +commands:​ +60sec show cumulative pm statistics for 60sec time window. 
​ +15min show cumulative pm statistics for 15min time window.​ +24Hr show cumulative pm statistics for 24Hr time window.​ + Without time window, the CLI will display the current snapshot of pm parameter.​ + +#show int trans pm history​ +commands:​ +60sec show cumulative pm statistics for 36sec time window. ​ +15min show cumulative pm statistics for 15min time window.​ +24Hr show cumulative pm statistics for 24Hr time window.​ +​ +#show int trans pm history 30sec window -n asic0 Ethernet0​ +​ +Optional subset display:​ +#Show int trans pm current 60sec –n asic0 Ethernet0​ +commands:​ +fec shows pm fec data​ +``` +#### 7.4 CLI Sample Output format +``` +root@sonic:/home/cisco# show interface transceiver pm history 60sec window 1 -n asic0 Ethernet2 +Tue Jan 31 09:25:16 UTC 2023 +PM window: 60 sec +PM window start time: Tue Jan 31 09:24:03 UTC 2023 +Ethernet2: + Parameter Unit Min Avg Max Threshold Threshold Threshold Threshold Threshold Threshold + High High Crossing Low Low Crossing + Alarm Warning Alert-High Alarm Warning Alert-Low + --------------- ------ -------- -------- -------- ----------- ----------- ------------ ----------- ----------- ----------- + Tx Power dBm -8.22 -8.23 -8.24 -5.0 -6.0 False -16.99 -16.003 False + Rx Total Power dBm -10.61 -10.62 -10.62 2.0 0.0 False -21.0 -18.0 False + Rx Signal Power dBm -40.0 0.0 40.0 13.0 10.0 True -18.0 -15.0 True + CD-short link ps/nm 0.0 0.0 0.0 1000.0 500.0 False -1000.0 -500.0 False + PDL dB 0.5 0.6 0.6 4.0 4.0 False 0.0 0.0 False + OSNR dB 36.5 36.5 36.5 99.0 99.0 False 0.0 0.0 False + eSNR dB 30.5 30.5 30.5 99.0 99.0 False 0.0 0.0 False + CFO MHz 54.0 70.0 121.0 3800.0 3800.0 False -3800.0 -3800.0 False + DGD ps 5.37 5.56 5.81 7.0 7.0 False 0.0 0.0 False + SOPMD ps^2 0.0 0.0 0.0 655.35 655.35 False 0.0 0.0 False + SOP ROC krad/s 1.0 1.0 2.0 N/A N/A N/A N/A N/A N/A + Pre-FEC BER N/A 4.58E-04 4.66E-04 5.76E-04 1.25E-02 1.10E-02 0.0 0.0 0.0 0.0 + Post-FEC BER N/A 0.0 0.0 0.0 1000.0 1.0 False 0.0 0.0 False + EVM % 
100.0 100.0 100.0 N/A N/A N/A N/A N/A N/A +``` +#### 7.5 High Level Design + +##### 7.5.1 Configurations +1. performance monitor enable - Global CLI to enable PM on all ports. +Before this configuration get implemented, PM will be enabled by default on all ports + +##### 7.5.2 Tables: +1. The PM window table consists of the following: + - Window number + - Start timestamp of PM window + - Parameters from PM table + +2. Total 28 PM window slots per interface. + - 15 slots of 60sec window (15mins of history). + - 12 slots of 15min window (3Hr of history) and + - 1 slot of 24hr window. + + + + +##### 7.5.3 Platform specific flags/inputs: +"xcvrd_pm_poll_interval" - Platform to define the PM polling periodicity as 30sec or 60sec of the PM thread which will be fed as input argument. When the arg is not defined, default periodicity is 60sec. + +The PM parameter polling period option is given as there will be platform which requires more time for IO read . + + **HLD pointers:** + 1. PM statistics will be sampled for pre-defined time window of 60sec, 15min and 24Hr. The pre-defined time window period is arrived based on the PM application usage in real time. + 2. New thread (PM thread) will be created to periodically fetch and update the PM window table. + + + + + 3. PM history CLI command data will be fetched from PM window slot table from State-DB. + 4. PM current CLI command PM data will be fetched from Module. 
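A note on the slot layout in 7.5.2 of this patch: the 28 slots per interface (15 of 60sec, 12 of 15min, 1 of 24hr) behave like three fixed-size rings. The following is only a minimal sketch of that bookkeeping, not the xcvrd implementation. The class name `PortPmWindows`, its methods, and the convention that window 1 is the most recently completed window are assumptions made for illustration.

```python
from collections import deque

# Slot counts per window type, as listed in section 7.5.2 of this patch.
SLOT_COUNTS = {"60sec": 15, "15min": 12, "24Hr": 1}


class PortPmWindows:
    """Hypothetical in-memory container for one port's PM window slots."""

    def __init__(self):
        # A bounded deque discards the oldest window once the ring is full.
        self.slots = {name: deque(maxlen=n) for name, n in SLOT_COUNTS.items()}

    def push(self, window_type, start_time, params):
        """Store a completed window; the newest entry sits at index 0."""
        self.slots[window_type].appendleft(
            {"start_time": start_time, "params": params}
        )

    def history(self, window_type, window_number):
        """Return window N (assumed: 1 = most recent), or None if absent."""
        ring = self.slots[window_type]
        if 1 <= window_number <= len(ring):
            return ring[window_number - 1]
        return None


windows = PortPmWindows()
for minute in range(20):
    windows.push("60sec", minute, {"tx_power_avg": -8.23})
```

After 20 pushes only the last 15 one-minute windows remain, which matches the "15mins of history" stated for the 60sec slots.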
From 0604af7aa9410b8f4129ff27898e9b38059adcc6 Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Fri, 10 Mar 2023 13:14:38 -0500 Subject: [PATCH 02/19] Update CMIS_and_C-CMIS_support_for_ZR.md --- doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index 7eb988636c..b5d3315afa 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1655,16 +1655,16 @@ history show historical pm data​ ​ #show int trans pm current ​ commands:​ -60sec show cumulative pm statistics for 60sec time window. ​ -15min show cumulative pm statistics for 15min time window.​ -24Hr show cumulative pm statistics for 24Hr time window.​ +60sec show cumulative pm statistics for 60sec time window for the current window. ​ +15min show cumulative pm statistics for 15min time window for the current window.​ +24Hr show cumulative pm statistics for 24Hr time window for the current window.​ Without time window, the CLI will display the current snapshot of pm parameter.​ #show int trans pm history​ commands:​ -60sec show cumulative pm statistics for 36sec time window. 
​ -15min show cumulative pm statistics for 15min time window.​ -24Hr show cumulative pm statistics for 24Hr time window.​ +60sec show cumulative pm statistics for 60sec time window for the given window number.​ +15min show cumulative pm statistics for 15min time window for the given window number.​ +24Hr show cumulative pm statistics for 24Hr time window for the given window number.​ ​ #show int trans pm history 30sec window -n asic0 Ethernet0​ ​ From 64c949fa8b6b88cbefbd80b57ab598e55e5fb648 Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Wed, 12 Apr 2023 13:43:52 -0400 Subject: [PATCH 03/19] Update CMIS_and_C-CMIS_support_for_ZR.md --- doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index b5d3315afa..21090eed05 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1638,7 +1638,7 @@ def write_cdb(port,cmd): Performance monitoring in 400G ZR/CCMIS optical modules is essential for detecting link degradation and correction. It can be used to compare optical link performance against desired parameters and benchmarks, providing valuable insight into the overall health of the interface link. Below sub-sections will walk through the CLI syntax, output format and high level design. #### 7.2 PM parameters -Please refer to the [2.1.5 Transceiver PM Table](https://github.com/sonic-net/SONiC/edit/master/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md#215-transceiver-pm-table) for the parameters that will be monitored with this CLI. 
+Please refer to the [2.1.5 Transceiver PM Table]https://github.com/sonic-net/SONiC/blob/c91b25ed8c79cb6e415e6c999affc309e35200f2/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md#215-transceiver-pm-table) for the parameters that will be monitored with this CLI. #### 7.3 CLI Sub-options and Syntax From e5e1e6371e3d1d3d1363056915afcc7111cab30e Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Tue, 18 Apr 2023 10:53:03 -0400 Subject: [PATCH 04/19] Update CMIS_and_C-CMIS_support_for_ZR.md --- doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index 21090eed05..534a20cce1 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1635,7 +1635,7 @@ def write_cdb(port,cmd): ### 7.Performance Monitoring in 400-G ZR module #### 7.1 Overview -Performance monitoring in 400G ZR/CCMIS optical modules is essential for detecting link degradation and correction. It can be used to compare optical link performance against desired parameters and benchmarks, providing valuable insight into the overall health of the interface link. Below sub-sections will walk through the CLI syntax, output format and high level design. +Performance monitoring in 400G ZR/CCMIS optical modules is essential for detecting link degradation and correction. It can be used to compare optical link performance against desired parameters and benchmarks, providing valuable insight into the overall health of the interface link. Below sub-sections will walk through the CLI syntax, output format and high level design. Currently the performance monitoring will be done only for 400G-ZR modules. 
#### 7.2 PM parameters Please refer to the [2.1.5 Transceiver PM Table]https://github.com/sonic-net/SONiC/blob/c91b25ed8c79cb6e415e6c999affc309e35200f2/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md#215-transceiver-pm-table) for the parameters that will be monitored with this CLI. @@ -1733,3 +1733,6 @@ The PM parameter polling period option is given as there will be platform which 3. PM history CLI command data will be fetched from PM window slot table from State-DB. 4. PM current CLI command PM data will be fetched from Module. + 5. PM statistics slot for the port is cleared when an optics is inserted/deleted to/from the port. + 6. When xcvrd process is restarted, PM statistics colection will be resumed once the ZR module SW DP state become CMIS_READY. + From 4578e77fc617bc3884a3de345905e65543a780d7 Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Tue, 18 Apr 2023 10:55:21 -0400 Subject: [PATCH 05/19] Update CMIS_and_C-CMIS_support_for_ZR.md --- doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index 534a20cce1..0dfb6ffae5 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1727,7 +1727,7 @@ The PM parameter polling period option is given as there will be platform which **HLD pointers:** 1. PM statistics will be sampled for pre-defined time window of 60sec, 15min and 24Hr. The pre-defined time window period is arrived based on the PM application usage in real time. 2. New thread (PM thread) will be created to periodically fetch and update the PM window table. 
- + From 4a1663eb3ba211757cc89222107954a59fef0888 Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Tue, 18 Apr 2023 10:59:41 -0400 Subject: [PATCH 06/19] Update CMIS_and_C-CMIS_support_for_ZR.md --- doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index 0dfb6ffae5..f92f265e89 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1734,5 +1734,5 @@ The PM parameter polling period option is given as there will be platform which 3. PM history CLI command data will be fetched from PM window slot table from State-DB. 4. PM current CLI command PM data will be fetched from Module. 5. PM statistics slot for the port is cleared when an optics is inserted/deleted to/from the port. - 6. When xcvrd process is restarted, PM statistics colection will be resumed once the ZR module SW DP state become CMIS_READY. + 6. When xcvrd process is restarted, PM statistics collection will be resumed once the ZR module SW DP state become CMIS_READY for the port. 
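A note on HLD pointer 1: the patches up to this point say PM statistics are sampled into 60sec, 15min and 24Hr windows but do not give the arithmetic. One plausible way to fold a per-interval min/avg/max reading into a longer window is sketched here. The `PmStat` type, the `merge` function, and the choice of a count-weighted arithmetic mean over equal-length intervals are assumptions, not something the document specifies.

```python
from dataclasses import dataclass


@dataclass
class PmStat:
    """Cumulative min/avg/max of one PM parameter (hypothetical type)."""
    minimum: float
    average: float
    maximum: float
    count: int = 1  # number of equal-length intervals folded in so far


def merge(window: PmStat, sample: PmStat) -> PmStat:
    """Fold a new interval reading into a longer window's running statistics."""
    total = window.count + sample.count
    return PmStat(
        minimum=min(window.minimum, sample.minimum),
        # Assumption: intervals have equal length, so weight by interval count.
        average=(window.average * window.count
                 + sample.average * sample.count) / total,
        maximum=max(window.maximum, sample.maximum),
        count=total,
    )


fifteen_min = merge(PmStat(1.0, 2.0, 3.0), PmStat(0.0, 4.0, 5.0))
fifteen_min = merge(fifteen_min, PmStat(2.0, 6.0, 7.0))
```

Whether a plain arithmetic mean is appropriate for logarithmic quantities such as dBm power or for BER is a design question the HLD leaves open.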
From 09b6e3e98090fc644c4e06b0483c171d9122f5c7 Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Mon, 8 May 2023 15:34:02 -0400 Subject: [PATCH 07/19] Update CMIS_and_C-CMIS_support_for_ZR.md Addressing the OCP review comments --- .../CMIS_and_C-CMIS_support_for_ZR.md | 146 ++++++++++++++---- 1 file changed, 120 insertions(+), 26 deletions(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index f92f265e89..cba40def8a 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1635,11 +1635,60 @@ def write_cdb(port,cmd): ### 7.Performance Monitoring in 400-G ZR module #### 7.1 Overview -Performance monitoring in 400G ZR/CCMIS optical modules is essential for detecting link degradation and correction. It can be used to compare optical link performance against desired parameters and benchmarks, providing valuable insight into the overall health of the interface link. Below sub-sections will walk through the CLI syntax, output format and high level design. Currently the performance monitoring will be done only for 400G-ZR modules. +Performance monitoring in 400G ZR/CCMIS optical modules is essential for detecting link degradation and correction. It involves measuring and analyzing various parameters of the optical signal, such as its power, wavelength, polarization, and phase. By monitoring these parameters, operators can detect and diagnose problems in the system, such as signal distortion, loss, or noise, and take corrective actions to maintain the performance of the system. +It can be used to compare optical link performance against desired parameters and benchmarks, providing valuable insight into the overall health of the interface link. Below sub-sections will walk through the CLI syntax, output format and high level design. 
Currently the performance monitoring will be executed only for 400G-ZR modules. -#### 7.2 PM parameters + **Solution:** +Over the defined PM interval time, the statistics are collected for the parameters defined in 7.2 from the transceiver, sampled and updated to fixed time window slots of TRANSCEIVER_PM_WINDOW_STATS table for each port in a linecard/box. +The statistics then can be displayed for a specific time window using a CLI. All this functionality will be done by xcvrd process in pmon container. + +#### 7.2 Transceiver PM Window statistics parameters Please refer to the [2.1.5 Transceiver PM Table]https://github.com/sonic-net/SONiC/blob/c91b25ed8c79cb6e415e6c999affc309e35200f2/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md#215-transceiver-pm-table) for the parameters that will be monitored with this CLI. + ; Defines Transceiver PM Window statistics table information for a port + key = TRANSCEIVER_PM_WINDOW_STATS|ifname ; information of PM on port + ; field = value + pm_win_num = INTEGER ; PM window number + pm_stat_start_time = 1*255VCHAR ; PM statistics start time for the window.
+ pm_win_period = 1*255VCHAR ; PM window time period + prefec_ber_avg = FLOAT ; prefec ber avg + prefec_ber_min = FLOAT ; prefec ber min + prefec_ber_max = FLOAT ; prefec ber max + cd_avg = FLOAT ; chromatic dispersion avg + cd_min = FLOAT ; chromatic dispersion min + cd_max = FLOAT ; chromatic dispersion max + dgd_avg = FLOAT ; differential group delay avg + dgd_min = FLOAT ; differential group delay min + dgd_max = FLOAT ; differential group delay max + sopmd_avg = FLOAT ; second order polarization mode dispersion avg + sopmd_min = FLOAT ; second order polarization mode dispersion min + sopmd_max = FLOAT ; second order polarization mode dispersion max + pdl_avg = FLOAT ; polarization dependent loss avg + pdl_min = FLOAT ; polarization dependent loss min + pdl_max = FLOAT ; polarization dependent loss max + osnr_avg = FLOAT ; optical signal to noise ratio avg + osnr_min = FLOAT ; optical signal to noise ratio min + osnr_max = FLOAT ; optical signal to noise ratio max + esnr_avg = FLOAT ; electrical signal to noise ratio avg + esnr_min = FLOAT ; electrical signal to noise ratio min + esnr_max = FLOAT ; electrical signal to noise ratio max + cfo_avg = FLOAT ; carrier frequency offset avg + cfo_min = FLOAT ; carrier frequency offset min + cfo_max = FLOAT ; carrier frequency offset max + soproc_avg = FLOAT ; state of polarization rate of change avg + soproc_min = FLOAT ; state of polarization rate of change min + soproc_max = FLOAT ; state of polarization rate of change max + tx_power_avg = FLOAT ; tx output power avg + tx_power_min = FLOAT ; tx output power min + tx_power_max = FLOAT ; tx output power max + rx_tot_power_avg = FLOAT ; rx total power avg + rx_tot_power_min = FLOAT ; rx total power min + rx_tot_power_max = FLOAT ; rx total power max + rx_sig_power_avg = FLOAT ; rx signal power avg + rx_sig_power_min = FLOAT ; rx signal power min + rx_sig_power_max = FLOAT ; rx signal power max + + #### 7.3 CLI Sub-options and Syntax ``` @@ -1666,13 +1715,14 @@ 
commands:​ 15min show cumulative pm statistics for 15min time window for the given window number.​ 24Hr show cumulative pm statistics for 24Hr time window for the given window number.​ ​ -#show int trans pm history 30sec window -n asic0 Ethernet0​ +#show int trans pm history 30sec window -n asic0 Ethernet0​ ​ Optional subset display:​ #Show int trans pm current 60sec –n asic0 Ethernet0​ commands:​ fec shows pm fec data​ ``` + #### 7.4 CLI Sample Output format ``` root@sonic:/home/cisco# show interface transceiver pm history 60sec window 1 -n asic0 Ethernet2 @@ -1705,34 +1755,78 @@ Ethernet2: 1. performance monitor enable - Global CLI to enable PM on all ports. Before this configuration get implemented, PM will be enabled by default on all ports -##### 7.5.2 Tables: -1. The PM window table consists of the following: - - Window number - - Start timestamp of PM window - - Parameters from PM table +##### 7.5.2 PM Interval and predefined time window: -2. Total 28 PM window slots per interface. - - 15 slots of 60sec window (15mins of history). - - 12 slots of 15min window (3Hr of history) and - - 1 slot of 24hr window. - +- ##### PM interval time/statistic collection interval: +Duration between two ‘VDM freeze’ requests issued by host, which the host collects the cumulative statistics for all PM parameter after the 2nd freeze request. It is a host-controlled monitoring interval. During this time, the module that supports statistics takes short term measurements which are also called samples over a module vendor specific fine measurement time interval (eg : 1ms) and then updates internal statistics variables(min, max, avg), thus providing cumulative statistics until the host issues the 2nd freeze request. - +This feature allows a platform(Pizza-box or distributed system-linecard with CPU ) to choose from following interval period for PM interval. 
+ - 30sec + - 60sec and + - 120sec +It is recommended to choose 30sec, platforms that have high CPU load can choose 120sec as PM interval. By default 60sec is the PM interval if no input provided by platform, please refer '7.5.4' for platform input. -##### 7.5.3 Platform specific flags/inputs: -"xcvrd_pm_poll_interval" - Platform to define the PM polling periodicity as 30sec or 60sec of the PM thread which will be fed as input argument. When the arg is not defined, default periodicity is 60sec. +- ##### Pre-defined PM time window: +The cumulative PM statistics over an interval of time is called a PM time window. The PM statistics will be reset after updating a time window and the statistics will be computed from samples collected from the start time of next ‘PM time window’. The cumulative statistics of a specific time window allow the user to estimate the quality of the link over a specific time slot. +The period of a time window and number of windows are not specified in any standard. In this feature three-time window intervals are defined with a granularity of 60seconds, 15mins and 24hour. The 60sec time window is based on the PM interval time and the 15mins and 24hour are defined for the ease of debuggability in realtime. For example, to understand/debug the link stability while bringing up an 400G-ZR interface connection on a topology, a 60sec statistics/current sample will be useful and four 15mins window statistics monitoring will be useful to understand the link stability from initial connection. +For the long term link health monitoring, device telemetry data can be analyzed to estimate the link health but it is out of scope of this HLD/feature. + +The number of time window for each granularity is as follow. 
+-15 ‘time windows’ of 60sec (at any given time user can view 15mins of statistics with 1 min granularity) +-12 ‘time windows’ of 15min (at any given time user can view 3Hr of history with 15min granularity) and +-2 ‘time windows’ of 24hr (at any given time user can view 24Hr of statistics). + +So total 29 PM time window slots will be maintained per port/interface when inserted with 400G-ZR module. + +- ##### Caveat: +Platform which choose 120sec as PM interval time, the 60sec granularity of 15 ‘time window’ is not valid/not computed as the chosen PM interval time is greater than 60sec. The CLI o/p is valid only for the 15min/24Hr predefined time window. + +#### 7.5.3 Examples: +- ##### CLI usage and expected output; example: (Platform with PM interval time : 30sec) + - ##### 15min time window: +``` + Show int trans pm current 15min –n asic0 Ethernet0 -The PM parameter polling period option is given as there will be platform which requires more time for IO read . + Assume the last 15min time window started at Time T0-> 9.45.00 by the PM thread running in xcvrd. + At T1 time->9:50:42, above pm current CLI with predefined time window interval of 15min is executed + The expected display is the cumulative statistics from time 9.45:00 to 9.50:30. (This is the current 15mins statistics in flight) + Show int trans pm history 15min window 2 –n asic0 Ethernet0 + + Assume the last 15min time window started at Time T0-> 9.44.00 by the PM thread running in xcvrd. + At T1 time->9:50:42, above pm history CLI with ‘predefined window’ of 15min and ‘window number’ 2 is executed + The expected display is the 2nd last 15mins cumulative statistics from time 9.43:30 to 9.43:45. - **HLD pointers:** - 1. PM statistics will be sampled for pre-defined time window of 60sec, 15min and 24Hr. The pre-defined time window period is arrived based on the PM application usage in real time. - 2. New thread (PM thread) will be created to periodically fetch and update the PM window table. 
- + 60sec time window: + + Show int trans pm history 60sec window 15 –n asic0 Ethernet0 + Assume the last PM statistics interval started at Time T0-> 9.44.00 by the PM thread running in xcvrd. + T2 time->9:45:44, above pm history CLI with ‘predefined window’ of 60sec and ‘window number’ 15 is executed. + the CLI o/p displays the cumulative data from time 9.00:00 to 9.01:00. + Show int trans pm current 60sec –n asic0 Ethernet0 + Assume the last PM statistics interval started at Time T0-> 9.44.00 by the PM thread running in xcvrd. + At T1 time->9:44:42, above pm current CLI with ‘predefined window’ of 60sec is executed + The expected display is the cumulative data from time 9.44:00 to 9.44:30 "this is the current 60sec window data which is in flight" +``` +For Platform with PM interval time of 60sec: the above CLI will display the statistics from 9:43:00 to 9:44:00, which will be same as history with window 1 as there is no PM statistics collected within the 60sec. + +- ##### Sampling for fixed time window; example: + In this example, PM started for the port at Apr 27 9.44.00 UTC 2023. At this time all the time window slots are empty and will be updated every PM interval time (default 60sec). +The 60sec time window are filled as it is read from module if the PM interval is 60sec, the 60sec sample collected from module is then sampled for 15mins and 24Hr window every 60seconds. + + + + +##### 7.5.4 Platform specific flags/inputs: +"xcvrd_pm_poll_interval" - Platform to define the PM polling periodicity as 30sec or 60sec of the PM thread which will be fed as input argument. When the arg is not defined, default periodicity is 60sec. + + +#### 7.5.5 High level Work flow: + 1. A new thread will be created 'PM thread' in xcvrd process to collect PM statistics every PM interval period from the 400G-ZR transciever and update both TRANSCEIVER_PM and TRANSCEIVER_PM_WINDOW_STATS. + 2. PM CLI command data will be fetched from PM window slot table from State-DB. + 3. 
PM statistics slot for the port is cleared when an optics is inserted/deleted to/from the port. + 4. When xcvrd process is restarted, PM statistics collection will be resumed. + - + - 3. PM history CLI command data will be fetched from PM window slot table from State-DB. - 4. PM current CLI command PM data will be fetched from Module. - 5. PM statistics slot for the port is cleared when an optics is inserted/deleted to/from the port. - 6. When xcvrd process is restarted, PM statistics collection will be resumed once the ZR module SW DP state become CMIS_READY for the port. From 6c31d76f05d775b6bdcbc0cc9c89ed69adbdb974 Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Wed, 10 May 2023 15:12:52 -0400 Subject: [PATCH 08/19] Update CMIS_and_C-CMIS_support_for_ZR.md 7.Performance Monitoring - 400-G ZR module --- .../CMIS_and_C-CMIS_support_for_ZR.md | 35 +++++++++++-------- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index cba40def8a..9a1636fa21 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1633,7 +1633,7 @@ def write_cdb(port,cmd): write_reg(port, LPLPAGE, INIT_OFFSET, CMDLEN, cmd[:CMDLEN]) ``` -### 7.Performance Monitoring in 400-G ZR module +### 7.Performance Monitoring - 400-G ZR module #### 7.1 Overview Performance monitoring in 400G ZR/CCMIS optical modules is essential for detecting link degradation and correction. It involves measuring and analyzing various parameters of the optical signal, such as its power, wavelength, polarization, and phase. By monitoring these parameters, operators can detect and diagnose problems in the system, such as signal distortion, loss, or noise, and take corrective actions to maintain the performance of the system. 
It can be used to compare optical link performance against desired parameters and benchmarks, providing valuable insight into the overall health of the interface link. Below sub-sections will walk through the CLI syntax, output format and high level design. Currently the performance monitoring will be executed only for 400G-ZR modules. @@ -1781,7 +1781,7 @@ So total 29 PM time window slots will be maintained per port/interface when inse - ##### Caveat: Platform which choose 120sec as PM interval time, the 60sec granularity of 15 ‘time window’ is not valid/not computed as the chosen PM interval time is greater than 60sec. The CLI o/p is valid only for the 15min/24Hr predefined time window. -#### 7.5.3 Examples: +##### 7.5.3 Examples: - ##### CLI usage and expected output; example: (Platform with PM interval time : 30sec) - ##### 15min time window: ``` @@ -1810,23 +1810,30 @@ Platform which choose 120sec as PM interval time, the 60sec granularity of 15 For Platform with PM interval time of 60sec: the above CLI will display the statistics from 9:43:00 to 9:44:00, which will be same as history with window 1 as there is no PM statistics collected within the 60sec. - ##### Sampling for fixed time window; example: - In this example, PM started for the port at Apr 27 9.44.00 UTC 2023. At this time all the time window slots are empty and will be updated every PM interval time (default 60sec). + In this example only Rx total power statistics parameter is displayed for simplicity, The assumption is PM started for the port at Apr 27 9.44.00 UTC 2023. At this time all the time window slots are empty and will be updated every PM interval time (default 60sec). The 60sec time window are filled as it is read from module if the PM interval is 60sec, the 60sec sample collected from module is then sampled for 15mins and 24Hr window every 60seconds. 
- + ##### 7.5.4 Platform specific flags/inputs: "xcvrd_pm_poll_interval" - Platform to define the PM polling periodicity as 30sec or 60sec of the PM thread which will be fed as input argument. When the arg is not defined, default periodicity is 60sec. -#### 7.5.5 High level Work flow: - 1. A new thread will be created 'PM thread' in xcvrd process to collect PM statistics every PM interval period from the 400G-ZR transciever and update both TRANSCEIVER_PM and TRANSCEIVER_PM_WINDOW_STATS. - 2. PM CLI command data will be fetched from PM window slot table from State-DB. - 3. PM statistics slot for the port is cleared when an optics is inserted/deleted to/from the port. - 4. When xcvrd process is restarted, PM statistics collection will be resumed. - - - - - +##### 7.5.5 High level Work flow: + 1. A new thread will be created 'PM thread' in xcvrd process to collect PM statistics between every PM interval period from the 400G-ZR transceiver and update both TRANSCEIVER_PM and TRANSCEIVER_PM_WINDOW_STATS, following is the work flow. + 2. PM thread to check "config pm enable" configuration presence to collect PM statistics. + 3. CMIS xcvrAPI to freeze the statistics in transceiver, record the timestamp and copy both recorded timestamp and PM statistics from transceiver. + 4. Above Freezing request will reset and start new statistics set, CMIS xcvrAPI to unfreeze the statistics register in transceiver. + 5. Update the copied PM statistics and timestamp to TRANSCEIVER_PM table. + 6. Fetch the data from TRANSCEIVER_PM table and update the respective time window slots <60sec/15min/24Hrs> in TRANSCEIVER_PM_WINDOW_STATS table. + 7. Sample the data between TRANSCEIVER_PM_WINDOW_STATS and TRANSCEIVER_PM table and update the respective time window slot <60sec/15min/24Hrs> of TRANSCEIVER_PM_WINDOW_STATS table. + 8. PM thread will iterate all the ports and will sleep for PM interval and repeat the steps from pointer 2 to 8. + 9.
PM CLI mentioned in 7.3 will always fetch data from TRANSCEIVER_PM_WINDOW_STATS and provide the display. + 10. All the PM time window slots for the port are cleared when an optics is inserted/deleted to/from the port. + 11. When xcvrd process is restarted, PM statistics collection will be resumed. + + #### 7.6 Out of Scope + 1. Performance monitoring is not enabled/performed for modules other than 400G-ZR. + 2. Threshold crossing setting/monitoring for PM params is not covered as part of this implementation. + From d8e372c4c7abc9aa9b83cc9df524d9a09f230822 Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Wed, 28 Jun 2023 11:24:32 -0400 Subject: [PATCH 09/19] Update CMIS_and_C-CMIS_support_for_ZR.md Improvement --- .../CMIS_and_C-CMIS_support_for_ZR.md | 57 ++++++++++++------- 1 file changed, 37 insertions(+), 20 deletions(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index 9a1636fa21..7b9c50f1e7 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1644,13 +1644,22 @@ The statistics then can be displayed for a specific time window using a CLI. All #### 7.2 Transceiver PM Window statistics parameters Please refer to the [2.1.5 Transceiver PM Table]https://github.com/sonic-net/SONiC/blob/c91b25ed8c79cb6e415e6c999affc309e35200f2/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md#215-transceiver-pm-table) for the parameters that will be monitored with this CLI. - +``` ; Defines Transceiver PM Window statistics table information for a port key = TRANSCEIVER_PM_WINDOW_STATS|ifname ; information of PM on port - ; field = value - pm_win_num = INTEGER ; PM window number - pm_stat_start_time = 1*255VCHAR ; PM statistics start time for the window.
- pm_win_period = 1*255VCHAR ; PM window time period + ; field = value + window1 = 1*255VCHAR ; PM window number with PM window start time, end time, current and PM_TABLE fields. + window2 = 1*255VCHAR ; PM window number with PM window start time, end time, current and PM_TABLE fields. + . + . + . + window29 = 1*255VCHAR ; PM window number with PM window start time, end time, current and PM_TABLE fields. + +Each window field in TRANSCEIVER_PM_WINDOW_STATS will have following fields and respective value as string. + + pm_stat_start_time = 1*255VCHAR ; PM statistics start time for the window. + pm_stat_end_time = 1*255VCHAR ; PM statistics end time for the window. + pm_win_current = 1*255VCHAR ; PM statistics collection is Progressing on this window. prefec_ber_avg = FLOAT ; prefec ber avg prefec_ber_min = FLOAT ; prefec ber min prefec_ber_max = FLOAT ; prefec ber max @@ -1687,7 +1696,7 @@ Please refer to the [2.1.5 Transceiver PM Table]https://github.com/sonic-net/SON rx_sig_power_avg = FLOAT ; rx signal power avg rx_sig_power_min = FLOAT ; rx signal power min rx_sig_power_max = FLOAT ; rx signal power max - +``` #### 7.3 CLI Sub-options and Syntax @@ -1757,24 +1766,32 @@ Before this configuration get implemented, PM will be enabled by default on all ##### 7.5.2 PM Interval and predefined time window: -- ##### PM interval time/statistic collection interval: -Duration between two ‘VDM freeze’ requests issued by host, which the host collects the cumulative statistics for all PM parameter after the 2nd freeze request. It is a host-controlled monitoring interval. During this time, the module that supports statistics takes short term measurements which are also called samples over a module vendor specific fine measurement time interval (eg : 1ms) and then updates internal statistics variables(min, max, avg), thus providing cumulative statistics until the host issues the 2nd freeze request. 
+- ##### PM interval /statistic collection interval: +Duration between two ‘VDM freeze’ requests issued by host, which the host collects the cumulative statistics for all PM parameter after the 2nd freeze request. +It is a host-controlled monitoring interval. During this time, the module that supports statistics takes short term measurements which are also called samples over a module vendor specific fine measurement time interval (eg : 1ms) and then updates internal statistics variables(min, max, avg), thus providing cumulative statistics until the host issues the 2nd freeze request. -This feature allows a platform(Pizza-box or distributed system-linecard with CPU ) to choose from following interval period for PM interval. - - 30sec - - 60sec and - - 120sec -It is recommended to choose 30sec, platforms that have high CPU load can choose 120sec as PM interval. By default 60sec is the PM interval if no input provided by platform, please refer '7.5.4' for platform input. +This feature allows a platform(Pizza-box or distributed system-linecard with CPU ) to choose from following period for PM interval. +- 30sec +- 60sec and +- 120sec + +It is recommended to choose 30sec, platforms that have high CPU load can choose 120sec as PM interval. By default 60sec is the PM interval when no input provided by platform, please refer '7.5.4' for platform input. + +- ##### PM time window or PM window: +The cumulative PM statistics over an interval of time is called a PM time window AKA PM window. +Each PM window PM Statistics are computed from samples reseted from the start time of that PM window. The cumulative statistics of a specific PM window allow the user to estimate the quality of the link over a specific time period. -- ##### Pre-defined PM time window: -The cumulative PM statistics over an interval of time is called a PM time window. 
The PM statistics will be reset after updating a time window and the statistics will be computed from samples collected from the start time of next ‘PM time window’. The cumulative statistics of a specific time window allow the user to estimate the quality of the link over a specific time slot. -The period of a time window and number of windows are not specified in any standard. In this feature three-time window intervals are defined with a granularity of 60seconds, 15mins and 24hour. The 60sec time window is based on the PM interval time and the 15mins and 24hour are defined for the ease of debuggability in realtime. For example, to understand/debug the link stability while bringing up an 400G-ZR interface connection on a topology, a 60sec statistics/current sample will be useful and four 15mins window statistics monitoring will be useful to understand the link stability from initial connection. +The period of a PM time window and number of windows are not specified in any standard. In this feature three-PM time window intervals are defined with a granularity of 60seconds, 15mins and 24hour. + +The 60sec time window is based on the PM interval time and the 15mins and 24hour are defined for the ease of debuggability in realtime. +For example, to understand/debug the link stability while bringing up an 400G-ZR interface connection on a topology, a 60sec statistics/current sample will be useful and four 15mins window statistics monitoring will be useful to understand the link stability from initial connection. For the long term link health monitoring, device telemetry data can be analyzed to estimate the link health but it is out of scope of this HLD/feature. -The number of time window for each granularity is as follow. 
--15 ‘time windows’ of 60sec (at any given time user can view 15mins of statistics with 1 min granularity) --12 ‘time windows’ of 15min (at any given time user can view 3Hr of history with 15min granularity) and --2 ‘time windows’ of 24hr (at any given time user can view 24Hr of statistics). +The number of PM window for each granularity is defined as follow. + + - 15 ‘PM windows’ of 60sec (at any given time user can view 15mins of statistics history once accumulated with 1 min granularity) + - 12 ‘PM windows’ of 15min (at any given time user can view 3Hr of statistics history once accumulated with 15min granularity) and + - 2 ‘PM windows’ of 24hr (at any given time user can view 24Hr of statistics once accumulated). So total 29 PM time window slots will be maintained per port/interface when inserted with 400G-ZR module. From f04b9b15c9af17cc7b69d52638855cdd1869933b Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Wed, 28 Jun 2023 11:57:00 -0400 Subject: [PATCH 10/19] Update CMIS_and_C-CMIS_support_for_ZR.md --- .../CMIS_and_C-CMIS_support_for_ZR.md | 32 ++++++++++++------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index 7b9c50f1e7..b23820ef96 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1700,6 +1700,7 @@ Each window field in TRANSCEIVER_PM_WINDOW_STATS will have following fields and #### 7.3 CLI Sub-options and Syntax + ``` #show int transceiver ​ commands:​ @@ -1707,25 +1708,34 @@ pm show interface transceiver performance monitoring​ ​ #Show int trans pm ​ commands:​ -Current show current pm data​ +Current show progressing pm data​ history show historical pm data​ ​ #show int trans pm current ​ commands:​ -60sec show cumulative pm statistics for 60sec time window for the current window. 
​ -15min show cumulative pm statistics for 15min time window for the current window.​ -24Hr show cumulative pm statistics for 24Hr time window for the current window.​ - Without time window, the CLI will display the current snapshot of pm parameter.​ +60sec show cumulative pm statistics from the progressing 60sec pm window. ​ +15min show cumulative pm statistics from the progressing 15min pm window.​ +24hrs show cumulative pm statistics from the progressing 24hrs pm window.​ #show int trans pm history​ commands:​ -60sec show cumulative pm statistics for 60sec time window for the given window number.​ -15min show cumulative pm statistics for 15min time window for the given window number.​ -24Hr show cumulative pm statistics for 24Hr time window for the given window number.​ -​ -#show int trans pm history 30sec window -n asic0 Ethernet0​ -​ +60sec show cumulative pm statistics for the given pm window number from 60sec pm window.​ +15min show cumulative pm statistics for the given pm window number from 15min pm window.​ +24hrs show cumulative pm statistics for the given pm window number from 24hrs pm window.​ + +#show int trans pm history 60sec window +commands: +1 - 14 PM window number + +#show int trans pm history 15min window +commands: +1 - 11 PM window number + +#show int trans pm history 24hrs window +commands: +1 PM window number + Optional subset display:​ #Show int trans pm current 60sec –n asic0 Ethernet0​ commands:​ From 59558312933f2b8c3b0dd13f3ebeaf832e01d91a Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Wed, 28 Jun 2023 11:59:06 -0400 Subject: [PATCH 11/19] Update CMIS_and_C-CMIS_support_for_ZR.md --- doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index b23820ef96..98a74906c0 100644 --- 
a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1799,8 +1799,8 @@ For the long term link health monitoring, device telemetry data can be analyzed The number of PM window for each granularity is defined as follow. - - 15 ‘PM windows’ of 60sec (at any given time user can view 15mins of statistics history once accumulated with 1 min granularity) - - 12 ‘PM windows’ of 15min (at any given time user can view 3Hr of statistics history once accumulated with 15min granularity) and + - 15 ‘PM windows’ of 60sec (at any given time user can view 14mins of statistics history once accumulated with 1 min granularity) + - 12 ‘PM windows’ of 15min (at any given time user can view 2.45Hr of statistics history once accumulated with 15min granularity) and - 2 ‘PM windows’ of 24hr (at any given time user can view 24Hr of statistics once accumulated). So total 29 PM time window slots will be maintained per port/interface when inserted with 400G-ZR module. From d455c0e0fe24809194abba0bf85c0bbf79114879 Mon Sep 17 00:00:00 2001 From: jaganbal-a <97986478+jaganbal-a@users.noreply.github.com> Date: Wed, 5 Jul 2023 18:07:55 -0400 Subject: [PATCH 12/19] Update CMIS_and_C-CMIS_support_for_ZR.md --- .../CMIS_and_C-CMIS_support_for_ZR.md | 48 ++++++++++++------- 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md index 98a74906c0..1b22139865 100644 --- a/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md +++ b/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md @@ -1655,11 +1655,11 @@ Please refer to the [2.1.5 Transceiver PM Table]https://github.com/sonic-net/SON . window29 = 1*255VCHAR ; PM window number with PM window start time, end time, current and PM_TABLE fields. -Each window field in TRANSCEIVER_PM_WINDOW_STATS will have following fields and respective value as string. 
+Each window field(Key) in TRANSCEIVER_PM_WINDOW_STATS have a value string comprises of following fields. - pm_stat_start_time = 1*255VCHAR ; PM statistics start time for the window. - pm_stat_end_time = 1*255VCHAR ; PM statistics end time for the window. - pm_win_current = 1*255VCHAR ; PM statistics collection is Progressing on this window. + pm_win_start_time = 1*255VCHAR ; PM statistics start time for the window. + pm_win_end_time = 1*255VCHAR ; PM statistics end time for the window. + pm_win_current = 1*255VCHAR ; PM statistics collection is Progressing on this window. (True/False) prefec_ber_avg = FLOAT ; prefec ber avg prefec_ber_min = FLOAT ; prefec ber min prefec_ber_max = FLOAT ; prefec ber max @@ -1711,8 +1711,7 @@ commands:​ Current show progressing pm data​ history show historical pm data​ -​ -#show int trans pm current ​ +​#show int trans pm current