Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Performance monitoring update to CMIS_and_C-CMIS_support_for_ZR.md #1258

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

jaganbal-a
Copy link
Contributor

@jaganbal-a jaganbal-a commented Feb 9, 2023

Adding performance monitoring HLD .
Please refer to below code PR for the implementation
sonic-platform-common: sonic-net/sonic-platform-common#402
sonic-platform-daemons :sonic-net/sonic-platform-daemons#390
sonic-utilities : sonic-net/sonic-utilities#2927

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Feb 9, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@prgeor prgeor self-assigned this Mar 1, 2023
@prgeor prgeor requested review from keboliu and mihirpat1 March 1, 2023 19:03
@prgeor
Copy link
Contributor

prgeor commented Mar 1, 2023

@keboliu @mihirpat1 @andywongarista please review

@prgeor
Copy link
Contributor

prgeor commented Mar 1, 2023

@qinchuanares Please review

@prgeor
Copy link
Contributor

prgeor commented Mar 1, 2023

@jaganbal-a you need to sign the EASY CLA

@zhangyanzhao
Copy link
Collaborator

zhangyanzhao commented Apr 17, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@zhangyanzhao
Copy link
Collaborator

The code PRs need be added to this HLD PR

@zhangyanzhao
Copy link
Collaborator

@jaganbal-a
Copy link
Contributor Author

jaganbal-a commented Apr 18, 2023

Review comments from the community review.

  1. Give platform with an option of 30,60 and 120 sec of PM time interval for the statistics collection.
    Jagan >Added 120sec for PM interval in the design with a caveat (60sec PM time window option in the CLI will not provide any output for the platform which chooses PM interval as 120sec).

  2. Add PM window slot schema for the state-DB
    Jagan> Added.

  3. Add an example of how the data is sampled by xcvrd for 15min/24hr window slots.
    Jagan> Added

  4. With fixed window, at 24hr completion the sampled data in the 24hr window will reset, so at any given point the 24hrs data will not be maintained. So move the 24hr window to a moving window or maintain 3hr window of 8slots (this will allow atleast 21hrs of data at any point of time)
    Jagan> 1 more slot added for 24 Hrs PM time window. So at any given time 24Hrs statistics will be available.

  5. Add yang model for the global config CLI to enable/disable the PM statistics collection.
    Jagan> It is pending, will add it.

  6. Are these sampling windows sliding or not? >> it seems the windows are fixed, is this possible to extend it for sliding (ex: last 23 hrs)
    Jagan> The time windows are fixed. the statistics are reset for every fixed window and the purpose of having fixed window is to understand the failure to a specific time window.

  7. Can the HLD allow multiple sampling windows to capture the PM statistics ?
    Jagan> The statistics can be collected only for a interval of time from the transceiver. So here the interval is defined by the host and updated to 60sec/15min and 24Hr time window slots. Hope this answer the question.

  8. What are the use cases trying to solve this problem? 
    Jagan> Added to the overview section.

  9. What is the SONiC default behaviour if customers don't use this feature? 
    Jagan> There will be no change.

  10. What is the CPU penalty to requirements to support this sampling feature with SONiC? 
    Jagan> The sampling of data from the collected statistics will not add any overhead to the CPU.

  11. How does a SONiC user know whether the platform driver has this capability?Sampling - is the driver dropping any data points or sending everything? 
    Jagan> User need to enable the global CLI to start the functionality and it start kicks in only when there is a 400G-ZR module present in the system/box. Driver not supposed to drop any data as per the design and update the statistics to the DB.

  12. Where is this sampling implemented?
    Jagan> It is in xcvrd process.

  13. What must be supported by each ASIC driver?
    Jagan>There is no involvement of ASIC driver.

Addressing the OCP review comments
Copy link
Collaborator

@zhangyanzhao zhangyanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaganbal-a can you please help to add the code PRs? Thanks.

@jaganbal-a
Copy link
Contributor Author

@jaganbal-a can you please help to add the code PRs? Thanks.

@zhangyanzhao , I am working on the code changes and I will post once I am complete with unit testing. But this PR is for the high level design. so please approve if you do not have any further question on the HLD.

@jaganbal-a jaganbal-a closed this May 10, 2023
@jaganbal-a jaganbal-a reopened this May 10, 2023
7.Performance Monitoring - 400-G ZR module
@bmridul
Copy link
Contributor

bmridul commented May 24, 2023

@lguohan , Pls review the document.

@zhangyanzhao
Copy link
Collaborator

@jaganbal-a code PRs need be included in the HLD before HLD can be merged, can you please help to add the code PRs? Thanks.


#show int trans pm history 15min window
commands:
1 - 11 PM window number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaganbal-a what would the output be for window number history for which sample is yet not collected.? Say after boot, 15 minutes has passed and user wants to see window# 11 for 15min window

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

root@sonic:/home/cisco# show int trans pm history 15min window 11 -n asic2 Ethernet232
Tue Oct 24 17:33:41 UTC 2023
PM window: 15min
Ethernet232: Transceiver performance monitoring data not available for the requested window


#show int trans pm history 60sec window
commands:
1 - 14 PM window number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaganbal-a can you show the calculation for the window range. Why 1-14? Why not 1-20?

Copy link
Contributor Author

@jaganbal-a jaganbal-a Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 60sec window size, by design 15 windows are allocated.
In that, 14 windows are used for history -numbering from 1 to 14 and 1 window is used for viewing current statistics.

For 15min window size, 12 windows are allocated.
In that, 11 windows are used for history and 1 window is used for current.

For 24Hrs window size, 2 windows are allocated.
In that, 1 window used for history and another window used for current.

Mentioned here : https://github.com/sonic-net/SONiC/blob/d0364205764aa49decfca1eb669c76d7f0c8f4e0/doc/platform_api/CMIS_and_C-CMIS_support_for_ZR.md#pm-time-window-or-pm-window-


##### 7.5.1 Configurations
1. performance monitor enable - Global CLI to enable PM on all ports.
Before this configuration get implemented, PM will be enabled by default on all ports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaganbal-a is this future implementation? i.e per port configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor , Yes, this will future implementation. Global CLI to enable/disable PM on all ports.
Raised a github issue :- sonic-net/sonic-platform-daemons#402

- 60sec and
- 120sec

It is recommended to choose 30sec, platforms that have high CPU load can choose 120sec as PM interval. By default 60sec is the PM interval when no input provided by platform, please refer '7.5.4' for platform input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to hyperlink

The 60sec time window are filled as it is read from module if the PM interval is 60sec, the 60sec sample collected from module is then sampled for 15mins and 24Hr window every 60seconds.


<img src ="https://user-images.githubusercontent.com/97986478/236910604-bbb52e77-5c62-4a7e-b64d-ff883d9b57d3.png" width=60% height=50%>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaganbal-a why is the avg rx power remain constant at -10.61

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected the avg in the example and added pm window end time.

sampled_pm_dict['prefec_ber_min'] = min(float(pm_data_dict1['prefec_ber_min']), float(pm_data_dict2['prefec_ber_min']))
sampled_pm_dict['prefec_ber_max'] = max(float(pm_data_dict1['prefec_ber_max']), float(pm_data_dict2['prefec_ber_max']))

sampled_pm_dict['uncorr_frames_avg'] = self.average_of_two_val(float(pm_data_dict1['uncorr_frames_avg']), float(pm_data_dict2['uncorr_frames_avg']))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaganbal-a one example for min, max, avg is sufficient. We dont have to show for all PM parameters

Comment on lines 1658 to 1684
Each window field(Key) in TRANSCEIVER_PM_WINDOW_STATS have a value string comprises of following fields.

pm_win_start_time = 1*255VCHAR ; PM statistics start time for the window.
pm_win_end_time = 1*255VCHAR ; PM statistics end time for the window.
pm_win_current = 1*255VCHAR ; PM statistics collection is Progressing on this window. (True/False)
prefec_ber_avg = FLOAT ; prefec ber avg
prefec_ber_min = FLOAT ; prefec ber min
prefec_ber_max = FLOAT ; prefec ber max
cd_avg = FLOAT ; chromatic dispersion avg
cd_min = FLOAT ; chromatic dispersion min
cd_max = FLOAT ; chromatic dispersion max
dgd_avg = FLOAT ; differential group delay avg
dgd_min = FLOAT ; differential group delay min
dgd_max = FLOAT ; differential group delay max
sopmd_avg = FLOAT ; second order polarization mode dispersion avg
sopmd_min = FLOAT ; second order polarization mode dispersion min
sopmd_max = FLOAT ; second order polarization mode dispersion max
pdl_avg = FLOAT ; polarization dependent loss avg
pdl_min = FLOAT ; polarization dependent loss min
pdl_max = FLOAT ; polarization dependent loss max
osnr_avg = FLOAT ; optical signal to noise ratio avg
osnr_min = FLOAT ; optical signal to noise ratio min
osnr_max = FLOAT ; optical signal to noise ratio max
esnr_avg = FLOAT ; electrical signal to noise ratio avg
esnr_min = FLOAT ; electrical signal to noise ratio min
esnr_max = FLOAT ; electrical signal to noise ratio max
cfo_avg = FLOAT ; carrier frequency offset avg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaganbal-a we don't need to copy these again from section 2.1.5. A reference to 2.1.5 is sufficient. can you remove this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the redundant fields and added link.

@jaganbal-a
Copy link
Contributor Author

@jaganbal-a can you please help to add the code PRs? Thanks.

@zhangyanzhao
sonic-platform-common: sonic-net/sonic-platform-common#402
sonic-platform-daemons :sonic-net/sonic-platform-daemons#390
sonic-utilities : sonic-net/sonic-utilities#2927

@jaganbal-a
Copy link
Contributor Author

@jaganbal-a can you please help to add the code PRs? Thanks.

@zhangyanzhao , I am working on the code changes and I will post once I am complete with unit testing. But this PR is for the high level design. so please approve if you do not have any further question on the HLD.

please check the description for the code PRs.

@jaganbal-a jaganbal-a closed this Oct 20, 2023
@jaganbal-a jaganbal-a reopened this Oct 20, 2023
@jaganbal-a jaganbal-a requested a review from prgeor October 25, 2023 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants