Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OA crash handling to reinitialize port through xcvrd #1432
OA crash handling to reinitialize port through xcvrd #1432
Changes from 4 commits
f4533cd
de1c276
b5713ec
2fc0904
04f8417
2aadb30
cb4e2d2
49acc4c
2b298bf
b1082ba
2d1e79b
b68d292
2bfc779
2fe94b5
23b7aa3
ce17121
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To know that the XCVRD main thread might need to parse "media_settings.json" file. Ports which have no match in this file don't require media settings. Is it the intention to do ? I know that with recent changes parsing/setting of media settings moved from init code of the XCVRD main thread to init of SfpStateUpdate task. May be this setting after parsing of .json file should be done there ? Or MEDIA_SETTING_DEFAULT can be put for all ports regardless media settings for them in .json file ? Then if SfpStateUpdate taks finds data for a port in .json it will set MEDIA_SETTING_NOTIFIED ? What is the impact of having MEDIA_SETTING_DEFAULT for all ports ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To know that the XCVRD main thread might need to parse "media_settings.json" file. Ports which have no match in this file don't require media settings. Is it the intention to do ?
[MP] Yes - if the media settings of a transceiver are not part of "media_settings.json", we would still keep the port with the value MEDIA_SETTING_DEFAULT
I know that with recent changes parsing/setting of media settings moved from init code of the XCVRD main thread to init of SfpStateUpdate task. May be this setting after parsing of .json file should be done there ? Or MEDIA_SETTING_DEFAULT can be put for all ports regardless media settings for them in .json file ?
[MP] I think it is better to initialize this key to MEDIA_SETTING_DEFAULT from xcvrd main thread since this key is used by both SfpStateUpdateTask and CmisManagerTask for handling media settings application for a port and these threads are running parallelly.
Then if SfpStateUpdate taks finds data for a port in .json it will set MEDIA_SETTING_NOTIFIED ? What is the impact of having MEDIA_SETTING_DEFAULT for all ports ?
[MP] Yes, once SfpStateUpdateTask find the media settings in .json file, the value of MEDIA_SETTINGS_SYNC_STATUS changes to MEDIA_SETTING_NOTIFIED. I think there is no harm in setting to MEDIA_SETTING_DEFAULT for all ports irrespective of its media settings requirement from the xcvrd main thread perspective. Also, the advantage with this approach is we do not introduce any race conditions between OA, SfpStateUpdateTask and CmisManagerTask threads related to the role or sequence in modifying the value of MEDIA_SETTINGS_SYNC_STATUS .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this row, I believe consumer thread captures both uses cases - boot-up and transceiver insertion.
In that case, please add 'transceiver insertion' here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a check in OA as well to perform port toggle and set MEDIA_SETTINGS_DONE only if MEDIA_SETTINGS_SYNC_STATUS == MEDIA_SETTINGS_NOTIFIED?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed, SfpStateUpdateTask will perform apply media settings upon transceiver insertion irrespective of the value of MEDIA_SETTINGS_SYNC_STATUS (unlike the case of SfpStateUpdateTask checking the value during boot-up).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct indentation (unnecessary gaps between words)
Also bullet the two threads for readability, as:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this state used for? no consumer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This state is currently unused. We can use this for debugging purpose for now to check if XCVRD has applied and notified the media settings to OA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also check for MEDIA_SETTINGS_NOTIFIED also? Could there be a scenario where SfpStateUpdateTask tries to renotify media settings before MEDIA_SETTINGS_DONE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SfpStateUpdateTask could try to renotify the media settings before MEDIA_SETTINGS_DONE if xcvrd crashes just after notifying the media settings (in this case, MEDIA_SETTINGS_SYNC_STATUS would be MEDIA_SETTINGS_NOTIFIED).
In such case, OA would apply the media settings but CmisManagerTask will not perform CMIS initialization (along with handling port toggle message from OA) since CmisManagerTask is already dead.
Eventually, once xcvrd spawn again, SfpStateUpdateTask would renotify the media settings to OA followed by CmisManagerTask proceeding with CMIS initialization.
Hence, to allow port toggle message to be sent from OA to CmisManagerTask and allow CMIS initialization in the above xcvrd crash scenario, I am currently not checking for MEDIA_SETTINGS_NOTIFIED. Do you think there could be an issue if we apply the media settings twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In xcvrd restart scenario, re-applying or re-notifying media settings might have an impact to the system i.e.
OA is just a conduit. Re-applying would lead to re-notifying media_settings from OA->syncd->SAI->SDK and in turn making port toggle.
Should avoid port toggling here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[1] Please elaborate "Apply SI settings" to something on lines...Ask SAI-SDK to apply SI settings
[2] Add Next bullet - Based on the update/response as SUCCESS from SAI call,
set PORT_TABLE:<port>.MEDIA_SETTINGS_SYNC_STATUS = MEDIA_SETTINGS_DONE
Enable port admin status
In case of failure update/response from SAI call, what would be the system flow? how would the failure be handled?
I beleive host_tx_ready settings is part of this workflow. Please add it here to the workflow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If port doesn't require media settings to be applied,..."
This should be checked as part of previous point/bullet (i.e. the port supports media settings...)
bail out from there itself (i.e. no further action) in case port doesn't require media settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not related to this specific design but what is the reason for setting CMIS_STATE_READY when either "host_tx_ready" is not yet TRUE or admin_status is not UP ? Why do we skip to the final state of the CMIS state machine in this use-case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this was initially implemented, the thought at that time was to move the CMIS SM to a fixed state rather than waiting for an event.
However, if you think we should create a new state to handle host_tx_ready and admin_status rather than moving to CMIS_STATE_READY if either fields have the value false/down respectively, I can plan to implement it accordingly. Let me know your opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set CMIS_REINIT_REQUIRED to True upon module removal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not currently changing the value of CMIS_REINIT_REQUIRED after setting it as False since the purpose of this flag is to drive CMIS re-initialization and not CMIS initialization.
The existing flow in the CMIS SM already ensures that CMIS initialization will be performed after module insertion.
Let me know your thoughts on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if any SFF complaint transceiver requires different SI value to be programmed to NPU. eg copper module with different length without ANLT may need a different SI on NPU serdes.
This approach is tied with cmis manager/CMIS complaint modules only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The below will take care of handling SFF compliant transceivers requiring different NPU SI settings. Let me know if I am missing something here.
"For non-CMIS SM driven transceivers, SfpStateUpdateTask thread will update NPU SI settings in the PORT_TABLE (APPL_DB) and notify to OA based on the value of PORT_TABLE:.NPU_SI_SETTINGS_SYNC_STATUS"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change Media renotify to Media settings renotify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed this now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config reload and cold reboot triggers/use-cases doesn't have link flap (until link come operationally up and then go down post bring up).
Since all SW components went down along with HW ones (NPU, PHY, optics etc.), so should mark this case as N/A under this link flap column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed this now.