-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new fields to status/dom_sensor/pm tables in STATE_DB for CMIS/C-CMIS #304
Conversation
* Push new fields to DOM_SENSOR table in STATE_DB * Remove empty line Signed-off-by: Longyin Huang longhuan@cisco.com
@prgeor please review (couldn't add to reviewers list directly, thus add here) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also test restarting xcvrd to ensure that the table gets deleted and updated accordingly
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
return SFP_EEPROM_NOT_READY | ||
|
||
except NotImplementedError: | ||
helper_logger.log_error("This functionality is currently not implemented for this platform") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be better to add some unique info here so that we can distinguish this error message from other error messages (as the same string is being used in multiple places in the file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added unique log.
Yea Attached restart test log in description. |
@prgeor @mihirpat1 I took care of the existing comments, can you please review? |
Added pm table related change, to use this PR to track changes for all three tables: status/dom_sensor/pm. (Since they all touch the similar flow in xcvrd, and some is dependent on another, it would be good to combine them together.) |
* Add pm tbl to STATE_DB * Use unique info in post_port_pm_info_to_db log
9b9fe4d
to
fa6ce8b
Compare
@@ -2124,6 +2212,8 @@ def retry_eeprom_reading(self): | |||
if rc != SFP_EEPROM_NOT_READY: | |||
post_port_dom_info_to_db(logical_port, self.port_mapping, self.xcvr_table_helper.get_dom_tbl(asic_index)) | |||
post_port_dom_threshold_info_to_db(logical_port, self.port_mapping, self.xcvr_table_helper.get_dom_tbl(asic_index)) | |||
update_port_transceiver_status_table_hw(logical_port, self.port_mapping, self.xcvr_table_helper.get_status_tbl(asic_index)) | |||
post_port_pm_info_to_db(logical_port, self.port_mapping, self.xcvr_table_helper.get_pm_tbl(asic_index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are we making sure this is done only for ZR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside post_port_pm_info_to_db(), the following statement will make sure to skip for other non-ZR cases (where platform API get_transceiver_pm() throws NotImplementedError and then _wrapper_get_transceiver_pm() returns empty dict):
if not pm_info_dict:
continue
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
pm_info_dict = _wrapper_get_transceiver_pm(physical_port) | ||
if pm_info_cache is not None: | ||
# If cache is enabled, put dom information to cache | ||
pm_info_cache[physical_port] = pm_info_dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where are we going to use this pm_info_cache[physica_port] ? I dont see it being used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pm_info_cache is passed to post_port_pm_info_to_db() from task_worker() of class DomInfoUpdateTask
(in task_worker, we already have similar cache logic for dom_info dom_th_info in today's code)
pm_info_cache is used here:
if pm_info_cache is not None and physical_port in pm_info_cache:
# If cache is enabled and pm info is in cache, just read from cache, no need read from EEPROM
pm_info_dict = pm_info_cache[physical_port]
One possible situation I can think of: task_worker() later walks through a different logical port which belongs to a previously visited physical port, then cache is hit.
|
||
def delete_port_from_status_table_hw(logical_port_name, port_mapping, status_tbl): | ||
for physical_port_name in get_physical_port_name_dict(logical_port_name, port_mapping).values(): | ||
found, fvs = status_tbl.get(physical_port_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use physical_port_name as the index instead of logical_port_name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this function is supposed to only delete the HW fields of status tbl, which are fetched from eeprom (more precisely VDM) and per physical port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe most of the current xcvrd relevant fields in the state DB are fetched from the hw(EEPROM), but all of them are using 'logical_port_name' as the index, there will be duplicated information in the DB(multiple logical_port can be created from the same physical port), but with this design, CLI/SNMP can easily query them with the logical port name. I suggest following the same convention here. @prgeor please also share your view here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, SW part (fields 'status' and 'error') of status table are not from EEPROM, and all the other fields on status table (introduced by this PR) and other tables are from EEPROM.
And currently, only SW part of status table is using logical_port_name as key, all the other existing tables (dom_sensor tbl/info tbl, those are eeprom-feteched) are using physical_port_name as index. I'm just making the HW part of status table aligned with other existing tables.
e.g. in today’s post_port_sfp_info_to_db(), post_port_dom_threshold_info_to_db() and post_port_dom_info_to_db(), they all have the following code to use physical port to set DB:
for physical_port in physical_port_list:
…..
port_name = get_physical_port_name(logical_port_name, ganged_member_num, ganged_port)
…..
table.set(port_name, fvs)
We discussed with @prgeor about this before, and we agreed to better use logical_port_name as key, and sonic-net/sonic-buildimage#10144 can be used to track this pending work item to clean up all the existing code to use logical_port_name as key for DB tables. But the plan was to keep this item as lower priority, since it may only cause issue on DPB case for now. And also the history/necessarity of ganged port needs to be figured out first (ganged port can have 1:n mapping from logical port to physical port), so that we can decide wether the ganged port related logic can be cleaned up around this area.
Also, to clean this area up, more consideration/discussion would be needed later on whether we need to divide the corresponding eeprom-fetched fields belonging to the same physical port into multiple groups based on logical ports.
e.g. for the following fields in TRANSCEIVER_INFO table, if we are in breakout mode, each logical port will only have part of the 8 lanes, we might need to divide them based on logical port.
active_apsel_hostlane1 = INTEGER ; active application selected code assigned to host lane 1
active_apsel_hostlane2 = INTEGER ; active application selected code assigned to host lane 2
active_apsel_hostlane3 = INTEGER ; active application selected code assigned to host lane 3
active_apsel_hostlane4 = INTEGER ; active application selected code assigned to host lane 4
active_apsel_hostlane5 = INTEGER ; active application selected code assigned to host lane 5
active_apsel_hostlane6 = INTEGER ; active application selected code assigned to host lane 6
active_apsel_hostlane7 = INTEGER ; active application selected code assigned to host lane 7
active_apsel_hostlane8 = INTEGER ; active application selected code assigned to host lane 8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, SW part (fields 'status' and 'error') of status table are not from EEPROM, and all the other fields on status table (introduced by this PR) and other tables are from EEPROM. And currently, only SW part of status table is using logical_port_name as key, all the other existing tables (dom_sensor tbl/info tbl, those are eeprom-feteched) are using physical_port_name as index. I'm just making the HW part of status table aligned with other existing tables.
e.g. in today’s post_port_sfp_info_to_db(), post_port_dom_threshold_info_to_db() and post_port_dom_info_to_db(), they all have the following code to use physical port to set DB:
for physical_port in physical_port_list: ….. port_name = get_physical_port_name(logical_port_name, ganged_member_num, ganged_port) ….. table.set(port_name, fvs)
If I am not wrong, from the below code seems that for all none-ganged ports it will always use logical port name including table DOM and Transceiver info table. By the way, I haven't seen any vendor claiming that they have ganged port on any of their platforms.
def get_physical_port_name(logical_port, physical_port, ganged):
if ganged:
return logical_port + ":{} (ganged)".format(physical_port)
else:
return logical_port
We discussed with @prgeor about this before, and we agreed to better use logical_port_name as key, and sonic-net/sonic-buildimage#10144 can be used to track this pending work item to clean up all the existing code to use logical_port_name as key for DB tables. But the plan was to keep this item as lower priority, since it may only cause issue on DPB case for now. And also the history/necessarity of ganged port needs to be figured out first (ganged port can have 1:n mapping from logical port to physical port), so that we can decide wether the ganged port related logic can be cleaned up around this area. Also, to clean this area up, more consideration/discussion would be needed later on whether we need to divide the corresponding eeprom-fetched fields belonging to the same physical port into multiple groups based on logical ports. e.g. for the following fields in TRANSCEIVER_INFO table, if we are in breakout mode, each logical port will only have part of the 8 lanes, we might need to divide them based on logical port.
If you change the index of the tables, I supposed the CLIs (e.g 'show interface transceiver eeprom') will be broken, and also SNMP and other telemetry applications assuming the index shall be logical port index will also be broken, please double confirm.
active_apsel_hostlane1 = INTEGER ; active application selected code assigned to host lane 1 active_apsel_hostlane2 = INTEGER ; active application selected code assigned to host lane 2 active_apsel_hostlane3 = INTEGER ; active application selected code assigned to host lane 3 active_apsel_hostlane4 = INTEGER ; active application selected code assigned to host lane 4 active_apsel_hostlane5 = INTEGER ; active application selected code assigned to host lane 5 active_apsel_hostlane6 = INTEGER ; active application selected code assigned to host lane 6 active_apsel_hostlane7 = INTEGER ; active application selected code assigned to host lane 7 active_apsel_hostlane8 = INTEGER ; active application selected code assigned to host lane 8
continue | ||
beautify_pm_info_dict(pm_info_dict, physical_port) | ||
fvs = swsscommon.FieldValuePairs([(k, v) for k, v in pm_info_dict.items()]) | ||
table.set(physical_port_name, fvs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep the key as logical port just like done for DOM and other table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because the CLI and streaming telemetry will need logical port as key in future when we do breakout of physical port to many logical ports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keboliu if you check the usage of port_name in previous change:-
table.set(port_name, fvs)
port_name is nothing but physical port name at line 496 in previous code :-
port_name = get_physical_port_name(logical_port_name, ganged_member_num, ganged_port)
so, i don't see this PR causing any issue. We need to revisit once we support breakout in Xcvrd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request introduces 1 alert when merging 4e30850 into adcd69b - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog. |
@yxieca please cherry-pick to 202205 |
@longhuan-cisco this change cannot be cherry-picked to 202205 cleanly. Please raise separate PR. Thanks! |
@StormLiangMS please help cherry-pick to 202211 |
@prgeor I suppose you want to have this one in 202211 other than 202111 |
Add new fields to status/dom_sensor/pm tables in STATE_DB for CMIS/C-CMIS
Description
following SW fields(non-eeprom-fetched) are already present in today's status table:
following HW fields (eeprom-fetched) are added per this HLD (this is just snippet, refer to HLD for complete list):
Here's the entire DB schema of status tbl and dom_sensor tbl, according to the new HLD from PR sonic-net/SONiC#1076
Motivation and Context
How Has This Been Tested?
Verified on CMIS/C-CMIS optics, interfaces come up fine, and new fields show up in DB
UT output for dom_sensor table:
dom_sensor_tbl_ut_output.txt
UT output for status table:
status_tbl_ut_output.txt
UT output for status table in xcvrd restart case:
xcvrd_restart_test.txt
UT output for pm table:
pm_table_output.txt
UT output for pm table in xcvrd restart case:
xcvrd_restart_test_pm_tbl.txt
UT result for test_xcvrd.py:
ut_w_coverage.txt
Additional Information (Optional)