[warm-reboot] pmon(xcvrd) race with warmboot finalizer #17943

Open
stepanblyschak opened this issue Jan 30, 2024 · 2 comments

@stepanblyschak
Collaborator

Description

Steps to reproduce the issue:

  1. Execute sudo warm-reboot
  2. Look at the startup logs

Describe the results you received:

Jan 29 15:26:37.012634 r-leopard-72 INFO systemd[1]: warmboot-finalizer.service: Succeeded.
Jan 29 15:26:37.013449 r-leopard-72 INFO systemd[1]: Finished Monitor warm recovery and disable warmboot when done.
Jan 29 15:26:43.546427 r-leopard-72 INFO systemd[1]: Starting Platform monitor container...
Jan 29 15:26:44.037535 r-leopard-72 INFO systemd[1]: Started Platform monitor container.

In this particular run, warmboot-finalizer finishes first and removes the warm reboot flag from the DB. The pmon container only starts about 6.5 seconds later, after the flag is already gone.
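The ordering can be checked mechanically from the syslog excerpt. The following is a minimal sketch, assuming the timestamp layout seen in these logs (`Mon DD HH:MM:SS.ffffff`, with no year, so the year is passed in as an assumption). The helper names `parse_ts`, `first_ts`, and `finalizer_to_pmon_gap` are illustrative only and are not part of SONiC.

```python
from datetime import datetime

# Log lines copied from the excerpt in this issue.
LOG = """\
Jan 29 15:26:37.012634 r-leopard-72 INFO systemd[1]: warmboot-finalizer.service: Succeeded.
Jan 29 15:26:37.013449 r-leopard-72 INFO systemd[1]: Finished Monitor warm recovery and disable warmboot when done.
Jan 29 15:26:43.546427 r-leopard-72 INFO systemd[1]: Starting Platform monitor container...
Jan 29 15:26:44.037535 r-leopard-72 INFO systemd[1]: Started Platform monitor container.
"""


def parse_ts(line, year=2024):
    """Parse the leading syslog timestamp; the year is assumed, not logged."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S.%f")


def first_ts(log, needle):
    """Return the timestamp of the first line containing `needle`, or None."""
    for line in log.splitlines():
        if needle in line:
            return parse_ts(line)
    return None


def finalizer_to_pmon_gap(log):
    """Seconds from finalizer completion to pmon start.

    A positive value means the finalizer finished before pmon started,
    which is the ordering that exposes the race described here.
    """
    done = first_ts(log, "warmboot-finalizer.service: Succeeded")
    start = first_ts(log, "Starting Platform monitor container")
    if done is None or start is None:
        return None
    return (start - done).total_seconds()


if __name__ == "__main__":
    gap = finalizer_to_pmon_gap(LOG)
    print(f"finalizer finished {gap:.3f}s before pmon started")
```

A positive gap on a given boot indicates that xcvrd could not have seen the warm reboot flag when it started.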

When this happens, xcvrd concludes that it is not a warm reboot (note `is_warm_start: False`):

Jan 29 18:32:07.265439 r-leopard-58 NOTICE pmon#xcvrd[34]: Starting up...
Jan 29 18:32:07.265439 r-leopard-58 NOTICE pmon#xcvrd[34]: XCVRD INIT: Start daemon init...
Jan 29 18:32:07.690731 r-leopard-58 NOTICE pmon#xcvrd: XCVRD INIT: Wait for port config is done
Jan 29 18:32:07.692768 r-leopard-58 NOTICE pmon#xcvrd: XCVRD INIT: After port config is done
Jan 29 18:32:07.697708 r-leopard-58 NOTICE pmon#xcvrd: Start daemon main loop with thread count 2
Jan 29 18:32:07.697708 r-leopard-58 NOTICE pmon#xcvrd: Started thread DomInfoUpdateTask
Jan 29 18:32:07.697733 r-leopard-58 NOTICE pmon#xcvrd: Started thread SfpStateUpdateTask
Jan 29 18:32:07.704942 r-leopard-58 NOTICE pmon#xcvrd: xcvrd is_warm_start: False
Jan 29 18:32:07.958636 r-leopard-58 NOTICE pmon#xcvrd: SfpStateUpdateTask: Posted all port DOM/SFP info to DB

This means the following check, which is meant to skip the media settings notification during warm reboot, does not take effect:

                # Do not notify media settings during warm reboot to avoid dataplane traffic impact
                if is_warm_start == False:
                    media_settings_parser.notify_media_setting(logical_port_name, transceiver_dict, xcvr_table_helper.get_app_port_tbl(asic_index), xcvr_table_helper.get_cfg_port_tbl(asic_index), port_mapping)
                    transceiver_dict.clear()

https://github.com/sonic-net/sonic-platform-daemons/blob/d8977f3608e5c9263d9b9d9b33087890637ac436/sonic-xcvrd/xcvrd/xcvrd.py#L1727
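To make the ordering dependence concrete, here is a minimal simulation of the race. It is a sketch under stated assumptions: `FakeStateDb`, `start_xcvrd`, and `run` are hypothetical stand-ins written for illustration, and they do not use or reproduce the real SONiC or xcvrd APIs. The only behavior modeled is that the finalizer clears the flag and that the daemon reads it once at startup.

```python
class FakeStateDb:
    """In-memory stand-in for the DB holding the warm reboot flag (hypothetical)."""

    def __init__(self):
        # Assumed to be set by warm-reboot before the system restarts.
        self.warm_flag = True

    def finalize(self):
        # Models the warmboot finalizer removing the flag when it completes.
        self.warm_flag = False

    def is_warm_start(self):
        return self.warm_flag


def start_xcvrd(db, notified):
    """Models the daemon reading the flag at startup and acting on it."""
    is_warm_start = db.is_warm_start()
    # Mirrors the quoted check: notify media settings only on non-warm starts.
    if not is_warm_start:
        notified.append("media settings notified")
    return is_warm_start


def run(order):
    """Run the two actors in the given order; return (is_warm_start, notified)."""
    db = FakeStateDb()
    notified = []
    result = None
    for step in order:
        if step == "finalizer":
            db.finalize()
        elif step == "xcvrd":
            result = start_xcvrd(db, notified)
    return result, notified


if __name__ == "__main__":
    print("xcvrd first:    ", run(["xcvrd", "finalizer"]))
    print("finalizer first:", run(["finalizer", "xcvrd"]))
```

With xcvrd first, the flag is still set and no notification is sent. With the finalizer first, as in the logs above, the daemon sees a non-warm start and sends the media settings notification that the check was meant to suppress.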

Describe the results you expected:

Output of show version:

SONiC Software Version: SONiC.202311_inde_new.17-afa106a09_Internal
SONiC OS Version: 11
Distribution: Debian 11.8
Kernel: 5.10.0-23-2-amd64
Build commit: 8ade51aef
Build date: Mon Jan 22 07:40:54 UTC 2024
Built by: sw-r2d2-bot@r-build-sonic-ci02-241

Platform: x86_64-mlnx_msn4700-r0
HwSKU: ACS-MSN4700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2301XZ0NF8
Model Number: MSN4700-WS2F
Hardware Revision: A1
Uptime: 11:42:02 up  1:39,  1 user,  load average: 4.27, 3.83, 3.12
Date: Tue 30 Jan 2024 11:42:02

Docker images:
REPOSITORY                    TAG                                     IMAGE ID       SIZE
docker-platform-monitor       202311_inde_new.17-afa106a09_Internal   b00213b9bb6c   828MB
docker-platform-monitor       latest                                  b00213b9bb6c   828MB
docker-syncd-mlnx             202311_inde_new.17-afa106a09_Internal   e5fbf36ff22a   840MB
docker-syncd-mlnx             latest                                  e5fbf36ff22a   840MB
docker-dhcp-relay             latest                                  29976bedace0   310MB
docker-orchagent              202311_inde_new.17-afa106a09_Internal   bf0e68949258   338MB
docker-orchagent              latest                                  bf0e68949258   338MB
docker-macsec                 latest                                  637baad7c573   329MB
docker-sflow                  202311_inde_new.17-afa106a09_Internal   b374f6f3cca9   328MB
docker-sflow                  latest                                  b374f6f3cca9   328MB
docker-snmp                   202311_inde_new.17-afa106a09_Internal   4088a87ee66f   340MB
docker-snmp                   latest                                  4088a87ee66f   340MB
docker-fpm-frr                202311_inde_new.17-afa106a09_Internal   90e44a5a517f   358MB
docker-fpm-frr                latest                                  90e44a5a517f   358MB
docker-nat                    202311_inde_new.17-afa106a09_Internal   e1af3ad890e2   329MB
docker-nat                    latest                                  e1af3ad890e2   329MB
docker-teamd                  202311_inde_new.17-afa106a09_Internal   78f16ce51111   327MB
docker-teamd                  latest                                  78f16ce51111   327MB
docker-router-advertiser      202311_inde_new.17-afa106a09_Internal   b202aed2a2ec   301MB
docker-router-advertiser      latest                                  b202aed2a2ec   301MB
docker-eventd                 202311_inde_new.17-afa106a09_Internal   c78c33d56a6a   301MB
docker-eventd                 latest                                  c78c33d56a6a   301MB
docker-lldp                   202311_inde_new.17-afa106a09_Internal   fd7bc25c9e27   343MB
docker-lldp                   latest                                  fd7bc25c9e27   343MB
docker-mux                    202311_inde_new.17-afa106a09_Internal   2c3462578e1a   349MB
docker-mux                    latest                                  2c3462578e1a   349MB
docker-database               202311_inde_new.17-afa106a09_Internal   8a1674d4c7f2   301MB
docker-database               latest                                  8a1674d4c7f2   301MB
docker-sonic-gnmi             202311_inde_new.17-afa106a09_Internal   7df119bcfc9a   389MB
docker-sonic-gnmi             latest                                  7df119bcfc9a   389MB
docker-sonic-mgmt-framework   202311_inde_new.17-afa106a09_Internal   2e5aad30b85d   417MB
docker-sonic-mgmt-framework   latest                                  2e5aad30b85d   417MB


@stepanblyschak
Collaborator Author

@prgeor Please have a look

@keboliu

@gechiang added the Triaged label on Feb 14, 2024
@gechiang
Collaborator

@prgeor please help investigate. Thanks.
