[Nokia-7215] Enhance Watchdog service #18850

Merged: 1 commit merged into sonic-net:master from the dev_master_enhance_watchdog branch on Jun 5, 2024

Conversation

Pavan-Nokia
Contributor

@Pavan-Nokia Pavan-Nokia commented May 2, 2024

Mask watchdog-control.service so that only one watchdog service starts on this platform.

Why I did it

To resolve an ordering conflict between the two watchdog services:

  1. watchdog-control.service -- common service designed to disable the watchdog on all platforms
  2. cpu_wdt.service -- enables the watchdog on the nokia-7215 platform

In some cases service 1 was started after service 2, leaving the watchdog on the box disabled.
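
The ordering problem can be illustrated with a toy model. This is only a sketch: it assumes each service simply sets the watchdog state when it starts, so the last one to start wins. The function name `final_watchdog_state` is hypothetical and nothing here touches systemd or the hardware.

```shell
#!/bin/sh
# Toy model (assumption): each service overwrites the watchdog state when it
# starts, so the final state depends only on which service started last.
final_watchdog_state() {
    state="unknown"
    for svc in "$@"; do
        case "$svc" in
            watchdog-control.service) state="disabled" ;;  # common service disables
            cpu_wdt.service)          state="enabled"  ;;  # platform service enables
        esac
    done
    printf '%s\n' "$state"
}
```

Calling `final_watchdog_state watchdog-control.service cpu_wdt.service` prints `enabled`, while reversing the two arguments prints `disabled`, which is the failure described above. With watchdog-control.service masked, only cpu_wdt.service ever starts, so the order no longer matters.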

Work item tracking
  • Microsoft ADO (number only):

How I did it

Enhance the service file to ensure cpu_wdt.service always starts after watchdog-control.service.

How to verify it

  1. Try multiple upgrade and install scenarios to make sure the watchdog is always enabled
  2. Try multiple reboots to make sure the watchdog is always enabled
admin@sonic:~$ systemctl status watchdog-control.service 
● watchdog-control.service
     Loaded: masked (Reason: Unit watchdog-control.service is masked.)
     Active: inactive (dead)
admin@sonic:~$ 
admin@sonic:~$ 
admin@sonic:~$ systemctl status cpu_wdt.service
● cpu_wdt.service - CPU WDT
     Loaded: loaded (/etc/systemd/system/cpu_wdt.service; enabled; vendor prese>
     Active: active (running) since Mon 2024-05-20 14:57:15 UTC; 4min 57s ago
   Main PID: 635 (cpu_wdt.py)
      Tasks: 1 (limit: 4915)
     Memory: 13.9M
     CGroup: /system.slice/cpu_wdt.service
             └─635 /usr/bin/python /usr/local/bin/cpu_wdt.py

admin@sonic:~$ 
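
The manual checks above could be scripted so they are easy to repeat after each reboot or upgrade. The following is a minimal sketch, not part of this PR: the function name `check_watchdog_services` is hypothetical, and it only inspects the systemd unit states shown above, not the hardware watchdog itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the common service is masked and
# the platform service is active, matching the expected output above.
check_watchdog_services() {
    # is-enabled prints "masked" for a masked unit but exits non-zero,
    # so the exit status is ignored and only the printed state is compared.
    common_state=$(systemctl is-enabled watchdog-control.service 2>/dev/null) || true
    platform_state=$(systemctl is-active cpu_wdt.service 2>/dev/null) || true

    if [ "$common_state" != "masked" ]; then
        echo "FAIL: watchdog-control.service is '$common_state', expected 'masked'"
        return 1
    fi
    if [ "$platform_state" != "active" ]; then
        echo "FAIL: cpu_wdt.service is '$platform_state', expected 'active'"
        return 1
    fi
    echo "OK: watchdog-control.service masked, cpu_wdt.service active"
}
```

Running `check_watchdog_services` on the box after each install, upgrade, or reboot gives a single pass/fail result for both services.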

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202405

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@Pavan-Nokia Pavan-Nokia requested a review from lguohan as a code owner May 2, 2024 14:20
@prgeor prgeor requested a review from saiarcot895 May 4, 2024 15:43
@Pavan-Nokia Pavan-Nokia force-pushed the dev_master_enhance_watchdog branch from f918bc9 to f8adf12 Compare May 14, 2024 15:37
@Pavan-Nokia Pavan-Nokia force-pushed the dev_master_enhance_watchdog branch from f8adf12 to 9519679 Compare May 20, 2024 15:12
@Pavan-Nokia Pavan-Nokia requested review from saiarcot895 and prgeor May 24, 2024 19:13
@@ -9,6 +9,8 @@ systemctl restart kmod
systemctl enable nokia-7215init.service
systemctl start nokia-7215init.service

systemctl mask watchdog-control.service
Contributor

Is this guaranteed to happen before watchdog-control.service starts? If there's a chance it could be after, then add --now to have systemd stop the service as well.

Contributor Author

I have verified all scenarios to make sure there is no race condition - fresh install, software upgrade, reboots. The watchdog-control.service is always masked correctly and the platform watchdog is active.

I have added the --now flag as recommended. I agree this is good to have in case something changes in the future.
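
For reference, the effect of the suggested flag can be sketched as a small wrapper. This is an illustration only, and the function name `mask_and_stop` is hypothetical. With `--now`, `systemctl mask` also stops the unit if it is already running, so the result does not depend on whether the mask runs before or after the service has started.

```shell
#!/bin/sh
# Hypothetical wrapper: mask a unit and stop it in the same call.
mask_and_stop() {
    unit="$1"
    # --now: when used with mask, the unit is also stopped
    systemctl mask --now "$unit"
}
```

On this platform the call would be `mask_and_stop watchdog-control.service`.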

Mask Watchdog-control.service and make sure only one watchdog service
starts on this platform
@Pavan-Nokia Pavan-Nokia force-pushed the dev_master_enhance_watchdog branch from 9519679 to 20af2dd Compare May 31, 2024 18:28
@Pavan-Nokia
Contributor Author

@yxieca @Blueve Please help merge this and backport it to the 202405 branch.
Please also help merge the PR for the 2311 backport, #18851.

Thank you

@yxieca yxieca merged commit 7966f8a into sonic-net:master Jun 5, 2024
11 checks passed
@yxieca
Contributor

yxieca commented Jun 5, 2024

@bingwang-ms to track the 202405 cherry-pick

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Jun 6, 2024
Mask Watchdog-control.service and make sure only one watchdog service
starts on this platform
@mssonicbld
Collaborator

Cherry-pick PR to 202405: #19236

mssonicbld pushed a commit that referenced this pull request Jun 6, 2024
Mask Watchdog-control.service and make sure only one watchdog service
starts on this platform
arun1355492 pushed a commit to arun1355492/sonic-buildimage that referenced this pull request Jul 26, 2024
Mask Watchdog-control.service and make sure only one watchdog service
starts on this platform
7 participants